Applying facial masks to faces in live video
10062216 · 2018-08-28
Inventors
CPC classification
G06T19/20
PHYSICS
G06V40/169
PHYSICS
G06V40/171
PHYSICS
International classification
G06T19/00
PHYSICS
G06T7/246
PHYSICS
Abstract
A method for applying facial masks to faces in live video. The method includes receiving an image containing a face from a user, wherein the image is a frame of a video, and identifying the coordinates of a face in the image. The method also includes identifying the coordinates of facial elements within the face previously identified and synchronizing a bitmap add-on, wherein synchronizing the bitmap add-on includes aligning the bitmap add-on with the identified facial elements. The method further includes applying the bitmap add-on over the frame of the identified face.
Claims
1. A computer system, the computer system comprising: one or more hardware processors; system memory coupled to the one or more hardware processors, the system memory storing instructions that are executable by the one or more hardware processors; the one or more hardware processors executing the instructions stored in the system memory to apply facial masks to faces in live video, including the following: receive an image containing a face from a user, wherein the image is a frame of a video; identify the coordinates of a face in the image; identify the coordinates of facial elements within the face previously identified using a successive steps method, wherein the successive steps method includes: placing an approximate grid into the region containing the face; running an integer chooser at each coordinate and concatenating the outputs; constructing a vector E; calculating the (d.sub.x,d.sub.y) shifts for the k-th facial element; and shifting the X and Y coordinates of f.sub.k by the calculated (d.sub.x,d.sub.y) shifts; synchronize a bitmap add-on, wherein synchronizing the bitmap add-on includes aligning the bitmap add-on with the identified facial elements; and apply the bitmap add-ons over the frame of the identified face.
2. The system of claim 1, the instructions stored in the system memory further comprising: applying 3D effects.
3. The system of claim 2, wherein applying 3D effects includes: rotating a 3D model to match the rotation of the face.
4. The system of claim 2, wherein applying 3D effects includes: animating a 3D model.
5. The system of claim 1, wherein the bitmap add-on includes a character's face.
6. The system of claim 1, the instructions stored in the system memory further comprising: repeating the successive steps method.
7. The system of claim 6, wherein repeating the successive steps method includes: repeating until the value of the (d.sub.x,d.sub.y) shifts falls below a predetermined value.
8. The system of claim 6, wherein repeating the successive steps method includes creating a succession number and assigning the initial value to zero; assigning a total succession number; incrementing the succession number each time the successive steps method is run; and repeating the successive steps method until the succession number is equal to or greater than the total succession number.
9. The system of claim 1, wherein the approximate grid is a facial grid containing the average coordinates of facial elements as detected in a predetermined number of faces.
10. The system of claim 1, wherein the approximate grid takes into account the position, size and the approximate rotation angle of the predetermined number of faces.
11. The system of claim 1, wherein calculating the (d.sub.x,d.sub.y) shifts for the k-th facial element includes: taking a dot product of E with D.sub.S,k=(D.sub.S,k.sup.x,D.sub.S,k.sup.y), where D.sub.S,k=(D.sub.S,k.sup.x,D.sub.S,k.sup.y) is the trained data for the k-th facial element.
12. A computer system, the computer system comprising: one or more hardware processors; system memory coupled to the one or more hardware processors, the system memory storing instructions that are executable by the one or more hardware processors; the one or more hardware processors executing the instructions stored in the system memory to apply facial masks to faces in live video, including the following: receive an image containing a face from a user, wherein the image is a frame of a video; identify the coordinates of a face in the image; identify the coordinates of facial elements within the face previously identified using a successive steps method, wherein the successive steps method includes: placing an approximate grid into the region containing the face; running an integer chooser at each coordinate and concatenating the outputs; constructing a vector E; calculating the (d.sub.x,d.sub.y) shifts for the k-th facial element; and shifting the X and Y coordinates of f.sub.k by the calculated (d.sub.x,d.sub.y) shifts; synchronize a bitmap add-on, wherein synchronizing the bitmap add-on includes aligning the bitmap add-on with the identified facial elements, the method for synchronizing the bitmap add-on including: smoothing facial element coordinates in the current frame based on previous frames; warping the face in the image; and warping the bitmap add-on; and apply the bitmap add-ons over the frame of the identified face.
13. The system of claim 12, wherein smoothing facial element coordinates in the current frame based on previous frames includes applying a smoothing filter.
14. The system of claim 13, wherein the smoothing filter includes: a temporal Gaussian filter.
15. The system of claim 13, wherein the smoothing filter includes: a bilateral Gaussian filter.
16. A computer system, the computer system comprising: one or more hardware processors; system memory coupled to the one or more hardware processors, the system memory storing instructions that are executable by the one or more hardware processors; the one or more hardware processors executing the instructions stored in the system memory to apply facial masks to faces in live video, including the following: receive an image containing a face from a user, wherein the image is a frame of a video; identify the coordinates of a face in the image; identify the coordinates of facial elements within the face previously identified using a successive steps method, wherein the successive steps method includes: placing an approximate grid into the region containing the face; running an integer chooser at each coordinate and concatenating the outputs; constructing a vector E; calculating the (d.sub.x,d.sub.y) shifts for the k-th facial element; and shifting the X and Y coordinates of f.sub.k by the calculated (d.sub.x,d.sub.y) shifts; train a detector, wherein training a detector allows for synchronization of a bitmap add-on; synchronize the bitmap add-on, wherein synchronizing the bitmap add-on includes aligning the bitmap add-on with the identified facial elements, the method for synchronizing the bitmap add-on including: smoothing facial element coordinates in the current frame based on previous frames; warping the face in the image; and warping the bitmap add-on; and apply the bitmap add-ons over the frame of the identified face.
17. The system of claim 16, wherein training the detector includes: obtaining a database of photos and their corresponding grids with facial feature markup; calculating the mean grid by averaging all obtained grids; constructing a set that contains starting grids for each photo; and repeating the steps: calculating optimal displacements; calculating the choose junctions; calculating E.sub.jq for each (I.sub.j,G.sub.jq); finding the solution of the minimization problem that allows to displace k-th facial element of G.sub.jq to make it closer to M.sub.jq; and shifting each (k-th) facial element of G.sub.jq by (d.sub.x,d.sub.y).
18. The system of claim 17, wherein repeating the steps includes: performing the steps a predetermined number of times.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) To further clarify various aspects of some example embodiments of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It is appreciated that these drawings depict only illustrated embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS
(10) Reference will now be made to the figures wherein like structures will be provided with like reference designations. It is understood that the figures are diagrammatic and schematic representations of some embodiments of the invention, and are not limiting of the present invention, nor are they necessarily drawn to scale.
(12) For example, we can have a face of a character and put it over a user's face, so that its facial elements (e.g., eyes, mouth, nose, eyebrows, etc.) are aligned with the user's facial elements. Additionally or alternatively, we can place features like wrinkles or facial paint over the user's face. In such a case, the character's face or the wrinkles placed over the user's face are called a bitmap add-on. One of skill in the art will appreciate that the video may include more than one face and that the method 100 is applicable to any number of faces within the video without restriction. For example, a first character's face can be placed over the face of a first user and a second character's face can be placed over the face of a second user.
(16) Alternatively, the coordinates of facial elements can be identified 106 using a successive steps method. One example of this method is disclosed below.
(18) One of skill in the art will appreciate that synchronizing 108 the bitmap add-on can be done in a number of ways. For example, there is third-party software capable of producing the desired result; one or more software packages can be used, with the results compared to determine which solution creates the best effect. Additionally or alternatively, a method of synchronizing 108 the bitmap add-on which may be used is disclosed below.
(21) One skilled in the art will appreciate that, for this and other processes and methods disclosed herein, the functions performed in the processes and methods may be implemented in differing order. Furthermore, the outlined steps and operations are only provided as examples, and some of the steps and operations may be optional, combined into fewer steps and operations, or expanded into additional steps and operations without detracting from the essence of the disclosed embodiments.
(22)
(23)
(24)
(25)
(26)
(27)
(28) This step includes running an integer chooser b_k: R^2 → Z^M at each coordinate f_k and concatenating the outputs: B = (b_0(f_0), . . . , b_{N−1}(f_{N−1})) ∈ Z^{N·M}. The integer chooser b_k(x,y): R^2 → Z^M yields a vector of M integer numbers at each (x,y) coordinate of an image. It takes into account the pixels of the image. For example, the integer chooser can be based on the multi-level choosing procedure. Other kinds of assigning a vector of integer numbers to an (x,y) coordinate could also be used.
(29) Within the present invention, for each b_k there are M such multi-step choosing procedures, each yielding a number H. Thus, within the present invention b_k = (H_{k,0}, . . . , H_{k,M−1}). Each multi-step choosing procedure includes choose junctions J_kij, i ∈ [0, M−1], j ∈ [0, N_ki−1], one of them labeled as the initial. A choose junction contains a set of parameters θ. It also may contain the main link and the auxiliary link to other choose junctions. Let us define C_θ(I,x,y) = α·I(x+x_1, y+y_1) + β·I(x+x_2, y+y_2), where θ = (x_1, y_1, x_2, y_2, α, β, τ) are the parameters of a choose junction, with (x_1,y_1), (x_2,y_2) being the displacements, I the image with I(x,y) being the pixel value at (x,y), and α, β ∈ R (which, for example, could take the values 1 or −1). Thus, each choose junction J_kij is associated with its parameters θ_kij. To get the output of a multi-level choosing procedure at the coordinates (x,y) of an image, C_θ(I,x,y) is repeatedly evaluated starting from the initial choose junction, proceeding to the main link if C_θ(I,x,y) < τ, and to the auxiliary link otherwise, until a choose junction that does not contain any links is reached. Such a choose junction stores an integer number H_kij, which is the output of the multi-step choosing procedure. For example, if there are N*_ki choose junctions that do not contain any links among J_ki, one may enumerate them from 0 to N*_ki−1, assigning each H_kij a respective number from [0; N*_ki−1].
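The traversal of choose junctions described in paragraph (29) can be sketched in Python. This is an illustrative reading of the procedure, not the patented implementation; the class and function names, and the representation of the image as a row-major list of pixel rows, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChooseJunction:
    # Parameters theta = (x1, y1, x2, y2, alpha, beta, tau), as in the text.
    x1: int
    y1: int
    x2: int
    y2: int
    alpha: float
    beta: float
    tau: float
    main: Optional["ChooseJunction"] = None       # followed when C < tau
    auxiliary: Optional["ChooseJunction"] = None  # followed otherwise
    output: int = 0  # H: the integer stored at link-free junctions

def run_procedure(image, x, y, junction):
    """Evaluate C repeatedly from the initial junction until a link-free
    junction is reached, then return its stored integer output H."""
    while junction.main is not None:
        c = (junction.alpha * image[y + junction.y1][x + junction.x1]
             + junction.beta * image[y + junction.y2][x + junction.x2])
        junction = junction.main if c < junction.tau else junction.auxiliary
    return junction.output
```

A well-formed procedure has either both links or neither at every junction, so checking the main link alone suffices to detect a leaf.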
(30) Constructing a vector E: each of the N·M integer outputs B_k takes values in [0; Q−1], and E ∈ {0,1}^{N·M·Q} is their concatenated one-hot encoding, i.e., for each B_k a block of Q zeros with a single one at position B_k.
(31) The dot product could then be computed, for example, as follows:
(32)
v = 0
offset = 0
for k = 0 to N·M−1 {
    v = v + D_S,k[offset + B_k];
    offset = offset + Q_k;
}
(33) Then v will contain the dot product ⟨D_S,k, E⟩.
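Because E is a concatenation of one-hot blocks, the loop above touches only one entry of D per block. A minimal runnable sketch of that loop (the function name and the list-based representation of D, B and the block sizes Q are assumptions):

```python
def sparse_dot(D, B, Q):
    """Dot product <D, E>, where E is the one-hot encoding of the integer
    outputs B: block k has length Q[k] and a single 1 at position B[k].
    E itself is never materialized."""
    v = 0.0
    offset = 0
    for k, b in enumerate(B):
        v += D[offset + b]   # only the single non-zero entry contributes
        offset += Q[k]
    return v
```

This is the standard trick for dot products against one-hot features: indexing replaces a full multiply-accumulate over N·M·Q entries.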
(34) The displacements (x_1,y_1), (x_2,y_2) could undergo a coordinate transformation each time we call a multi-step choosing procedure for some image. Let us have a transformation F(a,b) that transforms (for example, by an affine transform, which may include rotation, scaling, shifting and skewing) a grid b so it becomes close to a (for example, in the least squares sense, minimizing Σ_i |a_i − b_i|², or aligning the coordinates of the pupils of a and b instead). For example, the transformation could be represented as (d_x, d_y, s, φ), which are the shift by the X and Y coordinates, the scaling and the rotation angle, respectively.
(35) Then, for example, if the procedure is called on an image I with a grid f_k, we can compute a transformation for the grid f_k to the mean grid, F(Mean, f_k), receiving the (d_x, d_y, s, φ) representation of the transformation, and then apply this transformation to the displacements before calculating the output of a multi-step choosing procedure.
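Fitting the (d_x, d_y, s, φ) representation of such a transformation can be done in closed form with a least-squares (Procrustes-style) fit. The sketch below is one standard way to do this, not necessarily the patent's; the function name and the tuple-list grid representation are assumptions.

```python
import math

def fit_similarity(a, b):
    """Least-squares similarity transform (dx, dy, s, phi) mapping grid b
    onto grid a, i.e. a_i ~= s * R(phi) * b_i + (dx, dy).
    a, b: equal-length lists of (x, y) points."""
    n = len(a)
    ax = sum(p[0] for p in a) / n
    ay = sum(p[1] for p in a) / n
    bx = sum(p[0] for p in b) / n
    by = sum(p[1] for p in b) / n
    # Treat centered points as complex numbers: the optimal s*e^{i*phi}
    # equals sum(a_c * conj(b_c)) / sum(|b_c|^2).
    sxx = sxy = norm_b = 0.0
    for (pax, pay), (pbx, pby) in zip(a, b):
        cax, cay = pax - ax, pay - ay
        cbx, cby = pbx - bx, pby - by
        sxx += cax * cbx + cay * cby
        sxy += cay * cbx - cax * cby
        norm_b += cbx * cbx + cby * cby
    s = math.hypot(sxx, sxy) / norm_b
    phi = math.atan2(sxy, sxx)
    # Translation carries the transformed centroid of b onto the centroid of a.
    dx = ax - s * (math.cos(phi) * bx - math.sin(phi) * by)
    dy = ay - s * (math.sin(phi) * bx + math.cos(phi) * by)
    return dx, dy, s, phi
```

The complex-number formulation gives the rotation and uniform scale in one step; an affine fit with skewing would instead solve a small linear system.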
(39) One of skill in the art will appreciate that to get a more precise result, one may run the method 500 several times. In particular, each time the method 500 is run the values within f.sub.initial may be displaced by small values (dependent on the size of the facial region), and then the final result can be an average or median of the results for each coordinate of each facial feature.
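The jitter-and-aggregate strategy of the preceding paragraph can be sketched as follows; `detect` stands in for one full run of the facial-element detection, and all names, the fixed seed and the per-coordinate median choice are illustrative assumptions.

```python
import random
import statistics

def robust_detect(detect, f_initial, jitter, runs=5, seed=0):
    """Run `detect` several times from a slightly displaced initial grid
    and return the per-coordinate median of the results.
    detect: maps a grid (list of (x, y)) to a detected grid of equal size.
    jitter: maximum displacement, e.g. a small fraction of the face size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    results = []
    for _ in range(runs):
        shifted = [(x + rng.uniform(-jitter, jitter),
                    y + rng.uniform(-jitter, jitter)) for x, y in f_initial]
        results.append(detect(shifted))
    return [(statistics.median(r[k][0] for r in results),
             statistics.median(r[k][1] for r in results))
            for k in range(len(f_initial))]
```

The median (rather than the mean) keeps a single divergent run from skewing the final coordinates.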
(42) There are multiple methods for smoothing 602 which can be used to accomplish the desired result. By way of example, one method is illustrated herein. Assume that there is a sequence of grids f^t in time, f_k^t ∈ R^2, k ∈ {0, . . . , N−1}, t = T, T−1, . . . , with t = T being the latest grid. That is, we store a history of the detections of facial element coordinates at some number of previous frames (one of skill in the art will appreciate that the number may be limited such that the frames being used are only the most recent relevant frames), with the latest frame being the frame to be smoothed. Let us define C(f) ∈ R^2, which gives the center of a grid by averaging every f_k coordinate, for example
(43) C(f) = (1/N) · Σ_{k=0}^{N−1} f_k
One may want to exclude the upper eyelid from this averaging, since people usually blink from time to time, which causes the upper eyelid to move, which itself causes the center to go down and up on the blink. One may additionally want to exclude the coordinates of the iris, since it also moves. For example, within the given configuration of facial elements, one may exclude the points 28, 35, 36, 32, 39, 40, 23, 26, 37, 38, 27, 31, 41, 42, 0, 1, 29, 30, 33, 34 of
(44) Then, to smooth 602 the coordinates of the facial elements, the coordinates of the facial center are subtracted from the grid on each frame, and a smoothing filter is applied. For example, a temporal Gaussian filter, or a bilateral Gaussian filter (both spatial and temporal), may be applied. After that, the coordinates of the center of the current frame are added. That is, S(f^t) = S(f^t − C(f^t)) + C(f^T), t = T, T−1, . . . , T−M, where M is the number of previous frames that we store, and S is the smoothing function that works separately at x and y coordinates:
(45) S(f^t) = [ Σ_{i=0}^{M} f^{t−i} · exp(−|f^t − f^{t−i}|²/(2σ_0²)) · exp(−i²/(2σ_1²)) ] / [ Σ_{i=0}^{M} exp(−|f^t − f^{t−i}|²/(2σ_0²)) · exp(−i²/(2σ_1²)) ]
where σ_0, σ_1 are the spatial and temporal smoothing coefficients, and σ_0 could be proportional to some overall face size (for example, its interocular distance).
(46) Alternatively, one may also smooth the center coordinates first, to further reduce the oscillation, so the final result is:
S(f^t) = S(f^t − S(C(f^t))) + S(C(f^T))
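As a concrete, purely illustrative reading of the centering-then-filtering scheme above, the sketch below applies a temporal Gaussian filter to a short history of grids; the bilateral variant would additionally weight each frame by its coordinate difference from the latest frame. All names and the history-list layout are assumptions.

```python
import math

def smooth_landmarks(history, sigma_t=2.0):
    """Temporal Gaussian smoothing of facial element coordinates.
    history[0] is the latest grid (t = T); each grid is a list of (x, y).
    Each grid is centered before filtering and the latest center is
    restored afterwards, so head motion does not blur the filter."""
    def center(grid):
        n = len(grid)
        return (sum(p[0] for p in grid) / n, sum(p[1] for p in grid) / n)

    centers = [center(g) for g in history]
    # Gaussian weight for a frame i steps in the past.
    w = [math.exp(-(i * i) / (2.0 * sigma_t * sigma_t))
         for i in range(len(history))]
    wsum = sum(w)
    cx, cy = centers[0]  # center of the latest frame is added back
    smoothed = []
    for k in range(len(history[0])):
        sx = sum(wi * (g[k][0] - c[0]) for wi, g, c in zip(w, history, centers))
        sy = sum(wi * (g[k][1] - c[1]) for wi, g, c in zip(w, history, centers))
        smoothed.append((cx + sx / wsum, cy + sy / wsum))
    return smoothed
```

With an unchanging history the filter is a no-op, which is the property one wants: smoothing should suppress jitter, not shift stable landmarks.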
(48) Given the sets of facial element coordinates f_k and their corresponding coordinates g_k, |f| = |g|, one partitions these points into triangles such that each point is a vertex of some triangle, and no triangle contains points inside it. Usually it is best to do the partition by hand, since it needs to be done only once for a particular configuration of the detected facial elements, choosing the partition that gives the most pleasing effect. Alternatively, one may use automated methods like Delaunay triangulation or a greedy algorithm. As a result of such a partition, one gets a set of triads (p,q,r), each of which defines a triangle (f_p,f_q,f_r) or (g_p,g_q,g_r). Then one takes a triangle (f_p,f_q,f_r) and its contents at the source image, and redraws the content of this triangle at the destination image at the coordinates (g_p,g_q,g_r), transforming such content accordingly. There are standard procedures in modern 3D frameworks like OpenGL, OpenGL ES or Direct3D on mobile phones and desktops that allow performing this. Alternatively, one may code this procedure manually from the geometric relations between the (f_p,f_q,f_r) and (g_p,g_q,g_r) coordinates, or use any other known method for triangle transformation.
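For each triad (p, q, r), the warp needs the affine map taking the source triangle onto the destination triangle. Below is a self-contained sketch of deriving that map from the two vertex triples (the function name is hypothetical; frameworks like OpenGL or Direct3D would normally perform this step internally):

```python
def triangle_affine(src, dst):
    """Return the affine map (x, y) -> (x', y') taking the source triangle's
    vertices onto the destination triangle's vertices, in order."""
    (x0, y0), (x1, y1), (x2, y2) = src
    (u0, v0), (u1, v1), (u2, v2) = dst
    ax, ay = x1 - x0, y1 - y0  # source edge vectors
    bx, by = x2 - x0, y2 - y0
    cx, cy = u1 - u0, v1 - v0  # destination edge vectors
    dx, dy = u2 - u0, v2 - v0
    det = ax * by - ay * bx    # non-zero for a non-degenerate triangle
    # Linear part M maps the source edges onto the destination edges.
    m00 = (cx * by - dx * ay) / det
    m01 = (dx * ax - cx * bx) / det
    m10 = (cy * by - dy * ay) / det
    m11 = (dy * ax - cy * bx) / det
    # Translation pins the first vertex.
    t0 = u0 - m00 * x0 - m01 * y0
    t1 = v0 - m10 * x0 - m11 * y0
    return lambda x, y: (m00 * x + m01 * y + t0, m10 * x + m11 * y + t1)
```

Sampling every destination pixel through the inverse of this map (rather than pushing source pixels forward) avoids holes in the warped output.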
(49) To further improve the aesthetics of the transformation, it may be advisable to add more points to f_k and g_k. For example, one may add 4 points at the corners of the image and some additional points at the sides. One may also add 4 or more points around the face (with the coordinates of such points based on the size and the position of the face). Further, one may add more points in between facial features, by averaging the coordinates of some points, or shifting some distance at some angle from certain points, where such distance is proportional to the overall face size (for example, its interocular distance, or the distance between the eyes and the mouth), and the angle is related to the angle by which the face is currently rotated. For example, one may add more points on the cheeks (by averaging the coordinates of the points 52, 50, 45, 23 in
(59) A set of possible parameters Θ = {θ} for a choose junction is defined (where θ = (x_1, y_1, x_2, y_2, α, β, τ), as defined above, are the parameters of a choose junction). We could choose α, β at random, or set them to (1, −1). τ could be chosen at random from some appropriate interval (for example, [−1; 1], if the pixel values of I are in the range [0; 1]), or from the even partition of the [−1; 1] range into a number of intervals (for example, 100 intervals). The displacement parameters (x_1, y_1), (x_2, y_2) can be chosen, for example, at random from (−V_max, V_max), which is an interval of some appropriate size. For example, the interval could be about 0.5 of the interocular distance of the face. We can also decrease this interval as S increases (starting at S=1 or later, for example, at S=3). We can also choose the displacement parameters to be evenly distributed across some particular grid covering the same interval. The number of possible parameters |Θ| could be of the order of 200,000, but this number could be more or less than that.
(60) Then, partition the set of specimens Ω into the main and auxiliary subsets by each θ:
Ω_main(θ) = {(j,q,k,v) | C_θ(I_j, G_jqk) < τ}
Ω_auxiliary(θ) = Ω \ Ω_main(θ)
(61) Compute the value of θ giving the smallest standard deviation of v in both sets of specimens (here we denote v(Ω) = {v | (j,q,k,v) ∈ Ω}):
θ* = argmin_θ (σ(v(Ω_main(θ))) + σ(v(Ω_auxiliary(θ))))
(62) This defines the corresponding choose junction having the parameters θ*. If the corresponding sum of standard deviations is sufficient, and the current count of choose junctions is below a certain maximum, then link the choose junction with its main and auxiliary choose junctions, and repeat the described calculation procedure for the main link (with the subset Ω_main(θ*)) and for the auxiliary link (with Ω_auxiliary(θ*)) until the mentioned condition no longer holds. This finishes the calculation procedure that yields the set of choose junctions J_kij, their links and their parameters θ_kij for any given facial element k.
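The split-selection step in paragraphs (59)–(62) resembles growing a regression tree: among candidate parameters θ, pick the one whose main/auxiliary partition minimizes the summed standard deviation of v. A simplified sketch (the specimen representation, the `c` callback and all names are assumptions; the full procedure would recurse on both subsets):

```python
import statistics

def best_split(specimens, thetas, c):
    """Choose theta* minimizing the summed standard deviation of v over the
    main (C < tau) and auxiliary (C >= tau) subsets.
    specimens: list of (key, v) pairs; c(key, theta) evaluates C_theta for
    a specimen; tau is stored as the last component of theta."""
    best = None
    for theta in thetas:
        tau = theta[-1]
        main = [v for key, v in specimens if c(key, theta) < tau]
        aux = [v for key, v in specimens if c(key, theta) >= tau]
        if len(main) < 2 or len(aux) < 2:
            continue  # stdev needs at least two samples on each side
        score = statistics.stdev(main) + statistics.stdev(aux)
        if best is None or score < best[0]:
            best = (score, theta)
    return best
```

Minimizing the within-subset deviation of v is what lets each leaf later store a single representative value for its specimens.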
The result is D.sub.S,k=(D.sub.S,k.sup.x,D.sub.S,k.sup.y) and D*.sub.S,k=(D*.sub.S,k.sup.x,D*.sub.S,k.sup.y) which could be (0, 0).
(66) When finding 718 the solution of the minimization problem, the minimization problem could be solved as a linear regression problem or with methods like support vector machines or neural networks. When solving it as a linear regression problem, one may need to add a regularization term to the minimized function. Such a term could be calculated as 2^z · N · |E|, where one could find an optimal value of z by trying different real numbers from some set and stopping at the number which gives the best accuracy. Alternatively, one may assign z a fixed value like 3.6. One may solve the linear regression problem with a gradient descent method or calculate the closed-form solution.
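As one of the solution methods the paragraph mentions, a gradient-descent fit of a regularized (ridge-style) linear regression can be sketched in pure Python; the function name, learning rate and step count are illustrative assumptions, and `lam` stands in for whatever regularization weight (e.g., one derived from 2^z·N·|E|) is used.

```python
def ridge_gd(X, y, lam, lr=0.1, steps=2000):
    """Ridge regression by gradient descent: minimizes
    sum_i (x_i . w - y_i)^2 + lam * |w|^2.  Pure Python, no dependencies."""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    for _ in range(steps):
        # Gradient of the regularizer, then accumulate the data term.
        grad = [2.0 * lam * wj for wj in w]
        for xi, yi in zip(X, y):
            err = sum(xij * wj for xij, wj in zip(xi, w)) - yi
            for j in range(n_feat):
                grad[j] += 2.0 * err * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

With the one-hot E features, each row of X is extremely sparse, so a production fit would exploit that sparsity or use the closed-form normal equations instead.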
(71) One of skill in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, mobile phones, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination of hardwired or wireless links) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
(72) With reference to
(73) The computer 820 may also include a magnetic hard disk drive 827 for reading from and writing to a magnetic hard disk 839, a magnetic disk drive 828 for reading from or writing to a removable magnetic disk 829, and an optical disc drive 830 for reading from or writing to removable optical disc 831 such as a CD-ROM or other optical media. The magnetic hard disk drive 827, magnetic disk drive 828, and optical disc drive 830 are connected to the system bus 823 by a hard disk drive interface 832, a magnetic disk drive-interface 833, and an optical drive interface 834, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-executable instructions, data structures, program modules and other data for the computer 820. Although the exemplary environment described herein employs a magnetic hard disk 839, a removable magnetic disk 829 and a removable optical disc 831, other types of computer readable media for storing data can be used, including magnetic cassettes, flash memory cards, digital versatile discs, Bernoulli cartridges, RAMs, ROMs, and the like.
(74) Program code means comprising one or more program modules may be stored on the hard disk 839, magnetic disk 829, optical disc 831, ROM 824 or RAM 825, including an operating system 835, one or more application programs 836, other program modules 837, and program data 838. A user may enter commands and information into the computer 820 through keyboard 840, pointing device 842, or other input devices (not shown), such as a microphone, joy stick, game pad, satellite dish, scanner, motion detectors or the like. These and other input devices are often connected to the processing unit 821 through a serial port interface 846 coupled to system bus 823. Alternatively, the input devices may be connected by other interfaces, such as a parallel port, a game port or a universal serial bus (USB). A monitor 847 or another display device is also connected to system bus 823 via an interface, such as video adapter 848. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.
(75) The computer 820 may operate in a networked environment using logical connections to one or more remote computers, such as remote computers 849a and 849b. Remote computers 849a and 849b may each be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically include many or all of the elements described above relative to the computer 820, although only memory storage devices 850a and 850b and their associated application programs 836a and 836b have been illustrated in
(76) When used in a LAN networking environment, the computer 820 can be connected to the local network 851 through a network interface or adapter 853. When used in a WAN networking environment, the computer 820 may include a modem 854, a wireless link, or other means for establishing communications over the wide area network 852, such as the Internet. The modem 854, which may be internal or external, is connected to the system bus 823 via the serial port interface 846. In a networked environment, program modules depicted relative to the computer 820, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing communications over wide area network 852 may be used.
(77) The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.