G10L2021/105

Systems and methods for generating composite media using distributed networks
10963697 · 2021-03-30 ·

Distributed systems and methods for generating composite media include receiving a media context that defines media to be generated, the media context including: a definition of a sequence of media segment specifications and an identification of a set of remote devices. For each media segment specification, a reference segment may be generated and transmitted to at least one remote device. A media segment may be received from each of the remote devices, the media segment having been recorded by a camera. Verified media segments may replace the corresponding reference segments. The media segments may be aggregated and an updated sequence of media segments may be defined. An instance of the media context that includes a subset of the updated sequence of media segments may then be generated.
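The claimed flow above can be sketched as a small data model: a media context pairing an ordered sequence of segment specifications with a set of remote devices, and an assembly step that replaces each reference segment with a verified remote recording when one arrived. All names here (`MediaContext`, `SegmentSpec`, `assemble_instance`) are illustrative assumptions, not the patent's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class SegmentSpec:
    segment_id: str
    duration_s: float

@dataclass
class MediaContext:
    segment_specs: list   # ordered sequence of SegmentSpec
    remote_devices: set   # identifiers of devices that may record

def assemble_instance(context, received):
    """Replace each reference segment with a verified remote recording,
    falling back to the reference when no verified segment arrived."""
    sequence = []
    for spec in context.segment_specs:
        segment = received.get(spec.segment_id, f"reference:{spec.segment_id}")
        sequence.append(segment)
    return sequence
```

In this sketch, unrecorded specifications fall back to their reference segments, which matches the abstract's notion of reference segments being replaced only by verified media.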

System and Method for Talking Avatar
20210082452 · 2021-03-18 ·

Aspects of this disclosure provide techniques for generating a viseme and corresponding intensity pair. In some embodiments, the method includes generating, by a server, a viseme and corresponding intensity pair based at least on one of a clean vocal track or corresponding transcription. The method may include generating, by the server, a compressed audio file based at least on one of the viseme, the corresponding intensity, music, or visual offset. The method may further include generating, by the server or a client end application, a buffer of raw pulse-code modulated (PCM) data based on decoding at least a part of the compressed audio file, where the viseme is scheduled to align with a corresponding phoneme.
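The core pairing step, generating a viseme with a corresponding intensity and scheduling it against a phoneme's onset, can be sketched as below. The phoneme-to-viseme table and the loudness-based intensity heuristic are assumptions for illustration; the patent does not specify them.

```python
# Minimal illustrative phoneme-to-viseme table (not the patent's mapping).
PHONEME_TO_VISEME = {"p": "M", "b": "M", "m": "M", "f": "F", "v": "F", "aa": "AA"}

def schedule_visemes(phoneme_track):
    """phoneme_track: list of (phoneme, onset_seconds, loudness in 0..1+).
    Returns (viseme, intensity, onset) tuples aligned to phoneme onsets."""
    schedule = []
    for phoneme, onset, loudness in phoneme_track:
        viseme = PHONEME_TO_VISEME.get(phoneme, "REST")
        intensity = round(min(1.0, loudness), 2)  # clamp to a unit range
        schedule.append((viseme, intensity, onset))
    return schedule
```

Each output tuple carries the onset time of its source phoneme, which is one plausible way to realize the abstract's requirement that a viseme be "scheduled to align with a corresponding phoneme."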

METHODS AND SYSTEMS FOR IMAGE AND VOICE PROCESSING

Systems and methods are disclosed configured to train an autoencoder using images that include faces, wherein the autoencoder comprises an input layer, an encoder configured to output a latent image from a corresponding input image, and a decoder configured to attempt to reconstruct the input image from the latent image. An image sequence of a face exhibiting a plurality of facial expressions and transitions between facial expressions is generated and accessed. Images of the plurality of facial expressions and transitions between facial expressions are captured from a plurality of different angles and using different lighting. An autoencoder is trained using source images that include the face with different facial expressions captured at different angles with different lighting, and using destination images that include a destination face. The trained autoencoder is used to generate an output where the likeness of the face in the destination images is swapped with the likeness of the source face, while preserving expressions of the destination face.
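The swap described above is commonly realized as a shared encoder with one decoder per identity: the destination face is encoded into the shared latent space, then decoded with the source identity's decoder, so the output carries the source likeness while keeping the destination's expression. The linear layers, shapes, and class name below are illustrative stand-ins, not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearAutoencoder:
    """Toy shared-encoder / per-identity-decoder arrangement (untrained)."""
    def __init__(self, dim_in, dim_latent):
        self.enc = rng.normal(size=(dim_in, dim_latent)) * 0.1   # shared encoder
        self.dec_src = rng.normal(size=(dim_latent, dim_in)) * 0.1  # source identity
        self.dec_dst = rng.normal(size=(dim_latent, dim_in)) * 0.1  # destination identity

    def encode(self, x):
        return x @ self.enc

    def swap(self, dst_image):
        # Encode the destination face, decode with the SOURCE decoder so the
        # output carries the source likeness with the destination expression.
        return self.encode(dst_image) @ self.dec_src
```

Training would fit each decoder to reconstruct its own identity's images through the shared encoder; only then does cross-decoding perform the swap.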

THREE-DIMENSIONAL FACE ANIMATION FROM SPEECH

A method for training a three-dimensional face animation model from speech is provided. The method includes determining a first correlation value for a facial feature based on an audio waveform from a first subject, generating a first mesh for a lower portion of a human face based on the facial feature and the first correlation value, updating the first correlation value when a difference between the first mesh and a ground truth image of the first subject is greater than a pre-selected threshold, and providing a three-dimensional model of the human face animated by speech to an immersive reality application accessed by a client device based on the difference between the first mesh and the ground truth image of the first subject. A non-transitory, computer-readable medium storing instructions to cause a system to perform the above method, and the system, are also provided.
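The update rule in the abstract, adjusting a correlation value whenever the generated mesh differs from ground truth by more than a threshold, can be sketched as a simple iterative loop. The mesh function, scalar error, and learning rate below are stand-ins for illustration, not the patent's models.

```python
def train_correlation(correlation, mesh_fn, ground_truth, threshold, lr=0.1, steps=50):
    """Update `correlation` until mesh_fn(correlation) is within
    `threshold` of the ground truth, or the step budget runs out."""
    for _ in range(steps):
        mesh = mesh_fn(correlation)
        error = abs(mesh - ground_truth)
        if error <= threshold:  # within tolerance: stop updating
            break
        # Move the correlation value in the direction that reduces the error.
        correlation += lr * (ground_truth - mesh)
    return correlation
```

With a linear toy mesh function the loop converges to the value that reproduces the ground truth; in the patented setting the mesh would come from the 3D face model and the difference from comparing it to a ground-truth image.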

SYSTEMS, METHODS, DEVICES AND APPARATUSES FOR DETECTING FACIAL EXPRESSION

A system, method and apparatus for detecting facial expressions according to EMG signals.

METHOD AND APPARATUS FOR CONTROLLING MOUTH SHAPE CHANGES OF THREE-DIMENSIONAL VIRTUAL PORTRAIT
20200410731 · 2020-12-31 ·

Embodiments of the present disclosure relate to a method and apparatus for controlling mouth shape changes of a three-dimensional virtual portrait, relating to the field of cloud computing. The method may include: acquiring a to-be-played speech; sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment; generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech; and controlling, in response to playing the to-be-played speech, a preset mouth shape of the three-dimensional virtual portrait to change based on the mouth shape control parameter sequence.
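The windowing step above, sliding a preset time window across the to-be-played speech at a preset step length to obtain segments, can be sketched directly. The sample representation (a flat list) and parameter values are illustrative.

```python
def slide_window(samples, window_len, step):
    """Return every full window of length `window_len`, advancing by `step`.
    Overlapping segments are produced when step < window_len."""
    segments = []
    start = 0
    while start + window_len <= len(samples):
        segments.append(samples[start:start + window_len])
        start += step
    return segments
```

Each segment would then be fed to the model that produces one entry of the mouth-shape control parameter sequence.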

System, method, and computer program for transmitting face models based on face data points

A system, method, and computer program are provided for receiving face models based on face data points. In use, a real-time face model is received, wherein the real-time face model includes one or more face data points. Real-time face data points are received, including additional one or more face data points. The real-time face model is manipulated based on the real-time face data points.

FACIAL RECOGNITION METHOD FOR VIDEO CONFERENCE AND SERVER USING THE METHOD
20200364918 · 2020-11-19 ·

A facial recognition method for video conferencing that requires reduced bandwidth and transmits video and audio frames synchronously first determines whether a 3D body model of a first user at a local end has been currently retrieved or is otherwise retrievable from a historical database. Multiple audio frames of the first user are collected and audio frequencies in a specific range are filtered out. An envelope curve of the first audio frames is calculated, and multiple attacking time periods and multiple releasing time periods of the envelope curve are calculated and correlated with lip movements of the first user. Information packets of the same, together with head-rotating and limb-swinging images of the first user, are transmitted to a remote second user so that the 3D body model can simulate and show lip shapes and other movements of the first user.
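The envelope analysis described above can be sketched as two small steps: estimate a per-frame amplitude envelope, then label rising spans as "attack" and falling spans as "release" periods, which the method correlates with lip movements. The peak-amplitude envelope estimate and the labelling scheme are assumptions for illustration.

```python
def envelope(frames):
    """Peak absolute amplitude per frame: a simple envelope estimate."""
    return [max(abs(s) for s in frame) for frame in frames]

def attack_release_spans(env):
    """Label each inter-frame transition as rising (attack) or falling (release)."""
    spans = []
    for i in range(1, len(env)):
        if env[i] > env[i - 1]:
            spans.append(("attack", i))
        elif env[i] < env[i - 1]:
            spans.append(("release", i))
    return spans
```

Only these compact attack/release descriptors (plus pose images) would need to cross the network, which is consistent with the abstract's reduced-bandwidth goal.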

System and method for animated lip synchronization

A system and method for animated lip synchronization. The method includes: capturing speech input; parsing the speech input into phonemes; aligning the phonemes to the corresponding portions of the speech input; mapping the phonemes to visemes; synchronizing the visemes into viseme action units, the viseme action units comprising jaw and lip contributions for each of the phonemes; and outputting the viseme action units.
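The final step, composing visemes into action units with separate jaw and lip contributions, can be sketched as a lookup-and-assemble pass. The contribution table values below are illustrative placeholders, not the patent's coefficients.

```python
# Illustrative per-viseme jaw and lip contributions (placeholder values).
VISEME_CONTRIB = {
    "M":  {"jaw": 0.0, "lip": 1.0},  # closed lips: p/b/m
    "AA": {"jaw": 0.9, "lip": 0.2},  # open-jaw vowel
    "F":  {"jaw": 0.2, "lip": 0.7},  # lower lip to teeth: f/v
}

def viseme_action_units(visemes):
    """Turn a viseme sequence into (viseme, jaw, lip) action units."""
    return [(v, VISEME_CONTRIB[v]["jaw"], VISEME_CONTRIB[v]["lip"])
            for v in visemes]
```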