Patent classifications
G10L2021/105
Three-dimensional face animation from speech
A method for training a three-dimensional face animation model from speech is provided. The method includes determining a first correlation value for a facial feature based on an audio waveform from a first subject, generating a first mesh for a lower portion of a human face, based on the facial feature and the first correlation value, updating the first correlation value when a difference between the first mesh and a ground truth image of the first subject is greater than a pre-selected threshold, and providing a three-dimensional model of the human face animated by speech to an immersive reality application accessed by a client device based on the difference between the first mesh and the ground truth image of the first subject. A non-transitory, computer-readable medium storing instructions to cause a system to perform the above method, and the system, are also provided.
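The threshold-gated update loop described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the mesh is reduced to a scalar proxy, and all names (`generate_mesh`, `mesh_error`, `LEARNING_RATE`, `THRESHOLD`) are hypothetical stand-ins for the facial-feature correlation, the lower-face mesh, and the pre-selected error threshold.

```python
# Hypothetical sketch: a correlation value tied to a facial feature is updated
# whenever the mesh-vs-ground-truth error exceeds a pre-selected threshold.

LEARNING_RATE = 0.1
THRESHOLD = 0.05  # pre-selected error threshold from the abstract


def generate_mesh(feature: float, correlation: float) -> float:
    """Stand-in for the lower-face mesh generator: a scalar proxy."""
    return feature * correlation


def mesh_error(mesh: float, ground_truth: float) -> float:
    """Difference between the generated mesh and the ground-truth image (proxy)."""
    return abs(mesh - ground_truth)


def train_correlation(feature: float, ground_truth: float,
                      correlation: float, steps: int = 100) -> float:
    """Update the correlation value only while the error exceeds the threshold."""
    for _ in range(steps):
        mesh = generate_mesh(feature, correlation)
        err = mesh_error(mesh, ground_truth)
        if err <= THRESHOLD:          # close enough: stop updating
            break
        # nudge the correlation value in the direction that reduces the error
        direction = 1.0 if mesh < ground_truth else -1.0
        correlation += LEARNING_RATE * direction * err
    return correlation
```

With `feature=2.0` and `ground_truth=1.0`, the loop drives the correlation toward 0.5, stopping once the mesh error falls under the threshold rather than minimizing it exactly.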
TWO-STAGE FRAMEWORK FOR ZERO-SHOT IDENTITY-AGNOSTIC TALKING-HEAD GENERATION
Methods, systems, apparatuses, devices, and computer program products are described. A system may input a first audio stream (e.g., audio recording) and a corresponding text string into a machine learning model. The first audio stream and the text string may correspond to a first identity (e.g., person). Based on an output of the machine learning model, the system may generate a second audio stream that is associated with a second identity and mimics the first audio stream. For example, the second audio stream may be a generated recording of the second identity speaking the first text string. In addition, the system may generate a video depicting the second identity speaking the first text string (e.g., the second audio stream) based on combining the second audio stream with an image or previous video of the second identity. For example, the system may generate the video based on generating a head motion sequence.
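The two stages described above can be sketched structurally. This is an illustrative outline only, assuming nothing beyond the abstract: the class and function names (`AudioStream`, `stage1_voice_transfer`, `stage2_render_video`) are hypothetical, and the models themselves are replaced by trivial placeholders.

```python
# Structural sketch of the two-stage talking-head pipeline: stage 1 produces
# audio of the target identity speaking the source text; stage 2 combines that
# audio with reference imagery and a head-motion sequence to render a video.
from dataclasses import dataclass


@dataclass
class AudioStream:
    identity: str
    text: str


@dataclass
class Video:
    identity: str
    audio: AudioStream
    frames: list


def stage1_voice_transfer(source: AudioStream, target_identity: str) -> AudioStream:
    """Stage 1: generate audio of the target identity speaking the source text."""
    return AudioStream(identity=target_identity, text=source.text)


def stage2_render_video(audio: AudioStream, reference_frames: list) -> Video:
    """Stage 2: combine the generated audio with reference imagery and a
    (here trivial) head-motion sequence to render a talking-head video."""
    head_motion = [f"pose_{i}" for i, _ in enumerate(reference_frames)]
    return Video(identity=audio.identity, audio=audio, frames=head_motion)


source = AudioStream(identity="speaker_A", text="hello world")
converted = stage1_voice_transfer(source, "speaker_B")
video = stage2_render_video(converted, reference_frames=["img0", "img1"])
```

The split mirrors the abstract: identity transfer happens entirely in the audio stage, so the video stage only needs the second identity's imagery and the generated audio.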
Joint audio-video facial animation system
The present invention relates to a joint automatic audio-visual-driven facial animation system that, in some example embodiments, includes a full-scale, state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) system with a strong language model for speech recognition, with phoneme alignment obtained from the word lattice.
APPARATUS AND METHOD FOR GENERATING SPEECH SYNTHESIS IMAGE
An apparatus for generating a speech synthesis image according to a disclosed embodiment is an apparatus for generating a speech synthesis image based on machine learning, the apparatus including a first global geometric transformation predictor configured to be trained to receive each of a source image and a target image including the same person, and predict a global geometric transformation for a global motion of the person between the source image and the target image based on the source image and the target image, a local feature tensor predictor configured to be trained to predict a feature tensor for a local motion of the person based on preset input data, and an image generator configured to be trained to reconstruct the target image based on the global geometric transformation, the source image, and the feature tensor for the local motion.
MOUTH SHAPE-BASED METHOD AND APPARATUS FOR GENERATING FACE IMAGE, METHOD AND APPARATUS FOR TRAINING MODEL, AND STORAGE MEDIUM
The present disclosure provides a mouth shape-based method for generating a face image, a method for training a model, and a device, which relates to the field of artificial intelligence, in particular to the field of cloud computing and digital human. The specific implementation solution is as follows: acquiring audio data to be recognized and a preset face image; determining an audio feature of the audio data to be recognized; where the audio feature includes a speech speed feature and a semantic feature; and performing, according to the speech speed feature and the semantic feature, processing on the preset face image, to generate a face image having a mouth shape.
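The feature pipeline in this abstract, a speech-speed feature and a semantic feature jointly driving the mouth shape, can be illustrated with a toy sketch. All names and formulas here (`speech_speed_feature`, `mouth_openness`, the vowel heuristic) are hypothetical; the disclosure does not specify how the features are computed or combined.

```python
# Hypothetical sketch: extract a speech-speed feature and a semantic feature
# from the audio, then use both to drive the mouth shape on a preset face image.

def speech_speed_feature(phoneme_timestamps: list) -> float:
    """Phonemes per second over the utterance (a toy speech-speed feature)."""
    duration = phoneme_timestamps[-1] - phoneme_timestamps[0]
    return (len(phoneme_timestamps) - 1) / duration


def semantic_feature(phonemes: list) -> float:
    """Toy semantic feature: fraction of open-mouth vowels in the utterance."""
    vowels = {"a", "e", "i", "o", "u"}
    return sum(p in vowels for p in phonemes) / len(phonemes)


def mouth_openness(speed: float, semantics: float) -> float:
    """Combine both features into a mouth-opening amount in [0, 1]:
    faster speech narrows the opening, vowel-heavy content widens it."""
    return max(0.0, min(1.0, semantics / (1.0 + 0.1 * speed)))


timestamps = [0.0, 0.2, 0.4, 0.6, 0.8]   # 4 phoneme intervals over 0.8 s
phonemes = ["h", "e", "l", "o"]
openness = mouth_openness(speech_speed_feature(timestamps),
                          semantic_feature(phonemes))
```

The point of the sketch is the interface, not the heuristics: both features are computed from the audio alone, then applied jointly to deform the preset face image's mouth region.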
Apparatus and method for generating lip sync image
An apparatus for generating a lip sync image according to a disclosed embodiment has one or more processors and a memory which stores one or more programs executed by the one or more processors. The apparatus includes a first artificial neural network model configured to generate an utterance synthesis image by using a person background image and an utterance audio signal corresponding to the person background image as an input, and to generate a silence synthesis image by using only the person background image as an input, and a second artificial neural network model configured to output classification values for a preset utterance maintenance image and for the silence synthesis image, using the silence synthesis image generated by the first artificial neural network model as an input.
Learning method for generating lip sync image based on machine learning and lip sync image generation device for performing same
A lip sync image generation device based on machine learning according to a disclosed embodiment includes an image synthesis model, which is an artificial neural network model, and which uses a person background image and an utterance audio signal as an input to generate a lip sync image, and a lip sync discrimination model, which is an artificial neural network model, and which discriminates the degree of match between the lip sync image generated by the image synthesis model and the utterance audio signal input to the image synthesis model.
PHYSICAL-VIRTUAL PATIENT BED SYSTEM
A patient simulation system for healthcare training is provided. The system includes one or more interchangeable shells comprising a physical anatomical model of at least a portion of a patient's body, the shell adapted to be illuminated from behind to provide one or more dynamic images viewable on the outer surface of the shells; a support system adapted to receive the shells via a mounting system, wherein the system comprises one or more image units adapted to render the one or more dynamic images viewable on the outer surface of the shells; one or more interface devices located about the patient shells to receive input and provide output; and one or more computing units in communication with the image units and interface devices, the computing units adapted to provide an interactive simulation for healthcare training.
Virtual Photorealistic Digital Actor System for Remote Service of Customers
A system for remote servicing of customers includes an interactive display unit at the customer location providing two-way audio/visual communication with a remote service/sales agent, wherein communication inputted by the agent is delivered to customers via a virtual Digital Actor on the display. The system also provides for remote customer service using physical mannequins with interactive capability having two-way audio visual communication ability with the remote agent, wherein communication inputted by the remote service or sales agent is delivered to customers using the physical mannequin. A web solution integrates the virtual Digital Actor system into a business website. A smart phone solution provides the remote service to customers via an App. In another embodiment, the Digital Actor is instead displayed as a 3D hologram. The Digital Actor is also used in an e-learning solution, in a movie studio suite, and as a presenter on TV, online, or other broadcasting applications.