G10L2021/105

AVATAR ANIMATION WITH GENERAL PRETRAINED FACIAL MOVEMENT ENCODING
20250095259 · 2025-03-20 ·

Techniques and systems are provided for generating a representation of a face. For instance, a process can include obtaining one or more images of a face. The process can further include generating an encoded expression representing an expression of the face, wherein predetermined characteristics of the face remain constant relative to the encoded expression. The process can further include mapping the encoded expression to a corresponding expression of a facial model. The process can further include generating the representation of the facial model based on the encoded expression.

SYSTEMS, METHODS, DEVICES AND APPARATUSES FOR DETECTING FACIAL EXPRESSION

A system, method and apparatus for detecting facial expressions from EMG (electromyography) signals.

JOINT AUDIO-VIDEO FACIAL ANIMATION SYSTEM
20250086868 · 2025-03-13 ·

The present invention relates to a joint, automatic, audio-visual-driven facial animation system that, in some example embodiments, includes a full-scale, state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) system with a strong language model for speech recognition, and obtains phoneme alignment from the word lattice.

Viseme Prediction
20250086869 · 2025-03-13 ·

Various embodiments of an apparatus, method(s), system(s) and computer program product(s) described herein are directed to a Viseme Engine. The Viseme Engine receives audio data associated with a user account. The Viseme Engine predicts at least one viseme that corresponds with a portion of phoneme audio data and identifies one or more facial expression parameters associated with the predicted viseme, the facial expression parameters being applicable to a face model. The Viseme Engine renders the predicted viseme according to the one or more facial expression parameters.
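The flow the abstract describes — map a recognized phoneme to a viseme, then look up the facial expression parameters for that viseme — can be sketched as follows. The mapping tables, parameter names, and values here are invented for illustration; the patent does not specify them.

```python
# Hypothetical phoneme-to-viseme mapping; real systems derive these
# classes from visual confusability of speech sounds.
PHONEME_TO_VISEME = {
    "p": "BMP", "b": "BMP", "m": "BMP",   # bilabial closure
    "f": "FV",  "v": "FV",                # labiodental
    "aa": "AA", "ae": "AA",               # open jaw
    "uw": "UW", "ow": "UW",               # rounded lips
}

# Per-viseme facial expression parameters (e.g. blend-shape weights)
# applicable to a face model. Values are illustrative only.
VISEME_PARAMS = {
    "BMP": {"jaw_open": 0.0, "lip_press": 0.9, "lip_round": 0.1},
    "FV":  {"jaw_open": 0.1, "lip_press": 0.6, "lip_round": 0.0},
    "AA":  {"jaw_open": 0.8, "lip_press": 0.0, "lip_round": 0.1},
    "UW":  {"jaw_open": 0.3, "lip_press": 0.0, "lip_round": 0.9},
}

def predict_viseme(phoneme: str) -> str:
    """Map a recognized phoneme to its viseme class."""
    return PHONEME_TO_VISEME.get(phoneme, "REST")

def expression_parameters(viseme: str) -> dict:
    """Look up the facial expression parameters for a predicted viseme."""
    return VISEME_PARAMS.get(
        viseme, {"jaw_open": 0.0, "lip_press": 0.0, "lip_round": 0.0}
    )

print(predict_viseme("m"))                              # BMP
print(expression_parameters(predict_viseme("uw")))
```

A renderer would then apply the returned weights to the face model each frame.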

Switching character facial expression associated with speech
12246256 · 2025-03-11 ·

The present invention produces, on the basis of data that specifies a motion input and speech by a performer who plays a character arranged in a virtual space, a behavior of the character, displays the character as viewed from a predetermined point of view in the virtual space on a predetermined display unit, and outputs a sound depending on the performer's speech. The behavior of the character's mouth in the virtual space is controlled based on the data, a next facial expression to be taken as the character's facial expression is specified in accordance with a predetermined rule, and the character's facial expression in the virtual space is switched, with the lapse of time, to the expression so specified.
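The switching control described above — pick the next expression by a predetermined rule, then transition to it over time — can be sketched minimally. The cycling rule and the linear blend are assumptions; the abstract leaves both unspecified.

```python
# Invented predetermined rule: cycle through a fixed order of expressions.
RULE_NEXT = {"neutral": "smile", "smile": "surprise", "surprise": "neutral"}

def next_expression(current: str) -> str:
    """Specify the next facial expression per the predetermined rule."""
    return RULE_NEXT[current]

def blend(current: str, target: str, t: float) -> dict:
    """Switch the expression with lapse of time: weights as t goes 0 -> 1."""
    return {current: 1.0 - t, target: t}

cur = "neutral"
nxt = next_expression(cur)
print(nxt)                      # smile
print(blend(cur, nxt, 0.25))    # quarter of the way into the switch
```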

System and method for triphone-based unit selection for visual speech synthesis

A system and method for generating a video sequence having mouth movements synchronized with speech sounds are disclosed. The system utilizes a database of n-phones as the smallest selectable unit, wherein n is larger than 1 and preferably 3. The system calculates a target cost for each candidate n-phone for a target frame using a phonetic distance, coarticulation parameter, and speech rate. For each n-phone in a target sequence, the system searches for candidate n-phones that are visually similar according to the target cost. The system samples each candidate n-phone to get a same number of frames as in the target sequence and builds a video frame lattice of candidate video frames. The system assigns a joint cost to each pair of adjacent frames and searches the video frame lattice to construct the video sequence by finding the optimal path through the lattice according to the minimum of the sum of the target cost and the joint cost over the sequence.
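The final search step the abstract describes is a classic minimum-cost path through a lattice, where the total cost is the sum of per-frame target costs and pairwise joint costs between adjacent frames. A dynamic-programming sketch, with invented toy costs (real systems derive target costs from phonetic distance, coarticulation, and speech rate):

```python
def min_cost_path(lattice, target_cost, joint_cost):
    """
    lattice: list of time steps, each a list of candidate frame ids.
    target_cost(t, frame): cost of using `frame` at step t.
    joint_cost(prev, frame): transition cost between adjacent frames.
    Returns (total_cost, best_path) minimizing the summed cost.
    """
    # best[frame] = (cumulative cost, path ending at frame)
    best = {f: (target_cost(0, f), [f]) for f in lattice[0]}
    for t in range(1, len(lattice)):
        nxt = {}
        for f in lattice[t]:
            # pick the predecessor minimizing cumulative + joint cost
            cost, path = min(
                (c + joint_cost(p, f), pth) for p, (c, pth) in best.items()
            )
            nxt[f] = (cost + target_cost(t, f), path + [f])
        best = nxt
    return min(best.values())

# Toy lattice: 3 time steps with candidate video frames; costs are made up.
lattice = [["a1", "a2"], ["b1", "b2"], ["c1"]]
tc = lambda t, f: 0.0 if f.endswith("1") else 0.5
jc = lambda p, f: 0.0  # dummy joint cost; real ones measure visual smoothness
cost, path = min_cost_path(lattice, tc, jc)
print(cost, path)
```

This is the standard Viterbi-style search over the video frame lattice; the patent's contribution lies in the triphone unit inventory and cost design, not in the search itself.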

Information Processing Method and Information Processing Device
20170053642 · 2017-02-23 ·

An information processing method includes receiving a change instruction for a voice parameter used in synthesizing a voice for a set of texts, changing the voice parameter in accordance with the change instruction, changing, in accordance with the same change instruction, an image parameter used in synthesizing an image of a virtual object (the virtual object indicating a character that vocalizes the synthesized voice), synthesizing the voice using the changed voice parameter, and synthesizing the image using the changed image parameter.

AVATAR FACIAL EXPRESSION AND/OR SPEECH DRIVEN ANIMATIONS
20170039750 · 2017-02-09 ·

Apparatuses, methods and storage medium associated with animating and rendering an avatar are disclosed herein. In embodiments, an apparatus may include a facial expression and speech tracker to respectively receive a plurality of image frames and audio of a user, and analyze the image frames and the audio to determine and track facial expressions and speech of the user. The tracker may further select a plurality of blend shapes, including assignment of weights of the blend shapes, for animating the avatar, based on tracked facial expressions or speech of the user. The tracker may select the plurality of blend shapes, including assignment of weights of the blend shapes, based on the tracked speech of the user, when visual conditions for tracking facial expressions of the user are determined to be below a quality threshold. Other embodiments may be disclosed and/or claimed.
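The fallback behavior described above — use tracked facial expressions when visual conditions are good, fall back to speech-driven blend shapes when they drop below a quality threshold — reduces to a simple selection rule. The threshold and weight values below are invented:

```python
VISUAL_QUALITY_THRESHOLD = 0.6  # assumed quality threshold

def select_blend_shapes(visual_quality, visual_weights, speech_weights):
    """Return blend-shape weights for animating the avatar."""
    if visual_quality >= VISUAL_QUALITY_THRESHOLD:
        return visual_weights   # drive avatar from tracked facial expressions
    return speech_weights       # poor lighting/occlusion: drive from audio only

frame_from_video = {"smile": 0.7, "jaw_open": 0.2}
frame_from_audio = {"smile": 0.0, "jaw_open": 0.5}

print(select_blend_shapes(0.9, frame_from_video, frame_from_audio))  # video-driven
print(select_blend_shapes(0.3, frame_from_video, frame_from_audio))  # speech-driven
```

In practice the two weight sets could also be blended proportionally to quality rather than switched hard; the abstract only specifies a threshold.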

Generating a Visually Consistent Alternative Audio for Redubbing Visual Speech

There are provided systems and methods for generating a visually consistent alternative audio for redubbing visual speech. A processor is configured to sample a dynamic viseme sequence corresponding to a given utterance by a speaker in a video, identify a plurality of phonemes corresponding to the dynamic viseme sequence, construct a graph of the plurality of phonemes that synchronize with the sequence of lip movements of the speaker's mouth in the dynamic viseme sequence, and use the graph to generate an alternative phrase that substantially matches the sequence of lip movements of the speaker's mouth in the video.
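The core idea above — each viseme in the observed lip sequence admits several visually confusable phonemes, so alternative phrases can be read off the paths of a phoneme graph — can be illustrated with a toy enumeration. The viseme-to-phoneme table and the tiny lexicon are invented for illustration:

```python
from itertools import product

# Phonemes that look alike on the lips (visually confusable classes).
VISEME_TO_PHONEMES = {
    "BMP": ["p", "b", "m"],
    "AA":  ["aa"],
    "T":   ["t", "d", "n"],
}

# Tiny invented lexicon: phoneme string -> word.
LEXICON = {"paat": "pot", "baat": "bot", "maat": "mot", "baad": "bod"}

def alternative_phrases(viseme_sequence):
    """Enumerate phoneme paths consistent with the lip movements and
    keep those that spell a word in the lexicon."""
    choices = [VISEME_TO_PHONEMES[v] for v in viseme_sequence]
    results = []
    for path in product(*choices):
        key = "".join(path)
        if key in LEXICON:
            results.append(LEXICON[key])
    return results

print(alternative_phrases(["BMP", "AA", "T"]))
```

A real system would walk the graph with a language model rather than enumerate a cross product, but the lip-compatibility constraint is the same.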

VIRTUAL PHOTOREALISTIC DIGITAL ACTOR SYSTEM FOR REMOTE SERVICE OF CUSTOMERS
20170032377 · 2017-02-02 ·

A system for remote servicing of customers includes an interactive display unit at the customer location providing two-way audio/visual communication with a remote service/sales agent, wherein communication inputted by the agent is delivered to customers via a virtual Digital Actor on the display. The system also provides for remote customer service using physical mannequins with interactive capability and two-way audio/visual communication with the remote agent, wherein communication inputted by the remote service or sales agent is delivered to customers using the physical mannequin. A web solution integrates the virtual Digital Actor system into a business website. A smart phone solution provides the remote service to customers via an App. In another embodiment, the Digital Actor is instead displayed as a 3D hologram. The Digital Actor is also used in an e-learning solution, in a movie studio suite, and as a presenter on TV, online, or in other broadcasting applications.