G06T13/205

VOICE NOTE WITH FACE TRACKING
20230127090 · 2023-04-27 ·

Methods and systems are disclosed for performing operations for generating a voice note. The operations include receiving, by a messaging application, a request from a first participant to send a voice message to a second participant in a communication session. The operations include, in response to receiving the request, generating an audio file comprising a specified duration of speech input received from the first participant. The operations include associating the audio file with an avatar that represents the first participant. The operations include presenting an interactive visual indicator of the avatar among a plurality of messages in the communication session. The operations include receiving, by the messaging application, input that selects the interactive visual indicator of the avatar. The operations include, in response to receiving the input, rendering an animation of the avatar speaking the speech input while playing the audio file.

MATCHING MOUTH SHAPE AND MOVEMENT IN DIGITAL VIDEO TO ALTERNATIVE AUDIO
20230121540 · 2023-04-20 ·

A method for matching mouth shape and movement in digital video to alternative audio includes deriving a sequence of facial poses including mouth shapes for an actor from a source digital video. Each pose in the sequence of facial poses corresponds to a middle position of each audio sample. The method further includes generating an animated face mesh based on the sequence of facial poses and the source digital video, transferring tracked expressions from the animated face mesh or the target video to the source video, and generating a rough output video that includes transfers of the tracked expressions. The method further includes generating a finished video at least in part by refining the rough output video using a parametric autoencoder trained on mouth shapes in the animated face mesh or the target video. One or more computers may perform the operations of the method.

ARTIFICIAL INTELLIGENCE-BASED ANIMATION CHARACTER DRIVE METHOD AND RELATED APPARATUS

This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data that includes the speaker's facial expression changes while the speaker is speaking, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's voice and facial expression when saying the target text information, thereby improving the experience of interaction between the user and the animation character.

Generating Facial Position Data Based on Audio Data

A computer-implemented method for generating a machine-learned model to generate facial position data based on audio data comprises training a conditional variational autoencoder having an encoder and a decoder. The training comprises receiving a set of training data items, each training data item comprising a facial position descriptor and an audio descriptor; processing one or more of the training data items using the encoder to obtain distribution parameters; sampling a latent vector from a latent space distribution based on the distribution parameters; processing the latent vector and the audio descriptor using the decoder to obtain a facial position output; calculating a loss value based at least in part on a comparison of the facial position output and the facial position descriptor of at least one of the one or more training data items; and updating parameters of the conditional variational autoencoder based at least in part on the calculated loss value.
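
The sampling and loss steps of such a training iteration might look roughly like the following sketch. It is not taken from the application: the Gaussian latent with a reparameterized sample, the KL term against a standard normal prior, the mean squared error comparison, and the linear `toy_decoder` are all assumptions chosen for illustration, and the parameter update step is omitted.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps elementwise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions (assumed regularizer)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def mse(pred, target):
    """Mean squared error between facial position output and descriptor (assumed comparison)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def toy_decoder(z, audio, weights):
    """Hypothetical linear decoder: one weight row per facial position output."""
    inputs = list(z) + list(audio)
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def cvae_loss(mu, log_var, audio, face_target, weights, rng, kl_weight=1.0):
    """Loss for one training item: reconstruction error plus weighted KL term."""
    z = reparameterize(mu, log_var, rng)          # latent vector from distribution parameters
    face_pred = toy_decoder(z, audio, weights)    # decoder sees latent vector and audio descriptor
    return mse(face_pred, face_target) + kl_weight * kl_to_standard_normal(mu, log_var)
```

In a real system the encoder and decoder would be neural networks and the loss would be differentiated to update their parameters; here `mu` and `log_var` simply stand in for the encoder's distribution parameters.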

IMAGE PROCESSING METHOD AND APPARATUS FOR AUGMENTED REALITY, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Disclosed are an image processing method and apparatus for augmented reality, an electronic device, and a storage medium. The method includes: acquiring a target image in response to an image acquiring instruction triggered by a user, where the target image includes a target object; acquiring an augmented reality model of the target object, and outputting the augmented reality model in combination with the target object; acquiring target audio data selected by the user, and determining an audio feature with temporal regularity according to the target audio data; and driving the augmented reality model according to the audio feature and a playing progress of the target audio data when outputting the target audio data.
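
As one illustration of driving a model from an audio feature and the playing progress, the sketch below assumes the feature with temporal regularity is a sorted list of beat timestamps and that the model is driven by scaling it. Both choices, along with the names `beat_pulse` and `model_scale` and the exponential decay, are hypothetical and not stated in the abstract.

```python
import bisect
import math

def beat_pulse(beat_times, progress_s, decay_s=0.25):
    """Return a value in [0, 1]: 1.0 exactly on a beat, decaying exponentially afterwards.

    beat_times must be sorted in ascending order (seconds from the start of the audio).
    """
    i = bisect.bisect_right(beat_times, progress_s) - 1
    if i < 0:
        return 0.0  # playback has not reached the first beat yet
    return math.exp(-(progress_s - beat_times[i]) / decay_s)

def model_scale(beat_times, progress_s, base=1.0, amplitude=0.2):
    """Scale factor for the AR model at the current playing progress."""
    return base + amplitude * beat_pulse(beat_times, progress_s)
```

A renderer could call `model_scale(beats, player_position_seconds)` once per frame so that the model appears to pulse in time with the audio.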

Dance Animation Processing Method and Apparatus, Electronic Device, and Storage Medium
20230162421 · 2023-05-25 ·

The present disclosure provides a dance animation processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring multiple dance action segments, and establishing an animation state transition relationship for the multiple dance action segments, each action node in the animation state transition relationship corresponding to one dance action segment, and a transition cost existing among the action nodes; acquiring a target audio file, and determining a music feature sequence for the target audio file; determining a dance action sequence for the music feature sequence according to the transition cost in the animation state transition relationship; and generating a dance animation for the target audio file according to the dance action sequence.
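
One way to picture choosing a dance action sequence from transition costs is a dynamic program over the action nodes, sketched below. The abstract does not say how segments are matched to music features, so the scalar "energy" per segment, the absolute-difference matching cost, and the function name `plan_dance` are assumptions made only for this example.

```python
def plan_dance(segment_energy, transition_cost, music_energy):
    """Pick one dance segment per music feature, minimizing matching plus transition cost.

    segment_energy[j]     : hypothetical scalar feature of dance segment j
    transition_cost[i][j] : cost of moving from segment i to segment j
    music_energy[t]       : hypothetical scalar music feature at step t
    Returns (list of segment indices, total cost).
    """
    n = len(segment_energy)
    # best[j]: lowest total cost of any sequence ending in segment j at the current step
    best = [abs(segment_energy[j] - music_energy[0]) for j in range(n)]
    back = []  # back[t][j]: best predecessor of segment j at step t + 1
    for feature in music_energy[1:]:
        prev, step_back, best = best, [], []
        for j in range(n):
            i = min(range(n), key=lambda k: prev[k] + transition_cost[k][j])
            step_back.append(i)
            best.append(prev[i] + transition_cost[i][j]
                        + abs(segment_energy[j] - feature))
        back.append(step_back)
    # Trace the cheapest sequence backwards from the best final segment
    j = min(range(n), key=lambda k: best[k])
    total = best[j]
    path = [j]
    for step_back in reversed(back):
        j = step_back[j]
        path.append(j)
    path.reverse()
    return path, total
```

For example, with segments of energy 0.1, 0.5 and 0.9, cheap transitions between neighbours, and a music feature sequence that rises from 0.1 to 0.9, the planner steps through the three segments in order rather than jumping directly from the first to the last.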

SYNTHETIC EMOTION IN CONTINUOUSLY GENERATED VOICE-TO-VIDEO SYSTEM
20230061761 · 2023-03-02 ·

One example method includes collecting an audio segment that includes audio data generated by a user, analyzing the audio data to identify an emotion expressed by the user, computing start and end indices of a video segment, selecting video data that shows the emotion expressed by the user, using the video data and the start and end indices of the video segment to modify a face of the user as the face appears in the video segment so as to generate modified face frames, and stitching the modified face frames into the video segment to create a modified video segment with the emotion expressed by the user, where the modified video segment includes the audio data generated by the user.
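
The index computation and stitching steps could be sketched as follows, assuming the start and end indices are video frame indices derived from the audio segment's timestamps and a fixed frame rate. That interpretation, and the helper names `segment_indices` and `stitch`, are assumptions and not details given in the abstract.

```python
import math

def segment_indices(audio_start_s, audio_end_s, fps, frame_count):
    """Map an audio segment's time span to [start, end) frame indices, clamped to the video."""
    start = max(0, math.floor(audio_start_s * fps))
    end = min(frame_count, math.ceil(audio_end_s * fps))
    return start, end

def stitch(frames, modified, start):
    """Return a copy of frames with the modified face frames written in from index start."""
    out = list(frames)
    out[start:start + len(modified)] = modified
    return out
```

Rounding the start down and the end up keeps every frame that overlaps the audio segment inside the modified range.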

INTERACTIVE FASHION WITH MUSIC AR

Methods and systems are disclosed for performing operations comprising: receiving a monocular image that includes a depiction of a person wearing an article of clothing; generating a segmentation of the article of clothing worn by the person in the monocular image; obtaining one or more audio-track related augmented reality elements; and applying the one or more audio-track related augmented reality elements to the article of clothing worn by the person based on the segmentation of the article of clothing worn by the person.

LEAD CONVERSION USING CONVERSATIONAL VIRTUAL AVATAR

A system and method for lead conversion using a conversational virtual avatar are disclosed. The system comprises a processor that causes a Conversation Virtual Avatar Platform (CVAP) to receive, for a first entity, from a lead prioritization engine, leads applicable to the first entity via a lead repository based on scores associated with the respective leads. The processor causes the CVAP to receive from the leads, through a conversation management engine (CME) configured in the CVAP, responses to questions pertaining to product attributes and information pertaining to the lead. The processor causes the CVAP to process the responses to determine an action and/or state, which includes whether to issue additional product-attribute based questions through a Virtual Avatar (VA) using a Response To Motion Module (RTME) or to recommend, in real time, through a recommender engine that uses a recommendation model, products associated with the first entity to at least one respective lead based on any one or a combination of the responses received from the lead, the information pertaining to the lead, and products ordered by entities similar to the lead.

Spatial audio and avatar control at headset using audio signals

An audio system in a local area providing an audio signal to a headset of a remote user is presented herein. The audio system identifies sounds from a human sound source in the local area, based in part on sounds detected within the local area. The audio system generates an audio signal for presentation to a remote user within a virtual representation of the local area based in part on a location of the remote user within the virtual representation of the local area relative to a virtual representation of the human sound source within the virtual representation of the local area. The audio system provides the audio signal to a headset of the remote user, wherein the headset presents the audio signal as part of the virtual representation of the local area to the remote user.
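
A minimal sketch of generating a location-dependent signal is shown below. It assumes a two-dimensional virtual space, inverse-distance attenuation, and constant-power stereo panning. None of these are specified in the abstract, and the function `spatialize` and its parameters are hypothetical.

```python
import math

def spatialize(listener_xy, listener_yaw, source_xy, ref_distance=1.0):
    """Return (left_gain, right_gain) for a source heard from the listener's virtual position.

    listener_yaw is the facing direction in radians, counterclockwise from the +x axis.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    # Inverse-distance attenuation, with no boost inside the reference distance
    gain = ref_distance / max(distance, ref_distance)
    # Angle of the source relative to the facing direction; positive means to the left
    azimuth = math.atan2(dy, dx) - listener_yaw
    # pan in [-1, 1]: -1 is fully left, +1 is fully right
    pan = -math.sin(azimuth)
    # Constant-power panning keeps left^2 + right^2 equal to gain^2
    angle = (pan + 1.0) * math.pi / 4.0
    return gain * math.cos(angle), gain * math.sin(angle)
```

This simple panning model cannot tell a source directly ahead from one directly behind; a headset renderer would more likely use head-related transfer functions for that.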