G06T13/205

METHOD AND APPARATUS FOR DISPLAYING LYRIC EFFECTS, ELECTRONIC DEVICE, AND COMPUTER READABLE MEDIUM
20220351454 · 2022-11-03 ·

The present disclosure provides a method and an apparatus for displaying lyric effects, an electronic device, and a computer-readable medium. The method includes: obtaining, based on a lyric effect display operation of a user, an image sequence and music data to be displayed, the music data including audio data and lyrics; determining a target time point; playing at least one target image corresponding to the target time point in the image sequence; determining target lyrics corresponding to the target time point in the lyrics; adding animation effects to the at least one target image; displaying the target lyrics on the at least one target image; and playing the part of the audio data corresponding to the target lyrics.
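As a minimal illustration of the time-point lookup described above (not the patented implementation), the step of selecting a target image and target lyric line for a playback time can be sketched as follows. The names `LyricLine` and `pick_targets`, and the even spread of images across the track, are assumptions for this sketch.

```python
from dataclasses import dataclass

@dataclass
class LyricLine:
    start: float  # seconds into the audio where this line begins
    text: str

def pick_targets(t: float, lyrics: list[LyricLine], images: list[str], duration: float):
    """Return (target image, target lyric text) for playback time t.

    Images are spread evenly across the track; the target lyric is the
    last line whose start time is <= t.
    """
    idx = min(int(t / duration * len(images)), len(images) - 1)
    line = ""
    for lyric in lyrics:
        if lyric.start <= t:
            line = lyric.text
        else:
            break
    return images[idx], line

lyrics = [LyricLine(0.0, "hello"), LyricLine(2.0, "world")]
imgs = ["a.png", "b.png", "c.png"]
print(pick_targets(2.5, lyrics, imgs, duration=6.0))  # ('b.png', 'world')
```

A real player would run this lookup on every frame tick and hand the resulting pair to the rendering layer that applies the animation effect.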

ANIMATION PRODUCTION SYSTEM

To enable animations to be shot in a virtual space, the principal invention for solving the above-described problem is an animation production method that provides a virtual space in which a given object is placed, the method comprising: detecting an operation of a user equipped with a head mounted display; controlling an action of an object based on the detected operation of the user; shooting the action of the object; storing action data relating to the shot action of the object in a first track; storing voice of the user in a second track; and playing back the video stored in the first track in synchronization with the voice stored in the second track.

ANIMATION PRODUCTION SYSTEM

To enable animations to be shot in a virtual space, the principal invention for solving the above-described problem is an animation production method that provides a virtual space in which a given object is placed, the method comprising: storing voice of a user in a first track; detecting an operation of the user equipped with a head mounted display; controlling an action of the object based on the detected operation of the user; shooting the action of the object; storing action data relating to the shot action of the object in a second track; storing voice of the user in the second track; and shooting the action of the object while playing the voice stored in the first track.

LEARNING DEVICE AND METHOD FOR GENERATING IMAGE
20220351348 · 2022-11-03 ·

A learning device for generating an image according to an embodiment disclosed is a computing device including one or more processors and a memory storing one or more programs executed by the one or more processors. The learning device includes a first machine learning model that takes a person basic image as input, generates a mask for masking a portion related to speech in the person basic image, and generates a person background image by synthesizing the person basic image and the mask.
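The synthesis step above can be illustrated with simple array arithmetic: the mask hides the speech-related region of the basic image, leaving a person background image. The function name and the plain multiplicative compositing are assumptions for this sketch, not the patented model.

```python
import numpy as np

def synthesize_background(person_img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the masked (speech-related) region of the image.

    person_img: H x W x C float array; mask: H x W array with 1 where
    the region should be hidden and 0 elsewhere.
    """
    keep = 1.0 - mask[..., np.newaxis]  # broadcast the mask over channels
    return person_img * keep

img = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[2:, :] = 1.0  # pretend the lower half covers the mouth region
bg = synthesize_background(img, mask)
print(bg[0, 0, 0], bg[3, 3, 0])  # 1.0 0.0
```

In the disclosed device the mask itself is predicted by the first machine learning model rather than fixed by hand as it is here.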

AUDIO REACTIVE AUGMENTED REALITY

Methods, systems, and storage media for augmenting a video are disclosed. Exemplary implementations may: receive a selection of an effect; receive user-generated content comprising video data and audio data; detect a characteristic of the audio data comprising at least a volume and/or a pitch of the audio data during a period of time; determine a series of numeric values based on the characteristic of the audio data during the period of time, individual numeric values of the series of numeric values being correlated with an amplitude of the volume and/or pitch at a discrete point within the period of time; and augment at least one of the video data and/or the audio data to include the effect based on the series of numeric values at discrete points in time within the period of time.
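The series of numeric values described above can be sketched as a windowed volume envelope: one RMS value per short window, normalized so each value can scale the effect at that point in time. The window length, RMS measure, and function names are assumptions for this illustration.

```python
import numpy as np

def volume_series(samples: np.ndarray, sample_rate: int, window_s: float = 0.05) -> np.ndarray:
    """RMS volume at discrete points: one value per window of audio."""
    win = max(1, int(sample_rate * window_s))
    n = len(samples) // win
    frames = samples[: n * win].reshape(n, win)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def effect_scale(values: np.ndarray) -> np.ndarray:
    """Normalize to [0, 1] so each value can drive the effect strength."""
    peak = values.max()
    return values / peak if peak > 0 else values

sr = 1000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t) * t  # a tone that grows louder
scales = effect_scale(volume_series(audio, sr))
print(len(scales), round(float(scales[-1]), 2))  # 20 1.0
```

Each entry of `scales` corresponds to one discrete point within the period of time; an augmentation pipeline would use it to modulate the size, opacity, or motion of the selected effect frame by frame.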

RESPONSIVE VIDEO CONTENT ALTERATION

A first user input is detected from a client device. The first user input is directed at video content that includes a set of one or more topics, and the first user input is from a viewer of the video content. A set of one or more frames in the video content is analyzed based on the first user input. A first topic in the video content is identified based on the set of frames and based on the viewer. The video content, related to the first topic of the set of topics, is altered based on the set of frames and based on the viewer.

Joint audio-video facial animation system
11610354 · 2023-03-21 ·

The present invention relates to a joint automatic audio-visual driven facial animation system that, in some example embodiments, includes a full-scale, state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) system with a strong language model for speech recognition, and obtains phoneme alignment from the word lattice.

SENTIMENT-BASED INTERACTIVE AVATAR SYSTEM FOR SIGN LANGUAGE
20220343576 · 2022-10-27 ·

Systems and methods for presenting an avatar that speaks sign language based on the sentiment of a speaker are disclosed herein. A translation application running on a device receives a content item comprising video and audio, wherein the audio comprises a first plurality of spoken words in a first language. The video comprises a character speaking the first plurality of spoken words in the first language. The translation application translates the first plurality of spoken words of the first language into a first sign of a first sign language. The translation application determines an emotional state expressed by the character based on sentiment analysis. The translation application generates an avatar that speaks the first sign of the first sign language, where the avatar exhibits the determined emotional state. The content item and the avatar are presented for display on the device.

Artificial intelligence-based animation character drive method and related apparatus

This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change while the speaker is speaking, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's voice and facial expression when saying the target text information, thereby improving the experience of interaction between the user and the animation character.
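The idea of driving a character from an expression base and a predicted expression parameter can be illustrated with a blend-shape style linear combination: the character mesh is the neutral shape plus a weighted sum of expression offsets. The linear model, shapes, and names here are assumptions for illustration, not the disclosed method.

```python
import numpy as np

def apply_expression(neutral: np.ndarray, expression_base: np.ndarray,
                     params: np.ndarray) -> np.ndarray:
    """Blend-shape style drive: neutral mesh plus a weighted sum of
    expression offsets (one offset per basis expression).

    neutral: V x 3 vertex array; expression_base: K x V x 3 offsets;
    params: K weights (here, the target expression parameter).
    """
    return neutral + np.tensordot(params, expression_base, axes=1)

neutral = np.zeros((2, 3))
base = np.stack([np.ones((2, 3)), 2 * np.ones((2, 3))])  # K = 2 basis expressions
params = np.array([0.5, 0.25])
mesh = apply_expression(neutral, base, params)
print(mesh[0])  # [1. 1. 1.]
```

In the disclosed pipeline the weights would be predicted from the target text and acquired media data each frame, then applied against the second character's expression base.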

LIGHT-WEIGHT MACHINE LEARNING MODELS FOR LIP SYNC ANIMATION ON MOBILE DEVICES OR OTHER DEVICES
20230130287 · 2023-04-27 ·

A method includes obtaining a speech segment. The method also includes generating, using at least one processing device of an electronic device, context-independent features and context-dependent features of the speech segment. The method further includes decoding, using the at least one processing device of the electronic device, a first viseme based on the context-independent features. The method also includes decoding, using the at least one processing device of the electronic device, a second viseme based on the context-dependent features and the first viseme. In addition, the method includes generating, using the at least one processing device of the electronic device, an output viseme based on the first and second visemes, where the output viseme is associated with a visual animation of the speech segment.
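The two-stage decode described above can be sketched as a toy pipeline: a first viseme from context-independent features, a second viseme refined with context plus the first viseme, and a combined output. The lookup tables, viseme labels, and refinement rule below are invented for illustration; the patent's decoders are learned models.

```python
# Toy two-stage viseme decoder. Tables and labels are illustrative only.
CI_TABLE = {"p": "M", "a": "AH", "f": "F"}  # phoneme -> coarse viseme

def decode_ci(phoneme: str) -> str:
    """First stage: context-independent lookup."""
    return CI_TABLE.get(phoneme, "SIL")

def decode_cd(prev_phoneme: str, first_viseme: str) -> str:
    """Second stage: refine using left context and the first viseme."""
    if first_viseme == "AH" and prev_phoneme == "p":
        return "AH_ROUND"  # e.g. lips still closing from the bilabial
    return first_viseme

def output_viseme(phonemes: list[str], i: int) -> str:
    """Combine the two stages into the output viseme for position i."""
    first = decode_ci(phonemes[i])
    prev = phonemes[i - 1] if i > 0 else ""
    second = decode_cd(prev, first)
    # Prefer the context-dependent refinement when it differs.
    return second if second != first else first

print(output_viseme(["p", "a"], 1))  # AH_ROUND
```

The output viseme would then index into the character's mouth shapes to render the visual animation of the speech segment.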