G06T13/205

Animation effect attachment based on audio characteristics
11521341 · 2022-12-06

Systems and methods for rendering a video effect to a display are described. More specifically, video data and audio data are obtained. The video data is analyzed to determine one or more attachment points of a target object that appears in the video data. The audio data is analyzed to determine audio characteristics. A video effect associated with an animation to be added to the one or more attachment points is determined based on the audio characteristics. A rendered video is generated by applying the video effect to the video data.
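The abstract above pairs audio analysis with effect selection. A minimal sketch of one way this could work, using per-frame RMS energy as the audio characteristic and a threshold-based effect choice (the thresholds, effect names, and function names are illustrative assumptions, not from the patent):

```python
import numpy as np

def audio_energy_per_frame(samples, sample_rate, fps):
    """Split the audio into chunks aligned with video frames and
    return the RMS energy of each chunk."""
    chunk = int(sample_rate / fps)
    n_frames = len(samples) // chunk
    frames = samples[: n_frames * chunk].reshape(n_frames, chunk)
    return np.sqrt((frames ** 2).mean(axis=1))

def choose_effect(energy, quiet=0.1, loud=0.5):
    """Map an energy level to a named animation effect to attach."""
    if energy < quiet:
        return "none"
    if energy < loud:
        return "sparkle"
    return "burst"

# 1 second of audio at 16 kHz, rendered at 4 fps: silence, then a loud tone
sr, fps = 16000, 4
t = np.arange(sr) / sr
samples = np.where(t < 0.5, 0.0, 0.8 * np.sin(2 * np.pi * 440 * t))
energies = audio_energy_per_frame(samples, sr, fps)
effects = [choose_effect(e) for e in energies]
```

A renderer would then draw the chosen effect at each detected attachment point for that frame.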

MODIFICATION OF OBJECTS IN FILM

A computer-implemented method of processing video data comprising a sequence of image frames. The method includes isolating an instance of an object within the sequence of image frames, generating a modified instance of the object using a machine learning model, and modifying the video data to smoothly transition between at least part of the isolated instance of the object and a corresponding at least part of the modified instance of the object over a subsequence of the sequence of image frames.
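The "smoothly transition ... over a subsequence" step can be read as a per-frame cross-fade between the isolated and modified object pixels. A minimal numpy sketch under that assumption (the linear blend schedule is illustrative; the patent does not specify the transition function):

```python
import numpy as np

def blend_subsequence(original, modified, start, length):
    """Cross-fade from the original object pixels to the modified ones
    over `length` frames beginning at frame `start`.

    `original` and `modified` are arrays of shape (frames, h, w)."""
    out = original.astype(float).copy()
    for i in range(length):
        alpha = (i + 1) / length  # ramps from just above 0 up to 1
        f = start + i
        out[f] = (1 - alpha) * original[f] + alpha * modified[f]
    out[start + length:] = modified[start + length:]  # fully modified after
    return out

orig = np.zeros((6, 2, 2))   # original instance: all-zero frames
mod = np.ones((6, 2, 2))     # modified instance: all-one frames
video = blend_subsequence(orig, mod, start=1, length=4)
```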

DEVICE AND METHOD FOR GENERATING SPEECH VIDEO
20220375190 · 2022-11-24

A speech video generation device according to an embodiment includes: a first encoder that receives an input of a first person background image of a predetermined person partially hidden by a first mask and extracts a first image feature vector from the first person background image; a second encoder that receives an input of a second person background image of the person partially hidden by a second mask and extracts a second image feature vector from the second person background image; a third encoder that receives an input of a speech audio signal of the person and extracts a voice feature vector from the speech audio signal; a combining unit that generates a combined vector of the first image feature vector, the second image feature vector, and the voice feature vector; and a decoder that reconstructs a speech video of the person using the combined vector as an input.

Display control device, communication device, display control method, and recording medium
11508106 · 2022-11-22

The disclosure includes: a moving image acquisition unit configured to acquire moving image data obtained through moving image capturing of at least a mouth part of an utterer; a lip detection unit configured to detect a lip part from the moving image data and detect motion of the lip part; a moving image processing unit configured to generate a moving image enhanced to increase the motion of the lip part detected by the lip detection unit; and a display control unit configured to control a display panel to display the moving image generated by the moving image processing unit.
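One plausible way to "increase the motion of the lip part" is to exaggerate each lip landmark's deviation from its temporal mean position. A small sketch under that assumption (the gain value and landmark representation are illustrative, not taken from the patent):

```python
import numpy as np

def enhance_lip_motion(landmarks, gain=2.0):
    """Exaggerate lip movement by scaling each landmark's deviation
    from its temporal mean (resting) position.

    `landmarks` has shape (frames, points, 2) in pixel coordinates."""
    rest = landmarks.mean(axis=0, keepdims=True)  # average pose over time
    return rest + gain * (landmarks - rest)

# one lip point oscillating vertically by +/- 1 pixel
lips = np.array([[[0.0, -1.0]],
                 [[0.0,  1.0]],
                 [[0.0, -1.0]],
                 [[0.0,  1.0]]])
enhanced = enhance_lip_motion(lips, gain=2.0)  # oscillation doubled
```

A rendering stage would then warp the mouth region of each frame toward the enhanced landmark positions before display.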

Augmented Reality Platform for Fan Engagement
20230057073 · 2023-02-23

An augmented reality platform that enables fan engagement and new media opportunities is disclosed. The invention comprises a mobile device software application (app) leveraging augmented reality (AR), artificial intelligence (AI), and a cloud network. Users select on-screen character avatars and dance songs from pre-defined categories within the app and then begin dancing and recording themselves with the app. The app captures user movements and their local environment. AI algorithms sync user movements to the avatar along with their background (using AR), and users can see themselves dancing in character on their mobile device. Users can share avatar dances with friends and compete with each other in dance contests in real time with virtual currency rewards. Advertisers can also participate in the app and share advertisements and logos as well as promote merchandise to the users.

METHOD AND DEVICE FOR GENERATING SPEECH VIDEO ON BASIS OF MACHINE LEARNING
20220358703 · 2022-11-10

A device for generating a speech video may include a first encoder to receive a person background image corresponding to a video part of a speech video of a person and extract an image feature vector from the person background image, a second encoder to receive a speech audio signal corresponding to an audio part of the speech video and extract a voice feature vector from the speech audio signal, a combiner to generate a combined vector by combining the image feature vector output from the first encoder and the voice feature vector output from the second encoder, and a decoder to reconstruct the speech video of the person using the combined vector as an input. The person background image input to the first encoder includes a face and an upper body of the person, with a portion related to speech of the person covered with a mask.
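The encoder/combiner/decoder pipeline above can be sketched in miniature with numpy. This is a toy stand-in, not the patented model: the mask covers the lower (speech-related) half of the image, the "encoders" are simple average-pooling stubs, and the combiner concatenates the two feature vectors, as the abstract describes:

```python
import numpy as np

def apply_mouth_mask(image):
    """Cover the speech-related (lower) half of the face image."""
    masked = image.copy()
    h = image.shape[0]
    masked[h // 2:] = 0.0
    return masked

def encode(x, dim=4):
    """Stand-in encoder: average-pool the input into a fixed-size
    feature vector of length `dim`."""
    flat = x.ravel()
    return flat[: dim * (len(flat) // dim)].reshape(dim, -1).mean(axis=1)

def combine(image_vec, voice_vec):
    """Combiner: concatenate the image and voice feature vectors."""
    return np.concatenate([image_vec, voice_vec])

image = np.random.default_rng(0).random((8, 8))   # person background image
audio = np.random.default_rng(1).random(64)       # speech audio samples
combined = combine(encode(apply_mouth_mask(image)), encode(audio))
```

In the patent, a decoder network would reconstruct the full speech video, including the masked mouth region, from `combined`.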

CONTENT CREATION BASED ON RHYTHM
20220351752 · 2022-11-03

The present disclosure describes techniques for generating content based on rhythm. The techniques comprise acquiring a plurality of images comprising an object with movements in the plurality of images; determining whether at least one portion of the at least one part of the object in a first image aligns with a target image overlaid on the first image and whether the at least one portion aligns with the target image at a time proximate to a first rhythmic point of a playback of a selected piece of music; segmenting the at least one part of the object from the first image in response to determining that the at least one portion aligns with the target image at the time proximate to the first rhythmic point of the playback of the selected piece of music; and generating a first overlay based on the at least one part of the object.
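The "time proximate to a rhythmic point" test reduces to checking whether the alignment timestamp falls within a tolerance of a beat of the selected music. A minimal sketch assuming a constant tempo (the tolerance value and function names are illustrative assumptions):

```python
def rhythmic_points(bpm, duration):
    """Beat timestamps (seconds) for music at a constant tempo."""
    interval = 60.0 / bpm
    times, t = [], 0.0
    while t < duration:
        times.append(round(t, 6))
        t += interval
    return times

def near_rhythmic_point(t, beats, tolerance=0.1):
    """True if time `t` falls within `tolerance` seconds of any beat."""
    return any(abs(t - b) <= tolerance for b in beats)

beats = rhythmic_points(bpm=120, duration=4.0)  # a beat every 0.5 s
hit = near_rhythmic_point(1.52, beats)    # close to the 1.5 s beat
miss = near_rhythmic_point(1.75, beats)   # halfway between beats
```

Only frames where the object aligns with the target overlay *and* `near_rhythmic_point` is true would trigger segmentation and overlay generation.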

SYNTHESIZING VIDEO FROM AUDIO USING ONE OR MORE NEURAL NETWORKS

Apparatuses, systems, and techniques are presented to generate media content. In at least one embodiment, a first neural network is used to generate first video information based, at least in part, upon voice information corresponding to one or more users, and a second neural network is used to generate second video information corresponding to the one or more users based, at least in part, upon the first video information and one or more images corresponding to the one or more users.

Systems and methods for automated real-time generation of an interactive attuned discrete avatar

Systems and methods for rendering an avatar attuned to a user are described. The systems and methods include receiving audio-visual data of user communications of a user. Using the audio-visual data, the systems and methods may determine vocal characteristics of the user, facial action units representative of facial features of the user, and speech of the user based on a speech recognition model and/or natural language understanding model. Based on the vocal characteristics, an acoustic emotion metric can be determined. Based on the recognized speech, a speech emotion metric may be determined. Based on the facial action units, a facial emotion metric may be determined. An emotional complex signature may be determined to represent an emotional state of the user for rendering the avatar attuned to the emotional state, based on a combination of the acoustic emotion metric, the speech emotion metric, and the facial emotion metric.
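The "emotional complex signature" combines three per-modality emotion metrics. One simple realization is a weighted sum over per-emotion scores; the weights and emotion labels below are illustrative assumptions, since the abstract does not specify the combination function:

```python
def emotional_signature(acoustic, speech, facial, weights=(0.3, 0.3, 0.4)):
    """Fuse three per-emotion metrics (dicts mapping emotion name to a
    score) into one weighted signature."""
    wa, ws, wf = weights
    emotions = set(acoustic) | set(speech) | set(facial)
    return {
        e: wa * acoustic.get(e, 0.0)
           + ws * speech.get(e, 0.0)
           + wf * facial.get(e, 0.0)
        for e in emotions
    }

sig = emotional_signature(
    acoustic={"happy": 0.6, "sad": 0.1},  # from vocal characteristics
    speech={"happy": 0.5, "sad": 0.2},    # from recognized speech
    facial={"happy": 0.9, "sad": 0.0},    # from facial action units
)
dominant = max(sig, key=sig.get)  # emotion driving the avatar's rendering
```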

METHOD AND DEVICE FOR GENERATING SPEECH VIDEO BY USING TEXT
20220351439 · 2022-11-03

A device for generating a speech video according to an embodiment has one or more processors and a memory storing one or more programs executable by the one or more processors, and the device includes a video part generator configured to receive a person background image of a person and generate a video part of a speech video of the person; and an audio part generator configured to receive text, generate an audio part of the speech video of the person, and provide speech-related information occurring during the generation of the audio part to the video part generator.