G06T13/20

METHOD AND APPARATUS FOR PROVIDING INTERACTIVE AVATAR SERVICES
20230230303 · 2023-07-20

A method of providing an avatar service includes obtaining a user-uttered voice and spatial information of a user-utterance space; transmitting the user-uttered voice and the spatial information to a server; receiving, from the server, a first avatar voice answer and an avatar facial expression sequence corresponding to the first avatar voice answer, both determined based on the user-uttered voice and the spatial information; determining first avatar facial expression data based on the first avatar voice answer and the avatar facial expression sequence; identifying a certain event during reproduction of a first avatar animation created based on the first avatar voice answer and the first avatar facial expression data; determining second avatar facial expression data or a second avatar voice answer based on the certain event; and reproducing a second avatar animation created based on the second avatar facial expression data or the second avatar voice answer.
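The claimed client-side flow (obtain voice and spatial information, get an answer and expression sequence from the server, then branch on an event during playback) can be sketched as below. The server stub, the event name, and the "second" response content are all illustrative assumptions, not details from the publication.

```python
def server_respond(utterance, spatial_info):
    """Hypothetical server stub: returns a first avatar voice answer and
    a facial expression sequence determined from the utterance and the
    spatial information of the utterance space."""
    tone = "quiet" if spatial_info.get("noise_db", 0) < 40 else "loud"
    return f"({tone}) answer to {utterance!r}", ["neutral", "smile"]

def run_avatar_turn(utterance, spatial_info, event=None):
    # obtain user-uttered voice + spatial information, send to server
    answer, expr_seq = server_respond(utterance, spatial_info)
    # first avatar animation from the voice answer + expression sequence
    animations = [{"voice": answer, "expressions": expr_seq}]
    # a certain event during playback triggers second expression data
    # or a second voice answer (event name is illustrative)
    if event == "user_interrupts":
        animations.append({"voice": "go ahead", "expressions": ["attentive"]})
    return animations
```

A turn without an event yields one animation; an interruption during playback yields a second one.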

Generating facial position data based on audio data

A computer-implemented method for generating a machine-learned model to generate facial position data based on audio data comprising training a conditional variational autoencoder having an encoder and decoder. The training comprises receiving a set of training data items, each training data item comprising a facial position descriptor and an audio descriptor; processing one or more of the training data items using the encoder to obtain distribution parameters; sampling a latent vector from a latent space distribution based on the distribution parameters; processing the latent vector and the audio descriptor using the decoder to obtain a facial position output; calculating a loss value based at least in part on a comparison of the facial position output and the facial position descriptor of at least one of the one or more training data items; and updating parameters of the conditional variational autoencoder based at least in part on the calculated loss value.
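The training step described above (encode to distribution parameters, sample a latent vector, decode conditioned on the audio descriptor, compare against the facial position descriptor, update) can be sketched with a deliberately tiny linear model. The dimensions, the single training item, the fixed reparameterization noise, and the finite-difference gradient are all illustrative stand-ins for a real autodiff setup.

```python
import numpy as np

rng = np.random.default_rng(0)
D_FACE, D_AUDIO, D_LAT = 6, 4, 2

# one training item: (facial position descriptor, audio descriptor)
face = rng.normal(size=D_FACE)
audio = rng.normal(size=D_AUDIO)
eps = rng.normal(size=D_LAT)  # reparameterization noise, held fixed here

params = {  # linear encoder/decoder weights (a deliberately tiny model)
    "W_mu":  rng.normal(scale=0.1, size=(D_LAT, D_FACE + D_AUDIO)),
    "W_lv":  rng.normal(scale=0.1, size=(D_LAT, D_FACE + D_AUDIO)),
    "W_dec": rng.normal(scale=0.1, size=(D_FACE, D_LAT + D_AUDIO)),
}

def loss(p):
    x = np.concatenate([face, audio])
    mu, logvar = p["W_mu"] @ x, p["W_lv"] @ x      # encoder -> distribution params
    z = mu + np.exp(0.5 * logvar) * eps            # sample a latent vector
    out = p["W_dec"] @ np.concatenate([z, audio])  # decoder -> facial position output
    recon = np.mean((out - face) ** 2)             # compare with the descriptor
    kl = -0.5 * np.mean(1.0 + logvar - mu**2 - np.exp(logvar))
    return recon + kl

def num_grad(p, key, h=1e-5):
    # finite-difference gradient, standing in for autodiff in this sketch
    g = np.zeros_like(p[key])
    for idx in np.ndindex(g.shape):
        orig = p[key][idx]
        p[key][idx] = orig + h; lp = loss(p)
        p[key][idx] = orig - h; lm = loss(p)
        p[key][idx] = orig
        g[idx] = (lp - lm) / (2 * h)
    return g

before = loss(params)
for k in list(params):
    params[k] -= 0.05 * num_grad(params, k)  # update parameters on the loss
after = loss(params)
```

One small gradient step on the combined reconstruction + KL loss should reduce it, which is the check below.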

Method and apparatus for controlling avatars based on sound
11562520 · 2023-01-24

Provided is a method for controlling avatar motion, which is operated in a user terminal and includes receiving an input audio by an audio sensor, and controlling, by one or more processors, a motion of a first user avatar based on the input audio.
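One minimal way the "motion based on the input audio" step could work is mapping the frame's loudness to a motion parameter; the function name, the RMS mapping, and the `full_scale` constant are illustrative assumptions, not the patent's method.

```python
def mouth_openness(samples, full_scale=0.5):
    """Map the RMS amplitude of one input-audio frame to a mouth-open
    parameter in [0, 1] that could drive a first user avatar's motion."""
    if not samples:
        return 0.0
    rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
    return min(1.0, rms / full_scale)
```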

AUGMENTED REALITY ANAMORPHOSIS SYSTEM
20230222743 · 2023-07-13

Systems, methods, devices, and media for anamorphosis systems to generate and cause display of anamorphic media are disclosed. In one embodiment, an anamorphosis system is configured to identify a set of features of a space, determine relative positions of the set of features, determine a perspective of the mobile device within the space based on the relative positions of the set of features, retrieve anamorphic media based on the location of the mobile device, and apply the anamorphic media to a presentation of the space at the mobile device. The anamorphic media may include media items such as images and videos, configured such that the media items are only visible from one or more specified perspectives. The anamorphic media may include a stylized text string projected onto surfaces of a space such that the stylized text string is correctly displayed when viewed through a user device from a specified perspective.
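The "only visible from one or more specified perspectives" behavior reduces, in the simplest reading, to comparing the device's current viewing direction with the specified one. The angular-tolerance test and the function names below are a simplified illustration, not the disclosed perspective pipeline.

```python
import math

def angle_deg(u, v):
    # angle between two 3D direction vectors, in degrees
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * nv)))))

def should_display(device_dir, specified_dir, tol_deg=10.0):
    # render the anamorphic media only when the device's current
    # perspective is within a tolerance of the specified perspective
    return angle_deg(device_dir, specified_dir) <= tol_deg
```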

NEURAL NETWORK FOR AUDIO AND VIDEO DUBBING WITH 3D FACIAL MODELLING
20230015971 · 2023-01-19

A computer-implemented method includes obtaining source video data comprising a plurality of image frames, and using a face tracker to detect one or more instances of faces within respective sequences of image frames of the source video data. For a first instance of a given face detected within a first sequence of image frames, the method includes determining a framewise location and size of the first instance of the given face in the first sequence of image frames, using a neural renderer to obtain replacement video data comprising a replacement instance of the given face, and using the determined framewise location and size to replace at least part of the first instance of the given face with at least part of the replacement instance of the given face.
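The final step, using the framewise location and size to paste the replacement instance over the tracked face, can be sketched as below; the nested-list "pixels" stand in for real image arrays, and the function name is illustrative.

```python
def paste_replacement(frame, replacement, loc, size):
    """Overwrite the tracked face region of one image frame with the
    neural renderer's replacement instance (pixels as nested lists)."""
    (x, y), (w, h) = loc, size
    out = [row[:] for row in frame]  # copy so the source frame is untouched
    for j in range(h):
        for i in range(w):
            out[y + j][x + i] = replacement[j][i]
    return out
```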

Systems and methods for rigging a point cloud for animation
11557074 · 2023-01-17

Disclosed is a rigging system for animating the detached and non-uniformly distributed data points of a point cloud. In response to a selection of a region of space in which a first set of data points is located, the system may identify commonality in the positional or non-positional elements of a first subset of the first set of data points, and may determine that a second subset of the first set of data points lacks the commonality. The system may refine the first set of data points to a second set of data points that includes the first subset of data points and excludes the second subset of data points. The system may link the second set of data points to a bone of a skeletal framework, and may animate the second set of data points based on an animation that is defined for the bone.
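The refine-then-rig sequence (region selection, commonality filter on a non-positional element, bone animation) can be sketched as below; the dict point format, the tolerance filter, and the translation-only animation are illustrative assumptions.

```python
def refine_and_rig(points, in_region, attr, value, tol):
    """Select the data points inside a region that share a common
    non-positional element value; points lacking it are excluded."""
    region_pts = [p for p in points if in_region(p["pos"])]
    return [p for p in region_pts if abs(p[attr] - value) <= tol]

def animate_bone(rigged, offset):
    # apply a translation defined for the bone to every linked point
    return [{**p, "pos": tuple(a + b for a, b in zip(p["pos"], offset))}
            for p in rigged]
```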

Preprocessor System for Natural Language Avatars

A preprocessor for use with natural language processors that control computerized avatars embeds avatar control information in the speech response file of the natural language processor, giving the avatars an improved perception of emotional intelligence. Rapid avatar response is provided by independent end-of-speech detection and by a response cache that bypasses text-to-speech conversion time. The preprocessor may be shared among multiple websites to provide shared analysis for query optimization.
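The response-cache idea, serving a previously synthesized answer so a repeat can bypass text-to-speech conversion, can be sketched as below; the class and method names are illustrative, not from the disclosure.

```python
class CachedSpeech:
    """Response cache in front of a slow text-to-speech backend, so a
    repeated answer bypasses the synthesis step entirely."""

    def __init__(self, tts):
        self._tts = tts      # callable: text -> synthesized audio bytes
        self._cache = {}

    def speech_for(self, text):
        if text not in self._cache:
            self._cache[text] = self._tts(text)  # synthesize only once
        return self._cache[text]
```

A second request for the same text returns the cached audio without invoking the backend again.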

Audio-visual navigation and communication dynamic memory architectures
11698709 · 2023-07-11

According to one embodiment, a plurality of spatial publishing objects (SPOs) is provided in a multidimensional space in a user interface. Each of the spatial publishing objects is associated with digital media data from at least one digital media source, and the user interface has a field for the digital media data. A user is provided, via the user interface, with a user presence that can optionally be represented in the user interface relative to the spatial publishing objects. The digital media data associated with at least one of the spatial publishing objects are combined to generate a media output corresponding to the combined digital media data.
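One plausible reading of combining SPO media relative to a user presence is a proximity-weighted mix; the `1 / (1 + distance)` weighting and the audio-level representation are assumptions for illustration only.

```python
import math

def mix_levels(user_pos, spo_items):
    """Combine the audio levels of spatial publishing objects into one
    media output, weighting each by proximity to the user presence.
    spo_items: list of (position, level) pairs."""
    total = 0.0
    for pos, level in spo_items:
        total += level / (1.0 + math.dist(user_pos, pos))
    return total
```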