G06T13/20

Audio-visual navigation and communication dynamic memory architectures
11698709 · 2023-07-11 ·

According to one embodiment, a plurality of spatial publishing objects (SPOs) is provided in a multidimensional space in a user interface. Each of the plurality of spatial publishing objects is associated with digital media data from at least one digital media source. The user interface has a field for the digital media data. The user interface also provides a user with a user presence that can optionally be represented in the user interface relative to the plurality of spatial publishing objects. The digital media data associated with at least one of the spatial publishing objects are combined to generate a media output corresponding to the combined digital media data.
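
As an illustration, here is a minimal sketch of combining digital media data from SPOs relative to a user presence. The distance-based gain model and all names (`SpatialPublishingObject`, `combine_spo_audio`, `rolloff`) are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch: mix SPO audio streams, weighting each source by
# its proximity to the user presence in the multidimensional space.
from dataclasses import dataclass
import math

@dataclass
class SpatialPublishingObject:
    position: tuple   # (x, y, z) location in the multidimensional space
    samples: list     # mono audio samples from the object's media source

def combine_spo_audio(spos, user_position, rolloff=1.0):
    """Combine SPO audio into one media output, nearer SPOs louder."""
    length = max(len(s.samples) for s in spos)
    mix = [0.0] * length
    for spo in spos:
        distance = math.dist(spo.position, user_position)
        gain = 1.0 / (1.0 + rolloff * distance)   # assumed rolloff model
        for i, sample in enumerate(spo.samples):
            mix[i] += gain * sample
    return mix

# Example: two SPOs, with the user presence nearer the first one.
spos = [SpatialPublishingObject((0, 0, 0), [0.5, 0.5, 0.5]),
        SpatialPublishingObject((10, 0, 0), [0.2, 0.2, 0.2])]
output = combine_spo_audio(spos, user_position=(1, 0, 0))
```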

METHOD FOR OUTPUTTING BLEND SHAPE VALUE, STORAGE MEDIUM, AND ELECTRONIC DEVICE

A method for outputting a blend shape value includes: performing feature extraction on obtained target audio data to obtain a target audio feature vector; inputting the target audio feature vector and a target identifier into an audio-driven animation model; inputting the target audio feature vector into an audio encoding layer, determining an input feature vector of a next layer at a (2t−n)/2 time point based on an input feature vector of a previous layer between a t time point and a t−n time point, determining a feature vector having a causal relationship with the input feature vector of the previous layer as a valid feature vector, sequentially outputting target-audio encoding features, and inputting the target identifier into a one-hot encoding layer for binary vector encoding to obtain a target-identifier encoding feature; and outputting a blend shape value corresponding to the target audio data.
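
The timing rule above describes a causal window: the next layer's feature at time (2t−n)/2 depends only on previous-layer features between t−n and t. A minimal sketch of that dependency, assuming a simple averaging kernel in place of the model's learned weights (the window size n and all names are illustrative):

```python
# Illustrative sketch of the causal dependency: the next layer's feature
# aligned to the midpoint time (2t - n)/2 is computed only from
# previous-layer features in the window [t - n, t].
def next_layer_features(prev_layer, n):
    """prev_layer: list of per-timestep feature values from the previous layer.
    Returns (midpoint_time, feature) pairs for the next layer."""
    out = []
    for t in range(n, len(prev_layer)):
        window = prev_layer[t - n : t + 1]   # valid features between t-n and t
        midpoint = (2 * t - n) / 2           # time point the output is aligned to
        feature = sum(window) / len(window)  # stand-in for a learned kernel
        out.append((midpoint, feature))
    return out

# Example: a toy previous-layer feature sequence with window size n = 4.
print(next_layer_features([0.1, 0.3, 0.2, 0.4, 0.6, 0.5], n=4))
```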

SYSTEMS AND METHODS FOR AUTOMATED REAL-TIME GENERATION OF AN INTERACTIVE AVATAR UTILIZING SHORT-TERM AND LONG-TERM COMPUTER MEMORY STRUCTURES

Systems and methods enable rendering an avatar attuned to a user. The systems and methods include receiving audio-visual data of user communications of a user. Using the audio-visual data, the systems and methods may determine vocal characteristics of the user, facial action units representative of facial features of the user, and speech of the user based on a speech recognition model and/or a natural language understanding model. Based on the vocal characteristics, an acoustic emotion metric can be determined. Based on the recognized speech, a speech emotion metric may be determined. Based on the facial action units, a facial emotion metric may be determined. From a combination of the acoustic emotion metric, the speech emotion metric, and the facial emotion metric, an emotional complex signature may be determined that represents an emotional state of the user, so that the avatar can be rendered attuned to that emotional state.
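
A minimal sketch of fusing the three per-modality metrics into an emotional complex signature follows. The weighted-average fusion, the valence/arousal representation, and all field names are assumptions; the abstract does not specify the combination rule.

```python
# Hypothetical fusion of acoustic, speech, and facial emotion metrics
# into a single "emotional complex signature" for avatar rendering.
from dataclasses import dataclass

@dataclass
class EmotionalComplexSignature:
    valence: float   # negative..positive affect, in [-1, 1]
    arousal: float   # calm..excited, in [0, 1]

def fuse_metrics(acoustic, speech, facial, weights=(0.3, 0.3, 0.4)):
    """Each metric is an assumed (valence, arousal) pair from one modality."""
    wa, ws, wf = weights
    valence = wa * acoustic[0] + ws * speech[0] + wf * facial[0]
    arousal = wa * acoustic[1] + ws * speech[1] + wf * facial[1]
    return EmotionalComplexSignature(valence, arousal)

# Example: mildly positive voice, neutral words, clearly positive face.
signature = fuse_metrics((0.2, 0.6), (0.0, 0.4), (0.7, 0.5))
print(signature)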

IMAGE CAPTURE APPARATUS, CONTROL METHOD THEREFOR, IMAGE PROCESSING APPARATUS, AND IMAGE PROCESSING SYSTEM
20230217084 · 2023-07-06 ·

An image capture apparatus obtains image data captured using an image sensor and then generates, as metadata, information relating to a state of the image capture apparatus at the time of capturing the image data as well as to the image data. The image capture apparatus comprises a first output circuit for outputting the image data in a first format to an external apparatus and a second output circuit for outputting a part of the metadata in a second format to the external apparatus. The second output circuit is different from the first output circuit.
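
As a data-flow illustration of the two output paths, here is a sketch in which image data leaves in one format while only a subset of the metadata leaves in a second format. The formats, exported keys, and function names are assumptions, not the apparatus's actual interfaces.

```python
# Hypothetical sketch of the two separate output paths: full image data
# in a first (binary) format, part of the metadata in a second (JSON) format.
import json

def first_output(image_data: bytes) -> bytes:
    """First output circuit: the raw image payload in a first format."""
    return image_data

def second_output(metadata: dict, exported_keys=("timestamp", "orientation")) -> str:
    """Second output circuit: only a part of the metadata, in a second format."""
    subset = {k: v for k, v in metadata.items() if k in exported_keys}
    return json.dumps(subset)

metadata = {"timestamp": "2023-07-06T12:00:00Z", "orientation": 90,
            "sensor_temp_c": 41.2}   # internal-only field, not exported
payload = first_output(b"\x89RAW...")
meta_out = second_output(metadata)
```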

Systems and methods for animation generation

Systems and methods for animating from audio in accordance with embodiments of the invention are illustrated. One embodiment includes a method for generating animation from audio. The method includes steps for receiving input audio data, generating an embedding for the input audio data, and generating several predictions for several tasks from the generated embedding. The several predictions include at least one of blendshape weights, event detection, and/or voice activity detection. The method includes steps for generating a final prediction from the several predictions, where the final prediction includes a set of blendshape weights, and generating an output based on the generated final prediction.
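
A sketch of this multi-task structure appears below: one shared audio embedding feeds several prediction heads, and a final set of blendshape weights is derived from them. The head logic is placeholder arithmetic standing in for trained networks; all names are assumptions.

```python
# Hypothetical multi-task pipeline: shared embedding -> per-task heads
# (blendshape weights, voice activity) -> final blendshape prediction.
def embed_audio(samples):
    """Stand-in embedding: one energy feature per audio frame."""
    return [abs(s) for s in samples]

def predict_tasks(embedding):
    blendshapes = [min(1.0, e * 2.0) for e in embedding]   # mouth-open proxy
    voice_active = [e > 0.1 for e in embedding]            # voice activity head
    return blendshapes, voice_active

def final_prediction(blendshapes, voice_active):
    """Gate blendshape weights by voice activity to form the final output."""
    return [w if active else 0.0 for w, active in zip(blendshapes, voice_active)]

frames = [0.05, 0.4, 0.3, 0.02]
bs, vad = predict_tasks(embed_audio(frames))
print(final_prediction(bs, vad))
```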

AVATAR RENDERING OF PRESENTATIONS

A computer-implemented method for avatar rendering of virtual presentations is disclosed. The computer-implemented method includes extracting visual content from a presentation. The computer-implemented method further includes extracting audio content from the presentation. The computer-implemented method includes correlating the visual content with the audio content of the presentation. The computer-implemented method includes generating a virtual avatar to dynamically render a virtual presentation to a viewer, based at least in part, on the correlated visual content and audio content of the presentation.
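
One way to picture the correlation step is aligning extracted visual content (e.g., slides) with extracted audio segments by timestamp, so the avatar presents each slide with its matching narration. The alignment rule and data shapes below are assumptions for illustration.

```python
# Hypothetical sketch: correlate visual content with audio content by
# pairing each narration segment with the slide on screen at its start.
def correlate(slides, audio_segments):
    """slides: [(start_sec, slide_id)]; audio_segments: [(start, end, text)].
    Returns a rendering schedule for the virtual avatar."""
    schedule = []
    for start, end, text in audio_segments:
        # Last slide whose start time is at or before the narration start.
        current = max((s for s in slides if s[0] <= start), key=lambda s: s[0])
        schedule.append({"slide": current[1], "narration": text,
                         "start": start, "end": end})
    return schedule

slides = [(0, "intro"), (30, "results")]
audio = [(2, 28, "Welcome to the talk."), (31, 60, "Here are the results.")]
for step in correlate(slides, audio):
    print(step)
```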

FACIAL ACTIVITY DETECTION FOR VIRTUAL REALITY SYSTEMS AND METHODS

In an embodiment, a virtual reality ride system includes a display to present virtual reality image content to a first rider, an audio sensor to capture audio data associated with a second rider, and an image sensor to capture image data associated with the second rider. The virtual reality ride system also includes at least one processor communicatively coupled to the display and configured to (i) receive the audio data, the image data, or both, (ii) generate a virtual avatar corresponding to the second rider, wherein the virtual avatar includes a set of facial features, (iii) update the set of facial features based on the audio data, the image data, or both, and (iv) instruct the display to present the virtual reality image content including the virtual avatar and the updated set of facial features.
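
A minimal sketch of step (iii), updating the avatar's facial features from whichever sensor data is available, follows. The feature set and update rules are illustrative assumptions.

```python
# Hypothetical update of a virtual avatar's facial features from audio
# data, image data, or both, captured for the second rider.
def update_facial_features(features, audio_level=None, detected_smile=None):
    """Return an updated copy of the avatar's facial feature dict."""
    updated = dict(features)
    if audio_level is not None:
        updated["mouth_open"] = min(1.0, audio_level)   # louder -> wider mouth
    if detected_smile is not None:
        updated["smile"] = 1.0 if detected_smile else updated.get("smile", 0.0)
    return updated

avatar = {"mouth_open": 0.0, "smile": 0.0}
avatar = update_facial_features(avatar, audio_level=0.6, detected_smile=True)
print(avatar)   # {'mouth_open': 0.6, 'smile': 1.0}
```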

AUTOMATED PANORAMIC IMAGE CONNECTIONS FROM OUTDOOR TO INDOOR ENVIRONMENTS

Automated panoramic image connections from outdoor to indoor environments are provided. A system identifies, in a data repository, a virtual tour of an internal portion of a physical building formed from a plurality of images connected with a linear path along a persistent position of a virtual camera. The system receives, from a third-party data repository, image data corresponding to an external portion of the physical building. The system detects, within the image data, an entry point for the internal portion of the physical building. The system generates, responsive to the detection, a step-in transition at the entry point in the image data. The system connects the virtual tour with the step-in transition generated for the image data at the entry point. The system initiates, on a client device responsive to an interaction with the entry point, the step-in transition to cause a stream of the virtual tour.
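
A sketch of wiring an outdoor panorama to an indoor tour via a step-in transition at a detected entry point follows. The data structures and the detection stub are assumptions; real entry-point detection would apply an image model to the exterior image data.

```python
# Hypothetical sketch: detect an entry point in exterior image data,
# generate a step-in transition there, and connect it to the indoor tour.
from dataclasses import dataclass

@dataclass
class StepInTransition:
    entry_xy: tuple       # pixel location of the entry point in the exterior image
    tour_start_node: str  # first panorama node of the indoor virtual tour

def detect_entry_point(exterior_image) -> tuple:
    """Stub: return where a door/entrance appears (ignores its input here)."""
    return (512, 760)

def connect_tour(exterior_image, tour_nodes):
    entry = detect_entry_point(exterior_image)
    return StepInTransition(entry_xy=entry, tour_start_node=tour_nodes[0])

transition = connect_tour(exterior_image=None, tour_nodes=["lobby", "hall", "kitchen"])
print(transition)
```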
