G10L21/18

Audio Generation Methods and Systems

A method of generating audio assets, comprising the steps of: receiving a plurality of input audio assets, converting each input audio asset into an input graphical representation, generating an input multi-channel image by stacking each input graphical representation in a separate channel of the image, feeding the input multi-channel image into a generative model to train the generative model and generate one or more output multi-channel images, each output multi-channel image comprising an output graphical representation, extracting the output graphical representations from each output multi-channel image and converting each output graphical representation into an output audio asset.
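The conversion-and-stacking steps above can be sketched as follows. This is a minimal illustration only: the abstract does not specify the graphical representation, so a short-time Fourier magnitude spectrogram is assumed here, and the function names are hypothetical.

```python
import numpy as np

def spectrogram(audio, n_fft=512, hop=256):
    """One possible 'graphical representation' of an audio asset:
    a magnitude spectrogram from a framed, windowed FFT."""
    frames = []
    for start in range(0, len(audio) - n_fft + 1, hop):
        frame = audio[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.stack(frames, axis=1)  # shape: (freq_bins, time_frames)

def stack_as_multichannel_image(assets):
    """Place each asset's representation in a separate channel,
    yielding an array shaped (channels, freq_bins, time_frames)."""
    specs = [spectrogram(a) for a in assets]
    t = min(s.shape[1] for s in specs)          # align time axes
    return np.stack([s[:, :t] for s in specs], axis=0)

rng = np.random.default_rng(0)
assets = [rng.standard_normal(16000) for _ in range(3)]  # three mock assets
image = stack_as_multichannel_image(assets)
print(image.shape)  # → (3, 257, 61): one channel per input asset
```

The resulting array has the layout a convolutional generative model expects, which is presumably why the claim stacks representations in channels rather than concatenating them spatially.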

Audio Generation Methods and System

A method of generating audio assets, comprising the steps of: receiving an input multi-layered audio asset comprising a plurality of audio layers, generating an input multi-channel image, wherein each channel of the input multi-channel image comprises an input image representative of one of the audio layers, training a generative model on the input multi-channel image and implementing the trained generative model to generate an output multi-channel image, wherein each channel of the output multi-channel image comprises an output image representative of an output audio layer, and generating an output multi-layered audio asset based on a combination of output audio layers derived from the output images.
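The final step, generating a multi-layered asset from a "combination of output audio layers", could be as simple as a gain-weighted mixdown. The sketch below assumes that reading; the function name and gain scheme are not from the patent.

```python
import numpy as np

def mix_layers(layers, gains=None):
    """Combine per-channel output audio layers into one asset by
    length-aligned, gain-weighted summation (one plausible reading
    of 'a combination of output audio layers')."""
    n = min(len(layer) for layer in layers)      # align layer lengths
    gains = gains or [1.0] * len(layers)
    mix = np.zeros(n)
    for g, layer in zip(gains, layers):
        mix += g * np.asarray(layer[:n])
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix     # normalize to avoid clipping

# e.g. two generated layers (a lead tone and a quieter bass tone)
lead = np.sin(np.linspace(0, 40 * np.pi, 8000))
bass = 0.5 * np.sin(np.linspace(0, 10 * np.pi, 8000))
asset = mix_layers([lead, bass])
```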

Viseme data generation for presentation while content is output

Systems and methods for viseme data generation are disclosed. Uncompressed audio data is generated and/or utilized to determine the beats per minute of the audio data. Visemes are associated with the audio data utilizing a Viterbi algorithm and the beats per minute. A time-stamped list of viseme data is generated that associates the visemes with the portions of the audio data that they correspond to. An animatronic toy and/or an animation is caused to lip sync using the viseme data while audio corresponding to the audio data is output.
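The Viterbi association step can be sketched in miniature. The abstract does not give the state space or scoring, so the example below assumes frame-wise viseme scores and a transition matrix (which, per the abstract, could be biased by the beats per minute so viseme changes prefer beat boundaries); viseme names and frame duration are illustrative.

```python
import numpy as np

def viterbi(obs_scores, trans):
    """Most-likely viseme sequence for a frame-wise score matrix.
    obs_scores: (T, S) per-frame log score of each viseme;
    trans: (S, S) log transition scores between visemes."""
    T, S = obs_scores.shape
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0] = obs_scores[0]
    for t in range(1, T):
        for s in range(S):
            cand = dp[t - 1] + trans[:, s]
            back[t, s] = int(np.argmax(cand))
            dp[t, s] = cand[back[t, s]] + obs_scores[t, s]
    path = [int(np.argmax(dp[-1]))]              # backtrack best path
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

def timestamped_visemes(path, frame_dur=0.02, names=("A", "E", "O")):
    """Collapse the frame path into a time-stamped viseme list."""
    out, prev = [], None
    for i, s in enumerate(path):
        if s != prev:
            out.append((round(i * frame_dur, 3), names[s]))
            prev = s
    return out
```

The time-stamped list produced by `timestamped_visemes` is the structure the abstract describes: each entry pairs a viseme with the portion of the audio it corresponds to, ready to drive an animatronic toy or animation in sync with playback.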

Presentation of communications
11482240 · 2022-10-25

A method to present communications is provided. The method may include obtaining, at a device, a request from a user to play back a stored message that includes audio. In response to obtaining the request, the method may include directing the audio of the message to a transcription system from the device. In these and other embodiments, the transcription system may be configured to generate text that is a transcription of the audio in real-time. The method may further include obtaining, at the device, the text from the transcription system and presenting, by the device, the text generated by the transcription system in real-time. In response to obtaining the text from the transcription system, the method may also include presenting, by the device, the audio such that the text as presented is substantially aligned with the audio.
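The alignment the method describes can be sketched with a shared playback clock: each piece of transcribed text is held until the audio position reaches its start time. The segment format and the `play_audio`/`show_text` callbacks below are hypothetical stand-ins for the device's actual I/O, not part of the patent.

```python
import time

def present_aligned(segments, play_audio, show_text):
    """Present transcription text substantially aligned with audio.
    segments: list of (start_s, end_s, text) entries timestamped
    against the audio by the transcription system."""
    t0 = time.monotonic()
    for start, end, text in segments:
        # hold each caption until the shared clock reaches its start
        delay = start - (time.monotonic() - t0)
        if delay > 0:
            time.sleep(delay)
        show_text(text)
        play_audio(start, end)
```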

FOVEATED BEAMFORMING FOR AUGMENTED REALITY DEVICES AND WEARABLES
20230071778 · 2023-03-09

An augmented reality (AR) device, such as AR glasses, may include a microphone array. The sensitivity of the microphone array can be directed to a target by beamforming, which includes combining the audio of each microphone of the array in a particular way based on a location of the target. The present disclosure describes systems and methods to determine the location of the target based on a gaze of a user and beamform the audio accordingly. This eye-tracked beamforming (i.e., foveated beamforming) can be used by AR applications to enhance sounds from a gaze direction and to suppress sounds from other directions. Additionally, the gaze information can be used to help visualize the results of an AR application, such as speech-to-text.
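The combining step the abstract describes can be sketched with a delay-and-sum beamformer steered by the gaze vector: for a far-field source in the gaze direction, each microphone's signal is delayed so that all copies of the target sound line up before summing. This is a simple stand-in, not necessarily the patent's method, and it rounds delays to whole samples for brevity.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, gaze_dir, fs=16000, c=343.0):
    """Steer a microphone array toward the gaze direction.
    mic_signals: (M, N) samples; mic_positions: (M, 3) metres;
    gaze_dir: vector from the array toward the gazed-at target."""
    gaze_dir = np.asarray(gaze_dir, float)
    gaze_dir /= np.linalg.norm(gaze_dir)
    # far-field model: mics further along the gaze axis hear the
    # wavefront earlier, so they need proportionally more delay
    delays = mic_positions @ gaze_dir / c        # seconds
    delays -= delays.min()
    out = np.zeros(mic_signals.shape[1])
    for sig, d in zip(mic_signals, delays):
        shift = int(round(d * fs))               # integer-sample delay
        out += np.roll(sig, shift)
    return out / len(mic_signals)

# a tone arriving from the gaze direction sums coherently:
fs = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(160) / fs)
mics = np.stack([tone, np.roll(tone, -16)])      # second mic hears it early
positions = np.array([[0.0, 0.0, 0.0], [0.343, 0.0, 0.0]])
aligned = delay_and_sum(mics, positions, gaze_dir=[1.0, 0.0, 0.0], fs=fs)
```

Sounds arriving from the gaze direction add in phase and are enhanced, while off-axis sounds add incoherently and are suppressed, which is the enhance/suppress behaviour the abstract attributes to foveated beamforming.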
