Patent classifications
G10L21/0316
SPATIAL OPTIMIZATION FOR AUDIO PACKET TRANSFER IN A METAVERSE
A computer-implemented method includes receiving audio packets associated with a first client device, where the audio packets each include an audio capture waveform, a timestamp, and a digital entity identification (ID). The method further includes determining, based on the digital entity ID, a position of a first digital entity in a metaverse. The method further includes determining a subset of other digital entities in the metaverse that are within an audio area of the first digital entity based on (a) a falloff distance between the first digital entity and each of the other digital entities and (b) a direction of audio propagation between the first digital entity and each of the other digital entities. The method further includes transmitting the audio packets to second client devices associated with the subset of other digital entities in the metaverse.
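The subset selection described in this abstract can be illustrated with a minimal sketch. It assumes a 2D world, a single scalar falloff distance, and a cone-shaped propagation direction; the names `Entity`, `within_audio_area`, `select_recipients`, and the default parameter values are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    entity_id: str
    x: float
    y: float
    facing_deg: float = 0.0  # assumed direction of audio propagation, in degrees


def within_audio_area(speaker, listener, falloff_distance, cone_half_angle_deg):
    """Return True if listener is within range and inside the speaker's audio cone."""
    dx = listener.x - speaker.x
    dy = listener.y - speaker.y
    distance = math.hypot(dx, dy)
    if distance > falloff_distance:
        return False  # beyond the falloff distance
    if distance == 0.0:
        return True  # co-located entities always hear each other in this sketch
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest absolute angle between the bearing and the facing direction (0..180).
    offset = abs((bearing - speaker.facing_deg + 180.0) % 360.0 - 180.0)
    return offset <= cone_half_angle_deg


def select_recipients(speaker, others, falloff_distance=10.0, cone_half_angle_deg=90.0):
    """Return IDs of entities whose client devices should receive the audio packets."""
    return [
        e.entity_id
        for e in others
        if within_audio_area(speaker, e, falloff_distance, cone_half_angle_deg)
    ]
```

A server could call `select_recipients` once per incoming packet and forward the packet only to the returned IDs, so that entities behind the speaker or out of range generate no traffic.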
Multi-band noise gate
The present disclosure relates to processing a plurality of audio signals. A device receives the plurality of audio signals in the frequency domain and determines an overall attenuation multiplier based on the plurality of audio signals and an overall lookup table that relates decibel values to different overall attenuation multipliers. The device determines an attenuation vector comprising a plurality of bin-specific attenuation multipliers, each bin-specific attenuation multiplier respectively corresponding to a different frequency bin of a plurality of frequency bins. The device scales each bin-specific attenuation multiplier in the attenuation vector with the overall attenuation multiplier, and edits each of the audio signals based on the scaled bin-specific attenuation multipliers in the attenuation vector.
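The gating steps in this abstract can be sketched as follows. The lookup table contents, the per-bin threshold, the 0.1 attenuation for quiet bins, and all function names are assumptions chosen for illustration; the abstract does not specify how the bin-specific multipliers are derived.

```python
import math

# Hypothetical overall lookup table: (decibel threshold, overall multiplier), ascending.
OVERALL_LUT = [(-60.0, 0.0), (-40.0, 0.5), (-20.0, 1.0)]


def level_db(signals):
    """Mean power across all signals and bins, expressed in decibels."""
    total = sum(abs(v) ** 2 for sig in signals for v in sig)
    count = sum(len(sig) for sig in signals)
    return 10.0 * math.log10(max(total / count, 1e-12))


def lookup_multiplier(db, table):
    """Multiplier of the highest threshold not exceeding db (first entry if none)."""
    multiplier = table[0][1]
    for threshold, value in table:
        if db >= threshold:
            multiplier = value
    return multiplier


def bin_multipliers(signals, bin_threshold_db):
    """One multiplier per frequency bin: pass loud bins, attenuate quiet ones."""
    result = []
    for k in range(len(signals[0])):
        power = sum(abs(sig[k]) ** 2 for sig in signals) / len(signals)
        db = 10.0 * math.log10(max(power, 1e-12))
        result.append(1.0 if db >= bin_threshold_db else 0.1)
    return result


def apply_gate(signals, bin_threshold_db=-50.0, table=OVERALL_LUT):
    """Scale the bin-specific vector by the overall multiplier, then edit each signal."""
    overall = lookup_multiplier(level_db(signals), table)
    scaled = [overall * m for m in bin_multipliers(signals, bin_threshold_db)]
    return [[v * g for v, g in zip(sig, scaled)] for sig in signals]
```

Each inner list passed to `apply_gate` stands for one signal's frequency-domain bins (for example, the output of an FFT frame), and all signals are assumed to share the same number of bins.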
Low latency automixer integrated with voice and noise activity detection
Systems and methods are disclosed for providing voice and noise activity detection with audio automixers that can reject errant non-voice or non-human noises while maximizing signal-to-noise ratio and minimizing audio latency.
METHOD AND UNIT FOR PERFORMING DYNAMIC RANGE CONTROL
The present document describes a dynamic range control unit (210) configured to apply dynamic range control, referred to as DRC, to an audio signal (211). The DRC unit (210) is configured to downsample a subband signal (212) derived from the audio signal (211), to provide a downsampled subband signal (321), to determine a DRC gain (329) based on the downsampled subband signal (321), and to apply the DRC gain (329) to the subband signal (212), to provide a compressed subband signal (213) of a compressed audio signal (214).
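As a rough illustration of deriving a DRC gain from a downsampled subband signal and applying it to the full-rate subband signal, consider the sketch below. The static compressor curve, the threshold and ratio values, the block size, and the function names are all assumptions; the abstract does not state how the gain is computed, and a real implementation would typically add smoothing.

```python
import math


def downsample(subband, factor):
    """Keep every factor-th sample (no anti-alias filtering in this sketch)."""
    return subband[::factor]


def drc_gain(samples, threshold_db=-20.0, ratio=4.0):
    """Linear gain from a static compressor curve applied to the RMS level."""
    rms = math.sqrt(sum(abs(s) ** 2 for s in samples) / len(samples))
    level_db = 20.0 * math.log10(max(rms, 1e-9))
    if level_db <= threshold_db:
        return 1.0  # below threshold: leave the signal untouched
    gain_db = (threshold_db - level_db) * (1.0 - 1.0 / ratio)
    return 10.0 ** (gain_db / 20.0)


def compress_subband(subband, factor=4, block=8, threshold_db=-20.0, ratio=4.0):
    """Compute one gain per block from the downsampled block, apply it at full rate."""
    out = []
    for start in range(0, len(subband), block):
        frame = subband[start:start + block]
        gain = drc_gain(downsample(frame, factor), threshold_db, ratio)
        out.extend(s * gain for s in frame)
    return out
```

The point of the design is that the level measurement runs on a fraction of the samples, while the gain is still applied to every sample of the subband signal.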
Adaptive processing with multiple media processing nodes
Techniques for adaptive processing of media data based on separate data specifying a state of the media data are provided. A device in a media processing chain may determine whether a type of media processing has already been performed on an input version of media data. If so, the device may adapt its processing of the media data to disable performing the type of media processing. If not, the device performs the type of media processing. The device may create a state of the media data specifying the type of media processing. The device may communicate the state of the media data and an output version of the media data to a recipient device in the media processing chain, for the purpose of supporting the recipient device's adaptive processing of the media data.
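The adaptive behavior described here, in which a node skips a processing type that the accompanying state says was already performed, might be sketched as follows. `MediaPacket`, `process_node`, the `"loudness"` label, and the peak normalizer are hypothetical names and choices made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MediaPacket:
    samples: list
    # State of the media data: labels of processing types already performed.
    state: set = field(default_factory=set)


def loudness_normalize(samples, target_peak=0.5):
    """Toy processing step: scale samples so the peak magnitude equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s * target_peak / peak for s in samples]


def process_node(packet, processing_type, fn):
    """Apply fn unless the state says this processing type was already done."""
    if processing_type in packet.state:
        # Already performed upstream: pass the media through unchanged.
        return MediaPacket(list(packet.samples), set(packet.state))
    # Perform the processing and record it in the state sent downstream.
    return MediaPacket(fn(packet.samples), packet.state | {processing_type})
```

Chaining two nodes that both request `"loudness"` processing results in the second node passing the samples through unchanged, since the state produced by the first node already lists that type.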
ADDING BACKGROUND SOUND TO SPEECH-CONTAINING AUDIO DATA
An editing method facilitates the task of adding background sound to speech-containing audio data so as to augment the listening experience. The editing method is executed by a processor in a computing device and comprises obtaining characterization data that characterizes time segments in the audio data by at least one of topic and sentiment; deriving, for a respective time segment in the audio data and based on the characterization data, a desired property of a background sound to be added to the audio data in the respective time segment; and providing the desired property for the respective time segment so as to enable the audio data to be combined, within the respective time segment, with background sound having the desired property. The background sound may be selected and added automatically or by manual user intervention.
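One way to picture the derivation step is a simple mapping from per-segment characterization data to a desired background property. The sentiment scale of -1 to 1, the 0.3 cutoffs, the mood labels, and the names `Segment` and `desired_property` are assumptions for illustration only; the abstract does not define them.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float
    end_s: float
    topic: str
    sentiment: float  # assumed scale: -1.0 (negative) to 1.0 (positive)


def desired_property(segment):
    """Derive a desired background-sound property for one time segment."""
    if segment.sentiment > 0.3:
        mood = "uplifting"
    elif segment.sentiment < -0.3:
        mood = "somber"
    else:
        mood = "neutral"
    return {
        "start_s": segment.start_s,
        "end_s": segment.end_s,
        "mood": mood,
        "theme": segment.topic,
    }


def plan_background(segments):
    """Provide one desired property per segment, in order."""
    return [desired_property(s) for s in segments]
```

The resulting list could either drive an automatic selection from a sound library or be shown to a user who picks the background sound manually.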
MACHINE LEARNING-BASED KEY GENERATION FOR KEY-GUIDED AUDIO SIGNAL TRANSFORMATION
A method comprises: receiving input audio and target audio having a target audio characteristic; using a first neural network, trained to generate key parameters that represent the target audio characteristic based on one or more of the target audio and the input audio, generating the key parameters; and configuring a second neural network, trained to be configured by the key parameters, with the key parameters to cause the second neural network to perform a signal transformation of the input audio, to produce output audio having an output audio characteristic that corresponds to and matches the target audio characteristic.