G10L19/16

Optimized audio forwarding
11696075 · 2023-07-04

Methods and systems are disclosed for optimizing the routing of audio data to audio transmitting devices using a Bluetooth network. One method includes receiving, at a first speaker of an audio rendering system, an encoded audio bitstream comprising first and second audio channels; separating from the encoded audio bitstream, without decoding it, a first set of spectral components of the first audio channel and a second set of spectral components of the second audio channel; generating a first encoded bitstream from the first set of spectral components; and forwarding the first encoded bitstream to a second speaker of the audio rendering system over a wireless link.
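A minimal sketch of forwarding one channel without decoding, under a hypothetical frame layout: a channel count, then for each channel an id, a coefficient count and int16 quantized spectral coefficients. Real codecs use their own, typically entropy-coded, layouts, and `build_frame` and `extract_channel` are illustrative names only. The point shown is that the first speaker can copy one channel's coded spectral data into a new single-channel frame without dequantizing or inverse-transforming it.

```python
import struct
from typing import Dict, List

# Hypothetical frame layout, for illustration only (little-endian):
#   uint8  channel_count
#   then, per channel: uint8 channel_id, uint16 n_coeffs,
#                      n_coeffs x int16 quantized spectral coefficients
CHANNEL_HEADER = "<BH"
HEADER_SIZE = struct.calcsize(CHANNEL_HEADER)  # 3 bytes, no padding with "<"


def build_frame(channels: Dict[int, List[int]]) -> bytes:
    """Pack quantized spectral coefficients for several channels into one frame."""
    out = bytearray(struct.pack("<B", len(channels)))
    for ch_id, coeffs in channels.items():
        out += struct.pack(CHANNEL_HEADER, ch_id, len(coeffs))
        out += struct.pack(f"<{len(coeffs)}h", *coeffs)
    return bytes(out)


def extract_channel(frame: bytes, wanted: int) -> bytes:
    """Build a single-channel frame holding only channel `wanted`.

    The channel's coefficient bytes are copied verbatim: nothing is
    dequantized or inverse-transformed, so the audio is never decoded.
    """
    (count,) = struct.unpack_from("<B", frame, 0)
    offset = 1
    for _ in range(count):
        ch_id, n_coeffs = struct.unpack_from(CHANNEL_HEADER, frame, offset)
        end = offset + HEADER_SIZE + 2 * n_coeffs  # int16 = 2 bytes each
        if ch_id == wanted:
            return struct.pack("<B", 1) + frame[offset:end]
        offset = end
    raise KeyError(f"channel {wanted} not in frame")
```

For example, `extract_channel(build_frame({0: [10, -3, 7], 1: [4, 4, -1]}), 1)` yields the same bytes as `build_frame({1: [4, 4, -1]})`, which the first speaker could then send to the second speaker.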

ROBUST RETRANSMISSION TOPOLOGIES USING ERROR CORRECTION
20230005492 · 2023-01-05

Methods and systems for improving the robustness of wireless communications are disclosed. They transmit a plurality of data packets over a first isochronous stream and transmit one or more supplemental data packets during the same time intervals. The supplemental data packets are used to re-create and/or enhance at least a portion of one or more of the data packets that have already been sent. Alternatively, the supplemental data packets are used to create and/or enhance at least a portion of one or more data packets that will be received during the next isochronous intervals. These methods and systems increase robustness by enabling better retransmission using correctly received packets, and they work with any Bluetooth broadcast sink without modification.
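The abstract does not name the error-correction code. As one illustration, a single XOR parity packet computed over a group of equal-length data packets lets a receiver rebuild any one lost packet of that group. This is a swapped-in textbook technique, not necessarily the one claimed, and `make_parity` and `recover` are hypothetical names.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR; packets are assumed to have equal length."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Supplemental packet: XOR of every data packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can repair only one loss per group")
    present = [p for p in received if p is not None]
    if not missing:
        return present
    # parity ^ (every packet except the lost one) == the lost packet
    repaired = list(present)
    repaired.insert(missing[0], reduce(xor_bytes, present, parity))
    return repaired
```

Sending the parity in the same interval as the data lets the receiver repair a loss without waiting for a retransmission; tolerating more than one loss per group would need a stronger code such as Reed-Solomon.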

LAYERED DESCRIPTION OF SPACE OF INTEREST
20230007425 · 2023-01-05

Aspects of the disclosure provide methods and apparatuses for audio processing. In some examples, an apparatus for media processing includes processing circuitry. The processing circuitry receives audio inputs associated with a layered description of a space of interest in an audio scene. The space of interest includes a plurality of subspaces. The layered description includes a first layer and a second layer. The first layer has a common node with a first value that is an attribute value common to two or more of the subspaces. The second layer has individual nodes, each associated with one of the subspaces. The processing circuitry determines the subspaces of the space of interest based on the layered description and, when the location of a subject of the audio scene is within the space of interest, renders an audio output based on the audio inputs.
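The layered description can be pictured as a two-level lookup: a subspace uses its own second-layer attribute value when it has one and otherwise inherits the first-layer common value. The sketch below assumes axis-aligned 2-D rectangular subspaces; the class names and the `reverb_time` attribute in the usage are hypothetical and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed 2-D bounds


@dataclass
class Subspace:
    name: str
    bounds: Box
    # Second layer: values specific to this subspace (may be empty).
    attributes: Dict[str, float] = field(default_factory=dict)


@dataclass
class SpaceOfInterest:
    # First layer: common node holding values shared by the subspaces.
    common: Dict[str, float]
    subspaces: Dict[str, Subspace]

    def attribute(self, subspace: str, key: str) -> float:
        """Individual value if the subspace has one, else the common value."""
        own = self.subspaces[subspace].attributes
        return own[key] if key in own else self.common[key]

    def locate(self, x: float, y: float) -> Optional[str]:
        """Name of the first subspace containing (x, y), or None if outside."""
        for sub in self.subspaces.values():
            x0, y0, x1, y1 = sub.bounds
            if x0 <= x <= x1 and y0 <= y <= y1:
                return sub.name
        return None
```

Rendering would proceed only when `locate` returns a subspace for the subject's position, using that subspace's resolved attributes. Storing a shared value once in the common node avoids repeating it in every individual node.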

AUTOCORRECTION OF PRONUNCIATIONS OF KEYWORDS IN AUDIO/VIDEOCONFERENCES
20230005487 · 2023-01-05

The present disclosure relates to automatically correcting mispronounced keywords during a conference session. More particularly, it provides methods and systems for automatically correcting audio data generated from audio input that contains indications of mispronounced keywords in an audio/videoconferencing system. In some embodiments, the correction may require re-encoding the audio data at the conference server. In alternative embodiments, it may require updating the audio data at the receiver end of the conferencing system.
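The abstract leaves the correction mechanism open. Assuming the indications arrive as sample ranges in decoded PCM audio and that a correctly pronounced replacement clip is already available (for example from speech synthesis), server-side correction could splice the replacements in before re-encoding. `splice_corrections` is a hypothetical helper, not the claimed method.

```python
from typing import List, Sequence, Tuple

# (start, end, replacement): replace samples[start:end] with replacement.
Correction = Tuple[int, int, Sequence[float]]


def splice_corrections(samples: Sequence[float],
                       corrections: List[Correction]) -> List[float]:
    """Replace each flagged keyword span with corrected audio.

    Indices refer to the original buffer; spans must not overlap.
    The output length changes if a replacement is longer or shorter.
    """
    out: List[float] = []
    cursor = 0
    for start, end, replacement in sorted(corrections, key=lambda c: c[0]):
        if start < cursor or end < start:
            raise ValueError("corrections must be non-overlapping spans")
        out.extend(samples[cursor:start])  # untouched audio before the keyword
        out.extend(replacement)            # corrected pronunciation
        cursor = end
    out.extend(samples[cursor:])           # remaining audio after the last keyword
    return out
```

A production system would likely also crossfade at the splice points to avoid clicks; that is omitted here for brevity.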

INFORMATION EXCHANGE ON MOBILE DEVICES USING AUDIO
20230005491 · 2023-01-05

In some implementations, a user device may receive input that triggers transmission of information via sound. The user device may select an audio clip based on a setting associated with the device, and may modify a digital representation of the selected audio clip using an encoding algorithm and based on data associated with a user of the device. The user device may transmit, to a remote server, an indication of the selected audio clip, an indication of the encoding algorithm, and the data associated with the user. The user device may use a speaker to play audio, based on the modified digital representation, for recording by other devices. The user device may then receive, from the remote server and based on the speaker playing the audio, a confirmation that users associated with the other devices have performed an action based on the data associated with the user of the device.
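The encoding algorithm is not named in the abstract. One simple choice, swapped in here purely for illustration, is binary frequency-shift keying (FSK): each bit adds a short, quiet tone burst at one of two frequencies on top of the selected clip, and a recording device decodes each bit by comparing the energy at the two frequencies. The sample rate, tone frequencies, burst length and level are all assumed values.

```python
import math
from typing import List, Sequence

RATE = 16_000              # sample rate in Hz (assumed)
BIT_SAMPLES = 400          # 25 ms per bit (assumed)
F0, F1 = 2_000.0, 3_000.0  # tone for bit 0 / bit 1 in Hz (assumed)
LEVEL = 0.05               # amplitude of the overlaid tones (assumed)


def bytes_to_bits(data: bytes) -> List[int]:
    """Unpack user data into bits, most-significant bit first."""
    return [(byte >> (7 - j)) & 1 for byte in data for j in range(8)]


def embed_bits(clip: Sequence[float], bits: Sequence[int]) -> List[float]:
    """Overlay one FSK tone burst per bit on the start of the clip."""
    if len(bits) * BIT_SAMPLES > len(clip):
        raise ValueError("clip too short for payload")
    out = list(clip)
    for k, bit in enumerate(bits):
        freq = F1 if bit else F0
        for i in range(BIT_SAMPLES):
            out[k * BIT_SAMPLES + i] += LEVEL * math.sin(2 * math.pi * freq * i / RATE)
    return out


def tone_energy(segment: Sequence[float], freq: float) -> float:
    """Magnitude of the segment's correlation with a complex tone at `freq`."""
    re = sum(s * math.cos(2 * math.pi * freq * i / RATE) for i, s in enumerate(segment))
    im = sum(s * math.sin(2 * math.pi * freq * i / RATE) for i, s in enumerate(segment))
    return math.hypot(re, im)


def extract_bits(audio: Sequence[float], n_bits: int) -> List[int]:
    """Decide each bit by which tone carries more energy in its slot."""
    bits = []
    for k in range(n_bits):
        seg = audio[k * BIT_SAMPLES:(k + 1) * BIT_SAMPLES]
        bits.append(1 if tone_energy(seg, F1) > tone_energy(seg, F0) else 0)
    return bits
```

The frequencies and burst length are chosen so that each burst spans a whole number of cycles of both tones (50 and 75 cycles), which keeps the two detectors from interfering with each other.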

Using metadata to aggregate signal processing operations

A technique is described that includes receiving and decoding a coded bitstream encoded with audio content and with audio metadata corresponding to that content. The audio content includes first audio objects corresponding to the first of two consecutive media content types and second audio objects corresponding to the second. The audio metadata includes first and second audio object gains for the first and second audio objects, generated in part based on a first fading curve of the first media content type and a second fading curve of the second media content type, respectively. The technique further includes applying the first and second audio object gains to the first and second audio objects, respectively, and rendering a sound field represented by the first audio objects with the applied first audio object gain and the second audio objects with the applied second audio object gain.
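A minimal sketch of these steps, assuming the metadata carries one gain per object per block of samples, sampled from the fading curves on the encoder side. The linear fade-out and sine fade-in curves, the block size, and the one-channel `render_mono` stand-in for a sound-field renderer are all assumptions.

```python
import math
from typing import Callable, List, Sequence


def fade_out_linear(t: float) -> float:
    """Assumed fading curve for the outgoing content type."""
    return 1.0 - t


def fade_in_sine(t: float) -> float:
    """Assumed fading curve for the incoming content type."""
    return math.sin(0.5 * math.pi * t)


def gains_from_curve(curve: Callable[[float], float], n_blocks: int) -> List[float]:
    """Metadata side: one gain per block, with t running from 0 to 1."""
    if n_blocks == 1:
        return [curve(1.0)]
    return [curve(b / (n_blocks - 1)) for b in range(n_blocks)]


def apply_gains(signal: Sequence[float], gains: Sequence[float], block: int) -> List[float]:
    """Decoder side: scale each block of an object's samples by its gain."""
    return [s * gains[i // block] for i, s in enumerate(signal)]


def render_mono(*objects: Sequence[float]) -> List[float]:
    """Stand-in for a sound-field renderer: sum the gained objects."""
    return [sum(frame) for frame in zip(*objects)]
```

Because the gains arrive precomputed in the metadata, the decoder only multiplies and mixes; it does not need to know either fading curve.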

Switching Binaural Sound
20220417687 · 2022-12-29

A method provides binaural sound to a person through electronic earphones. The binaural sound localizes to a sound localization point (SLP) in empty space that is away from but proximate to the person. When an event occurs, the binaural sound switches or changes to stereo sound, to mono sound, or to altered binaural sound.
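A sketch of the switching logic, assuming a mono source signal, a pair of head-related impulse responses (HRIRs) for binaural rendering at the SLP, constant-power panning for stereo, and a hypothetical event table. Altered binaural sound (for example, moving the SLP) is not modelled here.

```python
import math
from enum import Enum
from typing import Dict, List, Sequence, Tuple

Stereo = Tuple[List[float], List[float]]  # (left ear, right ear)


class Mode(Enum):
    BINAURAL = "binaural"
    STEREO = "stereo"
    MONO = "mono"


# Hypothetical mapping from events to the mode the earphones switch to.
EVENT_MODE: Dict[str, Mode] = {
    "incoming_call": Mode.MONO,
    "user_moving": Mode.STEREO,
    "event_cleared": Mode.BINAURAL,
}


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form FIR filter; the tail beyond len(x) is dropped for brevity."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        for k, hk in enumerate(h):
            if n - k >= 0:
                y[n] += hk * x[n - k]
    return y


def render(source: Sequence[float], mode: Mode,
           hrir_left: Sequence[float], hrir_right: Sequence[float],
           pan: float = 0.0) -> Stereo:
    """Produce the two earphone signals for the current mode."""
    if mode is Mode.BINAURAL:
        # HRIR filtering supplies the cues that localize the sound at the SLP.
        return convolve(source, hrir_left), convolve(source, hrir_right)
    if mode is Mode.STEREO:
        # Constant-power pan with pan in [-1, 1]; no head-related cues.
        theta = (pan + 1.0) * math.pi / 4
        return ([s * math.cos(theta) for s in source],
                [s * math.sin(theta) for s in source])
    return list(source), list(source)  # mono: identical signal in both ears


class Earphones:
    def __init__(self) -> None:
        self.mode = Mode.BINAURAL

    def on_event(self, event: str) -> None:
        """Switch mode when a known event occurs; ignore unknown events."""
        self.mode = EVENT_MODE.get(event, self.mode)
```

Keeping the dry source and rendering per mode makes the switch a matter of changing one state variable rather than re-processing already spatialized audio.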

Correlating scene-based audio data for psychoacoustic audio coding

In general, techniques are described for correlating scene-based audio data for psychoacoustic audio coding. A device comprising a memory and one or more processors may be configured to perform the techniques. The memory may store a bitstream including a plurality of encoded correlated components of a soundfield represented by scene-based audio data. The one or more processors may perform psychoacoustic audio decoding with respect to one or more of the encoded correlated components to obtain a plurality of correlated components, and obtain, from the bitstream, an indication representative of how one or more of the correlated components were reordered in the bitstream. The one or more processors may reorder the correlated components based on the indication to obtain a plurality of reordered components, and reconstruct the scene-based audio data based on the reordered components.
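Assuming the indication is a list giving, for each coded position, the original index of the component placed there, the decoder-side reordering is the inverse of that permutation. The indication format and the function names are assumptions, and the component labels in the test are placeholders.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_coding(components: Sequence[T], order: Sequence[int]) -> List[T]:
    """Encoder: place component order[p] at coded position p.

    `order` is what would be written to the bitstream as the indication.
    """
    return [components[i] for i in order]


def restore_order(decoded: Sequence[T], order: Sequence[int]) -> List[T]:
    """Decoder: invert the permutation signalled by `order`."""
    if sorted(order) != list(range(len(decoded))):
        raise ValueError("indication is not a permutation of the components")
    restored: List[T] = list(decoded)
    for pos, original_index in enumerate(order):
        restored[original_index] = decoded[pos]
    return restored
```

Validating that the indication is a true permutation guards against a corrupted bitstream silently scrambling the soundfield.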

BITRATE DISTRIBUTION IN IMMERSIVE VOICE AND AUDIO SERVICES

Embodiments are disclosed for bitrate distribution in immersive voice and audio services (IVAS). In an embodiment, a method of encoding an IVAS bitstream comprises: receiving an input audio signal; downmixing the input audio signal into one or more downmix channels and spatial metadata; reading, from a bitrate distribution control table, a set of one or more bitrates for the downmix channels and a set of quantization levels for the spatial metadata; determining a combination of the one or more bitrates for the downmix channels; determining, using a bitrate distribution process, a metadata quantization level from the set of metadata quantization levels; quantizing and coding the spatial metadata using the metadata quantization level; generating, using the combination of one or more bitrates, a downmix bitstream for the one or more downmix channels; and combining the downmix bitstream, the quantized and coded spatial metadata, and the set of quantization levels into the IVAS bitstream.
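A sketch of one possible bitrate distribution process. The control table contents, the metadata costs and the policy (finest metadata quantization level that still leaves room for a downmix combination, then the largest downmix combination that fits) are assumptions for illustration; they are not taken from the IVAS specification.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TableEntry:
    # Candidate per-channel bitrates (bps) for a fixed number of downmix channels.
    downmix_combos: Tuple[Tuple[int, ...], ...]
    # (quantization level, metadata cost in bps), finest level first.
    metadata_levels: Tuple[Tuple[str, int], ...]


# Hypothetical control table for a two-channel downmix; values are illustrative.
CONTROL_TABLE: Dict[int, TableEntry] = {
    32_000: TableEntry(
        downmix_combos=((16_400, 9_600), (13_200, 9_600), (9_600, 9_600)),
        metadata_levels=(("fine", 10_000), ("medium", 6_000), ("coarse", 3_000)),
    ),
}


def distribute(total_bps: int, entry: TableEntry) -> Tuple[str, Tuple[int, ...]]:
    """Choose the finest metadata level that still leaves room for some
    downmix combination, then the largest combination that fits."""
    for level, md_cost in entry.metadata_levels:
        remaining = total_bps - md_cost
        fitting = [c for c in entry.downmix_combos if sum(c) <= remaining]
        if fitting:
            return level, max(fitting, key=sum)
    raise ValueError("no table entry fits the requested total bitrate")
```

Under this policy a 32 kbps budget keeps fine metadata at the cost of the lowest downmix combination; a different policy could weigh the two the other way.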

INFORMATION PROCESSING APPARATUS, REPRODUCTION PROCESSING APPARATUS, AND INFORMATION PROCESSING METHOD

There is provided an information processing apparatus, a reproduction processing apparatus, and an information processing method that improve data transmission efficiency. A preprocessing unit (102) generates scene configuration information indicating the configuration of a scene of 6DoF content that includes a three-dimensional object in a three-dimensional space. This scene configuration information is generated as two separate parts: dynamic scene configuration information, which changes over time, and static scene configuration information, which does not.
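Assuming the scene configuration for each time unit is available in advance as a dictionary of attributes, the static part can be found as the attributes whose values never change, and only the remainder needs to be sent per time unit. The function name and the attribute names in the test are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Config = Dict[str, Any]


def split_scene_config(frames: List[Config]) -> Tuple[Config, List[Config]]:
    """Split per-time-unit scene configuration into static and dynamic parts.

    An attribute is static if it appears in every frame with the same value.
    """
    if not frames:
        return {}, []
    first = frames[0]
    static = {k: v for k, v in first.items()
              if all(k in f and f[k] == v for f in frames[1:])}
    dynamic = [{k: v for k, v in f.items() if k not in static} for f in frames]
    return static, dynamic
```

The static dictionary would be transmitted once, and a receiver can rebuild the full configuration for time unit `t` as `{**static, **dynamic[t]}`.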