H04S3/00

Methods for parametric multi-channel encoding

The present document relates to audio coding systems. In particular, the present document relates to efficient methods and systems for parametric multi-channel audio coding. An audio encoding system configured to generate a bitstream indicative of a downmix signal and spatial metadata for generating a multi-channel upmix signal from the downmix signal is described. The system comprises a downmix processing unit configured to generate the downmix signal from a multi-channel input signal; wherein the downmix signal comprises m channels and wherein the multi-channel input signal comprises n channels; n, m being integers with m<n. Furthermore, the system comprises a parameter processing unit configured to determine the spatial metadata from the multi-channel input signal. In addition, the system comprises a configuration unit configured to determine one or more control settings for the parameter processing unit based on one or more external settings; wherein the one or more external settings comprise a target data-rate for the bitstream and wherein the one or more control settings comprise a maximum data-rate for the spatial metadata.
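
As a rough illustration of the two roles described above, the following sketch applies an m x n downmix matrix to n-channel frames and derives a cap on the spatial metadata rate from the target bitstream rate. The function names, the matrix coefficients, and the percentage rule are assumptions made for illustration; the abstract does not state how the configuration unit computes the maximum data-rate.

```python
from typing import List, Sequence


def downmix(frames: Sequence[Sequence[float]],
            matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply an m x n downmix matrix to a sequence of n-channel frames."""
    m = len(matrix)
    n = len(matrix[0])
    if m >= n:
        raise ValueError("the downmix needs fewer channels than the input (m < n)")
    out = []
    for frame in frames:
        if len(frame) != n:
            raise ValueError("frame does not have n channels")
        # Each output channel is a weighted sum of the input channels.
        out.append([sum(g * x for g, x in zip(row, frame)) for row in matrix])
    return out


def max_metadata_rate(target_rate_bps: int, metadata_percent: int = 10) -> int:
    """Hypothetical rule: cap spatial metadata at a fixed share of the target rate."""
    if not 0 < metadata_percent < 100:
        raise ValueError("metadata_percent must be between 1 and 99")
    return target_rate_bps * metadata_percent // 100


# Example: 3 input channels (L, R, C) folded into 2 (assumed coefficients).
stereo = downmix([[1.0, 2.0, 4.0]], [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]])
budget = max_metadata_rate(64000)
```

A real configuration unit would likely take further external settings into account; the fixed share here only shows the direction of the dependency, from target data-rate to metadata cap.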

Systems and methods for sound source virtualization

A system and method for externalizing sound are described. The system includes a headphone assembly and a localizer configured to collect information related to a location of a user and of an acoustically reflective surface in the environment. A controller is configured to determine a location of at least one virtual sound source and to generate head related transfer functions (HRTFs) that simulate characteristics of sound traveling from the virtual sound source directly to the user and to the user via a reflection off the reflective surface. A signal processing assembly is configured to create one or more output signals by filtering a sound signal with the respective HRTFs. Each speaker of the headphone assembly is configured to produce sound in accordance with the output signal.
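
The filtering step can be sketched for one ear as follows: the sound signal is convolved with one impulse response for the direct path and one for the reflected path, and the two results are summed. Modelling the reflection's extra path length as a separate sample delay and gain is an assumption of this sketch, as are all names and values; in the described system those characteristics may be contained in the HRTFs themselves.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_ear(signal: Sequence[float],
               hrir_direct: Sequence[float],
               hrir_reflected: Sequence[float],
               reflection_delay: int,
               reflection_gain: float) -> List[float]:
    """Sum the direct path and one delayed, attenuated reflected path."""
    direct = convolve(signal, hrir_direct)
    reflected = convolve(signal, hrir_reflected)
    out = [0.0] * max(len(direct), reflection_delay + len(reflected))
    for i, v in enumerate(direct):
        out[i] += v
    for i, v in enumerate(reflected):
        # The reflection arrives later and weaker than the direct sound.
        out[i + reflection_delay] += reflection_gain * v
    return out


# Example with toy impulse responses (not measured HRIRs).
left = render_ear([1.0], [1.0, 0.5], [0.5], reflection_delay=3, reflection_gain=0.5)
```

Calling `render_ear` once per ear, with that ear's impulse responses, would yield the two output signals for the headphone speakers.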

Streaming binaural audio from a cloud spatial audio processing system to a mobile station for playback on a personal audio delivery device

Spatial audio is received from an audio server over a first communication link. A cloud spatial audio processing system converts the spatial audio into binaural audio. The binaural audio is streamed from the cloud spatial audio processing system to a mobile station over a second communication link, causing the mobile station to play the binaural audio on a personal audio delivery device.

Sound signal processing method and sound signal processing device
11615776 · 2023-03-28 ·

A sound signal processing method includes: receiving a line-inputted sound signal; controlling a volume of the line-inputted sound signal; and generating an early reflected sound control signal using the line-inputted sound signal having the controlled volume.

PERCEPTUAL BASS EXTENSION WITH LOUDNESS MANAGEMENT AND ARTIFICIAL INTELLIGENCE (AI)
20230029841 · 2023-02-02 ·

One embodiment provides a computer-implemented method that includes implementing a customizable compressor for at least one sidechain processing associated with a loudspeaker. Machine learning is applied to automatically tune one or more parameters of the at least one sidechain processing. One or more channels are extracted, including a low-frequency effects (LFE) channel, for nonlinear signal synthesis. A proportional power-sum-based mix-in of an LFE sidechain channel is applied into a non-LFE sidechain. The LFE sidechain channel is maintained within a specified threshold of being level, before and after nonlinear signal synthesis.

Presentation of Premixed Content in 6 Degree of Freedom Scenes

A method including: obtaining at least two audio signals for reproduction, each of the at least two audio signals associated with a respective one of at least two reproduction locations within an audio reproduction space; obtaining within the audio reproduction space at least two zones; obtaining at least one location for a user's position within the audio reproduction space, the at least one location being relative to at least one of the at least two zones and the at least two reproduction locations; and processing the at least two audio signals based on the obtained at least one location for the user's position within the audio reproduction space to generate at least one output audio signal, wherein the at least one output audio signal is reproduced from at least one of the at least two reproduction locations.

INTEGRATION OF HIGH FREQUENCY AUDIO RECONSTRUCTION TECHNIQUES

A method for decoding an encoded audio bitstream is disclosed. The method includes receiving the encoded audio bitstream and decoding the audio data to generate a decoded lowband audio signal. The method further includes extracting high frequency reconstruction metadata and filtering the decoded lowband audio signal with an analysis filterbank to generate a filtered lowband audio signal. The method also includes extracting a flag indicating whether either spectral translation or harmonic transposition is to be performed on the audio data and regenerating a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata in accordance with the flag. The high frequency regeneration is performed as a post-processing operation with a delay of 3010 samples per audio channel.
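
The flag-controlled choice can be illustrated with a deliberately simplified toy model that works on a list of lowband subband magnitudes. It is not the standardized high frequency reconstruction algorithm: the function name, the second-order transposition, and the absence of any envelope adjustment from the metadata are all assumptions of this sketch.

```python
from typing import List, Sequence


def regenerate_highband(lowband: Sequence[float],
                        use_harmonic_transposition: bool) -> List[float]:
    """Toy highband regeneration for subbands n .. 2n-1 from lowband 0 .. n-1."""
    n = len(lowband)
    if use_harmonic_transposition:
        # Harmonic transposition (order 2): target subband j is fed from j / 2,
        # so a component at frequency f reappears near 2f.
        return [lowband[j // 2] for j in range(n, 2 * n)]
    # Spectral translation: the lowband is shifted up by a fixed offset of n.
    return [lowband[j - n] for j in range(n, 2 * n)]


low = [4.0, 3.0, 2.0, 1.0]
translated = regenerate_highband(low, use_harmonic_transposition=False)
transposed = regenerate_highband(low, use_harmonic_transposition=True)
```

The point of the sketch is only the dispatch: a single flag extracted from the bitstream selects which of the two mappings produces the highband.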

METHOD AND DEVICE FOR PROCESSING AUDIO SIGNAL, USING METADATA
20230091281 · 2023-03-23 ·

Disclosed is an audio signal processing device that renders an audio signal. The device includes a processor. The processor receives an audio signal and metadata including first element reference distance information, and renders a first element signal on the basis of the first element reference distance information, wherein the first element reference distance information indicates the reference distance of an element signal. The audio signal may include a second element signal that can be rendered simultaneously with the first element signal, and the metadata may include second element distance information indicating the distance of the second element signal. The number of bits required to represent the first element reference distance information is smaller than the number of bits required to represent the second element distance information.
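
The bit-count relationship can be sketched with a uniform quantizer. The bit widths (4 and 8), the 16-metre range, and the function names are hypothetical values chosen for illustration; the abstract states only that the reference distance uses fewer bits than the element distance.

```python
REF_DISTANCE_BITS = 4   # hypothetical width for reference distance information
ELEM_DISTANCE_BITS = 8  # hypothetical width for element distance information
MAX_DISTANCE_M = 16.0   # hypothetical upper bound of the coded range


def quantize(distance: float, bits: int, max_value: float = MAX_DISTANCE_M) -> int:
    """Map a distance in [0, max_value] to an unsigned code of the given width."""
    levels = (1 << bits) - 1
    clamped = min(max(distance, 0.0), max_value)
    return round(clamped / max_value * levels)


def dequantize(code: int, bits: int, max_value: float = MAX_DISTANCE_M) -> float:
    """Recover the distance represented by a code."""
    levels = (1 << bits) - 1
    return code / levels * max_value


ref_code = quantize(16.0, REF_DISTANCE_BITS)     # fits in 4 bits
elem_code = quantize(16.0, ELEM_DISTANCE_BITS)   # fits in 8 bits
```

With fewer bits the reference distance is coded on a coarser grid, which trades resolution for a smaller metadata payload.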

Audio Representation and Associated Rendering
20230085918 · 2023-03-23 ·

An apparatus for immersive audio communication including circuitry configured to: receive at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams includes a spatial audio stream to enable immersive audio during a communication; determine a type of each of the first and second audio data streams to identify which of the received first and second audio data streams includes the spatial audio stream; process the second audio data stream with at least one parameter dependent on the determined type; and render the first audio data stream and the processed second audio data stream.
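
The type-dependent processing can be sketched as below. The two stream types, the metadata flag used to tell them apart, and the gain values are assumptions made for illustration; the abstract does not say how the type is determined or which parameter depends on it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class StreamType(Enum):
    MONO = "mono"
    SPATIAL = "spatial"


@dataclass
class AudioStream:
    samples: List[float]
    has_spatial_metadata: bool  # hypothetical indicator carried with the stream


def determine_type(stream: AudioStream) -> StreamType:
    """Classify a stream; here the decision rests on a single assumed flag."""
    return StreamType.SPATIAL if stream.has_spatial_metadata else StreamType.MONO


def spatial_indices(streams: Sequence[AudioStream]) -> List[int]:
    """Identify which of the received streams carry spatial audio."""
    return [i for i, s in enumerate(streams)
            if determine_type(s) is StreamType.SPATIAL]


# Hypothetical parameter table: the gain applied depends on the determined type.
GAIN_BY_TYPE = {StreamType.SPATIAL: 1.0, StreamType.MONO: 0.5}


def process(stream: AudioStream) -> List[float]:
    gain = GAIN_BY_TYPE[determine_type(stream)]
    return [gain * x for x in stream.samples]


first = AudioStream([1.0, 1.0], has_spatial_metadata=True)
second = AudioStream([1.0, -1.0], has_spatial_metadata=False)
processed_second = process(second)
```

Rendering would then combine `first.samples` with `processed_second`; that step is omitted because the abstract gives no detail on the renderer.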
