H04S2420/03

METHODS, APPARATUS AND SYSTEMS FOR ENCODING AND DECODING OF MULTI-CHANNEL AMBISONICS AUDIO DATA

Conventional audio compression technologies perform a standardized signal transformation that is independent of the type of content. Multi-channel signals are decomposed into their signal components, which are subsequently quantized and encoded. This is disadvantageous because of the lack of knowledge about the characteristics of the scene composition, especially for multi-channel audio or Higher-Order Ambisonics (HOA) content. A method for decoding an encoded bitstream of multi-channel audio data and associated metadata is provided, including transforming a first Ambisonics format of the multi-channel audio data to a second Ambisonics format representation of the multi-channel audio data, wherein the transforming maps the first Ambisonics format of the multi-channel audio data into the second Ambisonics format representation of the multi-channel audio data. A method for encoding multi-channel audio data that includes audio data in an Ambisonics format is also provided, wherein the encoding includes transforming the audio data in the Ambisonics format into encoded multi-channel audio data.
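As an illustration only of what mapping one Ambisonics format into another can look like, the sketch below converts a single first-order frame from the FuMa convention (channel order W, X, Y, Z, with W attenuated by 1/sqrt(2)) to the ACN/SN3D convention (channel order W, Y, Z, X). The abstract does not name the two formats, so the choice of FuMa and ACN/SN3D and the function name fuma_to_acn_sn3d_first_order are assumptions.

```python
import math
from typing import Tuple

Frame = Tuple[float, float, float, float]


def fuma_to_acn_sn3d_first_order(frame: Frame) -> Frame:
    """Map one first-order FuMa frame (W, X, Y, Z) to ACN/SN3D order (W, Y, Z, X).

    Assumption: the two formats are FuMa and ACN/SN3D; the abstract only
    speaks of a "first" and a "second" Ambisonics format.
    """
    w, x, y, z = frame
    # FuMa stores W attenuated by 1/sqrt(2); SN3D stores it at unity gain.
    w_sn3d = w * math.sqrt(2.0)
    # ACN channel order for first order is W, Y, Z, X.
    return (w_sn3d, y, z, x)
```

For higher orders both the channel permutation and the per-channel scale factors grow, so a table-driven mapping would be more practical than the hard-coded tuple used here.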

Renderer controlled spatial upmix

An audio decoder device for decoding a compressed input audio signal, having: at least one core decoder having one or more processors for generating a processor output signal based on a processor input signal, wherein a number of output channels of the processor output signal is higher than a number of input channels of the processor input signal, wherein each of the one or more processors has a decorrelator and a mixer, wherein a core decoder output signal having a plurality of channels has the processor output signal, and wherein the core decoder output signal is suitable for a reference loudspeaker setup; at least one format converter device configured to convert the core decoder output signal into an output audio signal, which is suitable for a target loudspeaker setup; and a control device configured to control at least one of the one or more processors in such a way that the decorrelator of the processor may be controlled independently from the mixer of the processor, wherein the control device is configured to control at least one of the decorrelators of the one or more processors depending on the target loudspeaker setup.

INFORMATION PROCESSING DEVICE AND METHOD, AND PROGRAM

The present technology relates to an information processing device and method and a program that make it possible to reduce the total number of objects while the influence on the sound quality is suppressed.

The information processing device includes a pass-through object selection unit configured to acquire data of L objects and select, from the L objects, M pass-through objects whose data is to be output as is, and an object generation unit configured to generate, on the basis of the data of multiple non-pass-through objects that are not the pass-through objects among the L objects, the data of N new objects, N being smaller than (L−M). The present technology can be applied to an information processing device.
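The reduction from L objects to M + N objects can be sketched as follows. The abstract does not say how pass-through objects are chosen or how new objects are generated, so both rules here are assumptions: selection uses a hypothetical per-object priority value, and each new object is the sample-wise sum of a round-robin group of the remaining objects.

```python
from typing import List, Tuple

# (hypothetical priority, waveform samples); not a structure from the source
AudioObject = Tuple[float, List[float]]


def reduce_objects(
    objects: List[AudioObject], m: int, n: int
) -> Tuple[List[AudioObject], List[AudioObject]]:
    """Return (M pass-through objects, N newly generated objects)."""
    l = len(objects)
    if not (0 <= m <= l and 0 < n < l - m):
        raise ValueError("need 0 <= M <= L and 0 < N < L - M")

    # Assumed selection rule: the M highest-priority objects pass through.
    ranked = sorted(objects, key=lambda o: o[0], reverse=True)
    pass_through, rest = ranked[:m], ranked[m:]

    # Assumed generation rule: split the rest into N round-robin groups
    # and sum each group's waveform sample by sample.
    length = len(objects[0][1])
    new_objects: List[AudioObject] = []
    for g in range(n):
        group = rest[g::n]
        mixed = [sum(o[1][i] for o in group) for i in range(length)]
        new_objects.append((max(o[0] for o in group), mixed))
    return pass_through, new_objects
```

Because N is required to be smaller than L − M, every group contains at least one object and the total object count always drops.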

SIGNAL PROCESSING DEVICE AND METHOD, LEARNING DEVICE AND METHOD, AND PROGRAM

The present technology relates to a signal processing device and method, a learning device and method, and a program that enable even an inexpensive device to perform high-quality audio playback.

A signal processing device includes: a decoding processing unit that demultiplexes an input bit stream into a first audio signal, metadata of the first audio signal, and first high-frequency band information for expanding a band; and a band expanding unit that performs band expansion processing on the basis of a second audio signal and second high-frequency band information and thereby generates an output audio signal, the second audio signal being obtained by performing signal processing on the basis of the first audio signal and the metadata, the second high-frequency band information being generated on the basis of the first high-frequency band information. The present technology can be applied to a smartphone.

NOISE FILLING IN MULTICHANNEL AUDIO CODING
20210358508 · 2021-11-18

In multichannel audio coding, an improved coding efficiency is achieved by the following measure: the noise filling of zero-quantized scale factor bands is performed using noise filling sources other than artificially generated noise or spectral replica. In particular, multichannel audio coding may be rendered more efficient by performing the noise filling based on noise generated using spectral lines from a previous frame of, or a different channel of the current frame of, the multichannel audio signal.
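A minimal sketch of the idea, assuming the previous frame's spectrum of the same channel is the noise source: every scale factor band whose quantized lines are all zero is filled with the co-located lines of the previous frame, rescaled so that the band's RMS matches a target level. The function name noise_fill, the band layout, and the per-band target level are hypothetical and are not taken from the source.

```python
import math
from typing import List, Sequence, Tuple


def noise_fill(
    quantized: Sequence[int],
    prev_spectrum: Sequence[float],
    bands: Sequence[Tuple[int, int]],
    band_levels: Sequence[float],
) -> List[float]:
    """Fill zero-quantized bands from the previous frame's spectral lines.

    bands holds (start, end) line indices, end exclusive (assumed layout).
    band_levels holds the target RMS per band (hypothetical parameter).
    """
    out = [float(q) for q in quantized]
    for (start, end), level in zip(bands, band_levels):
        if any(quantized[start:end]):
            continue  # band carries coded lines; leave it untouched
        source = prev_spectrum[start:end]
        energy = sum(x * x for x in source)
        if energy == 0.0:
            continue  # nothing to copy from; the band stays silent
        # Scale so the RMS of the filled lines equals the target level.
        scale = level * math.sqrt((end - start) / energy)
        out[start:end] = [x * scale for x in source]
    return out
```

Using lines from a different channel of the current frame would only change which spectrum is passed as the source.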

SPATIAL AUDIO PROCESSING

According to an example embodiment, a method for processing a multi-channel input audio signal representing a sound field into a multi-channel output audio signal representing said sound field in accordance with a predefined loudspeaker layout is provided, the method comprising the following for at least one frequency band: obtaining spatial audio parameters that are descriptive of spatial characteristics of said sound field; estimating a signal energy of the sound field represented by the multi-channel input audio signal; estimating, based on said signal energy and the obtained spatial audio parameters, respective output signal energies for channels of the multi-channel output audio signal according to said predefined loudspeaker layout; determining a maximum output energy as the largest of the output signal energies across channels of said multi-channel output audio signal; and deriving, on the basis of said maximum output energy, a gain value for adjusting sound reproduction gain in at least one of said channels of the multi-channel output audio signal.
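The per-band steps can be sketched as below. The abstract does not give the energy model or the gain rule, so both are assumptions: the spatial parameters are taken to be a direct-to-total energy ratio plus per-loudspeaker panning gains, diffuse energy is spread evenly over the loudspeakers, and the gain simply keeps the loudest channel at or below a hypothetical energy_limit.

```python
import math
from typing import Sequence


def limiting_gain(
    band_energy: float,
    direct_ratio: float,
    panning_gains: Sequence[float],
    energy_limit: float,
) -> float:
    """Derive one gain for a frequency band from the largest output energy.

    Assumed model: a channel receives the direct part weighted by its
    squared panning gain plus an equal share of the diffuse part.
    """
    n = len(panning_gains)
    out_energies = [
        band_energy * (direct_ratio * g * g + (1.0 - direct_ratio) / n)
        for g in panning_gains
    ]
    max_energy = max(out_energies)
    if max_energy <= energy_limit:
        return 1.0  # no attenuation needed
    # Amplitude gain that brings the loudest channel down to the limit.
    return math.sqrt(energy_limit / max_energy)
```

For example, a fully direct band with energy 4.0 panned entirely to one of two loudspeakers and an energy limit of 1.0 yields a gain of 0.5.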

System for rendering and playback of object based audio in various listening environments

Embodiments are described for rendering object-based audio content through a system that includes an array of individually addressable audio drivers, including at least one driver that is configured to project sound waves toward one or more surfaces within a listening environment for reflection to a listening area within the listening environment; a renderer configured to receive and process audio streams and one or more metadata sets associated with each of the audio streams and specifying a playback location of a respective audio stream; and a playback system coupled to the renderer and configured to render the audio streams to a plurality of audio feeds corresponding to the array of audio drivers in accordance with the one or more metadata sets.

METHOD AND APPARATUS FOR PERFORMING BINAURAL RENDERING OF AUDIO SIGNAL

A method and apparatus for performing binaural rendering of an audio signal are provided. The method includes: identifying an input signal that is based on an object, and metadata that includes distance information indicating a distance to the object; generating a binaural filter that is based on the metadata, using a binaural room impulse response; obtaining a binaural filter to which a low-pass filter (LPF) is applied, using a frequency response control that is based on the distance information; and generating a binaural-rendered output signal by performing a convolution of the input signal and the binaural filter to which the LPF is applied.

Audio processing device and method therefor
11223921 · 2022-01-11

An input unit receives input of an assumed listening position of sound of an object, which is a sound source, and outputs assumed listening position information indicating the assumed listening position. A position information correction unit corrects position information of each object on the basis of the assumed listening position information to obtain corrected position information. A gain/frequency characteristic correction unit performs gain correction and frequency characteristic correction on a waveform signal of an object on the basis of the position information and the corrected position information. A spatial acoustic characteristic addition unit further adds a spatial acoustic characteristic to the waveform signal resulting from the gain correction and the frequency characteristic correction on the basis of the position information of the object and the assumed listening position information. The present technology is applicable to an audio processing device.
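The first two correction steps can be sketched as follows, under stated assumptions: positions are Cartesian coordinates relative to a standard listening position at the origin, the corrected position is the object position seen from the assumed listening position, and the gain follows a simple inverse-distance law. The function names and the min_distance guard are hypothetical, and the frequency characteristic correction and the spatial acoustic characteristic addition are omitted.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def correct_position(obj_pos: Vec3, listening_pos: Vec3) -> Vec3:
    """Re-express the object position relative to the assumed listening position."""
    return (
        obj_pos[0] - listening_pos[0],
        obj_pos[1] - listening_pos[1],
        obj_pos[2] - listening_pos[2],
    )


def gain_correction(obj_pos: Vec3, corrected_pos: Vec3, min_distance: float = 1e-3) -> float:
    """Assumed inverse-distance gain: original distance over corrected distance."""
    d_orig = math.sqrt(sum(c * c for c in obj_pos))
    d_new = math.sqrt(sum(c * c for c in corrected_pos))
    # Guard against division by zero when the listener sits on the object.
    return d_orig / max(d_new, min_distance)
```

Moving the listener halfway toward an object that was 2 units away gives a corrected distance of 1 unit and a gain of 2.0 under this model.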

REPRESENTING SPATIAL AUDIO BY MEANS OF AN AUDIO SIGNAL AND ASSOCIATED METADATA

Encoding and decoding methods are provided for representing spatial audio that is a combination of directional sound and diffuse sound. An exemplary encoding method includes, inter alia, creating a single- or multi-channel downmix audio signal by downmixing input audio signals from a plurality of microphones in an audio capture unit capturing the spatial audio; determining first metadata parameters associated with the downmix audio signal, wherein the first metadata parameters are indicative of one or more of: a relative time delay value, a gain value, and a phase value associated with each input audio signal; and combining the created downmix audio signal and the first metadata parameters into a representation of the spatial audio.
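A minimal sketch of the encoding side, assuming a mono downmix, integer sample delays, and per-microphone gains that are already known: each input is time-aligned and weighted before summing, and the delay and gain used for each input are kept as metadata next to the downmix. The function name, the dictionary keys, and the omission of the phase value are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def downmix_with_metadata(
    inputs: Sequence[Sequence[float]],
    delays: Sequence[int],
    gains: Sequence[float],
) -> Dict[str, object]:
    """Create a mono downmix plus per-input delay and gain metadata.

    delays are relative delays in samples (assumed integers); an input
    with delay d is advanced by d samples so the microphones line up.
    """
    length = len(inputs[0])
    downmix = [0.0] * length
    for signal, delay, gain in zip(inputs, delays, gains):
        for n in range(length):
            src = n + delay
            if 0 <= src < length:
                downmix[n] += gain * signal[src]
    metadata: List[Dict[str, float]] = [
        {"delay_samples": d, "gain": g} for d, g in zip(delays, gains)
    ]
    # The downmix and its metadata together form the representation.
    return {"downmix": downmix, "metadata": metadata}
```

A decoder could use the stored delay and gain values to approximate the individual microphone signals from the downmix.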