
Recording and rendering spatial audio signals

Examples of the disclosure relate to a method, apparatus and computer program, the method including: obtaining audio signals, wherein the audio signals represent spatial sound and can be used to render spatial audio using linear methods; obtaining spatial metadata corresponding to the spatial sound represented by the audio signals; and associating the spatial metadata with the obtained audio signals so that, in a first rendering context, the obtained audio signals can be rendered without using the spatial metadata and, in a second rendering context, the obtained audio signals can be rendered using the spatial metadata.
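A minimal sketch of the idea above, assuming a simple container that pairs linearly renderable audio with optional spatial metadata; the names (`SpatialPackage`, `render`) and the dict-based metadata layout are illustrative assumptions, not the patent's actual format:

```python
from dataclasses import dataclass, field

@dataclass
class SpatialPackage:
    # Channels that are directly playable with linear methods.
    audio: list
    # Optional spatial metadata, e.g. per-band directions (hypothetical layout).
    metadata: dict = field(default_factory=dict)

def render(package, use_metadata):
    """First context: ignore metadata (plain linear playback).
    Second context: use the metadata for parametric spatial rendering."""
    if use_metadata and package.metadata:
        return {"mode": "parametric", "channels": package.audio,
                "directions": package.metadata.get("directions")}
    return {"mode": "linear", "channels": package.audio}

pkg = SpatialPackage(audio=[[0.1, 0.2], [0.2, 0.1]],
                     metadata={"directions": [30.0, -45.0]})
legacy = render(pkg, use_metadata=False)   # first rendering context
spatial = render(pkg, use_metadata=True)   # second rendering context
```

The same package thus serves both a legacy renderer, which simply plays the channels, and a metadata-aware renderer.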

Methods and apparatus for decoding a compressed HOA signal

Methods and apparatus for decoding a compressed Higher Order Ambisonics (HOA) representation of a sound or soundfield. The method may include receiving a bitstream containing the compressed HOA representation and decoding, based on a determination that there are multiple layers, the compressed HOA representation from the bitstream to obtain a sequence of decoded HOA representations. A first subset of the sequence of decoded HOA representations is determined based only on corresponding ambient HOA components. A second subset of the sequence of decoded HOA representations is determined based on corresponding ambient HOA components and corresponding predominant sound components. For a frame k, the sequence of decoded HOA representations is represented at least in part by

    ĉ_n(k−1) = ĉ_AMB,n(k−1),                  for n in the first subset
    ĉ_n(k−1) = ĉ_PS,n(k−1) + ĉ_AMB,n(k−1),    for n in the second subset

where ĉ_AMB,n(k−1) corresponds to the corresponding ambient HOA components and ĉ_PS,n(k−1) corresponds to the corresponding predominant sound components.
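A minimal sketch of this piecewise reconstruction, with per-coefficient scalars standing in for the actual HOA component signals; the function name and dict-based layout are illustrative assumptions:

```python
def decode_hoa_frame(ambient, predominant, first_subset, second_subset):
    """Reconstruct the decoded HOA coefficients for one frame.
    ambient[n]: ambient HOA component for coefficient index n (always present).
    predominant[n]: predominant-sound component (second subset only)."""
    decoded = {}
    for n in first_subset:
        decoded[n] = ambient[n]                   # ambient component only
    for n in second_subset:
        decoded[n] = predominant[n] + ambient[n]  # ambient + predominant sound
    return decoded

amb = {0: 0.5, 1: 0.2, 2: 0.25}
ps = {2: 0.5}
out = decode_hoa_frame(amb, ps, first_subset=[0, 1], second_subset=[2])
# out[2] == 0.75 (= 0.5 + 0.25); out[0] and out[1] carry ambient only
```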

Concept for generating an enhanced sound-field description or a modified sound field description using a depth-extended DirAC technique or other techniques

An apparatus for generating an enhanced sound field description includes: a sound field generator for generating at least one sound field description indicating a sound field with respect to at least one reference location; and a metadata generator for generating metadata relating to spatial information of the sound field, wherein the at least one sound field description and the metadata constitute the enhanced sound field description. The metadata can be a depth map that associates distance information with a direction in a full band or a subband, i.e., a time-frequency bin.
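A minimal sketch of such a depth map, keyed by time-frequency bin; the tuple layout and field names are illustrative assumptions, not the patent's actual metadata syntax:

```python
def build_depth_map(bins):
    """bins: iterable of (time_index, freq_band, azimuth_deg, distance_m).
    Returns a depth map associating a distance with the direction
    estimated in each time-frequency bin."""
    depth_map = {}
    for t, band, azimuth, distance in bins:
        depth_map[(t, band)] = {"azimuth": azimuth, "distance": distance}
    return depth_map

# Two bins of one frame: a near source at 30 deg, a farther one at -90 deg.
meta = build_depth_map([(0, 0, 30.0, 2.5), (0, 1, -90.0, 4.0)])
# The enhanced sound field description pairs the sound field with `meta`.
```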

Quantization of spatial audio parameters
11475904 · 2022-10-18

There is disclosed inter alia an apparatus for spatial audio signal encoding which determines at least one spatial audio parameter comprising a direction parameter with an elevation component and an azimuth component. The elevation component and azimuth component of the direction parameter are then converted to an index value.
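A hedged sketch of converting an (elevation, azimuth) direction parameter to a single index via uniform quantization; the step counts and packing scheme are illustrative assumptions, not the codec's actual quantization grid:

```python
def direction_to_index(elevation_deg, azimuth_deg, elev_steps=36, azim_steps=72):
    """Quantize elevation in [-90, 90] and azimuth in [0, 360),
    then pack both quantizer outputs into one integer index."""
    e = round((elevation_deg + 90.0) / 180.0 * (elev_steps - 1))
    a = round((azimuth_deg % 360.0) / 360.0 * azim_steps) % azim_steps
    return e * azim_steps + a

def index_to_direction(index, elev_steps=36, azim_steps=72):
    """Recover the quantized direction from the packed index."""
    e, a = divmod(index, azim_steps)
    return (e / (elev_steps - 1) * 180.0 - 90.0, a / azim_steps * 360.0)

idx = direction_to_index(45.0, 90.0)
elev, azim = index_to_direction(idx)
# elev is within one elevation step of 45.0; azim lands exactly on 90.0
```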

Apparatuses and associated methods for spatial presentation of audio

An apparatus comprising means configured to: receive audio content comprising voice audio and ambient audio, and directional information indicative of a direction of at least one sound source and a direction of the remote user relative to a reference point; receive a reference location; provide for presentation of the ambient audio with a first spatial audio effect, based on the directional information, and presentation of the voice audio with a second spatial audio effect, based on the directional information; receive repositioning signalling from the remote user device; and provide for presentation of the audio content using a modification of the first spatial audio effect to reposition an ambient-perceived direction based on the repositioning signalling and/or a modification of the second spatial audio effect to reposition a voice-perceived direction based on the repositioning signalling, so as to increase the spatial separation between the voice-perceived direction and the ambient-perceived direction.
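A hedged sketch of one way such repositioning could keep voice and ambience spatially separated: when a repositioning request would bring the two perceived directions too close, the ambient direction is pushed away. The minimum-separation policy and all names are illustrative assumptions:

```python
def reposition(voice_dir, ambient_dir, new_voice_dir, min_separation=30.0):
    """Return (voice, ambient) perceived directions in degrees after a
    repositioning request moves the voice to new_voice_dir, shifting the
    ambient direction if angular separation would fall below the minimum."""
    # Signed angular difference in (-180, 180] from ambient to new voice.
    diff = (new_voice_dir - ambient_dir + 180.0) % 360.0 - 180.0
    if abs(diff) < min_separation:
        shift = min_separation - abs(diff)
        # Push ambience away from the voice, preserving its side.
        ambient_dir = (ambient_dir - shift if diff >= 0
                       else ambient_dir + shift) % 360.0
    return new_voice_dir % 360.0, ambient_dir % 360.0

v, a = reposition(voice_dir=0.0, ambient_dir=10.0, new_voice_dir=20.0)
# v == 20.0, a == 350.0: exactly 30 degrees of separation is restored
```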

Spatial audio capture, transmission and reproduction
11638112 · 2023-04-25

An apparatus including circuitry configured for: obtaining at least one spatial audio signal including at least one audio signal, wherein the at least one spatial audio signal defines an audio scene forming at least in part an immersive media content; obtaining at least one augmentation control parameter associated with the spatial audio signal, wherein the at least one augmentation control parameter is configured to control at least in part a rendering of the audio scene; and transmitting/storing the at least one spatial audio signal and the at least one augmentation control parameter, the at least one spatial audio signal and the at least one augmentation control parameter being received/retrieved at a renderer so as to control at least in part rendering of the audio scene based on the at least one augmentation control parameter.
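A minimal sketch of a renderer honouring an augmentation control parameter carried alongside the spatial audio signal; the field names and gain rule are illustrative assumptions, not the actual bitstream syntax:

```python
def render_scene(spatial_signal, augmentation_audio, control):
    """control: e.g. {"allow_augmentation": bool, "gain": float}.
    The renderer mixes augmentation audio into the immersive scene
    only when the control parameter permits it."""
    scene = list(spatial_signal)            # base immersive audio scene
    if control.get("allow_augmentation", False):
        g = control.get("gain", 1.0)
        scene += [g * s for s in augmentation_audio]
    return scene

base = [0.2, -0.1]
extra = [0.5]
locked = render_scene(base, extra, {"allow_augmentation": False})
open_ = render_scene(base, extra, {"allow_augmentation": True, "gain": 0.5})
# locked == [0.2, -0.1]; open_ == [0.2, -0.1, 0.25]
```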

Method, system and computer program product for recording and interpolation of ambisonic sound fields

A method of recording ambisonic sound fields with a spatially distributed plurality of ambisonic microphones comprises a step of recording sound signals from the plurality of ambisonic microphones, a step of converting the recorded sound signals to ambisonic sound fields, and a step of interpolating the ambisonic sound fields. According to the invention, the method comprises a step of generating synchronizing signals for particular ambisonic microphones for synchronized recording of sound signals from the plurality of ambisonic microphones, and the step of interpolating the ambisonic sound fields includes: filtering sound signals from particular microphones with individual filters having a distance-dependent impulse response with a cut-off frequency f_c(d_m) depending on the distance d_m between the point of interpolation and the m-th microphone; applying gradual distance-dependent attenuation; and applying re-balancing, amplifying the 0th-order ambisonic component and attenuating the remaining ambisonic components. The invention further concerns a recording system and a computer program product.
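A hedged sketch of the interpolation steps above. A one-pole low-pass stands in for the individual filter with distance-dependent cut-off f_c(d_m), and the specific f_c(d), attenuation, and re-balancing gains are illustrative assumptions, not the invention's actual laws:

```python
import math

def fc_of_distance(d_m, fc0=16000.0):
    """Cut-off frequency falling with distance d_m (assumed law)."""
    return fc0 / (1.0 + d_m)

def one_pole_lowpass(signal, fc, fs=48000.0):
    """Simple one-pole low-pass standing in for the distance-dependent
    impulse response with cut-off fc."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y, out = 0.0, []
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out

def interpolate_w_channel(w_signals, distances):
    """Mix the 0th-order (W) channels of the microphones: each channel is
    low-pass filtered per its distance, then attenuated by 1/(1 + d)."""
    mix = [0.0] * len(w_signals[0])
    for sig, d in zip(w_signals, distances):
        filtered = one_pole_lowpass(sig, fc_of_distance(d))
        g = 1.0 / (1.0 + d)                       # gradual attenuation
        for i, v in enumerate(filtered):
            mix[i] += g * v
    return mix

def rebalance(w, higher, w_gain=1.2, h_gain=0.8):
    """Re-balancing: amplify the 0th-order component, attenuate the rest."""
    return [w_gain * v for v in w], [[h_gain * v for v in ch] for ch in higher]

mix = interpolate_w_channel([[1.0, 1.0], [1.0, 1.0]], distances=[0.0, 3.0])
# the near microphone dominates the interpolated point
```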

Transform ambisonic coefficients using an adaptive network

A device includes a memory configured to store untransformed ambisonic coefficients at different time segments. The device also includes one or more processors configured to obtain the untransformed ambisonic coefficients at the different time segments, where the untransformed ambisonic coefficients at the different time segments represent a soundfield at the different time segments. The one or more processors are also configured to apply an adaptive network, based on a constraint, to the untransformed ambisonic coefficients at the different time segments to generate transformed ambisonic coefficients at the different time segments, wherein the transformed ambisonic coefficients represent a modified soundfield that was modified based on the constraint.
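A toy stand-in for the adaptive network: a per-time-segment linear map whose weights are derived from the constraint (here, a yaw rotation of a first-order soundfield). The constraint-to-weights rule, the ACN channel order [W, Y, Z, X], and all names are illustrative assumptions, not the device's actual network:

```python
import math

def constraint_to_matrix(constraint):
    """Map a constraint such as {"yaw_deg": a} to a 4x4 transform that
    rotates the first-order X/Y components about the vertical axis."""
    a = math.radians(constraint.get("yaw_deg", 0.0))
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0, 0],     # W unchanged
            [0, c, 0, s],     # Y' =  c*Y + s*X
            [0, 0, 1, 0],     # Z unchanged
            [0, -s, 0, c]]    # X' = -s*Y + c*X

def apply_network(segments, constraint):
    """Transform untransformed ambisonic coefficients per time segment."""
    m = constraint_to_matrix(constraint)
    return [[sum(m[i][j] * seg[j] for j in range(4)) for i in range(4)]
            for seg in segments]

segs = [[1.0, 0.0, 0.0, 1.0]]                 # one segment: W=1, X=1
rotated = apply_network(segs, {"yaw_deg": 90.0})
# the X component rotates into Y: rotated[0] is approximately [1, 1, 0, 0]
```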

Smart hybrid rendering for augmented reality/virtual reality audio

An example device for processing one or more audio streams includes a memory configured to store the one or more audio streams and one or more processors implemented in circuitry coupled to the memory. The one or more processors are configured to determine a listener position. The one or more processors are also configured to determine one or more clusters of the one or more audio streams. The one or more processors are also configured to determine a rendering mode based on the listener position and the one or more clusters. The device also includes a renderer configured to render at least one of the one or more clusters of audio streams based on the rendering mode.
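A hedged sketch of picking a rendering mode from the listener position and stream clusters: nearby clusters get full per-stream rendering, distant ones a cheaper clustered render. The threshold, mode names, and cluster layout are illustrative assumptions:

```python
import math

def choose_rendering_mode(listener_pos, clusters, near_threshold=2.0):
    """clusters: list of (centroid_xyz, stream_ids). Returns a rendering
    mode per cluster based on the listener's distance to its centroid."""
    modes = {}
    for centroid, streams in clusters:
        d = math.dist(listener_pos, centroid)
        modes[tuple(streams)] = "per_stream" if d < near_threshold else "clustered"
    return modes

modes = choose_rendering_mode(
    (0.0, 0.0, 0.0),
    [((1.0, 0.0, 0.0), ["s1", "s2"]),    # near cluster
     ((10.0, 0.0, 0.0), ["s3"])])        # far cluster
# near cluster -> "per_stream", far cluster -> "clustered"
```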

Spatial audio data exchange

A device includes one or more processors configured to execute instructions to obtain, at a first audio output device, first spatial audio data and an associated first reference time, and to cause the first reference time and data representing at least a portion of the first spatial audio data to be transmitted from the first audio output device. The instructions further cause the one or more processors to receive, at the first audio output device from a second audio output device, second spatial audio data and a second reference time. The instructions further cause the one or more processors to, based on the first reference time and the second reference time, time-align the first spatial audio data and the second spatial audio data to generate combined audio data representing a three-dimensional (3D) sound field, and to generate audio output based on the combined audio data.
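A minimal sketch of the time-alignment step: the later-starting buffer is shifted by the reference-time difference (in samples) before the two are summed. The sample rate, sample-level summation, and function name are illustrative assumptions, not the device's actual protocol:

```python
def time_align_and_combine(first, t_ref1, second, t_ref2, fs=48000):
    """Align two audio buffers using their reference times, then sum
    them into combined audio data. Reference times are in seconds."""
    offset = round((t_ref2 - t_ref1) * fs)   # samples by which second lags
    if offset < 0:
        # second actually starts earlier; swap roles so offset >= 0
        first, second, offset = second, first, -offset
    out = [0.0] * max(len(first), offset + len(second))
    for i, v in enumerate(first):
        out[i] += v
    for i, v in enumerate(second):
        out[offset + i] += v
    return out

combined = time_align_and_combine([1.0, 1.0, 1.0], 0.0,
                                  [2.0, 2.0], 1 / 48000)
# the second stream starts one sample later: combined == [1.0, 3.0, 3.0]
```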