Patent classifications
H04S2420/11
RENDERING AUDIO
An apparatus, method and computer program are described comprising: providing an incoming audio indication in response to incoming audio (41), the incoming audio indication comprising visual representations of a plurality of audio modes (55-58); receiving at least one input from a user (59) for selecting one of the plurality of audio modes (42); and rendering audio (43) based, at least partially, on the selected audio mode, wherein one or more parameters of the rendered audio are determined based on the selected audio mode.
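The mode-to-parameter mapping can be illustrated with a minimal sketch; the mode names and parameter values below are hypothetical, as the abstract does not name specific modes:

```python
# Hypothetical audio modes and their rendering parameters; the abstract
# only says parameters are determined by the selected mode.
AUDIO_MODES = {
    "mono":    {"spatial": False, "stereo_width": 0.0},
    "stereo":  {"spatial": False, "stereo_width": 1.0},
    "spatial": {"spatial": True,  "stereo_width": 1.0},
}

def render_parameters(selected_mode: str) -> dict:
    """Determine rendering parameters from the user-selected audio mode."""
    if selected_mode not in AUDIO_MODES:
        raise ValueError(f"unknown audio mode: {selected_mode!r}")
    return AUDIO_MODES[selected_mode]
```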
Spatial Audio Capture, Transmission and Reproduction
An apparatus configured to: obtain at least one spatial audio signal that defines an audio scene forming at least in part an immersive media content; obtain metadata associated with the at least one spatial audio signal; obtain at least one augmentation control parameter associated with the at least one spatial audio signal; obtain at least one augmentation audio signal; render an output audio signal that is based, at least partially, on the at least one spatial audio signal, the metadata associated with the at least one spatial audio signal, the at least one augmentation control parameter, and the at least one augmentation audio signal; and obtain an indication that at least part of the at least one spatial audio signal has been omitted from the output audio signal based, at least partially, on at least part of the at least one augmentation audio signal included in the output audio signal.
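One reading of the omission behaviour: when the augmentation control parameter permits it and the augmentation audio is active, part of the spatial signal is dropped from the output and an indication of that omission is produced. A minimal sketch with sample-list signals and hypothetical names:

```python
def render_with_augmentation(spatial, augmentation, allow_omission):
    """Mix a spatial audio frame with an augmentation audio frame.

    If the augmentation control parameter allows omission and the
    augmentation audio is active, the spatial signal is omitted from
    the output; the second return value is the omission indication.
    (Frame-level sketch only; real omission would be region/band-wise.)
    """
    augmentation_active = any(a != 0.0 for a in augmentation)
    omitted = allow_omission and augmentation_active
    if omitted:
        output = list(augmentation)
    else:
        output = [s + a for s, a in zip(spatial, augmentation)]
    return output, omitted
```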
Spherical harmonic decomposition of a sound field detected by an equatorial acoustic sensor array
An audio system includes an equatorial acoustic sensor array (EASA) that may be coupled to an object. The audio system is configured to detect, via the EASA, signals corresponding to a portion of a sound field in a local area. The detected signals are converted into a plurality of corresponding abstract representations that describe the portion of the sound field. Effects of scattering by the object are removed from the abstract representations to create adjusted abstract representations. A set of spherical harmonic (SH) coefficients is determined using the adjusted abstract representations. The set of SH coefficients describes the entirety of the sound field. The set of SH coefficients and head-related transfer functions of the user are then used for binaural rendering of the reconstructed sound field to the user.
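Once the SH coefficients of the sound field and the user's HRTFs (expressed in the same SH basis) are available, binaural rendering reduces to an inner product per ear and frequency bin. A NumPy sketch; the array shapes and scaling are assumptions, not taken from the abstract:

```python
import numpy as np

def binaural_render(sh_field, hrtf_sh_left, hrtf_sh_right):
    """Binaural rendering from spherical-harmonic coefficients.

    sh_field:   (n_coeffs, n_bins) SH coefficients of the sound field
    hrtf_sh_*:  (n_coeffs, n_bins) the user's HRTFs in the SH domain
    Returns one spectrum per ear via an SH-domain inner product.
    """
    left = np.sum(sh_field * np.conj(hrtf_sh_left), axis=0)
    right = np.sum(sh_field * np.conj(hrtf_sh_right), axis=0)
    return left, right
```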
Sound field adjustment
A device includes one or more processors configured to receive, via wireless transmission from a streaming device, encoded ambisonics audio data representing a sound field. The one or more processors are also configured to perform decoding of the ambisonics audio data to generate decoded ambisonics audio data. The decoding of the ambisonics audio data includes base layer decoding of a base layer of the encoded ambisonics audio data and selectively includes enhancement layer decoding in response to an amount of movement of the device. The one or more processors are further configured to adjust the decoded ambisonics audio data to alter the sound field based on data associated with at least one of a translation or an orientation associated with the movement of the device. The one or more processors are also configured to output the adjusted decoded ambisonics audio data to two or more loudspeakers for playback.
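The base/enhancement split and the movement-dependent decoding can be sketched as follows; the movement threshold is hypothetical, and the first-order yaw rotation uses one of several possible sign conventions:

```python
import math
import numpy as np

MOVEMENT_THRESHOLD = 0.1  # hypothetical threshold for enabling the enhancement layer

def decode_layered(base_coeffs, enh_coeffs, movement):
    """Always decode the base layer; include the enhancement layer
    (e.g. higher ambisonic orders) only when device movement exceeds
    the threshold, zeroing it out otherwise."""
    if movement > MOVEMENT_THRESHOLD:
        return np.concatenate([base_coeffs, enh_coeffs])
    return np.concatenate([base_coeffs, np.zeros_like(enh_coeffs)])

def yaw_rotate_foa(w, x, y, z, theta):
    """Adjust a first-order sound field for a head yaw of theta radians
    (convention here: positive theta = counterclockwise head turn)."""
    x2 = x * math.cos(theta) + y * math.sin(theta)
    y2 = y * math.cos(theta) - x * math.sin(theta)
    return w, x2, y2, z
```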
METHOD AND APPARATUS FOR AMBISONIC SIGNAL REPRODUCTION IN VIRTUAL REALITY SPACE
Provided is a method of reproducing an ambisonic signal in a virtual reality (VR) space. The ambisonic signal reproduction method may include receiving an ambisonic signal, mapping the ambisonic signal to channels localized on a sphere according to an equivalent spatial domain (ESD) standard corresponding to an order of the ambisonic signal, and performing a sound field reproduction in the VR space based on the channels localized on the sphere.
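For first order, mapping to channels localized on the sphere amounts to sampling the field in a fixed set of directions; an (N+1)^2-point layout matches the coefficient count. A first-order sketch with a simple projection decoder (the gain scaling and tetrahedral layout are assumptions, not the ESD standard itself):

```python
import numpy as np

# Four virtual channels at tetrahedral directions: the minimum layout
# matching the (N+1)^2 = 4 coefficients of a first-order signal.
TETRA = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

def foa_to_channels(w, x, y, z, directions=TETRA):
    """Map first-order ambisonic components to channels localized on
    the sphere: each channel weights the pressure (W) and the velocity
    components (X, Y, Z) projected onto its direction."""
    return np.array([0.5 * (w + dx * x + dy * y + dz * z)
                     for dx, dy, dz in directions])
```

A plane wave arriving from a channel's own direction yields unit gain on that channel and reduced gain on the others.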
Methods and devices for encoding and/or decoding immersive audio signals
The present document describes a method (700) for encoding a multi-channel input signal (201). The method (700) comprises determining (701) a plurality of downmix channel signals (203) from the multi-channel input signal (201) and performing (702) energy compaction of the plurality of downmix channel signals (203) to provide a plurality of compacted channel signals (404). Furthermore, the method (700) comprises determining (703) joint coding metadata (205) based on the plurality of compacted channel signals (404) and based on the multi-channel input signal (201), wherein the joint coding metadata (205) is such that it allows upmixing of the plurality of compacted channel signals (404) to an approximation of the multi-channel input signal (201). In addition, the method (700) comprises encoding (704) the plurality of compacted channel signals (404) and the joint coding metadata (205).
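The energy compaction and joint coding metadata can be sketched with an SVD, a KLT/PCA-style transform; the document's actual transform and metadata format are not specified here, so this is illustrative only:

```python
import numpy as np

def compact_and_describe(downmix):
    """Energy-compact downmix channel signals (rows) and derive metadata
    that allows upmixing the compacted channels back to an approximation
    of the input. Here the transform is an SVD, so U serves as the
    joint coding metadata."""
    u, s, vt = np.linalg.svd(downmix, full_matrices=False)
    compacted = np.diag(s) @ vt    # decorrelated, energy-ordered channels
    return compacted, u            # (compacted channel signals, metadata)

def upmix(compacted, metadata):
    """Reconstruct (an approximation of) the input channel signals."""
    return metadata @ compacted
```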
LAYERED CODING FOR COMPRESSED SOUND OR SOUND FIELD REPRESENTATIONS
The present document relates to a method of layered encoding of a compressed sound representation of a sound or sound field. The compressed sound representation comprises a basic compressed sound representation comprising a plurality of components, basic side information for decoding the basic compressed sound representation to a basic reconstructed sound representation of the sound or sound field, and enhancement side information including parameters for improving the basic reconstructed sound representation. The method comprises sub-dividing the plurality of components into a plurality of groups of components and assigning each of the groups to a respective one of a plurality of hierarchical layers, the number of groups corresponding to the number of layers, and the layers including a base layer and one or more hierarchical enhancement layers. The method further comprises adding the basic side information to the base layer, determining a plurality of portions of enhancement side information from the enhancement side information, and assigning each portion to a respective one of the layers, wherein each portion of enhancement side information includes parameters for improving a reconstructed sound representation obtainable from data included in the respective layer and any lower layers. The document further relates to a method of decoding a compressed sound representation encoded in such a plurality of hierarchical layers, as well as to an encoder and a decoder for layered coding of a compressed sound representation.
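The layering scheme can be sketched in a few lines; the group sizes are hypothetical:

```python
def assign_layers(components, group_sizes):
    """Sub-divide the components into groups, one group per layer:
    group 0 forms the base layer, the remaining groups form the
    hierarchical enhancement layers."""
    layers, start = [], 0
    for size in group_sizes:
        layers.append(components[start:start + size])
        start += size
    return layers

def reconstruct(layers, up_to):
    """A decoder of layer `up_to` uses that layer plus all lower
    layers, matching how each portion of enhancement side information
    refers only to its own layer and the layers below it."""
    out = []
    for layer in layers[:up_to + 1]:
        out.extend(layer)
    return out
```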
SPATIAL AUDIO SERVICE
A method, apparatus and computer program are provided for assessing continuity of audio service by comparing a previous audio service with a first spatial audio service, which uses user head-tracking, and with a second audio service, to identify which of the two provides continuity of audio service with respect to the previous audio service. If the first or the second audio service is assessed to provide continuity of audio service, the respective service is selectively enabled. The first spatial audio service controls or sets at least one directional property of at least one sound source, and it is assessed to provide continuity of audio service if it can use head-tracking to control or set that at least one directional property.
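The selection logic reads roughly as follows; the service representation and the concrete continuity test are assumptions made for illustration:

```python
def select_service(previous, first, second):
    """Assess continuity against the previous audio service and pick
    the service to enable. The first (head-tracked spatial) service is
    continuous if head-tracking lets it control or set the directional
    properties the previous service had."""
    def continuity(candidate):
        return set(candidate["directional_properties"]) >= \
               set(previous["directional_properties"])

    if first.get("head_tracking") and continuity(first):
        return "first"
    if continuity(second):
        return "second"
    return None  # neither service provides continuity
```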
Audio renderer based on audiovisual information
An audio renderer can have a machine learning model that jointly processes audio and visual information of an audiovisual recording. The audio renderer can generate output audio channels. Sounds captured in the audiovisual recording and present in the output audio channels are spatially mapped based on the joint processing of the audio and visual information by the machine learning model. Other aspects are described.
Method and device for processing audio signal, using metadata
Disclosed is a device for processing an audio signal, which renders the audio signal. The device includes a processor. The processor receives an audio signal and metadata including first element reference distance information, and renders a first element signal on the basis of the first element reference distance information, wherein the first element reference distance information indicates the reference distance of an element signal. The audio signal may include a second element signal that can be rendered simultaneously with the first element signal, and the metadata may include second element distance information indicating the distance of the second element signal. The number of bits required to represent the first element reference distance information is smaller than the number of bits required to represent the second element distance information.
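The reference-distance idea can be sketched as follows: the first element's reference distance is sent as a small index into a coarse table (hence fewer bits), while the second element carries a finer distance value; rendering gain then follows an attenuation law relative to the reference distance. The inverse-distance law and the table values are assumptions:

```python
REFERENCE_DISTANCES = [0.5, 1.0, 2.0, 5.0]  # hypothetical 2-bit table

def decode_reference_distance(index):
    """First element: reference distance coded with few bits (here 2)."""
    return REFERENCE_DISTANCES[index & 0b11]

def distance_gain(reference_distance, rendering_distance):
    """Inverse-distance attenuation relative to the reference distance
    (unit gain when rendered exactly at the reference distance)."""
    return reference_distance / max(rendering_distance, 1e-6)
```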