H04S2420/03

Spatial audio encoding and reproduction of diffuse sound

A method and apparatus process multi-channel audio by encoding, transmitting, or recording “dry” audio tracks or “stems” in synchronous relationship with time-variable metadata controlled by a content producer and representing a desired degree and quality of diffusion. Audio tracks are compressed and transmitted together with synchronized metadata representing diffusion and, preferably, mix and delay parameters as well. Separating the audio stems from the diffusion metadata facilitates customization of playback at the receiver, taking into account the characteristics of the local playback environment.
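A minimal sketch of the stem-plus-metadata idea described above: dry stems travel alongside time-stamped diffusion metadata, and the receiver looks up the diffusion setting in effect at any playback time. The names (`Stem`, `DiffusionFrame`, `diffusion_at`) and the step-hold lookup are illustrative assumptions, not the patent's actual format.

```python
# Hypothetical sketch: pair "dry" audio stems with per-frame diffusion
# metadata so a receiver can apply diffusion suited to its own room.
from dataclasses import dataclass
from typing import List

@dataclass
class DiffusionFrame:
    time: float        # seconds, synchronised with the stem timeline
    amount: float      # desired degree of diffusion, 0.0 (dry) .. 1.0
    quality: str       # producer-chosen character, e.g. "room", "hall"

@dataclass
class Stem:
    name: str
    samples: List[float]
    rate: int          # sample rate in Hz

def diffusion_at(frames: List[DiffusionFrame], t: float) -> float:
    """Return the diffusion amount in effect at time t.
    Assumes frames are sorted by time; holds the last value (step-hold)."""
    amount = 0.0
    for f in frames:
        if f.time <= t:
            amount = f.amount
        else:
            break
    return amount

meta = [DiffusionFrame(0.0, 0.2, "room"), DiffusionFrame(1.0, 0.8, "hall")]
print(diffusion_at(meta, 0.5))  # 0.2
print(diffusion_at(meta, 1.5))  # 0.8
```

Because the stems stay dry, a receiver can apply more or less diffusion than the metadata requests, e.g. to compensate for an already-reverberant room.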

Audio processing

An audio processing system (100) for spatial synthesis comprises an upmix stage (110) receiving a decoded m-channel downmix signal (X) and outputting, based thereon, an n-channel upmix signal (Y), wherein 2≤m<n. The upmix stage comprises a downmix modifying processor (120), which receives the m-channel downmix signal and outputs a modified downmix signal (d.sub.1, d.sub.2) obtained by cross mixing and non-linear processing of the downmix signal, and further comprises a first mixing matrix (130) receiving the downmix signal and the modified downmix signal, forming an n-channel linear combination of the downmix signal channels and modified downmix signal channels only, and outputting this as the n-channel upmix signal. In an embodiment, the first mixing matrix accepts one or more mixing parameters (g, α.sub.1, . . . ) controlling at least one gain in the linear combination performed by the first mixing matrix. The gains are polynomials of degree ≤2.
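The upmix stage can be sketched as follows for m=2, n=4, with a placeholder non-linearity and arbitrary matrix rows standing in for the parameter-controlled gains of the abstract; none of the numeric values come from the patent.

```python
# Illustrative sketch: a 2-channel downmix X is cross-mixed and
# non-linearly processed into a modified downmix D, then a mixing
# matrix forms an n-channel linear combination of the channels of
# X and D only.

def modify_downmix(x1, x2):
    # cross mixing followed by a simple stand-in non-linearity
    a = 0.7 * x1 + 0.3 * x2
    b = 0.3 * x1 + 0.7 * x2
    nl = lambda v: v / (1.0 + abs(v))
    return nl(a), nl(b)

def upmix(x1, x2, gains):
    """gains: n rows of 4 coefficients over (x1, x2, d1, d2)."""
    d1, d2 = modify_downmix(x1, x2)
    return [g0 * x1 + g1 * x2 + g2 * d1 + g3 * d2
            for (g0, g1, g2, g3) in gains]

# 2 -> 4 upmix with placeholder matrix rows
G = [(1, 0, 0.5, 0), (0, 1, 0, 0.5), (0.5, 0.5, 0, 0), (0, 0, 1, 1)]
y = upmix(0.4, -0.2, G)
print(len(y))  # 4 output channels
```

In the patent's embodiment each gain in such a matrix would be a polynomial of degree at most 2 in the mixing parameters, rather than the fixed constants used here.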

METADATA-PRESERVED AUDIO OBJECT CLUSTERING

Example embodiments disclosed herein relate to audio object clustering. A method for metadata-preserved audio object clustering is disclosed. The method comprises classifying an audio object into at least one category based on rendering mode information in its metadata. The method further comprises assigning a predetermined number of clusters to the categories and rendering the audio object based on the rendering mode. A corresponding system and computer program product are also disclosed.
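A hedged sketch of the classification-and-budget step: objects are grouped by a rendering-mode field in their metadata, and each category receives a predetermined share of the cluster budget. The category names and the budget split are assumptions for illustration only.

```python
# Sketch of metadata-preserved clustering: classify objects by a
# rendering-mode metadata field, then assign each category its
# predetermined number of clusters.

def assign_clusters(objects, budget):
    """objects: list of dicts with a 'mode' metadata field.
    budget: dict mapping rendering mode -> number of clusters."""
    categories = {}
    for obj in objects:
        categories.setdefault(obj["mode"], []).append(obj)
    plan = {}
    for mode, members in categories.items():
        plan[mode] = {"clusters": budget.get(mode, 0), "objects": members}
    return plan

objs = [{"id": 1, "mode": "dialog"}, {"id": 2, "mode": "bed"},
        {"id": 3, "mode": "dialog"}]
plan = assign_clusters(objs, {"dialog": 2, "bed": 1})
print(plan["dialog"]["clusters"])  # 2
```

Keeping the categories separate before clustering is what preserves the rendering-mode metadata: objects with different modes are never merged into the same cluster.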

SPATIAL AUDIO SIGNAL MANIPULATION

Described herein is a method (30) of rendering an audio signal (17) for playback in an audio environment (27) defined by a target loudspeaker system (23), the audio signal (17) including audio data relating to an audio object and associated position data indicative of an object position. Method (30) includes the initial step (31) of receiving the audio signal (17). At step (32) loudspeaker layout data for the target loudspeaker system (23) is received. At step (33) control data is received that is indicative of a position modification to be applied to the audio object in the audio environment (27). At step (38) in response to the position data, loudspeaker layout data and control data, rendering modification data is generated. Finally, at step (39) the audio signal (17) is rendered with the rendering modification data to output the audio signal (17) with the audio object at a modified object position that is between loudspeakers within the audio environment (27).
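The position-modification step can be sketched under a simplifying assumption of a 1-D loudspeaker layout: the control data shifts the object position, and constant-power panning places the shifted object between the two nearest loudspeakers. The gain law and layout here are illustrative, not the patent's rendering method.

```python
# Sketch: apply a control-data position offset, then pan the object
# between the two nearest loudspeakers (constant-power law).
import math

def render_position(obj_pos, offset, speaker_pos):
    """Per-speaker gains after applying the control-data offset."""
    p = obj_pos + offset
    p = max(speaker_pos[0], min(speaker_pos[-1], p))  # clamp to layout
    gains = [0.0] * len(speaker_pos)
    for i in range(len(speaker_pos) - 1):
        lo, hi = speaker_pos[i], speaker_pos[i + 1]
        if lo <= p <= hi:
            frac = (p - lo) / (hi - lo)
            gains[i] = math.cos(frac * math.pi / 2)      # constant power
            gains[i + 1] = math.sin(frac * math.pi / 2)
            break
    return gains

g = render_position(0.25, 0.25, [-1.0, 0.0, 1.0])
```

With the offset applied, the object at 0.5 sits exactly halfway between the speakers at 0.0 and 1.0, so those two gains are equal and the remaining speaker is silent.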

Apparatus and method for reproducing recorded audio with correct spatial directionality

An apparatus comprising: an input configured to receive at least one audio signal from at least one co-operating apparatus; an audio signal analyzer configured to analyze the at least one audio signal to determine at least one audio component position relative to the recording position of the at least one co-operating apparatus; and a processor configured to determine a position value based on the at least one co-operating apparatus recording position and the apparatus position, and further configured to apply the position value to the at least one audio component position, such that the at least one audio component position is substantially aligned with the apparatus position.
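The alignment step above can be sketched in one dimension of rotation: the direction of an analysed audio component, expressed relative to the recording apparatus, is rotated by the difference between the recording and playback apparatus orientations. Degrees and the function name are illustrative assumptions.

```python
# Sketch: compute a position value from the two apparatus orientations
# and apply it so the component's direction is aligned with the
# playback apparatus. Angles in degrees.

def align_direction(component_deg, recorder_deg, playback_deg):
    position_value = playback_deg - recorder_deg   # orientation offset
    return (component_deg + position_value) % 360.0

print(align_direction(30.0, 90.0, 0.0))  # 300.0
```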

Audio decoder for interleaving signals

A method for decoding an encoded audio bitstream in an audio processing system is disclosed. The method includes extracting from the encoded audio bitstream a first waveform-coded signal including spectral coefficients corresponding to frequencies up to a first cross-over frequency and performing parametric decoding at a second cross-over frequency to generate a reconstructed signal. The second cross-over frequency is above the first cross-over frequency and the parametric decoding uses reconstruction parameters derived from the encoded audio bitstream to generate the reconstructed signal. The method further includes extracting from the encoded audio bitstream a second waveform-coded signal including spectral coefficients corresponding to a subset of frequencies above the first cross-over frequency and interleaving the second waveform-coded signal with the reconstructed signal to produce an interleaved signal. The interleaved signal is then combined with the first waveform-coded signal.
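The interleaving step can be sketched per spectral frame: above the first cross-over frequency, any bin that the bitstream waveform-codes replaces the corresponding parametrically reconstructed bin, and the result is combined with the low band. The bin layout and boolean mask below are illustrative assumptions.

```python
# Sketch of the decoder's combine step: waveform-coded high-band bins
# (where present) override the parametric reconstruction, then the
# low band is prepended.

def decode_frame(low_wc, recon, high_wc, high_mask):
    """low_wc: bins up to the first cross-over (waveform-coded).
    recon: parametric reconstruction of bins above that cross-over.
    high_wc / high_mask: waveform-coded bins above the cross-over and
    a boolean mask saying which of those bins are present."""
    interleaved = [h if m else r
                   for r, h, m in zip(recon, high_wc, high_mask)]
    return low_wc + interleaved   # combine with the low band

out = decode_frame([0.9, 0.8], [0.1, 0.2, 0.3],
                   [0.5, 0.0, 0.7], [True, False, True])
print(out)  # [0.9, 0.8, 0.5, 0.2, 0.7]
```

The point of the scheme is that only a subset of high-band bins need waveform coding; parametric reconstruction fills the rest at much lower bit cost.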

Spatial audio processing

According to an example embodiment, a method for processing a multi-channel input audio signal representing a sound field into a multi-channel output audio signal representing said sound field in accordance with a predefined loudspeaker layout is provided, the method comprising the following for at least one frequency band: obtaining spatial audio parameters that are descriptive of spatial characteristics of said sound field; estimating a signal energy of the sound field represented by the multi-channel input audio signal; estimating, based on said signal energy and the obtained spatial audio parameters, respective output signal energies for channels of the multi-channel output audio signal according to said predefined loudspeaker layout; determining a maximum output energy as the largest of the output signal energies across channels of said multi-channel output audio signal; and deriving, on basis of said maximum output energy, a gain value for adjusting sound reproduction gain in at least one of said channels of the multi-channel output audio signal.
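The energy-and-gain chain above can be sketched as follows; the per-channel weight model (a fixed fraction of the input energy per output channel) and the energy threshold are simplifying assumptions standing in for what the spatial audio parameters would actually determine.

```python
# Sketch for one frequency band: estimate input energy, spread it over
# output channels by (assumed) weights, find the loudest channel, and
# derive a limiting gain from that maximum output energy.
import math

def derive_gain(samples, channel_weights, max_energy_allowed):
    energy = sum(s * s for s in samples)          # input energy estimate
    out_energies = [w * energy for w in channel_weights]
    max_out = max(out_energies)                   # loudest output channel
    if max_out <= max_energy_allowed:
        return 1.0                                # no attenuation needed
    return math.sqrt(max_energy_allowed / max_out)

g = derive_gain([1.0, -1.0, 1.0, -1.0], [0.5, 0.3, 0.2], 1.0)
print(round(g, 3))  # 0.707
```

Deriving the gain from the single largest channel energy means one gain can protect every channel of the layout at once.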

Multi-channel decorrelator, multi-channel audio decoder, multi-channel audio encoder, methods and computer program using a premix of decorrelator input signals

A multi-channel decorrelator for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals is configured to premix a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K<N. The multi-channel decorrelator is configured to provide a first set of K′ decorrelator output signals on the basis of the second set of K decorrelator input signals. The multi-channel decorrelator is further configured to upmix the first set of K′ decorrelator output signals into a second set of N′ decorrelator output signals, wherein N′>K′. The multi-channel decorrelator can be used in a multi-channel audio decoder. A multi-channel audio encoder provides complexity control information for the multi-channel decorrelator.
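The complexity-reduction idea can be sketched with small matrices: premix N decorrelator inputs down to K channels (K<N), run only K decorrelators, then upmix the K′ outputs to N′ channels. The matrices and the sign-flip "decorrelator" below are placeholders, not the patent's actual filters.

```python
# Sketch of the premix/decorrelate/upmix chain (N=4, K=K'=2, N'=4).

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def decorrelate(v):
    # stand-in for K real decorrelation filters
    return [-x for x in v]

def multi_channel_decorrelator(inputs, premix, upmix):
    reduced = matvec(premix, inputs)        # N -> K premix
    decorr = decorrelate(reduced)           # only K decorrelators run
    return matvec(upmix, decorr)            # K' -> N' upmix

premix = [[0.5, 0.5, 0.0, 0.0],            # N=4 -> K=2
          [0.0, 0.0, 0.5, 0.5]]
upmix = [[1, 0], [1, 0], [0, 1], [0, 1]]   # K'=2 -> N'=4
out = multi_channel_decorrelator([1.0, 3.0, 2.0, 4.0], premix, upmix)
print(out)  # [-2.0, -2.0, -3.0, -3.0]
```

Running K instead of N decorrelation filters is the complexity saving that the encoder's complexity control information would steer.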

AUGMENTED REALITY HEADPHONE ENVIRONMENT RENDERING
20170223478 · 2017-08-03

Accurate modeling of acoustic reverberation can be essential to generating and providing a realistic virtual reality or augmented reality experience for a participant. In an example, a reverberation signal for playback using headphones can be provided. The reverberation signal can correspond to a virtual sound source signal originating at a specified location in a local listener environment. Providing the reverberation signal can include, among other things, using information about a reference impulse response from a reference environment and using characteristic information about reverberation decay in a local environment of the participant. Providing the reverberation signal can further include using information about a relationship between a volume of the reference environment and a volume of the local environment of the participant.
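A loose sketch of adapting a reference reverberation to a local room: the reference response's decay is replaced by the local environment's decay rate, and the level is adjusted by the volume relationship. The exponential-envelope model and the square-root volume scaling are simplifying assumptions, not the patent's method.

```python
# Sketch: relative reverberation envelope built from the local room's
# decay (RT60) and the reference/local volume relationship.
import math

def adapt_reverb_tail(t, local_rt60, ref_vol, local_vol):
    """Relative reverberation envelope at time t (seconds)."""
    decay = 10.0 ** (-3.0 * t / local_rt60)   # reaches -60 dB at RT60
    level = math.sqrt(ref_vol / local_vol)    # volume relationship
    return level * decay

env = adapt_reverb_tail(0.5, 0.5, 200.0, 50.0)
```

At `t = local_rt60` the envelope has fallen by 60 dB, matching the usual reverberation-time definition.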

Encoding and decoding methods, and encoding and decoding apparatuses for stereo signal

This disclosure provides an encoding method, a decoding method, an encoding apparatus, and a decoding apparatus for a stereo signal. The encoding method includes: performing interpolation processing based on an inter-channel time difference in a current frame and an inter-channel time difference in a previous frame of the current frame; performing time-domain downmixing processing on the delay-aligned stereo signal in the current frame, to obtain a primary-channel signal and a secondary-channel signal in the current frame; and quantizing the inter-channel time difference after the interpolation processing in the current frame, the primary-channel signal, and the secondary-channel signal.
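Two steps of the encoding method can be sketched as follows. The linear interpolation weight of 0.5 and the mid/side-style 0.5 downmix weights are illustrative assumptions; the patent does not fix these values here.

```python
# Sketch: smooth the inter-channel time difference (ITD) across frames,
# then downmix the delay-aligned stereo signal into primary (mid) and
# secondary (side) channels.

def interpolate_itd(prev_itd, cur_itd, alpha=0.5):
    """Interpolate between the previous and current frames' ITD."""
    return (1.0 - alpha) * prev_itd + alpha * cur_itd

def downmix(left, right):
    """Time-domain downmix after delay alignment (mid/side style)."""
    primary = [0.5 * (l + r) for l, r in zip(left, right)]
    secondary = [0.5 * (l - r) for l, r in zip(left, right)]
    return primary, secondary

itd = interpolate_itd(-2.0, 4.0)          # -> 1.0 sample
p, s = downmix([1.0, 0.5], [0.0, 0.5])
print(itd, p, s)  # 1.0 [0.5, 0.5] [0.5, 0.0]
```

The interpolated ITD, the primary-channel signal, and the secondary-channel signal are then what the encoder quantizes.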