H04S5/005

Video-informed Spatial Audio Expansion
20230305800 · 2023-09-28 ·

First video frames that include a visual object and a non-spatialized first audio segment that includes an auditory event are received. If second video frames do not include the visual object and a first time difference between the first video frames and the second video frames does not exceed a certain time, a motion vector of the visual object is used to assign a spatial location to the auditory event in at least one of the second video frames. A second audio segment that includes the auditory event and third video frames are received. If the third video frames do not include the visual object and a second time difference between the first video frames and the third video frames exceeds the certain time, the auditory event is assigned to a diffuse sound field. An audio output that conveys spatial locations of the visual object is output.
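As a rough illustration of the abstract's timing logic, the sketch below extrapolates an off-screen object's position from its last known position and motion vector, then falls back to a diffuse sound field once the elapsed time exceeds the threshold. The function names, the screen-to-azimuth mapping, and the 2-second limit are illustrative assumptions, not details from the patent.

```python
# Minimal sketch, not the patented method itself: extrapolate the visual
# object's last on-screen position with its motion vector to place the
# auditory event, and fall back to a diffuse field after the time limit.
import math

TIME_LIMIT_S = 2.0  # stand-in for the abstract's unspecified "certain time"

def spatialize(last_pos, motion_vec, dt):
    """last_pos/motion_vec in normalized screen coordinates; dt in seconds."""
    if dt <= TIME_LIMIT_S:
        x = last_pos[0] + motion_vec[0] * dt        # extrapolated position
        azimuth = math.degrees(math.atan2(x, 1.0))  # toy screen-to-azimuth map
        return {"mode": "point", "azimuth_deg": round(azimuth, 1)}
    return {"mode": "diffuse"}  # event assigned to a diffuse sound field

print(spatialize((0.4, 0.0), (0.2, 0.0), 1.0))  # still localized
print(spatialize((0.4, 0.0), (0.2, 0.0), 5.0))  # diffuse fallback
```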

Parametric reconstruction of audio signals

An encoding system encodes an N-channel audio signal (X), wherein N ≥ 3, as a single-channel downmix signal (Y) together with dry and wet upmix parameters (C̃, P̃). In a decoding system, a decorrelating section outputs, based on the downmix signal, an (N−1)-channel decorrelated signal (Z); a dry upmix section maps the downmix signal linearly in accordance with dry upmix coefficients (C) determined based on the dry upmix parameters; a wet upmix section populates an intermediate matrix based on the wet upmix parameters and, knowing that the intermediate matrix belongs to a predefined matrix class, obtains wet upmix coefficients (P) by multiplying the intermediate matrix by a predefined matrix, and maps the decorrelated signal linearly in accordance with the wet upmix coefficients; and a combining section combines outputs from the upmix sections to obtain a reconstructed signal (X̂) corresponding to the signal to be reconstructed.
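The decoder-side combination reduces to X̂ = C·Y + P·Z, with P derived from the intermediate matrix and a predefined matrix. The sketch below shows this in NumPy; the matrix contents, the multiplication order, and the toy shapes are assumptions, since the abstract does not specify them.

```python
# Rough NumPy sketch of the decoder-side reconstruction, under assumed
# shapes: N = 3 channels, mono downmix y, (N-1)-channel decorrelated z.
import numpy as np

N, T = 3, 4                      # channels, samples (toy sizes)
y = np.random.randn(1, T)        # single-channel downmix Y
z = np.random.randn(N - 1, T)    # decorrelated signal Z

C = np.random.randn(N, 1)        # dry upmix coefficients from dry parameters
H = np.random.randn(N - 1, N - 1)  # intermediate matrix from wet parameters
V = np.eye(N, N - 1)             # assumed stand-in for the predefined matrix

P = V @ H                        # wet upmix coefficients (order assumed)
x_hat = C @ y + P @ z            # combine dry and wet contributions
print(x_hat.shape)               # (3, 4): reconstructed N-channel signal
```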

Systems and methods for spatial audio rendering

Systems and methods for rendering spatial audio in accordance with embodiments of the invention are illustrated. One embodiment includes a spatial audio system with a primary network-connected speaker that includes a plurality of sets of drivers, where each set of drivers is oriented in a different direction, a processor system, a network interface, and memory containing an audio player application, wherein the audio player application configures the processor system to obtain an audio source stream from an audio source via the network interface, spatially encode the audio source, and decode the spatially encoded audio source to obtain driver inputs for the individual drivers in the plurality of sets of drivers, where the driver inputs cause the drivers to generate directional audio.
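A toy version of the encode-then-decode pipeline is sketched below, using standard first-order ambisonics as a stand-in for the spatial encoding; the formulas and driver-set directions are illustrative, not the patent's specific method.

```python
# Illustrative pipeline: encode a mono source into horizontal first-order
# ambisonics (W/X/Y), then decode to per-driver feeds for driver sets
# oriented in different directions.
import numpy as np

def encode_foa(sample, azimuth_rad):
    """Encode a mono sample into W/X/Y (horizontal first-order ambisonics)."""
    return np.array([sample,
                     sample * np.cos(azimuth_rad),
                     sample * np.sin(azimuth_rad)])

def decode_to_drivers(b_format, driver_azimuths_rad):
    """Basic velocity-style decode toward each driver-set direction."""
    w, x, y = b_format
    return np.array([0.5 * (w + x * np.cos(az) + y * np.sin(az))
                     for az in driver_azimuths_rad])

b = encode_foa(1.0, np.pi / 4)            # source at 45 degrees
drivers = np.radians([0, 90, 180, 270])   # four driver sets, four directions
print(decode_to_drivers(b, drivers))      # driver inputs for directional audio
```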

Renderer controlled spatial upmix

An audio decoder device for decoding a compressed input audio signal includes: at least one core decoder having one or more processors for generating a processor output signal based on a processor input signal, wherein a number of output channels of the processor output signal is higher than a number of input channels of the processor input signal, wherein each of the one or more processors has a decorrelator and a mixer, wherein a core decoder output signal having a plurality of channels comprises the processor output signal, and wherein the core decoder output signal is suitable for a reference loudspeaker setup; at least one format converter device configured to convert the core decoder output signal into an output audio signal suitable for a target loudspeaker setup; and a control device configured to control the one or more processors in such a way that the decorrelator of a processor may be controlled independently from the mixer of that processor, wherein the control device is configured to control at least one of the decorrelators of the one or more processors depending on the target loudspeaker setup.
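One plausible reading of the control device is sketched below: the decorrelated ("wet") path is scaled independently of the mix, with less decorrelation when the format converter targets fewer loudspeakers. The names and the scaling rule are assumptions, not the patent's concrete control rules.

```python
# Hedged sketch: independent control of decorrelator and mixer, with the
# decorrelator gain chosen from the target loudspeaker setup.
import numpy as np

def upmix_processor(core, decorr, mix_gain, decorr_gain):
    """Mixer combines the core signal with an independently scaled wet path."""
    return mix_gain * core + decorr_gain * decorr

def decorrelator_gain(reference_channels, target_channels):
    # Assumed rule: fewer target speakers -> less decorrelated energy,
    # reducing downmix comb-filtering artifacts after format conversion.
    return min(1.0, target_channels / reference_channels)

core = np.random.randn(1024)
decorr = np.random.randn(1024)   # stand-in for a real decorrelator output
g = decorrelator_gain(reference_channels=22, target_channels=5)
out = upmix_processor(core, decorr, mix_gain=1.0, decorr_gain=g)
```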

METHOD FOR GENERATING SOUND AND DEVICES FOR PERFORMING SAME
20220021998 · 2022-01-20 ·

Disclosed are a method for generating a sound and devices for performing the same. The method for generating a sound, according to one embodiment, includes acquiring a real sound generated in a real space and a play sound generated in a virtual space, and generating, by combining the real sound and the play sound, a combined sound generated in a mixed reality in which the real space and the virtual space are mixed.
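The combining step can be as simple as a weighted mix of the two signals; the equal-power weighting below is an illustrative choice, not specified by the abstract.

```python
# Tiny sketch of the combining step: mix a captured real-space sound with
# a virtual "play" sound to form the mixed-reality output.
import numpy as np

def combine(real_sound, play_sound, mix=0.5):
    """Equal-power weighted sum; both inputs are same-length sample arrays."""
    return np.sqrt(1.0 - mix) * real_sound + np.sqrt(mix) * play_sound

real = np.random.randn(48000)   # 1 s of captured real-space audio (toy data)
play = np.random.randn(48000)   # 1 s of virtual-space audio
mixed = combine(real, play, mix=0.4)
```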

MULTI-CHANNEL IN-VEHICLE SOUND SYSTEM
20230300551 · 2023-09-21 ·

The disclosure relates to a multi-channel in-vehicle sound system comprising an audio processing device, a power amplifier, and a plurality of speakers, wherein the plurality of speakers comprise: a central speaker arranged in the center of the front of a vehicle cabin; a left speaker arranged in the left front of the vehicle cabin; a right speaker arranged in the right front of the vehicle cabin; a left surround speaker arranged on the left of the vehicle cabin; a right surround speaker arranged on the right of the vehicle cabin; and a left front 3D speaker and a right front 3D speaker arranged on two sides of a roof, corresponding to the positions of the front seats. For any one of the plurality of speakers, azimuth compensation for that speaker is provided by means of the speakers located around it. According to the multi-channel in-vehicle sound system of the present disclosure, a more immersive sound field can be provided, a better sound experience can be achieved, and timbre can be better restored.
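A common way to realize azimuth compensation between neighboring speakers is amplitude panning, sketched below; the speaker angles and the constant-power pan law are illustrative assumptions, since the patent does not disclose its compensation formula.

```python
# Sketch of azimuth compensation via neighboring speakers: constant-power
# panning between the two speakers bracketing the target direction.
import numpy as np

SPEAKERS_DEG = {"center": 0, "right": 30, "right_front_3d": 60,
                "right_surround": 110}  # assumed cabin layout angles

def pan_pair(target_deg, a_deg, b_deg):
    """Constant-power panning gains for a target between speakers a and b."""
    frac = (target_deg - a_deg) / (b_deg - a_deg)
    return np.cos(frac * np.pi / 2), np.sin(frac * np.pi / 2)

g_right, g_3d = pan_pair(45, SPEAKERS_DEG["right"],
                         SPEAKERS_DEG["right_front_3d"])
print(round(g_right, 3), round(g_3d, 3))  # gains steering a 45-degree image
```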

Data processing method and apparatus, acquisition device, and storage medium
11223923 · 2022-01-11 ·

Disclosed is a data processing method comprising: receiving M channels of encoded audio data; decoding the M channels of encoded audio data to acquire spatial information of the audio corresponding to the M channels of audio data, wherein M is a positive integer; determining Q speaker devices corresponding to the M channels of audio data according to the acquired spatial information of the audio and position information of the speaker devices, wherein Q is a positive integer; and rendering the M channels of audio data with the determined Q speaker devices. Embodiments of the present invention further provide an acquisition device, a data processing device, and a storage medium.
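The speaker-determination step might, for instance, pick the loudspeakers whose directions best match each channel's spatial information; the cosine-similarity metric and the choice of Q below are illustrative assumptions.

```python
# Assumed sketch of the speaker-selection step: for a decoded channel's
# spatial direction, return the Q loudspeakers closest in angle to it.
import numpy as np

speaker_pos = np.array([[1, 0, 0], [0, 1, 0],
                        [-1, 0, 0], [0, -1, 0]], dtype=float)  # unit vectors

def select_speakers(channel_dir, q=2):
    """Indices of the q speakers closest in angle to channel_dir."""
    d = channel_dir / np.linalg.norm(channel_dir)
    scores = speaker_pos @ d               # cosine similarity per speaker
    return np.argsort(scores)[::-1][:q]

print(select_speakers(np.array([0.7, 0.7, 0.0])))  # speakers 0 and 1 (any order)
```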

Methods and Apparatus for Rendering Audio Objects

Multiple virtual source locations may be defined for a volume within which audio objects can move. A set-up process for rendering audio data may involve receiving reproduction speaker location data and pre-computing gain values for each of the virtual sources according to the reproduction speaker location data and each virtual source location. The gain values may be stored and used during “run time,” when audio reproduction data are rendered for the speakers of the reproduction environment. During run time, for each audio object, contributions from virtual source locations within an area or volume defined by the audio object position data and the audio object size data may be computed. A set of gain values for each output channel of the reproduction environment may be computed based, at least in part, on the computed contributions. Each output channel may correspond to at least one reproduction speaker of the reproduction environment.
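The two-phase structure is sketched below: set-up precomputes a gain per (virtual source, speaker) pair, and run time sums the contributions of virtual sources inside an object's extent. The grid, the inverse-distance gain law, and the normalization are assumptions for illustration. Precomputing trades memory for run-time cost, which is the scheme's apparent point.

```python
# Sketch of set-up precomputation plus run-time contribution summation
# over virtual sources within an audio object's position+size volume.
import numpy as np

grid = np.stack(np.meshgrid(np.linspace(0, 1, 8),
                            np.linspace(0, 1, 8)), -1).reshape(-1, 2)
speakers = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])

# Set-up: precompute gains (toy inverse-distance law) per source/speaker.
dists = np.linalg.norm(grid[:, None, :] - speakers[None, :, :], axis=-1)
precomputed = 1.0 / (dists + 1e-3)        # shape: (num_sources, num_speakers)

def runtime_gains(obj_pos, obj_size):
    """Sum precomputed gains over virtual sources within the object's extent."""
    inside = np.linalg.norm(grid - obj_pos, axis=-1) <= obj_size
    g = precomputed[inside].sum(axis=0)
    return g / (np.linalg.norm(g) + 1e-12)  # one gain per output channel

print(runtime_gains(np.array([0.5, 0.5]), 0.25))
```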

Methods and systems for extended reality audio processing for near-field and far-field audio reproduction

An exemplary mobile edge compute (“MEC”) server implementing an extended reality audio processing system generates a near-field audio data stream and a far-field audio data stream. The near-field audio data stream is configured to be rendered by a near-field rendering system, while the far-field audio data stream is configured to be rendered by a far-field rendering system. The near-field and far-field audio data streams are each representative of virtual sound presented to an avatar of a user experiencing an extended reality world. The MEC server provides the near-field and far-field audio data streams to a media player device separate from the MEC server and implementing the near-field and far-field rendering systems. Specifically, the MEC server provides the audio data streams for concurrent rendering by the media player device as the user experiences the extended reality world using the media player device. Corresponding methods and systems are also disclosed.
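One plausible server-side routing rule is sketched below: each virtual source is assigned to the near-field or far-field stream by its distance to the avatar. The threshold and the routing rule are assumptions, not disclosed details.

```python
# Illustrative split only: partition virtual sources into near-field and
# far-field mixes by distance to the avatar, yielding the two streams.
import numpy as np

NEAR_FIELD_RADIUS = 1.5  # assumed metres around the avatar

def split_streams(sources, avatar_pos):
    """sources: list of (position, samples); returns (near_mix, far_mix)."""
    near, far = [], []
    for pos, samples in sources:
        dist = np.linalg.norm(np.asarray(pos) - avatar_pos)
        (near if dist < NEAR_FIELD_RADIUS else far).append(samples)
    mix = lambda chunks: np.sum(chunks, axis=0) if chunks else np.zeros(4)
    return mix(near), mix(far)

sources = [((0.5, 0, 0), np.ones(4)), ((4.0, 0, 0), np.full(4, 0.2))]
near_stream, far_stream = split_streams(sources, np.zeros(3))
```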

Reordering Of Audio Objects In The Ambisonics Domain
20220030372 · 2022-01-27 ·

In general, disclosed is a device that includes a memory and one or more processors, coupled to the memory, configured to perform an energy analysis with respect to one or more audio objects, in the ambisonics domain, in a first time segment. The one or more processors are also configured to perform a similarity measure between the one or more audio objects, in the ambisonics domain, in the first time segment and the one or more audio objects, in the ambisonics domain, in a second time segment. In addition, the one or more processors are configured to reorder the one or more audio objects, in the ambisonics domain, in the first time segment with the one or more audio objects, in the ambisonics domain, in the second time segment, to generate one or more reordered audio objects in the first time segment.
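The three steps can be sketched as below: per-object energy, a similarity matrix between segments, and a reorder that aligns objects across segments. The normalized correlation of ambisonic coefficients is an assumed stand-in for the patent's similarity measure; a production matcher would enforce a one-to-one assignment (e.g. Hungarian) rather than a greedy argmax.

```python
# Rough sketch: energy analysis, cross-segment similarity, greedy reorder
# of ambisonics-domain audio objects so they line up across segments.
import numpy as np

def energies(objs):
    """objs: (num_objects, ambisonic_coeffs, frames) -> per-object energy."""
    return np.sum(objs ** 2, axis=(1, 2))

def reorder_to_match(objs, reference):
    """Greedy reorder: objs[i] becomes the object most similar to reference[i]."""
    a = reference.reshape(len(reference), -1)
    b = objs.reshape(len(objs), -1)
    sim = (a @ b.T) / (np.linalg.norm(a, axis=1)[:, None]
                       * np.linalg.norm(b, axis=1)[None, :] + 1e-12)
    return objs[sim.argmax(axis=1)]

seg1 = np.random.randn(3, 4, 16)   # 3 objects, 4 ambisonic coeffs, 16 frames
seg2 = np.random.randn(3, 4, 16)
print(energies(seg1))                       # energy analysis, first segment
reordered = reorder_to_match(seg2, seg1)    # segment-2 objects aligned to segment 1
```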