H04S2400/01

Spatial audio data exchange

A device includes one or more processors configured to execute instructions to obtain, at a first audio output device, first spatial audio data and a first reference time associated with the first spatial audio data, and to cause the first reference time and data representing at least a portion of the first spatial audio data to be transmitted from the first audio output device. The instructions further cause the one or more processors to receive, at the first audio output device from a second audio output device, second spatial audio data and a second reference time. The instructions further cause the one or more processors to, based on the first reference time and the second reference time, time-align the first spatial audio data and the second spatial audio data to generate combined audio data representing a three-dimensional (3D) sound field and to generate audio output based on the combined audio data.
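
The time-alignment step in this abstract can be illustrated with a minimal sketch. It is not taken from the patent. It assumes both reference times are start times in seconds on a shared clock, that the samples are mono (a multi-channel spatial format would apply the same shift to every channel), and that combining means a plain sum. The function name `time_align_and_mix` and its parameters are hypothetical.

```python
def time_align_and_mix(first, first_ref, second, second_ref, sample_rate):
    """Align two sample lists by their reference times and sum them.

    Assumption: each reference time is the start time of its buffer,
    in seconds, measured on a clock shared by both devices.
    """
    # Offset of the second stream relative to the first, in samples.
    offset = round((second_ref - first_ref) * sample_rate)

    # Whichever stream starts later is shifted right by the offset.
    pad_first = max(-offset, 0)
    pad_second = max(offset, 0)

    length = max(pad_first + len(first), pad_second + len(second))
    combined = [0.0] * length
    for i, sample in enumerate(first):
        combined[pad_first + i] += sample
    for i, sample in enumerate(second):
        combined[pad_second + i] += sample
    return combined


# The second buffer starts 2 ms later, which is 2 samples at 1 kHz.
mixed = time_align_and_mix([1.0] * 4, 0.0, [1.0] * 4, 0.002, 1000)
print(mixed)
```

Samples where only one stream is present pass through unchanged; a real device would also have to handle clock drift between the two devices, which this sketch ignores.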

METHODS FOR PARAMETRIC MULTI-CHANNEL ENCODING

The present document relates to audio coding systems. In particular, the present document relates to efficient methods and systems for parametric multi-channel audio coding. An audio encoding system configured to generate a bitstream indicative of a downmix signal and spatial metadata for generating a multi-channel upmix signal from the downmix signal is described. The system comprises a downmix processing unit configured to generate the downmix signal from a multi-channel input signal; wherein the downmix signal comprises m channels and wherein the multi-channel input signal comprises n channels; n, m being integers with m<n. Furthermore, the system comprises a parameter processing unit configured to determine the spatial metadata from the multi-channel input signal. In addition, the system comprises a configuration unit configured to determine one or more control settings for the parameter processing unit based on one or more external settings; wherein the one or more external settings comprise a target data-rate for the bitstream and wherein the one or more control settings comprise a maximum data-rate for the spatial metadata.
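
As a rough illustration of the downmix and configuration units described above, the following sketch mixes n input channels down to m channels with a gain matrix and derives a metadata data-rate cap from the target data-rate. The gain matrix, the `metadata_share` fraction, and both function names are assumptions made for illustration; the abstract does not specify how either value is chosen.

```python
def downmix(frames, matrix):
    """Mix n-channel frames down to m channels.

    frames: list of frames, each a list of n samples.
    matrix: m rows of n gains (hypothetical values, not from the source).
    """
    m, n = len(matrix), len(matrix[0])
    if m >= n:
        raise ValueError("a downmix requires m < n")
    return [
        [sum(gain * sample for gain, sample in zip(row, frame)) for row in matrix]
        for frame in frames
    ]


def max_metadata_rate(target_data_rate, metadata_share):
    """Control setting: cap the spatial metadata at a share of the target rate.

    Assumption: a fixed fraction; an actual encoder may use other rules.
    """
    return target_data_rate * metadata_share


# Three input channels (left, right, centre) mixed down to stereo.
stereo = downmix([[1.0, 2.0, 4.0]], [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]])
print(stereo)                          # one stereo frame
print(max_metadata_rate(64000, 0.25))  # bits per second reserved for metadata
```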

Delayed audio following
11477599 · 2022-10-18

Disclosed herein are systems and methods for presenting mixed reality audio. In an example method, audio is presented to a user of a wearable head device. A first position of the user's head at a first time is determined based on one or more sensors of the wearable head device. A second position of the user's head at a second time later than the first time is determined based on the one or more sensors. An audio signal is determined based on a difference between the first position and the second position. The audio signal is presented to the user via a speaker of the wearable head device. Determining the audio signal comprises determining an origin of the audio signal in a virtual environment. Presenting the audio signal to the user comprises presenting the audio signal as if originating from the determined origin. Determining the origin of the audio signal comprises applying an offset to a position of the user's head.
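
One way to picture the offset step is a sound origin that trails the head: the origin covers only part of the head's movement between the two sensor readings. The sketch below is a hypothetical interpretation, not the claimed method. The names `delayed_origin`, `rest_offset`, and `follow_fraction` are assumptions, as is the linear relation between head movement and offset.

```python
def delayed_origin(first_pos, second_pos, rest_offset, follow_fraction):
    """Place the audio origin relative to the head's second position.

    rest_offset: where the origin sits relative to a stationary head.
    follow_fraction: share of the head movement the origin has already
    followed (1.0 means no lag). Both are illustrative assumptions.
    """
    # Head movement between the two sensor readings.
    delta = [b - a for a, b in zip(first_pos, second_pos)]

    # The offset shrinks the rest offset by the movement not yet followed.
    offset = [r - (1 - follow_fraction) * d for r, d in zip(rest_offset, delta)]

    # Apply the offset to the current head position.
    return tuple(p + o for p, o in zip(second_pos, offset))


# The head moves 2 m along x; the origin has followed half of that movement.
print(delayed_origin((0, 0, 0), (2, 0, 0), (0, 0, 1), 0.5))
```

With `follow_fraction` set to 1.0 the origin stays rigidly attached to the head, and with 0.0 it stays where it was before the movement.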

SPATIAL AUDIO FOR INTERACTIVE AUDIO ENVIRONMENTS

Systems and methods of presenting an output audio signal to a listener located at a first location in a virtual environment are disclosed. According to embodiments of a method, an input audio signal is received. A first intermediate audio signal corresponding to the input audio signal is determined, based on a location of the sound source in the virtual environment, and the first intermediate audio signal is associated with a first bus. A second intermediate audio signal is determined. The second intermediate audio signal corresponds to a reverberation of the input audio signal in the virtual environment. The second intermediate audio signal is determined based on a location of the sound source, and further based on an acoustic property of the virtual environment. The second intermediate audio signal is associated with a second bus. The output audio signal is presented to the listener via the first and second buses.
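
The two-bus structure can be sketched as follows, under heavy simplification. The direct bus applies inverse-distance attenuation, and the reverberation bus is stood in for by a single delayed, decayed copy of the input. Both choices are assumptions, as are the function name `render_buses` and the parameters `room_decay` and `delay_samples`; a real renderer would use a proper reverberator.

```python
import math


def render_buses(input_signal, source_pos, listener_pos, room_decay, delay_samples):
    """Return (direct_bus, reverb_bus, output) for a mono input signal."""
    # Distance-based gain; clamped so sources closer than 1 unit are not boosted.
    distance = max(math.dist(source_pos, listener_pos), 1.0)

    # First bus: direct sound, depends only on the source location.
    direct_bus = [s / distance for s in input_signal]

    # Second bus: a one-tap echo standing in for reverberation. It depends on
    # the source location (distance) and on a room property (room_decay).
    reverb_bus = [0.0] * len(input_signal)
    for i in range(delay_samples, len(input_signal)):
        reverb_bus[i] = room_decay * input_signal[i - delay_samples] / distance

    # The listener hears the sum of both buses.
    output = [d + r for d, r in zip(direct_bus, reverb_bus)]
    return direct_bus, reverb_bus, output


direct, reverb, out = render_buses([1.0, 0.0, 0.0, 0.0], (2, 0, 0), (0, 0, 0), 0.5, 2)
print(out)
```

Keeping the two signals on separate buses means the reverberation can be processed once per environment rather than once per source, which is one plausible reason for the split.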

METHODS, APPARATUS AND SYSTEMS FOR 6DOF AUDIO RENDERING AND DATA REPRESENTATIONS AND BITSTREAM STRUCTURES FOR 6DOF AUDIO RENDERING

The present disclosure relates to methods, apparatus and systems for encoding an audio signal into a bitstream, in particular at an encoder, comprising: encoding or including audio signal data associated with 3DoF audio rendering into one or more first bitstream parts of the bitstream, and encoding or including metadata associated with 6DoF audio rendering into one or more second bitstream parts of the bitstream. The present disclosure further relates to methods, apparatus and systems for decoding an audio signal and audio rendering based on the bitstream.

DUAL LISTENER POSITIONS FOR MIXED REALITY
20230065046 · 2023-03-02

A method of presenting audio comprises: identifying a first ear listener position and a second ear listener position in a mixed reality environment; identifying a first virtual sound source in the mixed reality environment; identifying a first object in the mixed reality environment; determining a first audio signal in the mixed reality environment, wherein the first audio signal originates at the first virtual sound source and intersects the first ear listener position; determining a second audio signal in the mixed reality environment, wherein the second audio signal originates at the first virtual sound source, intersects the first object, and intersects the second ear listener position; determining a third audio signal based on the second audio signal and the first object; presenting, to a first ear of a user, the first audio signal; and presenting, to a second ear of the user, the third audio signal.

Sound Field Related Rendering
20220328056 · 2022-10-13

An apparatus including circuitry configured to obtain a defocus direction; process a spatial audio signal that represents an audio scene to generate a processed spatial audio signal that represents a modified audio scene based on the defocus direction, so as to control relative deemphasis in, at least in part, a portion of the spatial audio signal in the defocus direction relative to at least in part other portions of the spatial audio signal; and output the processed spatial audio signal, wherein the modified audio scene based on the defocus direction enables the deemphasis in, at least in part, the portion of the spatial audio signal in the defocus direction relative to at least in part other portions of the spatial audio signal.

Method and Apparatus for Low Complexity Low Bitrate 6DOF HOA Rendering

An apparatus for generating an immersive audio scene, the apparatus including circuitry configured to: obtain audio scene based sources, wherein the audio scene based sources are associated with one or more positions in an audio scene and each audio scene based source includes at least one spatial parameter and at least one audio signal; determine at least one position associated with at least one of the audio scene based sources; generate at least one audio source based on the determined at least one position, wherein the circuitry is configured to: generate at least one spatial audio parameter; and generate at least one audio source signal; and generate information about a relationship between the generated at least one spatial audio parameter and the at least one audio signal, wherein the generated at least one audio source is selected based on a renderer preference.

Apparatus and method for audio rendering employing a geometric distance definition

An apparatus for playing back an audio object associated with a position includes a distance calculator for calculating distances of the position to speakers or for reading the distances of the position to the speakers. The distance calculator is configured to take a solution with a smallest distance. The apparatus is configured to play back the audio object using the speaker corresponding to the solution.
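
The selection rule in this abstract amounts to a minimum search over speaker distances. The sketch below assumes Euclidean distance in three dimensions; the abstract does not state which distance definition is used, and the speaker layout and the name `nearest_speaker` are hypothetical.

```python
import math


def nearest_speaker(object_pos, speakers):
    """Return the name of the speaker closest to the audio object.

    speakers: mapping from speaker name to (x, y, z) position.
    Assumption: plain Euclidean distance.
    """
    return min(speakers, key=lambda name: math.dist(object_pos, speakers[name]))


layout = {
    "left": (-1.0, 0.0, 0.0),
    "right": (1.0, 0.0, 0.0),
    "center": (0.0, 1.0, 0.0),
}
print(nearest_speaker((0.8, 0.1, 0.0), layout))
```

If the distances are read from a precomputed table instead of calculated, the same `min` call applies to the table entries.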

Immersive media with media device
11632642 · 2023-04-18

Aspects of the subject disclosure may include, for example, a method, comprising: receiving, by a media processor including a processor, spherical audiovisual media content from a content delivery network; rendering, by the media processor, video for a point of view in the spherical audiovisual media content at a display device coupled to the media processor; receiving, from a remote control device coupled to the media processor, a control signal panning the point of view, resulting in a new field of view; and generating, by the media processor, audio signals from the spherical audiovisual media content corresponding to the new field of view, wherein the audio signals are adapted to audio reproduction equipment coupled to the media processor. Other embodiments are disclosed.