Patent classifications
H04S2420/11
Streaming binaural audio from a cloud spatial audio processing system to a mobile station for playback on a personal audio delivery device
Spatial audio is received from an audio server over a first communication link. The spatial audio is converted by a cloud spatial audio processing system into binaural audio. The binaural audio is streamed from the cloud spatial audio processing system to a mobile station over a second communication link to cause the mobile station to play the binaural audio on the personal audio delivery device.
Sound recording apparatus, sound system, sound recording method, and carrier means
An apparatus, system, and method, each of which: acquires sound data generated from a plurality of sound signals collected at a plurality of microphones; acquires, from one or more sensors, a result of detecting a position of the sound recording apparatus at a time point during the period in which the sound signals are collected; and stores, in a memory, position data indicating the detected position in association with the sound data generated from the sound signals collected at that time point.
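The storage step above amounts to keeping each audio frame and its detected capture position together as one record. A minimal sketch, assuming a simple per-sample average as the "sound data generated from a plurality of sound signals"; the names `PositionedFrame`, `Recorder`, and `record_frame` are illustrative, not from the patent:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PositionedFrame:
    timestamp: float                # time point of capture
    position: Tuple[float, float]   # detected (x, y) device position
    samples: List[float]            # sound data generated from the mic signals

@dataclass
class Recorder:
    frames: List[PositionedFrame] = field(default_factory=list)

    def record_frame(self, timestamp, position, mic_signals):
        # Generate sound data from the plurality of sound signals:
        # here, a per-sample average across the microphones (an assumption).
        n = len(mic_signals)
        mixed = [sum(s) / n for s in zip(*mic_signals)]
        # Store position data and sound data in association with each other.
        self.frames.append(PositionedFrame(timestamp, position, mixed))

rec = Recorder()
rec.record_frame(0.0, (1.0, 2.0), [[0.1, 0.2], [0.3, 0.4]])
```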
METHOD AND DEVICE FOR PROCESSING AUDIO SIGNAL, USING METADATA
Disclosed is a device for processing an audio signal, which renders the audio signal. The device includes a processor. The processor receives an audio signal and metadata including first element reference distance information, and renders a first element signal on the basis of the first element reference distance information, where the first element reference distance information indicates the reference distance of an element signal. The audio signal may include a second element signal that can be rendered simultaneously with the first element signal, and the metadata may include second element distance information indicating the distance of the second element signal. The number of bits required to represent the first element reference distance information is smaller than the number of bits required to represent the second element distance information.
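The bit-budget idea at the end of the abstract can be illustrated with a simple quantizer: a shared reference distance is coded with fewer bits than a per-element distance, trading precision for size. The log-domain uniform quantization scheme below is an assumption for illustration, not taken from the patent:

```python
import math

def quantize_distance(d, bits, d_min=0.1, d_max=100.0):
    """Uniformly quantize log(d) into 2**bits levels; return the index."""
    levels = (1 << bits) - 1
    t = (math.log(d) - math.log(d_min)) / (math.log(d_max) - math.log(d_min))
    return round(min(max(t, 0.0), 1.0) * levels)

def dequantize_distance(idx, bits, d_min=0.1, d_max=100.0):
    """Recover an approximate distance from a quantization index."""
    levels = (1 << bits) - 1
    t = idx / levels
    return math.exp(math.log(d_min) + t * (math.log(d_max) - math.log(d_min)))

# Reference distance: coarse 4-bit code; element distance: finer 8-bit code.
ref_idx = quantize_distance(2.0, bits=4)
elem_idx = quantize_distance(2.0, bits=8)
```

With the finer 8-bit code, the reconstructed distance is closer to the original 2.0 m than with the 4-bit code, which is the point of spending more bits on the per-element distance.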
Time domain neural networks for spatial audio reproduction
A device for reproducing spatial audio using a machine learning model may include at least one processor configured to receive multiple audio signals corresponding to a sound scene captured by respective microphones of a device. The at least one processor may be further configured to provide the multiple audio signals to a machine learning model, the machine learning model having been trained based at least in part on a target rendering configuration. The at least one processor may be further configured to provide, responsive to providing the multiple audio signals to the machine learning model, multichannel audio signals that comprise a spatial reproduction of the sound scene in accordance with the target rendering configuration.
Method and Apparatus for Communication Audio Handling in Immersive Audio Scene Rendering
An apparatus for rendering a communication audio signal within an immersive audio scene, the apparatus comprising means configured to: obtain at least one spatial audio signal for rendering within the immersive audio scene; obtain the communication audio signal and positional information associated with the communication audio signal; obtain a rendering processing parameter associated with the communication audio signal; determine a rendering method based on the rendering processing parameter; and determine an insertion point in a rendering processing for the determined rendering method and/or a selection of rendering elements for the determined rendering method based on the rendering processing parameter.
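The last two steps above are a dispatch: the rendering processing parameter selects both a rendering method and where in the processing chain it is inserted. A minimal sketch of that selection; the parameter values, method names, and insertion points are illustrative assumptions, not the patent's:

```python
def select_rendering(param):
    """Map a rendering processing parameter to (rendering method, insertion point)."""
    table = {
        "passthrough": ("direct_binaural", "post_scene_mix"),
        "spatialized": ("object_renderer", "pre_scene_mix"),
        "reverberant": ("room_renderer", "pre_reverb"),
    }
    if param not in table:
        raise ValueError(f"unknown rendering processing parameter: {param}")
    return table[param]

# The chosen method and insertion point then configure the scene renderer.
method, insertion = select_rendering("spatialized")
```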
Audio processing method and apparatus
An audio processing method includes: M first audio signals are obtained by processing a to-be-processed audio signal by M first virtual speakers; N second audio signals are obtained by processing the to-be-processed audio signal by N second virtual speakers; M first head-related transfer functions (HRTFs) centered at a left ear position and N second HRTFs centered at a right ear position are obtained; a first target audio signal is obtained based on the M first audio signals and the M first HRTFs; and a second target audio signal is obtained based on the N second audio signals and the N second HRTFs.
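The two-path structure in the abstract can be sketched directly: each virtual speaker feed is convolved with that speaker's ear-specific HRTF and the results are summed per ear. The toy impulse responses and per-speaker gains below are assumptions for illustration only:

```python
import numpy as np

def binauralize(x, left_hrtfs, right_hrtfs, left_gains, right_gains):
    """x: mono input; *_hrtfs: per-virtual-speaker impulse responses; *_gains: speaker gains."""
    def ear_signal(hrtfs, gains):
        out = None
        for h, g in zip(hrtfs, gains):
            y = np.convolve(g * x, h)   # virtual speaker feed through its HRTF
            out = y if out is None else out + y
        return out
    # M left-ear paths give the first target signal, N right-ear paths the second.
    return ear_signal(left_hrtfs, left_gains), ear_signal(right_hrtfs, right_gains)

x = np.array([1.0, 0.5, 0.25])
left, right = binauralize(x, [np.array([1.0, 0.3])], [np.array([0.8])], [1.0], [1.0])
```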
METHOD AND DEVICE FOR DECODING A HIGHER-ORDER AMBISONICS (HOA) REPRESENTATION OF AN AUDIO SOUNDFIELD
The invention discloses rendering sound field signals, such as Higher-Order Ambisonics (HOA), for arbitrary loudspeaker setups, where the rendering results in highly improved localization properties and is energy preserving. This is obtained by rendering an audio sound field representation for arbitrary spatial loudspeaker setups and/or by a decoder that decodes based on a decode matrix (D). The decode matrix (D) is based on smoothing and scaling of a first decode matrix {circumflex over (D)} with smoothing coefficients. The first decode matrix {circumflex over (D)} is based on a mix matrix G and a mode matrix {tilde over (Ψ)}, where the mix matrix G was determined based on L speakers and positions of a spherical modelling grid related to a HOA order N, and the mode matrix {tilde over (Ψ)} was determined based on the spherical modelling grid and the HOA order N.
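At playback time, matrix-based HOA decoding reduces to one multiplication: the loudspeaker signals are the decode matrix D applied to the HOA coefficient vector. The derivation of D from the mix matrix G and mode matrix with smoothing, as the patent describes, is omitted here; the toy first-order D for two horizontal speakers is an assumption for illustration:

```python
import numpy as np

def decode_hoa(D, hoa_coeffs):
    """D: (n_speakers, n_coeffs) decode matrix; hoa_coeffs: (n_coeffs, n_samples)."""
    return D @ hoa_coeffs

# Toy decode matrix: 2 speakers, 3 horizontal 1st-order HOA components.
D = np.array([[0.5,  0.5, 0.0],
              [0.5, -0.5, 0.0]])
hoa = np.array([[1.0], [1.0], [0.0]])   # a source fully on one side
spk = decode_hoa(D, hoa)
```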
Audio Rendering with Spatial Metadata Interpolation
An apparatus including circuitry configured to: obtain two or more audio signal sets, wherein each audio signal set is associated with a position; obtain at least one parameter value for at least two of the audio signal sets; obtain the positions associated with the at least two of the audio signal sets; obtain a listener position; generate at least one audio signal based on at least one audio signal from at least one of the two or more audio signal sets, based on the positions associated with the at least two of the audio signal sets and the listener position; generate at least one modified parameter value based on the obtained at least one parameter value for the at least two of the audio signal sets, the positions associated with the at least two of the audio signal sets, and the listener position; and process the at least one audio signal based on the at least one modified parameter value to generate a spatial audio output.
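The "modified parameter value" step above blends the spatial metadata of several capture positions into one value according to the listener position. A minimal sketch, assuming inverse-distance weighting as the blending scheme (the patent does not specify one here):

```python
import numpy as np

def interpolate_parameter(values, positions, listener_pos, eps=1e-6):
    """values: per-set parameter values; positions: per-set capture positions (x, y)."""
    d = np.linalg.norm(np.asarray(positions, float) - np.asarray(listener_pos, float), axis=1)
    w = 1.0 / (d + eps)     # closer capture positions get larger weights
    w /= w.sum()
    return float(np.dot(w, values))

# A listener at the first capture position gets (essentially) its parameter value.
v = interpolate_parameter([0.2, 0.8], [(0.0, 0.0), (4.0, 0.0)], (0.0, 0.0))
```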
Spatial audio parameters
An apparatus including circuitry configured for: defining at least one parameter field associated with an input multi-channel audio signal, the at least one parameter field configured to describe at least one characteristic of the multi-channel audio signal; determining at least one spatial audio parameter associated with the multi-channel audio signal; and controlling a rendering of the multi-channel audio signal by processing the input multi-channel audio signal using at least the at least one characteristic of the multi-channel audio signal and the at least one spatial audio parameter.
SYSTEM AND METHOD FOR INTERPOLATING A HEAD-RELATED TRANSFER FUNCTION
This disclosure describes a system and method for Head-Related Transfer Function (HRTF) interpolation when an HRTF dataset does not contain a particular direction associated with a desired source. The disclosed HRTF interpolation uses a finite set of HRTFs from a dataset to obtain the HRTF of any possible direction and distance, even if that direction or distance is not present in the dataset.
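The basic idea can be sketched as blending the impulse responses of the nearest measured directions, weighted by angular proximity. The two-neighbor linear weighting and the dataset layout below are assumptions for illustration, not the patent's specific method:

```python
import numpy as np

def interpolate_hrtf(target_az, dataset):
    """dataset: list of (azimuth_deg, impulse_response) pairs from the measured set."""
    az = np.array([a for a, _ in dataset], dtype=float)
    # Angular distance on a circle, in degrees.
    d = np.abs((az - target_az + 180.0) % 360.0 - 180.0)
    i, j = np.argsort(d)[:2]                 # two nearest measured directions
    if d[i] == 0.0:
        return dataset[i][1]                 # exact match, no interpolation needed
    # Weight each neighbor by the other's distance (closer -> heavier).
    wi = d[j] / (d[i] + d[j])
    wj = d[i] / (d[i] + d[j])
    return wi * dataset[i][1] + wj * dataset[j][1]

# Toy dataset with HRTFs measured at 0 deg and 90 deg azimuth.
dataset = [(0.0, np.array([1.0, 0.0])), (90.0, np.array([0.0, 1.0]))]
h45 = interpolate_hrtf(45.0, dataset)
```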