Patent classifications
H04S2420/03
Apparatus, a method and a computer program for delivering audio scene entities
In an example embodiment, a method, an apparatus, and a computer program product are provided. The apparatus includes at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: assign one or more audio representations to one or more audio scene entities in an audio scene; generate one or more audio scene entity combinations based on the one or more audio scene entities and the one or more audio representations; and signal the one or more audio scene entity combinations to a client, wherein the one or more audio representations assigned to the one or more audio scene entities cause the client to select an appropriate audio scene entity combination from the one or more audio scene entity combinations to render the audio scene.
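The assign/generate/signal/select flow in the abstract can be sketched as follows. The data model is entirely assumed (the patent does not fix one): entities and representations are plain strings, and the client's selection criterion is a hypothetical bitrate budget.

```python
from itertools import product

def generate_combinations(assignments):
    """Server side: given representations assigned per audio scene entity,
    enumerate every combination that picks one representation per entity.
    These combinations would then be signaled to the client."""
    entities = list(assignments)
    choices = product(*(assignments[e] for e in entities))
    return [dict(zip(entities, c)) for c in choices]

def client_select(combinations, bitrate_budget, cost):
    """Client side: pick the highest-cost combination that still fits the
    budget. The budget criterion is a stand-in; the patent only says the
    client selects an 'appropriate' combination."""
    feasible = [c for c in combinations
                if sum(cost[r] for r in c.values()) <= bitrate_budget]
    return max(feasible, key=lambda c: sum(cost[r] for r in c.values()))
```

For example, with two entities each assigned a "lo" and "hi" representation, four combinations are signaled and the client keeps the richest one within its budget.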
NOISE FILLING IN MULTICHANNEL AUDIO CODING
In multichannel audio coding, improved coding efficiency is achieved by the following measure: the noise filling of zero-quantized scale factor bands is performed using noise filling sources other than artificially generated noise or spectral replica. In particular, the noise filling may be based on noise generated using spectral lines from a previous frame of the multichannel audio signal, or from a different channel of the current frame.
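A minimal sketch of this idea, assuming the simplest possible band layout: zero-quantized bands are filled with scaled spectral lines copied from a source spectrum (the previous frame, or another channel of the current frame) rather than from a pseudo-random generator. The gain handling and band boundaries are simplified assumptions.

```python
import numpy as np

def noise_fill(spectrum, zero_bands, source_spectrum, gain=0.5):
    """Fill the zero-quantized scale-factor bands of `spectrum` with
    spectral lines taken from `source_spectrum`, scaled by `gain`.
    `zero_bands` is a list of (start, stop) line indices."""
    out = spectrum.copy()
    for start, stop in zero_bands:
        out[start:stop] = gain * source_spectrum[start:stop]
    return out
```

The source spectrum already resembles the signal's spectral envelope, which is why such a source can beat artificial noise as a filler.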
BINAURAL MULTI-CHANNEL DECODER IN THE CONTEXT OF NON-ENERGY-CONSERVING UPMIX RULES
A multi-channel decoder generates a binaural signal from a downmix signal using upmix rule information on an energy-error-introducing upmix rule. One or more gain factors are calculated based on the upmix rule information and on characteristics of head-related-transfer-function-based filters corresponding to the upmix channels. The one or more gain factors are used by a filter processor for filtering the downmix signal so that an energy-corrected binaural signal having a left binaural channel and a right binaural channel is obtained.
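One plausible reading of the gain-factor calculation, as an illustration only: compare the binaural energy an energy-conserving upmix would deliver through the HRTF-based filters against the energy the actual (non-energy-conserving) upmix rule delivers, and correct by the square root of the ratio. The exact formula in the patent may differ.

```python
import numpy as np

def binaural_gain(upmix_gains, hrtf_energies):
    """Illustrative energy-correcting gain factor.
    upmix_gains[k]   -- the upmix rule's gain for upmix channel k
    hrtf_energies[k] -- energy of the HRTF-based filter for channel k"""
    target = sum(hrtf_energies)  # energy-conserving reference (unit gains)
    actual = sum(g * g * e for g, e in zip(upmix_gains, hrtf_energies))
    return np.sqrt(target / actual)
```

The filter processor would then apply this gain while filtering the downmix, yielding the energy-corrected left/right binaural channels.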
HYBRID, PRIORITY-BASED RENDERING SYSTEM AND METHOD FOR ADAPTIVE AUDIO
Embodiments are directed to a method of rendering adaptive audio by receiving input audio comprising channel-based audio, audio objects, and dynamic objects, wherein the dynamic objects are classified as sets of low-priority dynamic objects and high-priority dynamic objects, rendering the channel-based audio, the audio objects, and the low-priority dynamic objects in a first rendering processor of an audio processing system, and rendering the high-priority dynamic objects in a second rendering processor of the audio processing system. The rendered audio is then subject to virtualization and post-processing steps for playback through soundbars and similar speakers with limited height-rendering capability.
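The priority split and two-processor rendering can be sketched as below. The priority field, the threshold, and the trivial summing "renderer" are all assumptions; the patent does not specify how priority is encoded or how each processor renders.

```python
import numpy as np

def split_by_priority(dynamic_objects, threshold):
    """Classify dynamic objects into low- and high-priority sets."""
    low = [o for o in dynamic_objects if o["priority"] < threshold]
    high = [o for o in dynamic_objects if o["priority"] >= threshold]
    return low, high

def render(objects, n_samples):
    """Trivial stand-in renderer: sum the object signals."""
    out = np.zeros(n_samples)
    for o in objects:
        out += o["signal"]
    return out
```

In the hybrid scheme, the first processor would render channel beds plus the low-priority set, the second processor only the high-priority set, and the two outputs are combined before virtualization.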
SYSTEMS AND METHODS FOR RULE-BASED USER CONTROL OF AUDIO RENDERING
A sound processing system includes a sound input device for providing a sound input, a sound output device for providing a sound output, and processing electronics including a processor and a memory, wherein the processing electronics is configured to receive a target sound input identifying a target sound, receive a rule input establishing a sound processing rule that references the target sound, receive a sound input from the sound input device, analyze the sound input for the target sound, process the sound input according to the sound processing rule in view of the analysis of the sound input, and provide a processed sound output to the sound output device.
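The receive-rule/analyze/process pipeline in this claim can be sketched generically. The rule representation (a target plus an action), the detector, and the attenuation action below are all hypothetical; the patent leaves the analysis and processing stages open.

```python
def process(sound, rules, detect):
    """Apply each sound processing rule whose target sound the detector
    finds in the input. `rules` is a list of (target, action) pairs;
    `detect(sound, target)` stands in for the analysis stage."""
    out = sound
    for target, action in rules:
        if detect(out, target):
            out = action(out)
    return out
```

For example, a user rule "attenuate sirens" would pair the target label with an action that scales down any component the analyzer tagged as a siren.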
AUDIO DECODING USING INTERMEDIATE SAMPLING RATE
A method for processing a signal includes receiving a first frame of an input audio bitstream at a decoder. The first frame includes at least one signal associated with a frequency range. The method also includes decoding the at least one signal to generate at least one decoded signal having an intermediate sampling rate. The intermediate sampling rate is based on coding information associated with the first frame. The method further includes generating a resampled signal based at least in part on the at least one decoded signal. The resampled signal has an output sampling rate of the decoder.
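The final resampling step (intermediate rate to output rate) can be illustrated with a linear-interpolation resampler. This is a simple stand-in for whatever resampling filter the codec actually uses, and the rates below are examples, not values from the patent.

```python
import numpy as np

def resample_linear(x, src_rate, dst_rate):
    """Resample `x` from the intermediate sampling rate `src_rate`
    (chosen from the frame's coding information) to the decoder's
    output rate `dst_rate` via linear interpolation."""
    n_out = int(len(x) * dst_rate / src_rate)
    t_out = np.arange(n_out) * src_rate / dst_rate  # output times in input samples
    return np.interp(t_out, np.arange(len(x)), x)
```

Decoding a low-band signal at, say, an intermediate 16 kHz and resampling to a 32 kHz output rate doubles the sample count while preserving the waveform shape.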
Efficient coding of audio scenes comprising audio objects
There is provided encoding and decoding methods for encoding and decoding of object based audio. An exemplary encoding method includes, inter alia, calculating M downmix signals by forming combinations of N audio objects, wherein M≤N, and calculating parameters which allow reconstruction of a set of audio objects formed on basis of the N audio objects from the M downmix signals. The calculation of the M downmix signals is made according to a criterion which is independent of any loudspeaker configuration.
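The M-from-N downmix is a linear combination and can be written as a matrix product. The pseudo-inverse reconstruction below is only an illustration of "parameters which allow reconstruction"; the patent's actual parameters would be time- and frequency-dependent, and the matrix D is assumed, chosen without reference to any loudspeaker layout.

```python
import numpy as np

def downmix(objects, D):
    """Form M downmix signals from N object signals (objects: N x samples)
    via an M-by-N downmix matrix D, M <= N."""
    return D @ objects

def reconstruct(downmixes, D):
    """Illustrative parametric reconstruction via the pseudo-inverse of D."""
    return np.linalg.pinv(D) @ downmixes
```

When M = N and D is invertible the reconstruction is exact; when M < N the parameters can only approximate the objects, which is where the codec's side information earns its keep.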
Method for generating filter for audio signal, and parameterization device for same
The present invention relates to a method for generating a filter for an audio signal and a parameterization device for the same, and more particularly, to a method for generating a filter for an audio signal, to implement filtering of an input audio signal with a low computational complexity, and a parameterization device therefor. To this end, provided are a method for generating a filter for an audio signal, including: receiving at least one set of binaural room impulse response (BRIR) filter coefficients for binaural filtering of an input audio signal; converting the BRIR filter coefficients into a plurality of subband filter coefficients; obtaining average reverberation time information of a corresponding subband by using reverberation time information extracted from the subband filter coefficients; obtaining at least one coefficient for curve fitting of the obtained average reverberation time information; obtaining flag information indicating whether the length of the BRIR filter coefficients in a time domain is more than a predetermined value; obtaining filter order information for determining a truncation length of the subband filter coefficients, the filter order information being obtained by using the average reverberation time information or the at least one coefficient according to the obtained flag information, and the filter order information of at least one subband being different from filter order information of another subband; and truncating the subband filter coefficients by using the obtained filter order information; and a parameterization device therefor.
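The core complexity-reduction step, truncating each subband's BRIR filter to an order driven by that subband's reverberation time, can be sketched as follows. The curve fitting and flag logic of the claim are omitted, and the linear mapping from reverberation time to filter order via `scale` is an assumption.

```python
import numpy as np

def truncate_subband_filters(subband_filters, rt_info, sample_rate, scale=1.0):
    """Truncate each subband BRIR filter to an order derived from that
    subband's average reverberation time (in seconds). Subbands with
    shorter reverberation get shorter filters, hence fewer operations."""
    truncated = []
    for h, rt in zip(subband_filters, rt_info):
        order = max(1, round(scale * rt * sample_rate))
        truncated.append(h[:order])
    return truncated
```

Because reverberation time typically falls with frequency, high-frequency subbands end up with much shorter filters than a full-length BRIR convolution would use, which is the source of the claimed complexity saving.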
Rendering audio objects having apparent size
Methods, systems, and computer program products for rendering an audio object having an apparent size are disclosed. An audio processing system receives audio panning data including a first grid mapping first virtual sound sources in a space and speaker positions to speaker gains. The first grid specifies first speaker gains of the first virtual sound sources in the space. The audio processing system determines a second grid of second virtual sound sources in the space, including mapping the first virtual sound sources into the second virtual sound sources. The audio processing system selects at least one of the first grid or second grid for rendering an audio object based on an apparent size of the audio object. The audio processing system renders the audio object based on the selected grid or grids.
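A toy version of the two-grid scheme, under stated assumptions: the second grid is derived by pooling the speaker gains of adjacent first-grid virtual sources, and a size threshold decides which grid renders the object. Both the pooling rule and the threshold criterion are hypothetical.

```python
import numpy as np

def derive_second_grid(first_gains, factor=2):
    """Derive a coarser second grid by averaging the speaker gains of
    `factor` adjacent first-grid virtual sources.
    first_gains: (n_sources, n_speakers), n_sources divisible by factor."""
    n, spk = first_gains.shape
    return first_gains.reshape(n // factor, factor, spk).mean(axis=1)

def select_grid(first_gains, second_gains, apparent_size, threshold=0.5):
    """Large apparent sources use the coarser grid (assumed criterion)."""
    return second_gains if apparent_size >= threshold else first_gains
```

Intuitively, a spatially large object does not need the fine grid's resolution, so rendering it from fewer pooled virtual sources saves gain lookups without audible loss of extent.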
AUDIO RENDERING USING 6-DOF TRACKING
The methods and apparatus described herein optimally represent full 3D audio mixes (e.g., azimuth, elevation, and depth) as “sound scenes” in which the decoding process facilitates head tracking. Sound scene rendering can be performed for the listener's orientation (e.g., yaw, pitch, roll) and 3D position (e.g., x, y, z), and can be modified for a change in the listener's orientation or 3D position. As described below, the ability to render an audio object in both the near-field and far-field enables the ability to fully render depth of not just objects, but any spatial audio mix decoded with active steering/panning, such as Ambisonics, matrix encoding, etc., thereby enabling full translational head tracking (e.g., user movement) beyond simple rotation in the horizontal plane, or 6-degrees-of-freedom (6-DOF) tracking and rendering.
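The geometric core of 6-DOF rendering, expressing a source's position in the listener's frame after both translation (x, y, z) and rotation, can be sketched for the yaw axis alone. Pitch and roll are omitted for brevity, and the axis/sign conventions below are assumptions; the resulting distance is what would drive the near-field versus far-field rendering choice.

```python
import numpy as np

def relative_source_position(src_pos, listener_pos, yaw):
    """Source position in the listener's frame: translate by the
    listener's 3D position, then counter-rotate by the listener's yaw
    (rotation about the z axis). Assumes x forward, y left, z up."""
    d = np.asarray(src_pos, dtype=float) - np.asarray(listener_pos, dtype=float)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, s, 0.0],
                  [-s, c, 0.0],
                  [0.0, 0.0, 1.0]])
    return R @ d
```

Walking toward a source shrinks the relative distance (engaging near-field rendering), while turning the head rotates the relative direction, which is exactly the translational-plus-rotational tracking the abstract describes.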