Patent classification: H04S2420/07
Method, device and software for controlling transport of audio data
A method for processing music audio data, including providing input audio data representing a first piece of music comprising a mixture of musical timbres. The method also includes decomposing the input audio data to generate at least first-timbre decomposed data representing a first timbre selected from the musical timbres of the first piece of music, and second-timbre decomposed data representing a second timbre selected from the musical timbres of the first piece of music. The method also includes applying a transport control to obtain transport controlled first-timbre decomposed data. The method also includes recombining audio data obtained from the transport controlled first-timbre decomposed data with audio data obtained from the second-timbre decomposed data to obtain recombined audio data.
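The decomposition itself (source separation) is the heavy part; the transport-control and recombination steps reduce to simple per-sample operations. A minimal Python sketch, assuming two stems have already been separated and using a naive index-skipping resampler as the transport control (`apply_transport` and `recombine` are illustrative names, not from the patent):

```python
def apply_transport(stem, speed=2.0):
    """Naive transport control: resample by index skipping (no interpolation)."""
    n = len(stem)
    return [stem[min(int(i * speed), n - 1)] for i in range(int(n / speed))]

def recombine(stem_a, stem_b):
    """Mix two stems sample by sample, padding the shorter one with silence."""
    n = max(len(stem_a), len(stem_b))
    pad = lambda s: s + [0.0] * (n - len(s))
    return [a + b for a, b in zip(pad(stem_a), pad(stem_b))]

drums = [0.5] * 8            # first-timbre decomposed data (placeholder stem)
vocals = [0.25] * 8          # second-timbre decomposed data (placeholder stem)
fast_drums = apply_transport(drums, speed=2.0)   # transport-controlled stem
mix = recombine(fast_drums, vocals)              # recombined audio data
```

Doubling the speed of one stem halves its length; the recombiner pads with silence so the untouched second stem keeps its original timing.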
Binaural decoder to output spatial stereo sound and a decoding method thereof
A binaural decoder for an MPEG surround stream, which decodes an MPEG surround stream into a stereo 3D signal, and a decoding method thereof. The method includes dividing a compressed audio stream and head related transfer function (HRTF) data into subbands, selecting predetermined subbands of the HRTF data divided into subbands and filtering the HRTF data to obtain the selected subbands, decoding the audio stream divided into subbands into a stream of multi-channel audio data with respect to subbands according to spatial additional information, and binaural-synthesizing the HRTF data of the selected subbands with the multi-channel audio data of corresponding subbands.
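The final binaural-synthesis step can be pictured as a per-subband weighted sum: each decoded channel's subband value is scaled by that channel's left- and right-ear HRTF gain for the same subband, then accumulated into the two output ears. A toy Python sketch under that assumption (scalar subband values and real-valued gains stand in for the complex-valued QMF filterbank actually used by MPEG Surround):

```python
def binaural_synthesize(subband_channels, hrtf_left, hrtf_right):
    """subband_channels[ch][band]: decoded subband value (toy model).
    hrtf_left/right[ch][band]: per-channel, per-subband HRTF gain."""
    n_bands = len(subband_channels[0])
    left = [0.0] * n_bands
    right = [0.0] * n_bands
    for ch, bands in enumerate(subband_channels):
        for b, x in enumerate(bands):
            left[b] += hrtf_left[ch][b] * x
            right[b] += hrtf_right[ch][b] * x
    return left, right

# Two decoded channels, three subbands each (all values are placeholders)
chans = [[1.0, 0.5, 0.25], [0.5, 0.5, 0.5]]
hl = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]
hr = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]
out_l, out_r = binaural_synthesize(chans, hl, hr)
```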
Spatial audio processing
According to an example embodiment, a technique for spatial audio processing on basis of two or more input audio signals that represent an audio scene and at least one further input audio signal that represents at least part of the audio scene is provided, the technique including identifying a portion of interest (POI) in the audio scene; processing the two or more input audio signals into a spatial audio signal where the POI in the audio scene is suppressed; generating, on basis of the at least one further input audio signal, a complementary audio signal that represents the POI in the audio scene; and combining the complementary audio signal with the spatial audio signal to create a reconstructed spatial audio signal.
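One simple reading of the suppress-then-recombine pipeline: subtract a scaled spot-microphone signal (the "further input audio signal") from each scene channel to suppress the POI, then add back a separately processed version as the complementary signal. The leakage gain `g` and the complementary gain are assumed calibration values, not from the patent:

```python
def suppress_poi(scene, spot, g=1.0):
    """Suppress the POI in a scene channel by subtracting the scaled spot mic."""
    return [s - g * p for s, p in zip(scene, spot)]

def reconstruct(scene_l, scene_r, spot, comp_gain=0.5):
    """Suppress the POI in both channels, then re-add a processed version."""
    comp = [comp_gain * p for p in spot]          # complementary audio signal
    out_l = [a + c for a, c in zip(suppress_poi(scene_l, spot), comp)]
    out_r = [a + c for a, c in zip(suppress_poi(scene_r, spot), comp)]
    return out_l, out_r

scene_l, scene_r = [1.0, 1.0], [0.8, 0.8]    # stereo scene (placeholder)
spot = [0.4, 0.4]                             # spot mic capturing the POI
out_l, out_r = reconstruct(scene_l, scene_r, spot)
```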
Spatial audio signal format generation from a microphone array using adaptive capture
Apparatus including a processor configured to: obtain at least two microphone audio signals; determine spatial metadata; and transmit and/or store the spatial metadata together with at least one of: at least one of the at least two microphone audio signals; at least one microphone audio signal from at least one second microphone configured to capture at least part of the same sound scene captured by at least one first microphone; or at least one signal based, at least partially, on the at least two microphone audio signals, wherein the transmitting and/or storing is configured to enable synthesis of a plurality of spherical harmonic audio signals.
Stereo reproduction apparatus
A right expected value generator generates an expected value of the right channel spectrum from the right channel spectrum, and a left expected value generator likewise generates an expected value of the left channel spectrum from the left channel spectrum. A right channel spectrum corrector corrects the right channel spectrum output from a second synthesizer so that it does not exceed the expected value of the right channel spectrum, and a left channel spectrum corrector corrects the left channel spectrum output from the second synthesizer so that it does not exceed the expected value of the left channel spectrum.
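The corrector stage amounts to a per-bin clamp: each synthesized spectral magnitude is limited so that no bin exceeds its expected value. A minimal sketch with placeholder magnitudes:

```python
def correct_spectrum(synth, expected):
    """Clamp each synthesized spectral bin to its expected value."""
    return [min(s, e) for s, e in zip(synth, expected)]

right_synth = [0.8, 1.2, 0.3]       # output of the second synthesizer (toy)
right_expected = [1.0, 1.0, 1.0]    # expected value per bin (toy)
right_out = correct_spectrum(right_synth, right_expected)  # [0.8, 1.0, 0.3]
```

The same clamp is applied independently to the left channel with its own expected values.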
Method for processing of sound signals
A method for processing audio signals for creating a three dimensional sound environment includes: receiving at least one input signal from at least one sound source; creating a simulated signal at least partly based on the received at least one input signal, the simulated signal representing a simulation of at least one input signal reflecting from the ground or a floor; and creating an output signal at least partly on the basis of the simulated signal and the at least one received input signal, the output signal including a plurality of audio channels; at least two channels of the audio channels of the output signal representing signals for sound transducers above a listener's ear level at a nominal listening position, and at least two channels of the audio channels of the output signal representing signals for sound transducers below a listener's ear level at a nominal listening position.
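The ground-reflection simulation can be sketched as a delayed, attenuated copy of the input mixed back with the direct signal. The delay and attenuation here are illustrative constants; a real implementation would derive them from source, listener, and floor geometry:

```python
def ground_reflection(signal, delay_samples=3, attenuation=0.5):
    """Model the floor bounce as a delayed, attenuated copy of the input."""
    return [0.0] * delay_samples + [attenuation * x for x in signal]

def mix_direct_and_reflected(direct, reflected):
    """Sum the direct and reflected paths, padding to the longer length."""
    n = max(len(direct), len(reflected))
    pad = lambda s: s + [0.0] * (n - len(s))
    return [a + b for a, b in zip(pad(direct), pad(reflected))]

dry = [1.0, 0.0, 0.0, 0.0, 0.0]          # unit impulse (direct path)
wet = ground_reflection(dry)              # bounce arrives 3 samples later
out = mix_direct_and_reflected(dry, wet)  # one channel of the output signal
```

In the patented method this output would then be distributed across channels feeding transducers above and below ear level.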
Voice audio rendering augmentation
An audio rendering device enhances voice audio so that audible voice is not overwhelmed by other aspects of the soundtrack. The device attenuates the right and left channels in an audio stream in response to a detected voice component in the stream, and boosts the voice component based on the level of attenuation applied to the right and left channels. Voice components are distinguished from non-voice components by separating center-channel and mono information from the left, right, and surround channels. Non-voice components are attenuated toward a non-voice threshold level according to an attenuation ratio, and voice components are boosted toward a voice threshold level, so that spoken voice remains audible to viewers rather than being drowned out by the non-voice aspects of the soundtrack.
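The attenuation/boost coupling can be sketched as follows: the non-voice level is pulled toward a floor by an attenuation ratio, and the voice level is raised by the amount of attenuation applied, capped at a ceiling. All thresholds and the ratio are made-up tuning values:

```python
def duck(non_voice_level, voice_level, floor=0.2, ceiling=1.0, ratio=0.5):
    """Attenuate non-voice toward a floor; boost voice by the attenuation."""
    attenuated = max(floor, non_voice_level * ratio)
    attenuation = non_voice_level - attenuated     # how much we took away
    boosted = min(ceiling, voice_level + attenuation)
    return attenuated, boosted

nv, v = duck(non_voice_level=0.8, voice_level=0.5)  # nv drops, v rises
```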
Signal processing methods and systems for rendering audio on virtual loudspeaker arrays
Techniques of rendering audio involve applying a balanced-realization state space model to each head-related transfer function (HRTF) to reduce the order of an effective finite impulse response (FIR) or even an infinite impulse response (IIR) filter. Along these lines, each HRTF G(z) is derived from a head-related impulse response (HRIR) filter via, e.g., a z-transform. The data of the HRIR may be used to construct a first state space representation [A, B, C, D] of the HRTF via the relation G(z) = C(zI − A)^(−1)B + D. This first state space representation is not unique, and so for an FIR filter, A and B may be set to simple, binary-valued arrays, while C and D contain the HRIR data. This representation leads to a simple form of a Gramian Q whose eigenvectors provide system states that maximize the system gain as measured by a Hankel norm. Further, a factorization of Q provides a transformation into a balanced state space in which the Gramian is equal to a diagonal matrix of the eigenvalues of Q. By considering only those states associated with an eigenvalue greater than some threshold, the balanced state space representation of the HRTF may be truncated to provide an approximate HRTF that matches the original very closely while reducing the amount of computation required by as much as 90%.
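The truncation idea can be illustrated without the full balanced-realization machinery: for an FIR filter, the Hankel singular values (which govern how many balanced states matter) can be read off an SVD of the Hankel matrix built from the impulse response taps. The sketch below is a simplified stand-in for the patent's Gramian construction; for a toy exponentially decaying "HRIR" it finds that a single state suffices:

```python
import numpy as np

def hankel_singular_values(h):
    """Singular values of the Hankel matrix of FIR taps h[1:] (h[0] plays D)."""
    n = len(h) - 1
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i + j + 1 < len(h):
                H[i, j] = h[i + j + 1]
    return np.linalg.svd(H, compute_uv=False)   # sorted in descending order

def reduced_order(h, threshold=1e-3):
    """Count the balanced states kept after dropping small singular values."""
    return int(np.sum(hankel_singular_values(h) > threshold))

h = [1.0] + [0.5 ** k for k in range(1, 16)]   # toy decaying "HRIR", 16 taps
order = reduced_order(h)                        # 1: a single state captures it
```

A real HRIR is far less compressible than a pure exponential decay, but the same machinery identifies how many states are worth keeping.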
Systems and methods for providing audio to a user based on gaze input
According to the invention, a method for providing audio to a user is disclosed. The method may include determining, with an eye tracking device, a gaze point of a user on a display. The method may also include causing, with a computer system, an audio device to produce audio to the user, where content of the audio may be based at least in part on the gaze point of the user on the display.
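A minimal sketch of the gaze-to-content mapping: select the on-screen audio source nearest the gaze point reported by the eye tracker. Source names and screen coordinates are invented for illustration:

```python
def select_source(gaze, sources):
    """gaze: (x, y) on the display; sources: name -> (x, y) screen position.
    Returns the name of the source nearest the gaze point."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(sources, key=lambda name: dist2(gaze, sources[name]))

sources = {"narrator": (100, 100), "crowd": (500, 400)}
which = select_source((480, 390), sources)   # gaze is near the crowd
```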
Determination of composite acoustic parameter value for presentation of audio content
Determination of a composite acoustic parameter value for a headset is presented herein. A directionally enhanced audio signal is generated based on audio signals from an acoustic sensor array and a spatial signal enhancement filter that is directed for enhancement of a sound source. An SNR improvement value is determined based on an SNR value of the directionally enhanced audio signal and an SNR value of an audio signal from an acoustic sensor of the acoustic sensor array. The SNR improvement value is input into a model that maps SNR improvement values to spatial acoustic parameters to determine a spatial acoustic parameter value. A temporal acoustic parameter value is determined based on the audio signals. The composite acoustic parameter value is determined based on the spatial acoustic parameter value and the temporal acoustic parameter value. Audio content presented to a user is adjusted based in part on the composite acoustic parameter value.
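The parameter-estimation chain can be sketched end to end with placeholder models: an SNR improvement in dB feeds an assumed linear mapping to a spatial acoustic parameter, which is then combined with a temporal parameter by an assumed weighted average (the patent does not specify either model):

```python
def snr_improvement_db(enhanced_snr_db, single_mic_snr_db):
    """SNR gain of the directionally enhanced signal over a single sensor."""
    return enhanced_snr_db - single_mic_snr_db

def spatial_param(snr_improvement, slope=0.1, intercept=0.0):
    """Assumed linear model mapping SNR improvement to a spatial parameter."""
    return slope * snr_improvement + intercept

def composite(spatial, temporal, w=0.5):
    """Assumed combination rule: weighted average of the two parameters."""
    return w * spatial + (1 - w) * temporal

imp = snr_improvement_db(12.0, 6.0)         # 6 dB improvement from the array
c = composite(spatial_param(imp), temporal=0.4)
```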