H04S2420/07

METHOD AND SYSTEM FOR INSTRUMENT SEPARATING AND REPRODUCING FOR MIXTURE AUDIO SOURCE

A method and a system for separating and reproducing instruments in a mixed audio source are provided. The method and/or the system includes inputting selected music into an instrument separation model to extract features, determining multi-channel audio source signals in which each channel contains the sound of one separated instrument, and transmitting the signals of the different channels to multiple speakers placed at designated positions for playback, which can reproduce or recreate an immersive sound-field listening experience for users.

DEVICE AND METHOD FOR CALCULATING LOUDSPEAKER SIGNALS FOR A PLURALITY OF LOUDSPEAKERS WHILE USING A DELAY IN THE FREQUENCY DOMAIN
20180012612 · 2018-01-11 ·

A device for calculating loudspeaker signals for a plurality of loudspeakers while using a plurality of audio sources, an audio source including an audio signal, includes a forward transform stage for transforming each audio signal, block-by-block, to a spectral domain so as to obtain for each audio signal a plurality of temporally consecutive short-term spectra, a memory for storing a plurality of temporally consecutive short-term spectra for each audio signal, a memory access controller for accessing a specific short-term spectrum among the plurality of short-term spectra for a combination consisting of a loudspeaker and an audio signal on the basis of a delay value, a filter stage for filtering the specific short-term spectrum for the combination of the audio signal and the loudspeaker by using a filter provided for the combination of the audio signal and the loudspeaker, so that a filtered short-term spectrum is obtained for each combination of an audio signal and a loudspeaker, a summing stage for summing up the filtered short-term spectra for a loudspeaker so as to obtain summed-up short-term spectra for each loudspeaker, and a backtransform stage for backtransforming, block-by-block, summed-up short-term spectra for the loudspeakers to a time domain so as to obtain the loudspeaker signals.
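The stages named in this abstract (forward transform, spectrum memory, delay-based memory access, per-combination filter, summing, backtransform) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the patented implementation: the function name `render_blocks`, the use of non-overlapping blocks, delays restricted to whole blocks, and filters given as per-bin complex gains are all hypothetical choices made for brevity. A practical system would also need overlap handling and zero-padding to avoid circular-convolution artifacts.

```python
import numpy as np

def render_blocks(sources, delays, filters, block=8):
    """Hypothetical sketch of block-wise frequency-domain rendering.

    sources: (n_src, n_samples) real audio signals
    delays:  (n_spk, n_src) non-negative integer delays, in whole blocks (assumption)
    filters: (n_spk, n_src, block // 2 + 1) complex per-bin gains (assumption)
    """
    n_src, n_samples = sources.shape
    n_spk = delays.shape[0]
    n_blocks = n_samples // block
    n_bins = block // 2 + 1

    # Forward transform stage: one short-term spectrum per block and source.
    # The resulting array plays the role of the spectrum memory.
    framed = sources[:, :n_blocks * block].reshape(n_src, n_blocks, block)
    spectra = np.fft.rfft(framed, axis=-1)

    out = np.zeros((n_spk, n_blocks * block))
    for b in range(n_blocks):
        for spk in range(n_spk):
            acc = np.zeros(n_bins, dtype=complex)
            for src in range(n_src):
                # Memory access: pick the stored spectrum selected by the delay.
                idx = b - int(delays[spk, src])
                if idx < 0:
                    continue  # nothing stored that far back yet
                # Filter stage, then summing stage for this loudspeaker.
                acc += filters[spk, src] * spectra[src, idx]
            # Backtransform stage: back to the time domain, block by block.
            out[spk, b * block:(b + 1) * block] = np.fft.irfft(acc, n=block)
    return out
```

With one source, one loudspeaker, a delay of one block, and an all-ones filter, the output is simply the input shifted by one block, which shows that the delay is realized purely by choosing which stored spectrum to read.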

Apparatus, Method, or Computer Program for Processing an Encoded Audio Scene using a Parameter Conversion

An apparatus for processing an encoded audio scene representing a sound field related to a virtual listener position, the encoded audio scene including information on a transport signal and a first set of parameters related to the virtual listener position, includes a parameter converter for converting the first set of parameters into a second set of parameters related to a channel representation including two or more channels for a reproduction at predefined spatial positions for the two or more channels, and an output interface for generating a processed audio scene using the second set of parameters and the information on the transport signal.

APPARATUS AND METHOD FOR ENCODING A PLURALITY OF AUDIO OBJECTS USING DIRECTION INFORMATION DURING A DOWNMIXING OR APPARATUS AND METHOD FOR DECODING USING AN OPTIMIZED COVARIANCE SYNTHESIS

An apparatus for encoding a plurality of audio objects and related metadata indicating direction information on the plurality of audio objects has: a downmixer for downmixing the plurality of audio objects to obtain one or more transport channels; a transport channel encoder for encoding one or more transport channels to obtain one or more encoded transport channels; and an output interface for outputting an encoded audio signal comprising the one or more encoded transport channels, wherein the downmixer is configured to downmix the plurality of audio objects in response to the direction information on the plurality of audio objects.

INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD
20230007232 · 2023-01-05 ·

Provided is an information processing device that performs processing on content. The information processing device is provided with an estimation unit that estimates sounding coordinates at which a sound image is generated on the basis of a video stream and an audio stream, a video output control unit that controls an output of the video stream, and an audio output control unit that controls an output of the audio stream so as to generate the sound image at the sounding coordinates. A discrimination unit that discriminates a gazing point of a user who views video and audio is further provided, in which the estimation unit estimates the sounding coordinates at which the sound image of the object gazed at by the user is generated on the basis of a discrimination result.

COLORLESS GENERATION OF ELEVATION PERCEPTUAL CUES USING ALL-PASS FILTER NETWORKS
20230025801 · 2023-01-26 ·

A system includes one or more computing devices that encode spatial perceptual cues into a monaural channel to generate a plurality of output channels. A computing device determines a target amplitude response for the mid and side channels of the plurality of output channels, defining a spatial perceptual cue associated with one or more frequency-dependent phase shifts. The computing device determines a transfer function of a single-input, multi-output allpass filter based on the target amplitude response, determines coefficients of the allpass filter based on the transfer function, and processes the monaural channel with the coefficients of the allpass filter to generate the plurality of output channels having the encoded spatial perceptual cues. The allpass filter is configured to be colorless with respect to the individual output channels, allowing for the placement of spatial cues into the audio stream to be decoupled from the overall coloration of the audio.
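The idea that each output channel can stay colorless while the mid and side channels still take on a frequency-dependent amplitude response can be illustrated with a textbook first-order allpass section. This is only a sketch under stated assumptions: the coefficient value 0.5, the choice of leaving one output unfiltered, and the function name `first_order_allpass_response` are hypothetical and are not taken from the abstract, which does not specify the filter design.

```python
import numpy as np

def first_order_allpass_response(a, omega):
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1) for real a."""
    z_inv = np.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)

# Normalized frequencies from DC to Nyquist.
omega = np.linspace(0.0, np.pi, 5)

# Assumption: one output is the unfiltered input, the other is allpass-filtered.
h_left = np.ones(omega.shape, dtype=complex)
h_right = first_order_allpass_response(0.5, omega)

# Mid and side responses derived from the two output channels.
mid = 0.5 * (h_left + h_right)
side = 0.5 * (h_left - h_right)

# Each individual output has unit magnitude at every frequency (colorless),
# while the mid and side magnitudes vary with frequency.
left_mag = np.abs(h_left)
right_mag = np.abs(h_right)
mid_mag = np.abs(mid)
side_mag = np.abs(side)
```

In this toy case the phase shift introduced by the allpass section moves energy from the mid channel at low frequencies to the side channel near Nyquist, even though neither output channel is boosted or attenuated on its own.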

Playback Device Configuration
20230232175 · 2023-07-20 ·

Examples described herein involve configuring a playback device based on distortion, such as that caused by a barrier. One implementation may involve causing the playback device to play audio content according to an existing playback configuration, determining an existing frequency response of the playback device in a given system, and determining whether a difference between the existing frequency response of the playback device in the given system and a predetermined frequency response for the playback device is greater than a predetermined distortion threshold. If it is determined that the difference between the existing frequency response of the playback device and the predetermined frequency response for the playback device is greater than the predetermined distortion threshold, then the existing playback configuration of the playback device is changed to an updated playback configuration of the playback device and the playback device plays audio content according to the updated playback configuration.
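The decision step described in this abstract can be sketched as a simple threshold comparison. The abstract does not say how the difference between the two frequency responses is measured, so the sketch below assumes the largest absolute per-band deviation in decibels; the function name `choose_configuration` and all parameter names are hypothetical.

```python
def choose_configuration(measured_db, reference_db, threshold_db,
                         existing_config, updated_config):
    """Return the playback configuration to use after the comparison.

    measured_db / reference_db: per-band magnitudes in dB, same length.
    The difference metric (maximum absolute per-band deviation) is an
    assumption made for this sketch.
    """
    if len(measured_db) != len(reference_db):
        raise ValueError("responses must cover the same frequency bands")
    difference = max(abs(m - r) for m, r in zip(measured_db, reference_db))
    # Switch configurations only when the distortion threshold is exceeded.
    return updated_config if difference > threshold_db else existing_config
```

For example, a measured response that dips 6 dB in one band against a flat reference would trigger the updated configuration with a 3 dB threshold, but not with a 10 dB threshold.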

Methods and system for adjusting level of tactile content when presenting audio content

An audio system presented herein includes a transducer array, a sensor array, and a controller. The transducer array presents audio content to a user. The controller controls the transducer array to adjust a level of tactile content imparted to the user via actuation of at least one transducer in the transducer array while presenting the audio content to the user. The audio system can be part of a headset.

Efficient coding of audio scenes comprising audio objects

There are provided encoding and decoding methods for object-based audio. An exemplary encoding method includes, inter alia, calculating M downmix signals by forming combinations of N audio objects, wherein M≤N, and calculating parameters which allow reconstruction of a set of audio objects, formed on the basis of the N audio objects, from the M downmix signals. The calculation of the M downmix signals is made according to a criterion which is independent of any loudspeaker configuration.
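The two calculations named in this abstract, forming M downmix signals from N objects and computing parameters that allow reconstruction, can be sketched with plain linear algebra. This is an illustration under assumptions, not the claimed method: the downmix is taken to be a fixed matrix supplied by the caller, and the reconstruction parameters are taken to be a least-squares upmix matrix obtained with a pseudoinverse. The function name `downmix_and_parameters` is hypothetical.

```python
import numpy as np

def downmix_and_parameters(objects, downmix_matrix):
    """Sketch: M downmix signals plus least-squares reconstruction parameters.

    objects:        (N, T) array, one row per audio object
    downmix_matrix: (M, N) array with M <= N; chosen without reference to
                    any loudspeaker layout (assumption: supplied by caller)
    Returns (downmix, upmix) with shapes (M, T) and (N, M).
    """
    m, n = downmix_matrix.shape
    if m > n or n != objects.shape[0]:
        raise ValueError("need M <= N and one matrix column per object")
    # M downmix signals as linear combinations of the N objects.
    downmix = downmix_matrix @ objects
    # Reconstruction parameters: the matrix that best maps the downmix back
    # to the objects in the least-squares sense.
    upmix = objects @ np.linalg.pinv(downmix)
    return downmix, upmix
```

A decoder in this sketch would estimate the objects as `upmix @ downmix`. When M equals N and the downmix matrix is invertible the estimate is exact; when M is smaller than N it is only the best linear approximation.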

Audio decoder and decoding method

A method for representing a second presentation of audio channels or objects as a data stream, the method comprising the steps of: (a) providing a set of base signals, the base signals representing a first presentation of the audio channels or objects; (b) providing a set of transformation parameters, the transformation parameters intended to transform the first presentation into the second presentation; the transformation parameters further being specified for at least two frequency bands and including a set of multi-tap convolution matrix parameters for at least one of the frequency bands.
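Applying a set of multi-tap convolution matrix parameters within one frequency band can be sketched as a sum of matrix products over delayed copies of the base signals. The layout below is an assumption made for illustration: `taps[k]` is taken to be the matrix applied to the base signals delayed by k frames, and the function name `apply_multitap` is hypothetical. The abstract does not specify how the parameters are arranged or estimated.

```python
import numpy as np

def apply_multitap(base, taps):
    """Sketch: transform base signals into a second presentation in one band.

    base: (n_in, n_frames) sub-band samples of the first presentation
    taps: (n_taps, n_out, n_in) convolution matrix parameters, where
          taps[k] acts on the base signals delayed by k frames (assumption)
    """
    n_taps, n_out, _ = taps.shape
    n_frames = base.shape[1]
    out = np.zeros((n_out, n_frames), dtype=np.result_type(base, taps))
    for k in range(n_taps):
        if k >= n_frames:
            break  # no frames left for this delay
        # Output frame t receives taps[k] times base frame t - k.
        out[:, k:] += taps[k] @ base[:, :n_frames - k]
    return out
```

With a single tap this reduces to an ordinary per-band mixing matrix; additional taps let the transformation model short delays or filtering within the band.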