H04S5/005

AUDIO CHANNEL SPATIAL TRANSLATION

The present invention is directed to methods and apparatus for translating a first plurality of audio input channels to a second plurality of audio output channels. This includes determining that there is pair-wise coding among any of the first plurality of audio input channels; determining an input/output-mapping matrix for mapping at least a first set of the first plurality of audio input channels to at least a second set of the second plurality of audio output channels; and deriving the second plurality of audio output channels based on the first plurality of audio input channels, the input/output-mapping matrix and the determined pair-wise coding. The first plurality of audio input channels represents the same soundfield represented by the second plurality of audio output channels.
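
As an illustration of the matrix step only, the following minimal sketch applies an input/output-mapping matrix to one frame of input samples. The function name `apply_mapping`, the three-channel (L, C, R) input layout, and the gain values are assumptions made for this example and are not taken from the abstract. The pair-wise coding determination is not modelled here.

```python
def apply_mapping(matrix, input_frame):
    """Map one frame of N input samples to M output samples (y = A x).

    `matrix` holds M rows of N gains: one row per output channel,
    one gain per input channel.
    """
    if any(len(row) != len(input_frame) for row in matrix):
        raise ValueError("each matrix row needs one gain per input channel")
    return [sum(g * x for g, x in zip(row, input_frame)) for row in matrix]


# Hypothetical mapping from (L, C, R) to (L, R); the gains are illustrative only.
C_GAIN = 0.5 ** 0.5  # centre channel shared between both outputs at about -3 dB
LCR_TO_STEREO = [
    [1.0, C_GAIN, 0.0],  # output L = L + C_GAIN * C
    [0.0, C_GAIN, 1.0],  # output R = R + C_GAIN * C
]

stereo = apply_mapping(LCR_TO_STEREO, [0.2, 0.4, -0.1])
```

A real translator would presumably apply such a matrix per block or per frequency band and adjust it according to the detected pair-wise coding. This sketch shows only the linear mapping itself.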

SYSTEM AND METHOD FOR ADAPTIVE AUDIO SIGNAL GENERATION, CODING AND RENDERING

Embodiments are described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name, and object-based streams have location information encoded through location expressions in the associated metadata. A codec packages the independent audio streams into a single serial bitstream that contains all of the audio data. This configuration allows the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.

Signal processing device and image display apparatus including the same

Disclosed are a signal processing device and an image display apparatus including the same. The signal processing device, and the image display apparatus including it, include: a converter configured to perform frequency conversion of an input stereo audio signal; a primary component analyzer configured to perform primary component analysis based on a signal from the converter; a feature extractor configured to extract a feature of a primary component signal based on a signal from the primary component analyzer; an envelope adjustor configured to perform envelope adjustment based on prediction performed on the basis of a deep neural network model; and an inverse converter configured to inversely convert a signal from the envelope adjustor to output a multi-channel upmix audio signal. Accordingly, when upmixing the downmix stereo audio signal to a multichannel audio signal, spatial distortion can be reduced.

Video-Informed Spatial Audio Expansion
20210240431 · 2021-08-05 ·

Assigning spatial information to audio segments is disclosed. A method includes receiving a first audio segment that is non-spatialized and is associated with first video frames; identifying visual objects in the first video frames; identifying auditory events in the first audio segment; identifying a match between a visual object of the visual objects and an auditory event of the auditory events; and assigning a spatial location to the auditory event based on a location of the visual object.

AUDIO OUTPUT APPARATUS AND METHOD OF CONTROLLING THEREOF

An audio output apparatus is disclosed. The audio output apparatus, which outputs a multi-channel audio signal through a plurality of speakers disposed at different locations, includes an input interface and a processor. The processor is configured to, when the multi-channel audio signal is received through the input interface, obtain scene information on a type of audio included in the multi-channel audio signal and sound image angle information about an angle formed, relative to a virtual user, by a sound image of the type of audio included in the multi-channel audio signal, and to generate an output signal to be output through the plurality of speakers from the multi-channel audio signal based on the obtained scene information and sound image angle information. The type of audio includes at least one of sound effect, shouting sound, music, and voice, and the number of the plurality of speakers is equal to or greater than the number of channels of the multi-channel audio signal.

Advanced stereo coding based on a combination of adaptively selectable left/right or mid/side stereo coding and of parametric stereo coding

The application relates to audio encoder and decoder systems. An embodiment of the encoder system comprises a downmix stage for generating a downmix signal and a residual signal based on a stereo signal. In addition, the encoder system comprises a parameter determining stage for determining parametric stereo parameters such as an inter-channel intensity difference and an inter-channel cross-correlation. Preferably, the parametric stereo parameters are time- and frequency-variant. Moreover, the encoder system comprises a transform stage. The transform stage generates a pseudo left/right stereo signal by performing a transform based on the downmix signal and the residual signal. The pseudo stereo signal is processed by a perceptual stereo encoder. For stereo encoding, left/right encoding or mid/side encoding is selectable. Preferably, the selection between left/right stereo encoding and mid/side stereo encoding is time- and frequency-variant.
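
The two stereo representations and one of the parametric stereo parameters named above can be illustrated with a short sketch. It assumes the common mid/side convention M = (L + R) / 2 and S = (L - R) / 2, and it computes the inter-channel intensity difference as a broadband energy ratio in dB over one block. The function names and the `eps` guard are hypothetical, and the abstract does not state which convention the described system uses.

```python
import math


def lr_to_ms(left, right):
    """Convert left/right sample lists to mid/side (assumed convention)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid, side):
    """Invert lr_to_ms: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def intensity_difference_db(left, right, eps=1e-12):
    """Inter-channel intensity difference of one block, in dB.

    `eps` is an illustrative guard against division by zero on silence.
    """
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    return 10 * math.log10((energy_left + eps) / (energy_right + eps))
```

In a time- and frequency-variant system such as the one described, calculations like these would be carried out per frame and per frequency band rather than on a whole broadband block.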

APPARATUS AND METHOD FOR PROVIDING ENHANCED GUIDED DOWNMIX CAPABILITIES FOR 3D AUDIO

An apparatus for downmixing three or more audio input channels to obtain two or more audio output channels is provided. The apparatus includes a receiving interface for receiving the three or more audio input channels and for receiving side information. Moreover, the apparatus includes a downmixer for downmixing the three or more audio input channels depending on the side information to obtain the two or more audio output channels. The number of the audio output channels is smaller than the number of the audio input channels. The side information indicates a characteristic of at least one of the three or more audio input channels, or a characteristic of one or more sound waves recorded within the one or more audio input channels, or a characteristic of one or more sound sources which emitted one or more sound waves recorded within the one or more audio input channels.

Rendering of audio objects with apparent size to arbitrary loudspeaker layouts

Multiple virtual source locations may be defined for a volume within which audio objects can move. A set-up process for rendering audio data may involve receiving reproduction speaker location data and pre-computing gain values for each of the virtual sources according to the reproduction speaker location data and each virtual source location. The gain values may be stored and used during “run time,” during which audio reproduction data are rendered for the speakers of the reproduction environment. During run time, for each audio object, contributions from virtual source locations within an area or volume defined by the audio object position data and the audio object size data may be computed. A set of gain values for each output channel of the reproduction environment may be computed based, at least in part, on the computed contributions. Each output channel may correspond to at least one reproduction speaker of the reproduction environment.
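
The split between a set-up phase and run time can be sketched as follows. This is a simplified illustration under stated assumptions, not the gain law of the described method. The inverse-distance weighting, the power normalization, the spherical object extent, the nearest-source fallback, and the names `precompute_gains` and `object_gains` are all hypothetical.

```python
import math


def precompute_gains(virtual_sources, speakers):
    """Set-up: compute one gain per (virtual source, speaker) pair.

    Assumption: inverse-distance weighting, power-normalized per source.
    """
    table = []
    for source in virtual_sources:
        raw = [1.0 / (math.dist(source, speaker) + 1e-3) for speaker in speakers]
        norm = math.sqrt(sum(g * g for g in raw))
        table.append([g / norm for g in raw])
    return table


def object_gains(position, size, virtual_sources, table):
    """Run time: combine stored gains of virtual sources inside the object.

    Assumption: the object covers every virtual source within `size`
    of `position`; contributions are summed in the power domain.
    """
    inside = [i for i, source in enumerate(virtual_sources)
              if math.dist(source, position) <= size]
    if not inside:
        # Fall back to the nearest virtual source for very small objects.
        inside = [min(range(len(virtual_sources)),
                      key=lambda i: math.dist(virtual_sources[i], position))]
    speaker_count = len(table[0])
    power = [sum(table[i][s] ** 2 for i in inside) for s in range(speaker_count)]
    total = math.sqrt(sum(power))
    return [math.sqrt(p) / total for p in power]


# Hypothetical layout: two speakers and five virtual sources on a line.
speakers = [(-1.0, 0.0), (1.0, 0.0)]
virtual_sources = [(x / 2, 0.0) for x in range(-2, 3)]
table = precompute_gains(virtual_sources, speakers)
centered = object_gains((0.0, 0.0), 0.6, virtual_sources, table)
```

Because the table is computed once for a given speaker layout, the run-time cost per object depends only on how many virtual sources fall inside the object.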

Method and System for Surround Sound Processing in a Headset
20210105570 · 2021-04-08 ·

An audio headset may receive a plurality of audio signals corresponding to a plurality of surround sound channels. The headset may determine, via its audio processing circuitry, context and/or content of the audio signals. The audio processing circuitry may process the audio signals to generate stereo signals carrying one or more virtual surround channels, wherein the processing comprises automatically controlling, based on the context and the content of the audio signals, a simulated acoustic environment of the virtual surround channels.

DATA PROCESSING METHOD AND APPARATUS, ACQUISITION DEVICE, AND STORAGE MEDIUM
20210112363 · 2021-04-15 ·

Disclosed is a data processing method, comprising: receiving M channels of encoded audio data; decoding the M channels of encoded audio data to acquire spatial information of the audio corresponding to the M channels of audio data, wherein M is a positive integer; determining Q speaker devices corresponding to the M channels of audio data according to the acquired spatial information of the audio and position information of the speaker devices, wherein Q is a positive integer; and rendering the M channels of audio data with the determined Q speaker devices. Embodiments of the present invention further provide an acquisition device, a data processing device, and a storage medium.
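
One simple way to picture the speaker-determination step is a nearest-azimuth selection. The sketch below assumes that the decoded spatial information for a channel reduces to a single azimuth in degrees, and that each speaker's position information is also an azimuth. The abstract states neither. The function names and the speaker layout are hypothetical.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_speakers(source_azimuth, speaker_azimuths, q):
    """Return the names of the q speakers closest in azimuth to the source.

    `speaker_azimuths` maps a speaker name to its azimuth in degrees.
    """
    ranked = sorted(
        speaker_azimuths,
        key=lambda name: angular_distance(source_azimuth, speaker_azimuths[name]),
    )
    return ranked[:q]


# Hypothetical five-speaker layout (azimuths in degrees, 0 = front).
layout = {"L": -30, "R": 30, "C": 0, "Ls": -110, "Rs": 110}
chosen = select_speakers(20, layout, 2)
```

The selected speakers would then receive the rendered channel. How the signal is distributed among them is outside the scope of this sketch.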