H04S2420/03

Transmission apparatus, transmission method, processing apparatus, and processing method
11341976 · 2022-05-24

A voice output corresponding to a fixed position of a wide viewing angle image is easily obtained. The apparatus includes a transmission unit configured to transmit spatial voice data and information regarding a predetermined number of registered viewpoints. For example, the spatial voice data is scene-based audio data, and the scene-based audio data comprises the components of an HoA format. The information regarding a viewpoint includes, for example, an azimuth angle (azimuth information) and an elevation angle (elevation angle information) that indicate the position of the viewpoint. The scene-based audio data and the information regarding the predetermined number of registered viewpoints are transmitted, for example, in a packet of object audio.
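As a rough illustration of how a viewpoint's azimuth and elevation could be used on the receiving side, the sketch below steers a virtual cardioid microphone through first-order scene-based audio. It assumes only four components (W, X, Y, Z), with X pointing front, Y left, Z up, and unit gain on W. The function names and this channel convention are hypothetical and are not taken from the abstract.

```python
import math

def _direction(azimuth_deg, elevation_deg):
    """Unit vector for an azimuth/elevation pair (assumed: X front, Y left, Z up)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el),
            math.sin(az) * math.cos(el),
            math.sin(el))

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample arriving from the given direction as (W, X, Y, Z)."""
    dx, dy, dz = _direction(azimuth_deg, elevation_deg)
    return (sample, sample * dx, sample * dy, sample * dz)

def steer_to_viewpoint(components, azimuth_deg, elevation_deg):
    """Virtual cardioid pointed at a registered viewpoint (hypothetical helper)."""
    w, x, y, z = components
    dx, dy, dz = _direction(azimuth_deg, elevation_deg)
    # A cardioid has unit gain on-axis and zero gain directly behind.
    return 0.5 * (w + x * dx + y * dy + z * dz)
```

Under these assumptions, a source encoded at the viewpoint's direction passes at full level, while a source directly opposite is cancelled.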

Encoding and decoding methods, and encoding and decoding apparatuses for stereo signal

This disclosure provides a decoding method and a decoding apparatus for a stereo signal. The decoding method includes: decoding a bitstream to obtain a first channel signal, a second channel signal, and a first ITD of a current frame of a stereo signal; performing mixing processing on the first channel signal and the second channel signal to obtain a third channel reconstructed signal and a fourth channel reconstructed signal; performing interpolation processing based on the first ITD and a second ITD of a frame previous to the current frame to obtain a third ITD; and adjusting a delay of the third channel reconstructed signal and the fourth channel reconstructed signal based on the third ITD.
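The interpolation and delay-adjustment steps might look like the following minimal sketch. The abstract does not state the interpolation formula, so linear interpolation with a hypothetical weight `alpha` is assumed here. The ITD is treated as an integer number of samples, and the sign convention (a positive ITD delays the fourth channel, a negative ITD delays the third) is also an assumption.

```python
def interpolate_itd(current_itd, previous_itd, alpha=0.5):
    """Blend the current-frame ITD with the previous-frame ITD.

    alpha is a hypothetical weight on the previous frame (0 <= alpha <= 1).
    """
    return round((1.0 - alpha) * current_itd + alpha * previous_itd)

def _delay(signal, samples):
    """Delay a frame by a whole number of samples, padding with zeros."""
    samples = max(0, min(samples, len(signal)))
    return [0.0] * samples + list(signal[:len(signal) - samples])

def adjust_delay(third_channel, fourth_channel, itd):
    """Apply the interpolated ITD to one of the two reconstructed channels."""
    if itd >= 0:
        return list(third_channel), _delay(fourth_channel, itd)
    return _delay(third_channel, -itd), list(fourth_channel)
```

Blending with the previous frame's value keeps the applied delay from jumping abruptly between frames, which is the usual motivation for smoothing a per-frame parameter.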

METHODS AND SYSTEMS FOR RENDERING AUDIO BASED ON PRIORITY

Embodiments are directed to a method of rendering adaptive audio by receiving input audio comprising channel-based audio, audio objects, and dynamic objects, wherein the dynamic objects are classified as sets of low-priority dynamic objects and high-priority dynamic objects, rendering the channel-based audio, the audio objects, and the low-priority dynamic objects in a first rendering processor of an audio processing system, and rendering the high-priority dynamic objects in a second rendering processor of the audio processing system. The rendered audio is then subject to virtualization and post-processing steps for playback through soundbars and other similar limited height capable speakers.
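The routing described above can be sketched as a simple partition of the input elements between two renderers. The `AudioElement` type, its field names, and the string labels are hypothetical; the abstract only states which classes of input go to which rendering processor.

```python
from dataclasses import dataclass

@dataclass
class AudioElement:
    name: str
    kind: str              # assumed labels: "channel", "object", or "dynamic"
    priority: str = "low"  # only meaningful for dynamic objects

def route_to_renderers(elements):
    """Split elements into (first_renderer, second_renderer) name lists."""
    first, second = [], []
    for element in elements:
        if element.kind == "dynamic" and element.priority == "high":
            second.append(element.name)  # high-priority dynamic objects
        else:
            first.append(element.name)   # beds, objects, low-priority dynamics
    return first, second
```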

RECONSTRUCTION OF AUDIO SCENES FROM A DOWNMIX

Audio objects are associated with positional metadata. A received downmix signal comprises downmix channels that are linear combinations of one or more audio objects and are associated with respective positional locators.

In a first aspect, the downmix signal, the positional metadata and frequency-dependent object gains are received. An audio object is reconstructed by applying the object gain to an upmix of the downmix signal in accordance with coefficients based on the positional metadata and the positional locators.

In a second aspect, audio objects have been encoded together with at least one bed channel positioned at a positional locator of a corresponding downmix channel. The decoding system receives the downmix signal and the positional metadata of the audio objects. A bed channel is reconstructed by suppressing the content representing audio objects from the corresponding downmix channel on the basis of the positional locator of the corresponding downmix channel.
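For the first aspect, a minimal sketch of object reconstruction in a single frequency band is shown below. The abstract does not say how the upmix coefficients are derived from the positions, so normalized inverse-distance weighting is assumed purely for illustration. The function names are hypothetical.

```python
import math

def upmix_coefficients(object_position, locators):
    """Weight each downmix channel by its closeness to the object (assumed rule)."""
    weights = [1.0 / (math.dist(object_position, locator) + 1e-9)
               for locator in locators]
    total = sum(weights)
    return [w / total for w in weights]

def reconstruct_object(downmix_band, object_position, locators, object_gain):
    """Upmix the downmix channels of one band, then apply that band's object gain.

    downmix_band holds one list of samples per downmix channel.
    """
    coeffs = upmix_coefficients(object_position, locators)
    length = len(downmix_band[0])
    return [object_gain * sum(c * channel[i]
                              for c, channel in zip(coeffs, downmix_band))
            for i in range(length)]
```

A full decoder would repeat this per frequency band, since the object gains are frequency-dependent.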

ELECTRONIC DEVICE, SYSTEM, METHOD AND COMPUTER PROGRAM

An electronic device comprising circuitry configured to receive an audio mixture signal and side information related to sources present in the audio mixture signal, perform audio source separation on the audio mixture to obtain separated sources, and generate respective virtual audio objects based on the separated sources and the side information.

Method, apparatus or systems for processing audio objects

Diffuse or spatially large audio objects may be identified for special processing. A decorrelation process may be performed on audio signals corresponding to the large audio objects to produce decorrelated large audio object audio signals. These decorrelated large audio object audio signals may be associated with object locations, which may be stationary or time-varying locations. For example, the decorrelated large audio object audio signals may be rendered to virtual or actual speaker locations. The output of such a rendering process may be input to a scene simplification process. The decorrelation, associating and/or scene simplification processes may be performed prior to a process of encoding the audio data.

GENERATING AUDIO OUTPUT SIGNALS

An apparatus, method and computer program are described comprising capturing spatial audio data during an image capturing process, determining an orientation of an image capturing device during the spatial audio data capture, generating an audio focus signal from said captured spatial audio data (wherein said audio focus signal is focused in an image capturing direction of said image capturing device), generating modified spatial audio data (e.g. by modifying the captured spatial audio data to compensate for changes in orientation during the spatial audio data capture), and generating an audio output signal from a combination of the audio focus signal and the modified spatial audio data.

Apparatus and method for realizing a SAOC downmix of 3D audio content

An apparatus for generating one or more audio output channels is provided. The apparatus includes a parameter processor for calculating output channel mixing information and a downmix processor for generating the one or more audio output channels. The downmix processor is configured to receive an audio transport signal including one or more audio transport channels, wherein two or more audio object signals are mixed within the audio transport signal, and wherein the number of the one or more audio transport channels is smaller than the number of the two or more audio object signals. The audio transport signal depends on a first mixing rule and on a second mixing rule. The first mixing rule indicates how to mix the two or more audio object signals to obtain a plurality of premixed channels. Moreover, the second mixing rule indicates how to mix the plurality of premixed channels.
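The two mixing rules can be read as two matrices applied in sequence, which is equivalent to a single combined downmix matrix. The sketch below assumes this matrix interpretation; the matrix values and the names `premix_rule` and `channel_rule` are made up for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Three object signals, two samples each (illustrative values).
objects = [[1.0, 2.0], [0.0, 4.0], [2.0, 2.0]]

# First mixing rule: 3 objects -> 4 premixed channels.
premix_rule = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.5, 0.5, 0.0]]

# Second mixing rule: 4 premixed channels -> 2 transport channels.
channel_rule = [[1.0, 0.0, 0.5, 0.5],
                [0.0, 1.0, 0.5, 0.5]]

# Applying the rules one after the other...
transport = matmul(channel_rule, matmul(premix_rule, objects))

# ...gives the same result as applying their product once.
combined_rule = matmul(channel_rule, premix_rule)
transport_direct = matmul(combined_rule, objects)
```

Here two transport channels carry three object signals, matching the requirement that there be fewer transport channels than objects.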

Reverberation technique for 3D audio objects

Reverberation techniques for 3D audio are disclosed. In an example method, a three-dimensional (3D) reverberation is applied to a sound object placed at a sound object position in a sound room. A sound object signal is received. A 3D spatial room response (SRR) signal is computed corresponding to the user-selected position. A time convolution operation is performed between an audio signal of the sound object signal and the computed SRR signal to generate a reverberated signal.
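The final step, time convolution of the object's audio with a room response, can be sketched as a direct-form convolution. How the SRR itself is computed is not covered here; the response values in the usage note are placeholders, and the function name is hypothetical.

```python
def convolve(signal, response):
    """Direct time-domain convolution of two sample sequences."""
    if not signal or not response:
        return []
    out = [0.0] * (len(signal) + len(response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(response):
            out[i + j] += s * h  # each input sample excites a scaled copy of the response
    return out
```

For example, `convolve([1.0, 0.0, 0.5], [1.0, 0.5])` returns `[1.0, 0.5, 0.5, 0.25]`: each input sample is followed by an echo at half its level. A practical implementation would likely use FFT-based convolution for long responses.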

METHODS, APPARATUS AND SYSTEMS FOR A PRE-RENDERED SIGNAL FOR AUDIO RENDERING

The present disclosure relates to a method of decoding audio scene content from a bitstream by a decoder that includes an audio renderer with one or more rendering tools. The method comprises receiving the bitstream, decoding a description of an audio scene from the bitstream, determining one or more effective audio elements from the description of the audio scene, determining effective audio element information indicative of effective audio element positions of the one or more effective audio elements from the description of the audio scene, decoding a rendering mode indication from the bitstream, wherein the rendering mode indication is indicative of whether the one or more effective audio elements represent a sound field obtained from pre-rendered audio elements and should be rendered using a predetermined rendering mode, and in response to the rendering mode indication indicating that the one or more effective audio elements represent the sound field obtained from pre-rendered audio elements and should be rendered using the predetermined rendering mode, rendering the one or more effective audio elements using the predetermined rendering mode, wherein rendering the one or more effective audio elements using the predetermined rendering mode takes into account the effective audio element information, and wherein the predetermined rendering mode defines a predetermined configuration of the rendering tools for controlling an impact of an acoustic environment of the audio scene on the rendering output. The disclosure further relates to a method of generating audio scene content and a method of encoding audio scene content into a bitstream.
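The decoder-side decision can be sketched as a lookup keyed by the decoded rendering mode indication. The tool names (`reverb`, `occlusion`, `distance_attenuation`) and the choice of which tools the predetermined mode disables are assumptions for illustration; the abstract only says that the mode fixes a configuration of the rendering tools.

```python
# Hypothetical tool configurations; the real set of rendering tools is not given.
DEFAULT_CONFIG = {"reverb": True, "occlusion": True, "distance_attenuation": True}
PRE_RENDERED_CONFIG = {"reverb": False, "occlusion": False, "distance_attenuation": True}

def select_tool_config(rendering_mode_indication):
    """Return the tool configuration to use for the effective audio elements."""
    if rendering_mode_indication:
        # The elements already carry the acoustic environment from pre-rendering,
        # so the environment-related tools are assumed to be switched off.
        return dict(PRE_RENDERED_CONFIG)
    return dict(DEFAULT_CONFIG)
```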