H04S2420/13

SOUND SOURCE SEPARATION APPARATUS AND METHOD, AND PROGRAM
20180047407 · 2018-02-15

The present technology relates to a sound source separation apparatus, a method, and a program which make it possible to separate a sound source at lower calculation cost. A communication unit receives a spatial frequency spectrum of a sound collection signal which is obtained by a microphone array collecting a plane wave of sound from a sound source, and a spatial frequency mask generating unit generates a spatial frequency mask for masking a component of a predetermined region in a spatial frequency domain on the basis of the spatial frequency spectrum. A sound source separating unit extracts a component of a desired sound source from the spatial frequency spectrum as an estimated sound source spectrum on the basis of the spatial frequency mask. The present technology can be applied to a spatial frequency sound source separator.
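The masking step described above can be illustrated with a minimal sketch. It assumes a uniform linear microphone array, a single temporal-frequency bin, and plane waves that fall exactly on spatial DFT bins. The function names, the peak-picking rule used to build the mask, and the `half_width` parameter are illustrative assumptions, not details taken from the abstract.

```python
import cmath

N_MICS = 16  # hypothetical array size


def plane_wave(bin_index, amplitude, n_mics=N_MICS):
    """Samples of one plane wave across the array at a single temporal frequency."""
    return [amplitude * cmath.exp(2j * cmath.pi * bin_index * m / n_mics)
            for m in range(n_mics)]


def spatial_dft(samples):
    """Naive DFT taken along the microphone (spatial) axis."""
    n = len(samples)
    return [sum(samples[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def generate_mask(spectrum, half_width=1):
    """Binary mask built from the spectrum itself.

    Assumption: the desired source is the strongest spatial-frequency peak,
    and bins within `half_width` (circular distance) of it are kept.
    """
    n = len(spectrum)
    peak = max(range(n), key=lambda k: abs(spectrum[k]))
    mask = []
    for k in range(n):
        distance = min((k - peak) % n, (peak - k) % n)
        mask.append(1.0 if distance <= half_width else 0.0)
    return peak, mask


def separate(spectrum, mask):
    """Apply the mask to obtain the estimated sound source spectrum."""
    return [s * w for s, w in zip(spectrum, mask)]


# Mixture of a desired source (bin 2) and a weaker interferer (bin 5).
mixture = [a + b for a, b in zip(plane_wave(2, 1.0), plane_wave(5, 0.5))]
spectrum = spatial_dft(mixture)
peak_bin, mask = generate_mask(spectrum)
estimate = separate(spectrum, mask)
```

Because the mask is applied as an element-wise multiplication in the spatial frequency domain, the separation step costs one multiply per bin, which is in keeping with the low calculation cost the abstract emphasizes.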

Three-dimensional audio rendering techniques

Three-dimensional (3D) audio content creation and rendering systems and methodologies are presented here. A disclosed method of processing 3D audio assigns audio source objects to 3D video objects, links audio tracks to assigned audio source objects, and performs wave field synthesis on the linked audio tracks to generate 3D audio data representing a 3D spatial sound field. A disclosed method of processing 3D audio during playback of 3D video content obtains 3D audio data and 3D video data for a frame of 3D video content, applies device-specific parameters to the 3D audio data to obtain transformed 3D audio data scaled to a presentation device, and processes the transformed 3D audio data to render audio information for an array of speakers associated with the presentation device.

APPARATUS AND METHOD FOR DRIVING AN ARRAY OF LOUDSPEAKERS

A local wave field synthesis apparatus includes a determination module for determining desired sound pressures and desired particle velocity vectors at a plurality of control points, a computation module for computing sound pressures and particle velocity vectors at the plurality of control points based on a set of filter parameters, an optimization module for computing an optimum set of filter parameters by jointly optimizing computed sound pressures towards the desired sound pressures and computed particle velocity vectors towards the desired particle velocity vectors, and a generator module for generating the drive signals based on the optimum set of filter parameters, wherein the plurality of control points are located on one or more contours around the one or more audio zones.

System and Method for Adaptive Audio Signal Generation, Coding and Rendering

Embodiments are described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name; and the object-based streams have location information encoded through location expressions encoded in the associated metadata. A codec packages the independent audio streams into a single serial bitstream that contains all of the audio data. This configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.
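The split between channel-based and object-based streams can be sketched as a small data structure. Everything named here (`StreamMetadata`, `pick_speaker`, the `ROOM` layout, normalized room coordinates) is hypothetical, and the nearest-speaker rule is a deliberate simplification: an actual renderer would typically pan an object across several speakers.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Position = Tuple[float, float, float]  # assumed: normalized room coordinates in [0, 1]


@dataclass
class StreamMetadata:
    kind: str                              # "channel" or "object"
    channel_name: Optional[str] = None     # used by channel-based streams
    position: Optional[Position] = None    # used by object-based streams


def pick_speaker(meta: StreamMetadata, speakers: Dict[str, Position]) -> str:
    """Choose a playback speaker for one monophonic stream."""
    if meta.kind == "channel":
        # Channel-based: rendering information is the channel name itself.
        if meta.channel_name not in speakers:
            raise KeyError(f"no speaker named {meta.channel_name!r}")
        return meta.channel_name
    if meta.kind == "object":
        # Object-based: the position is room-relative (allocentric), so it is
        # resolved against whatever speakers this particular room has.
        if meta.position is None:
            raise ValueError("object stream needs a position")
        return min(speakers, key=lambda name: math.dist(speakers[name], meta.position))
    raise ValueError(f"unknown stream kind {meta.kind!r}")


# Hypothetical four-speaker room: front left/right, surround left/right.
ROOM = {
    "L": (0.0, 1.0, 0.0),
    "R": (1.0, 1.0, 0.0),
    "Ls": (0.0, 0.0, 0.0),
    "Rs": (1.0, 0.0, 0.0),
}

dialog = StreamMetadata(kind="channel", channel_name="L")
flyover = StreamMetadata(kind="object", position=(0.9, 0.2, 0.0))
```

The same `flyover` metadata would resolve to a different speaker in a room with a different layout, which is the point of carrying positions rather than speaker assignments in the bitstream.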

Apparatus and method for encoding a spatial audio representation or apparatus and method for decoding an encoded audio signal using transport metadata and related computer programs

An apparatus for encoding a spatial audio representation representing an audio scene to obtain an encoded audio signal includes: a transport representation generator for generating a transport representation from the spatial audio representation, and for generating transport metadata related to the generation of the transport representation or indicating one or more directional properties of the transport representation; and an output interface for generating the encoded audio signal, the encoded audio signal including information on the transport representation, and information on the transport metadata.

Audio representation and associated rendering

An apparatus for immersive audio communication including circuitry configured to: receive at least a first audio data stream and a second audio data stream, wherein at least one of the first and second audio data streams includes a spatial audio stream to enable immersive audio during a communication; determine a type of each of the first and second audio data streams to identify which of the received first and second audio data streams is the spatial audio stream; process the second audio data stream with at least one parameter dependent on the determined type; and render the first audio data stream and the processed second audio data stream.

APPARATUS, METHOD AND COMPUTER PROGRAM FOR ENCODING, DECODING, SCENE PROCESSING AND OTHER PROCEDURES RELATED TO DIRAC BASED SPATIAL AUDIO CODING USING LOW-ORDER, MID-ORDER AND HIGH-ORDER COMPONENTS GENERATORS

An apparatus for generating a sound field description using an input signal having a mono-signal or a multi-channel signal includes: an input signal analyzer for analyzing the input signal to derive direction data and diffuseness data; a low-order components generator for generating a low-order sound field description from the input signal up to a predetermined order and mode; a mid-order components generator for generating a mid-order sound field description above the predetermined order or at the predetermined order and above the predetermined mode and below or at a high order, wherein the mid-order sound field description comprises a direct contribution and a diffuse contribution; and a high-order components generator for generating a high-order sound field description comprising a sound field component above the high order using a synthesis of at least one direct portion, wherein the high-order sound field description comprises a direct contribution only.
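The division of labor among the three generators can be written as a small classification rule. This is one reading of the abstract, with hypothetical names and thresholds. It also assumes that orders and modes are compared as plain integers, and that a component below the predetermined order counts as low-order regardless of its mode.

```python
def classify_component(order, mode, low_order, low_mode, high_order):
    """Return which generator handles the sound field component (order, mode)."""
    if order > high_order:
        return "high"   # above the high order
    if order > low_order or (order == low_order and mode > low_mode):
        return "mid"    # above the low-order limit, at or below the high order
    return "low"        # up to the predetermined order and mode


# What each generator's output contains, as stated in the abstract.
CONTRIBUTIONS = {
    "low": "derived from the input signal",
    "mid": "direct contribution and diffuse contribution",
    "high": "direct contribution only",
}

# Hypothetical thresholds: low-order limit (order 1, mode 0), high order 2.
LIMITS = dict(low_order=1, low_mode=0, high_order=2)

examples = {
    (0, 0): classify_component(0, 0, **LIMITS),
    (1, 0): classify_component(1, 0, **LIMITS),
    (1, 1): classify_component(1, 1, **LIMITS),
    (2, 0): classify_component(2, 0, **LIMITS),
    (3, 0): classify_component(3, 0, **LIMITS),
}
```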

METHODS AND APPARATUS FOR COMPRESSING AND DECOMPRESSING A HIGHER ORDER AMBISONICS REPRESENTATION

Higher Order Ambisonics represents three-dimensional sound independent of a specific loudspeaker set-up. However, transmission of an HOA representation results in a very high bit rate. Therefore compression with a fixed number of channels is used, in which directional and ambient signal components are processed differently. The ambient HOA component is represented by a minimum number of HOA coefficient sequences. The remaining channels contain either directional signals or additional coefficient sequences of the ambient HOA component, depending on what will result in optimum perceptual quality. This processing can change on a frame-by-frame basis.
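The fixed channel budget can be illustrated with a per-frame assignment sketch. The abstract states that the choice between directional signals and additional ambient coefficient sequences depends on perceptual quality. The sketch below substitutes a much simpler assumed rule (directional signals first, remaining channels to extra ambient sequences), and all names and numbers are hypothetical.

```python
def assign_channels(total_channels, min_ambient, n_directional):
    """Label each transport channel for one frame.

    Assumed rule, not the perceptual criterion from the abstract: detected
    directional signals take the spare channels first, and any channels
    still unused carry additional ambient HOA coefficient sequences.
    """
    if min_ambient > total_channels:
        raise ValueError("minimum ambient sequences exceed the channel budget")
    spare = total_channels - min_ambient
    directional = min(n_directional, spare)
    extra_ambient = spare - directional
    return (["ambient"] * min_ambient
            + ["directional"] * directional
            + ["extra_ambient"] * extra_ambient)


# Three consecutive frames with 0, 2, and 6 detected directional signals,
# an 8-channel budget, and 4 mandatory ambient coefficient sequences.
frames = [assign_channels(8, 4, detected) for detected in (0, 2, 6)]
```

Every frame uses exactly the same number of channels, while the role of the spare channels changes from frame to frame.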

System and method for adaptive audio signal generation, coding and rendering

Embodiments are described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name; and the object-based streams have location information encoded through location expressions encoded in the associated metadata. A codec packages the independent audio streams into a single serial bitstream that contains all of the audio data. This configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.

System and Method for Adaptive Audio Signal Generation, Coding and Rendering

Embodiments are described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name; and the object-based streams have location information encoded through location expressions encoded in the associated metadata. A codec packages the independent audio streams into a single serial bitstream that contains all of the audio data. This configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.