H04S2420/13

Sound source separation apparatus and method
10650841 · 2020-05-12

The present technology relates to a sound source separation apparatus and a method which make it possible to separate a sound source at lower calculation cost. A communication unit receives a spatial frequency spectrum of a sound collection signal which is obtained by a microphone array collecting a plane wave of sound from a sound source, and a spatial frequency mask generating unit generates a spatial frequency mask for masking a component of a predetermined region in a spatial frequency domain on the basis of the spatial frequency spectrum. A sound source separating unit extracts a component of a desired sound source from the spatial frequency spectrum as an estimated sound source spectrum on the basis of the spatial frequency mask. The present technology can be applied to a spatial frequency sound source separator.
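The masking step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the mask region is chosen, so the rule used here (pass the bins around the strongest spatial frequency component) is purely an assumption, as are the names `generate_mask`, `separate`, and `half_width`.

```python
def generate_mask(spectrum, half_width=1):
    """Build a binary mask from the spectrum itself (assumed rule:
    pass bins within half_width of the strongest component)."""
    peak = max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))
    return [1.0 if abs(k - peak) <= half_width else 0.0
            for k in range(len(spectrum))]


def separate(spectrum, mask):
    """Apply the mask bin by bin to obtain the estimated source spectrum."""
    return [s * m for s, m in zip(spectrum, mask)]


# Hypothetical spatial frequency spectrum with six bins (complex values).
spectrum = [0.1 + 0j, 2.0 + 1.0j, 1.5 - 0.5j, 0.2 + 0j, 3.0 + 0j, 0.1 + 0j]
mask = generate_mask(spectrum)
estimated = separate(spectrum, mask)
```

Because the mask is applied as an element-wise product in the spatial frequency domain, the cost is linear in the number of bins, which is consistent with the lower calculation cost the abstract claims.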

APPARATUS, SYSTEM, METHOD AND COMPUTER PROGRAM FOR DISTRIBUTING ANNOUNCEMENT MESSAGES

An apparatus comprising circuitry configured to generate one or more focused sound sources as virtual loudspeakers of an announcement system.

SYSTEM AND METHOD FOR ADAPTIVE AUDIO SIGNAL GENERATION, CODING AND RENDERING

Embodiments are described for an adaptive audio system that processes audio data comprising a number of independent monophonic audio streams. One or more of the streams has associated with it metadata that specifies whether the stream is a channel-based or object-based stream. Channel-based streams have rendering information encoded by means of channel name; and the object-based streams have location information encoded through location expressions encoded in the associated metadata. A codec packages the independent audio streams into a single serial bitstream that contains all of the audio data. This configuration allows for the sound to be rendered according to an allocentric frame of reference, in which the rendering location of a sound is based on the characteristics of the playback environment (e.g., room size, shape, etc.) to correspond to the mixer's intent. The object position metadata contains the appropriate allocentric frame of reference information required to play the sound correctly using the available speaker positions in a room that is set up to play the adaptive audio content.
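The distinction between channel-based and object-based streams can be sketched as follows. This is only an illustration under stated assumptions: the `Stream` type, the normalized room coordinates, and the nearest-speaker rule are hypothetical and not taken from the patent, and a real renderer would typically pan an object across several speakers rather than pick one.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Stream:
    name: str
    kind: str                                        # "channel" or "object"
    channel_name: Optional[str] = None               # used by channel-based streams
    position: Optional[Tuple[float, float]] = None   # used by object-based streams


def target_speaker(stream: Stream, speakers: Dict[str, Tuple[float, float]]) -> str:
    """Channel streams go to their named channel; object streams go to the
    speaker closest to the object's position in room-relative coordinates."""
    if stream.kind == "channel":
        return stream.channel_name
    x, y = stream.position
    return min(speakers,
               key=lambda n: (speakers[n][0] - x) ** 2 + (speakers[n][1] - y) ** 2)


# Hypothetical four-speaker room, positions normalized to the room size.
speakers = {"L": (0.0, 1.0), "R": (1.0, 1.0), "Ls": (0.0, 0.0), "Rs": (1.0, 0.0)}
music = Stream("music", "channel", channel_name="R")
flyby = Stream("flyby", "object", position=(0.9, 0.1))
routes = {s.name: target_speaker(s, speakers) for s in (music, flyby)}
```

Expressing the object position relative to the room, rather than to a fixed speaker layout, is what lets the same metadata be rendered on whatever speakers the playback environment provides.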

RENDERING AUDIO OBJECTS HAVING APPARENT SIZE

Methods, systems, and computer program products for rendering an audio object having an apparent size are disclosed. An audio processing system receives audio panning data including a first grid mapping first virtual sound sources in a space and speaker positions to speaker gains. The first grid specifies first speaker gains of the first virtual sound sources in the space. The audio processing system determines a second grid of second virtual sound sources in the space, including mapping the first virtual sound sources into the second virtual sound sources of the second virtual sources. The audio processing system selects at least one of the first grid or second grid for rendering an audio object based on an apparent size of the audio object. The audio processing system renders the audio object based on the selected grid or grids.

Methods and apparatus for compressing and decompressing a higher order ambisonics representation

Higher Order Ambisonics represents three-dimensional sound independent of a specific loudspeaker set-up. However, transmission of an HOA representation results in a very high bit rate. Therefore, compression with a fixed number of channels is used, in which directional and ambient signal components are processed differently. The ambient HOA component is represented by a minimum number of HOA coefficient sequences. The remaining channels contain either directional signals or additional coefficient sequences of the ambient HOA component, depending on what will result in optimum perceptual quality. This processing can change on a frame-by-frame basis.
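The fixed-channel-count idea can be sketched as a per-frame allocation. The abstract says the use of the remaining channels depends on perceptual quality; the sketch below substitutes a much simpler assumed rule (fill remaining channels with detected directional signals first, then with extra ambient coefficient sequences). The function name and parameters are hypothetical.

```python
def allocate_channels(total_channels, min_ambient, num_directional):
    """Split a fixed channel budget for one frame.

    Assumed rule, not the patent's perceptual criterion: directional
    signals take the remaining channels first, and any channels left
    over carry additional ambient HOA coefficient sequences.
    """
    if min_ambient > total_channels:
        raise ValueError("minimum ambient channels exceed the channel budget")
    remaining = total_channels - min_ambient
    directional = min(num_directional, remaining)
    return {
        "ambient_min": min_ambient,
        "directional": directional,
        "ambient_extra": remaining - directional,
    }


# Hypothetical number of directional signals detected in three frames.
detected_per_frame = [2, 0, 5]
plan = [allocate_channels(8, 4, d) for d in detected_per_frame]
```

The total number of transmitted channels stays constant across frames while the split between directional and ambient content changes frame by frame.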

COMMUNICATION APPARATUS, COMMUNICATION METHOD, PROGRAM, AND TELEPRESENCE SYSTEM
20200092639 · 2020-03-19

The present disclosure relates to a communication apparatus, a communication method, a program, and a telepresence system that can perform communication more smoothly. A sound processing unit processes, in accordance with a setting for a specific sound, the sound in performing communication using a video and a voice. For example, the sound processing unit performs processing on a sound input by an input unit, and both the sound processed by the sound processing unit and the original sound input by the input unit are transmitted. Furthermore, the sound processing unit performs processing on a sound received by a receiving unit configured to receive a sound transmitted from another communication apparatus, and causes the sound to be output from an output unit. The present technology can be applied to a telepresence system, for example.

Audio processing apparatus and method therefor

An audio processing apparatus includes a receiver configured to receive audio data including audio components and render configuration data including audio transducer position data for a set of audio transducers. A renderer is configured to generate audio transducer signals for the set of audio transducers from the audio data, and to render audio components in accordance with rendering modes. A render controller is configured to select a rendering mode for the renderer based on the audio transducer position data. The renderer is configured to employ different rendering modes for different subsets of the set of audio transducers, and the render controller is configured to independently select rendering modes for each of the different subsets of the set of audio transducers, including selecting the rendering mode for a first audio transducer in response to a position of the first audio transducer relative to a predetermined position for the first audio transducer.
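The per-transducer mode selection can be sketched as a comparison between each transducer's reported position and its predetermined position. The distance threshold, the mode names, and the function name are assumptions made for illustration; the abstract does not specify the selection criterion or the available modes.

```python
import math


def select_rendering_mode(actual_pos, nominal_pos, tolerance_m=0.5):
    """Pick a mode from how far a transducer sits from its predetermined
    position (assumed criterion: Euclidean distance against a tolerance)."""
    deviation = math.dist(actual_pos, nominal_pos)
    return "default_mode" if deviation <= tolerance_m else "compensating_mode"


# Hypothetical predetermined and reported positions, in metres.
nominal = {"left": (-1.0, 2.0), "right": (1.0, 2.0)}
actual = {"left": (-1.1, 2.0), "right": (2.5, 0.5)}

# Each transducer is evaluated independently, so different subsets of the
# set can end up with different rendering modes.
modes = {name: select_rendering_mode(actual[name], nominal[name])
         for name in nominal}
```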

Modeling room acoustics using acoustic waves

Techniques are described for simulating a microphone array and generating synthetic audio data to analyze the microphone array geometry. This reduces the development cost of new microphone arrays by enabling an evaluation of performance metrics (False Rejection Rate (FRR), Word Error Rate (WER), etc.) without building device hardware or collecting data. To generate the synthetic audio data, the system performs acoustic modeling to determine a room impulse response associated with a prototype device (e.g., potential microphone array) in a room. The acoustic modeling is based on two parameters: a device response (information about acoustics and geometry of the prototype device) and a room response (information about acoustics and geometry of the room). The device response can be simulated based on the microphone array geometry, and the room response can be determined using a specialized microphone and a plane wave decomposition algorithm.

Apparatus and method for providing a loudspeaker-enclosure-microphone system description

An apparatus for providing a current loudspeaker-enclosure-microphone system description of a loudspeaker-enclosure-microphone system is provided. The apparatus has a first transformation unit for generating a plurality of wave-domain loudspeaker audio signals. Moreover, the apparatus has a second transformation unit for generating a plurality of wave-domain microphone audio signals. Furthermore, the apparatus has a system description generator for generating the current loudspeaker-enclosure-microphone system description based on the plurality of wave-domain loudspeaker audio signals, based on the plurality of wave-domain microphone audio signals, and based on a plurality of coupling values, wherein the system description generator is configured to determine each coupling value assigned to a wave-domain pair of a plurality of wave-domain pairs by determining a relation indicator indicating a relation between a loudspeaker-signal-transformation value and a microphone-signal-transformation value.

APPARATUS, METHOD AND COMPUTER PROGRAM FOR ENCODING, DECODING, SCENE PROCESSING AND OTHER PROCEDURES RELATED TO DIRAC BASED SPATIAL AUDIO CODING USING LOW-ORDER, MID-ORDER AND HIGH-ORDER COMPONENTS GENERATORS

An apparatus for generating a sound field description using an input signal having a mono-signal or a multi-channel signal includes: an input signal analyzer for analyzing the input signal to derive direction data and diffuseness data; a low-order components generator for generating a low-order sound field description from the input signal up to a predetermined order and mode; a mid-order components generator for generating a mid-order sound field description above the predetermined order or at the predetermined order and above the predetermined mode and below or at a first truncation order using a synthesis of at least one direct portion and of at least one diffuse portion using the direction data and the diffuseness data; and a high-order components generator for generating a high-order sound field description having a component above the first truncation order using a synthesis of at least one direct portion.