Patent classifications
G10L21/0308
Audio Source Separation Processing Workflow Systems and Methods
Systems and methods include receiving a single-track audio input stream containing a mixture of audio signals generated by a plurality of sources, training an audio source separation model using, at least in part, the received single-track audio input stream, and separating audio sources from the audio input stream, using the audio source separation model, in accordance with one or more processing recipes to generate a plurality of source-separated output stems. The audio separation model is trained to receive the single-track audio input stream and generate a plurality of audio stems corresponding to one or more audio sources of the plurality of sources.
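As a concrete sketch of the stem-generation step, the fragment below stands in for a trained separation model by applying fixed frequency-band masks per "recipe". The recipe names and band edges are illustrative assumptions, not values from the patent, and a real system would use a learned model rather than band-pass masks.

```python
import numpy as np

def separate_stems(mixture, sample_rate, recipes):
    # Toy stand-in for a trained separation model: each "recipe" is a
    # frequency band, and each stem is the mixture filtered to that band
    # via an FFT-bin mask.
    spectrum = np.fft.rfft(mixture)
    freqs = np.fft.rfftfreq(len(mixture), d=1.0 / sample_rate)
    stems = {}
    for name, (lo, hi) in recipes.items():
        mask = (freqs >= lo) & (freqs < hi)          # crude band-pass mask
        stems[name] = np.fft.irfft(spectrum * mask, n=len(mixture))
    return stems

# Hypothetical processing recipes: stem name -> frequency band in Hz.
recipes = {"bass": (0, 250), "vocals": (250, 4000), "other": (4000, 22050)}
sr = 44100
t = np.arange(sr) / sr
mixture = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
stems = separate_stems(mixture, sr, recipes)
```

Because the masks partition the spectrum, the stems sum back to the original single-track mixture, mirroring the idea that the output stems jointly account for the input stream.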
SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD, AND SIGNAL PROCESSING PROGRAM
A signal processing apparatus includes a neural network (“NN”), a sorting unit, and a spatial covariance matrix calculation unit. The NN converts a mixed signal, in which sounds from a plurality of sound sources captured over a plurality of channels are mixed, directly in the time domain into separated signals, one per sound source, and outputs the separated signals. The sorting unit sorts the separated signals output from the NN for each channel so that the sound sources of the separated signals are aligned consistently across the plurality of channels. The spatial covariance matrix calculation unit calculates a spatial covariance matrix corresponding to each sound source from the sorted separated signals of each channel output from the sorting unit.
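The sorting and covariance steps might be sketched as follows. The correlation-based alignment and the plain time-domain covariance estimate are simplifications assumed for illustration, not the patent's exact procedures.

```python
import numpy as np

def align_sources(per_channel):
    # Sketch of the "sorting unit": reorder the sources in every channel so
    # their indices match channel 0, using absolute correlation as a crude
    # similarity measure. per_channel: (channels, sources, samples).
    ref = per_channel[0]
    aligned = [ref]
    for ch in per_channel[1:]:
        order, used = [], set()
        for r in ref:
            sims = [abs(np.dot(r, s)) if i not in used else -np.inf
                    for i, s in enumerate(ch)]
            best = int(np.argmax(sims))
            used.add(best)
            order.append(best)
        aligned.append(ch[order])
    return np.stack(aligned)

def spatial_covariance(separated):
    # Per-source spatial covariance E[x x^T] over time, where x is the
    # vector of per-channel separated samples for one source.
    # separated: (sources, channels, samples) -> (sources, ch, ch).
    n_src, n_ch, n_t = separated.shape
    covs = np.empty((n_src, n_ch, n_ch))
    for s in range(n_src):
        x = separated[s]
        covs[s] = x @ x.T / n_t
    return covs
```

The alignment step matters because a separation network gives no guarantee that "source 0" in one channel is the same physical source as "source 0" in another; the covariance is only meaningful after that permutation is fixed.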
Methods, apparatuses and computer programs relating to spatial audio
An apparatus is disclosed that is configured to receive, from first and second spatial audio capture apparatuses, respective first and second composite audio signals comprising components derived from one or more sound sources in a capture space. The apparatus is further configured to identify a position of a user device as corresponding to one of first and second areas respectively associated with the positions of the first and second spatial audio capture apparatuses, and to render audio representing the one or more sound sources to the user device. The rendering is performed differently depending on whether, for the spatial audio capture apparatus associated with the identified area, individual audio signals from each of the one or more sound sources can be successfully separated from its composite signal.
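The area-dependent rendering decision reduces to a small dispatch. The one-dimensional areas, apparatus names, and mode labels below are purely illustrative assumptions:

```python
def choose_rendering(user_pos, areas, separable):
    # Pick the capture apparatus whose area contains the user, then pick a
    # rendering mode based on whether that apparatus's composite signal
    # could be separated into individual source signals.
    # areas: name -> (lo, hi) interval; separable: name -> bool.
    for apparatus, (lo, hi) in areas.items():
        if lo <= user_pos < hi:
            mode = "per-source" if separable[apparatus] else "composite"
            return apparatus, mode
    return None, None
```

A "per-source" mode could position each separated source independently for the listener, while "composite" would fall back to rendering the unseparated capture as a whole.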
Method, apparatus and computer program for detecting voice uttered from a particular position
An information processing apparatus includes a voice acquisition section, a reliability generation section, and a processing execution section. The voice acquisition section acquires an ambient voice. The reliability generation section generates a reliability indicating the degree to which the acquired voice was uttered from a particular position, on the basis of a predetermined transfer characteristic; a phase difference or an acoustic characteristic of the voice can serve as the predetermined transfer characteristic. The processing execution section executes a process according to the generated reliability, such as issuing a notification or executing a predetermined command.
Method and arrangement for controlling smoothing of stationary background noise
In a method for coding of information for enhancing a background noise representation, voice activity of an input speech signal is determined. A noisiness parameter is determined for an inactive speech signal, wherein the noisiness parameter is based on a ratio of prediction gains of two Linear Predictive Coder (LPC) prediction filters with different orders. The noisiness parameter is quantized, and the quantized noisiness parameter is encoded for transmission.
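The noisiness parameter described above can be sketched with standard autocorrelation-method LPC analysis and the Levinson-Durbin recursion. The filter orders (2 and 16) are illustrative choices, not the codec's actual values.

```python
import numpy as np

def lpc_error(signal, order):
    # Autocorrelation method + Levinson-Durbin recursion; returns the
    # zero-lag energy r[0] and the final prediction-error energy.
    n = len(signal)
    r = np.array([np.dot(signal[:n - k], signal[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = np.dot(a[:i], r[i:0:-1])       # sum_j a[j] * r[i - j]
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return r[0], err

def noisiness(signal, low_order=2, high_order=16):
    # Ratio of the prediction gains (r[0] / error) of two LPC filters of
    # different orders: near 1 for noise-like frames, small when the
    # high-order predictor fits much better (e.g. tonal/voiced frames).
    r0, e_lo = lpc_error(signal, low_order)
    _, e_hi = lpc_error(signal, high_order)
    return (r0 / e_lo) / (r0 / e_hi)         # = e_hi / e_lo, in (0, 1]
```

Quantization for transmission could then be as simple as `q = int(round(15 * noisiness(frame)))` for a 4-bit index; the codec's actual quantizer is not specified here.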
Analyzing changes in vocal power within music content using frequency spectrums
Technologies are described for identifying familiar or interesting parts of music content by analyzing changes in vocal power using frequency spectrums. For example, a frequency spectrum can be generated from digitized audio. Using the frequency spectrum, the harmonic content and percussive content can be separated. The vocal content can then be separated from the harmonic and/or percussive content. The vocal content can then be processed to identify surge points in the digitized audio. In some implementations, the vocal content is included in the harmonic content during the separation procedure and is then separated from the harmonic content.
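The harmonic/percussive split can be sketched with the standard median-filtering approach (the patent's separation procedure may differ in detail), with surge points then taken as frames whose power jumps above a local average; the window sizes and the surge threshold below are illustrative.

```python
import numpy as np

def stft_mag(x, win=256, hop=128):
    # Magnitude spectrogram, shape (freq bins, frames), Hann window.
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

def running_median(a, k, axis):
    # Median filter along one axis (shorter windows at the edges).
    n = a.shape[axis]
    out = np.empty_like(a)
    for i in range(n):
        sl = [slice(None)] * a.ndim
        sl[axis] = slice(max(0, i - k // 2), min(n, i + k // 2 + 1))
        idx = [slice(None)] * a.ndim
        idx[axis] = i
        out[tuple(idx)] = np.median(a[tuple(sl)], axis=axis)
    return out

def hpss_masks(mag, k=9):
    # Median-filtering HPSS: medians across time favor sustained (harmonic)
    # energy; medians across frequency favor broadband (percussive) energy.
    H = running_median(mag, k, axis=1)
    P = running_median(mag, k, axis=0)
    return H >= P, P > H                     # hard harmonic/percussive masks

def surge_points(power, factor=2.0):
    # Frames where power jumps well above its 5-frame running average.
    base = np.convolve(power, np.ones(5) / 5, mode="same")
    return np.where(power > factor * np.maximum(base, 1e-12))[0]
```

On a sustained tone with a click added, the tone's frequency bin lands in the harmonic mask while the click's frame lands in the percussive mask, which is the property the vocal-extraction step builds on.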