Patent classifications
G10L21/028
Audio Processing Method, Method for Training Estimation Model, and Audio Processing System
An audio processing method obtains input data that include: first sound data representing first components of a first frequency band, included in a first sound corresponding to a first sound source; second sound data representing second components of the first frequency band, included in a second sound corresponding to a second sound source; and mix sound data representing mix components of an input frequency band including a second frequency band, the mix components being included in a mix sound of the first sound and the second sound. The input data are then input to a trained estimation model to generate at least one of first output data representing first estimated components of the first sound within an output frequency band that includes the second frequency band, or second output data representing second estimated components of the second sound within the output frequency band.
AUDIO SIGNAL PROCESSING DEVICE, AUDIO SIGNAL PROCESSING METHOD, AND STORAGE MEDIUM
An audio signal processing device comprises: a determination unit that determines a first voice segment for a target speaker linked to a host device on the basis of an externally acquired first audio signal; a sharing unit that transmits the first audio signal and the first voice segment to another device linked to a non-target speaker and receives a second audio signal and a second voice segment associated with the non-target speaker from the other device; an estimation unit that estimates the voice of the non-target speaker mixed in the first audio signal on the basis of the second audio signal and the second voice segment that are received and an estimation parameter associated with the target speaker that is acquired; and a removal unit that removes the voice of the non-target speaker from the first audio signal.
SPEECH ENHANCEMENT TECHNIQUES THAT MAINTAIN SPEECH OF NEAR-FIELD SPEAKERS
An endpoint selectively enhances a captured audio signal based on an operating mode. The endpoint obtains an audio input signal of multiple users in a physical location. The audio input signal is captured by a microphone. The endpoint separates voice signals from the audio input signal and determines an operating mode for an audio output signal. The endpoint selectively adjusts each of the voice signals based on the operating mode to generate the audio output signal.
STEREOPHONIC AUDIO REARRANGEMENT BASED ON DECOMPOSED TRACKS
The present invention provides a method for processing audio data, comprising providing input audio data containing a mixture of different timbres, decomposing the input audio data to generate decomposed data representing a predetermined timbre selected from the timbres contained in the input audio data, determining a set point position of a virtual sound source outputting the predetermined timbre relative to a position of a virtual listener, and generating stereophonic output data based on the decomposed data and the determined set point position.
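The final step of the abstract above, placing a decomposed track at a set point relative to a virtual listener, can be illustrated with a minimal sketch. It is not the patented method: it assumes the set point is reduced to a single azimuth angle and swaps in ordinary constant-power panning, ignoring distance and any head-related filtering. The names `pan_gains` and `render_stereo` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for an azimuth in degrees.

    Assumption: -90 is hard left, 0 is centre, +90 is hard right.
    Values outside that range are clamped.
    """
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))
    # Map [-90, 90] degrees onto [0, pi/2] so that cos/sin give the gains.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


def render_stereo(track: Sequence[float], azimuth_deg: float) -> List[Tuple[float, float]]:
    """Place a mono decomposed track at the given azimuth as (left, right) samples."""
    left_gain, right_gain = pan_gains(azimuth_deg)
    return [(left_gain * x, right_gain * x) for x in track]


# Hypothetical decomposed track (e.g. one timbre) placed hard left.
stereo = render_stereo([1.0, -0.5], -90.0)
```

Constant-power panning keeps the sum of squared gains at 1, so the perceived loudness of the track stays roughly constant as the virtual source moves.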
Systems and methods for preparing reference signals for an acoustic echo canceler
A method for preparing reference signals for an echo cancellation system disposed in a vehicle, comprising the steps of: receiving a plurality of drive signals, each drive signal being provided to an associated acoustic transducer of a plurality of acoustic transducers such that the associated acoustic transducer transduces the drive signal into an acoustic signal; filtering each drive signal with a respective filter of a plurality of filters to produce a plurality of filtered signals, wherein each of the plurality of filters approximates a transfer function from an associated acoustic transducer to a microphone disposed within the vehicle, such that the plurality of filtered signals each estimate a respective acoustic signal at the microphone; summing together at least a subset of the plurality of filtered signals to produce a summed reference signal; and outputting the summed reference signal to the echo cancellation system.
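The filter-and-sum steps in the abstract above can be sketched as follows. This is a minimal illustration, assuming each transducer-to-microphone transfer function is approximated by a short FIR impulse response and that all filtered signals are summed (the abstract allows a subset). The function names and the sample values are hypothetical.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filtering: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def summed_reference(
    drive_signals: Sequence[Sequence[float]],
    filters: Sequence[Sequence[float]],
) -> List[float]:
    """Filter each drive signal with its own path estimate, then sum sample by sample."""
    if len(drive_signals) != len(filters):
        raise ValueError("need exactly one filter per drive signal")
    if len({len(s) for s in drive_signals}) > 1:
        raise ValueError("drive signals must have equal length")
    filtered = [fir_filter(s, h) for s, h in zip(drive_signals, filters)]
    return [sum(samples) for samples in zip(*filtered)]


# Two hypothetical loudspeaker drive signals and assumed path estimates.
drive = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
paths = [[0.5, 0.25], [1.0]]
reference = summed_reference(drive, paths)
print(reference)  # [0.5, 1.25, 0.0, 0.0]
```

The resulting single reference signal would then be handed to the echo canceller in place of the individual drive signals.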
SIGNAL PROCESSING APPARATUS, SIGNAL PROCESSING METHOD, AND PROGRAM
A signal processing apparatus is provided that includes a sound source separation section configured to apply sound source separation processing to a mixed sound signal including a mixture of signals of a plurality of sound sources, and band extension sections configured to apply frequency band extension processing to respective sound source separation signals obtained by separation by the sound source separation section.