Patent classifications
G10L21/0272
METHOD AND APPARATUS COMBINING SEPARATION AND CLASSIFICATION OF AUDIO SIGNALS
Computer-implemented methods and devices for combined audio separation and classification are provided. An estimated separated signal is time-gated based on a determination made by an audio classifier operating, at least in part, on the original mix of signals before separation. Combined separation, classification, and time gating of both the estimated signal and a residual signal are also provided.
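The gating step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes the signals are already split into frames. It treats `classify` as a hypothetical callable that returns the probability that the target class is present in a mix frame. It also assumes that frames judged inactive are handed entirely to the residual, so estimate plus residual always equals the mix.

```python
def time_gate(mix_frames, estimate_frames, classify, threshold=0.5):
    """Gate a separated estimate and its residual frame by frame.

    classify: hypothetical callable mapping a mix frame to a probability
    that the target class is present (an assumption, not a real API).
    """
    gated_estimate, gated_residual = [], []
    for mix, est in zip(mix_frames, estimate_frames):
        # Residual: whatever the separator did not assign to the estimate.
        residual = [m - e for m, e in zip(mix, est)]
        # The decision is made on the original mix, not on the separated output.
        if classify(mix) >= threshold:
            gated_estimate.append(list(est))
            gated_residual.append(residual)
        else:
            # Target judged absent: mute the estimate, give the whole frame to the residual.
            gated_estimate.append([0.0] * len(est))
            gated_residual.append(list(mix))
    return gated_estimate, gated_residual


# Usage with a toy energy-based stand-in for the audio classifier (assumption).
def toy_classifier(frame):
    return 1.0 if max(abs(s) for s in frame) > 0.1 else 0.0


mix = [[0.5, 0.4], [0.01, 0.02]]
est = [[0.3, 0.2], [0.01, 0.0]]
gated_est, gated_res = time_gate(mix, est, toy_classifier)
```

The classifier receives the mix rather than the separated estimate. This mirrors the abstract's point that the gating decision depends, at least in part, on the signal before separation.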
Speech processing method, information device, and computer program product
Disclosed are a method for speech processing, an information device, and a computer program product. The computer-implemented method for speech processing includes: obtaining a mixed speech signal via a microphone, wherein the mixed speech signal includes a plurality of speech signals uttered simultaneously by a plurality of unspecified speakers; generating, with a Generative Adversarial Network (GAN), a set of simulated speech signals from the mixed speech signal in order to simulate the plurality of speech signals; determining the number of simulated speech signals in order to estimate the number of speakers in the surroundings; and providing that number as an input to an information application.
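The counting step can be illustrated with a short sketch. The GAN itself is not implemented here. The sketch assumes a generator with a fixed number of output slots whose unused slots come out near-silent. It then counts the slots whose RMS level exceeds a hypothetical `silence_rms` threshold. Both the slot layout and the threshold are assumptions for illustration only.

```python
import math


def rms(signal):
    """Root-mean-square level of a sample list (0.0 for an empty list)."""
    return math.sqrt(sum(s * s for s in signal) / len(signal)) if signal else 0.0


def estimate_speaker_count(simulated_signals, silence_rms=0.01):
    """Count simulated signals that carry speech energy.

    simulated_signals: outputs assumed to come from a GAN generator with a
    fixed number of output slots; unused slots are assumed near-silent.
    silence_rms: hypothetical threshold separating speech from silence.
    """
    return sum(1 for sig in simulated_signals if rms(sig) > silence_rms)


# Hypothetical generator output: two active speakers and one silent slot.
simulated = [[0.5, -0.4, 0.3], [0.0, 0.001, -0.001], [0.2, 0.2, -0.2]]
count = estimate_speaker_count(simulated)
```

The resulting `count` is what would be passed on as the input to the information application.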
Automatic volume control for combined game and chat audio
A system comprising audio processing circuitry is provided. The audio processing circuitry is operable to receive audio signals and to process them to detect the strength of a chat component and the strength of a game component of the audio signals. The audio processing circuitry is operable to automatically control a volume setting based on one or both of the detected strength of the chat component and the detected strength of the game component. The combined game-and-chat audio signals may comprise a left channel signal and a right channel signal, and processing them may comprise measuring the strength of a vocal-band signal component that is common to the left and right channel signals.
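The "vocal-band component common to both channels" measurement can be sketched as follows, under several assumptions. The common component is taken to be the mid signal (L+R)/2. The vocal band is approximated by a single band-pass biquad centred near 1 kHz, using the RBJ Audio EQ Cookbook design. The game strength is taken as the RMS of whatever remains after removing that vocal estimate from each channel. The `auto_volume` rule is hypothetical and only shows how both strengths could feed one volume setting.

```python
import math


def bandpass(x, fs, f0=1000.0, q=0.35):
    """Band-pass biquad (RBJ Audio EQ Cookbook, 0 dB peak gain form)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    # Coefficients normalized by a0; b1 is zero for this filter type.
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        out = b0 * s + b2 * x2 - a1 * y1 - a2 * y2
        y.append(out)
        x2, x1 = x1, s
        y2, y1 = y1, out
    return y


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def chat_and_game_strength(left, right, fs):
    """Chat = vocal-band part of the L/R common (mid) signal; game = the rest."""
    mid = [0.5 * (ls + rs) for ls, rs in zip(left, right)]
    voice = bandpass(mid, fs)
    residual = ([ls - v for ls, v in zip(left, voice)]
                + [rs - v for rs, v in zip(right, voice)])
    return rms(voice), rms(residual)


def auto_volume(chat, game, current, target=0.1, lo=0.2, hi=4.0, chat_floor=1e-3):
    """Hypothetical rule: bring chat toward `target`, but cap game at 2x `target`."""
    if chat < chat_floor:
        return current  # no chat detected: keep the current setting
    volume = target / chat
    if game > 0:
        volume = min(volume, 2 * target / game)
    return max(lo, min(hi, volume))
```

Because chat is measured on the mid signal, a voice panned identically to both channels registers as chat. Content that differs between channels, such as an anti-phase test tone, cancels out of the measurement.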
Multi-stream target-speech detection and channel fusion
Audio processing systems and methods include an audio sensor array configured to receive a multichannel audio input and generate a corresponding multichannel audio signal, target-speech detection logic, and an automatic speech recognition engine or VoIP application. An audio processing device includes: a target speech enhancement engine configured to analyze a multichannel audio input signal and generate a plurality of enhanced target streams; a multi-stream target-speech detection generator comprising a plurality of target-speech detector engines, each configured to determine the probability of detecting a specific target speech of interest in its stream, wherein the multi-stream target-speech detection generator is configured to determine a plurality of weights associated with the enhanced target streams; and a fusion subsystem configured to apply the plurality of weights to the enhanced target streams to generate an enhancement output signal.
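The weighting and fusion steps can be sketched as a probability-weighted sum of the enhanced streams. The abstract does not say how probabilities become weights. Here they are assumed to be simply normalized to sum to one, with a fallback to equal weights when no detector fires. Both of these choices are assumptions.

```python
def fuse_streams(streams, probabilities, eps=1e-12):
    """Weight each enhanced stream by its normalized target-speech probability and sum.

    streams: equal-length sample lists, one per enhanced target stream.
    probabilities: per-stream detection probabilities from the detector engines.
    """
    if not streams or len(streams) != len(probabilities):
        raise ValueError("need one probability per stream")
    total = sum(probabilities)
    if total < eps:
        # No detector reports the target: fall back to a plain average (assumption).
        weights = [1.0 / len(streams)] * len(streams)
    else:
        weights = [p / total for p in probabilities]
    length = len(streams[0])
    output = [sum(w * s[n] for w, s in zip(weights, streams)) for n in range(length)]
    return weights, output


weights, fused = fuse_streams([[1.0, 1.0], [0.0, 2.0]], [0.75, 0.25])
```

Streams in which the detectors are more confident of the target speech dominate the enhancement output. A hard selection of the single best stream would be the limiting case of this scheme.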
Online target-speech extraction method based on auxiliary function for robust automatic speech recognition
A target speech signal extraction method for robust speech recognition includes: initializing a steering vector and an adaptive vector for a target speech source, and setting a real output channel of the target speech source as the output produced by the adaptive vector; initializing adaptive vectors for noise and setting a dummy channel as the output produced by the adaptive vectors for the noise; setting a cost function that minimizes the dependency between the real output for the target speech source and the dummy output for the noise; setting an auxiliary function for the cost function, and updating the adaptive vector for the target speech source and the adaptive vectors for the noise using the auxiliary function and the steering vector; estimating the target speech signal using the adaptive vector, thereby extracting the target speech signal from the input signals; and updating the steering vector for the target speech source.
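For orientation, the sketch below shows a generic online auxiliary-function ICA (AuxICA) update for one frequency bin with two microphones. One output plays the role of the real target channel and the other the dummy noise channel. This is the standard AuxICA update under a Laplacian source model, not the patented algorithm. The steering vector `h` is assumed to be given, and it is used only to rescale the target row so that w_0^H h = 1, which is a simple distortionless constraint. The steering-vector update step is not shown.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 complex matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def aux_ica_step(W, V, x, h, alpha=0.9, eps=1e-6):
    """One online auxiliary-function ICA update for a single frequency bin.

    W: 2x2 demixing matrix; row 0 -> target output, row 1 -> dummy noise output.
    V: per-output 2x2 weighted covariance matrices (the auxiliary variables).
    x: 2-channel STFT frame at this bin; h: assumed steering vector of the target.
    """
    y = [W[k][0] * x[0] + W[k][1] * x[1] for k in range(2)]
    for k in range(2):
        # Laplacian source model: weight 1/|y_k|, floored to avoid division by zero.
        weight = 1.0 / max(abs(y[k]), eps)
        # Exponentially forgotten weighted covariance V_k = E[weight * x x^H].
        V[k] = [[alpha * V[k][i][j] + (1 - alpha) * weight * x[i] * x[j].conjugate()
                 for j in range(2)] for i in range(2)]
        # w_k = (W V_k)^{-1} e_k, then scale so that w_k^H V_k w_k = 1.
        inv = inv2(matmul2(W, V[k]))
        w = [inv[0][k], inv[1][k]]
        quad = sum(w[i].conjugate() * V[k][i][j] * w[j]
                   for i in range(2) for j in range(2)).real
        scale = 1.0 / math.sqrt(max(quad, eps))
        W[k] = [(wi * scale).conjugate() for wi in w]  # row k stores w_k^H
    # Distortionless constraint toward the target (assumption): w_0^H h = 1.
    gain = W[0][0] * h[0] + W[0][1] * h[1]
    W[0] = [c / gain for c in W[0]]
    return W, V, W[0][0] * x[0] + W[0][1] * x[1]
```

Initialize each V_k to the identity rather than to zeros so that W V_k stays invertible on the first frame. The final rescaling replaces the unit-norm scaling of the target row, much as a projection-back step would.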
Personalized and adaptive learning audio filtering
Aspects of the invention include a method including collecting, by a processor, physiological data from a user in an environment and a sound waveform from the user's environment. The method detects and labels as a potential annoyance, by the processor, a set of potential annoyance data based on the collected physiological data and the sound waveform. The method decomposes, by the processor, the sound waveform into a first sound waveform segment associated with the set of potential annoyance data and a second sound waveform segment not associated with the set of potential annoyance data. The method predicts, by the processor, that the potential annoyance is an actual annoyance. The method filters and modifies, by the processor, the first sound waveform segment associated with the actual annoyance and provides, by the processor, the second sound waveform segment not associated with the actual annoyance to the user.
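The detect, decompose, predict, and filter steps can be sketched with a deliberately simple interpretation. The sketch reads "decomposition" as a split of the waveform into fixed time windows. A window is flagged as a potential annoyance when the user's heart rate rises `rise` beats per minute above a baseline. The "prediction" stand-in confirms the annoyance only if the window is audible. Confirmed windows are attenuated, and all other windows pass through unchanged. The heart-rate signal, the thresholds, and the attenuation factor are all hypothetical.

```python
import math


def filter_annoyances(waveform, heart_rate, window, baseline_hr,
                      rise=10.0, attenuation=0.1, min_rms=0.05):
    """Attenuate time windows linked to a (predicted) actual annoyance.

    heart_rate: one hypothetical physiological reading per window of `window` samples.
    """
    out = list(waveform)
    for w, hr in enumerate(heart_rate):
        start, end = w * window, min((w + 1) * window, len(waveform))
        if hr - baseline_hr < rise:
            continue  # not flagged: this segment is passed to the user unchanged
        segment = waveform[start:end]
        # Prediction stand-in: a flagged window is an actual annoyance only if audible.
        level = math.sqrt(sum(s * s for s in segment) / len(segment)) if segment else 0.0
        if level > min_rms:
            out[start:end] = [s * attenuation for s in segment]
    return out


wave = [0.5] * 4 + [0.5] * 4 + [0.01] * 4
filtered = filter_annoyances(wave, heart_rate=[70, 85, 90], window=4, baseline_hr=70)
```

In a learning system, the fixed audibility rule would be replaced by the trained predictor that the abstract describes.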
SYSTEMS AND METHODS FOR VIRTUAL MEETING SPEAKER SEPARATION
A computer-implemented machine learning method for improving speaker separation is provided. The method comprises processing audio data to generate prepared audio data, and determining feature data and speaker data from the prepared audio data through a clustering iteration to generate an audio file. The method further comprises re-segmenting the audio file to generate a speaker segment and causing the speaker segment to be displayed on a client device.
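The clustering and re-segmentation steps can be sketched with plain k-means over per-frame feature vectors. Consecutive frames with the same cluster label are then merged into speaker segments. The abstract does not name the clustering algorithm, so k-means with deterministic farthest-point initialization is an assumption. So are the frame duration and the toy two-dimensional features.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_frames(features, k=2, iterations=20):
    """k-means over frame features; returns one speaker label per frame."""
    # Deterministic init: first frame, then repeatedly the frame farthest from all centroids.
    centroids = [features[0]]
    while len(centroids) < k:
        centroids.append(max(features, key=lambda f: min(dist2(f, c) for c in centroids)))
    labels = [0] * len(features)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist2(f, centroids[j])) for f in features]
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def resegment(labels, frame_seconds):
    """Merge runs of identical labels into (speaker, start_s, end_s) segments."""
    segments = []
    for i, lab in enumerate(labels):
        if segments and segments[-1][0] == lab:
            segments[-1] = (lab, segments[-1][1], (i + 1) * frame_seconds)
        else:
            segments.append((lab, i * frame_seconds, (i + 1) * frame_seconds))
    return segments


feats = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (0.05, 0.05)]
labels = cluster_frames(feats)
segments = resegment(labels, frame_seconds=0.5)
```

The resulting `(speaker, start, end)` tuples are the kind of speaker segments a client device could display as a timeline.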