Patent classifications
G10L25/84
AUTOMATIC NOISE GATING
An audio processing system for automatically noise gating an audio signal. The audio processing system comprises a voice activity detector configured to identify one or more segments of the audio signal not representative of speech; a level detector configured to determine at least one noise level associated with the one or more segments of the audio signal identified as not representative of speech; and a noise gate configured to noise gate the audio signal using a variable noise gate threshold that is automatically set based on the at least one determined noise level.
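The three components named in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the voice activity decisions are assumed to come from an external detector and are passed in as a list of booleans, the noise level is taken as the mean RMS level of the non-speech frames, and `margin_db` is a hypothetical offset used to place the gate threshold above that level.

```python
import math


def rms_db(frame):
    """Return the RMS level of a frame in dB relative to full scale (floored at -120 dB)."""
    if not frame:
        return -120.0
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def auto_noise_gate(frames, is_speech, margin_db=6.0):
    """Gate frames with a threshold derived from the non-speech frames.

    frames    -- list of frames, each a list of float samples in [-1, 1]
    is_speech -- per-frame booleans from a voice activity detector (assumed given)
    margin_db -- hypothetical offset added to the measured noise level
    """
    # Level detector: measure only the frames flagged as not speech.
    noise_levels = [rms_db(f) for f, s in zip(frames, is_speech) if not s]
    if not noise_levels:
        # No noise-only frames observed, so leave the signal untouched.
        return [list(f) for f in frames], None
    threshold_db = sum(noise_levels) / len(noise_levels) + margin_db
    # Noise gate: mute every frame whose level falls below the threshold.
    gated = [
        list(f) if rms_db(f) >= threshold_db else [0.0] * len(f)
        for f in frames
    ]
    return gated, threshold_db
```

Because the threshold is recomputed from whatever non-speech frames are supplied, calling the function on successive blocks of audio lets the gate follow a changing noise floor.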
UNDERSTANDING AND RANKING RECORDED CONVERSATIONS BY CLARITY OF AUDIO
Systems and methods are provided for generating quality scores associated with a contact (e.g., a telephonic call including an agent) and with agents. In particular, the disclosed technology classifies frames of the contact's content as speech and/or noise, with noise further classified as standard noise or non-standard noise. A frame type determiner determines the type of a frame based on waveform analysis and/or speech and noise models that are trained through machine learning. Standard noise is noise that is expected and consistent across contacts and agents (e.g., hold music). Non-standard noise is noise whose occurrence and source are unexpected (e.g., a barking dog or a siren from the street). The disclosed technology enables assessing contacts and agents based on issues associated with remote working environments, which vary among agents.
Automatic volume control for combined game and chat audio
A system comprising audio processing circuitry is provided. The audio processing circuitry is operable to receive audio signals. The audio processing circuitry is operable to process the audio signals to detect strength of a chat component of the audio signals and strength of a game component of the audio signals. The audio processing circuitry is operable to automatically control a volume setting based on one or both of: the detected strength of the chat component, and the detected strength of the game component. The combined-game-and-chat audio signals may comprise a left channel signal and a right channel signal. The processing of the combined-game-and-chat audio signals may comprise measuring strength of a vocal-band signal component that is common to the left channel signal and the right channel signal.
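One way to picture the measurement described here is a mid/side split of the stereo signal. The sketch below rests on several assumptions that are not in the abstract: chat is treated as the part common to both channels (the mid signal), the game strength is approximated by the side signal, the vocal band is taken as roughly 300 to 3400 Hz, and the band-pass is a crude difference of two one-pole low-pass filters. All function names are hypothetical.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Apply a simple one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def vocal_band(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Crude band-pass: low-pass at high_hz minus low-pass at low_hz (assumed band edges)."""
    upper = one_pole_lowpass(samples, high_hz, sample_rate)
    lower = one_pole_lowpass(samples, low_hz, sample_rate)
    return [u - l for u, l in zip(upper, lower)]


def rms(samples):
    """Root-mean-square amplitude, 0.0 for an empty input."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def chat_and_game_strength(left, right, sample_rate):
    """Estimate (chat_strength, game_strength) from a stereo pair.

    Assumption: chat is common to both channels, so it appears in the mid
    signal; the side signal is used as a rough proxy for the game component.
    """
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return rms(vocal_band(mid, sample_rate)), rms(side)
```

A volume controller could then compare the two strengths, for example lowering the game level while the chat strength is high, though the abstract does not specify the control rule.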
Sound signal processing system apparatus for avoiding adverse effects on speech recognition
A sound signal processing system includes: a sound signal processing apparatus executing non-linear signal processing on a collected sound signal collected by a microphone, and transmitting, to an information processing apparatus, both a pre-execution sound signal before the non-linear signal processing is executed and a post-execution sound signal after the non-linear signal processing is executed; and the information processing apparatus receiving the pre-execution sound signal and the post-execution sound signal from the sound signal processing apparatus, and executing first processing on the pre-execution sound signal and executing second processing on the post-execution sound signal, the second processing being different from the first processing.
Multi-stream target-speech detection and channel fusion
Audio processing systems and methods include an audio sensor array configured to receive a multichannel audio input and generate a corresponding multichannel audio signal, target-speech detection logic, and an automatic speech recognition engine or VoIP application. An audio processing device includes a target speech enhancement engine configured to analyze a multichannel audio input signal and generate a plurality of enhanced target streams; a multi-stream target-speech detection generator comprising a plurality of target-speech detector engines, each configured to determine a probability of detecting a specific target-speech of interest in the stream, wherein the multi-stream target-speech detection generator is configured to determine a plurality of weights associated with the enhanced target streams; and a fusion subsystem configured to apply the plurality of weights to the enhanced target streams to generate an enhancement output signal.
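The fusion step can be illustrated with a weighted sum of the enhanced streams. The abstract does not say how the weights are derived from the detection probabilities, so this sketch assumes the simplest option: normalize the per-stream probabilities so they sum to one, and fall back to equal weights when no stream detects the target. The function name `fuse_streams` is hypothetical.

```python
def fuse_streams(streams, detection_probs):
    """Combine enhanced streams into one output using probability-derived weights.

    streams         -- list of equal-rate sample lists, one per enhanced stream
    detection_probs -- per-stream target-speech detection probabilities
    """
    if not streams or len(streams) != len(detection_probs):
        raise ValueError("need one detection probability per stream")
    total = sum(detection_probs)
    if total > 0:
        # Assumed weighting rule: normalize probabilities to sum to 1.
        weights = [p / total for p in detection_probs]
    else:
        # No detector fired, so weight all streams equally.
        weights = [1.0 / len(streams)] * len(streams)
    length = min(len(s) for s in streams)
    fused = [
        sum(w * s[i] for w, s in zip(weights, streams))
        for i in range(length)
    ]
    return fused, weights
```

With this rule, a stream whose detector is confident dominates the output, while streams with low detection probability contribute little.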
Adaptive energy limiting for transient noise suppression
The present disclosure describes aspects of adaptive energy limiting for transient noise suppression. In some aspects, an adaptive energy limiter sets a limiter ceiling for an audio signal to full scale and receives a portion of the audio signal. For the portion of the audio signal, the adaptive energy limiter determines a maximum amplitude and evaluates the portion with a neural network to provide a voice likelihood estimate. Based on the maximum amplitude and the voice likelihood estimate, the adaptive energy limiter determines that the portion of the audio signal includes noise. In response to determining that the portion of the audio signal includes noise, the adaptive energy limiter decreases the limiter ceiling and provides the limiter ceiling to a limiter module effective to limit an amount of energy of the audio signal. This may be effective to prevent audio signals from carrying full energy transient noise into conference audio.
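The control flow in this abstract can be sketched as a small stateful class. Several details are assumptions made for illustration: the neural network is replaced by a caller-supplied `voice_likelihood` function, a portion counts as noise when its peak is high and its voice likelihood is low, the ceiling is lowered by a fixed multiplicative `step`, and the limiter module is reduced to hard clipping. The thresholds and the class name are hypothetical.

```python
class AdaptiveEnergyLimiter:
    """Lower a limiter ceiling when a loud portion is unlikely to be voice."""

    def __init__(self, voice_likelihood, step=0.5, min_ceiling=0.1,
                 amp_threshold=0.5, voice_threshold=0.5):
        self.voice_likelihood = voice_likelihood  # stand-in for the neural network
        self.step = step                          # assumed multiplicative decrease
        self.min_ceiling = min_ceiling            # assumed lower bound on the ceiling
        self.amp_threshold = amp_threshold        # assumed "loud" peak threshold
        self.voice_threshold = voice_threshold    # assumed "not voice" threshold
        self.ceiling = 1.0                        # ceiling starts at full scale

    def process(self, portion):
        """Update the ceiling for one portion and return the limited samples."""
        peak = max((abs(x) for x in portion), default=0.0)
        likelihood = self.voice_likelihood(portion)
        # Treat a loud portion with low voice likelihood as noise.
        if peak >= self.amp_threshold and likelihood < self.voice_threshold:
            self.ceiling = max(self.min_ceiling, self.ceiling * self.step)
        c = self.ceiling
        # Hard clipping stands in for the limiter module.
        return [max(-c, min(c, x)) for x in portion]
```

A practical limiter would use smoothed gain reduction rather than clipping, and the abstract does not describe if or how the ceiling recovers, so no recovery is modeled here.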