Patent classifications
G10L21/0216
ARTIFICIAL INTELLIGENCE-BASED AUDIO PROCESSING METHOD, APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT
An artificial intelligence-based audio processing method includes: obtaining an audio clip of an audio scene, the audio clip including noise; performing audio scene classification processing based on the audio clip to obtain an audio scene type corresponding to the noise in the audio clip; and determining a target audio processing mode corresponding to the audio scene type, and applying the target audio processing mode to the audio clip of the audio scene according to a degree of interference caused by the noise in the audio clip.
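This abstract describes a classify-then-select pipeline: the scene type is inferred from the noisy clip, and a processing mode keyed to that scene type is applied with a strength that tracks the degree of noise interference. Below is a minimal, hedged sketch of that control flow; the classifier, the suppression functions, and the mode table are all invented placeholders, not components named in the patent.

```python
# Hypothetical sketch of the classify-then-select flow described in the abstract above.
# classify_scene() and the suppression functions are placeholders, not the patent's components.

def process_clip(clip, classify_scene, modes):
    """Classify the audio scene of a noisy clip, then apply the mode mapped to that scene."""
    scene_type, interference = classify_scene(clip)   # e.g. ("street", 0.7)
    mode = modes.get(scene_type, modes["default"])    # target audio processing mode
    return mode(clip, strength=interference)          # strength scales with noise interference


def mild_suppression(clip, strength):
    return clip  # placeholder: light denoising would go here


def aggressive_suppression(clip, strength):
    return clip  # placeholder: stronger denoising would go here


modes = {
    "quiet_room": mild_suppression,
    "street": aggressive_suppression,
    "default": mild_suppression,
}
```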
Joint Acoustic Echo Cancelation, Speech Enhancement, and Voice Separation for Automatic Speech Recognition
A method for automatic speech recognition using joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving, at a contextual frontend processing model, input speech features corresponding to a target utterance. The method also includes receiving, at the contextual frontend processing model, at least one of a reference audio signal, a contextual noise signal including noise prior to the target utterance, or a speaker embedding including voice characteristics of a target speaker that spoke the target utterance. The method further includes processing, using the contextual frontend processing model, the input speech features and the at least one of the reference audio signal, the contextual noise signal, or the speaker embedding vector to generate enhanced speech features.
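As a rough illustration of how such contextual inputs might be combined with the input speech features before an enhancement network, here is a hedged NumPy sketch. The frame-wise concatenation shown is an assumption made for illustration only; the patent's contextual frontend processing model is not specified at this level of detail.

```python
import numpy as np


def build_frontend_input(speech_feats, speaker_embedding=None, context_feats=None):
    """Stack per-frame speech features with optional contextual conditioning inputs.

    speech_feats: (frames, feat_dim) features of the target utterance.
    speaker_embedding: (emb_dim,) voice characteristics of the target speaker.
    context_feats: (frames, ctx_dim) features from the reference or contextual noise signal.
    """
    parts = [speech_feats]
    if context_feats is not None:
        parts.append(context_feats)
    if speaker_embedding is not None:
        # Broadcast the utterance-level embedding to every frame.
        parts.append(np.tile(speaker_embedding, (speech_feats.shape[0], 1)))
    return np.concatenate(parts, axis=-1)
```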
AUDIBLE HOWLING CONTROL SYSTEMS AND METHODS
An audio system includes: a speaker; a microphone that generates a microphone signal based on sound output from the speaker; a mixer module configured to generate a mixed signal by mixing the microphone signal with an audio signal; a filter module configured to filter the mixed signal to produce a filtered signal and to apply the filtered signal to the speaker; and a detector module configured to determine a howling frequency in the microphone signal attributable to sound output from the speaker, where the filter module is configured to decrease a magnitude of the filtered signal at the howling frequency.
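One conventional way to realize the detector/filter pair in this abstract is to locate the dominant spectral peak in the microphone signal and place a narrow notch there. The sketch below uses SciPy's iirnotch for that step, which is an assumption about the implementation rather than something the patent specifies.

```python
import numpy as np
from scipy.signal import iirnotch, lfilter


def detect_howling_frequency(mic_signal, fs):
    """Return the dominant frequency (Hz) in the microphone signal."""
    spectrum = np.abs(np.fft.rfft(mic_signal))
    freqs = np.fft.rfftfreq(len(mic_signal), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum)])


def suppress_howling(mixed_signal, mic_signal, fs, q=30.0):
    """Decrease the magnitude of the filtered signal at the detected howling frequency."""
    f0 = detect_howling_frequency(mic_signal, fs)
    if not 0.0 < f0 < fs / 2.0:
        return mixed_signal                 # no usable peak to notch out
    b, a = iirnotch(f0, q, fs)              # narrow notch centred on the howling frequency
    return lfilter(b, a, mixed_signal)
```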
Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality
Array microphone systems and methods that can automatically focus and/or place beamformed lobes in response to detected sound activity are provided. The automatic focus and/or placement of the beamformed lobes can be inhibited based on a remote far end audio signal. The quality of the coverage of audio sources in an environment may be improved by ensuring that beamformed lobes are optimally picking up the audio sources even if they have moved and changed locations.
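The inhibition described here is essentially a gate: a lobe is only re-steered toward newly detected sound activity when the remote far-end signal is quiet, so loudspeaker playback cannot pull lobes toward itself. A hedged sketch of that gate follows; the function name, threshold, and lobe representation are invented for illustration.

```python
def maybe_refocus_lobe(lobe, detected_direction, far_end_level_db, vad_is_speech,
                       inhibit_threshold_db=-45.0):
    """Re-steer a beamformed lobe toward detected sound activity unless inhibited.

    Inhibition: skip refocusing while the remote far-end signal is active, so the
    lobe does not lock onto loudspeaker playback instead of a local talker.
    """
    if far_end_level_db > inhibit_threshold_db:
        return lobe                                      # far end active: inhibit auto focus
    if not vad_is_speech:
        return lobe                                      # no local voice activity detected
    return dict(lobe, direction=detected_direction)      # steer the lobe toward the source
```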
Method and apparatus for estimating variability of background noise for noise suppression
An electronic device measures noise variability of background noise present in a sampled audio signal, and determines whether the measured noise variability is higher than a high threshold value or lower than a low threshold value. If the noise variability is determined to be higher than the high threshold value, the device categorizes the background noise as having a high degree of variability. If the noise variability is determined to be lower than the low threshold value, the device categorizes the background noise as having a low degree of variability. The high and low threshold values are between a high boundary point and a low boundary point. The high boundary point is based on an analysis of files including noises that exhibit a high degree of variability, and the low boundary point is based on an analysis of files including noises that exhibit a low degree of variability.
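The categorization step reduces to a two-threshold comparison. The sketch below is a minimal illustration under stated assumptions: the variability statistic is a stand-in (per-frame energy spread), the thresholds are placeholders rather than values derived from the analyzed noise files, and keeping the previous label between the thresholds is one plausible choice the abstract does not spell out.

```python
import numpy as np


def measure_noise_variability(noise_frames):
    """Crude variability measure: spread of per-frame energy in dB.

    noise_frames: 2-D array (frames, samples). A stand-in for whatever statistic
    the device actually computes.
    """
    energies_db = 10.0 * np.log10(np.mean(noise_frames ** 2, axis=1) + 1e-12)
    return float(np.std(energies_db))


def categorize_background_noise(variability, high_threshold, low_threshold,
                                previous="unknown"):
    """Classify noise as high- or low-variability; keep the prior label in between."""
    if variability > high_threshold:
        return "high_variability"
    if variability < low_threshold:
        return "low_variability"
    return previous   # between the thresholds: leave the category unchanged
```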
User voice control system
Embodiments include techniques and objects related to a wearable audio device that includes a microphone to detect a plurality of sounds in an environment in which the wearable audio device is located. The wearable audio device further includes a non-acoustic sensor to detect that a user of the wearable audio device is speaking. The wearable audio device further includes one or more processors, communicatively coupled to the microphone and the non-acoustic sensor, to alter, based on an identification by the non-acoustic sensor that the user of the wearable audio device is speaking, one or more of the plurality of sounds to generate a sound output. Other embodiments may be described or claimed.
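At its core this is a conditional mixing rule: when the non-acoustic sensor indicates the wearer is speaking, the device changes how the picked-up sounds are rendered. A minimal sketch of that rule, with the ducking behavior and gain value invented for illustration (the patent only says the sounds are altered, not how):

```python
def mix_output(ambient_sounds, user_is_speaking, duck_gain=0.2):
    """Attenuate ambient sounds while the non-acoustic sensor says the user is speaking.

    ambient_sounds: list of (name, samples) pairs picked up by the microphone.
    user_is_speaking: boolean derived from the non-acoustic sensor.
    duck_gain: attenuation applied while speaking (0.2, roughly -14 dB), illustrative only.
    """
    gain = duck_gain if user_is_speaking else 1.0
    return [(name, [s * gain for s in samples]) for name, samples in ambient_sounds]
```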