Patent classifications
G10L21/0216
METHODS FOR SYNTHESIS-BASED CLEAR HEARING UNDER NOISY CONDITIONS
This invention provides a new and improved hearing aid system with high quality noise cancellation method and devices to overcome the limitations and difficulties encountered in conventional technologies. The technical limitations of the noise uncertainty and speech distortion in the hearing aid field are resolved by restoration of the high-quality speech by converting the speech content into an intermediate linguistic representation and by synthesizing the speech of the same speaker with pre-trained using artificial intelligence (AI) modules. In this invention, the noise uncertainties are circumvented by focusing on the target speaker or picking up the dominant speech by choosing the corresponding setting assuming the speech from the target speaker is the dominant speech based on the Lombard effect.
METHODS FOR SYNTHESIS-BASED CLEAR HEARING UNDER NOISY CONDITIONS
This invention provides a new and improved hearing aid system with high quality noise cancellation method and devices to overcome the limitations and difficulties encountered in conventional technologies. The technical limitations of the noise uncertainty and speech distortion in the hearing aid field are resolved by restoration of the high-quality speech by converting the speech content into an intermediate linguistic representation and by synthesizing the speech of the same speaker with pre-trained using artificial intelligence (AI) modules. In this invention, the noise uncertainties are circumvented by focusing on the target speaker or picking up the dominant speech by choosing the corresponding setting assuming the speech from the target speaker is the dominant speech based on the Lombard effect.
MULTIMODAL INTENT UNDERSTANDING FOR AUTOMATED ASSISTANT
Implementations described herein include detecting a stream of audio data that captures a spoken utterance of the user and that captures ambient noise occurring within a threshold time period of the spoken utterance being spoken by the user. Implementations further include processing a portion of the audio data that includes the ambient noise to determine ambient noise classification(s), processing a portion of the audio data that includes the spoken utterance to generate a transcription, processing both the transcription and the ambient noise classification(s) with a machine learning model to generate a user intent and parameter(s) for the user intent, and performing one or more automated assistant actions based on the user intent and using the parameter(s).
Microphone with adjustable signal processing
A microphone may comprise a microphone element for detecting sound, and a digital signal processor configured to process a first audio signal that is based on the sound in accordance with a selected one of a plurality of digital signal processing (DSP) modes. Each of the DSP modes may be for processing the first audio signal in a different way. For example, the DSP modes may account for distance of the person speaking (e.g., near versus far) and/or desired tone (e.g., darker, neutral, or bright tone). At least some of the modes may have, for example, an automatic level control setting to provide a more consistent volume as the user changes their distance from the microphone or changes their speaking level, and that may be associated with particular default (and/or adjustable) values of the parameters attack, hold, decay, maximum gain, and/or target gain, each depending upon which DSP is being applied.
Microphone with adjustable signal processing
A microphone may comprise a microphone element for detecting sound, and a digital signal processor configured to process a first audio signal that is based on the sound in accordance with a selected one of a plurality of digital signal processing (DSP) modes. Each of the DSP modes may be for processing the first audio signal in a different way. For example, the DSP modes may account for distance of the person speaking (e.g., near versus far) and/or desired tone (e.g., darker, neutral, or bright tone). At least some of the modes may have, for example, an automatic level control setting to provide a more consistent volume as the user changes their distance from the microphone or changes their speaking level, and that may be associated with particular default (and/or adjustable) values of the parameters attack, hold, decay, maximum gain, and/or target gain, each depending upon which DSP is being applied.
Adaptive energy limiting for transient noise suppression
The present disclosure describes aspects of adaptive energy limiting for transient noise suppression. In some aspects, an adaptive energy limiter sets a limiter ceiling for an audio signal to full scale and receives a portion of the audio signal. For the portion of the audio signal, the adaptive energy limiter determines a maximum amplitude and evaluates the portion with a neural network to provide a voice likelihood estimate. Based on the maximum amplitude and the voice likelihood estimate, the adaptive energy limiter determines that the portion of the audio signal includes noise. In response to determining that the portion of the audio signal includes noise, the adaptive energy limiter decreases the limiter ceiling and provides the limiter ceiling to a limiter module effective to limit an amount of energy of the audio signal. This may be effective to prevent audio signals from carrying full energy transient noise into conference audio.
Adaptive energy limiting for transient noise suppression
The present disclosure describes aspects of adaptive energy limiting for transient noise suppression. In some aspects, an adaptive energy limiter sets a limiter ceiling for an audio signal to full scale and receives a portion of the audio signal. For the portion of the audio signal, the adaptive energy limiter determines a maximum amplitude and evaluates the portion with a neural network to provide a voice likelihood estimate. Based on the maximum amplitude and the voice likelihood estimate, the adaptive energy limiter determines that the portion of the audio signal includes noise. In response to determining that the portion of the audio signal includes noise, the adaptive energy limiter decreases the limiter ceiling and provides the limiter ceiling to a limiter module effective to limit an amount of energy of the audio signal. This may be effective to prevent audio signals from carrying full energy transient noise into conference audio.
Apparatus and method for automatic volume control with ambient noise compensation
An electronic device and method that automatically adjusts an audio output volume level based on a live environmental acoustic scenario input via a microphone using a machine learning algorithm trained with Human Activity Recognition (HAR). Equipped with such an intelligence the electronic device classifies ambient sounds occurring in the environment of the listening area in which the device is situated into different acoustic scenario mappings such a voice or conversation, for an ambient human conversation detected event, and noise, such as for example a vacuum cleaner or dish washer noise detected event, and automatically adjust the audio output volume accordingly.
Online target-speech extraction method based on auxiliary function for robust automatic speech recognition
A target speech signal extraction method for robust speech recognition includes: initializing a steering vector for a target speech source and an adaptive vector, setting a real output channel of the target speech source as an output by the adaptive vector, initializing adaptive vectors for a noise and setting a dummy channel as an output by the adaptive vectors for the noise; setting a cost function for minimizing dependency between a real output for the target speech source and a dummy output for the noise; setting an auxiliary function to the cost function, and updating the adaptive vector for the target speech source and the adaptive vectors for the noise by using the auxiliary function and the steering vector; estimating the target speech signal by using the adaptive vector thereby extracting the target speech signal from the input signals; and updating the steering vector for the target speech source.
Online target-speech extraction method based on auxiliary function for robust automatic speech recognition
A target speech signal extraction method for robust speech recognition includes: initializing a steering vector for a target speech source and an adaptive vector, setting a real output channel of the target speech source as an output by the adaptive vector, initializing adaptive vectors for a noise and setting a dummy channel as an output by the adaptive vectors for the noise; setting a cost function for minimizing dependency between a real output for the target speech source and a dummy output for the noise; setting an auxiliary function to the cost function, and updating the adaptive vector for the target speech source and the adaptive vectors for the noise by using the auxiliary function and the steering vector; estimating the target speech signal by using the adaptive vector thereby extracting the target speech signal from the input signals; and updating the steering vector for the target speech source.