Patent classifications
G10L21/0264
Voice enhancement in presence of noise
A communication terminal includes a first microphone system, a second microphone system, and a noise reduction processing unit (NRPU). The NRPU receives a primary signal from the first microphone system and a secondary signal from the second microphone system. The NRPU dynamically identifies an optimal transfer function of a correction filter, which can be applied to the secondary signal provided by the second microphone system to obtain a correction signal. The correction signal is subtracted from the primary signal to obtain a remainder signal that approximates a signal of interest contained within the primary signal.
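The subtraction scheme in this abstract resembles classic adaptive noise cancellation. The sketch below is one possible illustration, not the patented method: it assumes a least-mean-squares (LMS) update as the way the correction filter is "dynamically identified", and the function name `lms_cancel`, the tap count, the step size `mu`, and the synthetic signals are all hypothetical.

```python
import math
import random

def lms_cancel(primary, secondary, taps=8, mu=0.01):
    """Adaptively filter `secondary` and subtract it from `primary`.

    Assumption: an LMS update stands in for whatever adaptation rule the
    NRPU actually uses. Returns (remainder signal, final filter weights).
    """
    weights = [0.0] * taps   # correction-filter coefficients
    history = [0.0] * taps   # most recent secondary samples, newest first
    remainder = []
    for p, s in zip(primary, secondary):
        history = [s] + history[:-1]
        correction = sum(w * h for w, h in zip(weights, history))
        error = p - correction   # remainder = primary - correction signal
        # LMS step: nudge the weights to reduce the remainder's power.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        remainder.append(error)
    return remainder, weights

# Synthetic demo (hypothetical data): the second microphone hears only noise,
# while the first hears speech plus a delayed, attenuated copy of that noise.
random.seed(0)
n = 5000
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
speech = [0.5 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
primary = [speech[i] + 0.8 * (noise[i - 1] if i > 0 else 0.0) for i in range(n)]
remainder, weights = lms_cancel(primary, noise)
```

In this toy setup the filter should settle near a single tap of 0.8 at a one-sample delay, which is the acoustic path the demo simulates, so the remainder tracks the speech component.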
Textual echo cancellation
A method includes receiving an overlapped audio signal that includes audio spoken by a speaker that overlaps a segment of synthesized playback audio. The method also includes encoding a sequence of characters that correspond to the synthesized playback audio into a text embedding representation. For each character in the sequence of characters, the method also includes generating a respective cancelation probability using the text embedding representation. The cancelation probability indicates a likelihood that the corresponding character is associated with the segment of the synthesized playback audio overlapped by the audio spoken by the speaker in the overlapped audio signal.
Time-based frequency tuning of analog-to-information feature extraction
A sound recognition system including time-dependent analog filtered feature extraction and sequencing. An analog front end (AFE) in the system receives input analog signals, such as signals representing an audio input to a microphone. Features in the input signal are extracted by measuring attributes such as zero-crossing events and total energy in filtered versions of the signal, with different frequency characteristics applied at different times during the audio event. In one embodiment, a tunable analog filter is controlled to change its frequency characteristics at different times during the event. In another embodiment, multiple analog filters with different filter characteristics filter the input signal in parallel, and signal features are extracted from each filtered signal; a multiplexer selects the desired features at different times during the event.
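The tunable-filter embodiment can be illustrated with a small digital simulation. This is only a sketch under stated assumptions: the abstract describes analog filters, whereas the code uses a one-pole digital low-pass as a stand-in, and the names `one_pole_lowpass`, `extract_features`, and the per-frame `alphas` schedule are hypothetical.

```python
import math

def one_pole_lowpass(samples, alpha):
    """Simple low-pass; alpha in (0, 1], where 1.0 passes the signal through."""
    y, out = 0.0, []
    for v in samples:
        y += alpha * (v - y)
        out.append(y)
    return out

def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def energy(frame):
    """Total energy: sum of squared samples."""
    return sum(v * v for v in frame)

def extract_features(signal, frame_len, alphas):
    """Extract (zero crossings, energy) per frame.

    alphas[i] is the filter setting used during frame i, which models a
    filter retuned at different times during the audio event. The filter
    state is reset at each frame boundary to keep the sketch simple.
    """
    features = []
    for i, alpha in enumerate(alphas):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        filtered = one_pole_lowpass(frame, alpha)
        features.append((zero_crossings(filtered), energy(filtered)))
    return features

# Hypothetical input: a tone, analyzed wide-band first and then low-passed.
signal = [math.sin(math.pi * k / 4 + 0.3) for k in range(160)]
features = extract_features(signal, frame_len=80, alphas=[1.0, 0.1])
```

The parallel-filter embodiment would instead compute features for every filter setting on every frame and pick one pair per frame, which plays the role of the multiplexer.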
ACOUSTIC ZONING WITH DISTRIBUTED MICROPHONES
A method for estimating a user's location in an environment may involve receiving output signals from each microphone of a plurality of microphones in the environment. At least two microphones of the plurality of microphones may be included in separate devices at separate locations in the environment and the output signals may correspond to a current utterance of a user. The method may involve determining multiple current acoustic features from the output signals of each microphone and applying a classifier to the multiple current acoustic features. Applying the classifier may involve applying a model trained on previously-determined acoustic features derived from a plurality of previous utterances made by the user in a plurality of user zones in the environment. The method may involve determining, based at least in part on output from the classifier, an estimate of the user zone in which the user is currently located.
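As a rough illustration of the classifier step, the sketch below trains on features from previous utterances labeled by zone and then assigns a new utterance to the closest zone. The abstract does not name a classifier or specific features; the nearest-centroid rule, the per-microphone level features, the zone names, and the function names here are all assumptions chosen for brevity.

```python
def train_centroids(examples):
    """examples: list of (feature_vector, zone_label) from previous utterances.

    Returns one mean feature vector (centroid) per user zone.
    """
    sums, counts = {}, {}
    for feats, zone in examples:
        if zone not in sums:
            sums[zone] = [0.0] * len(feats)
            counts[zone] = 0
        sums[zone] = [s + f for s, f in zip(sums[zone], feats)]
        counts[zone] += 1
    return {z: [s / counts[z] for s in total] for z, total in sums.items()}

def estimate_zone(centroids, feats):
    """Return the zone whose centroid is closest to the current features."""
    def squared_distance(zone):
        return sum((c - f) ** 2 for c, f in zip(centroids[zone], feats))
    return min(centroids, key=squared_distance)

# Hypothetical features: speech level in dB at three distributed microphones.
previous_utterances = [
    ([-5.0, -20.0, -30.0], "couch"),
    ([-6.0, -22.0, -28.0], "couch"),
    ([-25.0, -8.0, -15.0], "kitchen"),
    ([-27.0, -6.0, -18.0], "kitchen"),
]
centroids = train_centroids(previous_utterances)
zone = estimate_zone(centroids, [-7.0, -21.0, -29.0])
```

A production system would likely use a probabilistic model so that the output carries a confidence per zone; the centroid rule only shows the train-then-classify flow.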
Audio cancellation for voice recognition
An audio cancellation system includes a voice enabled computing system that is connected to an audio output device using a wired or wireless communication network. The voice enabled computing device can provide media content to a user and receive a voice command from the user. The connection between the voice enabled computing system and the audio output device introduces a time delay between the media content being generated at the voice enabled computing device and the media content being reproduced at the audio output device. The system operates to determine a calibration value adapted for the voice enabled computing system and the audio output device. The system uses the calibration value to filter the user's voice command from a recording of ambient sound including the media content, without requiring significant use of memory and computing resources.
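One way to picture the calibration step is as a delay estimate between the media content the device generates and the same content as it appears in the microphone recording. The sketch below is an assumption-laden illustration: the abstract does not say how the calibration value is computed, so cross-correlation is used here as a stand-in, and `estimate_delay`, `cancel_media`, and the synthetic signals are hypothetical.

```python
import math
import random

def estimate_delay(reference, recording, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            r * recording[i + lag]
            for i, r in enumerate(reference)
            if i + lag < len(recording)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def cancel_media(reference, recording, lag, gain=1.0):
    """Subtract the delayed media content, leaving the voice and other sound."""
    cleaned = list(recording)
    for i, r in enumerate(reference):
        if i + lag < len(cleaned):
            cleaned[i + lag] -= gain * r
    return cleaned

# Hypothetical data: media content arrives 7 samples late, mixed with a voice.
random.seed(1)
reference = [random.uniform(-1.0, 1.0) for _ in range(200)]
voice = [0.1 * math.sin(0.3 * i) for i in range(220)]
recording = list(voice)
for i, r in enumerate(reference):
    recording[i + 7] += r

lag = estimate_delay(reference, recording, max_lag=20)
cleaned = cancel_media(reference, recording, lag)
```

Because the delay is estimated once and reused, the per-sample work during cancellation is a single subtraction, which is consistent with the abstract's emphasis on low memory and compute use. A real acoustic path would also need gain and room-response handling that this sketch omits.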
NOISE SUPPRESSION METHOD AND SYSTEM FOR PERSONAL SOUND AMPLIFICATION PRODUCT
In certain aspects, a noise suppression method and system for a personal sound amplification product (PSAP) are disclosed. An environmental audio signal acquired through one or more microphones is processed to generate a set of first sub-band signals in a set of first sub-bands. The environmental audio signal is also processed to generate a set of second sub-band signals in a set of second sub-bands. A set of first gains for the set of first sub-band signals in the set of first sub-bands is determined based on the set of second sub-band signals in the set of second sub-bands. The set of first sub-band signals is processed based on the set of first gains to generate a noise-suppressed audio signal.
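The two-set sub-band idea can be sketched with a toy frequency-domain example. Everything specific here is an assumption: the abstract does not say how either set of sub-bands is formed or how gains are derived, so this sketch treats DFT bins as the first (fine) sub-bands, groups of adjacent bins as the second (coarse) sub-bands, and uses a spectral-subtraction-style gain with a floor. The names `suppress_frame`, `band_width`, and `floor` are hypothetical.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * math.pi * k * m / n)
                for m in range(n)).real / n
            for k in range(n)]

def suppress_frame(frame, noise_power, band_width=4, floor=0.1):
    """Scale fine sub-bands by gains computed from coarse sub-band power.

    noise_power[b] is an assumed noise power estimate for coarse band b.
    """
    spectrum = dft(frame)            # first sub-band signals (one per bin)
    n = len(spectrum)
    half = n // 2 + 1
    gains = [1.0] * n
    for b, start in enumerate(range(0, half, band_width)):
        bins = range(start, min(start + band_width, half))
        # Second sub-band signal: average power over a group of bins.
        power = sum(abs(spectrum[m]) ** 2 for m in bins) / len(bins)
        gain = max(floor, 1.0 - noise_power[b] / power) if power > 0 else floor
        for m in bins:
            gains[m] = gain
            gains[(n - m) % n] = gain   # mirror bin keeps the output real
    return idft([g * s for g, s in zip(gains, spectrum)])

# Hypothetical frame: a strong tone (bin 2) plus a weak component (bin 10).
n = 32
frame = [math.cos(2 * math.pi * 2 * k / n) + 0.05 * math.cos(2 * math.pi * 10 * k / n)
         for k in range(n)]
cleaned = suppress_frame(frame, noise_power=[0.2] * 5)
```

In this demo the coarse band holding the strong tone gets a gain close to 1, while bands whose power is at or below the assumed noise estimate are pushed down to the floor.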
Conferencing Device with Beamforming and Echo Cancellation
This disclosure describes a conferencing device with beamforming and echo cancellation that includes: a microphone array comprising a plurality of microphones oriented to develop a corresponding plurality of microphone signals; and a processor configured to execute the following steps: (1) performing a beamforming operation to combine the plurality of microphone signals from the microphone array into a plurality of combined signals, (2) performing an acoustic echo cancellation operation on the plurality of combined signals to generate a plurality of combined echo cancelled signals, (3) receiving the far end signal as an input to a voice activity detector, and (4) selecting one or more of the combined echo cancelled signals for transmission to the far end, where a signal selector uses the far end signal as information to inhibit the signal selector from changing the selection of the combined echo cancelled signals while only the far end signal is active.
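The selection rule in step (4) can be sketched as a small piece of control logic. This is a hedged illustration only: the abstract does not specify how the selector ranks the combined signals or how near-end activity is judged, so picking the highest-energy signal and passing a `near_end_active` flag are assumptions, and `select_beam` is a hypothetical name.

```python
def select_beam(energies, current, far_end_active, near_end_active):
    """Choose which echo-cancelled combined signal to transmit.

    energies        : per-signal energy for the current frame (assumed metric)
    current         : index of the signal selected in the previous frame
    far_end_active  : far-end voice activity detector output
    near_end_active : assumed flag indicating local talker activity
    """
    if far_end_active and not near_end_active:
        # Only the far end is talking: hold the selection so that residual
        # echo from the loudspeaker cannot steer the selector.
        return current
    return max(range(len(energies)), key=lambda i: energies[i])

# Far end only: selection is held even though signal 2 has the most energy.
held = select_beam([0.1, 0.2, 0.9], current=0,
                   far_end_active=True, near_end_active=False)
# Local talker active: selection follows the strongest signal.
switched = select_beam([0.1, 0.2, 0.9], current=0,
                       far_end_active=False, near_end_active=True)
```

Holding the selection during far-end-only speech avoids switching toward the direction of the loudspeaker echo, which is the behavior the abstract describes.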