Patent classifications
G10L2021/02161
Noise Mitigation for a Voice Interface Device
A method at an electronic device with one or more microphones and a speaker, the electronic device being configured to respond to any of a plurality of affordances including a voice-based affordance, includes determining background noise of an environment associated with the electronic device and, before detecting the voice-based affordance: determining whether the background noise would interfere with recognition of a hotword in voice inputs detected by the electronic device, and if so, indicating to a user that an affordance other than the voice-based affordance should be used.
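As a rough illustration of the "check noise before relying on the hotword" idea in the abstract above, here is a minimal sketch. It assumes the background noise is estimated as the level of the quietest captured frame, and that recognition is judged feasible when a hypothetical expected hotword level exceeds that noise by a hypothetical margin (`MIN_HOTWORD_SNR_DB`). The returned `"button"` simply stands in for any non-voice affordance.

```python
import math
from typing import Sequence

# Hypothetical margin: assume hotword recognition degrades when the expected
# hotword level is less than this many dB above the background noise.
MIN_HOTWORD_SNR_DB = 10.0


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def background_noise_dbfs(frames: Sequence[Sequence[float]]) -> float:
    """Estimate background noise as the quietest frame (a simple noise-floor proxy)."""
    return min(rms_dbfs(f) for f in frames)


def choose_affordance(frames: Sequence[Sequence[float]],
                      expected_hotword_dbfs: float = -20.0) -> str:
    """Return 'voice' if the hotword should be recognizable, else a non-voice affordance."""
    noise = background_noise_dbfs(frames)
    if expected_hotword_dbfs - noise < MIN_HOTWORD_SNR_DB:
        return "button"  # hypothetical alternative affordance, e.g. a touch control
    return "voice"
```

In a quiet room the noise floor sits far below the assumed hotword level and the voice affordance stays available. Once the floor rises within the margin, the device would prompt the user toward the alternative.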
AUDIO SIGNAL PROCESSING METHOD AND DEVICE, AND STORAGE MEDIUM
An audio signal processing method includes: acquiring audio signals from at least two sound sources respectively through at least two microphones (MICs) to obtain respective original noisy signals of the at least two MICs in a time domain; for each frame in the time domain, using a first asymmetric window to perform a windowing operation on the respective original noisy signals of the at least two MICs to acquire windowed noisy signals; performing time-frequency conversion on the windowed noisy signals to acquire respective frequency-domain noisy signals of the at least two sound sources; acquiring frequency-domain estimated signals of the at least two sound sources according to the frequency-domain noisy signals; and obtaining audio signals produced respectively by the at least two sound sources according to the frequency-domain estimated signals.
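The windowing and time-frequency conversion steps in the abstract above can be sketched as follows. The abstract does not give the window shape. This sketch assumes one common construction for low-latency processing: a long rising half-Hann followed by a short falling half-Hann. The frame length `n`, falling length `m`, and `hop` are illustrative parameters. The later separation step that yields the frequency-domain estimated signals is not shown.

```python
import numpy as np


def asymmetric_window(n: int, m: int) -> np.ndarray:
    """Asymmetric analysis window of length n (assumed shape, not from the patent).

    The first n - m samples are the rising half of a Hann window; the last m
    samples are the falling half of a shorter Hann window, so the peak sits
    late in the frame.
    """
    rise = np.hanning(2 * (n - m))[: n - m]
    fall = np.hanning(2 * m)[m:]
    return np.concatenate([rise, fall])


def frames_to_spectra(signals: np.ndarray, n: int, m: int, hop: int) -> np.ndarray:
    """Window each frame of each microphone signal and convert it to the frequency domain.

    signals: shape (num_mics, num_samples), the time-domain noisy signals.
    Returns a complex array of shape (num_mics, num_frames, n // 2 + 1).
    """
    win = asymmetric_window(n, m)
    num_frames = 1 + (signals.shape[1] - n) // hop
    spectra = np.empty((signals.shape[0], num_frames, n // 2 + 1), dtype=complex)
    for t in range(num_frames):
        frame = signals[:, t * hop: t * hop + n] * win  # windowed noisy signals
        spectra[:, t, :] = np.fft.rfft(frame, axis=-1)  # time-frequency conversion
    return spectra
```

Putting the window peak late in the frame keeps most of its weight on the newest samples, which is a usual reason to prefer an asymmetric window when latency matters.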
Outputting notifications using device groups
A system that determines that devices are co-located in an acoustic region and selects a single device to which to send incoming notifications for the acoustic region. The system may group devices into separate acoustic regions based on selection data that selects between similar audio data received from multiple devices. The system may select the best device for each acoustic region based on how frequently the device was previously selected, input/output capabilities of the device, proximity to a user, or the like. The system may send a notification to a single device in each of the acoustic regions so that a user receives a single notification instead of multiple unsynchronized notifications. The system may also determine that acoustic regions are associated with different locations and select acoustic regions to which to send a notification based on location.
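One way to picture the grouping and selection described above is the minimal sketch below. It assumes a hypothetical `SelectionEvent` record listing which devices captured the same utterance and which one was selected. Devices that appear together in any event are merged into one acoustic region with union-find. A real system would likely require repeated co-occurrence before merging. Within a region, the sketch prefers a device with an output speaker (the `has_speaker` map is hypothetical) and then the device selected most often.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionEvent:
    """Hypothetical record: devices that heard the same audio, and the one selected."""
    candidates: frozenset
    selected: str


def acoustic_regions(events):
    """Group devices that captured the same audio into regions (union-find)."""
    parent = {}

    def find(d):
        parent.setdefault(d, d)
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving keeps trees shallow
            d = parent[d]
        return d

    for ev in events:
        devices = sorted(ev.candidates)
        root = find(devices[0])
        for d in devices[1:]:
            parent[find(d)] = root  # devices that heard the same audio share a region
    regions = {}
    for d in list(parent):
        regions.setdefault(find(d), set()).add(d)
    return list(regions.values())


def pick_notification_device(region, events, has_speaker):
    """Pick one device per region: prefer output capability, then selection frequency."""
    counts = Counter(ev.selected for ev in events)
    return max(sorted(region), key=lambda d: (has_speaker.get(d, False), counts[d]))
```

Sending each notification only to the device returned for each region gives the "single notification per acoustic region" behavior the abstract describes.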
APPARATUS AND METHOD FOR POWER EFFICIENT SIGNAL CONDITIONING FOR A VOICE RECOGNITION SYSTEM
A disclosed method includes monitoring an audio signal energy level while a noise suppressor is deactivated to conserve battery power, buffering the audio signal in response to a detected increase in the audio energy level, activating and running a voice activity detector on the audio signal in response to the detected increase in the audio energy level, and activating and running a noise estimator in response to voice being detected in the audio signal by the voice activity detector. The method may further include activating and running the noise suppressor only if the noise estimator determines that noise suppression is required. The method activates and runs a noise type classifier to determine the noise type based on information received from the noise estimator and selects a noise suppressor algorithm from a group of available noise suppressor algorithms, where the selected noise suppressor algorithm is the most power-efficient.
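The staged activation described above can be sketched as a single decision function. This is a minimal illustration under stated assumptions. The voice activity detector, noise estimator, and noise type classifier are caller-supplied placeholder callables. The suppressor list, the noise types each suppressor handles, and the power figures are all hypothetical. The rise ratio and noise threshold are likewise assumed values.

```python
from dataclasses import dataclass


@dataclass
class Suppressor:
    name: str
    handles: set     # noise types this algorithm is suited to (hypothetical)
    power_mw: float  # hypothetical power cost


SUPPRESSORS = [
    Suppressor("spectral_subtraction", {"stationary"}, 1.0),
    Suppressor("wiener", {"stationary", "babble"}, 2.5),
    Suppressor("dnn", {"stationary", "babble", "impulsive"}, 8.0),
]


def frame_energy(frame):
    return sum(s * s for s in frame) / len(frame)


def condition(frame, prev_energy, buffer, vad, estimate_noise, classify_noise,
              rise_ratio=2.0, noise_threshold=0.01):
    """Return the suppressor to run for this frame, or None if it can stay off."""
    energy = frame_energy(frame)
    if energy < rise_ratio * prev_energy:
        return None                      # only the cheap energy monitor runs
    buffer.append(frame)                 # buffer audio once energy rises
    if not vad(frame):                   # VAD is activated by the energy rise
        return None
    noise_level = estimate_noise(frame)  # noise estimator is activated by voice
    if noise_level < noise_threshold:
        return None                      # suppression not required
    noise_type = classify_noise(frame)
    candidates = [s for s in SUPPRESSORS if noise_type in s.handles]
    # Cheapest suitable algorithm; fall back to the most general one for unknown noise.
    best = min(candidates, key=lambda s: s.power_mw) if candidates else SUPPRESSORS[-1]
    return best.name
```

Each stage runs only when the stage before it justifies the cost, so in quiet or speech-free conditions only the energy monitor consumes power.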
Linear Filtering for Noise-Suppressed Speech Detection
Systems and methods for suppressing noise and detecting voice input in a multi-channel audio signal captured by a plurality of microphones include (i) capturing a first audio signal via a first microphone and a second audio signal via a second microphone, wherein the first and second audio signals respectively comprise first and second noise content from a noise source; (ii) identifying the first noise content in the first audio signal; (iii) using the identified first noise content to determine an estimated noise content captured by the plurality of microphones; (iv) using the estimated noise content to suppress the first and second noise content in the first and second audio signals; (v) combining the suppressed first and second audio signals into a third audio signal; and (vi) determining that the third audio signal includes a voice input comprising a wake word.
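Steps (ii) through (v) above can be roughed out in the STFT domain. This sketch swaps in a plain power spectral subtraction gain, which is not necessarily the patent's linear filter. It assumes the noise power measured in frames of the first microphone identified as noise-only (`noise_frames`) is a good estimate for the second microphone as well. The two channels are combined by simple averaging. Wake-word detection in step (vi) is left to a separate detector.

```python
import numpy as np


def suppress_and_combine(x1, x2, noise_frames, floor=0.1):
    """Suppress noise in two STFTs (frames x bins) and combine them into one signal.

    noise_frames: indices of frames in x1 identified as noise-only.
    Assumption: the noise PSD measured at mic 1 also applies to mic 2.
    """
    noise_psd = np.mean(np.abs(x1[noise_frames]) ** 2, axis=0)  # estimated noise content

    def gain(x):
        # Power spectral subtraction gain, clipped to [0, 1], then floored.
        power = np.maximum(np.abs(x) ** 2, 1e-12)
        g = np.sqrt(np.clip(1.0 - noise_psd / power, 0.0, 1.0))
        return np.maximum(g, floor)

    y1, y2 = gain(x1) * x1, gain(x2) * x2
    return 0.5 * (y1 + y2)  # combined "third" signal for a wake-word detector
```

The gain floor keeps a little residual noise rather than zeroing bins outright, which tends to reduce audible artifacts in the combined signal.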
Systems and Methods for Generating a Cleaned Version of Ambient Sound
While a media content item is emitted by a second electronic device that is remote from a first electronic device, the first electronic device receives data that includes: timing information, offset information that indicates a difference between an initial position of the media content item and a current playback position of the media content item, and an audio stream that corresponds to the media content item. The first electronic device detects ambient sound that includes sound corresponding to the media content item emitted by the second electronic device. The first electronic device generates a cleaned version of the ambient sound by using the timing information and the offset information to align the audio stream with the ambient sound and performing a subtraction operation to substantially subtract the audio stream from the ambient sound.
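A minimal sketch of the align-then-subtract operation above follows. It assumes the timing and offset information have already been converted into a coarse sample index (`offset_samples`) into the received audio stream. The alignment is then refined by a small cross-correlation search, a hypothetical choice covering clock jitter and acoustic delay. The stream's level in the room is matched with a single least-squares gain before subtraction. Real rooms add reverberation, which a single gain does not model.

```python
import numpy as np


def clean_ambient(ambient, reference, offset_samples, max_lag=64):
    """Subtract the known media stream from captured ambient sound.

    ambient: microphone samples; reference: the media audio stream received as data.
    offset_samples: coarse alignment derived from the timing/offset information
    (assumption: already expressed as a sample index into `reference`).
    """
    n = len(ambient)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):  # refine alignment near the coarse offset
        start = offset_samples + lag
        if start < 0 or start + n > len(reference):
            continue
        score = np.dot(ambient, reference[start:start + n])
        if score > best_score:
            best_lag, best_score = lag, score
    if best_score == -np.inf:
        raise ValueError("reference stream does not cover the ambient segment")
    start = offset_samples + best_lag
    ref = reference[start:start + n]
    gain = np.dot(ambient, ref) / max(np.dot(ref, ref), 1e-12)  # least-squares level match
    return ambient - gain * ref  # cleaned version of the ambient sound
```

What remains after subtraction is mostly the sound that was not part of the media item, such as a nearby voice.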
SPECTRAL BLENDING WITH INTERIOR MICROPHONE
A headphone can include a plurality of exterior microphones that generate corresponding exterior microphone signals, an accelerometer that generates an accelerometer signal, and an interior microphone, not directly exposed to the environment, that generates an interior microphone signal. A processor of the headphone can be configured to generate an audio signal containing the voice of a user based on a) the accelerometer signal, b) the interior microphone signal, and c) the plurality of exterior microphone signals.
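One plausible reading of "spectral blending" is to take each frequency band from whichever sensor is most reliable there. The sketch below assumes that reading. It takes the accelerometer below a hypothetical 500 Hz, the interior microphone up to a hypothetical 2 kHz, and the average of the exterior microphones above that. The averaging stands in for whatever beamforming a real headphone would use. The crossover frequencies are assumptions, and the hard band switches would normally be replaced by smooth crossfades.

```python
import numpy as np


def spectral_blend(acc, interior, exteriors, fs, f_acc=500.0, f_int=2000.0):
    """Blend one frame per frequency band from three sensor types.

    acc, interior: 1-D frames of equal length; exteriors: shape (num_mics, frame_len).
    Crossover frequencies f_acc and f_int are illustrative assumptions.
    """
    n = len(interior)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    A = np.fft.rfft(acc)
    I = np.fft.rfft(interior)
    E = np.fft.rfft(exteriors, axis=-1).mean(axis=0)  # simple average in place of beamforming
    blended = np.where(freqs < f_acc, A, np.where(freqs < f_int, I, E))
    return np.fft.irfft(blended, n=n)
```

The accelerometer and the sealed interior microphone pick up little ambient noise but have limited bandwidth, so this arrangement uses them where they are strong and relies on the exterior microphones for the upper band.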
Apparatus and method for power efficient signal conditioning for a voice recognition system
A disclosed method includes monitoring an audio signal energy level while a plurality of signal processing components are deactivated and activating at least one signal processing component in response to a detected change in the audio signal energy level. The method may include activating and running a voice activity detector on the audio signal in response to the detected change, where the voice activity detector is the at least one signal processing component. The method may further include activating and running the noise suppressor only if a noise estimator determines that noise suppression is required. The method may activate and run a noise type classifier to determine the noise type based on information received from the noise estimator and may select a noise suppressor algorithm from a group of available noise suppressor algorithms, where the selected noise suppressor algorithm is the most power-efficient.
Apparatus and method for processing an audio signal using noise suppression filter values
An apparatus for processing an audio signal includes an audio signal analyzer and a filter. The audio signal analyzer is configured to analyze an audio signal to determine a plurality of noise suppression filter values for a plurality of bands of the audio signal, wherein the analyzer is configured to determine each noise suppression filter value so that it is greater than or equal to a minimum noise suppression filter value, and so that the minimum noise suppression filter value depends on a characteristic of the audio signal. The filter is configured to filter the audio signal and is adjusted based on the noise suppression filter values.
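A minimal sketch of per-band filter values with a signal-dependent minimum follows. The abstract leaves the "characteristic of the audio signal" open. This sketch assumes it is the overall signal-to-noise ratio and picks between two hypothetical floor values around a hypothetical 10 dB split. The per-band values use a Wiener-style gain with a crude a-priori SNR estimate.

```python
import numpy as np


def suppression_gains(band_power, noise_power, g_min_low_snr=0.3,
                      g_min_high_snr=0.1, snr_split_db=10.0):
    """Per-band noise suppression filter values clamped below by a signal-dependent minimum.

    Assumption: the signal characteristic is the overall SNR; floors and split are illustrative.
    """
    band_power = np.asarray(band_power, dtype=float)
    noise_power = np.maximum(np.asarray(noise_power, dtype=float), 1e-12)
    snr = np.maximum(band_power / noise_power - 1.0, 0.0)  # crude a-priori SNR per band
    gains = snr / (1.0 + snr)                               # Wiener-style gain
    overall_snr_db = 10.0 * np.log10(max(band_power.sum() / noise_power.sum(), 1e-12))
    g_min = g_min_low_snr if overall_snr_db < snr_split_db else g_min_high_snr
    return np.maximum(gains, g_min)  # every value is >= the minimum filter value
```

The filter stage would then scale each band of the audio signal by its returned value.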
Method for encoding multiple microphone signals into a source-separable audio signal for network transmission and an apparatus for directed source separation
A method is provided for encoding multiple microphone signals into a composite source-separable audio (SSA) signal suitable for transmission over a voice network. The embodiments enable source separation of the target voice signal from its ambient sound to be performed at any point in the voice communication network, including the internet cloud. Multiple kinds of processing can be applied to the SSA signal, depending on the intended voice application. In one embodiment, the level of processing is adapted to the processing power available at the chosen processing node in the network. An apparatus for separating the target source voice from its ambient sound is also provided. The apparatus includes a directed source separation (DSS) unit, which processes the two virtual microphone signals in the SSA representation to generate a new SSA signal that includes the enhanced target voice and the enhanced ambient noise.