Patent classification: G10L2021/02161
Voice acquisition control method and device, and TWS earphones
A method for speech collection control applied to a master earphone is provided. The method includes: when user speech is detected, activating a microphone of the master earphone to collect noise and transmitting an activation instruction to a slave earphone, so that the slave earphone controls its own microphone to collect noise in response to the activation instruction; determining which earphone is located in the environment with lower noise, based on the noise data collected by the master earphone and the noise data collected by the slave earphone; and controlling the microphone of the earphone in the lower-noise environment to collect the user speech. A method for speech collection control applied to a slave earphone is also provided.
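The selection step above can be sketched as a comparison of per-earphone noise energy; the RMS metric and the threshold-free "pick the quieter side" rule below are assumptions, not details from the patent.

```python
import numpy as np

def rms_noise_level(samples):
    """Root-mean-square energy of a captured noise frame."""
    samples = np.asarray(samples, dtype=float)
    return float(np.sqrt(np.mean(samples ** 2)))

def select_quieter_earphone(master_noise, slave_noise):
    """Return which earphone sits in the lower-noise environment."""
    if rms_noise_level(master_noise) <= rms_noise_level(slave_noise):
        return "master"
    return "slave"

quiet = [0.01, -0.02, 0.015]   # low-amplitude noise frame
loud = [0.3, -0.25, 0.28]      # high-amplitude noise frame
print(select_quieter_earphone(quiet, loud))  # -> master
```

The selected earphone's microphone would then be the one activated for speech capture, with the other microphone left idle to save power.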
GENERAL SPEECH ENHANCEMENT METHOD AND APPARATUS USING MULTI-SOURCE AUXILIARY INFORMATION
The present disclosure provides a general speech enhancement method and apparatus using multi-source auxiliary information. The method includes the following steps: S1: building a training data set; S2: using the training data set to learn the network parameters of a model and building a speech enhancement model; S3: building a sound source information database by pre-collection or on-site collection; S4: acquiring the inputs of the speech enhancement model; and S5: taking the noisy original signal as the main input of the speech enhancement model, taking the auxiliary sound signals of a target source group and of an interference source group as side inputs for speech enhancement, and obtaining an enhanced speech signal.
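Steps S4-S5 amount to assembling one main input and two grouped side inputs for the model. The sketch below assumes each auxiliary group is collapsed into a single representative signal by averaging; the real model's input format is not specified in the abstract.

```python
import numpy as np

def build_model_inputs(noisy_signal, target_group, interference_group):
    """Assemble the inputs of steps S4-S5: the noisy original signal as the
    main input, plus one side input per auxiliary source group (each group
    collapsed by simple averaging -- an illustrative assumption)."""
    main_input = np.asarray(noisy_signal, dtype=float)
    target_side = np.mean([np.asarray(s, dtype=float) for s in target_group], axis=0)
    interference_side = np.mean([np.asarray(s, dtype=float) for s in interference_group], axis=0)
    return main_input, target_side, interference_side
```

The three arrays would then be fed to the trained enhancement model of step S2, which uses the target-group side input to preserve the desired source and the interference-group side input to suppress the rest.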
VOICE-CONTROLLED DISPLAY DEVICE AND METHOD FOR EXTRACTING VOICE SIGNALS
A voice-controlled display device comprises a display panel, a signal input port, two microphones, a microprocessor and a display controller. The signal input port is configured to receive a first video signal from a host. Each of the microphones comprises a sound-receiving terminal for receiving external audio, wherein the sound-receiving terminal is disposed adjacent to the display panel, and the sound-receiving terminal and the display panel are located on the same side of the voice-controlled display device. The microprocessor is electrically connected to the microphones and performs a voice recognition procedure to obtain an instruction from the external audio. The display controller is electrically connected to the signal input port, the display panel and the microprocessor, wherein the display controller transforms the first video signal into a second video signal and the display panel displays one of the first video signal and the second video signal.
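The control path (microphones, recognizer, instruction, display controller choosing between the two video signals) can be sketched roughly as below; the keyword-spotting recognizer and the toggle behavior are hypothetical stand-ins, since the abstract does not say what the instructions do.

```python
class VoiceControlledDisplay:
    """Minimal sketch of the described signal path: microphone audio feeds a
    recognizer, and the recognized instruction tells the display controller
    whether to output the first video signal or the transformed second one."""

    def __init__(self):
        self.showing_first = True

    def recognize(self, audio_text):
        # Hypothetical recognizer: spot a "switch" keyword in the audio.
        return "switch" if "switch" in audio_text else None

    def transform(self, first_signal):
        # The display controller derives the second video signal from the first.
        return f"transformed({first_signal})"

    def on_audio(self, audio_text, first_signal):
        if self.recognize(audio_text) == "switch":
            self.showing_first = not self.showing_first
        return first_signal if self.showing_first else self.transform(first_signal)
```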
Linear Filtering for Noise-Suppressed Speech Detection
Systems and methods for suppressing noise and detecting voice input in a multi-channel audio signal captured by a plurality of microphones include (i) capturing a first audio signal via a first microphone and a second audio signal via a second microphone, wherein the first and second audio signals respectively comprise first and second noise content from a noise source; (ii) identifying the first noise content in the first audio signal; (iii) using the identified first noise content to determine an estimated noise content captured by the plurality of microphones; (iv) using the estimated noise content to suppress the first and second noise content in the first and second audio signals; (v) combining the suppressed first and second audio signals into a third audio signal; and (vi) determining that the third audio signal includes a voice input comprising a wake word.
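Steps (iv)-(vi) can be sketched as below. The subtraction-based suppressor, averaging combiner, and pluggable detector are simplifying assumptions; the patent's actual linear filtering is not specified in this abstract.

```python
import numpy as np

def suppress_and_detect(sig1, sig2, noise_estimate, wake_word_detector):
    """Steps (iv)-(vi): subtract the estimated noise content from both
    channels, combine the suppressed channels into a third signal, and
    run wake-word detection on the result."""
    clean1 = np.asarray(sig1, dtype=float) - noise_estimate
    clean2 = np.asarray(sig2, dtype=float) - noise_estimate
    combined = 0.5 * (clean1 + clean2)   # simple average as the combiner
    return combined, wake_word_detector(combined)
```

In a real system the detector would be a trained wake-word model operating on the combined channel rather than the toy energy threshold used in the test.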
EARBUD SPEECH ESTIMATION
Embodiments of the invention determine a speech estimate using a bone conduction sensor or accelerometer, without employing voice activity detection gating of speech estimation. Speech estimation is based either exclusively on the bone conduction signal, or is performed in combination with a microphone signal. The speech estimate is then used to condition an output signal of the microphone. There are multiple use cases for speech processing in audio devices.
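The key point above is that estimation runs continuously, with no voice-activity gate deciding when to estimate. A minimal sketch, assuming a fixed blend weight when a microphone signal is also available (the 0.7 weight is an illustrative assumption):

```python
import numpy as np

def speech_estimate(bone_signal, mic_signal=None, bone_weight=0.7):
    """Continuous speech estimate with no VAD gating: every frame is
    estimated, either from the bone-conduction/accelerometer signal alone
    or as a fixed blend with the microphone signal."""
    bone = np.asarray(bone_signal, dtype=float)
    if mic_signal is None:
        return bone
    mic = np.asarray(mic_signal, dtype=float)
    return bone_weight * bone + (1.0 - bone_weight) * mic
```

The resulting estimate would then condition the microphone output, for example by steering a noise suppressor toward frames the estimate marks as speech-dominated.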
Pre-selectable and dynamic configurable multistage echo control system for large range level of acoustic echo
A method for controlling echo of a voice recognition system includes receiving at least two audio input signals corresponding to sound sensed by at least two microphones in a physical space. A first audio input signal of the at least two audio input signals is received on a primary channel, and each remaining audio input signal is received through a respective secondary channel. The method includes selecting, by a processor, based on an echo power level of a speaker in the physical space, a subset of echo control functions (ECFs) from among a plurality of ECFs of a multistage echo control system. Each ECF modifies the at least two audio input signals to reduce echo. The method includes generating a corresponding number of audio output signals by processing the signals received on the primary and secondary channels through the selected subset of ECFs, and outputting the audio output signals.
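The stage-selection step can be sketched as mapping the measured echo power level to a prefix of the ECF chain; the dB thresholds and stage names below are illustrative assumptions, not values from the patent.

```python
def select_ecfs(echo_power_db, ecf_chain):
    """Pick a subset of the multistage echo-control chain based on the
    echo power level of the speaker in the physical space."""
    if echo_power_db < 40:
        return ecf_chain[:1]      # light echo: a single stage suffices
    if echo_power_db < 70:
        return ecf_chain[:2]      # moderate echo: two stages
    return list(ecf_chain)        # heavy echo: run the full chain

chain = ["linear_aec", "residual_suppressor", "nonlinear_clipper"]
```

Each selected function would then be applied in order to the primary and secondary channel signals to produce the audio output signals.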
ADAPTIVE NULLFORMING FOR SELECTIVE AUDIO PICK-UP
Audio pickup systems and methods are provided to enhance an audio signal by removing noise components related to an acoustic environment. The systems and methods receive a primary signal and a reference signal. The reference signal is adaptively filtered and subtracted from the primary signal to minimize an energy content of a resulting output signal.
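The adaptive-filter-and-subtract loop described above is the classic LMS structure: predict the noise in the primary channel from the reference channel, subtract the prediction, and adapt the filter weights to minimize output energy. The tap count and step size below are assumptions.

```python
import numpy as np

def adaptive_nullform(primary, reference, taps=4, mu=0.02):
    """LMS adaptive filter: estimate the noise component of `primary` from
    `reference`, subtract it, and adapt the weights so that the energy of
    the resulting output signal is minimised."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps - 1, len(primary)):
        x = reference[n - taps + 1:n + 1][::-1]  # current + past reference samples
        e = primary[n] - w @ x                   # enhanced output sample
        w += 2.0 * mu * e * x                    # LMS weight update
        out[n] = e
    return out
```

Because the weights chase the correlation between the channels, any component of the primary signal that is predictable from the reference (the environmental noise) is nulled out, while uncorrelated speech passes through.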
VOICE PROCESSING METHOD AND ELECTRONIC DEVICE
A voice processing method is provided. The method includes: an electronic device first performs de-reverberation processing on a first frequency domain signal to obtain a second frequency domain signal, performs noise reduction processing on the first frequency domain signal to obtain a third frequency domain signal, and then performs, based on a first voice feature of the second frequency domain signal and a second voice feature of the third frequency domain signal, fusion processing on the second and third frequency domain signals belonging to the same channel of the first frequency domain signal, to obtain a fused frequency domain signal. In this way, the background noise in the fused frequency domain signal is not damaged, which effectively ensures stable background noise in the voice signal obtained after voice processing. An electronic device, a chip system, and a computer-readable storage medium are also provided.
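The fusion step can be sketched as a per-frame weighted combination of the de-reverberated and noise-reduced versions of the same channel; the feature-proportional weighting rule below is an illustrative assumption, since the abstract does not define the fusion function.

```python
import numpy as np

def fuse_frames(dereverbed, denoised, feat_dereverb, feat_denoise):
    """Fuse the de-reverberated (second) and noise-reduced (third) frequency
    domain signals of one channel, weighting each by its voice-feature score."""
    total = feat_dereverb + feat_denoise
    w = feat_dereverb / total if total > 0 else 0.5
    return w * np.asarray(dereverbed, dtype=float) + (1.0 - w) * np.asarray(denoised, dtype=float)
```

Weighting by the voice features rather than hard-selecting one branch is what keeps the background noise continuous across frames, matching the stability claim above.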
Outputting notifications using device groups
A system that determines that devices are co-located in an acoustic region and selects a single device to which to send incoming notifications for that acoustic region. The system may group devices into separate acoustic regions based on selection data that selects between similar audio data received from multiple devices. The system may select the best device for each acoustic region based on how frequently the device was selected previously, the input/output capabilities of the device, proximity to a user, or the like. The system may send a notification to a single device in each of the acoustic regions so that a user receives a single notification instead of multiple unsynchronized notifications. The system may also determine that acoustic regions are associated with different locations and select the acoustic regions to which to send a notification based on location.
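The per-region device selection can be sketched as a lexicographic ranking over the criteria named above; the field names and the priority order (selection frequency, then I/O capability, then proximity) are illustrative assumptions.

```python
def pick_notification_device(region_devices):
    """Choose one device in an acoustic region to receive the notification,
    favouring past selection frequency, then output capability, then
    proximity to the user (closer is better)."""
    return max(region_devices,
               key=lambda d: (d["times_selected"], d["has_screen"], -d["distance_m"]))

devices = [
    {"name": "kitchen-speaker", "times_selected": 5, "has_screen": False, "distance_m": 3.0},
    {"name": "kitchen-display", "times_selected": 5, "has_screen": True,  "distance_m": 4.0},
]
print(pick_notification_device(devices)["name"])  # -> kitchen-display
```

Running this once per acoustic region yields exactly one notification target per region, which is what prevents the unsynchronized duplicate notifications the abstract describes.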