Patent classifications
G10L2021/02166
SOUND CROSSTALK SUPPRESSION DEVICE AND SOUND CROSSTALK SUPPRESSION METHOD
A sound crosstalk suppression device includes: a speaker analysis unit configured to analyze a speaker situation in a closed space based on voice signals respectively collected by a plurality of microphones arranged in the closed space; a filter update unit that includes a filter configured to generate a suppression signal for a crosstalk component included in a voice signal of a main speaker, and that is configured to update a parameter of the filter and to store the updated parameter in a memory; a reset unit configured to reset the parameter of the filter in a case where it is determined that an analysis result of the speaker situation has switched; and a crosstalk suppression unit configured to suppress the crosstalk component by using the suppression signal.
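The filter-update and reset mechanism described above resembles a classic adaptive interference canceller. As a minimal sketch (not the patented implementation), a normalized-LMS filter can predict the crosstalk component of the main speaker's microphone from an interfering speaker's microphone, subtract it, and have its parameters reset to zeros when the analyzed speaker situation switches; all function names and constants here are illustrative:

```python
import numpy as np

def nlms_crosstalk_suppress(main, interferer, taps=8, mu=0.5, eps=1e-8):
    """Adaptive NLMS filter: predicts the crosstalk component of `main`
    from the interfering speaker's microphone and subtracts it."""
    w = np.zeros(taps)                         # filter parameters (stored state)
    buf = np.zeros(taps)                       # recent interferer samples
    out = np.zeros(len(main))
    for n in range(len(main)):
        buf = np.roll(buf, 1)
        buf[0] = interferer[n]
        y = w @ buf                            # suppression signal
        e = main[n] - y                        # crosstalk-suppressed output
        w = w + mu * e * buf / (buf @ buf + eps)  # NLMS parameter update
        out[n] = e
    return out, w

def reset_filter(taps=8):
    """Reset unit: called when the speaker-situation analysis switches."""
    return np.zeros(taps)
```

With a pure-crosstalk input (main signal equal to a scaled copy of the interferer), the residual output converges toward zero as the filter learns the coupling.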
NOISE SUPPRESSION USING TANDEM NETWORKS
A device includes a memory configured to store instructions and one or more processors configured to execute the instructions. The one or more processors are configured to execute the instructions to receive audio data including a first audio frame corresponding to a first output of a first microphone and a second audio frame corresponding to a second output of a second microphone. The one or more processors are also configured to execute the instructions to provide the audio data to a first noise-suppression network and a second noise-suppression network. The first noise-suppression network is configured to generate a first noise-suppressed audio frame and the second noise-suppression network is configured to generate a second noise-suppressed audio frame. The one or more processors are further configured to execute the instructions to provide the noise-suppressed audio frames to an attention-pooling network. The attention-pooling network is configured to generate an output noise-suppressed audio frame.
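The attention-pooling step above can be sketched as a softmax-weighted combination of the candidate noise-suppressed frames. In the sketch below the two "networks" and the scorer are placeholder callables, not the networks of the abstract:

```python
import numpy as np

def attention_pool(frames, scores):
    """Combine candidate noise-suppressed frames with softmax weights
    derived from per-candidate scores (higher score -> larger weight)."""
    w = np.exp(scores - np.max(scores))
    w = w / w.sum()
    return np.tensordot(w, frames, axes=1)   # weighted sum over candidates

def tandem_suppress(frame_mic1, frame_mic2, net1, net2, scorer):
    """Run both noise-suppression 'networks' on the two-microphone input
    and pool their outputs into one output noise-suppressed frame."""
    candidates = np.stack([net1(frame_mic1, frame_mic2),
                           net2(frame_mic1, frame_mic2)])
    scores = np.array([scorer(c) for c in candidates])
    return attention_pool(candidates, scores)
```

With equal scores the pool is a plain average; as one score dominates, the output approaches that candidate's frame.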
Electronic device and controlling method using non-speech audio signal in the electronic device
An electronic device is provided. The electronic device comprises a speaker, a plurality of microphones, at least one processor operatively connected with the speaker and the plurality of microphones, and a memory operatively connected with the at least one processor, wherein the memory is configured to store instructions which, when executed, cause the at least one processor to perform speech audio processing or non-speech audio processing on audio signals received via the plurality of microphones, upon obtaining a non-speech audio signal based on the speech audio processing or the non-speech audio processing, identify a non-speech audio signal pattern corresponding to the non-speech audio signal, obtain a non-speech audio signal-based first command based on the identified non-speech audio signal pattern, and perform at least one action corresponding to the obtained non-speech audio signal-based first command.
Method for improving sound quality and electronic device using same
According to certain embodiments, an electronic device comprises a microphone configured to acquire a signal including a voice signal and a noise signal; a speaker; a memory; and a processor, wherein the processor is configured to: receive the signal from the microphone, wherein the signal corresponds to a plurality of predetermined frequency bands; identify portions of the signal corresponding to a first band and a second band of the plurality of frequency bands; calculate a signal-to-noise ratio (SNR) value for each predetermined frequency band, based on the signal; obtain a first parameter for correcting the portion of the signal corresponding to the first band and a second parameter for correcting the portion of the signal corresponding to the second band, based on the calculated SNR values for the first band and the second band; and apply the first parameter and the second parameter to the corresponding frequency bands.
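The per-band flow (SNR estimate, correction parameter, application) might look like the following sketch; the linear SNR-to-gain rule is an assumption for illustration, not the patented mapping:

```python
import numpy as np

def band_snrs_db(signal_power, noise_power):
    """SNR, in dB, for each predetermined frequency band."""
    return 10.0 * np.log10(np.asarray(signal_power) / np.asarray(noise_power))

def correction_gains(snrs_db):
    """Illustrative mapping: attenuate noisy (low-SNR) bands, pass
    clean (high-SNR) bands through; gains clipped to [0, 1]."""
    return np.clip(np.asarray(snrs_db) / 20.0, 0.0, 1.0)

def apply_correction(band_signals, gains):
    """Apply each band's correction parameter to that band's signal."""
    return [g * np.asarray(s) for g, s in zip(gains, band_signals)]
```

A band at 10 dB SNR is scaled by 0.5 under this rule, while a 0 dB band is fully attenuated.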
MULTI-SOUND-AREA-BASED SPEECH DETECTION METHOD AND RELATED APPARATUS, AND STORAGE MEDIUM
This application discloses a multi-sound-area-based speech detection method, a related apparatus, and a storage medium, applied to the field of artificial intelligence. The method includes: obtaining sound area information corresponding to each sound area in N sound areas; taking each sound area in turn as a target detection sound area, and generating a control signal corresponding to the target detection sound area according to the sound area information corresponding to that sound area; processing a speech input signal corresponding to the target detection sound area by using the control signal, to obtain a speech output signal corresponding to the target detection sound area; and generating a speech detection result of the target detection sound area according to the speech output signal. Speech signals in different directions are processed in parallel based on the plurality of sound areas, so that in a multi-sound-source scenario the speech signals in different directions may be retained or suppressed by the control signals, separating and enhancing the speech of a target detection user in real time and thereby improving the accuracy of speech detection.
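The retain-or-suppress gating per sound area can be sketched as below; the area-info dictionary, the binary control signal, and the detector are illustrative stand-ins for the structures in the abstract:

```python
import numpy as np

def control_signal(area_info):
    """1.0 retains the area's speech input, 0.0 suppresses it (e.g. when
    the area analysis reports no active target speaker)."""
    return 1.0 if area_info.get("target_active") else 0.0

def detect_areas(area_infos, speech_inputs, detector):
    """Process N sound areas in parallel: gate each area's speech input
    with its control signal, then run speech detection on the gated
    speech output signal."""
    results = []
    for info, x in zip(area_infos, speech_inputs):
        y = control_signal(info) * np.asarray(x)   # speech output signal
        results.append(detector(y))                # speech detection result
    return results
```

An area whose control signal is zero contributes a silent output, so its detection result is negative regardless of the raw input.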
Acoustic output apparatus
The present disclosure provides an acoustic output apparatus including one or more status sensors, at least one low-frequency acoustic driver, at least one high-frequency acoustic driver, at least two first sound guiding holes, and at least two second sound guiding holes. The status sensors may detect status information of a user. The low-frequency acoustic driver may generate at least one first sound, a frequency of which is within a first frequency range. The high-frequency acoustic driver may generate at least one second sound, a frequency of which is within a second frequency range including at least one frequency exceeding the first frequency range. The first and second sound guiding holes may output the first and second sounds, respectively. The first and second sounds may be generated based on the status information, and may simulate a target sound coming from at least one virtual direction with respect to the user.
SYSTEM AND METHOD FOR AUTOMATIC SETUP OF AUDIO COVERAGE AREA
Embodiments include an audio system comprising a plurality of microphones disposed in an environment, wherein the plurality of microphones is configured to detect one or more audio sources, and generate location data indicating a location of each of the one or more audio sources relative to the plurality of microphones; and at least one processor communicatively coupled to the plurality of microphones, wherein the at least one processor is configured to receive the location data from the plurality of microphones, and define a plurality of audio pick-up regions in the environment based on the location data, the plurality of audio pick-up regions comprising a first audio pick-up region and a second audio pick-up region, wherein the plurality of microphones are configured to deploy a first lobe within the first audio pick-up region and a second lobe within the second audio pick-up region.
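Defining pick-up regions from talker location data amounts to clustering the observed locations; the greedy radius-based grouping below is one simple way to sketch it (the radius and the region representation are assumptions, not the patented algorithm):

```python
import numpy as np

def define_pickup_regions(locations, radius=1.0):
    """Greedily cluster detected source locations into audio pick-up
    regions; a lobe would be deployed toward each region's centroid."""
    regions = []                                   # each region: list of points
    for loc in np.asarray(locations, dtype=float):
        for r in regions:
            if np.linalg.norm(loc - np.mean(r, axis=0)) <= radius:
                r.append(loc)                      # join an existing region
                break
        else:
            regions.append([loc])                  # start a new region
    return [np.mean(r, axis=0) for r in regions]   # region centroids
```

Two talkers near the origin and two near (5, 5) would yield two regions, matching the first/second pick-up regions of the abstract.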
AUDIO BEAMFORMING WITH NULLING CONTROL SYSTEM AND METHODS
Audio beamforming systems and methods that enable more precise control of the lobes and nulls of an array microphone are provided. Optimized beamformer coefficients can be generated to produce beamformed signals associated with one or more lobes steered towards one or more desired sound locations and one or more nulls steered towards one or more undesired sound locations. The performance of acoustic echo cancellation can thereby be improved.
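One textbook way to obtain coefficients with an explicit lobe and exact nulls is a minimum-norm constrained solution, w = C (CᴴC)⁻¹ f, where the columns of C are steering vectors for the lobe and null directions and f = [1, 0, …]. This is a standard technique offered for illustration, not necessarily the patented optimization; the array geometry and frequency below are arbitrary:

```python
import numpy as np

def steering_vector(mic_pos, theta, f=1000.0, c=343.0):
    """Far-field steering vector for a linear array toward angle theta (rad)."""
    delays = np.asarray(mic_pos) * np.sin(theta) / c
    return np.exp(-2j * np.pi * f * delays)

def null_steered_weights(mic_pos, theta_lobe, theta_nulls):
    """Minimum-norm weights with unity gain toward the desired lobe and
    exact zeros toward each undesired (null) direction."""
    cols = [steering_vector(mic_pos, theta_lobe)]
    cols += [steering_vector(mic_pos, t) for t in theta_nulls]
    C = np.stack(cols, axis=1)
    f = np.zeros(C.shape[1], dtype=complex)
    f[0] = 1.0                                     # unity gain at the lobe
    return C @ np.linalg.solve(C.conj().T @ C, f)  # w = C (C^H C)^-1 f
```

The beamformer response wᴴd(θ) is exactly 1 toward the lobe and exactly 0 toward each null direction.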
SENSITIVITY MODE FOR AN AUDIO SPOTTING SYSTEM
An audio spotting system configured for various operating modes, including a regular mode and a sensitivity mode, is described. An example cascade audio spotting system may include a high-power subsystem comprising a high-power trigger and a transfer module. The high-power trigger includes one or more detection models used to detect whether a target sound activity is included in one or more audio streams. The one or more detection models are associated with a first set of hyperparameters when the cascade audio spotting system is in the regular mode, and with a second set of hyperparameters when it is in the sensitivity mode. The transfer module provides at least one of one or more processed audio streams for further processing in response to the high-power trigger detecting the target sound activity in the one or more audio streams.
CASCADE AUDIO SPOTTING SYSTEM
Systems and methods for identifying audio events in one or more audio streams include the use of a cascade audio spotting system (such as a cascade keyword spotting system (KWS)) to reduce power consumption while maintaining a desired performance. An example cascade audio spotting system may include a first module and a high-power subsystem. The first module is to receive an audio stream from one or more audio streams, process the audio stream to detect a first target sound activity in the audio stream, and provide a first signal in response to detecting the first target sound activity in the audio stream. The high-power subsystem is to (in response to the first signal being provided by the first module) receive the one or more audio streams and process the one or more audio streams to detect a second target sound activity in the one or more audio streams.
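The two-stage gating and the mode-dependent hyperparameter sets from the two abstracts above can be sketched together as follows; the thresholds, names, and the use of frame energy as the cheap first-stage feature are invented for illustration:

```python
def make_cascade(first_threshold, high_threshold):
    """Two-stage cascade: a cheap first module gates the expensive
    high-power trigger, which only runs after the first stage fires."""
    def spot(frame_energy, model_score):
        if frame_energy < first_threshold:       # first target sound activity?
            return False                         # high-power subsystem stays idle
        return model_score >= high_threshold     # second target sound activity?
    return spot

# Sensitivity mode swaps in a looser hyperparameter set than regular mode.
REGULAR = {"first_threshold": 0.2, "high_threshold": 0.8}
SENSITIVITY = {"first_threshold": 0.1, "high_threshold": 0.6}
```

The same detector code detects more (and rejects less) in sensitivity mode simply because its hyperparameter set changes, while low-energy frames never reach the high-power stage in either mode.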