G10L21/0208

DATA AUGMENTATION SYSTEM AND METHOD FOR MULTI-MICROPHONE SYSTEMS

A method, computer program product, and computing system for obtaining one or more speech signals from a first device, thus defining one or more first device speech signals. One or more speech signals may be obtained from a second device, thus defining one or more second device speech signals. One or more acoustic relative transfer functions mapping reverberation from the one or more first device speech signals to the one or more second device speech signals may be generated. One or more augmented second device speech signals may be generated based upon, at least in part, the one or more acoustic relative transfer functions and first device training data.
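
The augmentation step in this abstract can be illustrated with a minimal sketch. The abstract does not say how the relative transfer functions are represented or applied, so the sketch assumes each one is a short time-domain impulse response that is convolved with first device training data. The names `convolve` and `augment_with_rtf` and the sample values are hypothetical.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two sequences of floats."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def augment_with_rtf(first_device_training, rtf_impulse_responses):
    """Return one simulated second-device signal per (signal, RTF) pair."""
    augmented = []
    for signal in first_device_training:
        for rtf in rtf_impulse_responses:
            # Trim to the original length so existing labels still line up.
            augmented.append(convolve(signal, rtf)[: len(signal)])
    return augmented


# Hypothetical data: a unit impulse and an RTF with a direct path plus one echo.
training = [[1.0, 0.0, 0.0, 0.0]]
rtfs = [[1.0, 0.0, 0.5]]
augmented = augment_with_rtf(training, rtfs)
```

With several estimated transfer functions, each first-device utterance yields several simulated second-device utterances, which is the sense in which the training data is augmented.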

Wind noise mitigation systems and methods

A system and method can provide noise mitigation, such as wind noise mitigation, and/or microphone blending. Some methods may include sampling a sound signal from a plurality of microphones to generate a frame comprising a plurality of time-frequency tiles of the sound signal, each time-frequency tile including respective values of at least one feature from the plurality of microphones, comparing the respective values of the at least one feature to determine whether each time-frequency tile satisfies a similarity threshold, and flagging each time-frequency tile as noise if it fails to satisfy the similarity threshold, grouping the plurality of time-frequency tiles into sets of frequency-adjacent time-frequency tiles, and for each set of frequency-adjacent time-frequency tiles in the frame: counting a number of flagged time-frequency tiles, and attenuating all of the time-frequency tiles in each set if the number exceeds a noise bin count threshold to thereby reduce noise in the sound signal.
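
The flag, group, count, and attenuate steps described above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the similarity test (spread of one feature across microphones), the fixed set size, and all numeric values are hypothetical.

```python
def flag_noise(features_per_mic, max_difference):
    """Flag a tile as noise when the microphones disagree by more than max_difference.

    Assumption: "similarity" is measured as the spread of a single feature.
    """
    return max(features_per_mic) - min(features_per_mic) > max_difference


def mitigate_frame(magnitudes, features, max_difference, set_size,
                   noise_bin_count_threshold, gain):
    """Attenuate whole sets of frequency-adjacent tiles that contain too many flags.

    magnitudes: one value per tile, ordered by frequency.
    features:   per tile, the feature value observed at each microphone.
    """
    flags = [flag_noise(f, max_difference) for f in features]
    out = list(magnitudes)
    for start in range(0, len(out), set_size):
        stop = min(start + set_size, len(out))
        # Count the flagged tiles in this set of frequency-adjacent tiles.
        if sum(flags[start:stop]) > noise_bin_count_threshold:
            for k in range(start, stop):
                out[k] *= gain
    return out


# Hypothetical frame of six tiles observed by two microphones.
magnitudes = [1.0] * 6
features = [[0.9, 0.2], [0.8, 0.1], [0.5, 0.5],
            [0.4, 0.4], [0.3, 0.35], [0.9, 0.1]]
result = mitigate_frame(magnitudes, features, max_difference=0.3,
                        set_size=3, noise_bin_count_threshold=1, gain=0.1)
```

In this example the first set holds two flagged tiles and is attenuated as a whole, including its unflagged tile, while the second set holds only one flagged tile and is left unchanged.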

Electronic device and controlling method using non-speech audio signal in the electronic device
11562741 · 2023-01-24

An electronic device is provided. The electronic device comprises a speaker, a plurality of microphones, at least one processor operatively connected with the speaker and the plurality of microphones, and a memory operatively connected with the at least one processor, wherein the memory is configured to store instructions which, when executed, cause the at least one processor to perform speech audio processing or non-speech audio processing on audio signals received via the plurality of microphones, upon obtaining a non-speech audio signal based on the speech audio processing or the non-speech audio processing, identify a non-speech audio signal pattern corresponding to the non-speech audio signal, obtain a non-speech audio signal-based first command based on the identified non-speech audio signal pattern, and perform at least one action corresponding to the obtained non-speech audio signal-based first command.

Method for improving sound quality and electronic device using same

According to certain embodiments, an electronic device comprises a microphone configured to acquire a signal including a voice signal and noise signal; a speaker; a memory; and a processor, wherein the processor is configured to: receive the signal from the microphone, wherein the signal corresponds to a plurality of predetermined frequency bands; identify portions of the signal corresponding to a first band and a second band of the plurality of frequency bands; calculate signal-to-noise ratio (SNR) values for each predetermined frequency band, based on the signal; obtain a first parameter for correcting the portion of the signal corresponding to the first band and a second parameter for correcting the portion of the signal corresponding to the second band, based on the calculated SNR values for the first band and the second band; and apply the first parameter and the second parameter to each of the predetermined frequency bands.
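
The per-band SNR calculation and the SNR-derived correction parameters can be sketched as below. The abstract does not define the parameters, so this sketch assumes each parameter is a Wiener-style gain computed from per-band speech and noise power estimates. The function names and power values are hypothetical.

```python
import math


def snr_db(speech_power, noise_power):
    """Signal-to-noise ratio of one band in decibels."""
    return 10.0 * math.log10(speech_power / noise_power)


def wiener_gain(snr_db_value):
    """Assumed correction parameter: gain = SNR / (1 + SNR) on a linear scale."""
    snr_linear = 10.0 ** (snr_db_value / 10.0)
    return snr_linear / (1.0 + snr_linear)


def band_parameters(speech_power, noise_power):
    """Map each band name to a gain derived from that band's SNR."""
    return {
        band: wiener_gain(snr_db(speech_power[band], noise_power[band]))
        for band in speech_power
    }


# Hypothetical power estimates for a first and a second band.
params = band_parameters({"first": 4.0, "second": 9.0},
                         {"first": 4.0, "second": 1.0})
```

A band whose speech and noise powers are equal (0 dB) receives a gain of 0.5, while the cleaner second band receives a gain close to 0.9, so noisier bands are attenuated more.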

Presence detection using ultrasonic signals with concurrent audio playback

Techniques for presence-detection devices to detect movement of a person in an environment by emitting ultrasonic signals using a loudspeaker that is concurrently outputting audible sound. To detect movement by the person, the devices characterize the change in the frequency, or the Doppler shift, of the reflections of the ultrasonic signals off the person caused by the movement of the person. However, when a loudspeaker plays audible sound while emitting the ultrasonic signal, audio signals generated by microphones of the devices include distortions caused by the loudspeaker. These distortions can be interpreted by the presence-detection devices as indicating movement of a person when there is no movement, or as indicating lack of movement when a user is moving. The techniques include processing audio signals to remove distortions to more accurately identify changes in the frequency of the reflections of the ultrasonic signals caused by the movement of the person.
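
The Doppler shift that these techniques characterize can be estimated with a short sketch. It assumes the loudspeaker and microphone are co-located and the person moves directly toward or away from the device. The 20 kHz carrier and 1 m/s walking speed are hypothetical values that do not come from the abstract.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 degrees C


def reflected_doppler_shift_hz(carrier_hz, radial_speed_m_s,
                               speed_of_sound=SPEED_OF_SOUND_M_S):
    """Frequency shift of an echo off a moving reflector.

    Positive radial_speed_m_s means the reflector approaches the device.
    The echo is shifted twice: once on the way to the moving reflector
    and once on the way back.
    """
    received_hz = (carrier_hz * (speed_of_sound + radial_speed_m_s)
                   / (speed_of_sound - radial_speed_m_s))
    return received_hz - carrier_hz


# A person walking toward the device at 1 m/s shifts a 20 kHz tone by about 117 Hz.
shift = reflected_doppler_shift_hz(20000.0, 1.0)
```

Shifts of this size are small relative to the carrier, which helps explain why loudspeaker distortion near the ultrasonic band could be mistaken for movement, or could mask it.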

MULTI-REGISTER-BASED SPEECH DETECTION METHOD AND RELATED APPARATUS, AND STORAGE MEDIUM

This application discloses a multi-sound area-based speech detection method and related apparatus, and a storage medium, which are applied to the field of artificial intelligence. The method includes: obtaining sound area information corresponding to each sound area in N sound areas; using each sound area as a target detection sound area, and generating a control signal corresponding to the target detection sound area according to sound area information corresponding to the target detection sound area; processing a speech input signal corresponding to the target detection sound area by using the control signal corresponding to the target detection sound area, to obtain a speech output signal corresponding to the target detection sound area; and generating a speech detection result of the target detection sound area according to the speech output signal corresponding to the target detection sound area. Speech signals in different directions are processed in parallel based on a plurality of sound areas, so that in a multi-sound source scenario, the speech signals in different directions may be retained or suppressed by a control signal, to separate and enhance speech of a target detection user in real time, thereby improving the accuracy of speech detection.
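
The per-area flow in this abstract (sound area information, control signal, speech output signal, detection result) can be sketched as follows. The `user_present` field, the binary retain/suppress gain, and the energy-threshold detector are assumptions made for illustration only. The loop also runs sequentially, whereas the abstract describes parallel processing.

```python
def control_signal(area_info):
    """Assumed rule: retain (1.0) areas with a user, suppress (0.0) the rest."""
    return 1.0 if area_info["user_present"] else 0.0


def detect_speech(area_infos, area_inputs, energy_threshold):
    """Return a speech detection result (True/False) per sound area."""
    results = {}
    for area_id, info in area_infos.items():
        gain = control_signal(info)
        # Apply the control signal to this area's speech input signal.
        output = [gain * x for x in area_inputs[area_id]]
        energy = sum(x * x for x in output) / len(output) if output else 0.0
        results[area_id] = energy > energy_threshold
    return results


# Hypothetical scene: area 0 holds the target user, area 1 holds an interfering source.
infos = {0: {"user_present": True}, 1: {"user_present": False}}
inputs = {0: [0.5, -0.5, 0.5, -0.5], 1: [0.9, -0.9, 0.9, -0.9]}
results = detect_speech(infos, inputs, energy_threshold=0.1)
```

The louder signal in area 1 is suppressed by its control signal before detection, so only area 0 reports speech.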

Mixed adaptive and fixed coefficient neural networks for speech enhancement

Systems, methods and computer-readable media are provided for speech enhancement using a hybrid neural network. An example process can include receiving, by a first neural network portion of the hybrid neural network, audio data and reference data, the audio data including speech data, noise data, and echo data; filtering, by the first neural network portion, a portion of the audio data based on adapted coefficients of the first neural network portion, the portion of the audio data including the noise data and/or echo data; based on the filtering, generating, by the first neural network portion, filtered audio data including the speech data and an unfiltered portion of the noise data and/or echo data; and based on the filtered audio data and the reference data, extracting, by a second neural network portion of the hybrid neural network, the speech data from the filtered audio data.

SOUND OUTPUT CONTROL DEVICE, SOUND OUTPUT SYSTEM, SOUND OUTPUT CONTROL METHOD, AND COMPUTER-READABLE STORAGE MEDIUM

A sound output control device includes: an orientation detecting unit configured to detect a state of orientation of a face of a user; an ambient sound obtaining unit configured to obtain ambient sound; an ambient sound reducing processing unit configured to perform, based on the ambient sound, processing of reducing the ambient sound; and a sound output control unit configured to cause sound to be output with the ambient sound reduced by the ambient sound reducing processing unit, when the detected orientation of the face of the user is in a first state, and make audibility of the ambient sound higher than in a state where the ambient sound has been reduced by the ambient sound reducing processing unit, when the detected orientation of the face of the user is in a second state changed from the first state.
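
The switching behavior of the sound output control unit can be sketched as a single decision on the detected face orientation. The abstract does not define the first and second states, so this sketch assumes the first state means the face is within a yaw threshold of straight ahead. The threshold and gain values are hypothetical.

```python
def ambient_gain(yaw_degrees, first_state_max_yaw=15.0,
                 reduced_gain=0.1, restored_gain=1.0):
    """Return the gain applied to ambient sound for a given face orientation.

    First state (assumed: facing roughly forward): ambient sound stays reduced.
    Second state (assumed: face turned away): ambient sound is made more audible.
    """
    in_first_state = abs(yaw_degrees) <= first_state_max_yaw
    return reduced_gain if in_first_state else restored_gain


facing_forward = ambient_gain(0.0)
turned_aside = ambient_gain(40.0)
```

Turning the head past the assumed threshold raises the ambient gain, which corresponds to making the ambient sound more audible than in the reduced state.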