Patent classifications
G10L2021/02087
Audio-spectral-masking-deep-neural-network crowd search
A system includes a memory having instructions therein and at least one processor in communication with the memory. The at least one processor is configured to execute the instructions to communicate, into a user device, a deep neural network comprising a predictive audio spectral mask. The at least one processor is also configured to execute the instructions to: generate data corresponding to ambient sound via a multi-microphone device; separate amplitude data and/or phase data from the data via the deep neural network comprising the predictive audio spectral mask; and determine, via the user device and based on the amplitude data and/or phase data, a location of origin of target speech relative to the user device. The at least one processor is configured to execute the instructions to display, via the user device, the location of origin of the target speech relative to the user device.
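The abstract does not say how the predictive audio spectral mask is applied, so the following is only a minimal sketch of the general technique: each time-frequency magnitude is scaled by a mask value between 0 and 1. The function name `apply_spectral_mask` and the hand-picked mask values are assumptions for illustration; in the described system the mask would come from the deep neural network.

```python
def apply_spectral_mask(magnitudes, mask):
    """Scale each time-frequency magnitude by its mask value (0 = drop, 1 = keep).

    Both arguments are lists of frames; each frame is a list of frequency bins.
    This is an illustrative helper, not the patented method.
    """
    if len(magnitudes) != len(mask):
        raise ValueError("magnitudes and mask must have the same number of frames")
    masked = []
    for frame, mask_frame in zip(magnitudes, mask):
        if len(frame) != len(mask_frame):
            raise ValueError("each frame and its mask must have the same number of bins")
        masked.append([m * g for m, g in zip(frame, mask_frame)])
    return masked


# Hypothetical magnitudes for two frames of three frequency bins each.
magnitudes = [[1.0, 4.0, 2.0], [0.5, 3.0, 1.0]]
# Hand-picked mask standing in for a network prediction (assumption).
mask = [[0.0, 1.0, 0.5], [0.0, 1.0, 0.0]]
target = apply_spectral_mask(magnitudes, mask)
```

Bins where the mask is 0 are removed entirely, and bins where it is 1 pass through unchanged.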
APPARATUS FOR OUTPUTTING AN AUDIO SIGNAL IN A VEHICLE CABIN
Apparatus for outputting an audio signal in a vehicle cabin comprising: at least one audio outputting device configured to output an audio signal comprising at least one audio signal component containing a human voice, particularly a singer's voice, and/or a musical instrument in a vehicle cabin; at least one audio processing device configured to process at least one audio signal output by the at least one audio outputting device so as to suppress the audio signal component in a suppression mode; at least one audio receiving device configured to receive an acoustic human voice signal of at least one person located in the vehicle cabin whilst the audio outputting device outputs the audio signal in the vehicle cabin; and a control device configured to control operation of the audio processing device based on at least one acoustic human voice signal received by the audio receiving device.
AUDIO DEVICE AND OPERATION METHOD THEREOF
An audio device capable of inhibiting malfunction of an information terminal is provided. The audio device includes a sound sensor portion, a sound separation portion, a sound determination portion, and a processing portion. The sound sensor portion has a function of sensing sound. The sound separation portion has a function of separating the sound sensed by the sound sensor portion into a voice and sound other than a voice. The sound determination portion has a function of storing the feature quantity of the sound. The sound determination portion has a function of determining, with a machine learning model such as a neural network model, whether the feature quantity of the voice separated by the sound separation portion is the stored feature quantity. The processing portion has a function of analyzing an instruction contained in the voice and generating an instruction signal representing the content of the instruction in the case where the feature quantity of the voice is the stored feature quantity. The processing portion has a function of performing, on the sound other than a voice separated by the sound separation portion, processing for canceling the sound other than a voice. Specifically, the processing portion has a function of performing, on the sound other than a voice, processing for inverting the phase thereof.
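The phase-inversion step named at the end of this abstract can be illustrated with a few lines of Python. This is a sketch under idealized assumptions: the non-voice sound is given as a list of samples and is perfectly time-aligned with its inverted copy. The helper names `invert_phase` and `mix` are hypothetical and do not come from the source.

```python
def invert_phase(samples):
    """Return the samples with inverted phase (every sample negated)."""
    return [-s for s in samples]


def mix(a, b):
    """Sum two equally long signals sample by sample."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical non-voice sound produced by the separation stage.
noise = [0.2, -0.5, 0.3, 0.0]
anti_noise = invert_phase(noise)
# With ideal alignment, the sound and its inverted copy cancel out.
residual = mix(noise, anti_noise)
```

In a real device, delay and gain mismatches between the two paths would leave a nonzero residual.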
APPARATUS AND METHOD FOR ACOUSTIC ECHO CANCELLATION WITH OCCLUDED VOICE SENSOR
An apparatus and method for acoustic echo cancellation utilizing an occluded voice sensor. The technology as disclosed and claimed herein, in its various implementations and embodiments, improves Acoustic Echo Cancellation (AEC) system performance by increasing cancellation quality and speech SNR. It does so by replacing the lower-frequency portion of the microphone signal with the signal of an Occluded Voice Sensor (OVS), so that this spectral band of the microphone signal is never included in the transmission. This frequency band replacement prevents that band from being reinjected from the speaker.
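The band replacement described above can be sketched on two spectra that share the same frequency bins. The function name `replace_low_band`, the bin values, and the cutoff are assumptions for illustration; the abstract does not specify a cutoff frequency or a transform.

```python
def replace_low_band(mic_bins, ovs_bins, cutoff_bin):
    """Take bins below cutoff_bin from the OVS and the remaining bins from the microphone.

    The low band of the microphone spectrum is discarded, so it never reaches the output.
    """
    if len(mic_bins) != len(ovs_bins):
        raise ValueError("both spectra must have the same number of bins")
    if not 0 <= cutoff_bin <= len(mic_bins):
        raise ValueError("cutoff_bin must lie within the spectrum")
    return list(ovs_bins[:cutoff_bin]) + list(mic_bins[cutoff_bin:])


# Hypothetical spectra ordered from lowest to highest frequency bin.
mic_bins = [5.0, 4.0, 3.0, 2.0, 1.0]
ovs_bins = [0.5, 0.4, 0.3, 0.2, 0.1]
mixed = replace_low_band(mic_bins, ovs_bins, cutoff_bin=2)
```

Here the two lowest bins of the result come from the OVS and the three highest from the microphone.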
AUDIO DEVICE WITH DISTRACTOR ATTENUATOR
An audio device comprising an interface, memory, and a processor is disclosed. The processor is configured to process a first microphone input signal and a second microphone input signal for provision of an output audio signal, and to output the output audio signal. To process the microphone input signals, the processor is configured to: determine a first distractor indicator based on features associated with the input signals; determine a first distractor attenuation parameter based on the first distractor indicator; determine a second distractor indicator based on one or more features associated with the first microphone input signal and the second microphone input signal; determine a second distractor attenuation parameter based on the second distractor indicator; determine an attenuator gain based on the first distractor attenuation parameter and the second distractor attenuation parameter; and apply a noise suppression scheme to a first beamforming output signal according to the attenuator gain for provision of the output audio signal.
TARGET SPEAKER MODE
Methods, systems, and apparatus, including computer programs encoded on computer storage media, relate to a method for target speaker extraction. A target speaker extraction system receives an audio frame of an audio signal. A multi-speaker detection model analyzes the audio frame to determine whether the audio frame includes only a single speaker or multiple speakers. When the audio frame includes only a single speaker, the system inputs the audio frame to a target speaker VAD model to suppress speech in the audio frame from a non-target speaker based on comparing the audio frame to a voiceprint of a target speaker. When the audio frame includes multiple speakers, the system inputs the audio frame to a speech separation model to separate the voice of the target speaker from a voice mixture in the audio frame.
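The per-frame routing in this abstract can be sketched as a small dispatch function. Everything below is hypothetical: the three models are replaced by stubs that operate on toy frames (dictionaries listing the speakers present), and the names `route_frame`, `stub_is_multi_speaker`, `stub_target_vad`, and `stub_separator` do not come from the source.

```python
def route_frame(frame, voiceprint, is_multi_speaker, target_vad, separator):
    """Send a frame to the separation model or the target speaker VAD model."""
    if is_multi_speaker(frame):
        return "separation", separator(frame, voiceprint)
    return "vad", target_vad(frame, voiceprint)


# Stub models (assumptions) standing in for trained networks.
def stub_is_multi_speaker(frame):
    return len(frame["speakers"]) > 1


def stub_target_vad(frame, voiceprint):
    # Keep the frame only when its single speaker matches the voiceprint.
    return frame["speakers"] if frame["speakers"] == [voiceprint] else []


def stub_separator(frame, voiceprint):
    # Keep only the target speaker's voice from the mixture.
    return [s for s in frame["speakers"] if s == voiceprint]


models = (stub_is_multi_speaker, stub_target_vad, stub_separator)
single_target = route_frame({"speakers": ["alice"]}, "alice", *models)
single_other = route_frame({"speakers": ["bob"]}, "alice", *models)
mixture = route_frame({"speakers": ["alice", "bob"]}, "alice", *models)
```

Passing the models in as callables keeps the routing logic independent of how each model is implemented.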
VOICE PROCESSING APPARATUS
A voice processing apparatus includes a reception portion, a production portion and a transmission portion. The reception portion receives sound signals. The production portion produces voice data corresponding to a voice of a speaker either by extracting information of a specific frequency band from the sound signals or by removing information of frequency bands other than the specific frequency band from the sound signals. The transmission portion transmits the voice data.
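One way to keep only a specific frequency band is to zero the other bins of a discrete Fourier transform and transform back. The sketch below uses a naive DFT from the standard library for self-containment; the abstract does not say which filter or band the apparatus uses, so `keep_band`, the sample rate, and the band edges are all assumptions.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (quadratic time, fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def keep_band(samples, sample_rate, low_hz, high_hz):
    """Zero every frequency bin outside [low_hz, high_hz] and resynthesize."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bin k and its mirror bin n - k carry the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0j
    return idft(spectrum)


# Hypothetical mixture: a 4 Hz tone (kept) plus a 20 Hz tone (removed).
sample_rate = 64
n = 64
low_tone = [math.sin(2 * math.pi * 4 * t / sample_rate) for t in range(n)]
high_tone = [math.sin(2 * math.pi * 20 * t / sample_rate) for t in range(n)]
mixture_signal = [a + b for a, b in zip(low_tone, high_tone)]
voice_band = keep_band(mixture_signal, sample_rate, low_hz=2, high_hz=8)
```

Extracting the band and removing everything outside it give the same result here, which matches the two alternatives the abstract lists.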
APPARATUS AND METHOD FOR SEPARATING VOICE SECTIONS FROM EACH OTHER
The present disclosure relates to an apparatus and method for separating voice sections from each other. Various embodiments are directed to providing an apparatus and method for separating voice sections from each other, which can maximize speaker separation performance for a short voice section by dividing a short voice section having low speaker separation reliability and separating multiple speakers from one another.
ROOM SOUNDS MODES
Example techniques described herein involve a media playback system of one or more playback devices that are operable in a plurality of modes. Operating in a given mode may enhance a use case corresponding to the mode. For instance, the plurality of modes may include a foreground mode, which may enhance active listening to the playback device. The plurality of modes may also include a background mode, which may enhance passive listening to the playback device by facilitating other activities during passive listening. In some example implementations, the plurality of modes are non-contemporary; when operating in one mode, the playback device will not be operating in the other modes, and vice versa.
Method and terminal for reconstructing speech signal, and computer storage medium
The present disclosure discloses a method performed at a terminal for reconstructing a speech signal, and a computer storage medium, and relates to the field of speech recognition. The method includes: collecting, by the terminal, a plurality of sound signals through a plurality of sensors of a microphone array; determining, by the terminal, a first speech signal in the plurality of sound signals; performing, by the terminal, signal separation on the first speech signal to obtain a second speech signal; and performing, by the terminal, reconstruction on the second speech signal through a distortion recovery model to obtain a reconstructed speech signal; the distortion recovery model being obtained by training based on a clean speech signal and a distorted speech signal. The embodiments of the present disclosure improve accuracy of speech recognition results.