Patent classifications
G10L2021/02165
Speaker diarization using an end-to-end model
Techniques are described for training and/or utilizing an end-to-end speaker diarization model. In various implementations, the model is a recurrent neural network (RNN) model, such as an RNN model that includes at least one memory layer, such as a long short-term memory (LSTM) layer. Audio features of audio data can be applied as input to an end-to-end speaker diarization model trained according to implementations disclosed herein, and the model utilized to process the audio features to generate, as direct output of the model, speaker diarization results. Further, the end-to-end speaker diarization model can be a sequence-to-sequence model, where the sequence can have variable length. Accordingly, the model can be utilized to generate speaker diarization results for audio segments of various lengths.
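The sequence-to-sequence property described above can be illustrated with a toy recurrent model: the same recurrence maps a feature sequence of any length T to T per-frame speaker-activity vectors. This is a minimal NumPy sketch with random, untrained weights, not the patented model; all names and dimensions (`N_FEAT`, `N_HID`, `N_SPK`, `diarize`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the patent).
N_FEAT, N_HID, N_SPK = 40, 16, 2  # audio features, hidden units, speakers

# Random weights stand in for a trained model.
W_ih = rng.normal(0.0, 0.1, (N_HID, N_FEAT))
W_hh = rng.normal(0.0, 0.1, (N_HID, N_HID))
W_ho = rng.normal(0.0, 0.1, (N_SPK, N_HID))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def diarize(features):
    """Map a (T, N_FEAT) feature sequence to (T, N_SPK) per-frame
    speaker-activity probabilities -- one output vector per input frame."""
    h = np.zeros(N_HID)
    outputs = []
    for x in features:                     # recurrent pass over frames
        h = np.tanh(W_ih @ x + W_hh @ h)   # simple Elman-style memory
        outputs.append(sigmoid(W_ho @ h))  # direct per-frame output
    return np.array(outputs)

# The same model handles variable-length audio segments.
short = diarize(rng.normal(size=(50, N_FEAT)))
longer = diarize(rng.normal(size=(500, N_FEAT)))
print(short.shape, longer.shape)  # (50, 2) (500, 2)
```

A real end-to-end diarizer would use trained LSTM layers rather than a random Elman cell, but the variable-length, frame-synchronous input/output structure is the same.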
Directional acoustic sensor and electronic device including the same
Provided are a directional acoustic sensor that detects a direction of sound, a method of detecting a direction of sound, and an electronic device including the directional acoustic sensor. The directional acoustic sensor includes a sound inlet through which a sound is received, a sound outlet through which the sound received through the sound inlet is output, and a plurality of vibration bodies arranged between the sound inlet and the sound outlet, in which one or more of the plurality of vibration bodies selectively react to the sound received by the sound inlet according to a direction of the received sound.
Deep learning based noise reduction method using both bone-conduction sensor and microphone signals
A deep-learning speech extraction and noise reduction method fuses the signals of a bone-vibration sensor and a microphone. The method comprises: collecting audio with a bone-vibration sensor and a microphone to obtain, respectively, a bone-vibration sensor audio signal and a microphone audio signal; passing the bone-vibration sensor audio signal through a high-pass filter module; inputting the high-pass-filtered bone-vibration sensor signal, or a frequency-band-broadened version of it, together with the microphone audio signal into a DNN module; and fusing the DNN model's predictions to perform noise reduction. By combining the bone-vibration sensor signal with that of a traditional microphone and modeling with a DNN, the invention achieves high-fidelity vocal reproduction and noise suppression. A signal obtained by frequency-band broadening of the bone-vibration sensor audio signal is used as the output.
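The high-pass filtering step can be sketched as a first-order recursive filter that removes low-frequency body-conduction rumble while passing the speech band. This is a minimal NumPy sketch under assumed parameters (8 kHz sampling, 300 Hz cutoff; `high_pass` is a hypothetical helper), not the patent's filter module:

```python
import numpy as np

def high_pass(x, fs, fc):
    """First-order recursive high-pass filter (RC approximation).
    Attenuates content below cutoff fc; passes the speech band."""
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * np.pi * fc)
    alpha = rc / (rc + dt)
    y = np.zeros_like(x)
    for n in range(1, len(x)):
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    return y

# Synthetic bone-vibration signal: 50 Hz body rumble + 1 kHz speech tone.
FS = 8000
t = np.arange(FS) / FS                    # one second of audio
bone = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
filtered = high_pass(bone, FS, fc=300.0)  # rumble removed, speech kept
```

The filtered bone signal would then be paired with the microphone signal as DNN input, per the abstract.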
SOUND SIGNAL PROCESSING APPARATUS AND METHOD OF PROCESSING SOUND SIGNAL
A sound signal processing apparatus may include: a directional microphone arranged to face the utterance point of a user's voice and configured to detect a user voice signal including the user's voice; a non-directional microphone configured to detect a mixed sound signal comprising the user's voice and an external sound; and a processor configured to generate an external sound signal by attenuating the user's voice in the mixed sound signal, that is, by differentially subtracting the user voice signal from the mixed sound signal.
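The differential calculation described above amounts to subtracting a scaled copy of the directional-microphone signal from the mixed signal. A minimal NumPy sketch, with a least-squares gain standing in for whatever calibration the patent actually uses (the synthetic signals and the `external_sound` helper are illustrative assumptions):

```python
import numpy as np

def external_sound(mixed, voice):
    """Attenuate the user's voice in the mixed (non-directional) signal by
    subtracting a least-squares-scaled copy of the directional-mic signal."""
    alpha = np.dot(mixed, voice) / np.dot(voice, voice)
    return mixed - alpha * voice

# Synthetic demo: the directional mic hears mostly the user's voice,
# the non-directional mic hears the voice plus ambient sound.
rng = np.random.default_rng(1)
t = np.arange(8000) / 8000.0
voice = np.sin(2 * np.pi * 200 * t)     # directional microphone
ambient = rng.normal(0.0, 0.1, t.size)  # external sound to recover
mixed = 0.8 * voice + ambient           # non-directional microphone

ext = external_sound(mixed, voice)      # voice attenuated, ambient kept
```

The least-squares gain makes the residual exactly orthogonal to the voice signal, which is one simple way to realize the "differential calculation" in the abstract.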
SPEECH ENHANCEMENT TECHNIQUES THAT MAINTAIN SPEECH OF NEAR-FIELD SPEAKERS
An endpoint selectively enhances a captured audio signal based on an operating mode. The endpoint obtains an audio input signal of multiple users in a physical location. The audio input signal is captured by a microphone. The endpoint separates voice signals from the audio input signal and determines an operating mode for an audio output signal. The endpoint selectively adjusts each of the voice signals based on the operating mode to generate the audio output signal.
APPARATUS AND METHOD
An apparatus includes a CPU and a memory storing a program that causes the apparatus to function as the following units: a first amplification unit that amplifies a sound signal from a first microphone for acquiring an environmental sound; a second amplification unit that amplifies, in accordance with an amplification amount, a sound signal from a second microphone for acquiring noise from a noise source; a conversion unit that performs Fourier transform on the sound signals from the first amplification unit and the second amplification unit; and a reduction unit that reduces noise from first sound data using noise data. The amplification amount is set based on at least one of a level of the sound signal from the second amplification unit and a type of the noise source.
SOUND PROCESSING APPARATUS AND CONTROL METHOD
A sound processing apparatus includes a first microphone that acquires an environmental sound, a second microphone that acquires noise from a noise source, a first conversion unit that performs Fourier transform on a sound signal from the first microphone to generate first sound data, a second conversion unit that performs Fourier transform on a sound signal from the second microphone to generate second sound data, a first reduction unit that reduces noise from the noise source in the first sound data using noise data, a detection unit that detects, based on the second sound data, that short-term noise from the noise source is included in the first sound data, a second reduction unit that controls a magnitude of sound data from the first reduction unit and reduces the short-term noise, and a third conversion unit that performs inverse Fourier transform on sound data from the second reduction unit.
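The two-microphone frequency-domain reduction described in the last two abstracts can be approximated by basic spectral subtraction: subtract the noise reference's magnitude spectrum from the main microphone's spectrum and keep the main microphone's phase. A minimal single-frame NumPy sketch, not the patented units (the window, floor value, and `spectral_subtract` helper are assumptions):

```python
import numpy as np

def spectral_subtract(frame_main, frame_ref, floor=0.05):
    """Basic single-frame spectral subtraction: remove the noise
    reference's magnitude spectrum from the main-microphone frame,
    keeping the main microphone's phase and a small spectral floor."""
    win = np.hanning(len(frame_main))
    MAIN = np.fft.rfft(frame_main * win)
    REF = np.fft.rfft(frame_ref * win)
    mag = np.abs(MAIN) - np.abs(REF)             # subtract noise magnitude
    mag = np.maximum(mag, floor * np.abs(MAIN))  # avoid negative magnitudes
    return np.fft.irfft(mag * np.exp(1j * np.angle(MAIN)), n=len(frame_main))

# Synthetic frame: environmental tone at mic 1, noise captured at mic 2.
rng = np.random.default_rng(3)
N = 1024
t = np.arange(N)
tone = np.sin(2 * np.pi * 0.05 * t)  # environmental sound of interest
noise = rng.normal(0.0, 0.3, N)      # noise from the noise source
cleaned = spectral_subtract(tone + noise, noise)
```

A real system would process overlapping frames and add dedicated handling for short-term (transient) noise, as the abstract's detection and second reduction units suggest.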
CONFERENCE ROOM SYSTEM AND AUDIO PROCESSING METHOD
An audio processing method includes the following steps: capturing audio data with a microphone array and computing frequency array data of the audio data; computing a power sequence over degrees (angles) using the frequency array data; and computing the difference between the maximum and minimum values of the power sequence to determine whether the degree corresponding to the maximum value is a source degree relative to the microphone array.
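The power-sequence-over-degrees idea can be sketched with a two-microphone delay-and-sum scan: steer the pair over candidate degrees, record the output power at each, and accept the peak only if the max-min power spread is large enough. A minimal NumPy sketch with assumed geometry (10 cm spacing, 16 kHz sampling) and coarse integer-sample delays, not the patented method:

```python
import numpy as np

FS = 16000  # sample rate (Hz) -- assumed
C = 343.0   # speed of sound (m/s)
D = 0.1     # microphone spacing (m) -- assumed

def power_by_degree(mic0, mic1, degrees):
    """Delay-and-sum scan: steer a two-mic pair over candidate degrees
    and return the beamformer output power at each degree."""
    powers = []
    for deg in degrees:
        # Integer-sample delay implied by an arrival from this degree.
        delay = int(round(FS * D * np.cos(np.radians(deg)) / C))
        aligned = np.roll(mic1, -delay)
        powers.append(np.mean((mic0 + aligned) ** 2))
    return np.array(powers)

# Simulate a broadband source whose wavefront reaches mic1 three samples
# late (roughly a 50-degree arrival for this geometry).
rng = np.random.default_rng(2)
src = rng.normal(size=4000)
mic0, mic1 = src, np.roll(src, 3)

degrees = np.arange(0, 181, 5)
p = power_by_degree(mic0, mic1, degrees)
est = degrees[np.argmax(p)]
# Declare a source only if the max-min power spread is large enough.
found = (p.max() - p.min()) > 0.3 * p.max()
```

The max-min difference test rejects diffuse sound fields, where the power sequence is nearly flat and the argmax is meaningless.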
Voice interaction method, device, apparatus and server
A voice interaction method is provided. The method is applied to a wearable device and includes: collecting voice information through at least two microphones; processing the voice information and determining that the voice information comprises an effective voice instruction, wherein the effective voice instruction is issued by a user for a mobile terminal; and transmitting the effective voice instruction to the mobile terminal. In an embodiment, the processing of the voice information is offloaded to an external device, which reduces the power consumption of the mobile terminal, and the voice information is collected by at least two microphones to improve the efficiency and quality of voice collection.
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
Disclosed is an information processing apparatus including a control section that estimates utterance environment information in a state where the control section is set to cooperate with a predetermined mobile terminal.