Patent classifications
G10L21/0216
SOUND SIGNAL PROCESSING APPARATUS AND METHOD OF PROCESSING SOUND SIGNAL
A sound signal processing apparatus may include: a directional microphone configured to detect a user voice signal including a user's voice by arranging the directional microphone to face an utterance point of the user's voice; a non-directional microphone configured to detect a mixed sound signal comprising the user voice and an external sound; and a processor configured to generate an external sound signal by attenuating the user's voice from the mixed sound signal, by differentially calculating the user voice signal from the mixed sound signal.
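The differential calculation described in this abstract can be illustrated with a minimal sketch. It assumes the two microphone signals are already time-aligned and that the user's voice appears in the mixed signal scaled by a single gain. Neither assumption comes from the abstract, and the function names `estimate_gain` and `external_sound` are hypothetical.

```python
def estimate_gain(mixed, voice):
    """Least-squares scalar gain g that minimizes sum((mixed - g * voice) ** 2)."""
    energy = sum(v * v for v in voice)
    if energy == 0.0:
        return 0.0  # silent voice channel: nothing to subtract
    return sum(m * v for m, v in zip(mixed, voice)) / energy


def external_sound(mixed, voice):
    """Attenuate the user's voice by subtracting the scaled voice signal (assumed model)."""
    gain = estimate_gain(mixed, voice)
    return [m - gain * v for m, v in zip(mixed, voice)]


# Illustrative data: the mixed signal is 2 * voice plus an uncorrelated external sound.
voice = [1.0, -1.0, 1.0, -1.0]
ambient = [0.5, 0.5, -0.5, -0.5]
mixed = [2.0 * v + a for v, a in zip(voice, ambient)]
recovered = external_sound(mixed, voice)
```

A single scalar gain is the simplest possible choice. A practical system would more likely need a frequency-dependent or adaptive filter.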
ADAPTING SIBILANCE DETECTION BASED ON DETECTING SPECIFIC SOUNDS IN AN AUDIO SIGNAL
A method is disclosed herein for adapting parameters of a sibilance detector. Time-frequency features are extracted from an audio signal being received. Based on those time-frequency features, a determination is made of whether the audio signal includes a short-term feature or a long-term feature. In accordance with determining that the audio signal includes the short-term feature or the long-term feature, one or more parameters of a sibilance detector for detecting sibilance in the audio signal are adapted. Sibilance in the audio signal is detected using the sibilance detector with the one or more adapted parameters.
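The adapt-then-detect flow in this abstract can be sketched as follows. The abstract does not say which features or parameters are used, so everything concrete here is an assumption made for illustration: the `Frame` fields, the threshold values, and the choice to raise the detection threshold when a short-term feature (an abrupt onset) is present.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    high_band_ratio: float  # share of frame energy in the high band (hypothetical feature)
    onset_strength: float   # energy jump relative to the previous frame (hypothetical feature)


def detect_sibilance(frames, base_threshold=0.6, raised_threshold=0.8, onset_limit=0.5):
    """Return one flag per frame, adapting the threshold when a short-term feature is seen."""
    flags = []
    for frame in frames:
        # Assumed rule: an abrupt onset counts as a short-term feature.
        short_term = frame.onset_strength > onset_limit
        # Assumed adaptation: be stricter on such frames to avoid false detections.
        threshold = raised_threshold if short_term else base_threshold
        flags.append(frame.high_band_ratio > threshold)
    return flags


frames = [Frame(0.7, 0.1), Frame(0.7, 0.9), Frame(0.9, 0.9), Frame(0.3, 0.0)]
flags = detect_sibilance(frames)
```

The second frame has the same high-band ratio as the first but is not flagged, because the detected onset raised the threshold for that frame.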
Asynchronous ad-hoc distributed microphone array processing in smart home applications using voice biometrics
Voice biometrics scoring is performed on received asynchronous audio outputs from microphones distributed at ad hoc locations to generate confidence scores that indicate a likelihood of an enrolled user speech utterance in the output, a subset of the outputs is selected based on the confidence scores, and the subset is spatially processed to provide audio output for voice application use. Alternatively, asynchronous spatially processed audio outputs and corresponding biometric identifiers are received from corresponding devices distributed at ad hoc locations, audio frames of the outputs are synchronized using the biometric identifiers, and the synchronized frames are coherently combined. Alternatively, uttered speech associated with respective ad hoc distributed devices is received and non-coherently combined to generate a final output of uttered speech. The uttered speech is recognized from respective spatially processed outputs generated by the respective devices using biometrics of talkers enrolled by the devices.
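The first alternative above selects a subset of microphone outputs based on biometric confidence scores. A minimal sketch of that selection step follows. The threshold, the optional cap on subset size, and the function name `select_by_confidence` are assumptions, and the later spatial processing is not shown.

```python
def select_by_confidence(scores, threshold=0.5, max_count=None):
    """Return ids of outputs scoring at least `threshold`, highest score first."""
    kept = [mic for mic, score in scores.items() if score >= threshold]
    ranked = sorted(kept, key=lambda mic: scores[mic], reverse=True)
    return ranked if max_count is None else ranked[:max_count]


# Illustrative confidence scores for an enrolled user's speech (made-up values).
scores = {"kitchen": 0.9, "hall": 0.2, "sofa": 0.7}
selected = select_by_confidence(scores)
best_only = select_by_confidence(scores, max_count=1)
```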
Speech processing apparatus and method using a plurality of microphones
A speech processing apparatus includes a plurality of microphones configured to receive a plurality of input signals, and processing circuitry configured to generate a spatial filtering signal corresponding to the plurality of input signals through spatial filtering, generate estimated noise information by integrating directional noise information representing a level of a noise signal received from a direction of interest with diffuse noise information representing levels of noise signals received from various directions based on whether the plurality of input signals have directionality, and generate an estimated speech signal by filtering the spatial filtering signal based on the estimated noise information.
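One way to picture the noise integration in this abstract is a weighted blend of the two noise estimates, followed by a spectral-subtraction style gain. This is a sketch under stated assumptions, not the claimed method: the linear blend, the `directionality` weight in [0, 1], the gain formula and the gain floor are all illustrative choices.

```python
def estimate_noise(directional, diffuse, directionality):
    """Blend directional and diffuse noise levels by a directionality weight (assumed rule)."""
    weight = min(1.0, max(0.0, directionality))  # clamp to [0, 1]
    return weight * directional + (1.0 - weight) * diffuse


def suppression_gain(signal_power, noise_power, floor=0.1):
    """Gain applied to the spatially filtered signal, limited below by `floor`."""
    if signal_power <= 0.0:
        return floor
    return max(floor, 1.0 - noise_power / signal_power)


noise = estimate_noise(directional=4.0, diffuse=2.0, directionality=0.25)
gain = suppression_gain(signal_power=10.0, noise_power=noise)
```

The floor keeps the gain from reaching zero when the noise estimate exceeds the signal power, a common guard against audible artifacts.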
Microphone array based deep learning for time-domain speech signal extraction
A device for processing audio signals in a time-domain includes a processor configured to receive multiple audio signals corresponding to respective microphones of at least two or more microphones of the device, at least one of the multiple audio signals comprising speech of a user of the device. The processor is configured to provide the multiple audio signals to a machine learning model, the machine learning model having been trained based at least in part on an expected position of the user of the device and expected positions of the respective microphones on the device. The processor is configured to provide an audio signal that is enhanced with respect to the speech of the user relative to the multiple audio signals, wherein the audio signal is a waveform output from the machine learning model.
VOICE WAKE-UP METHOD AND ELECTRONIC DEVICE
A voice wake-up method is provided. The method includes: collecting a first voice signal in an environment in which the first electronic device is located; if audio is being played in the environment when the first voice signal is collected, obtaining, in a wired or wireless communication manner, an audio signal corresponding to the audio; determining a first false wake-up result based on the first voice signal and the audio signal; receiving a second false wake-up result sent by the second electronic device; determining a third false wake-up result based on the first false wake-up result and the second false wake-up result, wherein the third false wake-up result is used to indicate whether a wake-up operation needs to be performed on a to-be-woken-up device in a local area network; and sending the third false wake-up result to another electronic device other than the first electronic device in the local area network.
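The sequence of false wake-up results can be sketched as below. The abstract does not state how each result is computed, so the normalized-correlation test, the 0.8 threshold, and the rule that either device reporting a false wake-up suppresses the wake-up are assumptions for illustration. All function names are hypothetical.

```python
import math


def playback_correlation(voice, playback):
    """Absolute normalized correlation between the captured voice and the known playback."""
    dot = sum(v * p for v, p in zip(voice, playback))
    norm = math.sqrt(sum(v * v for v in voice) * sum(p * p for p in playback))
    return 0.0 if norm == 0.0 else abs(dot) / norm


def local_false_wake_up(voice, playback, threshold=0.8):
    """First result: True when the captured signal looks like the played audio (assumed test)."""
    return playback_correlation(voice, playback) >= threshold


def combined_false_wake_up(first, second):
    """Third result: treat the wake-up as false if either device reports it (assumed rule)."""
    return first or second


first = local_false_wake_up([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
second = local_false_wake_up([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, 1.0, 1.0])
third = combined_false_wake_up(first, second)
```

In this sketch, a `third` value of True would mean that the to-be-woken-up device in the local area network should not be woken.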
Voice interaction method, device, apparatus and server
A voice interaction method is provided. The method is applied to a wearable set and includes: collecting voice information through at least two microphones; processing the voice information and determining that the voice information comprises an effective voice instruction, wherein the effective voice instruction is issued by a user for a mobile terminal; and transmitting the effective voice instruction to the mobile terminal. In an embodiment, the processing of the voice information is assigned to an external device, which reduces the power consumption of the mobile terminal, and the voice information is collected by at least two microphones to improve the efficiency and quality of voice collection.