Patent classifications
G10L2021/03646
Device arbitration using acoustic characteristics
Described herein is a system for device arbitration using acoustic characteristics of a physical space, such as a user's household. The system generates a matrix of inter-device attenuation factors. The inter-device attenuation factors are determined using the attenuation experienced by a first device versus the attenuation experienced by a second device. Once the matrix is generated, an attenuation vector representing the attenuation corresponding to an input audio signal is determined and compared to the matrix. Based on the comparison, the system selects a device for further processing.
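The matrix-versus-vector comparison described in this abstract could be sketched as follows; the 3-device household, the cosine-similarity measure, and all numeric values are illustrative assumptions, not details from the patent:

```python
import math

def cosine_similarity(a, b):
    # Similarity between two attenuation vectors (one assumed comparison choice).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def arbitrate(attenuation_matrix, observed_vector):
    """Return the index of the device whose stored row of inter-device
    attenuation factors best matches the observed attenuation vector."""
    scores = [cosine_similarity(row, observed_vector) for row in attenuation_matrix]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical matrix: row i holds the attenuation factors observed at each
# device when audio originates near device i.
matrix = [
    [1.0, 0.4, 0.2],
    [0.4, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
observed = [0.9, 0.5, 0.25]          # wake word heard loudest near device 0
print(arbitrate(matrix, observed))   # → 0
```

The best-matching row identifies the device selected for further processing.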
Acoustic source classification using hyperset of fused voice biometric and spatial features
A method includes extracting, from multiple microphone input, a hyperset of features of acoustic sources, using the extracted features to identify separable clusters associated with acoustic scenarios, and classifying subsequent input as one of the acoustic scenarios using the hyperset of features. The acoustic scenarios include a desired spatially moving/non-moving talker, and an undesired spatially moving/non-moving acoustic source. The hyperset of features includes both spatial and voice biometric features. The classified acoustic scenario may be used in a robotics application or voice assistant device for desired-speech enhancement or interference-signal cancellation. Specifically, the classification of the acoustic scenarios can be used to adapt a beamformer, e.g., step size adjustment. The hyperset of features may also include visual biometric features extracted from one or more cameras viewing the acoustic sources. The spatial and biometric features may be separately extracted, clustered, and classified, and their separate classifications fused, e.g., using frame synchronization.
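The fuse-then-classify step could be sketched as a nearest-centroid decision over concatenated feature vectors; the two scenario labels, the cluster centres, and the feature values below are illustrative assumptions:

```python
def fuse_features(spatial, biometric):
    """Concatenate spatial and voice-biometric features into one 'hyperset' vector."""
    return spatial + biometric

def nearest_centroid(centroids, vector):
    # Assign the frame to the acoustic scenario with the closest cluster centre.
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: d2(centroids[label], vector))

centroids = {  # hypothetical cluster centres learned from earlier input
    "desired_talker": [0.1, 0.1, 0.9],
    "interference":   [0.9, 0.8, 0.1],
}
frame = fuse_features([0.2, 0.15], [0.85])
print(nearest_centroid(centroids, frame))  # → desired_talker
```

The resulting label could then drive a beamformer's step-size adjustment, as the abstract suggests.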
CONVERSATION DEPENDENT VOLUME CONTROL
Techniques are described for detecting a conversation between at least two people, and for reducing noise during the conversation. In certain embodiments, at least one speech metric is generated based on spectral analysis of an audio signal and is used to determine that the audio signal represents speech from a first person. Responsive to determining that the speech is part of a conversation between the first person and a second person, an operating state of a device in a physical environment is adjusted such that a volume level of sound contributed by or associated with the device is reduced. The sound contributed by or associated with the device corresponds to noise, at least for the duration of the conversation. Therefore, reducing the volume level of sound contributed by or associated with the device reduces the overall noise level in the environment, resulting in a reduction in conversational effort.
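The detect-then-duck behaviour could be sketched as below; the two-talker heuristic, the `Device` class, and the volume numbers are illustrative assumptions rather than claimed implementation details:

```python
def detect_conversation(speaker_ids, window):
    """Simple conversation heuristic: at least two distinct talkers
    appear within the most recent window of speech frames."""
    return len(set(speaker_ids[-window:])) >= 2

class Device:
    def __init__(self, volume=70):
        self.volume = volume

    def duck(self, reduced=30):
        # Adjust the operating state so contributed sound is quieter.
        self.volume = min(self.volume, reduced)

tv = Device()
recent_talkers = ["alice", "bob", "alice"]  # hypothetical diarization output
if detect_conversation(recent_talkers, window=3):
    tv.duck()
print(tv.volume)  # → 30
```

Restoring the original volume once the conversation ends would be the natural complement.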
SPEECH SYNTHESIS IN NOISY ENVIRONMENT
Disclosed is speech synthesis in a noisy environment. According to an embodiment of the disclosure, a method of speech synthesis may generate a Lombard effect-applied synthesized speech using a feature vector generated from an utterance feature. According to the disclosure, the speech synthesis method and device may be related to artificial intelligence (AI) modules, unmanned aerial vehicles (UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.
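One way to picture Lombard-style conditioning of a synthesis feature vector is the sketch below; the specific features, the noise-to-scaling mapping, and all constants are illustrative assumptions (actual systems would learn this mapping):

```python
def lombard_features(base, noise_db):
    """Hypothetical Lombard conditioning: raise pitch and level and slow
    the speaking rate as ambient noise increases."""
    k = max(0.0, (noise_db - 50.0) / 50.0)   # 0 below 50 dB, 1 at 100 dB
    return {
        "pitch_hz": base["pitch_hz"] * (1 + 0.2 * k),
        "level_db": base["level_db"] + 10 * k,
        "rate":     base["rate"] * (1 - 0.15 * k),
    }

quiet = {"pitch_hz": 120.0, "level_db": 60.0, "rate": 1.0}
print(lombard_features(quiet, 80.0))
```

The adjusted feature vector would then condition the synthesizer so the output remains intelligible in noise.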
Sibilance detection and mitigation
The present disclosure relates to sibilance detection and mitigation in a voice signal. A method of sibilance detection and mitigation is described. In the method, a predetermined spectrum feature is extracted from a voice signal, the predetermined spectrum feature representing a distribution of signal energy over a voice frequency band. Sibilance is then identified based on the predetermined spectrum feature. Excessive sibilance is further identified from the identified sibilance based on a level of the identified sibilance. Then the voice signal is processed by decreasing a level of the excessive sibilance so as to suppress the excessive sibilance. Corresponding system and computer program products are described as well.
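The detect-then-attenuate logic could be sketched per frame as follows; the band layout, both thresholds, and the -6 dB attenuation are illustrative assumptions:

```python
def sibilance_gain(band_energies, sib_band, ratio_thresh=0.5,
                   level_thresh=0.2, atten_db=-6.0):
    """Given per-band signal energies for one frame, decide whether the
    sibilance band is excessive and return the gain (dB) to apply to it."""
    total = sum(band_energies)
    if total == 0:
        return 0.0
    sib = sum(band_energies[i] for i in sib_band)
    ratio = sib / total  # spectrum feature: distribution of energy across bands
    if ratio > ratio_thresh and sib > level_thresh:
        return atten_db   # sibilance present AND excessive: suppress it
    return 0.0            # leave ordinary sibilance untouched

# Hypothetical 4-band frame with most energy in the upper (sibilant) bands.
frame = [0.05, 0.05, 0.4, 0.5]
print(sibilance_gain(frame, sib_band=(2, 3)))  # → -6.0
```

Applying the returned gain only to the sibilance band suppresses excessive sibilance while leaving the rest of the voice signal intact.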
LOW LATENCY AUTOMIXER INTEGRATED WITH VOICE AND NOISE ACTIVITY DETECTION
Systems and methods are disclosed for providing voice and noise activity detection with audio automixers that can reject errant non-voice or non-human noises while maximizing signal-to-noise ratio and minimizing audio latency.
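A minimal gating sketch of this idea is below; the per-channel voice flags, the single-open-channel policy, and the gain values are illustrative assumptions:

```python
def automix(levels, is_voice, open_gain=1.0, closed_gain=0.05):
    """Gate each microphone: only voice-active channels compete; the loudest
    voice channel opens, and everything else (including noise-only channels)
    stays ducked, preserving signal-to-noise ratio."""
    candidates = [i for i, voiced in enumerate(is_voice) if voiced]
    winner = max(candidates, key=lambda i: levels[i]) if candidates else None
    return [open_gain if i == winner else closed_gain
            for i in range(len(levels))]

# Channel 1 is a loud door slam (non-voice); channel 2 is a quieter talker.
print(automix([0.2, 0.9, 0.5], [True, False, True]))  # → [0.05, 0.05, 1.0]
```

Making the decision frame-by-frame, with no look-ahead buffering, is what keeps latency low.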
Speaker enrollment
A method of speaker modelling for a speaker recognition system, comprises: receiving a signal comprising a speaker's speech; and, for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort; and extracting features from the spectrum of the speaker's speech and the at least one modified spectrum. The method further comprises forming at least one speech model based on the extracted features.
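The enrollment flow could be sketched as below; the spectral-tilt effort model, the placeholder features, and the mean-vector model are illustrative assumptions standing in for whatever effort transforms and features the system actually uses:

```python
def apply_vocal_effort(spectrum, tilt):
    """Crude vocal-effort modification: tilt energy toward higher bins."""
    n = len(spectrum)
    return [e * (1.0 + tilt * k / (n - 1)) for k, e in enumerate(spectrum)]

def extract_features(spectrum):
    # Placeholder feature: normalised band energies.
    total = sum(spectrum) or 1.0
    return [e / total for e in spectrum]

def enroll(frames, efforts=(0.0, 0.3, 0.6)):
    """Build a speaker model from each frame's spectrum plus effort-modified
    variants of it (here, simply the mean feature vector)."""
    feats = [extract_features(apply_vocal_effort(f, t))
             for f in frames for t in efforts]
    dims = len(feats[0])
    return [sum(v[d] for v in feats) / len(feats) for d in range(dims)]

model = enroll([[0.2, 0.3, 0.5], [0.1, 0.4, 0.5]])
print(len(model))  # → 3
```

Enrolling on synthetic high-effort variants means the model can match the speaker later even when they raise their voice.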
BIOMETRIC USER RECOGNITION
A method of biometric user recognition comprises, in an enrolment stage, receiving first biometric data relating to a biometric identifier of the user; generating a plurality of biometric prints for the biometric identifier, based on the received first biometric data, and enrolling the user based on the plurality of biometric prints. Then, during a verification stage, the method comprises receiving second biometric data relating to the biometric identifier of the user; performing a comparison of the received second biometric data with the plurality of biometric prints; and performing user recognition based on the comparison.
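The plurality-of-prints idea could be sketched as follows; the Euclidean comparison, the accept threshold, and the 2-D print vectors are illustrative assumptions:

```python
import math

def enroll(samples):
    """Keep a plurality of biometric prints rather than one averaged template."""
    return list(samples)

def verify(prints, probe, threshold=0.5):
    """Accept if the probe is close enough to any enrolled print."""
    return min(math.dist(p, probe) for p in prints) < threshold

prints = enroll([[0.0, 0.1], [0.9, 1.0]])
print(verify(prints, [0.05, 0.12]))  # → True
print(verify(prints, [2.0, 2.0]))    # → False
```

Holding several prints per identifier lets verification tolerate natural variation (e.g., different conditions at capture time) that a single template would miss.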
System, device, and method of voice-based user authentication utilizing a challenge
Device, system, and method of voice-based user authentication utilizing a challenge. A system includes a voice-based user-authentication unit to authenticate a user based on a voice sample uttered by the user. A voice-related challenge generator operates to generate a voice-related challenge that induces the user to modify one or more vocal properties. A reaction-to-challenge detector operates to detect a user-specific vocal modification in reaction to the voice-related challenge, using a processor together with an acoustic microphone, an optical microphone, or a hybrid acoustic-and-optical microphone. The voice-based user-authentication unit utilizes the user-specific vocal modification, detected as a reaction to the voice-related challenge, as part of the user-authentication process.
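The challenge-response flow could be sketched as below; the challenge phrases, the pitch-ratio check, and the thresholds are illustrative assumptions about what "inducing a vocal modification" might look like:

```python
import random

CHALLENGES = ["say it whispering", "say it twice as fast",
              "say it in a higher pitch"]

def generate_challenge():
    return random.choice(CHALLENGES)

def detect_modification(baseline_pitch_hz, challenge_pitch_hz, challenge):
    """Hypothetical detector: a 'higher pitch' challenge should raise the
    measured pitch noticeably above the user's baseline."""
    if "higher pitch" in challenge:
        return challenge_pitch_hz > baseline_pitch_hz * 1.1
    return True  # other challenges would need their own checks

def authenticate(voiceprint_match, modification_detected):
    """Pass only if the voiceprint matches AND the user reacted to the challenge."""
    return voiceprint_match and modification_detected

reacted = detect_modification(120.0, 150.0, "say it in a higher pitch")
print(authenticate(True, reacted))  # → True
```

Requiring a live reaction to a fresh challenge is what defeats replayed recordings of the enrolled voice.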
Device and method for adjusting speech intelligibility at an audio device
A device and method for adjusting speech intelligibility at an audio device is provided. The device comprises a microphone, a transmitter and a controller. The controller is configured to: determine a noise level at the microphone; select a voice tag, of a plurality of voice tags, based on the noise level, each of the plurality of voice tags associated with respective noise levels; determine an intelligibility rating of a mix of the voice tag and noise received at the microphone; and when the intelligibility rating is below a threshold intelligibility rating, enhance speech received at the microphone based on the intelligibility rating prior to transmitting, via the transmitter, a signal representing the intelligibility-enhanced speech.
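The controller's selection and thresholding steps could be sketched as below; the nearest-noise-level tag selection, the 0.7 threshold, and the tag data are illustrative assumptions:

```python
def select_voice_tag(voice_tags, noise_level_db):
    """Pick the stored voice tag recorded at the noise level closest to the
    noise level currently measured at the microphone."""
    return min(voice_tags,
               key=lambda tag: abs(tag["noise_level_db"] - noise_level_db))

def should_enhance(intelligibility_rating, threshold=0.7):
    # Enhance only when the tag+noise mix falls below the threshold rating.
    return intelligibility_rating < threshold

tags = [  # hypothetical pre-recorded voice tags
    {"id": "quiet", "noise_level_db": 30},
    {"id": "loud", "noise_level_db": 80},
]
print(select_voice_tag(tags, 75)["id"])  # → loud
print(should_enhance(0.55))              # → True
```

Using a pre-recorded tag matched to the current noise level gives the controller a known reference signal for rating intelligibility before it decides to enhance live speech.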