Patent classifications
G10L17/18
Systems and methods for determining traits based on voice analysis
Systems and methods are provided herein for determining one or more traits of a speaker based on voice analysis, in order to present a content item to the speaker. In one example, the method receives a voice query and determines whether the voice query matches, within a first confidence threshold, a speaker identification (ID) among a plurality of speaker IDs stored in a speaker profile. In response to determining that the voice query matches the speaker ID within the first confidence threshold, the method bypasses a trait prediction engine and retrieves a trait, among a plurality of traits in the speaker profile, associated with the matched speaker ID. The method further provides a content item based on the retrieved trait.
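The confidence-gated bypass described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the embedding representation, the cosine scoring function, the threshold value, and all names (`resolve_trait`, `speaker_profile`, `predict_trait`) are assumptions introduced here.

```python
# Hypothetical sketch: match a voice query against stored speaker IDs;
# on a confident match, bypass the trait prediction engine and reuse
# the stored trait. All names and the threshold value are illustrative.
import math

FIRST_CONFIDENCE_THRESHOLD = 0.85  # assumed value


def cosine_similarity(a, b):
    """Stand-in match score between two voice embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve_trait(query_embedding, speaker_profile, predict_trait):
    """Return a trait for the speaker behind `query_embedding`.

    `speaker_profile` maps speaker ID -> (signature embedding, stored trait).
    `predict_trait` stands in for the fallback trait prediction engine.
    """
    best_id, best_score = None, -1.0
    for speaker_id, (signature, _trait) in speaker_profile.items():
        score = cosine_similarity(query_embedding, signature)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score >= FIRST_CONFIDENCE_THRESHOLD:
        # Confident match: bypass the trait prediction engine and
        # reuse the trait stored with the matched speaker ID.
        return speaker_profile[best_id][1]
    # No confident match: predict the trait from the voice query itself.
    return predict_trait(query_embedding)
```

The design point is the early exit: trait prediction (typically the expensive model) only runs when no enrolled speaker matches confidently.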
Machine learning for improving quality of voice biometrics
Methods and systems are disclosed herein for improving the quality of audio for use in a biometric. A biometric system may use machine learning to determine whether audio or a portion of the audio should be used as a biometric for a user. A sample of the user's voice may be used to generate a voice signature of the user. Portions of the audio that do not meet a similarity threshold when compared with the voice signature may be removed from the audio. Additionally or alternatively, interfering noises may be detected and removed from the audio to improve the quality of a voice biometric generated from the audio.
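The portion-filtering step above can be sketched as a threshold comparison against the enrolled voice signature. This is a toy sketch under stated assumptions: the similarity measure, the threshold value, and the feature-vector representation of audio portions are all placeholders, not the disclosed machine learning model.

```python
# Hedged sketch: drop audio portions whose similarity to the user's
# voice signature falls below a threshold. The similarity measure and
# threshold are placeholders for the disclosed ML-based comparison.

SIMILARITY_THRESHOLD = 0.6  # assumed value


def mean_abs_diff_similarity(a, b):
    """Toy similarity: 1 minus the mean absolute feature difference."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def clean_for_biometric(portions, voice_signature):
    """Remove portions that do not meet the similarity threshold when
    compared with the user's voice signature, per the abstract above."""
    return [p for p in portions
            if mean_abs_diff_similarity(p, voice_signature) >= SIMILARITY_THRESHOLD]
```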
PROVIDING PROMPTS IN SPEECH RECOGNITION RESULTS IN REAL TIME
The present disclosure provides methods and apparatuses for providing prompts in speech recognition results in real time. A current speech input in an audio stream for a target event may be obtained. A current utterance text corresponding to the current speech input may be identified. A prompt may be generated based at least on the current utterance text, the prompt comprising at least one predicted subsequent utterance text sequence. A speech recognition result for the current speech input may be provided, the speech recognition result comprising the current utterance text and the prompt.
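The result structure described above can be illustrated as follows. The abstract does not specify how the subsequent utterance is predicted, so this sketch substitutes a naive lookup against a prepared script for the target event; the function name, the dictionary shape, and the lookup strategy are all assumptions.

```python
# Illustrative sketch: pair a recognized utterance with a prompt
# containing a predicted subsequent utterance text sequence. The
# prediction here is a naive script lookup, standing in for whatever
# model the disclosure actually uses.

def build_result(current_text, event_script):
    """Return a speech recognition result: the current utterance text
    plus a prompt predicting the next line of the event script."""
    for i, line in enumerate(event_script[:-1]):
        if current_text in line:
            return {"text": current_text, "prompt": event_script[i + 1]}
    # No match in the script: return the recognition result alone.
    return {"text": current_text, "prompt": None}
```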
SYSTEM AND METHOD FOR SPEAKER VERIFICATION
A system for speaker verification is disclosed. An input receiving module receives an input audio-visual segment. An input processing module identifies one or more unlabelled speakers and one or more moments in time associated with each of the one or more unlabelled speakers in the audio-visual segment. An information extraction module extracts audio data representative of speech signals and visual data representative of facial images, respectively. An input transformation module employs a first pre-trained neural network model to transform the audio data of each unlabelled speaker into a speaker speech space, employs a second pre-trained neural network model to transform the visual data of each unlabelled speaker into a speaker face space, and trains a third neural network model to match the audio data and the visual data of each unlabelled speaker with the names of labelled speakers obtained from prestored datasets. A speaker identification module identifies each unlabelled speaker by the corresponding name and estimates a confidence level for the identification of each unlabelled speaker from the audio-visual segment.
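The identification stage can be sketched as below, assuming the two pre-trained models are available as embedding functions. This is a simplification: the third (matching) neural network is replaced by a nearest-neighbour search over labelled references, and the distance-to-confidence mapping is invented for illustration.

```python
# Rough sketch of the identification stage: embed audio into speaker
# speech space and faces into speaker face space, then match each
# unlabelled segment to a labelled name with a confidence estimate.
# The nearest-neighbour match stands in for the trained third network.
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_speakers(segments, audio_model, face_model, labelled_dataset):
    """`labelled_dataset` maps name -> (speech embedding, face embedding).
    Returns a name and confidence level for each unlabelled segment."""
    results = []
    for seg in segments:
        speech_vec = audio_model(seg["audio"])   # speaker speech space
        face_vec = face_model(seg["face"])       # speaker face space
        best_name, best_dist = None, float("inf")
        for name, (ref_speech, ref_face) in labelled_dataset.items():
            dist = euclidean(speech_vec, ref_speech) + euclidean(face_vec, ref_face)
            if dist < best_dist:
                best_name, best_dist = name, dist
        # Map combined distance to a crude confidence in (0, 1].
        confidence = 1.0 / (1.0 + best_dist)
        results.append({"name": best_name, "confidence": confidence})
    return results
```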
Speaker identification
A method of speaker identification comprises receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and, if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled speaker, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker. The second voice biometric process is selected to be more discriminative than the first voice biometric process.
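The two-stage cascade can be sketched as follows. The scorers and thresholds are placeholders: the point is only the control flow, where a cheap, less discriminative first pass gates a more discriminative (and typically more expensive) second pass.

```python
# Hedged sketch of the cascade: the second, more discriminative
# biometric process runs only after the first makes an initial
# positive determination. Thresholds and scorers are illustrative.

def verify_speaker(audio, first_pass, second_pass,
                   first_threshold=0.5, second_threshold=0.8):
    """Return True only if both voice biometric processes accept the
    speech as the enrolled speaker's."""
    if first_pass(audio) < first_threshold:
        return False  # first process rejects; second never runs
    # Initial determination is positive: apply the more
    # discriminative second voice biometric process.
    return second_pass(audio) >= second_threshold
```

The asymmetry in thresholds reflects the usual motivation for such cascades: the first stage is tuned for cheap early rejection, the second for accuracy.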
Method and apparatus for implementing speaker identification neural network
A method and apparatus for generating a speaker identification neural network include generating a first neural network that is trained to identify a first speaker with respect to a first voice signal in a first environment, generating a second neural network for identifying a second speaker with respect to a second voice signal in a second environment, and generating the speaker identification neural network by training the second neural network based on a teacher-student training model in which the first neural network is set to a teacher neural network and the second neural network is set to a student neural network.