G10L17/12

Speaker verification

A method of speaker verification comprises: comparing a test input against a model of a user's speech obtained during a process of enrolling the user; obtaining a first score from that comparison; comparing the test input against a first plurality of models of speech obtained from a first plurality of other speakers respectively; obtaining a plurality of cohort scores from comparing the test input against the first plurality of models of speech; obtaining statistics describing the plurality of cohort scores; modifying said statistics to obtain adjusted statistics; normalising the first score using the adjusted statistics to obtain a normalised score; and using the normalised score for speaker verification.
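The cohort-normalisation step above can be sketched as follows. This is a minimal z-norm-style illustration, not the patented method: the `shift` and `scale` parameters stand in for the patent's unspecified statistics adjustment.

```python
import numpy as np

def normalised_score(test_score, cohort_scores, shift=0.0, scale=1.0):
    """Normalise a verification score against a cohort of other speakers.

    The cohort mean and standard deviation are first adjusted (here by a
    hypothetical shift/scale) before z-norm-style normalisation of the
    test score.
    """
    mu = np.mean(cohort_scores) + shift        # adjusted cohort mean
    sigma = np.std(cohort_scores) * scale      # adjusted cohort std dev
    return (test_score - mu) / sigma

# A score well above the cohort distribution normalises to a high value.
cohort = [0.1, 0.2, 0.15, 0.12, 0.18]
print(normalised_score(0.9, cohort))
```

A decision threshold would then be applied to the normalised score rather than the raw one, making the accept/reject decision less sensitive to channel and session effects.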

AUTHENTICATING A USER

Methods of authenticating a user or speaker are provided. These methods include obtaining an input speech signal and user credentials identifying the user or speaker. The input speech signal includes a single-channel signal or a multi-channel speech signal. The methods further include extracting a speech voiceprint from the input speech signal, and retrieving a reference voiceprint associated with the user credentials. The methods still further include determining a voiceprint correspondence between the speech voiceprint and the reference voiceprint, and authenticating the user or speaker depending on said voiceprint correspondence. The methods yet further include updating the reference voiceprint depending on the speech voiceprint corresponding to the authenticated user or speaker. Computer programs, systems and computing systems are also provided which are suitable for performing said methods of authenticating a user or speaker.
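A minimal sketch of the verify-then-update loop, assuming voiceprints are fixed-length embedding vectors, cosine similarity as the correspondence measure, and an exponential running average as the update rule (the patent does not specify these choices):

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def authenticate_and_update(speech_vp, reference_vp, threshold=0.7, alpha=0.1):
    """Authenticate if the voiceprint correspondence exceeds a threshold;
    on success, blend the new voiceprint into the stored reference so the
    reference tracks gradual changes in the user's voice."""
    score = cosine_similarity(speech_vp, reference_vp)
    authenticated = score >= threshold
    if authenticated:
        reference_vp = ((1 - alpha) * np.asarray(reference_vp, float)
                        + alpha * np.asarray(speech_vp, float))
    return authenticated, reference_vp
```

Updating only on successful authentication avoids poisoning the reference with impostor speech, at the cost of never adapting to a genuine user who is repeatedly rejected.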

COMPUTER-IMPLEMENTED VOICE COMMAND AUTHENTICATION METHOD AND ELECTRONIC DEVICE

A computer-implemented voice command authentication method is provided. The method includes obtaining a sound signal stream; calculating a Signal-to-Noise Ratio (SNR) value of the sound signal stream; converting the sound signal stream into a Mel-Frequency Cepstral Coefficients (MFCC) stream; calculating a Dynamic Time Warping (DTW) distance corresponding to the MFCC stream according to the MFCC stream and one of a plurality of sample streams generated by the Gaussian Mixture Model with Universal Background Model (GMM-UBM); calculating, according to the MFCC stream and the sample streams, a log-likelihood ratio value corresponding to the MFCC stream as a GMM-UBM score; determining whether the sound signal stream passes a voice command authentication according to the GMM-UBM score, the DTW distance and the SNR value; and, in response to determining that the sound signal stream passes the voice command authentication, determining that the sound signal stream is a voice stream spoken by a legitimate user.
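The final decision step combines three quantities. A hedged sketch, with hypothetical thresholds (the patent does not disclose how the three values are combined, so a simple conjunction of per-quantity thresholds is assumed):

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in decibels from a signal estimate and a noise estimate."""
    return 10.0 * np.log10(np.mean(np.square(signal))
                           / np.mean(np.square(noise)))

def passes_authentication(gmm_ubm_score, dtw_distance, snr_value,
                          score_min=0.0, dtw_max=50.0, snr_min=10.0):
    """Accept only if the log-likelihood ratio is high enough, the DTW
    distance to the enrolled sample stream is small enough, and the
    recording is clean enough (SNR in dB above a floor)."""
    return (gmm_ubm_score >= score_min
            and dtw_distance <= dtw_max
            and snr_value >= snr_min)
```

Gating on SNR first is a common practical choice: a low-SNR utterance makes both the DTW distance and the GMM-UBM score unreliable, so rejecting it outright is cheaper than scoring it.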

Automated content feedback generation system for non-native spontaneous speech

An electronic audio file is received that comprises spontaneous speech responsive to a prompt in a non-native language of a speaker. Thereafter, the electronic audio file is parsed into a plurality of spoken words. The spoken words are then normalized to remove stop words and disfluencies. At least one trained content scoring model is then used to determine an absence of pre-defined key points associated with the prompt in the normalized spoken words. A list of the determined absent key points can be generated. This list can then be displayed, or caused to be displayed, in a graphical user interface along with feedback to improve content completeness. Related apparatus, systems, techniques and articles are also described.
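The normalize-then-check pipeline can be sketched as below. The stop-word and disfluency sets are illustrative, and simple keyword matching stands in for the patent's trained content scoring model:

```python
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and"}
DISFLUENCIES = {"um", "uh", "er", "like", "you know"}

def normalise(spoken_words):
    """Lowercase the transcript and drop stop words and disfluencies."""
    return [w.lower() for w in spoken_words
            if w.lower() not in STOP_WORDS | DISFLUENCIES]

def absent_key_points(spoken_words, key_points):
    """Return the pre-defined key points (here, required keywords) that
    never appear in the normalised transcript."""
    tokens = set(normalise(spoken_words))
    return [kp for kp in key_points if kp.lower() not in tokens]
```

The returned list is exactly what the abstract's graphical user interface would display, alongside feedback prompting the speaker to cover the missing points.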

Speaker identification

A method of speaker identification comprises receiving a speech signal and dividing the speech signal into segments. Following each segment, a plurality of features are extracted from a most recently received segment, and scoring information is derived from the extracted features of the most recently received segment. The scoring information derived from the extracted features of the most recently received segment is combined with previously stored scoring information derived from the extracted features of any previously received segment. The new combined scoring information is stored, and an identification score is calculated using the combined scoring information.
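The segment-by-segment accumulation can be sketched as a class that keeps sufficient statistics, so each new segment updates the identification score without reprocessing earlier audio. The per-frame match scores are a stand-in for whatever scoring information the features yield:

```python
import numpy as np

class IncrementalScorer:
    """Accumulate per-segment scoring information (here, the sum and
    count of per-frame match scores) and combine it with the stored
    statistics from all previously received segments."""

    def __init__(self):
        self.total = 0.0   # combined scoring information so far
        self.count = 0

    def add_segment(self, frame_scores):
        """Combine the newest segment's scoring information with the
        stored statistics, then store the result."""
        self.total += float(np.sum(frame_scores))
        self.count += len(frame_scores)

    def identification_score(self):
        """Identification score from the combined scoring information."""
        return self.total / self.count if self.count else 0.0
```

Because only the running sum and count are stored, memory use is constant in the number of segments, which is the practical point of combining scoring information incrementally.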

Reverberation compensation for far-field speaker recognition
11862176 · 2024-01-02

Techniques are provided for reverberation compensation for far-field speaker recognition. A methodology implementing the techniques according to an embodiment includes receiving an authentication audio signal associated with speech of a user and extracting features from the authentication audio signal. The method also includes scoring results of application of one or more speaker models to the extracted features. Each of the speaker models is trained based on a training audio signal processed by a reverberation simulator to simulate selected far-field environmental effects to be associated with that speaker model. The method further includes selecting one of the speaker models, based on the score, and mapping the selected speaker model to a known speaker identification or label that is associated with the user.
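The score-and-select step can be sketched as follows. Each speaker model is represented here by a mean feature vector and scored by negative Euclidean distance; this is an illustrative simplification of whatever model family the embodiment uses, with the reverberation-matched training assumed to have happened upstream:

```python
import numpy as np

def identify_speaker(features, speaker_models, labels):
    """Score each (reverberation-matched) speaker model against the
    extracted features, select the best-scoring model, and map it to
    its known speaker label. Higher score = better match."""
    feats = np.asarray(features, float)
    scores = [-float(np.linalg.norm(feats - np.asarray(m, float)))
              for m in speaker_models]
    best = int(np.argmax(scores))
    return labels[best], scores[best]
```

Because each model was trained on audio passed through a reverberation simulator, the model whose simulated environment best matches the actual far-field conditions tends to win the selection, which is the abstract's compensation mechanism.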

Ear-worn electronic device incorporating annoyance model driven selective active noise control

A system comprises an ear-worn electronic device configured to be worn by a wearer. The ear-worn electronic device comprises a processor and memory coupled to the processor. The memory is configured to store an annoying sound dictionary representative of a plurality of annoying sounds pre-identified by the wearer. A microphone is coupled to the processor and configured to monitor an acoustic environment of the wearer. A speaker or a receiver is coupled to the processor. The processor is configured to identify different background noises present in the acoustic environment, determine which of the background noises correspond to one or more of the plurality of annoying sounds, and attenuate the one or more annoying sounds in an output signal provided to the speaker or receiver.
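The attenuation step can be sketched as a lookup against the annoying-sound dictionary. The identified noise labels and per-source gains are assumed inputs from an upstream classifier; the attenuation factor is hypothetical:

```python
def attenuate_annoying(noise_labels, noise_gains, annoying_dictionary,
                       attenuation=0.1):
    """For each background noise identified in the acoustic environment,
    reduce its gain in the output mix if its label matches an entry in
    the wearer's annoying-sound dictionary; leave other sounds intact."""
    return {label: (gain * attenuation if label in annoying_dictionary
                    else gain)
            for label, gain in zip(noise_labels, noise_gains)}
```

Keeping non-annoying background sounds at full gain is the selective part of the control: unlike blanket active noise cancellation, only the wearer's pre-identified annoyances are suppressed.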