Patent classifications
G10L17/02
SPEAKER EMBEDDING CONVERSION FOR BACKWARD AND CROSS-CHANNEL COMPATIBILITY
Embodiments include a computer executing voice biometric machine-learning for speaker recognition. The machine-learning architecture includes embedding extractors that extract embeddings for enrollment or for verifying inbound speakers, and embedding convertors that convert enrollment voiceprints from a first type of embedding to a second type of embedding. The embedding convertor maps the feature vector space of the first type of embedding to the feature vector space of the second type of embedding. The embedding convertor takes as input enrollment embeddings of the first type of embedding and generates as output converted enrolled embeddings that are aggregated into a converted enrolled voiceprint of the second type of embedding. To verify an inbound speaker, a second embedding extractor generates an inbound voiceprint of the second type of embedding, and scoring layers determine a similarity between the inbound voiceprint and the converted enrolled voiceprint, both of which are the second type of embedding.
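The abstract above describes converting enrollment embeddings between two embedding spaces and then scoring an inbound voiceprint against the converted enrolled voiceprint. A minimal Python sketch of that flow, assuming a simple learned linear map as the convertor, mean-pooling as the aggregation, and cosine similarity as the scoring layer; the dimensions, the random weights, and all names are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

DIM_A, DIM_B = 4, 3                      # first / second embedding dimensions (assumed)
W = rng.normal(size=(DIM_B, DIM_A))      # stand-in for a trained embedding convertor

def convert(embedding_a):
    """Map a first-type embedding into the second embedding's vector space."""
    return W @ embedding_a

def voiceprint(embeddings):
    """Aggregate per-utterance embeddings into a single voiceprint."""
    return np.mean(embeddings, axis=0)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Enrollment embeddings of the first type: convert each, then aggregate.
enrolled_a = [rng.normal(size=DIM_A) for _ in range(3)]
converted_voiceprint = voiceprint([convert(e) for e in enrolled_a])

# Inbound voiceprint already of the second type; both sides now live in the
# same space, so a similarity score is meaningful.
inbound = rng.normal(size=DIM_B)
score = cosine_similarity(inbound, converted_voiceprint)
```

The key point the sketch illustrates is that scoring only happens after both voiceprints are in the second embedding space.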
Method and apparatus with registration for speaker recognition
Disclosed is a method and apparatus with registration for speaker recognition. The method includes determining whether an input feature vector corresponding to a voice signal of a speaker meets a candidate similarity criterion with at least one registered data entry included in a registration database; selectively, based on the result of that determination, constructing a candidate list based on the input feature vector; determining whether a candidate input feature vector, among the one or more candidate input feature vectors in the candidate list, meets a registration update similarity criterion with the at least one registered data entry; and selectively, based on the result of that determination, updating the registration database based on the candidate input feature vector.
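The two-stage gating described above (a candidate criterion that admits a vector to the candidate list, then a stricter update criterion that promotes it into the registration database) can be sketched as follows. The thresholds, cosine similarity as the metric, and list-based storage are all assumptions; the patent does not specify them:

```python
import numpy as np

CANDIDATE_SIM = 0.6   # candidate similarity criterion (hypothetical threshold)
UPDATE_SIM = 0.8      # registration update similarity criterion (hypothetical)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def process(input_vec, registered, candidates):
    """Selectively add to the candidate list, then selectively update registration."""
    best = max(cosine(input_vec, r) for r in registered)
    if best >= CANDIDATE_SIM:              # candidate similarity criterion met
        candidates.append(input_vec)
    for cand in list(candidates):          # re-check candidates against the stricter bar
        if max(cosine(cand, r) for r in registered) >= UPDATE_SIM:
            registered.append(cand)        # update the registration database
            candidates.remove(cand)
    return registered, candidates
```

In this sketch a vector similar enough to pass the looser candidate bar is staged, and only promoted to the database once it also clears the tighter update bar.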
MICROPHONE UNIT
A microphone unit includes: an audio data acquisition unit that acquires speech as audio data; an audio data registration unit that registers verification audio data obtained by extracting a feature point from the audio data; an evaluation audio data acquisition unit that acquires speech that is input to a first microphone as evaluation audio data; a verification unit that verifies whether or not a speaker who uttered speech that is based on the evaluation audio data is a speaker who uttered speech that is based on the verification audio data, based on the verification audio data and a feature point extracted from the evaluation audio data; and a verification result output unit that outputs a result of verification performed by the verification unit.
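The register-then-verify flow in this abstract (extract a feature point from registered audio, extract one from evaluation audio, and compare) can be illustrated with a toy sketch. Modeling a "feature point" as a small statistics vector and verification as a distance threshold are my assumptions; the patent does not define either:

```python
import numpy as np

MATCH_THRESHOLD = 0.5   # hypothetical verification threshold

def extract_feature_point(audio_data):
    """Toy feature point: mean and spread of the samples."""
    audio = np.asarray(audio_data, dtype=float)
    return np.array([audio.mean(), audio.std()])

# Registration: store the feature point of the verification audio data.
registered = extract_feature_point([0.1, 0.2, 0.1, 0.2])

def verify(evaluation_audio):
    """Return True when the evaluation speech matches the registered speaker."""
    point = extract_feature_point(evaluation_audio)
    return bool(np.linalg.norm(point - registered) < MATCH_THRESHOLD)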
Processing speech signals in voice-based profiling
This document describes a data processing system for processing a speech signal for voice-based profiling. The data processing system segments the speech signal into a plurality of segments, with each segment representing a portion of the speech signal. For each segment, the data processing system generates a feature vector comprising data indicative of one or more features of the portion of the speech signal represented by that segment and determines whether the feature vector comprises data indicative of one or more features with a threshold amount of confidence. For each of a subset of the generated feature vectors, the system processes data in that feature vector to generate a prediction of a value of a profile parameter and transmits an output responsive to machine executable code that generates a visual representation of the prediction of the value of the profile parameter.
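The pipeline described above (segment the signal, compute a feature vector per segment, keep only vectors meeting a confidence threshold, and predict a profile parameter from the survivors) can be sketched as below. The segment length, the toy features, the confidence proxy, and the linear predictor are all placeholders for whatever the system actually uses:

```python
import numpy as np

def segment(signal, seg_len):
    """Split the signal into fixed-length segments (assumed segmentation)."""
    return [signal[i:i + seg_len] for i in range(0, len(signal), seg_len)]

def feature_vector(seg):
    """Toy features (mean, energy) plus a confidence proxy for the segment."""
    seg = np.asarray(seg, dtype=float)
    return np.array([seg.mean(), (seg ** 2).mean()]), float(np.abs(seg).mean())

def profile(signal, seg_len=4, confidence_threshold=0.1):
    predictions = []
    for seg in segment(signal, seg_len):
        feats, conf = feature_vector(seg)
        if conf >= confidence_threshold:                      # keep confident vectors only
            predictions.append(feats @ np.array([0.5, 2.0]))  # toy profile predictor
    return predictions
```

Only the subset of feature vectors passing the confidence check contributes a prediction, matching the "subset of the generated feature vectors" wording.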
Detection of attachment problem of apparatus being worn by user
Provided is a technique for preventing a false determination due to the attachment condition of an apparatus that transmits and receives an acoustic signal, and for performing accurate personal authentication. A personal authentication device includes: a personal authentication means that authenticates an individual by using first information including at least an acoustic characteristic calculated from an acoustic signal propagating through the head of a user, detected by an apparatus attached to the head of the user for transmitting and receiving the acoustic signal, and a feature amount extracted from the acoustic characteristic; an attachment trouble rule storage means that stores an attachment trouble rule for detecting an attachment problem with the apparatus; and an attachment trouble detection means that detects a problem with the attachment state of the apparatus when the first information satisfies the attachment trouble rule.
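The gating logic above (check a stored attachment trouble rule before trusting the authentication result) can be sketched as follows. The rule's form (a minimum-energy condition) and the energy feature are illustrative assumptions; the patent only says the rule is stored and checked:

```python
# Hypothetical stored rule: signals weaker than this suggest a loose fit.
ATTACHMENT_TROUBLE_RULE = {"min_energy": 0.2}

def acoustic_energy(acoustic_signal):
    """Mean squared amplitude of the received acoustic signal."""
    return sum(x * x for x in acoustic_signal) / len(acoustic_signal)

def attachment_trouble(acoustic_signal):
    """Detect an attachment problem: the first information satisfies the rule."""
    return acoustic_energy(acoustic_signal) < ATTACHMENT_TROUBLE_RULE["min_energy"]

def authenticate(acoustic_signal, verify):
    if attachment_trouble(acoustic_signal):
        return "reattach device"   # skip authentication to avoid a false determination
    return "accepted" if verify(acoustic_signal) else "rejected"
```

The point is that a badly attached device short-circuits authentication entirely rather than producing an unreliable accept/reject decision.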
METHODS FOR IMPROVING THE PERFORMANCE OF NEURAL NETWORKS USED FOR BIOMETRIC AUTHENTICATION
A method of generating a biometric signature of a user for use in authentication using a neural network, the method comprising: receiving (110) a plurality of biometric samples from a user; extracting at least one feature vector using the plurality of biometric samples; using the elements of the at least one feature vector as inputs for a neural network; extracting the corresponding activations from an output layer of the neural network; and generating a biometric signature of the user using the extracted activations, such that a single biometric signature represents multiple biometric samples from the user.
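The claim above (feed each sample's feature vector through a network, collect the output-layer activations, and combine them into one signature) can be sketched as below. The tiny fixed network, tanh activations, and averaging as the combination step are assumptions standing in for the trained model the patent presumes:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 5)), rng.normal(size=(4, 8))  # stand-in for a trained net

def output_activations(feature_vector):
    """Forward pass; return the activations of the output layer."""
    hidden = np.tanh(W1 @ feature_vector)
    return np.tanh(W2 @ hidden)

def biometric_signature(samples):
    """Combine output activations over samples into one signature (here: mean)."""
    return np.mean([output_activations(s) for s in samples], axis=0)

samples = [rng.normal(size=5) for _ in range(3)]   # feature vectors of three samples
signature = biometric_signature(samples)            # one signature for all three
```

Averaging is one simple way to make "a single biometric signature represent multiple biometric samples"; the patent leaves the combination method open.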