Patent classifications
G10L17/10
METHODS AND SYSTEMS FOR PROCESSING AUDIO SIGNALS CONTAINING SPEECH DATA
Methods and systems for processing audio signals containing speech data are disclosed. Biometric data associated with at least one speaker are extracted from an audio input. A correspondence is determined between the extracted biometric data and stored biometric data associated with a consenting user profile, where a consenting user profile is a user profile that indicates consent to store biometric data. If no correspondence is determined, the speech data is discarded, optionally after having been processed.
VOICEPRINT RECOGNITION METHOD, APPARATUS AND DEVICE, AND STORAGE MEDIUM
A voiceprint recognition method includes: obtaining a target speech information set to be recognized that includes speech information corresponding to at least one object; extracting target feature information from the target speech information set by using a preset algorithm, and optimizing the target feature information based on a first loss function to obtain a first voiceprint recognition result; obtaining target speech channel information of a target speech channel, where the target speech channel information includes channel noise information, and the target speech channel is used to transmit the target speech information set; extracting target feature vectors in the channel noise information, and optimizing the target feature vectors based on a second loss function to obtain a second voiceprint recognition result; and fusing the first voiceprint recognition result and the second voiceprint recognition result to determine a final voiceprint recognition result.
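The final fusing step can be illustrated with a minimal sketch. The abstract does not say how the two results are combined, so the weighted sum below, the per-speaker score dictionaries, and the names `fuse_scores` and `speech_weight` are all assumptions made for illustration only.

```python
def fuse_scores(speech_scores, channel_scores, speech_weight=0.7):
    """Fuse two per-speaker score dictionaries with a weighted sum.

    Assumption: each recognition result is a mapping from a speaker
    label to a similarity score. The weighted sum is a hypothetical
    fusion rule, not one stated in the abstract.
    """
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must be in [0, 1]")
    # Only speakers scored by both branches can be fused.
    speakers = speech_scores.keys() & channel_scores.keys()
    if not speakers:
        raise ValueError("no speaker is present in both results")
    fused = {
        s: speech_weight * speech_scores[s]
        + (1.0 - speech_weight) * channel_scores[s]
        for s in speakers
    }
    # The final result is the speaker with the highest fused score.
    best = max(fused, key=fused.get)
    return best, fused
```

Calling `fuse_scores(first_result, second_result)` returns the top-scoring speaker label together with the fused scores, so a caller can apply its own acceptance threshold afterwards.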
AUTOMATIC GENERATION AND/OR USE OF TEXT-DEPENDENT SPEAKER VERIFICATION FEATURES
Implementations relate to automatic generation of speaker features for each of one or more particular text-dependent speaker verifications (TD-SVs) for a user. Implementations can generate speaker features for a particular TD-SV using instances of audio data that each capture a corresponding spoken utterance of the user during normal non-enrollment interactions with an automated assistant via one or more respective assistant devices. For example, a portion of an instance of audio data can be used in response to: (a) determining that recognized term(s) for the spoken utterance captured by that portion correspond to the particular TD-SV; and (b) determining that an authentication measure, for the user and for the spoken utterance, satisfies a threshold. Implementations additionally or alternatively relate to utilization of speaker features, for each of one or more particular TD-SVs for a user, in determining whether to authenticate a spoken utterance for the user.
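Conditions (a) and (b) can be sketched as a simple gate in front of a per-phrase sample store. This is an illustrative sketch only: it assumes the relevant audio portion has already been isolated and turned into an embedding, and the names `maybe_add_sample`, `speaker_features`, and `auth_threshold`, as well as averaging the embeddings into features, are hypothetical choices not taken from the abstract.

```python
def maybe_add_sample(store, phrase, recognized_text, embedding,
                     auth_measure, auth_threshold=0.9):
    """Keep an utterance embedding only if both conditions hold.

    (a) the recognized terms correspond to the TD-SV phrase, and
    (b) the authentication measure satisfies the threshold.
    Returns True when the sample was stored.
    """
    if recognized_text.strip().lower() != phrase.strip().lower():
        return False  # condition (a) failed
    if auth_measure < auth_threshold:
        return False  # condition (b) failed
    store.setdefault(phrase, []).append(list(embedding))
    return True


def speaker_features(store, phrase):
    """Average stored embeddings for a phrase (an assumed aggregation)."""
    samples = store[phrase]
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
```

Samples accumulate during ordinary interactions, so no separate enrollment call appears in the sketch; the stored features could later be compared against a new utterance of the same phrase.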
AUTHENTICATING RECEIVED SPEECH
A speech signal is received by a device comprising first and second transducers, and the first transducer comprises a microphone. A method comprises performing a first voice biometric process on speech contained in a first part of a signal received by the microphone, in order to determine whether the speech is the speech of an enrolled user. A first correlation is determined, between said first part of the signal received by the microphone and a corresponding part of the signal received by the second transducer. A second correlation is determined, between a second part of the signal received by the microphone and the corresponding part of the signal received by the second transducer. It is then determined whether the first correlation and the second correlation satisfy a predetermined condition. If it is determined that the speech contained in the first part of the received signal is the speech of an enrolled user and that the first correlation and the second correlation satisfy the predetermined condition, the received speech signal is authenticated.
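The decision logic can be sketched as follows. The abstract does not define the correlation measure or the predetermined condition, so this sketch assumes a Pearson correlation and a condition that both correlations reach a hypothetical minimum `min_corr`; the voice biometric result is passed in as a boolean.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sample sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(vx * vy)


def authenticate(is_enrolled_user, mic_part1, other_part1,
                 mic_part2, other_part2, min_corr=0.5):
    """Authenticate only if the biometric check and both correlations pass.

    Assumption: the "predetermined condition" is that both correlations
    are at least min_corr. The abstract leaves the condition open.
    """
    first_corr = pearson(mic_part1, other_part1)
    second_corr = pearson(mic_part2, other_part2)
    condition_met = first_corr >= min_corr and second_corr >= min_corr
    return is_enrolled_user and condition_met
```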
COLLABORATION APPLICATION INTEGRATION FOR USER-IDENTITY VERIFICATION
Disclosed are methods, systems, and non-transitory computer-readable media for utilizing a collaboration application to provide data beneficial to the authentication of the user. The present application discloses receiving at least one item of personal identifying information for a user from a primary multi-factor authentication device. The present application further discloses receiving at least one item of personal identifying information for a user from a conferencing service in which the user is engaged in a conference. The present application also discloses determining whether to authenticate the user based on the items of personal identifying information from the primary multi-factor authentication device and from the conferencing service.
SPEAKER IDENTIFICATION
A method of speaker identification comprises receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and, if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker. The second voice biometric process is selected to be more discriminative than the first voice biometric process.
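The two-stage arrangement can be sketched as a cascade in which the more discriminative process runs only after the first process makes a positive initial determination. The scoring callables, the thresholds, and the name `identify_speaker` are hypothetical; the abstract does not specify how either process scores the audio.

```python
def identify_speaker(audio, first_process, second_process,
                     first_threshold=0.5, second_threshold=0.8):
    """Run a two-stage voice biometric cascade.

    first_process and second_process are assumed to be callables that
    return a similarity score for the enrolled speaker. The second one
    stands in for the more discriminative process and is invoked only
    when the first makes a positive initial determination.
    """
    if first_process(audio) < first_threshold:
        return False  # rejected early; the second process never runs
    return second_process(audio) >= second_threshold
```

One plausible motivation for such a cascade is that the second process is skipped for most non-matching audio, though the abstract itself does not state a reason.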
Methods and systems for processing audio signals containing speech data
Methods and systems for processing audio signals containing speech data are disclosed. Biometric data associated with at least one speaker are extracted from an audio input. A match is determined between the extracted biometric data and stored biometric data associated with a consenting user profile, where a consenting user profile is a user profile associated with a record indicating consent to store biometric data. If a match is determined to exist with such a profile, the speech data is stored in an archive after processing. If no such match is determined, or if the extracted biometric data includes data from a speaker not having a consenting user profile, the speech data is discarded, optionally after having been processed. The system and method provide a safeguard against transferring to storage data of users, particularly minors or children, for whom a verified and valid consent has not been obtained from an authorised adult.
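The store-or-discard rule can be sketched as below. The sketch assumes that biometric data are fixed-length voiceprint vectors compared by cosine similarity against a hypothetical `threshold`; the `UserProfile` type, the field names, and the list-based archive are illustrative assumptions, not details from the abstract.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UserProfile:
    voiceprint: tuple          # stored biometric data (assumed vector form)
    consents_to_storage: bool  # record indicating consent


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def handle_speech(speech_data, extracted_voiceprints, profiles, archive,
                  threshold=0.8):
    """Archive speech only if every speaker matches a consenting profile.

    Returns True if the data was stored and False if it was discarded.
    """
    consenting = [p for p in profiles if p.consents_to_storage]
    all_consenting = bool(extracted_voiceprints) and all(
        any(cosine_similarity(v, p.voiceprint) >= threshold for p in consenting)
        for v in extracted_voiceprints
    )
    if all_consenting:
        archive.append(speech_data)
        return True
    return False  # discarded: at least one speaker lacks a consenting profile
```

Requiring every extracted voiceprint to match reflects the second discard condition in the abstract, where audio that includes any speaker without a consenting profile is not stored.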