G10L17/10

CROSS-LINGUAL SPEAKER RECOGNITION

Disclosed are systems and methods including computing-processes executing machine-learning architectures for voice biometrics, in which the machine-learning architecture implements one or more language compensation functions. Embodiments include an embedding extraction engine (sometimes referred to as an “embedding extractor”) that extracts speaker embeddings and determines a speaker similarity score for determining or verifying the likelihood that speakers in different audio signals are the same speaker. The machine-learning architecture further includes a multi-class language classifier that determines a language likelihood score that indicates the likelihood that a particular audio signal includes a spoken language. The features and functions of the machine-learning architecture described herein may implement the various language compensation techniques to provide more accurate speaker recognition results, regardless of the language spoken by the speaker.
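
A minimal Python sketch of the scoring flow the abstract describes: a similarity score between two speaker embeddings, a per-language likelihood from a multi-class classifier, and a language compensation step. The cosine metric, softmax, and the penalty term are illustrative assumptions; the patent does not disclose the specific compensation function.

```python
import numpy as np

def cosine_similarity(emb_a, emb_b):
    # Speaker similarity score between two extracted speaker embeddings.
    a, b = np.asarray(emb_a, dtype=float), np.asarray(emb_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def language_likelihoods(logits):
    # Multi-class language classifier output -> per-language likelihood scores.
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())  # numerically stable softmax
    return e / e.sum()

def compensated_score(raw_score, lang_probs_a, lang_probs_b, penalty=0.1):
    # Illustrative language compensation: reduce the similarity score when
    # the two utterances are unlikely to be in the same language.
    same_lang_prob = float(np.dot(lang_probs_a, lang_probs_b))
    return raw_score - penalty * (1.0 - same_lang_prob)
```

When both utterances carry the same language likelihoods, the compensation term vanishes and the raw similarity score is returned unchanged.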

VOICE-ASSISTANT ACTIVATED VIRTUAL CARD REPLACEMENT

A device may receive a command associated with identifying a merchant for a virtual card swap procedure wherein the virtual card swap procedure is to replace a credit card of a user with a virtual card corresponding to the credit card. The device may identify the merchant for the virtual card swap procedure based on the command. The device may obtain the virtual card for the user. The device may determine a virtual card swap procedure template for the merchant. The device may perform the virtual card swap procedure based on the virtual card swap procedure template.
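
The steps above can be sketched in Python as a template lookup: identify the merchant from the command, then execute a per-merchant swap-procedure template with the user's virtual card. All merchant names, step names, and the matching logic are hypothetical stand-ins.

```python
# Hypothetical per-merchant templates for the virtual card swap procedure.
SWAP_TEMPLATES = {
    "example-store": ["open_payment_settings", "remove_card", "add_card", "confirm"],
}

def identify_merchant(command):
    # Identify the merchant for the swap procedure based on the command.
    for merchant in SWAP_TEMPLATES:
        if merchant in command.lower():
            return merchant
    return None

def perform_card_swap(command, virtual_card):
    # Perform the swap procedure using the merchant's template.
    merchant = identify_merchant(command)
    if merchant is None:
        raise ValueError("no known merchant in command")
    return [f"{step}:{virtual_card}" for step in SWAP_TEMPLATES[merchant]]
```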

A COMPUTER IMPLEMENTED METHOD
20230359719 · 2023-11-09 ·

A computer-implemented method of authenticating an identity of a specific user is disclosed. The method comprises the steps of acquiring a first data set representative of a voice of a user over a time interval between a first and second time, and providing the first data set as input to a computing device. The method further comprises acquiring a second data set representative of a visual appearance of at least a portion of the user over the time interval between the first and second time, and providing the second data set as input to the computing device. The method further comprises maintaining temporal synchronization of the first and second data sets over the time interval, comparing the first and second data sets with predetermined data sets relating to the voice and visual appearance of at least a portion of the specific user, generating a confidence level in dependence on a relative correspondence of the first and second data sets with the predetermined data sets, and authenticating the user as the specific user where the confidence level is above a predetermined value.
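
A minimal sketch of the claimed flow: two time-synchronized data sets (voice and appearance) are compared against enrolled reference data, a combined confidence level is derived, and authentication succeeds only above a predetermined threshold. The inverse-distance similarity, equal 0.5 weighting, and 0.8 threshold are illustrative assumptions.

```python
def similarity(sample, reference):
    # Simple inverse-distance similarity in (0, 1]; identical inputs give 1.0.
    dist = sum((s - r) ** 2 for s, r in zip(sample, reference)) ** 0.5
    return 1.0 / (1.0 + dist)

def authenticate(voice, appearance, ref_voice, ref_appearance, threshold=0.8):
    # Both samples are assumed to cover the same synchronized time interval.
    confidence = (0.5 * similarity(voice, ref_voice)
                  + 0.5 * similarity(appearance, ref_appearance))
    return confidence >= threshold, confidence
```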

VEHICLE AND METHOD OF CONTROLLING THE SAME
20230382349 · 2023-11-30 ·

Systems and methods of controlling a vehicle may include performing user authentication through facial recognition of a user who has not yet entered the vehicle, receiving a voice command generated by an utterance of that user, performing voice recognition on the received voice command, and performing a vehicle control corresponding to the voice command as a result of the voice recognition.
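
This control flow can be sketched in Python with hypothetical stand-ins for the recognizers: authenticate the user by face while they are outside the vehicle, then map the recognized utterance to a vehicle control. The face templates, command table, and return codes are all illustrative.

```python
# Hypothetical enrolled face templates and command-to-control mapping.
AUTHORIZED_FACES = {"face-template-42"}
COMMANDS = {"open the trunk": "TRUNK_OPEN", "start the engine": "ENGINE_START"}

def handle_exterior_command(face_template, utterance):
    if face_template not in AUTHORIZED_FACES:   # facial-recognition authentication
        return "AUTH_FAILED"
    control = COMMANDS.get(utterance.lower())   # voice recognition result -> control
    return control if control else "UNKNOWN_COMMAND"
```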

Speech interaction method and apparatus

The present invention discloses a speech interaction method and apparatus, and pertains to the field of speech processing technologies. The method includes: acquiring speech data of a user; performing user attribute recognition on the speech data to obtain a first user attribute recognition result; performing content recognition on the speech data to obtain a content recognition result of the speech data; and performing a corresponding operation according to at least the first user attribute recognition result and the content recognition result, so as to respond to the speech data. In this way, the response to the speech data is determined both by who is speaking (the user attribute) and by what is said (the content).
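
A hypothetical Python sketch of this two-track recognition: a user attribute is estimated separately from the content, and the operation is chosen from both results. The pitch-based age-group heuristic and keyword matching are placeholder recognizers, not the patented ones.

```python
def recognize_attribute(mean_pitch_hz):
    # Placeholder user attribute recognition: age group from mean pitch.
    return "child" if mean_pitch_hz > 250.0 else "adult"

def recognize_content(transcript):
    # Placeholder content recognition on the same speech data.
    return "play_music" if "play" in transcript.lower() else "unknown"

def respond(mean_pitch_hz, transcript):
    attribute = recognize_attribute(mean_pitch_hz)
    content = recognize_content(transcript)
    if content == "play_music":
        # The same content yields a different operation per user attribute.
        return "play_childrens_playlist" if attribute == "child" else "play_default_playlist"
    return "ask_to_repeat"
```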

METHODS AND SYSTEMS FOR PROCESSING AUDIO SIGNALS CONTAINING SPEECH DATA
20220215844 · 2022-07-07 ·

Methods and systems for processing audio signals containing speech data are disclosed. Biometric data associated with at least one speaker are extracted from an audio input. A correspondence is determined between the extracted biometric data and stored biometric data associated with a consenting user profile, where a consenting user profile is a user profile that indicates consent to store biometric data. If no correspondence is determined, the speech data is discarded, optionally after having been processed.
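
A minimal sketch of the described consent check, with trivial placeholders for biometric extraction and matching: speech data is retained only when the extracted biometric data corresponds to a profile that indicates consent, and is discarded otherwise.

```python
# Hypothetical store of biometric data from consenting user profiles.
CONSENTING_PROFILES = {"voiceprint-alice"}

def process_audio(voiceprint, speech_data):
    # Correspondence check against consenting profiles; matching is a
    # placeholder for real biometric comparison.
    if voiceprint in CONSENTING_PROFILES:
        return speech_data   # retained for further processing
    return None              # discarded: no consenting profile matched
```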