G10L17/08

END-TO-END SPEAKER RECOGNITION USING DEEP NEURAL NETWORK
20230037232 · 2023-02-02

The present invention is directed to a deep neural network (DNN) having a triplet network architecture, which is suitable to perform speaker recognition. In particular, the DNN includes three feed-forward neural networks, which are trained according to a batch process utilizing a cohort set of negative training samples. After each batch of training samples is processed, the DNN may be trained according to a loss function, e.g., utilizing a cosine measure of similarity between respective samples, along with positive and negative margins, to provide a robust representation of voiceprints.
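The abstract does not disclose the exact loss formulation, but a cosine-similarity triplet loss with positive and negative margins can be sketched as follows (function names and margin values are illustrative assumptions, not the patent's formulation):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_loss(anchor, positive, negative, pos_margin=0.9, neg_margin=0.3):
    """Hinge-style triplet loss on cosine similarity: push
    sim(anchor, positive) above pos_margin and push
    sim(anchor, negative) below neg_margin."""
    pos_term = max(0.0, pos_margin - cosine(anchor, positive))
    neg_term = max(0.0, cosine(anchor, negative) - neg_margin)
    return pos_term + neg_term
```

With an orthogonal negative the loss vanishes; an identical negative is penalized by its similarity above the negative margin.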

PROFILES FOR ENHANCED SPEECH RECOGNITION TRAINING
20220351732 · 2022-11-03

In a method for improving speech analysis between devices, a processor receives a speech input comprising audio from a speech recognition platform. A processor segments the speech input into input vectors. A processor maps the input vectors to a profile. A processor calculates affinity coefficients between each input vector and the profile. A processor aggregates the input vectors and affinity coefficients in a user profile. A processor implements the user profile in a speech recognition program.
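As an illustration only (the patent does not define the affinity coefficient), one plausible reading treats it as the cosine similarity between each input vector and the profile's mean vector, with sufficiently similar vectors aggregated into the profile:

```python
import numpy as np

def affinity(vec, profile_vectors):
    """Affinity coefficient sketch: cosine similarity between an
    input vector and the profile's mean vector (illustrative choice)."""
    p = profile_vectors.mean(axis=0)
    return float(np.dot(vec, p) / (np.linalg.norm(vec) * np.linalg.norm(p)))

def update_profile(profile_vectors, new_vecs, threshold=0.5):
    """Aggregate only those new input vectors whose affinity to the
    existing profile exceeds the threshold."""
    kept = [v for v in new_vecs if affinity(v, profile_vectors) >= threshold]
    if kept:
        profile_vectors = np.vstack([profile_vectors, kept])
    return profile_vectors
```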

User-customized AI speaker-based personalized service system using voiceprint recognition
11488595 · 2022-11-01

Disclosed is a user-customized artificial intelligence (AI) speaker-based personalized service system using voiceprint recognition. The system is used by a small group of users. The system includes a voice recognition device that identifies each user through voice recognition and enables a voice instruction of each user to be executed, and a data processing device interconnected with the voice recognition device. The voice recognition device includes a storage unit that stores speech samples of the respective registered users, a receiver that receives a first utterance of a first utterer, a determination unit that determines whether the first utterer is a registered user by comparing the first utterance against the speech samples of the respective registered users stored in the storage unit, and an execution unit that generates an instruction signal corresponding to a first instruction phrase uttered as a first voice instruction by the first utterer.
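A minimal sketch of the determination unit's comparison step, assuming voiceprints are embedding vectors and cosine similarity is the comparison metric (both assumptions; the patent does not specify either):

```python
import numpy as np

def identify_user(utterance_vec, registered, threshold=0.75):
    """Return the registered user whose stored voiceprint vector is
    most similar to the utterance embedding, or None if no match
    clears the threshold (names and threshold are illustrative)."""
    best_user, best_sim = None, -1.0
    for user, print_vec in registered.items():
        sim = float(np.dot(utterance_vec, print_vec) /
                    (np.linalg.norm(utterance_vec) * np.linalg.norm(print_vec)))
        if sim > best_sim:
            best_user, best_sim = user, sim
    return best_user if best_sim >= threshold else None
```

An utterance close to a stored sample resolves to that user; an ambiguous one resolves to no registered user.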

SPEAKER RECOGNITION METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM

The present disclosure provides a speaker recognition method, an electronic device, and a storage medium. An implementation includes: segmenting the target audio file and the to-be-recognized audio file into a plurality of audio units respectively; extracting an audio feature from each of the audio units to obtain an audio feature sequence of the target audio file and an audio feature sequence of the to-be-recognized audio file; performing feature learning on the two audio feature sequences by using a Siamese neural network, to obtain a feature vector corresponding to the target audio file and feature vectors respectively corresponding to the plurality of audio units in the to-be-recognized audio file; and recognizing, by using an attention mechanism-based machine learning model, the audio units belonging to the target speaker in the to-be-recognized audio file based on the feature vectors.
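The attention mechanism itself is not specified; a dot-product-attention sketch over the per-unit feature vectors might look like this (all names are illustrative assumptions):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def score_units(target_vec, unit_vecs):
    """Dot-product attention sketch: compare the target speaker's
    feature vector against each audio-unit vector and normalize;
    units with high weight are attributed to the target speaker."""
    logits = unit_vecs @ target_vec
    return softmax(logits)
```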

User identification with voiceprints on online social networks
11475344 · 2022-10-18

In one embodiment, a method includes, by one or more computing devices of an online social network: receiving, from a client system of a first user, a biometric input from a second user that is used to identify the second user; sending, to the client system, a personal identifier for presentation to the second user; receiving, from the client system in response to that presentation, an audio input from the second user; determining, based on a comparison of the audio input to a voiceprint of the second user (the voiceprint comprising audio data for auditory identification of the second user), whether the audio input comprises the personal identifier spoken by the second user; and authenticating the second user to access an online account associated with the second user via the client system if the audio input is determined to be spoken by the second user and to comprise the personal identifier.
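The claimed two-part check, a voiceprint match combined with the presented personal identifier being spoken, can be sketched as a single predicate (the thresholds, names, and the assumption of an ASR transcript are all illustrative):

```python
import numpy as np

def authenticate(audio_vec, transcript, voiceprint_vec, personal_identifier,
                 sim_threshold=0.8):
    """Two-factor sketch: the audio embedding must match the user's
    voiceprint AND the transcript must contain the personal identifier
    that was presented to the user."""
    sim = float(np.dot(audio_vec, voiceprint_vec) /
                (np.linalg.norm(audio_vec) * np.linalg.norm(voiceprint_vec)))
    said_identifier = personal_identifier.lower() in transcript.lower()
    return sim >= sim_threshold and said_identifier
```

Either a non-matching voice or a missing identifier phrase fails the check, mirroring the conjunctive condition in the claim.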

Systems and methods for detecting communication fraud attempts

The present disclosure provides a computer system, method, and computer-readable medium for a computer processor to detect, prevent and counter potentially fraudulent communications by proactively monitoring communications and performing multi-step analysis to detect fraudsters and alert communication recipients. The present disclosure may implement artificial intelligence (AI) algorithms to identify fraudulent communications. The AI model may be trained by real world examples to become more efficient.

ADVERSARIALLY ROBUST VOICE BIOMETRICS, SECURE RECOGNITION, AND IDENTIFICATION
20220328050 · 2022-10-13

Techniques for detecting a fraudulent attempt by an adversarial user to voice-verify as a user are presented. An authenticator component can determine characteristics of voice information received in connection with a user account based on analysis of the voice information. In response to determining that those characteristics sufficiently match the characteristics of a voice print associated with the user account, the authenticator component can determine a similarity score by comparing the characteristics of the voice information against the characteristics of a set of previously stored voice prints associated with the user account. The authenticator component can then determine whether the similarity score exceeds a threshold similarity score, which indicates whether the voice information is a replay of a recording or a deep-fake emulation of the user's voice: a score above the threshold can indicate the voice information is fraudulent, and a score below the threshold can indicate it is valid.
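The threshold inversion described above, where an implausibly exact match to a previously stored print suggests a replayed recording rather than a fresh live utterance, can be sketched as follows (the threshold value is an illustrative assumption):

```python
import numpy as np

def is_replay(new_vec, stored_prints, replay_threshold=0.999):
    """Sketch of the claim's inversion: a near-exact cosine match to
    any previously stored voiceprint suggests a replay or deep fake,
    since live utterances naturally vary between sessions."""
    for p in stored_prints:
        sim = float(np.dot(new_vec, p) /
                    (np.linalg.norm(new_vec) * np.linalg.norm(p)))
        if sim > replay_threshold:
            return True
    return False
```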

Voice Authentication Apparatus Using Watermark Embedding And Method Thereof
20230112622 · 2023-04-13

The present disclosure provides a voice authentication system. The voice authentication system according to an embodiment of the present disclosure includes a voice collection unit configured to collect voice information obtained by digitizing a speaker's voice; a learning model server configured to generate a voice image based on the collected voice information of the speaker, cause a deep neural network (DNN) model to learn the voice image, and extract a feature vector for the voice image; a watermark server configured to generate a watermark based on the feature vector and embed the watermark and individual information into the voice image or voice conversion data; and an authentication server configured to generate a private key based on the feature vector and determine whether to extract the watermark and the individual information based on an authentication result.
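A hedged sketch of deriving a watermark payload and a key seed from the feature vector, using hashing as a stand-in for whatever embedding and key-generation scheme the patent actually uses (all names are illustrative):

```python
import hashlib
import numpy as np

def make_watermark(feature_vec, individual_info):
    """Illustrative sketch: derive a watermark payload by hashing the
    quantized feature vector together with the individual information."""
    quantized = np.round(feature_vec, 3).tobytes()
    return hashlib.sha256(quantized + individual_info.encode()).hexdigest()

def make_private_key(feature_vec):
    """Derive a deterministic key seed from the same feature vector,
    so the authentication server can regenerate it for verification."""
    return hashlib.sha256(np.round(feature_vec, 3).tobytes()).hexdigest()
```

Quantizing before hashing is one way to tolerate small numeric differences in repeated feature extraction; the same vector and information always yield the same watermark.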
