G10L17/04

SYSTEM AND METHOD FOR AUGMENTED AUTHENTICATION USING ACOUSTIC DEVICES
20230216845 · 2023-07-06

Systems, methods, and computer program products are provided for augmented authentication using acoustic devices. The method includes receiving a transfer request including an NFT identifier from a given acoustic device of one or more acoustic devices. The NFT identifier corresponds to an acoustic device NFT associated with the given acoustic device and a device user. The method includes comparing the NFT identifier with one or more stored NFT identifiers to determine the given acoustic device associated with the NFT identifier. The method further includes confirming that the identity of the voice command user matches the device user associated with the given acoustic device. The method still further includes causing authentication of the transfer request upon confirming that the given acoustic device is associated with the voice command user.
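
The claimed flow — look up the NFT identifier among stored identifiers, then confirm the voice-command user matches the enrolled device user — can be sketched as follows. This is an illustrative reduction, not the patent's implementation; the registry structure and all names are assumptions.

```python
def authenticate_transfer(nft_id, voice_user, registry):
    """Sketch of the claimed authentication steps.

    registry maps stored NFT identifiers to (device, enrolled_user) pairs.
    """
    entry = registry.get(nft_id)
    if entry is None:
        return False  # no stored NFT identifier matches the request
    device, enrolled_user = entry
    # Confirm the voice command user's identity matches the device user.
    if voice_user != enrolled_user:
        return False
    return True  # cause authentication of the transfer request

registry = {"nft-123": ("earbud-A", "alice")}
authenticate_transfer("nft-123", "alice", registry)  # → True
authenticate_transfer("nft-123", "bob", registry)    # → False
```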

SYSTEM AND METHOD FOR REAL-TIME FRAUD DETECTION IN VOICE BIOMETRIC SYSTEMS USING PHONEMES IN FRAUDSTER VOICE PRINTS
20230214850 · 2023-07-06

A system and method for real-time fraud detection with a social engineering phoneme (SEP) watchlist of phoneme sequences may perform real-time fraud prevention operations including receiving incoming call interactions and grouping the call interactions into one or more clusters, each cluster associated with a speaker's voice based on voiceprints. For a pair of voiceprints in a cluster, a phoneme sequence is extracted for each voiceprint. From the extracted phoneme sequences, a similarity score is then calculated to determine whether a match exists between the extracted phoneme sequences based on a threshold. If a match is determined to exist, the phoneme sequence may be added to a SEP watchlist.
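
A minimal sketch of the pairwise-comparison step, assuming phoneme sequences are lists of phoneme symbols and using a generic edit-based similarity in place of whatever scoring the patent actually claims:

```python
from difflib import SequenceMatcher

def phoneme_similarity(seq_a, seq_b):
    # Similarity ratio (0..1) between two phoneme sequences.
    return SequenceMatcher(None, seq_a, seq_b).ratio()

def update_sep_watchlist(cluster_phoneme_seqs, watchlist, threshold=0.8):
    # Compare each pair of phoneme sequences within a cluster; when a pair
    # scores at or above the threshold, add the sequence to the SEP watchlist.
    for i in range(len(cluster_phoneme_seqs)):
        for j in range(i + 1, len(cluster_phoneme_seqs)):
            a, b = cluster_phoneme_seqs[i], cluster_phoneme_seqs[j]
            if phoneme_similarity(a, b) >= threshold:
                watchlist.append(a)
    return watchlist
```

The threshold value and the choice of `SequenceMatcher` are placeholders; a production system would use a similarity measure matched to the phoneme recognizer's error characteristics.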

Methods and systems for processing audio signals containing speech data
11694693 · 2023-07-04

Methods and systems for processing audio signals containing speech data are disclosed. Biometric data associated with at least one speaker are extracted from an audio input. A correspondence is determined between the extracted biometric data and stored biometric data associated with a consenting user profile, where a consenting user profile is a user profile that indicates consent to store biometric data. If no correspondence is determined, the speech data is discarded, optionally after having been processed.
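
The consent gate can be illustrated with a simple embedding comparison. This sketch assumes biometric data is a fixed-length vector and uses cosine similarity with an arbitrary threshold; both are assumptions, not the patent's claimed method.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retain_speech(embedding, consenting_profiles, threshold=0.7):
    # Retain speech data only if the extracted biometric embedding
    # corresponds to a stored, consenting user profile.
    for profile_emb in consenting_profiles:
        if cosine(embedding, profile_emb) >= threshold:
            return True   # correspondence found: keep the speech data
    return False          # no correspondence: discard the speech data
```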
Wakeword detection

Techniques for processing incoming audio using multiple wakeword detectors are described. Audio data representing an utterance may be processed by different wakeword detectors that can detect different wakewords and are associated with different speech processing components. When a wakeword is detected by one of the wakeword detectors, it may be processed by the corresponding speech processing component.
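
The routing described above — each wakeword detector paired with its own speech processing component — reduces to a dispatch loop. Detector and component functions here are stand-ins for whatever models the system actually runs:

```python
def route_utterance(audio, detectors):
    """Dispatch audio to the component whose wakeword detector fires.

    detectors: list of (detect_fn, component_fn) pairs, one per wakeword.
    """
    for detect, component in detectors:
        if detect(audio):
            return component(audio)
    return None  # no wakeword detected; drop the audio

# Toy detectors keyed on a text transcript rather than raw audio.
detectors = [
    (lambda a: a.startswith("alexa"), lambda a: "assistant-A handles: " + a),
    (lambda a: a.startswith("hey"),   lambda a: "assistant-B handles: " + a),
]
```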

Method and apparatus for implementing speaker identification neural network

A method and apparatus for generating a speaker identification neural network include generating a first neural network that is trained to identify a first speaker with respect to a first voice signal in a first environment, generating a second neural network for identifying a second speaker with respect to a second voice signal in a second environment, and generating the speaker identification neural network by training the second neural network based on a teacher-student training model in which the first neural network is set to a teacher neural network and the second neural network is set to a student neural network.
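
The teacher-student step can be sketched in NumPy with a frozen linear "teacher" and a linear "student" trained to reproduce the teacher's outputs on second-environment features. The shapes, learning rate, and MSE objective are illustrative assumptions; the patent's networks are of course not linear.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))          # second-environment voice features
W_teacher = rng.normal(size=(8, 4))   # frozen teacher (trained in env 1)
W_student = np.zeros((8, 4))          # student to be trained for env 2

lr = 0.1
for _ in range(500):
    teacher_out = X @ W_teacher                         # soft targets
    student_out = X @ W_student
    grad = X.T @ (student_out - teacher_out) / len(X)   # d(MSE)/dW
    W_student -= lr * grad

# After training, the student's outputs approximate the teacher's on
# the second-environment data.
```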

SPEAKER EMBEDDING CONVERSION FOR BACKWARD AND CROSS-CHANNEL COMPATIBILITY
20230005486 · 2023-01-05

Embodiments include a computer executing voice biometric machine-learning for speaker recognition. The machine-learning architecture includes embedding extractors that extract embeddings for enrollment or for verifying inbound speakers, and embedding convertors that convert enrollment voiceprints from a first type of embedding to a second type of embedding. The embedding convertor maps the feature vector space of the first type of embedding to the feature vector space of the second type of embedding. The embedding convertor takes as input enrollment embeddings of the first type of embedding and generates as output converted enrolled embeddings that are aggregated into a converted enrolled voiceprint of the second type of embedding. To verify an inbound speaker, a second embedding extractor generates an inbound voiceprint of the second type of embedding, and scoring layers determine a similarity between the inbound voiceprint and the converted enrolled voiceprint, both of which are the second type of embedding.

Speaker diarization using an end-to-end model
11545157 · 2023-01-03

Techniques are described for training and/or utilizing an end-to-end speaker diarization model. In various implementations, the model is a recurrent neural network (RNN) model, such as an RNN model that includes at least one memory layer, such as a long short-term memory (LSTM) layer. Audio features of audio data can be applied as input to an end-to-end speaker diarization model trained according to implementations disclosed herein, and the model utilized to process the audio features to generate, as direct output over the model, speaker diarization results. Further, the end-to-end speaker diarization model can be a sequence-to-sequence model, where the sequence can have variable length. Accordingly, the model can be utilized to generate speaker diarization results for any of various length audio segments.
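
The end-to-end interface — variable-length frame features in, per-frame speaker decisions out as direct model output — can be sketched with a toy per-frame classifier standing in for the RNN/LSTM (the linear layer, shapes, and threshold are assumptions):

```python
import numpy as np

def diarize(frames, W, b, threshold=0.5):
    """frames: (T, D) audio features, any length T.

    Returns a (T,) array of per-frame speaker-activity decisions,
    produced directly as model output.
    """
    logits = frames @ W + b                  # toy stand-in for the LSTM
    probs = 1.0 / (1.0 + np.exp(-logits))    # sigmoid: P(speaker active)
    return (probs >= threshold).astype(int)

rng = np.random.default_rng(2)
W, b = rng.normal(size=4), 0.0
short = diarize(rng.normal(size=(10, 4)), W, b)    # 10-frame segment
long = diarize(rng.normal(size=(1000, 4)), W, b)   # 1000-frame segment
```

Because the model is applied frame-by-frame over the sequence, the same weights handle audio segments of any length, mirroring the variable-length sequence-to-sequence property described above.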