G10L17/08

LIMITING IDENTITY SPACE FOR VOICE BIOMETRIC AUTHENTICATION

Disclosed are systems and methods in which computing processes execute machine-learning architectures that extract vectors representing disparate types of data and output predicted identities of users accessing computing services, without express identity assertions, across multiple computing services, analyzing data from multiple modalities, for various user devices, and agnostic to the architectures hosting the disparate computing services. The system invokes the identification operations of the machine-learning architecture, which extracts biometric embeddings from biometric data and context embeddings representing all or most of the types of metadata features analyzed by the system. The context embeddings help identify a subset of potentially matching identities of possible users, which limits the number of biometric prints the system compares against an inbound biometric embedding for authentication. The extracted features originate from multiple modalities, including metadata from data communications, audio signals, and images. In this way, the embodiments apply a multi-modality machine-learning architecture.
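The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function and field names (`identify`, `enrolled`, `"context"`, `"print"`) and the use of cosine similarity and a `top_k` shortlist are assumptions for the sake of example.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(bio_emb, ctx_emb, enrolled, top_k=3, threshold=0.8):
    # Stage 1: rank enrolled identities by context similarity; keep top_k.
    # This limits the identity space before any biometric comparison.
    shortlist = sorted(
        enrolled,
        key=lambda uid: cosine(ctx_emb, enrolled[uid]["context"]),
        reverse=True,
    )[:top_k]
    # Stage 2: compare the inbound biometric embedding only against the
    # biometric prints of the shortlisted identities.
    best_uid, best_score = None, -1.0
    for uid in shortlist:
        score = cosine(bio_emb, enrolled[uid]["print"])
        if score > best_score:
            best_uid, best_score = uid, score
    # Authenticate only when the best match clears the threshold.
    return (best_uid, best_score) if best_score >= threshold else (None, best_score)
```

The point of the shortlist is cost: only `top_k` biometric comparisons are made per attempt, regardless of how many identities are enrolled.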

DISPLAY APPARATUS, DISPLAY SYSTEM, AND DISPLAY METHOD
20220382964 · 2022-12-01

A display apparatus includes circuitry. The circuitry receives an input of hand drafted data with an input device and converts the hand drafted data into first text data. The circuitry receives an input of first voice data and converts the first voice data into second text data. The circuitry displays, on a display, third text data converted from second voice data when the first text data and the second text data match each other at least in part.
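The display gate can be sketched as below. The abstract leaves "match each other at least in part" open; this sketch assumes word-level overlap, and all names (`partial_match`, `display_decision`) are hypothetical.

```python
def partial_match(first_text: str, second_text: str) -> bool:
    # "Match at least in part" is read here as sharing at least one word;
    # the abstract does not fix the exact matching criterion.
    first_words = set(first_text.lower().split())
    second_words = set(second_text.lower().split())
    return bool(first_words & second_words)

def display_decision(first_text: str, second_text: str, third_text: str):
    # Display the third (voice-derived) text only when the handwriting-derived
    # text and the voice-derived text overlap; otherwise display nothing.
    return third_text if partial_match(first_text, second_text) else None
```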

SYSTEM AND METHOD FOR EXTRACTING AND DISPLAYING SPEAKER INFORMATION IN AN ATC TRANSCRIPTION

A system for extracting speaker information in an ATC transcription and displaying the speaker information on a graphical display unit is provided. The system is configured to: segment a stream of audio received from an ATC and other aircraft into a plurality of chunks; determine, for each chunk, whether the speaker is enrolled in an enrolled speaker database; when the speaker is enrolled in the enrolled speaker database, decode the chunk using a speaker-dependent automatic speech recognition (ASR) model and tag the chunk with a permanent name for the speaker; when the speaker is not enrolled in the enrolled speaker database, assign a temporary name for the speaker, tag the chunk with the temporary name, and decode the chunk using a speaker-independent ASR model; format the decoded chunk as text; and signal the graphical display unit to display the formatted text along with an identity for the speaker.
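The per-chunk routing logic can be sketched as follows. The ASR models are represented by placeholder functions, and all names (`process_chunk`, `enrolled_db`, the `Speaker-N` temporary-name scheme) are assumptions for illustration.

```python
from itertools import count

_temp_names = count(1)

def decode_speaker_dependent(audio: str, speaker_id: str) -> str:
    # Stand-in for a speaker-dependent ASR model adapted to speaker_id.
    return audio

def decode_speaker_independent(audio: str) -> str:
    # Stand-in for a generic, speaker-independent ASR model.
    return audio

def process_chunk(audio: str, speaker_id: str, enrolled_db: dict) -> dict:
    if speaker_id in enrolled_db:
        # Enrolled speaker: permanent name + speaker-dependent decoding.
        name = enrolled_db[speaker_id]
        text = decode_speaker_dependent(audio, speaker_id)
    else:
        # Unknown speaker: temporary name + speaker-independent decoding.
        name = f"Speaker-{next(_temp_names)}"
        text = decode_speaker_independent(audio)
    # The formatted text and speaker identity go to the display unit.
    return {"speaker": name, "text": text}
```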

Method for protecting biometric templates, and a system and method for verifying a speaker's identity

A method for protecting a biometric template, comprising the steps of: retrieving an original vector (V) representing said biometric template, said vector comprising a plurality of original elements (v1, v2, ..., vi, ..., vn); mapping at least some elements from said original vector to a protected vector (P) comprising a plurality of protected elements (p1, p2, ..., pi, ..., pn−m+1), the mapping being based on multivariate polynomials defined by m user-specific coefficients (C) and exponents (E).
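Since the protected vector has n − m + 1 elements, one simple reading is that each protected element is a polynomial over a sliding window of m consecutive original elements. The sketch below assumes that reading; the exact polynomial structure in the patent may differ, and `protect_template` is a hypothetical name.

```python
def protect_template(v, coeffs, exps):
    # Each protected element p_i combines m consecutive original elements
    # via user-specific coefficients and exponents, so an n-element template
    # yields n - m + 1 protected elements.
    m = len(coeffs)
    assert len(exps) == m and len(v) >= m
    return [
        sum(c * (v[i + j] ** e) for j, (c, e) in enumerate(zip(coeffs, exps)))
        for i in range(len(v) - m + 1)
    ]
```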

SPEAKER AUTHENTICATION SYSTEM, METHOD, AND PROGRAM
20220375476 · 2022-11-24

Provided is a speaker authentication system capable of achieving robustness against adversarial examples. A data storage unit 112 stores data related to the voice of a speaker. A plurality of voice processing units 11 each perform speaker authentication based on the input voice and the data stored in the data storage unit 112. A post-processing unit 116 specifies one speaker authentication result based on the speaker authentication results obtained by the plurality of voice processing units 11. The pre-processing method or parameters applied to the voice differ for each voice processing unit 11.
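The ensemble idea can be sketched as below: each pipeline pairs its own pre-processing with a recognizer, and the post-processing step takes a majority vote. This is a minimal sketch; the actual post-processing rule and the pipeline interfaces (`pipelines` as (preprocess, recognize) pairs) are assumptions.

```python
from collections import Counter

def ensemble_authenticate(voice, pipelines):
    # Each pipeline applies its own pre-processing before recognition, so an
    # adversarial perturbation tuned to one pre-processing is unlikely to
    # fool all of them at once.
    results = [recognize(preprocess(voice)) for preprocess, recognize in pipelines]
    # Post-processing: majority vote over the per-pipeline results.
    winner, _votes = Counter(results).most_common(1)[0]
    return winner
```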

Data control system for a data server and a plurality of cellular phones, a data server for the system, and a cellular phone for the system
20230053110 · 2023-02-16

A data control system comprises a user terminal, such as a cellular phone, an assist appliance, or a combination thereof, and a server in communication with the user terminal. The user terminal acquires the name of a person and identification data of the person for storage as a reference upon first meeting the person, and acquires the identification data of the person upon meeting again, announcing the person's name with a visual and/or audio display if the identification data is consistent with the stored reference. The reference is transmitted to the server, which allows another person to receive the reference only on the condition that the same person has given a self-introduction both to the user of the user terminal and to that other person, preserving the person's privacy against unknown persons.
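The server-side privacy condition can be sketched as a simple check. The function and structure names (`may_release_reference`, `introductions` as a person-to-acquaintances map) are hypothetical illustrations of the rule stated in the abstract.

```python
def may_release_reference(person, requester, reference_owner, introductions):
    # The server releases the stored reference for `person` to `requester`
    # only if `person` has self-introduced to both the reference's owner and
    # the requester, keeping the person's identity private from strangers.
    met = introductions.get(person, set())
    return reference_owner in met and requester in met
```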

BIOMETRIC AUTHENTICATION THROUGH VOICE PRINT CATEGORIZATION USING ARTIFICIAL INTELLIGENCE
20220358933 · 2022-11-10 ·

A system is provided to categorize voice prints during voice authentication. The system includes a processor and a computer readable medium operably coupled thereto to perform voice authentication operations. The operations include receiving an enrollment of a user in the biometric authentication system, requesting a first voice print comprising a sample of the user's voice, and receiving the first voice print of the user during the enrollment. The operations further include accessing a plurality of categorizations of the voice prints for voice authentication, wherein each categorization comprises a portion of the voice prints based on a plurality of similarity scores of the distinct voice prints in that portion to a plurality of other voice prints, determining, using a hidden layer of a neural network, one of the plurality of categorizations for the first voice print, and encoding the first voice print with that categorization.
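A simplified stand-in for the categorization step is sketched below: instead of a neural hidden layer, it assigns a new print to the category whose member prints are most similar on average. All names (`categorize_print`, the category labels) and the mean-cosine rule are assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def categorize_print(voice_print, categories):
    # Assign the new print to the category whose member prints are, on
    # average, most similar to it (a stand-in for the hidden-layer
    # categorization in the abstract).
    best_label, best_sim = None, -2.0
    for label, members in categories.items():
        mean_sim = sum(cosine_sim(voice_print, m) for m in members) / len(members)
        if mean_sim > best_sim:
            best_label, best_sim = label, mean_sim
    return best_label
```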

TARGET SPEAKER MODE
20230095526 · 2023-03-30

Methods, systems, and apparatus, including computer programs encoded on computer storage media, relate to a method for target speaker extraction. A target speaker extraction system receives an audio frame of an audio signal. A multi-speaker detection model analyzes the audio frame to determine whether it includes only a single speaker or multiple speakers. When the audio frame includes only a single speaker, the system inputs the audio frame to a target speaker VAD model to suppress speech in the audio frame from a non-target speaker, based on comparing the audio frame to a voiceprint of a target speaker. When the audio frame includes multiple speakers, the system inputs the audio frame to a speech separation model to separate the voice of the target speaker from the voice mixture in the audio frame.
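The frame routing described above can be sketched as follows. The VAD and separation models are placeholder functions operating on toy frame dictionaries; every name here is hypothetical, and only the routing rule comes from the abstract.

```python
def target_speaker_vad(frame, voiceprint):
    # Stand-in: keep the frame if it matches the target voiceprint,
    # otherwise suppress it (return silence).
    if frame.get("speaker") == voiceprint:
        return frame
    return {"speaker": None, "audio": b""}

def speech_separation(frame, voiceprint):
    # Stand-in: pull the target speaker's component out of the mixture.
    return {"speaker": voiceprint, "audio": frame["mixture"].get(voiceprint, b"")}

def route_frame(frame, num_speakers, voiceprint):
    # Single-speaker frames go to the target-speaker VAD; multi-speaker
    # frames go to speech separation.
    if num_speakers <= 1:
        return target_speaker_vad(frame, voiceprint)
    return speech_separation(frame, voiceprint)
```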
