G10L17/04

Machine learning for improving quality of voice biometrics

Methods and systems are disclosed herein for improving the quality of audio for use in a biometric. A biometric system may use machine learning to determine whether audio or a portion of the audio should be used as a biometric for a user. A sample of the user's voice may be used to generate a voice signature of the user. Portions of the audio that do not meet a similarity threshold when compared with the voice signature may be removed from the audio. Additionally or alternatively, interfering noises may be detected and removed from the audio to improve the quality of a voice biometric generated from the audio.
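The filtering step in this abstract compares portions of the audio with the user's voice signature and keeps only those that meet a similarity threshold. The sketch below shows one way that could work, under stated assumptions: each audio portion has already been mapped to a fixed-length speaker embedding by some encoder (not shown), the signature is the element-wise mean of enrollment embeddings, and cosine similarity is the comparison. The function names, the averaging scheme, and the default threshold of 0.7 are illustrative assumptions, not details taken from the disclosure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def voice_signature(enrollment: List[Vector]) -> List[float]:
    """Hypothetical signature: element-wise mean of the enrollment embeddings."""
    dim = len(enrollment[0])
    return [sum(e[i] for e in enrollment) / len(enrollment) for i in range(dim)]


def filter_segments(
    segments: List[Tuple[str, Vector]],
    signature: Vector,
    threshold: float = 0.7,  # assumed value; a real system would tune this
) -> List[str]:
    """Return IDs of segments whose embedding meets the similarity threshold."""
    return [
        seg_id
        for seg_id, emb in segments
        if cosine_similarity(emb, signature) >= threshold
    ]


sig = voice_signature([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
kept = filter_segments([("s1", [1.0, 0.05, 0.0]), ("s2", [0.0, 0.0, 1.0])], sig)
print(kept)  # ['s1']
```

Segment `s2` points in a direction orthogonal to the signature, so it is dropped before any biometric is built from the remaining audio.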

Terminal and Operating Method Thereof
20230215418 · 2023-07-06

A terminal may include: a display that is divided into at least two areas when a real-time broadcast hosted by the user of the terminal starts on a broadcasting channel, one of the areas being allocated to the host; an input/output interface that receives the host's voice; a communication interface that receives, from the terminal of a guest among one or more guests who have entered the broadcasting channel, an item selected from one or more items together with a text; and a processor that generates a voice message by converting the text into the voice of the host or the voice of that guest.

Voice Biometric Authentication in a Virtual Assistant
20230216842 · 2023-07-06

Aspects of the disclosure relate to voice biometric authentication in a virtual assistant. In some embodiments, a computing platform may receive, from a user device, an audio file comprising a voice command to access information related to a user account. The computing platform may retrieve one or more voice biometric signatures from a voice biometric database associated with the user account, and apply a voice biometric matching algorithm to compare the voice command of the audio file to the one or more voice biometric signatures to determine if a match exists between the voice command and one of the one or more voice biometric signatures. In response to determining that a match exists, the computing platform may retrieve information associated with the user account, and then send, via a communication interface, the information associated with the user account to the user device.
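The request flow described above (look up the account's stored signatures, test for a match, and release account information only if one exists) can be sketched as follows. This is a minimal sketch assuming that the voice command and each stored signature are already represented as embedding vectors and that a match means cosine similarity at or above a threshold; the dictionaries standing in for the voice biometric database and the account store, the function names, and the threshold of 0.8 are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def authenticate_and_fetch(
    command_embedding: Vector,
    account_id: str,
    signature_db: Dict[str, List[Vector]],  # hypothetical stand-in for the biometric DB
    account_info: Dict[str, dict],          # hypothetical stand-in for account records
    threshold: float = 0.8,                 # assumed matching threshold
) -> Optional[dict]:
    """Return account info if the command matches any stored signature, else None."""
    signatures = signature_db.get(account_id, [])
    if any(cosine_similarity(command_embedding, s) >= threshold for s in signatures):
        return account_info.get(account_id)
    return None


db = {"acct-1": [[1.0, 0.0], [0.8, 0.6]]}
info = {"acct-1": {"balance": 100}}
print(authenticate_and_fetch([0.79, 0.61], "acct-1", db, info))  # {'balance': 100}
```

Returning `None` on a failed match keeps the "no match, no data" rule in one place; a production system would also log the attempt and might fall back to another authentication factor.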

TRAINING AND USING A TRANSCRIPT GENERATION MODEL ON A MULTI-SPEAKER AUDIO STREAM

The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained, and a set of frame embeddings is generated from audio data frames of the obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols is generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols is transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on the CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription.
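The step that sorts words into transcript lines using CC symbols can be illustrated with a short deserializer. The sketch assumes a fixed number of output channels (two by default) and assumes that each CC symbol moves subsequent words to the next channel, cycling back to the first; the token spelling `<cc>` and the function name are hypothetical, and the abstract does not specify the number of channels.

```python
from typing import List

CC = "<cc>"  # hypothetical spelling of the channel-change symbol


def tokens_to_lines(tokens: List[str], num_channels: int = 2) -> List[str]:
    """Route words to transcript lines, switching channel at each CC symbol.

    Assumes CC advances to the next channel modulo num_channels.
    Channels that receive no words are omitted from the result.
    """
    lines: List[List[str]] = [[] for _ in range(num_channels)]
    channel = 0
    for tok in tokens:
        if tok == CC:
            channel = (channel + 1) % num_channels
        else:
            lines[channel].append(tok)
    return [" ".join(words) for words in lines if words]


# Speaker A says "hello how are you" while speaker B says "good morning".
stream = ["hello", "how", CC, "good", CC, "are", "you", CC, "morning"]
print(tokens_to_lines(stream))  # ['hello how are you', 'good morning']
```

Because the model emits a single serialized token stream, overlapping speech is handled by routing rather than by running a separate recognizer per speaker.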

Voice input authentication device and method

Provided are a method of authenticating a voice input provided from a user and a method of detecting a voice input that is likely to be an attack. The voice input authentication method includes: receiving the voice input; obtaining, from the voice input, signal characteristic data representing signal characteristics of the voice input; and authenticating the voice input by applying the obtained signal characteristic data to a first learning model configured to determine an attribute of the voice input, wherein the first learning model is trained to determine the attribute of the voice input based on a voice uttered by a person and a voice output by an apparatus.
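The two stages in this abstract (extract signal characteristic data, then apply a learning model trained on human-uttered versus apparatus-output speech) can be sketched with a toy pipeline. This is a minimal sketch under loud assumptions: the two features (spectral flatness and the fraction of energy in the upper half of the spectrum) are illustrative choices, not features named in the disclosure, and a plain logistic regression trained by stochastic gradient descent stands in for the unspecified "first learning model". All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """One-sided power spectrum via a naive DFT (O(n^2); fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_flatness(power: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean over arithmetic mean: near 0 for pure tones, near 1 for flat spectra."""
    geo = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    return geo / (sum(power) / len(power) + eps)


def high_band_ratio(power: Sequence[float]) -> float:
    """Fraction of energy in the upper half of the one-sided spectrum."""
    cut = len(power) // 2
    return sum(power[cut:]) / (sum(power) + 1e-12)


def signal_features(frame: Sequence[float]) -> List[float]:
    """Illustrative 'signal characteristic data' for one frame (assumed features)."""
    p = power_spectrum(frame)
    return [spectral_flatness(p), high_band_ratio(p)]


def predict_human(features: Sequence[float], w: Sequence[float], b: float) -> float:
    """Logistic model: estimated probability that the input was uttered by a person."""
    z = b + sum(wi * fi for wi, fi in zip(w, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 500
) -> Tuple[List[float], float]:
    """Plain SGD on log-loss; label 1 = human speech, 0 = apparatus playback."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_human(xi, w, b) - yi  # d(log-loss)/dz for one sample
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b
```

In use, `train_logistic` would be fit on feature vectors from labeled recordings of live speakers and of loudspeaker playback, and `predict_human` would then score new inputs; the decision threshold applied to that score is a further design choice the abstract leaves open.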

SYSTEM AND METHOD FOR AUGMENTED AUTHENTICATION USING ACOUSTIC DEVICES
20230216845 · 2023-07-06

Systems, methods, and computer program products are provided for augmented authentication using acoustic devices. The method includes receiving a transfer request including an NFT identifier from a given acoustic device among one or more acoustic devices. The NFT identifier corresponds to an acoustic device NFT associated with the given acoustic device and a device user. The method includes comparing the NFT identifier with one or more stored NFT identifiers to determine the acoustic device associated with the NFT identifier. The method further includes confirming that the identity of the voice command user (the user issuing the voice command) matches the device user associated with the acoustic device. Finally, the method includes authenticating the transfer request upon confirming that the acoustic device is associated with the voice command user.
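The lookup-then-confirm logic above can be sketched as a registry check. This is a minimal sketch assuming that stored NFT identifiers map to records holding the device ID and its device user, and that the voice command user's identity has already been resolved upstream (for example, by a voice biometric step that is not shown). The class names, field names, and the choice to return the authenticated device ID are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DeviceRecord:
    device_id: str
    device_user: str


@dataclass(frozen=True)
class TransferRequest:
    nft_id: str
    voice_command_user: str  # assumed to be resolved upstream, e.g. by voice biometrics


def authenticate_transfer(
    request: TransferRequest, registry: Dict[str, DeviceRecord]
) -> Optional[str]:
    """Return the device ID if the request is authenticated, else None."""
    record = registry.get(request.nft_id)  # compare with stored NFT identifiers
    if record is None:
        return None  # unknown NFT identifier: no associated acoustic device
    if record.device_user != request.voice_command_user:
        return None  # speaker is not the device user bound to this NFT
    return record.device_id


registry = {"nft-42": DeviceRecord(device_id="speaker-kitchen", device_user="alice")}
print(authenticate_transfer(TransferRequest("nft-42", "alice"), registry))  # speaker-kitchen
```

Binding the device to its user through the NFT record means a request is rejected both when the NFT is unknown and when a different person speaks to a registered device.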