G10L17/02

SYSTEMS AND METHODS TO ANALYZE AUDIO DATA TO IDENTIFY DIFFERENT SPEAKERS
20230129467 · 2023-04-27

A computing system may receive data representing dialog between persons, the data representing words spoken by at least first and second speakers. The system may determine an intent of a speaker for a first portion of the data, where the intent is indicative of the identity of the first or second speaker for the first portion or for another, different portion of the data. Based at least in part on the determined intent, the system may determine a name of the first or second speaker represented in the first portion of the data, and output an indication of the determined name so that the indication identifies the first portion or the other portion of the data with the first or second speaker.
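
As a rough illustration of how an intent can tie a name to a portion of a dialog, the sketch below uses two regular expressions as a stand-in for an intent classifier. The abstract does not say how intent is determined; the patterns, the `name_speakers` function, the anonymous labels `S1`/`S2`, and the two-speaker restriction are all assumptions made for this example.

```python
import re

# Hypothetical intent patterns; a real system would likely use a trained classifier.
SELF_INTRO = re.compile(r"\b(?i:this is|my name is|i am|i'm)\s+([A-Z][a-z]+)")
ADDRESS_OTHER = re.compile(r"\b(?i:thanks|thank you|hello|hi),?\s+([A-Z][a-z]+)")


def name_speakers(turns):
    """Map anonymous speaker labels to names for a two-speaker dialog.

    `turns` is a list of (label, text) pairs, e.g. ("S1", "Hi, this is John.").
    A self-introduction names the speaker of the same turn; addressing someone
    by name is taken to name the other speaker, i.e. a different portion.
    """
    labels = sorted({label for label, _ in turns})
    names = {}
    for label, text in turns:
        intro = SELF_INTRO.search(text)
        if intro:
            names.setdefault(label, intro.group(1))
        addressed = ADDRESS_OTHER.search(text)
        if addressed and len(labels) == 2:
            other = labels[0] if label == labels[1] else labels[1]
            names.setdefault(other, addressed.group(1))
    return names
```

With turns such as `("S1", "Hi, this is John from support.")` followed by `("S2", "Thanks, John. My name is Mary.")`, the self-introduction intent labels each speaker's own turn, while the addressing intent alone would be enough to label the other speaker's turns.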

Method and device of denoising voice signal
11475907 · 2022-10-18

The present disclosure provides a method and a device of denoising a voice signal. The method includes the following steps: filtering out an environmental noise signal in an original input signal according to an interference signal related to that environmental noise signal, to obtain a first voice signal; obtaining a sample signal matching the first voice signal from a voice signal sample library; and filtering out other noise signals in the first voice signal according to the matching sample signal, to obtain an effective voice signal. The method provided by the present disclosure may effectively filter out both the environmental noise signal and other noise signals in the voice signal.
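
The first two steps can be sketched as follows, under stated assumptions. The abstract does not name a filtering algorithm; this example swaps in a standard least-mean-squares (LMS) adaptive noise canceller for the first step and a cosine-similarity lookup for the library match. The function names, the tap count, and the step size `mu` are illustrative choices, and the final filtering step is left out because the abstract does not describe it.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.01):
    """Remove the part of `primary` that is linearly predictable from `reference`.

    `primary` is the noisy input; `reference` is the interference signal
    related to the environmental noise. Assumed technique: LMS filtering.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    cleaned = []
    for p, r in zip(primary, reference):
        history = [r] + history[:-1]
        noise_estimate = sum(w * x for w, x in zip(weights, history))
        error = p - noise_estimate  # the error is also the cleaned output sample
        weights = [w + mu * error * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def best_match(signal, library):
    """Return the name of the library sample most similar to `signal`.

    `library` maps names to sample signals; similarity is plain cosine
    similarity at zero lag, which is an assumption of this sketch.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        return dot / norm if norm else 0.0

    return max(library, key=lambda name: cosine(signal, library[name]))
```

The output of `lms_cancel` plays the role of the first voice signal, and `best_match` selects the sample signal that a later filtering stage would use.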

System and Method for Generating Synthetic Cohorts Using Generative Modeling

A method, computer program product, and computing system for generating a generative model representative of a plurality of natural biometric profiles. A plurality of random samples are generated from the generative model. A plurality of synthetic biometric profiles are generated based upon, at least in part, the plurality of random samples.
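
A minimal sketch of the idea, assuming each biometric profile is a fixed-length numeric vector and using an independent Gaussian per dimension as the generative model. The abstract does not specify the model family, so the diagonal Gaussian and both function names are assumptions; here each random sample is used directly as a synthetic profile.

```python
import random
import statistics


def fit_diagonal_gaussian(profiles):
    """Fit a per-dimension mean and standard deviation to natural profiles."""
    dims = list(zip(*profiles))  # transpose: one tuple of values per dimension
    means = [statistics.mean(d) for d in dims]
    stdevs = [statistics.stdev(d) for d in dims]  # needs at least two profiles
    return means, stdevs


def sample_synthetic_profiles(model, count, seed=None):
    """Draw `count` random samples from the fitted model."""
    means, stdevs = model
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(means, stdevs)] for _ in range(count)]
```

A richer model (for example, one that captures correlations between dimensions) would slot into the same two-step shape: fit on natural profiles, then sample to build the synthetic cohort.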

Voice modulation based voice authentication

In some examples, voice modulation based voice authentication may include receiving a signal that represents a modulated voice of a user, and analyzing the signal to ascertain a specified code for a specified time period. Voice modulation based voice authentication may further include determining, for the specified time period, an authentication code from a plurality of authentication codes, and comparing the specified code to the authentication code. In response to a determination that the specified code matches the authentication code, voice modulation based voice authentication may further include authenticating the user.
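
The comparison step can be sketched as below, assuming the code has already been decoded from the modulated voice signal (that decoding is not shown). The rotation scheme that picks one code per time period, the 30-second default, and the function names are assumptions for illustration, not details from the abstract.

```python
import hmac


def expected_code(codes, timestamp, period_seconds=30):
    """Pick the authentication code for the time period containing `timestamp`."""
    index = int(timestamp // period_seconds) % len(codes)
    return codes[index]


def authenticate(decoded_code, codes, timestamp, period_seconds=30):
    """Return True if the code decoded from the voice signal matches the
    authentication code for the current time period."""
    expected = expected_code(codes, timestamp, period_seconds)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(decoded_code.encode(), expected.encode())
```

For example, with `codes = ["1111", "2222", "3333"]` and a timestamp of 35 seconds, the second code is the one in force, so only `"2222"` authenticates.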

Word-level blind diarization of recorded calls with arbitrary number of speakers

Disclosed herein are methods of diarizing audio data using first-pass blind diarization and second-pass blind diarization that generate speaker statistical models, wherein the first-pass blind diarization is on a per-frame basis and the second-pass blind diarization is on a per-word basis, and methods of creating acoustic signatures for a common speaker based only on the statistical models of the speakers in each audio session.
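
To illustrate only the change of granularity from frames to words, the sketch below takes per-frame speaker labels from a hypothetical first pass and assigns each word the majority label of the frames it spans. This is a deliberate simplification: the disclosed method builds statistical speaker models in both passes, which this example does not attempt. The function name and the `(word, start, end)` span format are assumptions.

```python
from collections import Counter


def relabel_by_word(frame_labels, word_spans):
    """Assign one speaker label per word by majority vote over its frames.

    `frame_labels` is a list of per-frame speaker labels.
    `word_spans` is a list of (word, start_frame, end_frame) tuples with an
    exclusive end; every span is assumed to cover at least one frame.
    """
    result = []
    for word, start, end in word_spans:
        votes = Counter(frame_labels[start:end])
        speaker = votes.most_common(1)[0][0]
        result.append((word, speaker))
    return result
```

Working at the word level removes isolated frame-level flips inside a word, which is one plausible motivation for a second, word-based pass.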

Speaker identification
11475899 · 2022-10-18

A method of speaker identification comprises receiving an audio signal representing speech; performing a first voice biometric process on the audio signal to attempt to identify whether the speech is the speech of an enrolled speaker; and, if the first voice biometric process makes an initial determination that the speech is the speech of an enrolled user, performing a second voice biometric process on the audio signal to attempt to identify whether the speech is the speech of the enrolled speaker. The second voice biometric process is selected to be more discriminative than the first voice biometric process.
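
The two-stage flow described above can be sketched as a simple cascade. The scorers are passed in as callables because the abstract does not specify the biometric processes; the thresholds, the `Scorer` type alias, and the function name are assumptions for illustration.

```python
from typing import Callable, Sequence

# A scorer maps an audio signal to a similarity score for the enrolled speaker.
Scorer = Callable[[Sequence[float]], float]


def cascade_verify(
    audio: Sequence[float],
    first_stage: Scorer,
    second_stage: Scorer,
    first_threshold: float = 0.5,
    second_threshold: float = 0.8,
) -> bool:
    """Run the second, more discriminative process only if the first one
    makes an initial determination that the enrolled speaker is talking."""
    if first_stage(audio) < first_threshold:
        return False  # rejected early; the second process is never run
    return second_stage(audio) >= second_threshold
```

One benefit of this arrangement is that the second process runs only on audio that has already passed the initial check.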

Low-latency multi-speaker speech recognition

Systems and processes for operating an intelligent automated assistant are provided. In one example, a method includes receiving mixed speech data representing utterances of a target speaker and utterances of one or more interfering audio sources. The method further includes obtaining a target speaker representation, which represents speech characteristics of the target speaker; and determining, using a learning network, probability distributions of phonetic elements directly from the mixed speech data. The inputs of the learning network include the mixed speech data and the target speaker representation. An output of the learning network includes the probability distributions of phonetic elements. The method further includes generating text corresponding to the utterances of the target speaker based on the probability distributions of the phonetic elements; and providing a response to the target speaker based on the text corresponding to the utterances of the target speaker.