Patent classifications
G10L17/16
Detecting user identity in shared audio source contexts
Computerized systems are provided for determining an identity of one or more users that use a same audio source, such as a microphone. The identity of the one or more users can be based on generating a list of participant candidates who are likely to participate in an associated event, such as a meeting. For instance, embodiments can generate one or more network graphs of a meeting invitee, and only voice input samples of the meeting invitee's N closest connections are compared to an utterance to determine the identity of the user associated with the utterance. One or more indicators that identify the users who are using the same audio source, as well as additional information or metadata associated with the identified user, can be caused to be presented.
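The candidate-narrowing idea in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the graph structure, the voice-profile embeddings, and the names (`identify_speaker`, `cosine`) are all hypothetical, and cosine similarity over fixed-length voice embeddings stands in for whatever comparison the claims actually use.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify_speaker(utterance_emb, graph, invitee, profiles, n_closest=5):
    """Compare the utterance only against the invitee's N closest
    connections (by edge weight) plus the invitee, and return the
    best-matching candidate, or None if no candidate has a profile.

    graph: dict mapping person -> {neighbor: edge_weight}
    profiles: dict mapping person -> enrolled voice embedding
    """
    neighbors = graph.get(invitee, {})
    candidates = sorted(neighbors, key=neighbors.get, reverse=True)[:n_closest]
    candidates.append(invitee)  # the invitee is always a candidate
    scores = {c: cosine(utterance_emb, profiles[c])
              for c in candidates if c in profiles}
    return max(scores, key=scores.get) if scores else None
```

Restricting the comparison set this way keeps the per-utterance cost proportional to N rather than to the full enrolled population, which is the apparent motivation for the network-graph step.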
SPEAKER DIARIZATION WITH EARLY-STOP CLUSTERING
A method and apparatus for speaker diarization with early-stop clustering: segmenting an audio stream into at least one speech segment (710), the audio stream comprising speeches from at least one speaker; clustering the at least one speech segment into a plurality of clusters (720), the number of the plurality of clusters being greater than the number of the at least one speaker; selecting, from the plurality of clusters, at least one cluster of the highest similarity (730), the number of the selected at least one cluster being equal to the number of the at least one speaker; establishing a speaker classification model based on the selected at least one cluster (740); and aligning, through the speaker classification model, speech frames in the audio stream to the at least one speaker (750).
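The selection and alignment steps (730-750) might be sketched like this, assuming segments and frames are represented as feature vectors and that over-clustering (720) has already produced more clusters than speakers. The purity measure (average pairwise cosine similarity) and the nearest-centroid stand-in for the speaker classification model are assumptions for illustration, not the patent's actual model.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def intra_similarity(cluster):
    """Average pairwise cosine similarity inside one cluster; a pure
    single-speaker cluster scores higher than a mixed one."""
    if len(cluster) < 2:
        return 1.0
    sims = [cosine(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    return sum(sims) / len(sims)

def early_stop_select(clusters, num_speakers):
    """Step 730: keep only the num_speakers most self-similar clusters."""
    return sorted(clusters, key=intra_similarity, reverse=True)[:num_speakers]

def centroid(cluster):
    dim = len(cluster[0])
    return [sum(v[d] for v in cluster) / len(cluster) for d in range(dim)]

def align_frames(frames, selected_clusters):
    """Steps 740-750: a nearest-centroid model built from the selected
    clusters assigns every frame to a speaker index."""
    centroids = [centroid(c) for c in selected_clusters]
    return [max(range(len(centroids)), key=lambda s: cosine(f, centroids[s]))
            for f in frames]
```

The point of stopping clustering early is that impure, mixed-speaker clusters are simply discarded rather than merged, so the classification model is trained only on the cleanest evidence.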
Speaker Identification Method and Apparatus in Multi-person Speech
The present disclosure relates to a speaker identification method and apparatus in multi-person speech, and to an electronic device and a storage medium, in the technical field of computers. The method comprises: acquiring speech contents in a multi-person speech; extracting and processing a harmonic band in a voice segment of a pre-set length from the speech contents; calculating and analyzing the number of harmonics in the harmonic band and their relative strengths so as to attribute segments to the same speaker accordingly; identifying, by analyzing the speech contents corresponding to different speakers, identity information about each speaker; and finally generating a correspondence between the speech contents of the different speakers and the identity information about the speakers. The present disclosure can effectively distinguish identity information about speakers according to their speech contents.
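The harmonic-counting comparison can be illustrated as below. This is a hedged sketch, not the disclosed apparatus: it assumes a magnitude spectrum is already available, that the fundamental's bin index is known, and that "same speaker" means matching harmonic counts plus a small average difference in relative strengths. The threshold and tolerance values are invented.

```python
def harmonic_signature(spectrum, f0_bin, max_harmonics=10, threshold=0.1):
    """Collect magnitudes at integer multiples of the fundamental bin,
    normalize them relative to the strongest harmonic, and count how
    many harmonics exceed a relative-strength threshold."""
    peaks = []
    k = 1
    while k * f0_bin < len(spectrum) and k <= max_harmonics:
        peaks.append(spectrum[k * f0_bin])
        k += 1
    top = max(peaks)
    rel = [p / top for p in peaks]
    count = sum(1 for r in rel if r >= threshold)
    return count, rel

def same_speaker(sig_a, sig_b, tol=0.2):
    """Two segments are attributed to the same speaker when their harmonic
    counts match and their relative-strength profiles are close."""
    count_a, rel_a = sig_a
    count_b, rel_b = sig_b
    if count_a != count_b:
        return False
    n = min(len(rel_a), len(rel_b))
    diff = sum(abs(x - y) for x, y in zip(rel_a[:n], rel_b[:n])) / n
    return diff <= tol
```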
Automatic speaker identification using speech recognition features
Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (“ASR”) and/or other automatically determined information may be processed against individual user profiles or models. Scores may be determined reflecting the likelihood that individual users made an utterance. The scores can be based on, e.g., individual components of Gaussian mixture models (“GMMs”) that score best for frames of audio data of an utterance. A user associated with the highest likelihood score for a particular utterance can be identified as the speaker of the utterance. Information regarding the identified user can be provided to components of a spoken language processing system, separate applications, etc.
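The per-user likelihood scoring described here can be illustrated with diagonal-covariance GMMs. This is a generic sketch of GMM-based speaker scoring, not the disclosed system: the per-user models, the frame features, and all names are invented, and a plain log-sum-exp over components replaces whatever best-component bookkeeping the filing describes.

```python
import math

def log_gauss(x, mean, var):
    """Log density of one diagonal-covariance Gaussian component."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """gmm: list of (weight, mean, var) components; stable log-sum-exp."""
    logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
    mx = max(logs)
    return mx + math.log(sum(math.exp(l - mx) for l in logs))

def identify(frames, user_gmms):
    """Score every frame of the utterance against each user's GMM and
    return the user with the highest total log-likelihood."""
    totals = {u: sum(gmm_log_likelihood(f, g) for f in frames)
              for u, g in user_gmms.items()}
    return max(totals, key=totals.get)
```

In practice the frames would be ASR-derived features (e.g. cepstral coefficients) rather than raw samples, which is the reuse of ASR artifacts the abstract alludes to.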
Method for speaker authentication and identification
A method and system for secure speaker authentication between a caller device and a first device using an authentication server are provided. The system comprises extracting features into a feature matrix from an incoming audio call; generating a partial i-vector, wherein the partial i-vector includes a first low-order statistic; sending the partial i-vector to the authentication server; and receiving from the authentication server a match score generated based on a full i-vector and another i-vector being stored on the authentication server, wherein the full i-vector is generated from the partial i-vector.
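One plausible reading of the "partial i-vector with a first low-order statistic" is the device-side computation of zeroth- and first-order Baum-Welch statistics against a universal background model (UBM), with the server completing the full i-vector and scoring it. The sketch below shows only that device-side half, under that assumption; the server's completion step (which needs a total variability matrix the device never holds) and the match scoring are omitted, and all names are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def soft_posteriors(frame, ubm):
    """Responsibility of each UBM component for one frame
    (ubm: list of (weight, mean, var) diagonal Gaussians)."""
    logs = [math.log(w) + log_gauss(frame, m, v) for w, m, v in ubm]
    mx = max(logs)
    exps = [math.exp(l - mx) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def partial_stats(frames, ubm):
    """Device side: zeroth-order (N) and first-order (F) statistics,
    a candidate for the 'partial i-vector' sent to the server."""
    dim = len(frames[0])
    N = [0.0] * len(ubm)
    F = [[0.0] * dim for _ in ubm]
    for x in frames:
        post = soft_posteriors(x, ubm)
        for c, p in enumerate(post):
            N[c] += p
            for d in range(dim):
                F[c][d] += p * x[d]
    return N, F
```

Splitting the pipeline this way keeps the raw audio and the speaker-model parameters on opposite sides of the network, which matches the abstract's emphasis on secure authentication.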