Patent classifications
G10L17/14
Method and apparatus for recognizing speaker by using a resonator
Provided are a method and device for recognizing a speaker by using a resonator. The method of recognizing the speaker includes receiving a plurality of electrical signals corresponding to a speech of the speaker from a plurality of resonators having different resonance bands; obtaining a difference of magnitudes of the plurality of electrical signals; and recognizing the speaker based on the difference of magnitudes of the plurality of electrical signals.
CONVOLUTIONAL NEURAL NETWORK WITH PHONETIC ATTENTION FOR SPEAKER VERIFICATION
Embodiments may include determination, for each of a plurality of speech frames associated with an acoustic feature, of a phonetic feature based on the associated acoustic feature, generation of one or more two-dimensional feature maps based on the plurality of phonetic features, input of the one or more two-dimensional feature maps to a trained neural network to generate a plurality of speaker embeddings, and aggregation of the plurality of speaker embeddings into a speaker embedding based on respective weights determined for each of the plurality of speaker embeddings, wherein the speaker embedding is associated with an identity of the speaker.
SYSTEMS AND METHODS FOR AUTOMATIC JOINING AS A VIRTUAL MEETING PARTICIPANT FOR TRANSCRIPTION
Method, system, device, and non-transitory computer-readable medium for joining a virtual participant in a conversation. In some examples, a computer-implemented method includes: identifying a first conversation scheduled to be participated by a first group of actual participants; joining a first virtual participant into the first conversation; obtaining, via the first virtual participant, a first set of audio data associated with the first conversation while the first conversation occurs; transcribing, via the first virtual participant, the first set of audio data into a first set of text data while the first conversation occurs; and presenting the first set of text data to the first group of actual participants while the first conversation occurs.
SYSTEMS AND METHODS FOR AUTOMATIC JOINING AS A VIRTUAL MEETING PARTICIPANT FOR TRANSCRIPTION
Method, system, device, and non-transitory computer-readable medium for joining a virtual participant in a conversation. In some examples, a computer-implemented method includes: identifying a first conversation scheduled to be participated by a first group of actual participants; joining a first virtual participant into the first conversation; obtaining, via the first virtual participant, a first set of audio data associated with the first conversation while the first conversation occurs; transcribing, via the first virtual participant, the first set of audio data into a first set of text data while the first conversation occurs; and presenting the first set of text data to the first group of actual participants while the first conversation occurs.
SPEAKER VERIFICATION USING CO-LOCATION INFORMATION
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
SPEAKER VERIFICATION USING CO-LOCATION INFORMATION
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
Autocorrection of pronunciations of keywords in audio/videoconferences
The present disclosure relates to automatically correcting mispronounced keywords during a conference session. More particularly, the present invention provides methods and systems for automatically correcting audio data generated from audio input having indications of mispronounced keywords during an audio/videoconferencing system. In some embodiments, the process of automatically correcting the audio data may require a re-encoding process of the audio data at the conference server. In alternative embodiments, the process may require updating the audio data at the receiver end of the conferencing system.
Autocorrection of pronunciations of keywords in audio/videoconferences
The present disclosure relates to automatically correcting mispronounced keywords during a conference session. More particularly, the present invention provides methods and systems for automatically correcting audio data generated from audio input having indications of mispronounced keywords during an audio/videoconferencing system. In some embodiments, the process of automatically correcting the audio data may require a re-encoding process of the audio data at the conference server. In alternative embodiments, the process may require updating the audio data at the receiver end of the conferencing system.
Method and system for integrating voice biometrics
Systems and methods for determining whether a voice biometrics credential provides a reliable mechanism for authenticating a user are provided. The method includes receiving at least one set of voice data from the user; determining, based on the received at least one set of voice data, a value of at least one parameter that corresponds to a user-specific voice biometrics credential; obtaining at least one user-specific item of information; accessing at least one business rule that relates to the user; and determining, based on the at least one set of voice data, the at least one user-specific item of information, and the at least one business rule, whether the user-specific voice biometrics credential is usable for authenticating the user.
Method and system for integrating voice biometrics
Systems and methods for determining whether a voice biometrics credential provides a reliable mechanism for authenticating a user are provided. The method includes receiving at least one set of voice data from the user; determining, based on the received at least one set of voice data, a value of at least one parameter that corresponds to a user-specific voice biometrics credential; obtaining at least one user-specific item of information; accessing at least one business rule that relates to the user; and determining, based on the at least one set of voice data, the at least one user-specific item of information, and the at least one business rule, whether the user-specific voice biometrics credential is usable for authenticating the user.