IPIQ

G10L17/12

Systems and methods for adapting human speaker embeddings in speech synthesis

11929058 · 2024-03-12

Dolby Laboratories Licensing Corporation

Novel methods and systems for adapting a voice cloning synthesizer to a new speaker using real speech data are disclosed. Utterances from one or more target speakers are parameterized and used to initialize an embedding vector for a voice synthesizer in one or more of three ways: by clustering the utterance data and determining the centroid of the data, by using a speaker identification neural network, and/or by finding the stored embedding vector closest to the utterance data.
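The centroid and nearest-stored-vector options can be illustrated with a minimal sketch. It assumes each utterance has already been parameterized into a fixed-length vector in the same space as the stored speaker embeddings, treats all utterances as one cluster (so the centroid is the element-wise mean), and uses cosine similarity as the closeness measure; these choices and all function names (centroid, closest_stored, init_embedding) are hypothetical, not taken from the patent. The speaker identification neural network option is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty set of equal-length vectors (one cluster)."""
    if not vectors:
        raise ValueError("need at least one utterance vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def closest_stored(target: Sequence[float], stored: Sequence[Sequence[float]]) -> int:
    """Index of the stored embedding most similar (by cosine) to the target vector."""
    return max(range(len(stored)), key=lambda i: cosine(target, stored[i]))


def init_embedding(utterance_vectors, stored_embeddings, snap_to_stored=False) -> Vector:
    """Initialize a new speaker's embedding from that speaker's utterance vectors.

    By default the centroid itself is returned (assumption: a single cluster).
    With snap_to_stored=True, the stored embedding closest to the centroid
    is returned instead (assumption: closeness measured from the centroid).
    """
    c = centroid(utterance_vectors)
    if snap_to_stored:
        return list(stored_embeddings[closest_stored(c, stored_embeddings)])
    return c
```

Either result would then serve only as a starting point for the embedding that the synthesizer adapts further.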

Assessing Speaker Recognition Performance

20240079013 · 2024-03-07

Google Llc

A method for evaluating a verification model includes receiving a first and a second set of verification results, where each verification result indicates whether a primary model or an alternative model verifies an identity of a user as a registered user. The method further includes identifying each verification result in the first and second sets that includes a performance metric. The method also includes determining a first score of the primary model based on the number of verification results identified in the first set that include the performance metric, and determining a second score of the alternative model based on the number of verification results identified in the second set that include the performance metric. The method further includes determining, based on the first score and the second score, whether the verification capability of the alternative model is better than that of the primary model.
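The scoring step amounts to counting tagged results per model and comparing the counts. The sketch below assumes each result carries a set of string tags, and that the chosen metric marks an undesirable outcome (for example a hypothetical "false_reject" tag), so a lower count is better; the VerificationResult type, the tag names, and the lower_is_better flag are illustrative assumptions, not details from the application.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class VerificationResult:
    verified: bool                           # did the model verify the user as registered?
    metrics: FrozenSet[str] = frozenset()    # hypothetical performance-metric tags


def score(results: Iterable[VerificationResult], metric: str) -> int:
    """Number of verification results that include the given performance metric."""
    return sum(1 for r in results if metric in r.metrics)


def alternative_is_better(primary: Iterable[VerificationResult],
                          alternative: Iterable[VerificationResult],
                          metric: str,
                          lower_is_better: bool = True) -> bool:
    """Compare the first (primary) and second (alternative) scores."""
    first = score(primary, metric)
    second = score(alternative, metric)
    return second < first if lower_is_better else second > first
```

Raw counts are only directly comparable when both sets contain a similar number of results; with unequal set sizes, a rate (count divided by set size) would be a natural variant.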

Authorization of Action by Voice Identification

20240054195 · 2024-02-15

Soundhound, Inc.

Actions are authorized when a computed confidence score exceeds a threshold. The confidence score is based on matches between metadata about requests and fields in the corresponding database records, with each match weighted by how dependable that metadata is for authentication. The confidence score is further based on the closeness of a sample of speech audio to a stored voiceprint. Additional identification may be required for authorization. The confidence score requirement may be relaxed based on an identification found in a buffer of recent action requests.
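A minimal sketch of such a scoring rule follows. It assumes a linear blend of a dependability-weighted metadata match fraction and a voiceprint similarity already scaled to [0, 1], and models the buffer of recent requests as a set of recently identified user IDs that lowers the threshold. The field names, weights, thresholds, and the blend itself are hypothetical; the abstract does not specify how the terms are combined.

```python
from typing import Collection, Mapping


def metadata_score(request_meta: Mapping[str, str], record: Mapping[str, str],
                   dependability: Mapping[str, float]) -> float:
    """Fraction of total dependability weight carried by fields that match the record."""
    total = sum(dependability.values())
    if total == 0:
        return 0.0
    matched = sum(w for field, w in dependability.items()
                  if field in request_meta and request_meta[field] == record.get(field))
    return matched / total


def confidence_score(request_meta, record, voice_similarity: float,
                     dependability, voice_weight: float = 0.5) -> float:
    """Assumed linear blend of metadata evidence and voiceprint closeness."""
    meta = metadata_score(request_meta, record, dependability)
    return (1.0 - voice_weight) * meta + voice_weight * voice_similarity


def authorize(request_meta, record, voice_similarity: float, dependability,
              recent_user_ids: Collection[str], threshold: float = 0.8,
              relaxed_threshold: float = 0.6, voice_weight: float = 0.5) -> bool:
    """Authorize if confidence exceeds the (possibly relaxed) requirement."""
    conf = confidence_score(request_meta, record, voice_similarity, dependability, voice_weight)
    # Relax the requirement if this user was identified in recent action requests.
    required = relaxed_threshold if record.get("user_id") in recent_user_ids else threshold
    return conf > required
```

For example, with hypothetical weights {"caller_id": 0.5, "device_id": 0.3, "location": 0.2} and a mismatched location, the metadata term is 0.8; a voice similarity of 0.5 then gives a confidence of 0.65, which fails the default 0.8 threshold but passes the relaxed 0.6 threshold for a recently identified user.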

Automatic speaker identification using speech recognition features

11900948 · 2024-02-13

Amazon Technologies, Inc.

Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (ASR) and/or other automatically determined information may be processed against individual user profiles or models. Scores may be determined reflecting the likelihood that individual users made an utterance. The scores can be based on, e.g., individual components of Gaussian mixture models (GMMs) that score best for frames of audio data of an utterance. A user associated with the highest likelihood score for a particular utterance can be identified as the speaker of the utterance. Information regarding the identified user can be provided to components of a spoken language processing system, separate applications, etc.
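Scoring with the best component per frame can be sketched as follows. The sketch assumes each user profile is a small diagonal-covariance GMM over per-frame feature vectors (for instance, features the ASR front end already computes), scores each frame by its single best weighted component instead of the full mixture sum, adds those log scores over the utterance, and picks the highest-scoring user. The profile layout and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One diagonal-covariance component: (mixture weight, means, variances).
Component = Tuple[float, Sequence[float], Sequence[float]]
GMM = List[Component]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def best_component_score(frame: Sequence[float], gmm: GMM) -> float:
    """Log score of the single best-scoring weighted component for one frame."""
    return max(math.log(w) + log_gauss_diag(frame, m, v) for w, m, v in gmm)


def utterance_score(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Sum of per-frame best-component log scores over the utterance."""
    return sum(best_component_score(f, gmm) for f in frames)


def identify_speaker(frames: Sequence[Sequence[float]],
                     profiles: Dict[str, GMM]) -> Tuple[str, Dict[str, float]]:
    """Return the user with the highest likelihood score, plus all scores."""
    scores = {user: utterance_score(frames, gmm) for user, gmm in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Taking the maximum component per frame is a common approximation of the full mixture log-likelihood; summing all components with log-sum-exp would be the exact alternative.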

AUTOMATIC SPEAKER IDENTIFICATION USING SPEECH RECOGNITION FEATURES

20190378517 · 2019-12-12

SPEAKER VERIFICATION

20190371340 · 2019-12-05

Cirrus Logic International Semiconductor Ltd.

A method of speaker verification comprises: comparing a test input against a model of a user's speech obtained during a process of enrolling the user; obtaining a first score from comparing the test input against the model of the user's speech; comparing the test input against a first plurality of models of speech obtained from a first plurality of other speakers respectively; obtaining a plurality of cohort scores from comparing the test input against the first plurality of models of speech; obtaining statistics describing the plurality of cohort scores; modifying said statistics to obtain adjusted statistics; normalising the first score using the adjusted statistics to obtain a normalised score; and using the normalised score for speaker verification.
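The normalisation step follows the familiar cohort pattern, normalised score = (first score − mean) / standard deviation, with adjusted statistics in place of the raw ones. The abstract does not say how the statistics are modified, so the adjustment below (shrinking toward hypothetical prior values and flooring the standard deviation) is purely an illustrative assumption, as are the parameter names, defaults, and decision threshold.

```python
import statistics
from typing import Sequence, Tuple


def cohort_stats(cohort_scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of the cohort scores."""
    return statistics.fmean(cohort_scores), statistics.pstdev(cohort_scores)


def adjust_stats(mean: float, std: float, prior_mean: float = 0.0, prior_std: float = 1.0,
                 alpha: float = 0.5, std_floor: float = 0.1) -> Tuple[float, float]:
    """HYPOTHETICAL adjustment: blend with prior statistics and floor the std."""
    adj_mean = alpha * mean + (1.0 - alpha) * prior_mean
    adj_std = max(alpha * std + (1.0 - alpha) * prior_std, std_floor)
    return adj_mean, adj_std


def normalised_score(first_score: float, cohort_scores: Sequence[float], **adjust_kwargs) -> float:
    """Normalise the user-model score with the adjusted cohort statistics."""
    mean, std = cohort_stats(cohort_scores)
    adj_mean, adj_std = adjust_stats(mean, std, **adjust_kwargs)
    return (first_score - adj_mean) / adj_std


def verify(first_score: float, cohort_scores: Sequence[float],
           threshold: float = 2.0, **adjust_kwargs) -> bool:
    """Accept the claimed identity if the normalised score exceeds the threshold."""
    return normalised_score(first_score, cohort_scores, **adjust_kwargs) > threshold
```

With alpha=1.0 the adjustment reduces to the raw cohort statistics (apart from the floor), which is plain cohort normalisation; the floor guards against dividing by a near-zero spread when the cohort scores are almost identical.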