G10L17/04

GLOBAL PROSODY STYLE TRANSFER WITHOUT TEXT TRANSCRIPTIONS

20220392429 · 2022-12-08 ·

A computer-implemented method is provided for using a machine learning model to disentangle prosody in spoken natural language. The method includes encoding, by a computing device, the spoken natural language to produce content code. The method further includes resampling, by the computing device and without text transcriptions, the content code to obscure the prosody by applying an unsupervised technique to the machine learning model, thereby generating prosody-obscured content code. The method additionally includes decoding, by the computing device, the prosody-obscured content code to synthesize speech indirectly based upon the content code.

LIMITING IDENTITY SPACE FOR VOICE BIOMETRIC AUTHENTICATION

20220392452 · 2022-12-08 ·

Pindrop Security, Inc.

Disclosed are systems and methods in which computing processes execute machine-learning architectures that extract vectors representing disparate types of data and output predicted identities of users accessing computing services. The prediction works without express identity assertions, across multiple computing services, on data from multiple modalities and various user devices, and is agnostic to the architectures hosting the disparate computing services. The system invokes the identification operations of the machine-learning architecture, which extracts biometric embeddings from biometric data and context embeddings representing all or most of the types of metadata features analyzed by the system. The context embeddings help identify a subset of potentially matching identities of possible users, which limits the number of biometric-prints the system compares against an inbound biometric embedding for authentication. The types of extracted features originate from multiple modalities, including metadata from data communications, audio signals, and images. In this way, the embodiments apply a multi-modality machine-learning architecture.
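
The two-stage idea in this abstract (context embeddings narrow the candidate set, then the biometric comparison runs only against that subset) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the function names `cosine` and `identify`, the use of cosine similarity, the `shortlist_size` and `threshold` parameters, and the layout of the `enrolled` dictionary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(context_emb, biometric_emb, enrolled, shortlist_size=2, threshold=0.8):
    """Hypothetical two-stage identification.

    enrolled maps identity -> (context_embedding, biometric_print).
    Returns (best_identity_or_None, shortlist).
    """
    # Stage 1: rank enrolled identities by context similarity and keep a shortlist.
    shortlist = sorted(
        enrolled,
        key=lambda uid: cosine(context_emb, enrolled[uid][0]),
        reverse=True,
    )[:shortlist_size]

    # Stage 2: compare the inbound biometric embedding only against the shortlist.
    best_id, best_score = None, threshold
    for uid in shortlist:
        score = cosine(biometric_emb, enrolled[uid][1])
        if score >= best_score:
            best_id, best_score = uid, score
    return best_id, shortlist
```

In this sketch an identity whose biometric print would match, but whose context embedding is dissimilar, never reaches the second stage; that is the sense in which the identity space is limited.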

LIMITING IDENTITY SPACE FOR VOICE BIOMETRIC AUTHENTICATION

20220392453 · 2022-12-08 ·

Pindrop Security, Inc.

Voice command system and voice command method

11521609 · 2022-12-06 ·

Kyocera Corporation

Yumiko Yamamoto

A voice command system according to a first disclosure comprises a gateway apparatus having an interface configured to receive a voice command, and a controller configured to perform a registration process of registering a speaker permitted to receive the voice command. The controller is configured to perform an authentication process of rejecting a reception of the voice command when a speaker of the voice command is not registered, and permitting a reception of the voice command when a speaker of the voice command is registered. The controller is configured to perform the authentication process for each voice command.
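
A minimal sketch of the per-command check described above, assuming a hypothetical upstream step has already mapped the audio to a `speaker_id`. The class name `VoiceCommandGateway` and its methods are illustrative and do not come from the patent.

```python
class VoiceCommandGateway:
    """Hypothetical gateway that authenticates the speaker for every command."""

    def __init__(self):
        self._registered = set()

    def register_speaker(self, speaker_id):
        # Registration process: mark this speaker as permitted.
        self._registered.add(speaker_id)

    def receive(self, speaker_id, command):
        # Authentication process: runs on each command, with no cached session.
        if speaker_id not in self._registered:
            return None  # reception rejected
        return command  # reception permitted
```

Because no session state is kept, a command from an unregistered speaker is rejected even when it immediately follows an accepted command from a registered one.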

System and method for efficient processing of universal background models for speaker recognition

11521622 · 2022-12-06 ·

Illuma Labs Inc.

Milind Borkar

A system and method for efficient universal background model (UBM) training for speaker recognition, including: receiving an audio input, divisible into a plurality of audio frames, wherein at least a first audio frame of the plurality of audio frames includes an audio sample having a length above a first threshold; extracting at least one identifying feature from the first audio frame and generating a feature vector based on the at least one identifying feature; generating an optimized training sequence computation based on the feature vector and a Gaussian Mixture Model (GMM), wherein the GMM is associated with a plurality of components, wherein each of the plurality of components is defined by a covariance matrix, a mean vector, and a weight vector; and updating any of the associated components of the GMM based on the generated optimized training sequence computation.
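
The abstract does not say what the "optimized training sequence computation" is, so the sketch below shows only the conventional baseline: one expectation-maximization (EM) step that updates the weights, mean vectors, and covariances of a GMM from feature vectors. It assumes diagonal covariances for brevity, and the names `log_gauss_diag`, `em_step`, and `var_floor` are illustrative rather than taken from the patent.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def em_step(frames, weights, means, variances, var_floor=1e-3):
    """One standard EM update of a diagonal GMM over a list of feature vectors."""
    num_comp, dim, num_frames = len(weights), len(frames[0]), len(frames)

    # E-step: posterior responsibility of each component for each frame.
    resp = []
    for x in frames:
        logp = [
            math.log(max(weights[k], 1e-300)) + log_gauss_diag(x, means[k], variances[k])
            for k in range(num_comp)
        ]
        top = max(logp)  # subtract the max for numerical stability
        p = [math.exp(lp - top) for lp in logp]
        total = sum(p)
        resp.append([pk / total for pk in p])

    # M-step: re-estimate weight, mean, and variance of every component.
    new_w, new_m, new_v = [], [], []
    for k in range(num_comp):
        nk = max(sum(r[k] for r in resp), 1e-10)
        mean = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk for d in range(dim)]
        var = [
            max(sum(r[k] * (x[d] - mean[d]) ** 2 for r, x in zip(resp, frames)) / nk, var_floor)
            for d in range(dim)
        ]
        new_w.append(nk / num_frames)
        new_m.append(mean)
        new_v.append(var)
    return new_w, new_m, new_v
```

A UBM is typically trained by repeating such steps over features pooled from many speakers; the variance floor keeps a component from collapsing onto a single frame.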
