G10L17/04

Z-vectors: speaker embeddings from raw audio using SincNet, extended CNN architecture and in-network augmentation techniques

Described herein are systems and methods for improved audio analysis using a computer-executed neural network having one or more in-network data augmentation layers. The systems described herein help ease or avoid unwanted strain on computing resources by employing the data augmentation techniques within the layers of the neural network. The in-network data augmentation layers produce various types of simulated audio data when the computer applies the neural network to an input audio signal during a training phase, enrollment phase, and/or testing phase. Subsequent layers of the neural network (e.g., convolutional layer, pooling layer, data augmentation layer) ingest the simulated audio data and the input audio signal and perform various operations.
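
As a rough illustration of the idea above, the following minimal sketch shows one way an augmentation step could live inside the network's forward pass rather than in a separate preprocessing job. The class name `NoiseAugmentationLayer`, the helper `add_noise_at_snr`, the choice of white noise, and the 10 dB default are all assumptions made for illustration; the abstract does not say which augmentation types or parameters are used.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Return `signal` mixed with white noise at the requested SNR (in dB)."""
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10^(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise


class NoiseAugmentationLayer:
    """Hypothetical in-network layer (not from the source).

    It passes the input signal through unchanged and, when enabled,
    also emits one simulated (noisy) copy for later layers to ingest.
    """

    def __init__(self, snr_db=10.0, seed=0):
        self.snr_db = snr_db
        self.rng = np.random.default_rng(seed)
        self.enabled = True  # may be toggled per phase (training, enrollment, testing)

    def __call__(self, signal):
        if not self.enabled:
            return signal[np.newaxis, :]
        simulated = add_noise_at_snr(signal, self.snr_db, self.rng)
        # Row 0 is the input audio signal, row 1 is the simulated audio data.
        return np.stack([signal, simulated])


# Toy input: one second of a 440 Hz tone sampled at 16 kHz.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 440.0 * t)
layer = NoiseAugmentationLayer(snr_db=10.0)
batch = layer(clean)
```

Because the simulated copy is generated on the fly, no augmented audio files need to be written to or read from storage, which is one plausible reading of how the approach eases strain on computing resources.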

SYSTEMS AND METHODS FOR COHERENT AND TIERED VOICE ENROLLMENT

Computer-implemented methods and systems include: enrolling a user at a first security tier, from a plurality of security tiers, based on user risk criteria and call risk criteria applied to one or more historical calls; storing voice calibration information for the enrolled user based on the one or more historical calls; monitoring for a call and receiving data associated with the call, the data having a voice component captured using a microphone; authenticating the call as originating from the enrolled user by matching the voice component to the voice calibration information; granting the enrolled user account access in accordance with the first security tier, during the call, based on the enrollment of the user at the first security tier and the authentication of the call as originating from the enrolled user; and updating the voice calibration information based on the voice component.
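
The authenticate, grant, and update steps listed above can be sketched as follows. This is only an illustration under stated assumptions: the voice component and the calibration information are represented as fixed-length embedding vectors, matching is done by cosine similarity against a threshold, and the update is an exponential moving average. None of these choices, nor the names `EnrolledUser` and `TIER_PERMISSIONS`, come from the abstract.

```python
import numpy as np

# Hypothetical mapping from security tier to permitted account actions.
TIER_PERMISSIONS = {
    1: {"view_balance"},
    2: {"view_balance", "transfer_funds"},
}


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


class EnrolledUser:
    """Hypothetical record of a user enrolled at one security tier."""

    def __init__(self, tier, calibration):
        self.tier = tier
        self.calibration = np.asarray(calibration, dtype=float)

    def authenticate(self, voice_embedding, threshold=0.8, update_rate=0.1):
        """Return the permissions granted for this call (empty set if no match)."""
        voice_embedding = np.asarray(voice_embedding, dtype=float)
        if cosine_similarity(self.calibration, voice_embedding) < threshold:
            return set()
        # Assumed update rule: blend the new voice sample into the stored calibration.
        self.calibration = (
            (1.0 - update_rate) * self.calibration + update_rate * voice_embedding
        )
        return set(TIER_PERMISSIONS[self.tier])


user = EnrolledUser(tier=1, calibration=[1.0, 0.0, 0.0])
granted = user.authenticate([0.9, 0.1, 0.0])  # close to the stored calibration
denied = user.authenticate([0.0, 1.0, 0.0])   # far from the stored calibration
```

In this sketch the calibration is updated only after a successful match, so a rejected caller cannot shift the stored voice profile.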

Digital Monitoring Badge System
20230228832 · 2023-07-20

A wearable badge for an employee that records and transmits audio from client interactions with the professional, comprising two microphones and two microphone channels that focus one microphone on the speech of the employee and the other microphone on the speech of the customer, making diarizing easier. The wearable badge also comprises a module to determine whether or not the employee is maintaining an appropriate social distance with customers.
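
To illustrate why two speaker-focused channels can make diarizing easier, here is a minimal sketch that labels each frame by whichever channel carries more energy. The function name, the frame-energy comparison, and the toy sample values are assumptions for illustration; the abstract does not describe the badge's actual diarization method.

```python
import numpy as np


def label_frames_by_channel_energy(employee_channel, customer_channel, frame_length):
    """Label each frame with the speaker whose microphone channel is louder."""
    n_frames = min(len(employee_channel), len(customer_channel)) // frame_length
    labels = []
    for i in range(n_frames):
        frame = slice(i * frame_length, (i + 1) * frame_length)
        employee_energy = np.sum(np.square(employee_channel[frame]))
        customer_energy = np.sum(np.square(customer_channel[frame]))
        labels.append("employee" if employee_energy >= customer_energy else "customer")
    return labels


# Toy two-channel recording: the employee speaks first, then the customer.
employee_channel = np.array([1.0, 1.0, 0.1, 0.1])
customer_channel = np.array([0.2, 0.2, 0.8, 0.8])
labels = label_frames_by_channel_energy(
    employee_channel, customer_channel, frame_length=2
)
# labels == ["employee", "customer"]
```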

DATA AUGMENTATION SYSTEM AND METHOD FOR MULTI-MICROPHONE SYSTEMS

A method, computer program product, and computing system for obtaining one or more speech signals from a first device, thus defining one or more first device speech signals. One or more speech signals may be obtained from a second device, thus defining one or more second device speech signals. One or more acoustic relative transfer functions mapping reverberation from the one or more first device speech signals to the one or more second device speech signals may be generated. One or more augmented second device speech signals may be generated based upon, at least in part, the one or more acoustic relative transfer functions and first device training data.
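
The mapping described above can be sketched in a few lines. The example below assumes a noise-free toy setting in which the second device's signal is the first device's signal passed through a short filter, estimates that filter by regularized spectral division, and then applies it to first device training data. The function names `estimate_rtf` and `augment`, the regularization constant `eps`, and the four-tap filter are illustrative assumptions; the abstract does not specify how the relative transfer functions are generated.

```python
import numpy as np


def estimate_rtf(first_device, second_device, n_taps, eps=1e-8):
    """Estimate a relative transfer function as a short FIR filter.

    Assumes `second_device` is (approximately) `first_device` convolved with
    the filter, and that it is long enough to hold the full convolution.
    """
    n_fft = len(second_device)
    X1 = np.fft.rfft(first_device, n_fft)   # zero-padded to n_fft samples
    X2 = np.fft.rfft(second_device, n_fft)
    # Regularized spectral division: H = X2 * conj(X1) / (|X1|^2 + eps).
    H = X2 * np.conj(X1) / (np.abs(X1) ** 2 + eps)
    return np.fft.irfft(H, n_fft)[:n_taps]


def augment(first_device_training, rtf):
    """Generate augmented second-device speech by filtering first-device data."""
    return np.convolve(first_device_training, rtf)


rng = np.random.default_rng(0)
true_rtf = np.array([1.0, 0.5, 0.25, 0.125])  # toy filter standing in for reverberation
first = rng.standard_normal(2048)              # stand-in for first device speech
second = np.convolve(first, true_rtf)          # stand-in for second device speech

rtf = estimate_rtf(first, second, n_taps=4)
training = rng.standard_normal(512)            # stand-in for first device training data
augmented = augment(training, rtf)
```

In real recordings the two signals would contain noise and the estimate would typically be averaged over many frames; the single-shot division here is only meant to show the shape of the computation.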

Voice and speech recognition for call center feedback and quality assurance

A computer-implemented method for providing an objective evaluation to a customer service representative regarding his performance during an interaction with a customer may include receiving a digitized data stream corresponding to a spoken conversation between a customer and a representative; converting the data stream to a text stream; generating a representative transcript that includes the words from the text stream that are spoken by the representative; comparing the representative transcript with a plurality of positive words and a plurality of negative words; and generating a score that varies according to the occurrence of each word spoken by the representative that matches one of the positive words, and/or the occurrence of each word spoken by the representative that matches one of the negative words. Tone of voice, as well as response time, during the interaction may also be monitored and analyzed to adjust the score, or generate a separate score.
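
The word-matching step described above can be illustrated with a short sketch. The word lists, the tokenization rule, and the scoring formula (positive matches minus negative matches) are assumptions chosen for illustration; the abstract says only that the score varies with the occurrence of matching words.

```python
import re

# Hypothetical word lists; a real system would use much larger, curated sets.
POSITIVE_WORDS = {"thank", "please", "happy", "glad"}
NEGATIVE_WORDS = {"cannot", "unfortunately", "no"}


def score_transcript(transcript, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Score a representative transcript: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z']+", transcript.lower())
    positive_hits = sum(1 for word in words if word in positive)
    negative_hits = sum(1 for word in words if word in negative)
    return positive_hits - negative_hits


representative_transcript = (
    "Thank you for calling. Unfortunately that plan is unavailable, "
    "but I am happy to help."
)
score = score_transcript(representative_transcript)
# score == 1: two positive matches ("thank", "happy"), one negative ("unfortunately")
```

Tone of voice and response time, which the abstract mentions as further inputs, are not modeled here.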

Enrollment with an automated assistant

Techniques are described herein for dialog-based enrollment of individual users for single- and/or multi-modal recognition by an automated assistant, as well as determining how to respond to a particular user's request based on the particular user being enrolled and/or recognized. Rather than requiring operation of a graphical user interface for individual enrollment, dialog-based enrollment enables users to enroll themselves (or others) by way of a human-to-computer dialog with the automated assistant.