Patent classification: G10L17/02
Transcription System with Contextual Automatic Speech Recognition
An automated speech recognition (“ASR”) system with an audio processing engine and a contextual transcription engine on a computing device is provided. The audio processing engine determines audio segmentation corresponding to multiple identified speakers in the audio data. The contextual transcription engine generates a text file based on the audio data as a legally formatted transcript using one or more AI/ML models. Embodiments of the ASR system provide results that comply with most stenographic standards for legal transcription out of the box, without further setup or tuning.
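A minimal sketch of the two-engine pipeline the abstract describes: speaker-attributed segments produced by the audio processing engine are rendered into a legal-style transcript by the transcription step. The `Segment` type, the speaker labels, and the Q&A formatting convention are illustrative assumptions, not the patented implementation.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str   # identified speaker label, e.g. "ATTORNEY"
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    text: str      # ASR hypothesis for this segment

def format_legal_transcript(segments: list[Segment]) -> str:
    """Render speaker-attributed segments in a Q&A style, one common
    convention in stenographic legal transcription."""
    lines = []
    for seg in segments:
        prefix = {"ATTORNEY": "Q.", "WITNESS": "A."}.get(seg.speaker, seg.speaker + ":")
        lines.append(f"{prefix}  {seg.text}")
    return "\n".join(lines)

segments = [
    Segment("ATTORNEY", 0.0, 2.1, "Please state your name for the record."),
    Segment("WITNESS", 2.3, 4.0, "Jane Doe."),
]
transcript = format_legal_transcript(segments)
```

In a full system the segment boundaries and speaker labels would come from a diarization model and the text from the ASR decoder; here they are hard-coded to show the data flow.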
Analysis and matching of voice signals
Methods for detecting fraud include receiving a plurality of call interactions; extracting a voice print of a caller from each of the call interactions; determining which call interactions are associated with a single caller by comparing and matching pairs of voice prints of the call interactions; organizing the call interactions associated with a single caller into a group; and determining that a matching phrase was spoken by the single caller in a first call interaction and second call interaction in the group.
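The grouping step above can be sketched as pairwise voice-print matching followed by transitive grouping (union-find), then a check for a phrase repeated by the same caller across calls in a group. The cosine-similarity metric and the 0.9 threshold are assumptions; the abstract does not specify the matching function.

```python
import itertools

def cosine(a, b):
    """Cosine similarity between two voice-print vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def group_calls(voiceprints, threshold=0.9):
    """Compare all pairs of voice prints; calls that match are merged
    (union-find) into groups assumed to share a single caller."""
    parent = list(range(len(voiceprints)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in itertools.combinations(range(len(voiceprints)), 2):
        if cosine(voiceprints[i], voiceprints[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(voiceprints)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def repeated_phrases(transcripts, group):
    """Phrases the single caller spoke in more than one call of a group."""
    seen = {}
    for call in group:
        for phrase in transcripts[call]:
            seen.setdefault(phrase, set()).add(call)
    return [p for p, calls in seen.items() if len(calls) > 1]
```

A matching phrase recurring across a group's calls (e.g. a rehearsed pretext) is the fraud signal the claim describes.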
Method for reduced computation of T-matrix training for speaker recognition
A system and method for improving T-matrix training for speaker recognition, comprising receiving an audio input, divisible into a plurality of audio frames including at least an audio sample of a human speaker; generating for each audio frame a feature vector; generating for a first plurality of feature vectors centered statistics of at least a zero order and a first order; generating a first i-vector, the first i-vector representing the human speaker; and generating an optimized T-matrix training sequence computation, based on at least the first i-vector.
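The statistics and the i-vector step can be sketched in scalar form. Real systems accumulate the zeroth- and first-order (Baum-Welch) statistics against a many-component UBM and extract a high-dimensional i-vector through a tall T matrix; here a single "component" with posterior 1 and a one-dimensional i-vector keep the algebra readable. This is a didactic simplification, not the patented training optimization.

```python
def centered_stats(frames, mean):
    """Zeroth-order stat N (soft frame count) and centered first-order
    stat F, for a single-Gaussian 'UBM' whose posterior is 1 per frame."""
    N = float(len(frames))
    F = [sum(f[d] - mean[d] for f in frames) for d in range(len(mean))]
    return N, F

def ivector(N, F, T, sigma):
    """1-D i-vector: w = (1 + T' Sigma^-1 N T)^-1 * T' Sigma^-1 F,
    with T a column vector and sigma the diagonal UBM covariance."""
    precision = 1.0 + N * sum(t * t / s for t, s in zip(T, sigma))
    proj = sum(t * f / s for t, f, s in zip(T, F, sigma))
    return proj / precision

N, F = centered_stats([[1.0, 2.0], [2.0, 4.0]], mean=[1.0, 2.0])
w = ivector(N, F, T=[1.0, 1.0], sigma=[1.0, 1.0])
```

The i-vector `w` is the low-dimensional representation of the speaker that the T-matrix training loop iteratively refines.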
Authentication method, authentication device, electronic device and storage medium
The present disclosure provides an authentication method, an authentication device, an electronic device and a storage medium. The authentication method includes: receiving target voice data; obtaining a first voiceprint feature parameter corresponding to the target voice data from a device voiceprint model library; performing a first encryption process on the first voiceprint feature parameter with a locally stored private key to generate to-be-verified data; transmitting the to-be-verified data to a server, so that the server uses a public key which matches the private key to decrypt the to-be-verified data to obtain the first voiceprint feature parameter, and performs authentication on the first voiceprint feature parameter to obtain an authentication result; receiving the authentication result returned by the server.
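The claimed exchange has the shape of a sign-then-verify protocol: the device encrypts the voiceprint feature with its private key, and the server recovers it with the matching public key before authenticating. The sketch below models only that protocol shape; the reversible XOR stands in for a real asymmetric primitive (e.g. RSA from the `cryptography` package), so the "keys" here are identical and purely illustrative.

```python
from dataclasses import dataclass

def _xor(data: bytes, key: bytes) -> bytes:
    # Placeholder for asymmetric encrypt/decrypt: XOR is reversible,
    # so applying it with the "matching" key recovers the plaintext.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

@dataclass
class Device:
    private_key: bytes
    voiceprint_library: dict   # device voiceprint model library

    def make_request(self, voice_id: str) -> bytes:
        """Encrypt the first voiceprint feature parameter to produce
        the to-be-verified data sent to the server."""
        feature = self.voiceprint_library[voice_id]
        return _xor(feature, self.private_key)

@dataclass
class Server:
    public_key: bytes
    enrolled: dict   # claimed identity -> enrolled voiceprint feature

    def authenticate(self, blob: bytes, claimed_id: str) -> bool:
        """Decrypt with the matching public key, then authenticate the
        recovered voiceprint feature parameter."""
        feature = _xor(blob, self.public_key)
        return self.enrolled.get(claimed_id) == feature
```

Because the signing step binds the voiceprint to the device's private key, a server-side match proves both *who spoke* and *which device* sent the request.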
Shared speech processing network for multiple speech applications
A device to process speech includes a speech processing network that includes an input configured to receive audio data corresponding to audio captured by one or more microphones. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules. A first speech application module corresponds to a speaker verifier, and a second speech application module corresponds to a speech recognition network.
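The architecture above amounts to computing one shared network output and fanning it out to multiple application modules. In this sketch the shared layers are stood in for by a mean-pooling step, and the speaker verifier and speech recognizer are toy decision rules; only the compute-once, consume-many structure follows the abstract.

```python
from typing import Callable

def shared_network(audio_frames: list[list[float]]) -> list[float]:
    """Stand-in for the shared network layers: mean-pool frame features
    into one embedding that every application module consumes."""
    dim = len(audio_frames[0])
    return [sum(f[d] for f in audio_frames) / len(audio_frames) for d in range(dim)]

def run_applications(audio_frames, modules: dict[str, Callable]):
    """Run the shared network once, then feed its output as the common
    input to each attached speech application module."""
    embedding = shared_network(audio_frames)   # computed a single time
    return {name: module(embedding) for name, module in modules.items()}

results = run_applications(
    [[0.0, 1.0], [1.0, 0.0]],
    {
        "speaker_verifier": lambda e: e[0] > 0.3,               # toy rule
        "speech_recognizer": lambda e: "hello" if e[1] > 0.3 else "",
    },
)
```

The practical payoff is that the (expensive) shared layers run once per audio buffer regardless of how many application modules are attached.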
System and method for quantifying meeting effectiveness using natural language processing
Systems, methods, and computer-readable storage media for quantifying meeting effectiveness for an individual. A system configured as disclosed herein uses data from multiple meetings in which a user participated to create a user profile for the user. The system then receives data related to a new meeting in which the user participated, processes the new meeting data into segments using natural language processing, tags the resulting segments based on contexts, and compares the tagged segments to the user profile to generate a meeting effectiveness score for the new meeting which is specific to the user. The system can use machine learning to iteratively improve an ability of the system to generate the tagged segments using historical meeting data and updating that historical meeting data with each iteration of scoring a meeting's effectiveness.
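The scoring loop can be sketched as: segment the meeting transcript, tag segments by context, and compare the tags against the user's weighted profile. The keyword-to-tag table and the weighted-overlap score below are deliberate simplifications of the NLP segmentation and machine-learned models the abstract describes.

```python
CONTEXT_KEYWORDS = {          # assumed keyword -> context tag mapping
    "decision": "decisions",
    "action": "action_items",
    "update": "status_updates",
}

def tag_segments(transcript: str) -> list[str]:
    """Split the transcript into sentence-level segments and tag each
    one by the contexts it mentions."""
    tags = []
    for sentence in transcript.lower().split("."):
        for keyword, tag in CONTEXT_KEYWORDS.items():
            if keyword in sentence:
                tags.append(tag)
    return tags

def effectiveness_score(tags, user_profile: dict) -> float:
    """Profile-weighted fraction of the contexts this user historically
    values that actually appeared in the meeting."""
    total = sum(user_profile.values())
    hit = sum(w for tag, w in user_profile.items() if tag in tags)
    return hit / total if total else 0.0

profile = {"decisions": 2, "action_items": 1, "status_updates": 1}
tags = tag_segments("We made a decision. Next action is assigned.")
score = effectiveness_score(tags, profile)
```

Because the profile is per-user, the same meeting can score differently for different participants, which is the point of the claim.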
Machine learning for improving quality of voice biometrics
Methods and systems are disclosed herein for improving the quality of audio for use in a biometric. A biometric system may use machine learning to determine whether audio or a portion of the audio should be used as a biometric for a user. A sample of the user's voice may be used to generate a voice signature of the user. Portions of the audio that do not meet a similarity threshold when compared with the voice signature may be removed from the audio. Additionally or alternatively, interfering noises may be detected and removed from the audio to improve the quality of a voice biometric generated from the audio.
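The filtering step described above can be sketched as a similarity gate: portions of audio whose embedding falls below a threshold against the user's voice signature are dropped before the biometric is built. The 2-D embeddings and the 0.8 threshold are toy assumptions; a real system would use a learned speaker-embedding model.

```python
def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def filter_audio(portions, voice_signature, threshold=0.8):
    """Keep only portions meeting the similarity threshold against the
    user's voice signature; the rest (other speakers, interfering
    noises) are removed before the voice biometric is generated."""
    return [p for p in portions
            if cosine(p["embedding"], voice_signature) >= threshold]

signature = [1.0, 0.0]                       # from a sample of the user's voice
portions = [
    {"id": "p1", "embedding": [0.9, 0.1]},   # sounds like the user -> kept
    {"id": "p2", "embedding": [0.0, 1.0]},   # interfering speaker -> removed
]
kept = filter_audio(portions, signature)
```

Only the kept portions feed the downstream biometric, which is how the method improves the quality of the resulting voice biometric.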