Patent classifications
G10L17/26
Detection of liveness
Detecting a replay attack on a voice biometrics system comprises: receiving a speech signal from a voice source; generating and transmitting an ultrasound signal through a transducer of the device; detecting a reflection of the transmitted ultrasound signal; detecting Doppler shifts in the reflection of the generated ultrasound signal; and identifying whether the received speech signal is indicative of liveness of a speaker based on the detected Doppler shifts. The method further comprises: obtaining information about a position of the device; and adapting the generating and transmitting of the ultrasound signal based on the information about the position of the device.
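The Doppler-based liveness check described above can be sketched as follows. This is a minimal illustration, not the patented method: the sample rate, carrier frequency, search band, and shift threshold are all hypothetical choices, and `doppler_shift_hz` simply compares the dominant spectral peak of the reflection against the transmitted tone.

```python
# Hedged sketch of Doppler-shift liveness detection, assuming a mono
# capture of the reflected ultrasound tone (all constants illustrative).
import numpy as np

FS = 96_000          # sample rate high enough to carry ultrasound
F_TX = 21_000        # transmitted ultrasound tone (Hz)

def doppler_shift_hz(reflection: np.ndarray, fs: int = FS,
                     f_tx: float = F_TX) -> float:
    """Estimate the Doppler shift as the offset between the dominant
    spectral peak of the reflection and the transmitted frequency."""
    spectrum = np.abs(np.fft.rfft(reflection * np.hanning(len(reflection))))
    freqs = np.fft.rfftfreq(len(reflection), d=1.0 / fs)
    # Search only near the carrier to ignore audible-band speech energy.
    band = (freqs > f_tx - 500) & (freqs < f_tx + 500)
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak - f_tx

def is_live(reflection: np.ndarray, min_shift_hz: float = 5.0) -> bool:
    # A moving mouth/face produces a nonzero Doppler shift in the
    # reflection; a static loudspeaker replaying audio does not.
    return abs(doppler_shift_hz(reflection)) >= min_shift_hz
```

A real implementation would additionally adapt the transmitted signal to the device position, as the abstract describes; that step is omitted here.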
LEARNING APPARATUS, ESTIMATION APPARATUS, METHODS AND PROGRAMS FOR THE SAME
A learning apparatus includes: a speaker vector learning unit configured to learn a speaker vector extraction parameter λ based on one or more items of learning speech voice data in a speaker vector voice database; a non-speaker-individuality sound model learning unit configured to create a probability distribution model using a frequency component of one or more items of non-speaker-individuality sound data in a non-speaker-individuality sound database and calculate internal parameters μ and Σ of the probability distribution model; and an age level estimation model learning unit configured to extract a speaker vector from voice data in an age level estimation model-learning voice database using the speaker vector extraction parameter λ, calculate a non-speaker-individuality sound likelihood vector from voice data in the age level estimation model-learning voice database using the internal parameters μ and Σ, and learn, with input of the speaker vector and the non-speaker-individuality sound likelihood vector, a parameter Ω of an age level estimation model that outputs an estimated value of an age level of a corresponding speaker.
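The age-model input construction above can be illustrated as follows. This is an assumption-laden sketch, not the claimed apparatus: it models each non-speaker-individuality sound class as a diagonal-covariance Gaussian with parameters μ and Σ, computes one average log-likelihood per class over the feature frames, and concatenates that likelihood vector with the speaker vector (the feature shapes and helper names are hypothetical).

```python
# Sketch of forming the age-estimation-model input from a speaker vector
# and a non-speaker-individuality sound likelihood vector, assuming
# diagonal Gaussians with parameters mu, Sigma (names hypothetical).
import numpy as np

def gaussian_log_likelihood(x, mu, sigma_diag):
    """Log-density of frame x under a diagonal-covariance Gaussian."""
    d = len(mu)
    diff = x - mu
    return -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(sigma_diag))
                   + np.sum(diff ** 2 / sigma_diag))

def age_model_input(speaker_vec, frames, gaussians):
    """Concatenate the speaker vector with one likelihood per
    non-speaker-individuality sound class (averaged over frames)."""
    lik_vec = np.array([
        np.mean([gaussian_log_likelihood(f, mu, sig) for f in frames])
        for mu, sig in gaussians
    ])
    return np.concatenate([speaker_vec, lik_vec])
```

The combined vector would then be fed to whatever estimator carries the parameter Ω; that model is not sketched here.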
SPEAKER IDENTIFICATION METHOD, SPEAKER IDENTIFICATION DEVICE, NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM STORING SPEAKER IDENTIFICATION PROGRAM, SEX IDENTIFICATION MODEL GENERATION METHOD, AND SPEAKER IDENTIFICATION MODEL GENERATION METHOD
A speaker identification device acquires identification target voice data; acquires registered voice data; selects a first speaker identification model machine-learned using male voice data to identify a male speaker in a case where one of a sex of a speaker of the identification target voice data and a sex of a speaker of the registered voice data is male, and selects a second speaker identification model machine-learned using female voice data to identify a female speaker in a case where one of a sex of the speaker of the identification target voice data and a sex of the speaker of the registered voice data is female; and inputs a feature amount of the identification target voice data and a feature amount of the registered voice data to one of the selected first speaker identification model and second speaker identification model to identify the speaker of the identification target voice data.
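The sex-conditioned model routing can be reduced to a small dispatcher. This is a minimal sketch under the assumption that each identification model is a callable scoring a pair of feature amounts; the interface is hypothetical, not the patented device.

```python
# Minimal sketch of sex-conditioned model selection: route the feature
# pair to the model machine-learned on the matching sex.
def identify_speaker(target_feat, registered_feat, sex,
                     male_model, female_model):
    """Select the speaker identification model trained on the given sex,
    then score the (identification target, registered) feature pair."""
    model = male_model if sex == "male" else female_model
    return model(target_feat, registered_feat)
```

In use, `male_model` and `female_model` would be pre-trained comparators; simple stand-ins (e.g. an equality or cosine-similarity check) suffice to exercise the routing.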
AUDIO ANALYSIS OF BODY WORN CAMERA
Apparatus, systems, and methods for machine natural language processing to analyze language are provided. One exemplary method for transcribing audio from camera footage includes extracting at least one audio segment from a body camera video track, detecting voice activity to identify starting and ending timestamps of voice, transcribing the at least one audio segment to identify and separate audio of at least a first speaker, and scoring the audio of the first speaker to identify interactions of interest. Audio can be analyzed and scored to record verbal performance, respectfulness, wellness, etc., and speakers can be detected from the audio.
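The voice-activity step of the pipeline above, finding starting and ending timestamps, can be sketched with a simple energy detector. This is an illustrative stand-in, not the patented method: the frame length and RMS threshold are arbitrary assumptions, and real systems use far more robust VAD models.

```python
# Energy-based voice activity detection sketch: return (start_s, end_s)
# spans where per-frame RMS exceeds a threshold (values illustrative).
import numpy as np

def voice_activity_spans(audio, fs, frame_ms=20, threshold=0.01):
    """Find start/end timestamps (seconds) of voiced regions."""
    n = int(fs * frame_ms / 1000)
    frames = audio[: len(audio) // n * n].reshape(-1, n)
    active = np.sqrt(np.mean(frames ** 2, axis=1)) > threshold
    spans, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            spans.append((start * n / fs, i * n / fs))
            start = None
    if start is not None:
        spans.append((start * n / fs, len(active) * n / fs))
    return spans
```

The resulting spans would feed the transcription and speaker-separation stages, which require dedicated speech models and are not sketched here.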
Analysis and matching of voice signals
Methods for detecting fraud include receiving a plurality of call interactions; extracting a voice print of a caller from each of the call interactions; determining which call interactions are associated with a single caller by comparing and matching pairs of voice prints of the call interactions; organizing the call interactions associated with a single caller into a group; and determining that a matching phrase was spoken by the single caller in a first call interaction and second call interaction in the group.
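The grouping step, determining which call interactions belong to a single caller by comparing voice prints pairwise, can be sketched as below. The cosine measure, the 0.9 threshold, and the greedy single-linkage strategy are illustrative assumptions, not the claimed method.

```python
# Sketch of grouping calls by voice-print similarity (threshold and
# cosine measure are illustrative choices).
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_calls(voice_prints, threshold=0.9):
    """Greedy grouping: each call joins the first group whose
    representative voice print matches above the threshold."""
    groups = []          # list of (representative_print, [call indices])
    for i, vp in enumerate(voice_prints):
        for rep, members in groups:
            if cosine(vp, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append((vp, [i]))
    return [members for _, members in groups]
```

Detecting a matching phrase across two calls in a group would then operate on the transcripts of the grouped interactions, which this sketch does not cover.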
Systems and methods for determining traits based on voice analysis
Systems and methods are provided herein for determining one or more traits of a speaker based on voice analysis to present a content item to the speaker. In one example, the method receives a voice query and determines whether the voice query matches within a first confidence threshold of a speaker identification (ID) among a plurality of speaker IDs stored in a speaker profile. In response to determining that the voice query matches the speaker ID within the first confidence threshold, the method bypasses a trait prediction engine and retrieves a trait among the plurality of traits in the speaker profile associated with the matched speaker ID. The method further provides a content item based on the retrieved trait.
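The bypass logic can be sketched as a lookup-with-fallback. This is a hypothetical rendering: the profile structure, the cosine score, and the 0.8 threshold are assumptions, and `predict_trait` stands in for the trait prediction engine.

```python
# Sketch of the first-confidence-threshold bypass: if the voice query
# matches a stored speaker ID closely enough, skip trait prediction and
# reuse the stored trait (profile layout and threshold are hypothetical).
import numpy as np

def resolve_trait(query_vec, profiles, predict_trait, first_threshold=0.8):
    """profiles: {speaker_id: (embedding, trait)}."""
    best_id, best_score = None, -1.0
    for sid, (emb, _trait) in profiles.items():
        score = float(np.dot(query_vec, emb) /
                      (np.linalg.norm(query_vec) * np.linalg.norm(emb)))
        if score > best_score:
            best_id, best_score = sid, score
    if best_score >= first_threshold:
        return profiles[best_id][1]          # bypass: stored trait
    return predict_trait(query_vec)          # fall back to prediction engine
```

The retrieved or predicted trait would then drive content selection, which is application-specific.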
Voice vector framework for authenticating user interactions
There are provided systems and methods for a voice vector framework that authenticates user interactions. A service provider server receives user interaction data having audio data that is associated with an interaction between a user device and the service provider server. The server extracts user attributes from the audio data and obtains user account information associated with the user device. The server selects a classifier that corresponds to a select combination of features based on the user account information and applies the classifier to the user attributes. The server generates a voice vector that includes multiple scores indicating likelihoods that a respective user attribute corresponds to an attribute of the select combination of features. The server compares the voice vector to a baseline vector corresponding to a predetermined combination of features and sends a notification to an agent device with an indication of whether the user device is verified.
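The final comparison step, checking the voice vector of per-attribute likelihood scores against a baseline vector, can be sketched as a simple distance test. The Euclidean measure and the threshold are illustrative assumptions; classifier outputs are taken as given.

```python
# Sketch of comparing a voice vector of per-attribute likelihood scores
# against a baseline vector (distance measure and threshold illustrative).
import numpy as np

def verify(voice_vector, baseline_vector, max_distance=0.5):
    """Verified when the score vectors are close in Euclidean distance."""
    diff = np.asarray(voice_vector) - np.asarray(baseline_vector)
    return bool(np.linalg.norm(diff) <= max_distance)
```

In the described system the result of this comparison would drive the notification sent to the agent device.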