G10L17/06

Systems and methods for determining traits based on voice analysis

Systems and methods are provided herein for determining one or more traits of a speaker based on voice analysis, to present a content item to the speaker. In one example, the method receives a voice query and determines whether the voice query matches, within a first confidence threshold, a speaker identification (ID) among a plurality of speaker IDs stored in a speaker profile. In response to determining that the voice query matches the speaker ID within the first confidence threshold, the method bypasses a trait prediction engine and retrieves a trait among a plurality of traits in the speaker profile associated with the matched speaker ID. The method further provides a content item based on the retrieved trait.
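The flow described in this abstract can be sketched as follows. This is an illustrative outline, not the patented implementation: the threshold value, the cosine-similarity comparison, and the `resolve_trait`/`predict_trait` names are all assumptions introduced here.

```python
FIRST_CONFIDENCE_THRESHOLD = 0.85  # assumed value, not from the abstract

def cosine_similarity(a, b):
    """Similarity between two voice embeddings (assumed comparison metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def resolve_trait(query_embedding, speaker_profiles, predict_trait):
    """speaker_profiles: {speaker_id: {"embedding": [...], "traits": {...}}}.

    If the query matches a stored speaker ID within the first confidence
    threshold, bypass the trait prediction engine and retrieve the stored
    traits; otherwise fall back to predicting traits from the voice query.
    """
    best_id, best_score = None, 0.0
    for speaker_id, profile in speaker_profiles.items():
        score = cosine_similarity(query_embedding, profile["embedding"])
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_id is not None and best_score >= FIRST_CONFIDENCE_THRESHOLD:
        return speaker_profiles[best_id]["traits"]  # prediction engine bypassed
    return predict_trait(query_embedding)
```

A matched query returns the cached traits directly; only unmatched queries incur the cost of running the trait prediction engine.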

Machine learning for improving quality of voice biometrics

Methods and systems are disclosed herein for improving the quality of audio for use in a biometric. A biometric system may use machine learning to determine whether audio or a portion of the audio should be used as a biometric for a user. A sample of the user's voice may be used to generate a voice signature of the user. Portions of the audio that do not meet a similarity threshold when compared with the voice signature may be removed from the audio. Additionally or alternatively, interfering noises may be detected and removed from the audio to improve the quality of a voice biometric generated from the audio.
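The frame-filtering step described above can be sketched as below. This is a minimal illustration under assumptions of my own: the threshold value, per-frame feature vectors, and cosine similarity as the comparison are not specified by the abstract.

```python
SIMILARITY_THRESHOLD = 0.7  # assumed value, not from the abstract

def cosine_similarity(a, b):
    """Similarity between a frame's features and the voice signature."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def filter_audio_frames(frames, voice_signature):
    """Keep only the audio frames similar enough to the user's voice
    signature; dissimilar portions (other speakers, interfering noise)
    are removed before the biometric is generated."""
    return [f for f in frames
            if cosine_similarity(f, voice_signature) >= SIMILARITY_THRESHOLD]
```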

SYSTEM AND METHOD FOR SPEAKER VERIFICATION
20230215440 · 2023-07-06 ·

A system for speaker verification is disclosed. An input receiving module receives an input audio-visual segment. An input processing module identifies one or more unlabelled speakers and one or more moments in time associated with each of the one or more unlabelled speakers in the audio-visual segment. An information extraction module extracts audio data representative of a speech signal and visual data representative of facial images, respectively. An input transformation module employs a first pre-trained neural network model to transform the audio data of each unlabelled speaker into a speaker speech space, employs a second pre-trained neural network model to transform the visual data of each unlabelled speaker into a speaker face space, and trains a third neural network model to match the audio data and the visual data of each unlabelled speaker with names of labelled speakers obtained from prestored datasets. A speaker identification module identifies each unlabelled speaker with a corresponding name and estimates a confidence level corresponding to the identification of each unlabelled speaker from the audio-visual segment.
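The final matching step can be sketched as a nearest-neighbour lookup in the two embedding spaces. This is only a stand-in for the trained third model: the equal weighting of the speech-space and face-space similarities, and the `identify_speaker` name, are assumptions introduced for illustration.

```python
def cosine(a, b):
    """Similarity in an embedding space (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def identify_speaker(audio_emb, face_emb, labelled):
    """labelled: {name: (audio_embedding, face_embedding)} from the
    prestored datasets. Returns the best-matching name and a combined
    similarity score usable as a confidence level."""
    best_name, best_score = None, -1.0
    for name, (a_ref, f_ref) in labelled.items():
        # Equal weighting of speech-space and face-space similarity (assumed).
        score = 0.5 * cosine(audio_emb, a_ref) + 0.5 * cosine(face_emb, f_ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```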

SYSTEM AND METHOD OF SPEAKER REIDENTIFICATION IN A MULTIPLE CAMERA SETTING CONFERENCE ROOM
20230216988 · 2023-07-06 ·

In a multi-camera videoconferencing configuration, the locations of each camera are known. By referencing a known object visible to each camera, a 3D coordinate system is developed, with the position and angle of each camera being associated with that 3D coordinate system. The locations of the conference participants in the 3D coordinate system are determined for each camera. Sound source localization (SSL) from one camera, generally a central camera, is used to determine the speaker. The pose of the speaker is then determined. From the pose and the known locations of the cameras, the camera with the best frontal view of the speaker is determined. The 3D coordinates of the speaker are then used to direct the determined camera to frame the speaker. If the face of the speaker is not sufficiently visible, the next best camera view is determined, and the speaker is framed from that camera view.
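The camera-selection step can be sketched geometrically: given the speaker's 3D position and a facing direction derived from pose estimation, the most frontal camera is the one whose direction from the speaker makes the smallest angle with the facing direction. This is an illustrative reduction under that assumption, not the patented method.

```python
import math

def best_frontal_camera(speaker_pos, facing_dir, cameras):
    """cameras: {camera_id: (x, y, z)} in the shared 3D coordinate system.
    Returns the camera whose view of the speaker is most frontal, i.e. the
    camera most nearly in line with the speaker's facing direction."""
    def angle_to(cam_pos):
        to_cam = tuple(c - s for c, s in zip(cam_pos, speaker_pos))
        dot = sum(a * b for a, b in zip(facing_dir, to_cam))
        norm = (sum(a * a for a in facing_dir) ** 0.5) * \
               (sum(a * a for a in to_cam) ** 0.5)
        # Clamp to guard against floating-point drift outside [-1, 1].
        return math.acos(max(-1.0, min(1.0, dot / norm)))
    return min(cameras, key=lambda cid: angle_to(cameras[cid]))
```

Sorting cameras by this angle also yields the "next best" view to fall back to when the speaker's face is insufficiently visible from the first choice.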

Voice input authentication device and method

Provided are a method of authenticating a voice input provided from a user and a method of detecting a voice input having a strong attack tendency. The voice input authentication method includes: receiving the voice input; obtaining, from the voice input, signal characteristic data representing signal characteristics of the voice input; and authenticating the voice input by applying the obtained signal characteristic data to a first learning model configured to determine an attribute of the voice input, wherein the first learning model is trained to determine the attribute of the voice input based on a voice uttered by a person and a voice output by an apparatus.
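The first learning model's inference step can be sketched as a binary classifier over the signal characteristic data, scoring how likely the input is a voice uttered by a person rather than output by an apparatus. The logistic form, feature weights, and 0.5 threshold below are assumptions for illustration only; the abstract does not specify the model family.

```python
import math

def logistic_score(features, weights, bias):
    """P(human-uttered) under an assumed logistic model over the
    signal characteristic data extracted from the voice input."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def authenticate_voice_input(features, weights, bias, threshold=0.5):
    """Accept the voice input only when the model scores it as a voice
    uttered by a person rather than output by an apparatus (e.g. a
    replayed or synthesized attack)."""
    return logistic_score(features, weights, bias) >= threshold
```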
