Patent classifications
G10L17/10
AUDIO MATCHING METHOD AND RELATED DEVICE
Embodiments of the present application disclose an audio matching method and a related device. The audio matching method includes: obtaining audio data and video data; extracting to-be-recognized audio information from the audio data; extracting lip movement information of N users from the video data, where N is an integer greater than 1; inputting the to-be-recognized audio information and the lip movement information of the N users into a target feature matching model, to obtain a matching degree between the lip movement information of each of the N users and the to-be-recognized audio information; and determining the user whose lip movement information has the highest matching degree as the target user to which the to-be-recognized audio information belongs.
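The final selection step of the method above can be sketched as follows. This is a minimal illustration only: the target feature matching model is assumed to have already produced a matching degree per user, and all names and values are invented for the example.

```python
# Hypothetical sketch: given each user's matching degree between their lip
# movement information and the to-be-recognized audio, the target user is
# simply the one with the highest degree. The matching model itself is mocked.

def select_target_user(match_degrees):
    """match_degrees: dict mapping user id -> matching degree in [0, 1]."""
    if not match_degrees:
        raise ValueError("no users to match against")
    # The user whose lip movements best match the audio is the target speaker.
    return max(match_degrees, key=match_degrees.get)

degrees = {"user_a": 0.12, "user_b": 0.87, "user_c": 0.55}
print(select_target_user(degrees))  # -> user_b
```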
PROTECTION AGAINST VOICE MISAPPROPRIATION IN A VOICE INTERACTION SYSTEM
Prevention of voice misappropriation in voice interaction/response systems. The system relies on telemetry data, including thermal data of components, to determine whether a received voice command was made by an actual voice. If the voice command is determined to have been made by an actual voice, a response to the command is generated and transmitted; otherwise, if the voice command is determined to have likely not been made by an actual voice (e.g., artificial means replicating a voice, such as a laser or the like), no response to the command is transmitted and no action is taken with respect to the command.
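The gating logic described above can be sketched in a few lines. The specific telemetry check is an assumption for illustration: the patent only says thermal data of components is used, so the temperature band and the interpretation of "out of band means artificial" are invented here, not taken from the source.

```python
# Hypothetical liveness gate: assume a component temperature outside an
# expected operating band suggests the "voice" was produced by artificial
# means (e.g., a laser), so no response is transmitted. The band is invented.

EXPECTED_TEMP_RANGE_C = (20.0, 45.0)  # assumed normal operating band

def is_actual_voice(component_temp_c):
    low, high = EXPECTED_TEMP_RANGE_C
    return low <= component_temp_c <= high

def handle_command(command, component_temp_c):
    if is_actual_voice(component_temp_c):
        return f"response to {command!r}"   # generate and transmit a response
    return None                             # no response, no action taken
```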
AUTHENTICATION METHOD
An authentication method. The method comprises comparing user voice data received via an electronic device to a stored voice template to determine a voice authentication parameter. A voice authentication threshold is determined and the voice authentication parameter is compared to the voice authentication threshold to determine whether to authenticate the user. Determining the voice authentication threshold comprises determining a current value of an enrolment counter, then comparing the current value of the enrolment counter to an enrolment counter threshold and determining whether the stored voice template is fully enrolled according to the result. If the stored voice template is fully enrolled, the voice authentication threshold is set to a first voice authentication threshold. If the stored voice template is not fully enrolled then a device attribute received from the electronic device is compared to a stored device attribute. If the received device attribute matches the stored device attribute, the voice authentication threshold is set to a second voice authentication threshold determined by the current value of the enrolment counter. If the received device attribute does not match the stored device attribute, the voice authentication threshold is set to a third voice authentication threshold.
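The threshold-selection procedure above maps directly onto a small decision function. The concrete threshold values, the enrolment counter threshold, and the mapping from counter value to the second threshold are all invented for illustration; the abstract leaves them unspecified.

```python
# Sketch of the voice authentication threshold selection described in the
# abstract. All numeric values are illustrative assumptions.

FIRST_THRESHOLD = 0.80              # used when the template is fully enrolled
THIRD_THRESHOLD = 0.95              # stricter: device attribute mismatch
ENROLMENT_COUNTER_THRESHOLD = 5

def second_threshold(enrolment_counter):
    # Assumed mapping: fewer completed enrolments -> stricter threshold.
    return 0.95 - 0.02 * enrolment_counter

def voice_authentication_threshold(enrolment_counter, received_attr, stored_attr):
    fully_enrolled = enrolment_counter >= ENROLMENT_COUNTER_THRESHOLD
    if fully_enrolled:
        return FIRST_THRESHOLD
    if received_attr == stored_attr:
        return second_threshold(enrolment_counter)
    return THIRD_THRESHOLD

def authenticate(voice_auth_param, enrolment_counter, received_attr, stored_attr):
    threshold = voice_authentication_threshold(
        enrolment_counter, received_attr, stored_attr)
    return voice_auth_param >= threshold
```

Using a stricter (third) threshold on a device-attribute mismatch reflects the intuition that an unfamiliar device warrants more caution while the voice template is still being enrolled.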
ELECTRONIC DEVICE AND SPEAKER VERIFICATION METHOD OF ELECTRONIC DEVICE
An electronic device is provided. The electronic device includes a microphone configured to receive an audio signal including a voice of a user, a sensor configured to detect a vibration signal generated by the user, at least one processor, and a memory configured to store an instruction executable by the processor. The at least one processor may be configured to determine a noise level included in the audio signal, calculate a verification score based on the noise level, the audio signal, and the vibration signal, and perform speaker verification for the user based on the verification score.
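One plausible reading of the score calculation above is noise-adaptive fusion: the noisier the audio, the more weight is given to the vibration (bone-conduction) channel, which is robust to acoustic noise. The weighting curve and threshold below are assumptions for illustration, not taken from the abstract.

```python
# Hypothetical score fusion: weight the vibration channel more heavily as the
# noise level in the audio signal rises. Scores and noise level are in [0, 1].

def fused_verification_score(audio_score, vibration_score, noise_level):
    audio_weight = 1.0 - noise_level   # trust the audio channel less in noise
    vibration_weight = noise_level     # trust the vibration channel more
    total = audio_weight + vibration_weight
    return (audio_weight * audio_score + vibration_weight * vibration_score) / total

def verify_speaker(audio_score, vibration_score, noise_level, threshold=0.7):
    # Speaker verification passes when the fused score clears the threshold.
    return fused_verification_score(audio_score, vibration_score, noise_level) >= threshold
```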
Methods and systems for speech detection
Methods and systems for processing user input to a computing system are disclosed. The computing system has access to an audio input and a visual input such as a camera. Face detection is performed on an image from the visual input, and if a face is detected this triggers the recording of audio and making the audio available to a speech processing function. Further verification steps can be combined with the face detection step for a multi-factor verification of user intent to interact with the system.
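The trigger loop described above can be sketched as follows. The face detector, audio source, and speech processor are stand-ins for real camera and microphone APIs; only the gating structure comes from the abstract.

```python
# Sketch of the gating loop: audio is recorded and handed to the speech
# processing function only when a face is detected in the camera frame.

def process_frames(frames, detect_face, record_audio, speech_processor):
    """frames: iterable of camera images; the other arguments are callables."""
    results = []
    for frame in frames:
        if detect_face(frame):
            audio = record_audio()            # recording triggered by the face
            results.append(speech_processor(audio))
    return results

# Toy run with stub functions standing in for camera, mic, and recognizer.
out = process_frames(
    frames=["face", "empty", "face"],
    detect_face=lambda f: f == "face",
    record_audio=lambda: "audio-chunk",
    speech_processor=lambda a: a.upper(),
)
print(out)  # -> ['AUDIO-CHUNK', 'AUDIO-CHUNK']
```

The further verification steps mentioned in the abstract would slot in as additional predicates alongside `detect_face` for multi-factor verification.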
In-ear liveness detection for voice user interfaces
Introduced here are approaches to authenticating the identity of speakers based on the “liveness” of the input. To prevent spoofing, an authentication platform may establish the likelihood that a voice sample represents a recording of word(s) uttered by a speaker whose identity is to be authenticated and then, based on the likelihood, determine whether to authenticate the speaker.
SYSTEM AND METHOD FOR SPEAKER VERIFICATION
A system for speaker verification is disclosed. An input receiving module receives an input audio-visual segment. An input processing module identifies one or more unlabelled speakers and one or more moments in time associated with each of the one or more unlabelled speakers in the audio-visual segment. An information extraction module extracts audio data representative of the speech signal and visual data representative of facial images respectively. An input transformation module employs a first pre-trained neural network model to transform audio data of each unlabelled speaker into speaker speech space, employs a second pre-trained neural network model to transform visual data of each unlabelled speaker into speaker face space, and trains a third neural network model to match the audio data and the visual data of each unlabelled speaker with names of the labelled speakers obtained from prestored datasets. A speaker identification module identifies each unlabelled speaker with a corresponding name and estimates a confidence level for the identification of each unlabelled speaker from the audio-visual segment.
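The final identification step can be sketched once the first two networks have produced audio and face embeddings. Cosine similarity and equal weighting of the two modalities are assumptions for illustration; the abstract does not specify how the third model scores matches.

```python
# Hypothetical sketch: match an unlabelled speaker's audio and face embeddings
# against labelled reference embeddings, returning the best name and a
# confidence level. Cosine similarity is an assumed distance measure.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def identify(audio_emb, face_emb, references):
    """references: dict mapping name -> (audio_embedding, face_embedding)."""
    scores = {
        name: 0.5 * cosine(audio_emb, ref_a) + 0.5 * cosine(face_emb, ref_f)
        for name, (ref_a, ref_f) in references.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]   # identified name and its confidence level
```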
Methods and systems for processing audio signals containing speech data
Methods and systems for processing audio signals containing speech data are disclosed. Biometric data associated with at least one speaker are extracted from an audio input. A correspondence is determined between the extracted biometric data and stored biometric data associated with a consenting user profile, where a consenting user profile is a user profile that indicates consent to store biometric data. If no correspondence is determined, the speech data is discarded, optionally after having been processed.
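The consent check above reduces to a small filter. The profile layout and the biometric matching function are placeholders for illustration; only the keep-or-discard structure comes from the abstract.

```python
# Sketch of the consent gate: extracted biometric data is kept only if it
# corresponds to a stored profile that has recorded consent; otherwise the
# speech data is discarded.

def handle_speech(extracted_biometric, profiles, matches):
    """profiles: list of dicts with 'biometric' and 'consented' keys;
    matches: callable comparing two biometric records."""
    for profile in profiles:
        if profile["consented"] and matches(extracted_biometric, profile["biometric"]):
            return "stored"       # correspondence with a consenting profile
    return "discarded"            # no consenting match: discard the speech data
```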