Patent classifications
G10L17/06
Methods and apparatus for obtaining biometric data
A method of modelling speech of a user of a headset comprising a microphone, the method comprising: receiving a first sample, from a bone-conduction sensor, representing bone-conducted speech of the user; obtaining a measure of fundamental frequency of the bone-conducted speech in each of a plurality of speech frames of the first sample; obtaining a first distribution of the fundamental frequencies of the bone-conducted speech over the plurality of speech frames; receiving, from the microphone, a second sample; determining a first acoustic condition at the headset based on the second sample; and performing a biometric process based on the first distribution of fundamental frequencies and the first acoustic condition.
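As an illustrative sketch only (not the claimed method), the per-frame fundamental-frequency distribution described above can be approximated with a simple autocorrelation pitch estimator and a histogram; the frame length, voicing threshold, and F0 range below are assumed placeholders.

```python
import numpy as np

def frame_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate fundamental frequency of one frame via autocorrelation.
    Returns None when periodicity is weak (assumed unvoiced)."""
    frame = frame - np.mean(frame)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(corr) - 1)
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    if corr[lag] < 0.3 * corr[0]:        # weak periodicity -> skip frame
        return None
    return sample_rate / lag

def f0_distribution(signal, sample_rate, frame_len=1024, n_bins=20,
                    f0_min=60.0, f0_max=400.0):
    """Histogram of per-frame F0 estimates over the whole sample."""
    f0s = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        f0 = frame_f0(signal[start:start + frame_len], sample_rate,
                      f0_min, f0_max)
        if f0 is not None:
            f0s.append(f0)
    hist, _ = np.histogram(f0s, bins=n_bins, range=(f0_min, f0_max),
                           density=True)
    return hist

# A synthetic stand-in for a bone-conducted sample: a 120 Hz tone.
sr = 8000
t = np.arange(sr) / sr
dist = f0_distribution(np.sin(2 * np.pi * 120.0 * t), sr)
```

The resulting histogram is the "first distribution" that a downstream biometric process would compare against an enrolled speaker profile.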
Systems for authenticating digital contents
A system for authenticating digital contents includes a computing platform having a hardware processor and a memory storing a software code. According to one implementation, the hardware processor executes the software code to receive digital content, identify an image of a person depicted in the digital content, determine an ear shape parameter of the person depicted in the image, determine another biometric parameter of the person depicted in the image, and calculate a ratio of the ear shape parameter of the person depicted in the image to the biometric parameter of the person depicted in the image. The hardware processor is also configured to execute the software code to perform a comparison of the calculated ratio with a predetermined value, and determine whether the person depicted in the image is an authentic depiction of the person based on the comparison of the calculated ratio with the predetermined value.
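The ratio comparison at the core of this claim can be sketched in a few lines. The abstract does not name the second biometric parameter, so the pairing below (ear height vs. iris diameter) and the tolerance are assumptions for illustration.

```python
def authenticate_depiction(ear_height_px, iris_diameter_px,
                           reference_ratio, tolerance=0.05):
    """Compare the measured ear-shape/biometric ratio against a stored
    reference value; within tolerance -> treated as authentic."""
    ratio = ear_height_px / iris_diameter_px
    return abs(ratio - reference_ratio) <= tolerance * reference_ratio

# Genuine depiction: measured ratio ~5.17 vs. enrolled reference 5.2.
print(authenticate_depiction(62.0, 12.0, 5.2))
# Manipulated depiction: rescaled ear drives the ratio to ~4.0.
print(authenticate_depiction(48.0, 12.0, 5.2))
```

Because a ratio of two in-image measurements is scale-invariant, it survives resizing of the whole image, which is presumably why the claim compares a ratio rather than absolute dimensions.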
Adaptive diarization model and user interface
A computing device receives a first audio waveform representing a first utterance and a second utterance. The computing device receives identity data indicating that the first utterance corresponds to a first speaker and the second utterance corresponds to a second speaker. The computing device determines, based on the first utterance, the second utterance, and the identity data, a diarization model configured to distinguish between utterances by the first speaker and utterances by the second speaker. The computing device receives a second audio waveform representing a third utterance, without receiving further identity data indicating a source speaker of the third utterance. The computing device determines, by way of the diarization model and independently of any further identity data, the source speaker of the third utterance. The computing device updates the diarization model based on the third utterance and the determined source speaker.
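The enrol-then-self-update loop above can be sketched with a toy model: one mean embedding per enrolled speaker, nearest-mean classification for unlabeled utterances, and a running-mean update. Real diarization systems use learned speaker embeddings; the vectors and names here are illustrative.

```python
import numpy as np

class SimpleDiarizer:
    """Toy diarization model: mean embedding per speaker, nearest-mean
    assignment, and running-mean updates from newly assigned utterances."""

    def __init__(self):
        self.means = {}    # speaker id -> mean embedding
        self.counts = {}   # speaker id -> number of utterances folded in

    def enroll(self, speaker, embedding):
        """Labeled utterance: identity data supplied alongside the audio."""
        emb = np.asarray(embedding, dtype=float)
        n = self.counts.get(speaker, 0)
        prev = self.means.get(speaker, np.zeros_like(emb))
        self.means[speaker] = (prev * n + emb) / (n + 1)
        self.counts[speaker] = n + 1

    def identify_and_update(self, embedding):
        """Unlabeled utterance: assign the nearest speaker, then fold the
        embedding back into that speaker's mean (the model update)."""
        emb = np.asarray(embedding, dtype=float)
        speaker = min(self.means,
                      key=lambda s: np.linalg.norm(emb - self.means[s]))
        self.enroll(speaker, emb)
        return speaker

d = SimpleDiarizer()
d.enroll("alice", [1.0, 0.0])            # first utterance, labeled
d.enroll("bob", [0.0, 1.0])              # second utterance, labeled
who = d.identify_and_update([0.9, 0.1])  # third utterance, no identity data
```

The third utterance is attributed by the model alone, and the model adapts to it, matching the claim's distinction between supervised enrollment and unsupervised updating.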
Speech endpointing based on word comparisons
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
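The two counts in this claim reduce to a prefix-matching tally over a text corpus. A minimal sketch, with a hypothetical toy corpus and exact word matching standing in for whatever matching the claimed system uses:

```python
def classify_incomplete(transcription, text_samples):
    """First value: samples whose terms match the transcription with no
    additional terms. Second value: samples that match and then continue
    with one or more additional terms. More continuations than exact
    matches -> classify as a likely incomplete utterance."""
    terms = transcription.lower().split()
    exact = 0
    extended = 0
    for sample in text_samples:
        words = sample.lower().split()
        if words[:len(terms)] == terms:
            if len(words) == len(terms):
                exact += 1
            else:
                extended += 1
    return extended > exact

corpus = ["what is", "what is the time",
          "what is the weather", "what time is it"]
print(classify_incomplete("what is", corpus))         # likely incomplete
print(classify_incomplete("what time is it", corpus)) # likely complete
```

An endpointer using this signal would hold the microphone open when the transcription is classified as likely incomplete, rather than cutting the utterance off at the first pause.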
Signal processing device, signal processing method and program
This improves, for example, the accuracy of voice recognition.
A signal processing device includes: a single speech detection unit that detects whether one channel of an input voice signal contains speech of a single speaker; a cluster information updating unit that updates cluster information based on a voice feature quantity when the input voice signal is speech of a single speaker; a voice segment detection unit that detects a speech segment of a target speaker based on the cluster information; and a voice extraction unit that extracts only the voice signal of the target speaker from a mixed voice signal containing the voice of the target speaker.
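The four units in this abstract form a pipeline that can be sketched end to end. The features, variance test, and distance threshold below are placeholder stand-ins for the patent's unspecified feature quantities:

```python
import numpy as np

class TargetSpeakerExtractor:
    """Sketch of the four claimed units: single-speech detection, cluster
    information update, target-segment detection, and voice extraction."""

    def __init__(self, threshold=1.0):
        self.cluster_mean = None   # cluster information for the target
        self.n = 0
        self.threshold = threshold

    def detect_single_speech(self, features):
        """Crude single-speaker test: low variance across frame features
        (an assumed proxy, not the patent's criterion)."""
        return np.var(features, axis=0).mean() < 0.5

    def update_cluster(self, features):
        """Fold features from a single-speaker stretch into the target's
        cluster information (running mean)."""
        mean = features.mean(axis=0)
        if self.cluster_mean is None:
            self.cluster_mean = mean
        else:
            self.cluster_mean = (self.cluster_mean * self.n + mean) / (self.n + 1)
        self.n += 1

    def detect_segments(self, frame_features):
        """Mark frames whose features lie near the target cluster."""
        d = np.linalg.norm(frame_features - self.cluster_mean, axis=1)
        return d < self.threshold

    def extract(self, mixed_frames, frame_features):
        """Keep only the frames attributed to the target speaker."""
        return mixed_frames[self.detect_segments(frame_features)]

ex = TargetSpeakerExtractor()
enrol = np.full((5, 2), 1.0)            # features from a single-speaker stretch
if ex.detect_single_speech(enrol):
    ex.update_cluster(enrol)
mixed_feats = np.array([[1.0, 1.0], [3.0, 3.0], [1.1, 0.9]])
mixed_frames = np.array([0, 1, 2])      # stand-ins for audio frames
kept = ex.extract(mixed_frames, mixed_feats)
```

Only frames whose features sit near the target's cluster survive extraction, which is the mechanism by which single-speaker stretches bootstrap separation of the target from a mixture.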