Patent classifications
G10L17/20
INFORMATION TRANSMISSION DEVICE, INFORMATION RECEPTION DEVICE, INFORMATION TRANSMISSION METHOD, RECORDING MEDIUM, AND SYSTEM
An information transmission device according to the present disclosure includes: an acoustic feature calculator that calculates an acoustic feature of a spoken voice; a speaker feature calculator that calculates a speaker feature from the acoustic feature using a deep neural network (DNN), the speaker feature being a feature unique to a speaker of the spoken voice; an analyzer that analyzes condition information indicating a condition to be used in calculating the speaker feature, based on the spoken voice; and an information transmitter that transmits the speaker feature and the condition information to an information reception device that performs speaker recognition processing on the spoken voice, as information to be used by the information reception device to recognize the speaker of the spoken voice.
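The claim describes computing a speaker-unique feature from acoustic features with a DNN, then scoring it on a receiving device. A minimal sketch of that pipeline, in which the network weights, dimensions, and pooling strategy are all placeholders (a real system would use a trained multi-layer network, not random weights):

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_embedding(acoustic_frames, W, b):
    """Map frame-level acoustic features (T x D) to a fixed-length
    speaker feature: mean-pool over time, then one dense layer.
    Stands in for the claimed DNN speaker feature calculator."""
    pooled = acoustic_frames.mean(axis=0)      # (D,) utterance-level statistic
    hidden = np.tanh(W @ pooled + b)           # single dense layer (placeholder)
    return hidden / np.linalg.norm(hidden)     # unit-normalize the embedding

def cosine_score(emb_a, emb_b):
    """Similarity score the receiving device could use to recognize the speaker."""
    return float(emb_a @ emb_b)

# Hypothetical dimensions and untrained weights, for illustration only.
D, E = 40, 16                            # acoustic-feature dim, embedding dim
W = rng.standard_normal((E, D))
b = np.zeros(E)

frames = rng.standard_normal((200, D))   # e.g. 200 frames of MFCC features
emb = speaker_embedding(frames, W, b)
```

The transmitter would send `emb` together with the analyzed condition information; the receiver compares it against enrolled embeddings with `cosine_score`.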
Methods and systems for generating domain-specific text summarizations
Embodiments provide methods and systems for generating a domain-specific text summary. A method performed by a processor includes receiving a request to generate a text summary of textual content from a user device and applying a pre-trained language generation model over the textual content to encode it into word embedding vectors. The method includes predicting the current word of the text summary by iteratively performing: generating a first probability distribution over a first set of words using a first decoder based on the word embedding vectors, generating a second probability distribution over a second set of words using a second decoder based on the word embedding vectors, and ensembling the first and second probability distributions using a configurable weight parameter to determine the current word. The first probability distribution indicates the selection probability of each word being selected as the current word. The method further includes providing a custom reward score as feedback to the second decoder based on a custom reward model, and modifying the second probability distribution of words for the text summary based on the feedback.
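The ensembling step can be sketched as a weighted mixture of the two decoders' next-word distributions. The vocabularies, probabilities, and weight below are invented for illustration:

```python
def ensemble_next_word(p_first, p_second, weight):
    """Blend the two decoders' word distributions with a configurable
    weight parameter and return (chosen word, blended distribution)."""
    vocab = sorted(set(p_first) | set(p_second))
    blended = {w: weight * p_first.get(w, 0.0)
                  + (1.0 - weight) * p_second.get(w, 0.0)
               for w in vocab}
    return max(blended, key=blended.get), blended

# Toy distributions: a generic decoder vs. a reward-tuned domain decoder.
p1 = {"revenue": 0.6, "profit": 0.3, "loss": 0.1}
p2 = {"revenue": 0.2, "profit": 0.7, "loss": 0.1}
word, dist = ensemble_next_word(p1, p2, weight=0.4)
```

With `weight=0.4`, the domain decoder dominates and "profit" is selected; shifting the weight toward 1.0 would favor the generic decoder instead. The custom reward model would adjust `p2` between iterations, which this sketch omits.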
SPEAKER VERIFICATION USING CO-LOCATION INFORMATION
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
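The decision rule in variant (ii) — first speaker model plus a score shared by the co-located device — might be sketched as follows. The scoring function, embeddings, and threshold are hypothetical stand-ins for the claimed speaker models:

```python
import math

def cosine(u, v):
    """Score an utterance embedding against an enrolled speaker model."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

def spoken_by_first_user(first_score, second_score, threshold=0.5):
    """Attribute the utterance to the first user only when the first
    speaker model scores above a threshold AND above the score reported
    for the co-located second user (threshold is illustrative)."""
    return first_score > threshold and first_score > second_score

utterance = [0.9, 0.1, 0.2]
first_model = [1.0, 0.0, 0.2]   # model enrolled on the first device
second_score = 0.31             # score shared by the co-located device

s1 = cosine(utterance, first_model)
accepted = spoken_by_first_user(s1, second_score)
```

Variant (i) differs only in that the first device also receives the second speaker model and computes `second_score` locally.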
Audio data processing method, apparatus and storage medium for detecting wake-up words based on multi-path audio from microphone array
An audio data processing method is provided. The method includes: obtaining multi-path audio data in an environmental space, obtaining a speech data set based on the multi-path audio data, and separately generating, in a plurality of enhancement directions, enhanced speech information corresponding to the speech data set; matching a speech hidden feature in the enhanced speech information with a target matching word, and determining an enhancement direction corresponding to the enhanced speech information having a highest degree of matching with the target matching word as a target audio direction; obtaining speech spectrum features in the enhanced speech information, and obtaining, from the speech spectrum features, a speech spectrum feature in the target audio direction; and performing speech authentication on the speech hidden feature and the speech spectrum feature that are in the target audio direction based on the target matching word, to obtain a target authentication result.
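The direction-selection step — pick the enhancement direction whose enhanced speech best matches the wake word — reduces to an argmax over per-direction match scores. The matcher and token features below are toy stand-ins for the claimed hidden-feature matching:

```python
def match_score(feature, wake_word):
    """Toy stand-in for the hidden-feature/keyword matcher:
    fraction of wake-word tokens present in the decoded feature."""
    hits = sum(1 for t in wake_word if t in feature)
    return hits / len(wake_word)

# Hypothetical enhanced-speech features per enhancement direction (degrees).
enhanced = {0: ["he", "lo"], 90: ["hey", "as", "sis"], 180: ["noise"]}
wake = ["hey", "as", "sis", "tant"]

# Target audio direction = direction with the highest matching degree.
target = max(enhanced, key=lambda d: match_score(enhanced[d], wake))
```

The subsequent authentication step would then combine the hidden feature and the spectrum feature taken only from `target`'s direction.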
Methods and apparatus for obtaining biometric data
A method of modelling speech of a user of a headset comprising a microphone, the method comprising: receiving a first sample, from a bone-conduction sensor, representing bone-conducted speech of the user; obtaining a measure of fundamental frequency of the bone-conducted speech in each of a plurality of speech frames of the first sample; obtaining a first distribution of the fundamental frequencies of the bone-conducted speech over the plurality of speech frames; receiving, from the microphone, a second sample; determining a first acoustic condition at the headset based on the second sample; and performing a biometric process based on the first distribution of fundamental frequencies and the first acoustic condition.
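The per-frame fundamental-frequency distribution can be modelled as a normalized histogram, which a biometric process might compare between enrolment and verification. Bin edges and the L1 comparison are illustrative choices, not taken from the claim:

```python
import numpy as np

def f0_distribution(f0_per_frame, bins=np.arange(50, 401, 25)):
    """Histogram of per-frame fundamental frequencies (Hz), normalized
    to a probability distribution over the bins (edges are illustrative)."""
    hist, _ = np.histogram(f0_per_frame, bins=bins)
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def distribution_distance(p, q):
    """Simple L1 distance for comparing an enrolment distribution
    with a verification-time distribution."""
    return float(np.abs(p - q).sum())

# Synthetic F0 tracks: enrolled user near 120 Hz, impostor near 220 Hz.
enrol   = f0_distribution(np.full(100, 120.0))
genuine = f0_distribution(np.full(50, 118.0))
impostor = f0_distribution(np.full(50, 220.0))
```

In the claimed method the acoustic condition derived from the microphone sample would additionally gate or weight this comparison.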
SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD AND PROGRAM
For example, the accuracy of voice recognition is improved.
A signal processing device includes: a single speech detection unit that detects whether one channel of an input voice signal is a speech of a single speaker; a cluster information updating unit that updates cluster information based on a voice feature quantity when the input voice signal is a speech of a single speaker; a voice segment detection unit that detects a speech segment of a target speaker based on the cluster information; and a voice extraction unit that extracts only the voice signal of the target speaker from a mixed voice signal containing the voice of the target speaker.
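The cluster-information update and target-speaker segment detection can be sketched as a running centroid over single-speaker frames plus a distance test. The feature dimension, threshold, and centroid model are assumptions; the claim does not specify the clustering algorithm:

```python
import numpy as np

class TargetSpeakerCluster:
    """Running estimate of the target speaker's voice feature quantity,
    updated only on frames detected as single-speaker speech."""

    def __init__(self, dim):
        self.centroid = np.zeros(dim)
        self.count = 0

    def update(self, feature):
        """Incremental-mean update of the cluster information."""
        self.count += 1
        self.centroid += (feature - self.centroid) / self.count

    def is_target(self, feature, threshold=1.0):
        """Segment detection: a frame belongs to the target speaker when
        its feature lies close to the cluster centroid (threshold is
        illustrative)."""
        return (self.count > 0
                and np.linalg.norm(feature - self.centroid) < threshold)

cluster = TargetSpeakerCluster(dim=2)
cluster.update(np.array([1.0, 0.0]))   # single-speaker frame
cluster.update(np.array([1.0, 0.0]))   # single-speaker frame
```

The voice extraction unit would then keep only the frames for which `is_target` holds, discarding the rest of the mixed signal.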