G10L17/18

SPEECH ENHANCEMENT APPARATUS, LEARNING APPARATUS, METHOD AND PROGRAM THEREOF

A mask to enhance speech emitted from a speaker is estimated from an observation signal, the mask is applied to the observation signal, and thereby a post-mask speech signal is acquired. The mask is estimated from a feature obtained by combining a feature for speaker recognition extracted from the observation signal and a feature for generalized mask estimation extracted from the observation signal.
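
The flow in this abstract (combine two features, estimate a mask, apply it to the observation) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the features are combined or what the estimator looks like, so concatenation, the single linear layer with a sigmoid, and all names (`estimate_mask`, `apply_mask`, `weights`, `bias`) are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def estimate_mask(speaker_feat, mask_feat, weights, bias):
    """Estimate one mask value in (0, 1) per frequency bin.

    The two features are combined by concatenation (an assumption), then
    passed through a single hypothetical linear layer with a sigmoid.
    """
    combined = list(speaker_feat) + list(mask_feat)
    return [
        sigmoid(sum(w * x for w, x in zip(row, combined)) + b)
        for row, b in zip(weights, bias)
    ]


def apply_mask(mask, observation):
    """Multiply each bin of the observation by its mask value."""
    return [m * o for m, o in zip(mask, observation)]


# Toy example: 2 frequency bins, 1-dim speaker feature, 2-dim mask feature.
observation = [2.0, 4.0]
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # all-zero weights give a mask of 0.5
bias = [0.0, 0.0]
mask = estimate_mask([0.3], [0.1, 0.7], weights, bias)
enhanced = apply_mask(mask, observation)  # the post-mask speech signal
```

In a real system the weights would be learned and the observation would be a time-frequency representation; the toy values here only show the data flow.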

INFORMATION TRANSMISSION DEVICE, INFORMATION RECEPTION DEVICE, INFORMATION TRANSMISSION METHOD, RECORDING MEDIUM, AND SYSTEM
20230050621 · 2023-02-16

An information transmission device according to the present disclosure includes: an acoustic feature calculator that calculates an acoustic feature of a spoken voice; a speaker feature calculator that calculates a speaker feature from the acoustic feature using a deep neural network (DNN), the speaker feature being a feature unique to a speaker of the spoken voice; an analyzer that analyzes condition information indicating a condition to be used in calculating the speaker feature, based on the spoken voice; and an information transmitter that transmits the speaker feature and the condition information to an information reception device that performs speaker recognition processing on the spoken voice, as information to be used by the information reception device to recognize the speaker of the spoken voice.
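
As a rough sketch of the transmitter side, the speaker feature and the condition information could be bundled into one message and decoded by the reception device. The abstract does not specify a wire format or the contents of the condition information, so the JSON encoding, the `SpeakerPayload` type, and the `noise_level` and `utterance_seconds` keys are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SpeakerPayload:
    speaker_feature: list  # DNN-derived speaker feature (embedding)
    condition: dict        # condition information; keys here are hypothetical


def encode_payload(payload: SpeakerPayload) -> str:
    """Serialize the payload on the transmission device."""
    return json.dumps(asdict(payload))


def decode_payload(message: str) -> SpeakerPayload:
    """Rebuild the payload on the reception device."""
    data = json.loads(message)
    return SpeakerPayload(data["speaker_feature"], data["condition"])


sent = SpeakerPayload(
    [0.25, -0.5, 1.0],
    {"noise_level": "low", "utterance_seconds": 3.5},
)
received = decode_payload(encode_payload(sent))
```

Sending the condition information alongside the feature lets the receiver take into account how the feature was computed when it performs speaker recognition.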

Training method of a speaker identification model based on a first language and a second language

A method is provided for training a speaker identification model that receives voice data as an input and outputs speaker identification information identifying the speaker of an utterance included in the voice data. The training method includes: performing voice quality conversion of first voice data of a first speaker to generate second voice data of a second speaker; and performing training of the speaker identification model using, as training data, the first voice data and the second voice data.
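
The data-preparation step described here can be sketched as below. The voice quality conversion itself is not described in the abstract, so `convert_voice` is a hypothetical placeholder that merely rescales the samples; `build_training_set` and the speaker labels are likewise illustrative names.

```python
def convert_voice(samples, gain=0.5):
    """Hypothetical stand-in for voice quality conversion (just rescales)."""
    return [gain * s for s in samples]


def build_training_set(first_voice_data, first_speaker, second_speaker,
                       convert=convert_voice):
    """Return (samples, speaker_label) pairs: originals plus converted copies."""
    training = [(utt, first_speaker) for utt in first_voice_data]
    training += [(convert(utt), second_speaker) for utt in first_voice_data]
    return training


first_voice_data = [[0.2, 0.4], [1.0, -1.0]]
training_set = build_training_set(first_voice_data, "speaker_A", "speaker_B")
```

The resulting set holds both the first voice data and the generated second voice data, which would then be passed to whatever training routine the speaker identification model uses.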

AUDIO MATCHING METHOD AND RELATED DEVICE

Embodiments of the present application disclose an audio matching method and a related device. The audio matching method includes: obtaining audio data and video data; extracting to-be-recognized audio information from the audio data; extracting lip movement information of N users from the video data, where N is an integer greater than 1; inputting the to-be-recognized audio information and the lip movement information of the N users into a target feature matching model, to obtain a matching degree between the lip movement information of each of the N users and the to-be-recognized audio information; and determining the user whose lip movement information has the highest matching degree as the target user to which the to-be-recognized audio information belongs.
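
The final selection step (pick the user with the highest matching degree) can be illustrated with a small sketch. The target feature matching model is not specified in the abstract, so a plain dot product over hypothetical fixed-length feature vectors stands in for it here; `matching_degree` and `find_target_user` are illustrative names.

```python
def matching_degree(audio_feat, lip_feat):
    """Stand-in score: dot product of two equal-length feature vectors."""
    return sum(a * l for a, l in zip(audio_feat, lip_feat))


def find_target_user(audio_feat, lip_feats_by_user):
    """Return the user whose lip feature has the highest matching degree."""
    return max(
        lip_feats_by_user,
        key=lambda user: matching_degree(audio_feat, lip_feats_by_user[user]),
    )


audio_feat = [1.0, 0.0, 1.0]
lip_feats_by_user = {  # N = 3 users
    "user_1": [0.0, 1.0, 0.0],
    "user_2": [1.0, 0.0, 1.0],
    "user_3": [1.0, 1.0, 0.0],
}
target = find_target_user(audio_feat, lip_feats_by_user)
```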

Method and device with data recognition

A processor-implemented method with data recognition includes: extracting input feature data from input data; calculating a matching score between the extracted input feature data and enrolled feature data of an enrolled user, based on the extracted input feature data, common component data of a plurality of enrolled feature data corresponding to the enrolled user, and distribution component data of the plurality of enrolled feature data corresponding to the enrolled user; and recognizing the input data based on the matching score.
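
One possible reading of this scoring scheme is sketched below, purely as an assumption: the common component is taken to be the per-dimension mean of the enrolled feature vectors, the distribution component their per-dimension variance, and the matching score a negative variance-normalized squared distance. The abstract does not define these quantities, and the `threshold` and `eps` values are hypothetical.

```python
def enrolled_components(enrolled):
    """Split enrolled vectors into a per-dimension mean and variance."""
    n = len(enrolled)
    dims = range(len(enrolled[0]))
    common = [sum(v[d] for v in enrolled) / n for d in dims]
    distribution = [
        sum((v[d] - common[d]) ** 2 for v in enrolled) / n for d in dims
    ]
    return common, distribution


def matching_score(input_feat, common, distribution, eps=1e-6):
    """Higher is better: negative variance-normalized squared distance."""
    return -sum(
        (x - c) ** 2 / (s + eps)
        for x, c, s in zip(input_feat, common, distribution)
    )


def recognize(input_feat, enrolled, threshold=-4.0):
    """Accept the input if its matching score reaches the threshold."""
    common, distribution = enrolled_components(enrolled)
    return matching_score(input_feat, common, distribution) >= threshold


enrolled = [[1.0, 2.0], [3.0, 2.0]]
common, distribution = enrolled_components(enrolled)
accepted = recognize([2.0, 2.0], enrolled)
rejected = recognize([5.0, 2.0], enrolled)
```

Normalizing by the variance means a deviation counts for less along dimensions where the enrolled user's own samples already vary widely.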

Speaker verification
11594230 · 2023-02-28

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, to facilitate language-independent speaker verification. In one aspect, a method includes actions of receiving, by a user device, audio data representing an utterance of a user. Other actions may include providing, to a neural network stored on the user device, input data derived from the audio data and a language identifier. The neural network may be trained using speech data representing speech in different languages or dialects. The method may include additional actions of generating, based on output of the neural network, a speaker representation and determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user. The method may provide the user with access to the user device based on determining that the utterance is an utterance of the user.
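
Two of the steps in this abstract, building the network input from audio-derived data plus a language identifier and comparing the speaker representation with a second representation, could look roughly like this. The neural network itself is omitted. The one-hot language encoding, cosine similarity as the comparison, the `0.8` threshold, and all function names are assumptions, not details taken from the abstract.

```python
import math


def build_network_input(acoustic_feat, language_index, num_languages):
    """Append a one-hot language identifier to the acoustic features."""
    one_hot = [1.0 if i == language_index else 0.0 for i in range(num_languages)]
    return list(acoustic_feat) + one_hot


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_same_speaker(speaker_rep, enrolled_rep, threshold=0.8):
    """Decide whether the utterance belongs to the enrolled user."""
    return cosine_similarity(speaker_rep, enrolled_rep) >= threshold


network_input = build_network_input([0.1, 0.2], language_index=1, num_languages=3)
grant_access = is_same_speaker([1.0, 0.0], [1.0, 0.1])      # near-identical vectors
deny_access = not is_same_speaker([1.0, 0.0], [0.0, 1.0])   # orthogonal vectors
```

Access to the device would be granted only when `is_same_speaker` returns true for the representation produced from the new utterance.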
