Patent classifications
G10L17/04
Intelligent voice recognizing method, apparatus, and intelligent computing device
An intelligent voice recognition method, a voice recognition apparatus, and an intelligent computing device are disclosed. An intelligent voice recognition method of a voice recognition apparatus according to an embodiment of the present invention detects a user's voice, receives an authentication request from the user, and authenticates the user based on whether the user has recently been authenticated and on the result of recognizing the user's voice, thereby reducing the time and amount of computation required for user authentication. One or more of the voice recognition apparatus and the intelligent computing device can be associated with artificial intelligence (AI) modules, unmanned aerial vehicle (UAV) robots, augmented reality (AR) devices, virtual reality (VR) devices, 5G service related devices, etc.
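The recency check can be pictured as a small cache placed in front of the full voiceprint comparison. Below is a minimal Python sketch of that idea; the verify_voiceprint callable, the 300-second validity window, and all names are illustrative assumptions, not details from the patent.

```python
import time

AUTH_VALIDITY_SECONDS = 300  # assumed window for "recently authenticated"

class VoiceAuthenticator:
    def __init__(self, verify_voiceprint):
        self._verify = verify_voiceprint  # hypothetical: (user_id, audio) -> bool
        self._last_auth = {}              # user_id -> timestamp of last success

    def authenticate(self, user_id, audio):
        # If the user was authenticated recently, skip the expensive
        # voiceprint comparison entirely, saving time and computation.
        last = self._last_auth.get(user_id)
        if last is not None and time.time() - last < AUTH_VALIDITY_SECONDS:
            return True
        # Otherwise fall back to full voice recognition.
        if self._verify(user_id, audio):
            self._last_auth[user_id] = time.time()
            return True
        return False
```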
Speech recognition
A method includes receiving acoustic features of a first utterance spoken by a first user who speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The method also includes analyzing the first transcription to identify one or more bias terms and biasing an alternative speech recognizer on the identified bias terms. The method further includes receiving acoustic features of a second utterance spoken by a second user who speaks with atypical speech and processing, using the alternative speech recognizer biased on the bias terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
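The two-pass flow can be sketched as: transcribe the typical speaker, mine that transcription for bias terms, then bias a second recognizer before decoding the atypical speaker. The sketch below assumes hypothetical recognizer objects with transcribe() and set_bias_terms() methods, and uses a naive rare-word filter as a stand-in for the patent's bias-term analysis.

```python
COMMON_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it"}

def extract_bias_terms(transcription):
    # Treat uncommon words as candidate bias terms (a stand-in heuristic).
    return [w for w in transcription.lower().split() if w not in COMMON_WORDS]

def transcribe_conversation(general_asr, alternative_asr, first_audio, second_audio):
    # Pass 1: transcribe the typical speaker with the general recognizer.
    first_transcription = general_asr.transcribe(first_audio)
    # Mine the first transcription for terms likely to recur in the reply.
    bias_terms = extract_bias_terms(first_transcription)
    # Pass 2: bias the alternative recognizer toward those terms before
    # decoding the atypical speaker's utterance.
    alternative_asr.set_bias_terms(bias_terms)
    second_transcription = alternative_asr.transcribe(second_audio)
    return first_transcription, second_transcription
```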
Training method of a speaker identification model based on a first language and a second language
A training method for a speaker identification model that receives voice data as input and outputs speaker identification information identifying the speaker of an utterance included in the voice data is provided. The training method includes: performing voice quality conversion of first voice data of a first speaker to generate second voice data of a second speaker; and training the speaker identification model using the first voice data and the second voice data as training data.
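In data-flow terms, the voice quality conversion acts as label-preserving augmentation: one speaker's recordings yield synthetic recordings attributed to a second speaker, and both sets feed the speaker identification model. A schematic Python sketch follows; convert_voice_quality and the sample layout are assumptions for illustration only.

```python
def build_training_set(first_voice_data, first_speaker_id, second_speaker_id,
                       convert_voice_quality):
    # Voice-quality conversion turns the first speaker's recordings into
    # synthetic recordings attributed to the second speaker.
    second_voice_data = [convert_voice_quality(clip, target=second_speaker_id)
                         for clip in first_voice_data]
    # Both the original and the converted clips become labeled training data.
    samples = [(clip, first_speaker_id) for clip in first_voice_data]
    samples += [(clip, second_speaker_id) for clip in second_voice_data]
    return samples
```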
User-specific acoustic models
Systems and processes for providing user-specific acoustic models are disclosed. In accordance with one example, a method includes, at an electronic device having one or more processors: receiving a plurality of speech inputs, each associated with a same user of the electronic device; providing each of the speech inputs to a user-independent acoustic model, which provides a plurality of speech results based on the speech inputs; initiating a user-specific acoustic model on the electronic device; and adjusting the user-specific acoustic model based on the speech inputs and the speech results.
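One plausible reading of the adjustment step is a teacher-student loop: the user-independent model labels the user's own utterances, and those (input, result) pairs drive on-device updates to the user-specific model. The sketch below assumes hypothetical recognize() and update() methods and is not the patent's implementation.

```python
def adapt_user_model(user_independent_model, user_specific_model, speech_inputs):
    # The user-independent model produces a speech result for each of the
    # user's own utterances...
    speech_results = [user_independent_model.recognize(x) for x in speech_inputs]
    # ...and each (input, result) pair adjusts the user-specific model
    # on the device itself.
    for audio, result in zip(speech_inputs, speech_results):
        user_specific_model.update(audio, result)
    return user_specific_model
```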
Audio matching method and related device
Embodiments of the present application disclose an audio matching method and a related device. The audio matching method includes: obtaining audio data and video data; extracting to-be-recognized audio information from the audio data; extracting lip movement information of N users from the video data, where N is an integer greater than 1; inputting the to-be-recognized audio information and the lip movement information of the N users into a target feature matching model to obtain a matching degree between each user's lip movement information and the to-be-recognized audio information; and determining the user whose lip movement information has the highest matching degree as the target user to whom the to-be-recognized audio information belongs.
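The final matching step reduces to scoring the audio against each user's lip movement and taking the argmax. A minimal sketch, assuming a hypothetical matching_model with a match() scorer:

```python
def find_target_user(audio_info, lip_movements_by_user, matching_model):
    # Score the to-be-recognized audio against each user's lip movement.
    scores = {user: matching_model.match(audio_info, lip_info)
              for user, lip_info in lip_movements_by_user.items()}
    # The user with the highest matching degree owns the audio.
    return max(scores, key=scores.get)
```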
Automatic interpretation server and method based on zero UI
Provided is a method performed by an automatic interpretation server based on a zero user interface (UI), which communicates with a plurality of terminal devices having a microphone function, a speaker function, a communication function, and a wearable function. The method includes connecting terminal devices disposed within a designated automatic interpretation zone, receiving a voice signal of a first user from a first terminal device among the terminal devices within the zone, matching a plurality of users located within a speech-receivable distance of the first terminal device, and performing automatic interpretation on the voice signal and transmitting the interpretation results to a second terminal device of at least one second user corresponding to a result of the matching.
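The routing logic can be sketched as: find every other terminal in the zone within speech-receivable distance of the speaker's device, then interpret into each listener's language and deliver. The Python below assumes hypothetical interpret(), distance_to(), and send() helpers; only the overall flow follows the abstract.

```python
def handle_utterance(zone_devices, source_device, voice_signal,
                     interpret, max_distance):
    # Match every other terminal in the zone whose user is within
    # speech-receivable distance of the speaking user's device.
    listeners = [d for d in zone_devices
                 if d is not source_device
                 and d.distance_to(source_device) <= max_distance]
    # Interpret per target language and deliver to each matched listener.
    for device in listeners:
        translated = interpret(voice_signal, target_lang=device.language)
        device.send(translated)
```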