Patent classifications
G10L2015/0635
System and method for registering device for voice assistant service
A system and method for registering a new device for a voice assistant service. The method, performed by a server, includes: comparing functions of a pre-registered device with functions of the new device; identifying, based on the comparison, functions of the new device that correspond to functions of the pre-registered device; obtaining pre-registered utterance data related to at least some of the identified functions; and generating action data for the new device based on the identified functions and the pre-registered utterance data.
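The registration flow above can be sketched as a set intersection over function names, reusing the utterance data already stored for the matching functions. This is a minimal illustration; the function names and data shapes here are assumptions, not the patent's actual schema.

```python
def build_action_data(pre_registered, new_functions, utterances):
    """Hypothetical sketch of the described registration step.

    pre_registered: set of function names on the already-registered device.
    new_functions:  set of function names on the new device.
    utterances:     dict mapping function name -> list of sample utterances.
    """
    # Identify functions of the new device corresponding to pre-registered ones
    matched = new_functions & pre_registered
    # Reuse the pre-registered utterance data for the matched functions
    return {fn: utterances.get(fn, []) for fn in matched}

actions = build_action_data(
    pre_registered={"power_on", "power_off", "set_temperature"},
    new_functions={"power_on", "power_off", "set_fan_speed"},
    utterances={"power_on": ["turn it on"], "power_off": ["turn it off"]},
)
```

Functions unique to the new device (here `set_fan_speed`) get no action data from this step, matching the abstract's focus on corresponding functions only.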
Electronic apparatus and method for controlling voice recognition thereof
Disclosed is an electronic apparatus capable of controlling voice recognition. When an instruction included in a user's utterance is present in a database, the apparatus increases the score of the category corresponding to a word included in the utterance. When the instruction is not present in the database, the apparatus checks whether the score of the category corresponding to the word is equal to or greater than a preset value. If it is, the apparatus registers the instruction in the database so that the instruction is included in the category corresponding to the word.
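The score-and-threshold flow can be sketched directly; the database layout, the word-to-category mapping, and the threshold value are assumptions for illustration.

```python
def handle_utterance(db, word_categories, word, instruction, threshold=3):
    """Hypothetical sketch: raise a category score on known instructions,
    and register unknown instructions once the score passes the threshold."""
    category = word_categories[word]
    known = db["instructions"].setdefault(category, set())
    if instruction in known:
        # Known instruction: increment the category's score
        db["scores"][category] = db["scores"].get(category, 0) + 1
    elif db["scores"].get(category, 0) >= threshold:
        # Unknown instruction, but category is well-established: register it
        known.add(instruction)

db = {"instructions": {"lighting": {"turn on"}}, "scores": {}}
cats = {"lamp": "lighting"}
for _ in range(3):
    handle_utterance(db, cats, "lamp", "turn on")   # score reaches 3
handle_utterance(db, cats, "lamp", "dim")           # now registered
```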
LARGE SCALE PRIVACY-PRESERVING SPEECH RECOGNITION SYSTEM USING FEDERATED LEARNING
A method for implementing a privacy-preserving automatic speech recognition (ASR) system using federated learning. The method includes receiving, at a cloud server from respective client devices, local acoustic model weights for a neural-network-based acoustic model of a local ASR system running on each client device, where the local acoustic model weights are generated on-device without labelled data. A global ASR system is then updated based on (a) the local acoustic model weights received from the respective client devices and (b) global acoustic model weights of the global ASR system derived from labelled data, to obtain an updated global ASR system. The updated global ASR system is sent to the respective client devices to operate as their new local ASR system.
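The server-side update can be sketched as a FedAvg-style blend of the clients' unlabelled-data weights with the server's labelled-data weights. The mixing factor and the plain-list weight representation are assumptions; the patent does not specify the aggregation rule.

```python
def update_global_weights(global_weights, client_weights, alpha=0.5):
    """Hypothetical sketch: average the client weight vectors, then blend
    them with the server's labelled-data weights using assumed factor alpha."""
    n = len(client_weights)
    dim = len(global_weights)
    # Mean of the local acoustic model weights across client devices
    client_mean = [sum(w[i] for w in client_weights) / n for i in range(dim)]
    # Blend with the global weights derived from labelled data
    return [alpha * g + (1 - alpha) * c
            for g, c in zip(global_weights, client_mean)]

updated = update_global_weights([1.0, 3.0], [[0.0, 1.0], [2.0, 3.0]])
```

Only weights cross the network; raw audio stays on the devices, which is the privacy-preserving property the abstract describes.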
Interfacing with applications via dynamically updating natural language processing
Dynamic interfacing with applications is provided. For example, a system receives a first input audio signal and processes it, via a natural language processing technique, to identify an application. The system activates the application for execution on a client computing device, and the application declares a function it is configured to perform. The system modifies the natural language processing technique responsive to the declared function. When a second input audio signal is received, the system processes it via the modified natural language processing technique to detect one or more parameters, determines that the parameters are compatible for input into an input field of the application, generates an action data structure for the application, and inputs that data structure into the application, which executes it.
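The two-pass flow can be sketched with a keyword grammar that is extended at runtime by whatever the activated application declares. The grammar-as-dict representation and all names here are illustrative assumptions.

```python
def extend_nlp(grammar, declared):
    """Hypothetical sketch: add the parameters a just-activated app declares
    to the NLP keyword table, tagged with the declaring function's name."""
    extended = dict(grammar)
    for param in declared["parameters"]:
        extended[param] = declared["name"]
    return extended

def extract_parameters(grammar, transcript, function_name):
    # Keep only tokens the extended grammar maps to this function
    return [t for t in transcript.split() if grammar.get(t) == function_name]

# The app declares a function after activation...
declared = {"name": "book_ride", "parameters": ["pickup", "dropoff"]}
grammar = extend_nlp({}, declared)
# ...so the second utterance can now be parsed against it
params = extract_parameters(grammar, "pickup here dropoff downtown", "book_ride")
```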
Systems and methods for an intelligent virtual assistant for meetings
- Daniel D McQuiston ,
- Aarti Narayanan ,
- Dave Burrells ,
- Simon Burke ,
- Jan S Dabrowski ,
- Rhys Dawes ,
- Charlotte Knight ,
- Libby Kent ,
- Sandeep Koul ,
- Uday Pant ,
- Tony M Nazarowski ,
- Aditi Vaidya ,
- Ayush Kumar Bilala ,
- Charanjith Allaparambil Chandran ,
- Prayag Godha ,
- Nikhil Kotikanikadanam Madhusudhan ,
- Chitra Pillai Sundaribai ,
- Aditya Anil Upadhyay ,
- Eric Han Kai Chang ,
- Stefan Cristian Bardasu ,
- Erin Michelle Perry ,
- Saifuddin Merchant ,
- James P White, III
Systems and methods for an intelligent virtual assistant for meetings are disclosed. In one embodiment, a system for an intelligent virtual assistant for meetings may include a server comprising at least one computer processor executing a virtual assistant computer program; a communication server in communication with the server; and a plurality of communication devices in communication with the server and the communication server, wherein the communication server facilitates an electronic meeting with a plurality of attendees via the plurality of communication devices. The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real time, may transcribe the audio feed using a speech recognition algorithm, may provide the transcription to at least one of the plurality of attendees, may receive an edited transcription, and may update the speech recognition algorithm based on the edited transcription.
SYSTEM AND METHOD FOR VOICE BIOMETRICS AUTHENTICATION
A system and method for authenticating an identity may include generating a first generic representation representing stored audio content, generating a second generic representation representing input audio content, and providing the first and second generic representations to a voice biometrics unit adapted to authenticate an identity based on the first and second generic representations.
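If the "generic representations" are taken to be fixed-length speaker embeddings (an assumption; the abstract does not say), the biometrics unit's decision can be sketched as a cosine-similarity comparison against a threshold. The threshold value is illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def authenticate(stored_repr, input_repr, threshold=0.8):
    """Hypothetical sketch: accept when the stored and input
    representations are similar enough."""
    return cosine(stored_repr, input_repr) >= threshold
```

Working in representation space rather than on raw audio is what lets the biometrics unit stay agnostic to how the audio was captured or stored.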
Methods and systems for correcting transcribed audio files
Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data.
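One simple way to turn crowd corrections into a model update, sketched under assumptions (the patent does not specify the voice-model update mechanism): tally word-level edits across users and adopt a substitution only when enough users agree.

```python
from collections import Counter

def collect_corrections(original, corrected_versions, min_votes=2):
    """Hypothetical sketch: compare the original transcript against each
    user-corrected version word by word, and keep substitutions that at
    least min_votes users independently made."""
    votes = Counter()
    for corrected in corrected_versions:
        for o, c in zip(original.split(), corrected.split()):
            if o != c:
                votes[(o, c)] += 1
    return {o: c for (o, c), n in votes.items() if n >= min_votes}

fixes = collect_corrections(
    "please recognise speach",
    ["please recognise speech", "please recognise speech",
     "please recognize speach"],
)
```

Here two users agree on `speach` → `speech`, so it is kept, while the single-user `recognise` → `recognize` edit is discarded as noise.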
Whispering voice recovery method, apparatus and device, and readable storage medium
A method, an apparatus and a device for converting whispered speech, and a readable storage medium, are provided. The method is based on a whispered speech converting model that is trained in advance, using recognition results and acoustic features of whispered speech training data as samples, and acoustic features of normal speech data parallel to the whispered speech training data as sample labels. An acoustic feature and a preliminary recognition result of whispered speech data are acquired and input into the preset whispered speech converting model, which outputs a normal speech acoustic feature. In this way, the whispered speech can be converted to normal speech.
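The training-data construction described above pairs (whisper features, preliminary transcript) inputs with parallel normal-speech features as labels. A minimal sketch of assembling such pairs, with illustrative data shapes (the model itself is neural and not shown):

```python
def make_training_pairs(whisper_feats, recognitions, normal_feats):
    """Hypothetical sketch: zip the parallel corpora into (sample, label)
    pairs, where each sample is (whisper acoustic feature, recognition
    result) and each label is the parallel normal-speech acoustic feature."""
    if not (len(whisper_feats) == len(recognitions) == len(normal_feats)):
        raise ValueError("parallel corpora must be aligned")
    return [((w, r), n)
            for w, r, n in zip(whisper_feats, recognitions, normal_feats)]

pairs = make_training_pairs([[0.1, 0.2]], ["hello"], [[0.9, 1.1]])
```

The requirement that the normal-speech data be parallel (same utterances, same alignment) is what makes the supervised feature-to-feature mapping well defined.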
Contact resolution for communications systems
Methods and systems for performing contact resolution are described herein. When a communications session is initiated using a voice-activated electronic device, a contact name may be resolved to determine the appropriate contact to which the communications session should be directed. Contacts from an individual's contact list may be queried to determine a listing of probable contacts associated with the contact name, and contact identifiers associated with those contacts may be determined. Using one or more rules for disambiguating between similar contact names, a single contact may be identified, and a communications session with that contact may be initiated.
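A rule-based disambiguation pass can be sketched as: gather candidates whose names contain the spoken name, prefer a unique exact match, and otherwise fall back to the most recently contacted candidate. Both tie-breaking rules and the contact schema are assumptions for illustration; the patent's actual rules are not specified here.

```python
def resolve_contact(spoken_name, contacts):
    """Hypothetical sketch of rule-based contact disambiguation.

    contacts: list of dicts with 'id', 'name', 'last_contacted' keys,
    where last_contacted is a sortable recency value."""
    name = spoken_name.lower()
    # Probable contacts: names containing the spoken name
    candidates = [c for c in contacts if name in c["name"].lower()]
    if not candidates:
        return None
    # Rule 1 (assumed): a unique exact match wins
    exact = [c for c in candidates if c["name"].lower() == name]
    if len(exact) == 1:
        return exact[0]
    # Rule 2 (assumed): otherwise pick the most recently contacted
    return max(candidates, key=lambda c: c["last_contacted"])

contacts = [
    {"id": 1, "name": "John Smith", "last_contacted": 10},
    {"id": 2, "name": "John Doe", "last_contacted": 20},
]
who = resolve_contact("john", contacts)  # two partial matches; recency wins
```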