Patent classifications
G10L17/00
Voice synthesis for virtual agents
Techniques are described for generating a custom voice for a virtual agent. In one implementation, a method includes receiving information identifying a customer contacting a call center. The method includes selecting a voice for a virtual agent based on information about the customer. The method also includes assigning the voice to the virtual agent during communications with the customer.
Systems and methods for an intelligent virtual assistant for meetings
- Daniel D McQuiston
- Aarti Narayanan
- Dave Burrells
- Simon Burke
- Jan S Dabrowski
- Rhys Dawes
- Charlotte Knight
- Libby Kent
- Sandeep Koul
- Uday Pant
- Tony M Nazarowski
- Aditi Vaidya
- Ayush Kumar Bilala
- Charanjith Allaparambil Chandran
- Prayag Godha
- Nikhil Kotikanikadanam Madhusudhan
- Chitra Pillai Sundaribai
- Aditya Anil Upadhyay
- Eric Han Kai Chang
- Stefan Cristian Bardasu
- Erin Michelle Perry
- Saifuddin Merchant
- James P White, III
Systems and methods for an intelligent virtual assistant for meetings are disclosed. In one embodiment, a system for an intelligent virtual assistant for meetings may include a server comprising at least one computer processor executing a virtual assistant computer program; a communication server in communication with the server; and a plurality of communication devices in communication with the server and the communication server, wherein the communication server facilitates an electronic meeting with a plurality of attendees via the plurality of communication devices. The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real time, may transcribe the audio feed using a speech recognition algorithm, may provide the transcription to at least one of the plurality of attendees, may receive an edited transcription, and may update the speech recognition algorithm based on the edited transcription.
Electronic apparatus and control method thereof
An electronic apparatus includes a communication interface, and a processor configured to encrypt data in each of a plurality of sections of voice data, the voice data corresponding to a first user voice signal, and control the communication interface to transmit the encrypted data to a server. The processor is further configured to obtain an authentication key based on data in a first section of the plurality of sections, encrypted data in the first section having been transmitted to the server, and encrypt data in a second section to be transmitted by using the authentication key.
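The chaining idea in this abstract, where the key for one section is obtained from data in the preceding section, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name a cipher or a key-derivation function, so the sketch uses HMAC-SHA256 from the Python standard library both to derive keys and to build a toy XOR keystream, and it assumes a hypothetical `initial_key` shared with the server for the first section. The toy keystream is a stand-in, not a production cipher.

```python
import hashlib
import hmac


def derive_key(previous_section: bytes, previous_key: bytes) -> bytes:
    """Derive the next key from the previous section's data (assumed scheme)."""
    return hmac.new(previous_key, previous_section, hashlib.sha256).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher: XOR data with an HMAC-SHA256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        stream.extend(block)
        counter += 1
    # zip stops at len(data), so any surplus keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_sections(sections: list[bytes], initial_key: bytes) -> list[bytes]:
    """Encrypt each section; the key for section n+1 depends on section n."""
    key = initial_key
    encrypted = []
    for section in sections:
        encrypted.append(keystream_xor(key, section))
        key = derive_key(section, key)
    return encrypted


def decrypt_sections(encrypted: list[bytes], initial_key: bytes) -> list[bytes]:
    """Server side: recover each section, then derive the next key from it."""
    key = initial_key
    sections = []
    for blob in encrypted:
        section = keystream_xor(key, blob)
        sections.append(section)
        key = derive_key(section, key)
    return sections
```

In this sketch a receiver can only decrypt the second section after it has recovered the first, which is one way to read the dependency the abstract describes.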
Multi-user devices in a connected home environment
A device implementing a system for responding to a voice request includes a processor configured to receive a voice request, the device being associated with a user account, and determine, based on the voice request, a confidence score that the voice request corresponds to a voice profile associated with the user account. The processor is further configured to select, based at least in part on a content of the voice request and the confidence score, a request domain from among plural request domains for responding to the voice request, and provide for a response to the voice request based on the selected request domain.
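As a rough sketch of the selection step described above, the function below picks a request domain from the content of a request and a speaker-confidence score. The domain names (`"personal"`, `"general"`), the keyword list, and the `0.8` threshold are all hypothetical; the abstract does not specify how the domains are defined or how the score is thresholded.

```python
# Hypothetical phrases that mark a request as needing the user's own data.
PERSONAL_PHRASES = ("my messages", "my calendar", "my reminders")


def select_request_domain(request_text: str, confidence: float,
                          threshold: float = 0.8) -> str:
    """Choose a domain from request content and voice-profile confidence.

    Assumed policy: personal requests are served from the personal domain
    only when the voice matches the account's profile with high confidence;
    everything else falls back to the general domain.
    """
    text = request_text.lower()
    wants_personal = any(phrase in text for phrase in PERSONAL_PHRASES)
    if wants_personal and confidence >= threshold:
        return "personal"
    return "general"
```

For example, under these assumptions `select_request_domain("Read my messages", 0.95)` selects the personal domain, while the same request with a confidence of 0.4 falls back to the general domain.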
Spatially informed audio signal processing for user speech
A device implementing a system for processing speech in an audio signal includes at least one processor configured to receive an audio signal corresponding to at least one microphone of a device, and to determine, using a first model, a first probability that a speech source is present in the audio signal. The at least one processor is further configured to determine, using a second model, a second probability that an estimated location of a source of the audio signal corresponds to an expected position of a user of the device, and to determine a likelihood that the audio signal corresponds to the user of the device based on the first and second probabilities.
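The abstract does not say how the two probabilities are combined. One simple possibility, shown here purely as an assumption, is to treat the speech-presence and location estimates as independent and multiply them. The function and parameter names are hypothetical.

```python
def user_speech_likelihood(p_speech: float, p_location: float) -> float:
    """Combine two model outputs into a single likelihood.

    Assumption: the two events are independent, so the joint probability
    is their product. A real system might use a learned combination.
    """
    for p in (p_speech, p_location):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_speech * p_location
```

Under this assumption, a high speech probability is discounted when the estimated source location is unlikely to match the expected position of the user.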
Techniques to provide sensitive information over a voice connection
Embodiments may generally be directed to components and techniques to detect a request to provide banking account information over one or more voice connections, identify the requested banking account information, and generate speech data representing the requested banking account information. Embodiments further include communicating the speech data to another device.
Model-based dubbing to translate spoken audio in a video
Model-based dubbing techniques are implemented to generate a translated version of a source video. Spoken audio portions of a source video may be extracted and semantic graphs generated that represent the spoken audio portions. The semantic graphs may be used to produce translations of the spoken portions. A machine learning model may be implemented to generate replacement audio for the spoken portions using the translation of the spoken portion. A machine learning model may be implemented to generate modifications to facial image data for a speaker of the replacement audio.