Patent classifications
G10L13/00
System and method using parameterized speech synthesis to train acoustic models
A method for removing private data from an acoustic model includes capturing speech from a large population of users, creating a text-to-speech voice from at least a portion of the large population of users, discarding speech data from a database of speech, creating text-to-speech waveforms from the text-to-speech voice and the new database of speech with the discarded speech data, and generating an automatic speech recognition model using the text-to-speech waveforms.
Multilingual speech translation with adaptive speech synthesis and adaptive physiognomy
Techniques for the generation of dubbed audio for an audio/visual file are described. An exemplary approach is to receive a request to generate dubbed speech for an audio/visual file; and in response to the request to: extract speech segments from an audio track of the audio/visual file associated with identified speakers; translate the extracted speech segments into a target language; determine a machine learning model per identified speaker, the trained machine learning models to be used to generate a spoken version of the translated, extracted speech segments based on the identified speaker; generate, per translated, extracted speech segment, a spoken version of the translated, extracted speech segments using a trained machine learning model that corresponds to the identified speaker of the translated, extracted speech segment and prosody information for the extracted speech segments; and replace the extracted speech segments from the audio track of the audio/visual file with the spoken versions of the translated, extracted speech segments to generate a modified audio track.
INFORMATION PROCESSING DEVICE, CONTROL METHOD OF THE SAME, AND PROGRAM
Communication with a user is more naturally and effectively achieved. An information processing device according to an embodiment includes: a voice module (340) that outputs sound or voice in accordance with an action plan that has been input; a motion module (350) that executes an action in accordance with the action plan that has been input; and a body controller (360) that creates an action plan for each of the voice module and the motion module, in which the body controller: acquires audio data for outputting audio and motion data for executing an action; creates a first action plan for the voice module and a second action plan for the motion module on the basis of the audio data and the motion data; and inputs the first action plan to the voice module and inputs the second action plan to the motion module.
SMART-LEARNING AND KNOWLEDGE RETRIEVAL SYSTEM WITH INTEGRATED CHATBOTS
A computer-implemented method and a smart-learning and knowledge retrieval system (SLKRS) are provided for imparting adaptive and personalized e-learning based on continually artificially learned unique characteristics of a knowledge seeker. A chatbot platform with a chatbot interface provides for interaction between the knowledge seeker, a parent, a teacher, or another stakeholder. The chatbot platform allows multiple channels of engagement. The chatbot platform provides translation services comprising text-to-speech and speech-to-text services. The chatbot platform integrates third-party services into its responses to the user and queries from the user through the integration module. The chatbot platform performs pattern recognition and checks simplified and rephrased questions against a knowledge base. The chatbot platform uses conversation audits to train artificial intelligence and machine learning algorithms to generate an appropriate response to the query of the knowledge seeker.
SYSTEMS, METHODS, AND DEVICES FOR AUDIO CORRECTION
Systems, methods, and devices relating to audio correction are described. A first portion of content including first spoken audio content indicating first word(s) may be determined. Background audio content of the first portion of the content may be determined. A voice profile may be determined based on the first spoken audio content. Based on the voice profile, second spoken audio content indicating second word(s) to replace the first word(s) may be generated. Based on mixing the background audio content and the second spoken audio content, a second portion of content may be determined. In the content, the first portion of the content may be replaced with the generated second portion of content.
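The mixing and replacement steps in this abstract can be illustrated with a minimal sketch. It assumes audio is held as lists of float samples in the range [-1.0, 1.0], that the background and the generated speech have equal length, and that mixing is plain sample-wise addition with clipping; none of these details come from the abstract. The function name `replace_segment` and its parameters are hypothetical, and the voice-profile and speech-generation steps are out of scope here (the replacement speech is taken as given).

```python
from typing import List


def replace_segment(
    content: List[float],
    background: List[float],
    new_speech: List[float],
    start: int,
) -> List[float]:
    """Mix background audio with replacement speech and splice it into content.

    Hypothetical helper: sample-wise addition with clipping is an assumed
    mixing rule, not something the abstract specifies.
    """
    if len(background) != len(new_speech):
        raise ValueError("background and new_speech must have the same length")
    if start < 0 or start + len(new_speech) > len(content):
        raise ValueError("segment does not fit inside content")

    # Mix the two signals, clipping each sample to the valid range.
    mixed = [max(-1.0, min(1.0, b + s)) for b, s in zip(background, new_speech)]

    # Replace the first portion of the content with the newly mixed portion.
    return content[:start] + mixed + content[start + len(mixed):]
```

Adding samples and clipping is the simplest possible mix; a real system would more likely apply gain control or cross-fading at the segment boundaries.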
Systems and methods for presenting social network communications in audible form based on user engagement with a user device
Methods and systems are described herein for generating an audible presentation of a communication received from a remote server. A presentation of a media asset on a user equipment device is generated for a first user. A textual-based communication is received at the user equipment device from the remote server. The textual-based communication is transmitted to the remote server by a second user, and the remote server transmits the textual-based communication to the user equipment device responsive to determining that the second user is on a list of users associated with the first user. An engagement level of the first user with the user equipment device is determined. Responsive to determining that the engagement level does not exceed a threshold value, a presentation of the textual-based communication is generated in audible form.
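The decision logic in this abstract can be sketched as follows. The names `Communication`, `choose_presentation`, and `deliver` are hypothetical, as are the numeric engagement scale and the default threshold of 0.5. The abstract only states what happens when the engagement level does not exceed the threshold; returning "visual" in the other case is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Communication:
    sender: str  # the second user who sent the message
    text: str    # the textual-based communication


def choose_presentation(engagement_level: float, threshold: float) -> str:
    """Return "audible" when engagement does not exceed the threshold.

    The "visual" branch is an assumption; the abstract does not describe it.
    """
    return "audible" if engagement_level <= threshold else "visual"


def deliver(
    comm: Communication,
    associated_users: Set[str],
    engagement_level: float,
    threshold: float = 0.5,  # assumed default, not from the abstract
) -> Optional[str]:
    """Forward a communication only if its sender is on the first user's list."""
    if comm.sender not in associated_users:
        return None  # the server does not transmit the communication
    return choose_presentation(engagement_level, threshold)
```

For example, `deliver(Communication("bob", "hi"), {"bob"}, 0.2)` selects the audible form because 0.2 does not exceed the assumed threshold of 0.5.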