G10L13/033

CONTROLLABLE, NATURAL PARALINGUISTICS FOR TEXT TO SPEECH SYNTHESIS
20220406292 · 2022-12-22

A speech recognition module receives training data of speech and creates a representation for individual words, non-words, phonemes, and any combination of these. A set of speech processing detectors analyzes the training data of speech from humans communicating and detects speech parameters that indicate paralinguistic effects layered on top of the enunciated words, phonemes, and non-words in the audio stream. One or more machine learning models undergo supervised training of their neural networks to learn how to associate one or more mark-up markers with a textual representation of each individual word, non-word, phoneme, or combination of these that was enunciated with a particular paralinguistic effect. Each mark-up marker can correspond to its own paralinguistic effect.
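As a rough illustration of text carrying mark-up markers for paralinguistic effects, the sketch below parses an inline tag syntax into (token, effect) pairs. The tag syntax (`<laugh>...</laugh>`), the effect names, and the function `parse_markup` are assumptions made for this example only; the abstract does not specify a marker format.

```python
import re

# Hypothetical marker syntax: <effect>words</effect>. Untagged words carry no effect.
MARKER = re.compile(r"<(\w+)>(.*?)</\1>|(\S+)")

def parse_markup(text):
    """Return a list of (token, effect) pairs; effect is None for plain tokens."""
    pairs = []
    for match in MARKER.finditer(text):
        effect = match.group(1)
        if effect is not None:
            # Every token inside the tagged span inherits the span's effect.
            for token in match.group(2).split():
                pairs.append((token, effect))
        else:
            pairs.append((match.group(3), None))
    return pairs

if __name__ == "__main__":
    for token, effect in parse_markup("well <laugh>that was</laugh> fine"):
        print(token, effect)
```

A synthesizer front end could consume such pairs to decide which words are rendered with which effect; how the effect is realized acoustically is outside this sketch.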

ELECTRONIC DEVICE AND METHOD FOR CONTROLLING THEREOF
20220406293 · 2022-12-22

A method for controlling an electronic device includes obtaining a text, obtaining, by inputting the text into a first neural network model, acoustic feature information corresponding to the text and alignment information in which each frame of the acoustic feature information is matched with each phoneme included in the text, identifying an utterance speed of the acoustic feature information based on the alignment information, identifying a reference utterance speed for each phoneme included in the acoustic feature information based on the text and the acoustic feature information, obtaining utterance speed adjustment information based on the utterance speed of the acoustic feature information and the reference utterance speed for each phoneme, and obtaining, based on the utterance speed adjustment information, speech data corresponding to the text by inputting the acoustic feature information into a second neural network model.
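The speed-related steps can be sketched in a few lines, under simplifying assumptions that are not taken from the abstract: the alignment is a list holding one phoneme index per acoustic frame, the reference speed is given as a reference frame count per phoneme, and the adjustment is a per-phoneme duration scale. All function names are hypothetical, and the neural network models are not modeled.

```python
from collections import Counter

def frames_per_phoneme(alignment, num_phonemes):
    """Count how many acoustic frames the alignment assigns to each phoneme."""
    counts = Counter(alignment)
    return [counts.get(i, 0) for i in range(num_phonemes)]

def utterance_speed(alignment, num_phonemes, frame_seconds):
    """Overall speed in phonemes per second (assumed definition)."""
    return num_phonemes / (len(alignment) * frame_seconds)

def duration_scales(alignment, reference_frames):
    """Per-phoneme factor that would bring each duration to its reference.

    A value below 1.0 means the phoneme was held longer than the reference
    and would be shortened; phonemes with no aligned frames are left at 1.0.
    """
    actual = frames_per_phoneme(alignment, len(reference_frames))
    return [ref / act if act > 0 else 1.0
            for act, ref in zip(actual, reference_frames)]

if __name__ == "__main__":
    alignment = [0, 0, 1, 1, 1, 1, 2, 2]   # frame -> phoneme index
    reference = [2, 2, 2]                  # assumed reference frame counts
    print(utterance_speed(alignment, 3, 0.0125))
    print(duration_scales(alignment, reference))
```

In this toy input the second phoneme spans four frames against a reference of two, so its scale is 0.5, while the other two phonemes are left unchanged.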

SYSTEM FOR DECISIONING RESOURCE USAGE BASED ON REAL TIME FEEDBACK

Embodiments of the invention are directed to systems, methods, and computer program products for advising users on resource decisioning based on real-time user feedback. The invention utilizes advanced machine learning technology in order to emulate the voice patterns of familiar figures and generate text-to-speech audio files containing relevant recommendations to one or more users as determined by their user resource account history or indicated preferences. The invention may further account for the user's response in resource usage patterns after the recommendation is provided via continuous monitoring of the user's resource usage history, and may use this data to adapt over time to learn which voices or emulations the user prefers.

System Providing Expressive and Emotive Text-to-Speech
20220392430 · 2022-12-08

A text-to-speech system includes a text and labels module receiving a text input and providing a text analysis and a label with a phonetic description of the text. A label buffer receives the label from the text and labels module. A parameter generation module accesses the label from the label buffer and generates a speech generation parameter. A parameter buffer receives the parameter from the parameter generation module. An audio generation module receives the text input, the label, and/or the parameter and generates a plurality of audio samples. A scheduler monitors and schedules the text and labels module, the parameter generation module, and/or the audio generation module. The parameter generation module is further configured to initialize a voice identifier with a Voice Style Sheet (VSS) parameter, receive an input indicating a modification to the VSS parameter, and modify the VSS parameter according to the modification.
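The buffered stages described in this abstract can be pictured with a toy pipeline. Everything in the sketch is an assumption made for illustration: the `VoiceStyle` class standing in for a Voice Style Sheet, the parameter names `pitch` and `seconds_per_phone`, and the use of letters as a placeholder phonetic description. Audio generation and real scheduling are omitted.

```python
from collections import deque

class VoiceStyle:
    """Hypothetical stand-in for a voice initialized with VSS parameters."""
    def __init__(self, **params):
        self.params = dict(params)

    def modify(self, name, value):
        # Only parameters set at initialization may be modified.
        if name not in self.params:
            raise KeyError(name)
        self.params[name] = value

def text_to_labels(text):
    # Placeholder phonetic description: one "phone" per letter.
    return [{"word": w, "phones": list(w.lower())} for w in text.split()]

def label_to_parameter(label, style):
    return {
        "word": label["word"],
        "pitch": style.params["pitch"],
        "duration": len(label["phones"]) * style.params["seconds_per_phone"],
    }

def run_pipeline(text, style):
    """Minimal sequential scheduler: drain the label buffer into the parameter buffer."""
    label_buffer = deque(text_to_labels(text))
    parameter_buffer = deque()
    while label_buffer:
        parameter_buffer.append(label_to_parameter(label_buffer.popleft(), style))
    return list(parameter_buffer)

if __name__ == "__main__":
    style = VoiceStyle(pitch=120.0, seconds_per_phone=0.08)
    style.modify("pitch", 150.0)   # input indicating a modification
    print(run_pipeline("Hello world", style))
```

Keeping the buffers between stages is what would let a real scheduler run the modules at different rates; here the loop simply runs them in order.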

USER SELF-PERSONALIZED TEXT-TO-SPEECH VOICE GENERATION
20220392428 · 2022-12-08

An online system receives, from a client device of a posting user, a script for a voice-based content item. The online system retrieves a voice synthesis model stored in the user profile of the posting user and generates a synthetic audio stream using the retrieved voice synthesis model and based on the received script. The online system presents the generated synthetic audio stream to the posting user and receives instructions for modifying the synthetic audio stream. The online system generates a second audio stream based on the received instructions and composes the voice-based content item based on the generated second audio stream.

MULTI-CONVERSATIONAL SOCIAL NETWORKING
20220394070 · 2022-12-08

This invention relates to a social networking application for emulating a multi-conversational event space.

Collaborative voice controlled devices

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for collaboration between multiple voice controlled devices are disclosed. In one aspect, a method includes the actions of identifying, by a first computing device, a second computing device that is configured to respond to a particular, predefined hotword; receiving audio data that corresponds to an utterance; receiving a transcription of additional audio data outputted by the second computing device in response to the utterance; based on the transcription of the additional audio data and based on the utterance, generating a transcription that corresponds to a response to the additional audio data; and providing, for output, the transcription that corresponds to the response.

Method of combining audio signals

A method for automatically generating an audio signal, the method comprising: receiving a source audio signal; analyzing the source audio signal to identify a musical parameter characteristic thereof; obtaining a supplemental audio signal based on the identified musical parameter characteristic; and combining the source audio signal and the supplemental audio signal to form an extended audio signal.
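The four steps can be sketched with plain Python lists of samples. The choices below are assumptions for illustration only and are not stated in the abstract: the musical parameter is taken to be the dominant pitch, estimated by counting rising zero crossings; the supplemental signal is a synthesized tone at that pitch; and "combining" is done by appending the supplement to the source.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz

def sine(freq_hz, seconds, rate=SAMPLE_RATE):
    """Generate a sine tone as a list of float samples."""
    return [math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(int(seconds * rate))]

def estimate_pitch(samples, rate=SAMPLE_RATE):
    """Estimate pitch in Hz by counting negative-to-non-negative crossings."""
    rising = sum(1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur)
    return rising * rate / len(samples)

def obtain_supplement(pitch_hz, seconds=0.5):
    # Hypothetical selection rule: a tone matching the identified pitch.
    return sine(pitch_hz, seconds)

def extend(source):
    """Analyze the source, obtain a matching supplement, and append it."""
    pitch = estimate_pitch(source)
    return source + obtain_supplement(pitch)

if __name__ == "__main__":
    source = sine(440.0, 1.0)
    extended = extend(source)
    print(round(estimate_pitch(source)), len(source), len(extended))
```

Zero-crossing counting is only reliable for clean, single-pitch input; a real system would more likely analyze tempo, key, or spectral content, and might mix rather than append.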