Patent classifications
G10L13/086
Methods and systems for facilitating conversion of content in public centers
A public center is disclosed. The public center includes at least one content source configured to provide a plurality of input contents and at least one content converter device for generating a plurality of output contents from the plurality of input contents based on user requests from a plurality of users. The content converter device includes an input module configured to receive a user request from a user of the plurality of users and an input content of the plurality of input contents based on the user request. The content converter device further includes a processing module configured to generate an output content from the input content based on input content characteristics. The public center includes a plurality of content access devices configured to provide the plurality of output contents received from the at least one content converter device to the plurality of users.
SPEECH SYNTHESIS METHOD, DEVICE AND COMPUTER READABLE STORAGE MEDIUM
The present disclosure relates to a speech synthesis method and device, and a computer-readable storage medium, and relates to the field of computer technology. The method of the present disclosure includes: dividing a text into a plurality of segments according to a language category to which each of the segments belongs; converting each of the segments into a phoneme corresponding to the segment to generate a phoneme sequence of the text according to the language category to which each of the segments belongs; inputting the phoneme sequence into a speech synthesis model trained in advance and converting the phoneme sequence into a vocoder characteristic parameter; and inputting the vocoder characteristic parameter into a vocoder to generate a speech.
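The first two steps of this abstract (dividing text by language category, then converting each segment with a converter for its language) can be sketched as follows. This is a minimal illustration, not the patented method: the two categories `zh` and `en`, the Unicode-range test, and the names `language_category`, `split_by_language`, and `to_phoneme_sequence` are all assumptions, and the per-language phoneme converters are supplied by the caller rather than implemented here. The speech synthesis model and vocoder stages are omitted.

```python
from typing import Callable, Optional

def language_category(ch: str) -> Optional[str]:
    """Crude per-character category (assumption): CJK ideograph -> 'zh', ASCII letter -> 'en'."""
    if "\u4e00" <= ch <= "\u9fff":
        return "zh"
    if ch.isascii() and ch.isalpha():
        return "en"
    return None  # neutral: spaces, digits, punctuation

def split_by_language(text: str) -> list[tuple[str, str]]:
    """Divide text into consecutive (language, segment) pairs."""
    segments: list[list[str]] = []  # each entry is [language, segment_text]
    pending = ""                    # neutral characters seen before any categorized one
    for ch in text:
        lang = language_category(ch)
        if lang is None:
            # Neutral characters stay with the segment that is currently open.
            if segments:
                segments[-1][1] += ch
            else:
                pending += ch
        elif segments and segments[-1][0] == lang:
            segments[-1][1] += ch
        else:
            segments.append([lang, pending + ch])
            pending = ""
    return [(lang, seg) for lang, seg in segments]

def to_phoneme_sequence(
    text: str, converters: dict[str, Callable[[str], list[str]]]
) -> list[str]:
    """Convert each segment with the converter registered for its language."""
    phonemes: list[str] = []
    for lang, seg in split_by_language(text):
        phonemes.extend(converters[lang](seg))
    return phonemes
```

In a real system each converter would be a grapheme-to-phoneme component for its language; the sketch only shows how segmentation drives the dispatch.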
ARTIFICIAL VENTRILOQUIST-LIKE CONTACT CENTER AGENTS
The need for efficient and effective communications is of key importance to contact centers. Agent communications with customers are designed to maximize results while minimizing resources, in particular the time required for human agents to be engaged with a particular customer. Often, having two agents on a communication can both improve customer satisfaction and better produce the intended result of the communication. However, staffing two (or more) live agents is resource intensive. By providing a virtual agent controlled, entirely or in part, by a live agent, the customer may be presented with the appearance of two agents while requiring the human resources of a single agent.
Configurable natural language output
A system is provided for determining a natural language output, responsive to a user input, using different speech personality profiles. The system may determine to use a particular language generation profile based at least in part on data relating to the user input and data corresponding to the response to the user input. The language generation profile may include different attributes that are used to determine the natural language output, such as prosody, replacement words, injected words, sentence structure, etc.
Building a text-to-speech system from a small amount of speech data
A method of building a text-to-speech (TTS) system from a small amount of speech data includes receiving a first plurality of recorded speech samples from an assortment of speakers and a second plurality of recorded speech samples from a target speaker where the assortment of speakers does not include the target speaker. The method further includes training a TTS model using the first plurality of recorded speech samples from the assortment of speakers. Here, the trained TTS model is configured to output synthetic speech as an audible representation of a text input. The method also includes re-training the trained TTS model using the second plurality of recorded speech samples from the target speaker combined with the first plurality of recorded speech samples from the assortment of speakers. Here, the re-trained TTS model is configured to output synthetic speech resembling speaking characteristics of the target speaker.
READ IT!!
The present invention does exactly what its name says. It reads mail, letters, and any paper correspondence for the individuals described at the beginning of the specification. The invention is lightweight and portable and has headphone access for privacy, which makes it easy to have mail and letters read anywhere. The user simply places the letters into the input tray and pushes the play button, and the machine reads them. As each letter is read, it slowly comes out of the bottom onto the output tray.
System and method for multilingual conversion of text data to speech data
The present invention provides a system and method for converting text data into speech data. Initially, the system enables a user to select a language from a plurality of languages supported by the operating system (OS) of a computing device. Further, on selecting and copying any text data, the system provides the user with options to listen to an audio output of the text data. The user is provided with options to listen to text data in either English or the selected language, when the language of the text data is one among the plurality of languages supported by the OS. Further, the user is provided with options to listen to text data in English, for the text data in any language. Once the user selects the option, the system converts the text data to speech data. The speech data is provided as the audio output to the user.
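The option logic described in this abstract can be sketched as a small function. This is an illustrative reading of the abstract only: the function name `listening_options`, the use of short language codes, and the choice to represent options as a list are assumptions.

```python
def listening_options(text_lang: str, selected_lang: str, os_supported: set[str]) -> list[str]:
    """Return the languages in which the copied text may be played back.

    English is always offered. The user's selected language is offered as
    well when the text's own language is one the OS supports.
    """
    options = ["en"]
    if text_lang in os_supported and selected_lang != "en":
        options.append(selected_lang)
    return options
```

For example, with `os_supported = {"en", "hi", "ta"}` and a selected language of `"hi"`, Tamil text yields both English and Hindi as choices, while text in an unsupported language yields English only.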
Translational bot for group communication
The present disclosure is directed to systems, methods and devices for providing real-time translation for group communications. A speech input may be received from a first group communication device associated with a first language. One or more groups to distribute the speech input may be determined, wherein each of the one or more groups comprises at least one group communication device associated with a language that is different than the first language. The received speech input may be translated into a corresponding language for each of the one or more groups, and the translated speech may be sent to each group communication device of the one or more groups in a language corresponding to each of the one or more groups.
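The grouping step in this abstract, where devices are grouped by language and the input is translated once per group, can be sketched as below. This is a simplified illustration under stated assumptions: the speech input is treated as already-recognized text, `translate` is a hypothetical caller-supplied function, and the names `distribute` and `devices` are not from the disclosure.

```python
from collections import defaultdict
from typing import Callable

def distribute(
    speech_text: str,
    source_lang: str,
    devices: dict[str, str],                    # device id -> language of that device
    translate: Callable[[str, str, str], str],  # (text, source, target) -> translated text
) -> dict[str, str]:
    """Translate once per target language and fan the result out to each group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for device_id, lang in devices.items():
        if lang != source_lang:  # only devices whose language differs need translation
            groups[lang].append(device_id)

    deliveries: dict[str, str] = {}
    for lang, members in groups.items():
        translated = translate(speech_text, source_lang, lang)  # one call per group
        for device_id in members:
            deliveries[device_id] = translated
    return deliveries
```

Grouping before translating means the number of translation calls scales with the number of distinct languages rather than with the number of devices.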
Using Speech Recognition to Improve Cross-Language Speech Synthesis
A method for training a speech recognition model includes obtaining a multilingual text-to-speech (TTS) model. The method also includes generating a native synthesized speech representation for an input text sequence in a first language that is conditioned on speaker characteristics of a native speaker of the first language. The method also includes generating a cross-lingual synthesized speech representation for the input text sequence in the first language that is conditioned on speaker characteristics of a native speaker of a different second language. The method also includes generating a first speech recognition result for the native synthesized speech representation and a second speech recognition result for the cross-lingual synthesized speech representation. The method also includes determining a consistent loss term based on the first speech recognition result and the second speech recognition result and updating parameters of the speech recognition model based on the consistent loss term.
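The abstract does not state how the consistent loss term is computed. One plausible form, shown here purely as an assumption, is the average Kullback-Leibler divergence between the recognizer's per-step output distributions for the native and the cross-lingual synthesized speech. The function name `consistency_loss` and the list-of-lists representation are hypothetical.

```python
import math

def consistency_loss(
    native_probs: list[list[float]],
    cross_probs: list[list[float]],
    eps: float = 1e-12,
) -> float:
    """Mean KL(native || cross) over output steps (an assumed form of the loss).

    Each argument holds one probability distribution over tokens per step.
    """
    if not native_probs or len(native_probs) != len(cross_probs):
        raise ValueError("inputs must be non-empty and have the same number of steps")
    total = 0.0
    for p, q in zip(native_probs, cross_probs):
        # eps guards against log(0) and division by zero.
        total += sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    return total / len(native_probs)
```

The loss is zero when both recognition results agree and grows as they diverge, so minimizing it pushes the recognizer to give the same output regardless of which speaker characteristics conditioned the synthesis.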
Hotword-aware speech synthesis
A method includes receiving text input data for conversion into synthesized speech and determining, using a hotword-aware model trained to detect a presence of a hotword assigned to a user device, whether a pronunciation of the text input data includes the hotword. The hotword is configured to initiate a wake-up process on the user device for processing the hotword and/or one or more other terms following the hotword in the audio input data. When the pronunciation of the text input data includes the hotword, the method also includes generating an audio output signal from the text input data and providing the audio output signal to an audio output device to output the audio output signal. The audio output signal, when captured by an audio capture device of the user device, is configured to prevent initiation of the wake-up process on the user device.