G10L2013/083

Synthetic speech processing

A speech-processing system receives input data representing text. An encoder processes segments of the text to determine embedding data, and a decoder processes the embedding data to determine one or more categories associated with each segment. Output data is determined by selecting words based on the segments and categories.

PREFERRED EMOJI IDENTIFICATION AND GENERATION
20180074661 · 2018-03-15

A system and method of identifying and generating preferred emojis includes: detecting at a wireless device a plurality of selected emojis; determining the frequency with which each emoji is selected; identifying a defined number of emojis from the plurality of selected emojis based on the frequency with which each emoji is selected; and creating a frequently-used emoji library for the identified emojis.
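The frequency-based selection described in this abstract can be illustrated with a minimal Python sketch. The function name `frequently_used_emojis` and the `limit` parameter are hypothetical and are not taken from the patent; the sketch only counts selections and keeps the most frequent ones.

```python
from collections import Counter

def frequently_used_emojis(selected, limit=3):
    """Return up to `limit` emojis, ordered from most to least frequently selected.

    `selected` is the sequence of emojis a user picked (hypothetical input format).
    """
    counts = Counter(selected)  # frequency with which each emoji was selected
    return [emoji for emoji, _ in counts.most_common(limit)]

# Example: build a two-entry "frequently used" library from a selection history.
selections = ["😀", "👍", "😀", "🎉", "👍", "😀"]
library = frequently_used_emojis(selections, limit=2)
print(library)
```

Ties are resolved in first-seen order here, which is a property of `Counter.most_common` rather than anything the abstract specifies.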

TRANSMISSION DEVICE, TRANSMISSION METHOD, RECEPTION DEVICE, AND RECEPTION METHOD
20180062777 · 2018-03-01

There is provided a transmission device, including circuitry configured to receive alert information including metadata related to a predetermined pronunciation of a message. The circuitry is configured to generate vocal information for the message based on the metadata included in the alert information. The circuitry is further configured to transmit emergency information that includes the message and the generated vocal information for the message.

IMMERSIVE ELECTRONIC READING

Electronic reading devices provide readers with text on a display, and enhancements to their functionality and efficiency are discussed herein. Text is provided to the reader in an enhanced contrast mode that highlights the active word and line of the text as well as words of interest in the text so as to improve the functionality of the electronic reading device itself as a provider of textual content.

System and method for synthetically generated speech describing media content

Disclosed herein are systems, methods, and computer-readable media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, outputting speech simultaneously with the primary media content, outputting speech during gaps in the primary media content, translating metadata into a foreign language, and tailoring voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface, or output may be customized based on preferences in a user profile.

Method and system for generating synthetic speech for text through user interface
12183320 · 2024-12-31

A method for generating synthetic speech for text through a user interface is provided. The method may include receiving one or more sentences, determining a speech style characteristic for the received one or more sentences, and outputting a synthetic speech for the one or more sentences that reflects the determined speech style characteristic. The one or more sentences and the determined speech style characteristic may be inputted to an artificial neural network text-to-speech synthesis model and the synthetic speech may be generated based on the speech data outputted from the artificial neural network text-to-speech synthesis model.

Method and device for editing singing voice synthesis data, and method for analyzing singing
09818396 · 2017-11-14

A singing voice synthesis data editing method includes adding, to singing voice synthesis data, a piece of virtual note data placed immediately before a piece of note data having no contiguous preceding piece of note data, the singing voice synthesis data including: multiple pieces of note data for specifying a duration and a pitch at which each note that is in a time series, representative of a melody to be sung, is voiced; multiple pieces of lyric data associated with at least one of the multiple pieces of note data; and a sequence of sound control data that directs sound control over a singing voice synthesized from the multiple pieces of lyric data, and obtaining the sound control data that directs sound control over the singing voice synthesized from the multiple pieces of lyric data, and that is associated with the piece of virtual note data.

SYSTEMS AND METHODS FOR PROVIDING NON-LEXICAL CUES IN SYNTHESIZED SPEECH

Systems and methods are disclosed for providing non-lexical cues in synthesized speech. Original text is analyzed to determine characteristics of the text and/or to derive or augment an intent (e.g., an intent code). Non-lexical cue insertion points are determined based on the characteristics of the text and/or the intent. One or more non-lexical cues are inserted at insertion points to generate augmented text. The augmented text is synthesized into speech, including converting the non-lexical cues to speech output.
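As a rough illustration of the cue-insertion idea, the Python sketch below treats "the sentence is a question" as the text characteristic and the start of that sentence as the insertion point. The `CUES` table, the intent codes, and the function name `augment_text` are all hypothetical assumptions made for this example, not details from the patent.

```python
import re

# Hypothetical mapping from an intent code to a non-lexical cue.
CUES = {"uncertain": "Um...", "thinking": "Hmm..."}

def augment_text(text, intent):
    """Insert a non-lexical cue before each question, if the intent has a cue."""
    cue = CUES.get(intent)
    if cue is None:
        return text  # no cue defined for this intent; leave the text unchanged
    # Split into sentences at whitespace that follows ., ? or !
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    augmented = []
    for sentence in sentences:
        if sentence.endswith("?"):  # characteristic: the sentence is a question
            augmented.append(f"{cue} {sentence}")  # insertion point: sentence start
        else:
            augmented.append(sentence)
    return " ".join(augmented)

print(augment_text("The store is closed. Is it open tomorrow?", "uncertain"))
```

The augmented string would then be passed to a speech synthesizer, which is outside the scope of this sketch.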

TEXT TO SPEECH PAGING SYSTEM
20170193984 · 2017-07-06

A text-to-speech paging system can include a paging receiver, a text-to-speech module, an audio processing module, a radio transmitter, and a controller. The paging receiver can be configured to receive a radio-frequency text pager message from an industry-standard paging transmitter. The paging receiver can include a decoder configured to decode the radio-frequency text pager message to a decoded pager message. The text-to-speech module can be configured to convert the decoded pager message into audio information, and the audio processing module can be configured to process the audio information. The radio transmitter can be configured to receive the audio information from the audio processing module and transmit the audio information in an industry-standard two-way radio protocol. The controller can be configured to control the text-to-speech paging system.

QUESTION AND ANSWER PROCESSING METHOD AND ELECTRONIC DEVICE FOR SUPPORTING THE SAME
20170154626 · 2017-06-01

An electronic device is provided. The electronic device includes a communication interface comprising communication circuitry and a processor configured to functionally connect with the communication interface, wherein the processor is configured to: obtain a voice signal; obtain context information associated with a user in connection with obtaining the voice signal; determine first response information corresponding to the voice signal if the context information meets a first condition; determine second response information corresponding to the voice signal if the context information meets a second condition; and send at least part of response information corresponding to the first response information or the second response information to an output device operatively connected with the electronic device or to an external electronic device for the electronic device.
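The condition-dependent response selection in this abstract can be sketched in Python as an ordered list of (condition, responder) pairs. The context key `activity`, the two example conditions, and the names `select_response` and `RULES` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, object]
# Each rule pairs a condition on the context with a function that builds a response.
Rule = Tuple[Callable[[Context], bool], Callable[[str], str]]

def select_response(query: str, context: Context, rules: List[Rule]) -> Optional[str]:
    """Return the response of the first rule whose condition the context meets."""
    for condition, responder in rules:
        if condition(context):
            return responder(query)
    return None  # no condition met; the abstract does not cover this case

# Hypothetical first and second conditions.
RULES: List[Rule] = [
    (lambda ctx: ctx.get("activity") == "driving", lambda q: f"Short answer to: {q}"),
    (lambda ctx: ctx.get("activity") == "at_home", lambda q: f"Detailed answer to: {q}"),
]

print(select_response("weather today", {"activity": "driving"}, RULES))
```

In this sketch the query is already text; obtaining it from a voice signal and sending the result to an output device are left out.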