G10L13/04

SYNTHESIZED SPEECH AUDIO DATA GENERATED ON BEHALF OF HUMAN PARTICIPANT IN CONVERSATION
20230046658 · 2023-02-16

Generating synthesized speech audio data on behalf of a given user in a conversation. The synthesized speech audio data includes synthesized speech that incorporates textual segment(s). The textual segment(s) can include recognized text that results from processing spoken input, of the given user, using a speech recognition model and/or can include a selection of a rendered suggestion that conveys the textual segment(s). Some implementations dynamically determine one or more prosodic properties for use in speech synthesis of the textual segment, and generate the synthesized speech with the one or more determined prosodic properties. The prosodic properties can be determined based on the textual segment(s) used in speech synthesis, textual segment(s) corresponding to recent spoken input of additional participant(s), attribute(s) of relationship(s) between the given user and additional participant(s) in the conversation, and/or feature(s) of a current location for the conversation.
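The abstract lists the signals that can drive prosody selection but gives no rules for combining them. Below is a minimal rule-based sketch, assuming hypothetical relationship labels ("family", "friend", "coworker", "stranger"), a hypothetical location feature (ambient noise in dB), and invented rule values. A real system might instead learn this mapping from data.

```python
from dataclasses import dataclass


@dataclass
class ProsodyContext:
    text: str                        # textual segment to be synthesized for the user
    other_recent_text: str = ""      # recent recognized speech of another participant
    relationship: str = "unknown"    # hypothetical relationship attribute label
    location_noise_db: float = 40.0  # hypothetical feature of the current location


@dataclass
class Prosody:
    rate: float = 1.0             # relative speaking rate (1.0 = default)
    pitch_semitones: float = 0.0  # pitch offset from the default voice
    volume_db: float = 0.0        # gain relative to the default volume


def choose_prosody(ctx: ProsodyContext) -> Prosody:
    """Pick prosodic properties from the conversation context (illustrative rules only)."""
    p = Prosody()
    # Relationship attribute: slightly faster for close contacts, slower for formal ones.
    if ctx.relationship in {"family", "friend"}:
        p.rate = 1.1
    elif ctx.relationship in {"coworker", "stranger"}:
        p.rate = 0.95
    # Textual segment being synthesized: raise the pitch for questions.
    if ctx.text.rstrip().endswith("?"):
        p.pitch_semitones = 2.0
    # Other participant's recent speech: match some of their excitement.
    if ctx.other_recent_text.rstrip().endswith("!"):
        p.pitch_semitones += 1.0
    # Location feature: speak louder in noisy places, capped at +10 dB.
    if ctx.location_noise_db > 60.0:
        p.volume_db = min(ctx.location_noise_db - 60.0, 10.0)
    return p
```

For example, `choose_prosody(ProsodyContext("Are you coming?", "Great news!", "friend", 70.0))` yields a faster rate, a raised pitch, and extra volume.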

Text to Speech Processing Method, Terminal, and Server
20230045631 · 2023-02-09

A text to speech processing method implemented by a terminal includes detecting an instruction to perform a text to speech conversion, sending text to a server, downloading, from the server, audio data based on the text, determining whether a first frame of playable audio data is downloaded within a preset duration, and continuing to download the remaining audio data when the first frame is downloaded within the preset duration.
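The first-frame check can be sketched as follows. This version assumes that downloads arrive as `(arrival_time_s, data)` chunks and that the first chunk is the first playable frame. The abstract does not say what the terminal does when the deadline is missed, so this sketch simply gives up and returns `None`; that fallback is an assumption.

```python
from typing import Iterable, Optional, Tuple


def download_audio(chunks: Iterable[Tuple[float, bytes]],
                   preset_duration_s: float) -> Optional[bytes]:
    """Download TTS audio, continuing only if the first frame arrives in time.

    `chunks` yields (arrival_time_s, data) pairs measured from the request time.
    """
    it = iter(chunks)
    first = next(it, None)
    if first is None:
        return None  # server sent nothing
    arrival_s, data = first
    if arrival_s > preset_duration_s:
        # Timeout behaviour is unspecified in the abstract; here we abandon the download.
        return None
    received = bytearray(data)
    # First frame arrived within the preset duration: keep downloading the rest.
    for _, more in it:
        received.extend(more)
    return bytes(received)
```

In a real terminal the deadline would be enforced with a network timeout rather than after-the-fact timestamps; the timestamps here only keep the sketch deterministic.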

Training Speech Synthesis to Generate Distinct Speech Sounds

A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.
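The abstract does not specify how predicted and reference phone labels are aligned or which loss is used. A minimal sketch, assuming the phone label mapping network outputs a probability distribution over phone classes per time step and using dynamic time warping (DTW) as a stand-in alignment, computes the average negative log-probability of the reference labels along the best monotonic path:

```python
import math
from typing import List, Sequence, Tuple


def phone_label_loss(pred: Sequence[Sequence[float]], ref: Sequence[int]) -> float:
    """DTW-aligned phone label loss (illustrative; the patent's alignment may differ).

    pred[i][k]: predicted probability of phone class k at time step i.
    ref[j]:     reference phone label index j.
    """
    n, m = len(pred), len(ref)
    if n == 0 or m == 0:
        raise ValueError("empty sequence")

    def cost(i: int, j: int) -> float:
        # Negative log-probability of reference label j at predicted step i.
        return -math.log(max(pred[i][ref[j]], 1e-12))

    inf = float("inf")
    # acc[i][j] = (total cost, path length) of the cheapest path ending at (i, j).
    acc: List[List[Tuple[float, int]]] = [[(inf, 0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            c = cost(i, j)
            if i == 0 and j == 0:
                acc[i][j] = (c, 1)
                continue
            candidates = []
            if i > 0:
                candidates.append(acc[i - 1][j])
            if j > 0:
                candidates.append(acc[i][j - 1])
            if i > 0 and j > 0:
                candidates.append(acc[i - 1][j - 1])
            best_total, best_len = min(candidates)
            acc[i][j] = (best_total + c, best_len + 1)
    total, length = acc[n - 1][m - 1]
    return total / length  # average loss per aligned pair
```

In training, this scalar would be back-propagated to update the TTS model. A differentiable framework would be needed for that; plain Python is used here only to show the alignment and loss arithmetic.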

Automatic Voiceover Generation

A method includes receiving a voice request to generate synthesized voiceover speech for a target advertisement having one or more advertising campaign attributes. The method also includes generating, based on the one or more advertising campaign attributes, a voiceover script that includes a sequence of text for the synthesized voiceover speech. The method also includes generating, using a text-to-speech (TTS) system, the synthesized voiceover speech. The TTS system is configured to receive, as input, the sequence of text for the voiceover script and generate, as output, the synthesized voiceover speech. Here, the synthesized voiceover speech has speech characteristics specified by a target TTS vertical. The method also includes overlaying the synthesized voiceover speech on the target advertisement.
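The final overlay step could look like the sketch below. It assumes that both the advertisement's audio track and the synthesized voiceover are mono float sample lists in [-1, 1] at the same sample rate, and it applies a hypothetical "ducking" factor that lowers the background under the voice. The abstract does not describe how the overlay is mixed.

```python
from typing import List, Sequence


def overlay(ad_track: Sequence[float], voiceover: Sequence[float], offset: int = 0,
            voice_gain: float = 1.0, duck: float = 0.5) -> List[float]:
    """Mix synthesized voiceover samples onto an ad's audio track (illustrative).

    offset: sample index in the ad track where the voiceover starts.
    duck:   hypothetical attenuation applied to the ad audio under the voice.
    """
    out = list(ad_track)
    # Extend the track with silence if the voiceover runs past its end.
    needed = offset + len(voiceover)
    if needed > len(out):
        out.extend([0.0] * (needed - len(out)))
    for k, v in enumerate(voiceover):
        i = offset + k
        mixed = out[i] * duck + v * voice_gain
        out[i] = max(-1.0, min(1.0, mixed))  # hard-clip to the valid sample range
    return out
```

Hard clipping keeps the sketch short; a production mixer would more likely normalize or use a limiter.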

Digital audio method for creating and sharing audio books using a combination of virtual voices and recorded voices, customization based on characters, serialized content, voice emotions, and audio assembler module
11594210 · 2023-02-28

A method includes receiving a text file of an author's book as input to a serialization process that creates a record for each paragraph of text, and creating a character file with associated character attributes and the information required for the recording and/or virtualization process. The method includes combining the serialized file with the character file to create a snippet file, assigning characters to snippets, and generating audio files from the snippets using text-to-speech APIs. Each snippet of text is assigned to a character, can be edited, and can have its audio played back. The method includes sharing snippets with narrators to record specific characters not represented by text-to-speech synthesized audio, and concatenating all audio files from the snippets, with proper time spacing, into a publishable audiobook format. Audio files are created through links to text-to-speech API processes, and the concatenated snippets can be shared with a human narrator.
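The serialize, assign, and concatenate steps can be sketched as below. The sketch assumes that paragraphs are separated by blank lines, that unassigned snippets default to a hypothetical "Narrator" character, and that each snippet's audio (from a TTS API or a human recording) is already available as a list of float samples. The text-to-speech calls themselves are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Snippet:
    index: int      # position of the paragraph record in the book
    text: str
    character: str  # character who speaks this snippet


def serialize(book_text: str) -> List[str]:
    """Create one record per paragraph (paragraphs assumed separated by blank lines)."""
    return [p.strip() for p in book_text.split("\n\n") if p.strip()]


def make_snippets(paragraphs: Sequence[str], assignments: Dict[int, str],
                  default: str = "Narrator") -> List[Snippet]:
    """Combine paragraph records with character assignments."""
    return [Snippet(i, p, assignments.get(i, default)) for i, p in enumerate(paragraphs)]


def concatenate(clips: Sequence[Sequence[float]], sample_rate: int,
                gap_s: float) -> List[float]:
    """Join per-snippet audio with a fixed silence gap between consecutive clips."""
    silence = [0.0] * int(round(sample_rate * gap_s))
    out: List[float] = []
    for k, clip in enumerate(clips):
        if k > 0:
            out.extend(silence)
        out.extend(clip)
    return out
```

A fixed gap is the simplest form of "proper time spacing"; an editor might instead vary the pause by punctuation or speaker change.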

METHOD, ELECTRONIC DEVICE, AND RECORDING MEDIUM FOR NOTIFYING OF SURROUNDING SITUATION INFORMATION
20180012073 · 2018-01-11

According to various embodiments, a method by which an electronic device notifies a user of surrounding situation information may comprise the operations of: monitoring a value indicating movement of the electronic device; determining, on the basis of that value, whether the electronic device is in a stopped state; acquiring, when the electronic device is in a stopped state, surrounding situation information of the electronic device to be provided to the user; and outputting the surrounding situation information.
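One way to realize the "stopped state" decision is sketched below. It assumes that the movement value is the accelerometer magnitude's deviation from gravity, and that the device counts as stopped when every value in a short window stays under a threshold. The threshold, window size, and the choice to notify only on entering the stopped state are all assumptions, not taken from the abstract.

```python
import math
from collections import deque
from typing import Callable, Deque

GRAVITY = 9.81  # m/s^2


class StopDetector:
    """Declares a stopped state when recent movement values all stay below a threshold."""

    def __init__(self, threshold: float = 0.3, window: int = 5) -> None:
        self.threshold = threshold
        self.samples: Deque[float] = deque(maxlen=window)

    def update(self, ax: float, ay: float, az: float) -> bool:
        # Movement value: deviation of the acceleration magnitude from gravity.
        movement = abs(math.sqrt(ax * ax + ay * ay + az * az) - GRAVITY)
        self.samples.append(movement)
        return len(self.samples) == self.samples.maxlen and max(self.samples) < self.threshold


class SituationNotifier:
    """Acquires and outputs surrounding information when the device becomes stopped."""

    def __init__(self, detector: StopDetector, acquire_info: Callable[[], str],
                 output: Callable[[str], None]) -> None:
        self.detector = detector
        self.acquire_info = acquire_info
        self.output = output
        self.stopped = False

    def on_sample(self, ax: float, ay: float, az: float) -> None:
        now_stopped = self.detector.update(ax, ay, az)
        # Notify only on the transition into the stopped state, not on every sample.
        if now_stopped and not self.stopped:
            self.output(self.acquire_info())
        self.stopped = now_stopped
```

Notifying on the transition avoids repeating the same announcement for every sensor reading while the user stands still.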

MOBILE ELECTRONIC DEVICE AND OPERATION METHOD THEREFOR
20180013882 · 2018-01-11

An operation method for a mobile electronic device is provided. The operation method includes: transmitting a calling phone number from an operating system of the mobile electronic device to a wireless audio product via wireless communication, wherein the mobile electronic device is wirelessly connected to the wireless audio product; transmitting, by the wireless audio product, the calling phone number to application software of the mobile electronic device; searching, by the application software, for a caller name corresponding to the calling phone number; transmitting, by the application software, the caller name to the wireless audio product via wireless communication; and playing the caller name by the wireless audio product.
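The relay between the wireless audio product and the application software can be sketched with two in-process objects standing in for the two devices. The class names, the announcement wording, and the rule of matching numbers on their last 10 digits are hypothetical. The wireless transport and the actual audio playback are abstracted into a `play` callback.

```python
import re
from typing import Callable, Dict


def normalize(number: str) -> str:
    """Keep digits only and compare on the last 10 (a hypothetical matching rule)."""
    return re.sub(r"\D", "", number)[-10:]


class CallerNameApp:
    """Stands in for the application software on the mobile device."""

    def __init__(self, contacts: Dict[str, str]) -> None:
        self.index = {normalize(n): name for n, name in contacts.items()}

    def lookup(self, number: str) -> str:
        # Search for the caller name; fall back to the raw number if not found.
        return self.index.get(normalize(number), number)


class WirelessAudioProduct:
    """Stands in for the headset: forwards the number, then plays the returned name."""

    def __init__(self, app: CallerNameApp, play: Callable[[str], None]) -> None:
        self.app = app
        self.play = play

    def on_calling_number(self, number: str) -> None:
        name = self.app.lookup(number)  # product -> app, then app -> product
        self.play(f"Incoming call from {name}")
```

Usage: `WirelessAudioProduct(CallerNameApp({"555-123-4567": "Alice"}), print).on_calling_number("+1 (555) 123-4567")` prints `Incoming call from Alice`.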

Methods and apparatus for obtaining biometric data
11710475 · 2023-07-25

A method of modelling speech of a user of a headset comprising a microphone, the method comprising: receiving a first sample, from a bone-conduction sensor, representing bone-conducted speech of the user; obtaining a measure of fundamental frequency of the bone-conducted speech in each of a plurality of speech frames of the first sample; obtaining a first distribution of the fundamental frequencies of the bone-conducted speech over the plurality of speech frames; receiving, from the microphone, a second sample; determining a first acoustic condition at the headset based on the second sample; and performing a biometric process based on the first distribution of fundamental frequencies and the first acoustic condition.
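The steps above are sketched below. The abstract names the quantities but not how they are computed. This sketch assumes a simple autocorrelation pitch estimator for per-frame fundamental frequency, a normalized histogram as the distribution, the microphone sample's RMS level in dBFS as the "acoustic condition", and a hypothetical verification rule that compares against an enrolled distribution with a noise-dependent threshold.

```python
import math
from typing import List, Optional, Sequence


def estimate_f0(frame: Sequence[float], fs: int,
                fmin: float = 60.0, fmax: float = 400.0) -> Optional[float]:
    """Autocorrelation F0 estimate for one frame; None for silent or unvoiced frames."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < 0.3:  # weak periodicity: treat as unvoiced
        return None
    return fs / best_lag


def f0_distribution(f0s: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalized histogram of F0 values over the bins defined by `edges`."""
    counts = [0] * (len(edges) - 1)
    for f in f0s:
        for b in range(len(edges) - 1):
            if edges[b] <= f < edges[b + 1]:
                counts[b] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def noise_level_db(mic_samples: Sequence[float]) -> float:
    """Acoustic condition proxy: RMS level of the microphone sample in dBFS."""
    rms = math.sqrt(sum(x * x for x in mic_samples) / len(mic_samples))
    return 20.0 * math.log10(max(rms, 1e-12))


def verify(dist: Sequence[float], enrolled: Sequence[float], noise_db: float,
           base_threshold: float = 0.3) -> bool:
    """Hypothetical decision: total-variation distance, threshold relaxed in noise."""
    distance = 0.5 * sum(abs(a - b) for a, b in zip(dist, enrolled))
    threshold = base_threshold + (0.1 if noise_db > -30.0 else 0.0)
    return distance <= threshold
```

The relaxed threshold in loud environments is only one possible use of the acoustic condition; a system could equally use it to weight the bone-conducted evidence against microphone-based features.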