G10L13/07

Method and apparatus for generating speech

A speech generation method and apparatus are disclosed. The speech generation method includes obtaining, by a processor, a linguistic feature and a prosodic feature from an input text, determining, by the processor, a first candidate speech element through a cost calculation and a Viterbi search based on the linguistic feature and the prosodic feature, generating, at a speech element generator implemented at the processor, a second candidate speech element based on the linguistic feature or the prosodic feature and the first candidate speech element, and outputting, by the processor, an output speech by concatenating the second candidate speech element and a speech sequence determined through the Viterbi search.
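
The cost calculation and Viterbi search named in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function `viterbi_select`, the `target_cost` and `join_cost` callables, and the string unit labels are all hypothetical, and the speech element generator and final concatenation steps are omitted.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi_select(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    join_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one unit per position so that the summed cost is minimal.

    target_cost(t, unit): mismatch between a unit and the features wanted at t.
    join_cost(prev, unit): penalty for concatenating two adjacent units.
    Both cost functions are assumptions made for this sketch.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[j]: cheapest cost of any path ending in candidate j of the current position
    best = [target_cost(0, unit) for unit in candidates[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor index for candidate j at t

    for t in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for unit in candidates[t]:
            costs = [
                best[i] + join_cost(prev, unit)
                for i, prev in enumerate(candidates[t - 1])
            ]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + target_cost(t, unit))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path backwards from the final position.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)], total
```

In this sketch a greedy choice per position can differ from the Viterbi result, because a unit that is locally cheapest may incur a high join cost with its neighbor.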

Synthetic speech processing

A speech-processing system receives input data representing text. An input encoder processes the input data to determine first embedding data representing the text. A local attention encoder processes a subset of the first embedding data in accordance with a predicted size to determine second embedding data. An attention encoder processes the second embedding data to determine third embedding data. A decoder processes the third embedding data to determine audio data corresponding to the text.

Digital audio method for creating and sharing audio books using a combination of virtual voices and recorded voices, customization based on characters, serialized content, voice emotions, and audio assembler module
11594210 · 2023-02-28

A method includes receiving a text file of an author's book as input to a serialized process that creates a record of each paragraph of text, and creating a character file with associated character attributes and information required for the recording process and/or virtualization process. The method includes combining the serialized file with the character file to create a snippet file, assigning characters to snippets, and generating audio files from snippets using text-to-speech APIs. Each snippet of text is assigned to a character and can be edited, and its audio can be played back. The method includes sharing snippets with narrators to record specific characters not represented by text-to-speech synthesized audio, and concatenating all audio files from snippets, with proper time spacing, into a publishable audiobook format. The snippets are concatenated, and audio files are created through links to text-to-speech API processes. The snippets are concatenated and shared with a human narrator.
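
The serialization and character-assignment steps described in this abstract might be sketched as follows. This is only an illustration under stated assumptions: the `Snippet` record, the `virtual_voice` attribute, the default `"Narrator"` character, and the fixed gap between snippets are hypothetical, and no real text-to-speech API is called.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Snippet:
    index: int                       # paragraph order in the book
    text: str
    character: Optional[str] = None  # filled in by assign()
    source: Optional[str] = None     # "tts" or "narrator" (hypothetical labels)


def serialize(book_text: str) -> List[Snippet]:
    """Create one record per non-empty paragraph (paragraphs split on blank lines)."""
    paragraphs = [p.strip() for p in book_text.split("\n\n")]
    non_empty = [p for p in paragraphs if p]
    return [Snippet(index=i, text=p) for i, p in enumerate(non_empty)]


def assign(
    snippets: List[Snippet],
    assignments: Dict[int, str],
    characters: Dict[str, dict],
) -> List[Snippet]:
    """Attach a character to each snippet and decide who voices it."""
    for snippet in snippets:
        name = assignments.get(snippet.index, "Narrator")
        snippet.character = name
        # Characters without a virtual voice are routed to a human narrator.
        has_voice = bool(characters[name].get("virtual_voice"))
        snippet.source = "tts" if has_voice else "narrator"
    return snippets


def assembly_plan(
    snippets: List[Snippet], gap_seconds: float = 0.5
) -> List[Tuple[int, str, float]]:
    """List snippets in book order with the pause to leave after each one."""
    ordered = sorted(snippets, key=lambda s: s.index)
    return [(s.index, s.source or "unassigned", gap_seconds) for s in ordered]
```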

METHOD AND ELECTRONIC DEVICE FOR INTELLIGENTLY READING DISPLAYED CONTENTS

A method for intelligently reading displayed contents by an electronic device is provided. The method includes obtaining a screen representation based on a plurality of contents displayed on a screen of the electronic device. The method includes extracting a plurality of insights comprising at least one of intent, importance, emotion, sound representation and information sequence of the plurality of contents from the plurality of contents based on the screen representation. The method includes generating audio emulating the extracted plurality of insights.

METHOD FOR GENERATING BROADCAST SPEECH, DEVICE AND COMPUTER STORAGE MEDIUM

The technical solution relates to the fields of voice technologies and knowledge graph technologies. The technical solution includes: acquiring a script matched with a scenario from a speech package, and acquiring a broadcast template configured in advance for the scenario; and filling the broadcast template with the script to generate the broadcast speech.
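
The template-filling step in this abstract can be sketched in a few lines. The scenario names, the contents of `SPEECH_PACKAGE` and `BROADCAST_TEMPLATES`, and the function `generate_broadcast_text` are hypothetical; the sketch produces only the text that would then be passed to a synthesizer.

```python
from string import Template
from typing import Dict

# Hypothetical speech package: scenario -> script phrase in a given speaker's style.
SPEECH_PACKAGE: Dict[str, str] = {
    "speeding": "please slow down",
    "arrival": "you have reached your destination",
}

# Hypothetical broadcast templates configured in advance, one per scenario.
BROADCAST_TEMPLATES: Dict[str, str] = {
    "speeding": "Attention: ${script}.",
    "arrival": "Good news: ${script}.",
}


def generate_broadcast_text(
    scenario: str,
    speech_package: Dict[str, str] = SPEECH_PACKAGE,
    templates: Dict[str, str] = BROADCAST_TEMPLATES,
) -> str:
    """Look up the script and the template for a scenario, then fill the template."""
    script = speech_package[scenario]        # KeyError if the scenario has no script
    template = Template(templates[scenario])
    return template.substitute(script=script)
```

Calling `generate_broadcast_text("speeding")` with the sample data above yields `Attention: please slow down.`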

Voice synthesis method, voice synthesis apparatus, and recording medium
11495206 · 2022-11-08

A voice synthesis method and apparatus generate second control data using an intermediate trained model with first input data including first control data designating phonetic identifiers; change the second control data in accordance with a first user instruction provided by a user; generate synthesis data representing frequency characteristics of a voice to be synthesized, using a final trained model with final input data including the first control data and the changed second control data; and generate a voice signal based on the generated synthesis data.

Artificial intelligence apparatus for generating text or speech having content-based style and method for the same
11488576 · 2022-11-01

Provided is an artificial intelligence (AI) apparatus for generating a speech having a content-based style, including: a memory configured to store a plurality of TTS (Text-To-Speech) engines; and a processor configured to: obtain image data or text data containing a text, extract at least one content keyword corresponding to the text, determine a speech style based on the extracted content keyword, generate a speech corresponding to the text by using a TTS engine corresponding to the determined speech style among the plurality of TTS engines, and output the generated speech.
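
The keyword-to-style dispatch described in this abstract could look roughly like the sketch below. Everything named here is an assumption for illustration: the `STYLE_BY_KEYWORD` table, the style names, and the stub engines that return tagged strings instead of audio. The image-input path is not covered.

```python
import re
from typing import Callable, Dict, List

# Hypothetical mapping from content keywords to speech styles.
STYLE_BY_KEYWORD: Dict[str, str] = {
    "news": "formal",
    "dragon": "storytelling",
    "recipe": "casual",
}


def extract_keywords(text: str) -> List[str]:
    """Return the known content keywords found in the text, in order of appearance."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in STYLE_BY_KEYWORD]


def determine_style(keywords: List[str], default: str = "neutral") -> str:
    """Use the first matching keyword; fall back to a default style."""
    return STYLE_BY_KEYWORD[keywords[0]] if keywords else default


def make_stub_engine(style: str) -> Callable[[str], str]:
    """Stand-in for a stored TTS engine; returns a tagged string, not audio."""
    def engine(text: str) -> str:
        return f"<{style}> {text}"
    return engine


ENGINES: Dict[str, Callable[[str], str]] = {
    style: make_stub_engine(style)
    for style in ("formal", "storytelling", "casual", "neutral")
}


def generate_speech(
    text: str, engines: Dict[str, Callable[[str], str]] = ENGINES
) -> str:
    """Select the engine that matches the content-based style and run it."""
    style = determine_style(extract_keywords(text))
    return engines[style](text)
```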