G10L13/07

ENHANCING HYBRID SELF-ATTENTION STRUCTURE WITH RELATIVE-POSITION-AWARE BIAS FOR SPEECH SYNTHESIS
20200258496 · 2020-08-13

A method of performing speech synthesis includes encoding character embeddings, using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output, applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied, and predicting an output mel-scale spectrogram, based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
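The relative-position-aware self-attention function referenced above can be sketched as follows, in the spirit of learned relative-position representations added to the attention logits. This is a minimal illustration, not the patented method: the weight matrices and relative embeddings are random stand-ins for trained parameters, and `max_rel_dist` is an assumed clipping hyperparameter.

```python
import numpy as np

def relative_position_self_attention(x, max_rel_dist=4, seed=0):
    """Single-head self-attention with a relative-position bias.
    x: (T, d) sequence of embeddings. All weights are random stand-ins."""
    rng = np.random.default_rng(seed)
    T, d = x.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    # One learned embedding per clipped relative distance in [-max, +max].
    rel_emb = rng.standard_normal((2 * max_rel_dist + 1, d)) * 0.1

    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # idx[i, j] = clipped distance (j - i), shifted to a valid table index.
    idx = np.clip(np.arange(T)[None, :] - np.arange(T)[:, None],
                  -max_rel_dist, max_rel_dist) + max_rel_dist
    # Content term plus relative-position term, scaled and softmaxed.
    logits = (q @ k.T + np.einsum('id,ijd->ij', q, rel_emb[idx])) / np.sqrt(d)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = np.random.default_rng(1).standard_normal((5, 8))
out = relative_position_self_attention(x)   # (5, 8) attended output
```

The key difference from plain dot-product attention is the `rel_emb[idx]` term: each query attends not only to key content but also to how far away each position is, with distances clipped so the table stays a fixed size.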

Digital audio method for creating and sharing audiobooks using a combination of virtual voices and recorded voices, customization based on characters, serialized content, voice emotions, and audio assembler module
20200258495 · 2020-08-13

Colabnarration is a six-step process that allows authors to create their own audiobooks with or without human-recorded narration. The processes described herein are: 1) Serialization of the text-based novel or book. This process creates a record for each paragraph of text in the book (text file) and also creates a proprietary file to be used within the software application. 2) Creation of a character file. This process allows the author to create a list of characters and add all pertinent information required in the recording process and/or the virtualization process. 3) Combining the serialized file with the character file to create the Snippet file, which is used in the Snippet Manager. In this process, the author can assign characters to every snippet (text block), which will be used in the following step. 4) Generating audio files using third-party text-to-speech APIs. Each snippet (text block) is sent to a virtual voice API and converted into an audio file. 5) Assembling the audiobook. Once all the snippets have been converted to audio files, this module concatenates all the files and creates the full audiobook. 6) Sharing the project with a narrator. This process allows an author to assign characters to a specific narrator, who will record just those assigned characters. Once an author has shared the project with a narrator, the project is sent to the narrator via an automated email message.
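Steps 1 through 3 above amount to splitting the book into paragraph records and joining them with character assignments. A minimal sketch, assuming paragraphs are separated by blank lines; the names `Snippet`, `serialize`, and `build_snippets` are illustrative, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    index: int
    text: str
    character: str   # assigned by the author in the Snippet Manager step

def serialize(book_text):
    """Step 1: create one record per paragraph of the text file."""
    return [p.strip() for p in book_text.split("\n\n") if p.strip()]

def build_snippets(paragraphs, default_character="Narrator"):
    """Step 3: combine the serialized text with character assignments."""
    return [Snippet(i, p, default_character) for i, p in enumerate(paragraphs)]

book = 'It was a dark night.\n\n"Who goes there?" he called.'
snippets = build_snippets(serialize(book))
snippets[1].character = "Guard"   # author reassigns a snippet's speaker
```

From here, step 4 would send each snippet's text to a text-to-speech API keyed by its character's voice, and step 5 would concatenate the returned audio files in index order.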

SYSTEMS AND METHODS FOR MULTI-STYLE SPEECH SYNTHESIS
20200211529 · 2020-07-02

Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a desired speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and outputting the synthesized speech.
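The core selection step, picking a segment recorded in a different style based on a similarity measure between styles, can be sketched as follows. The style feature vectors and the use of cosine similarity are assumptions for illustration; the patent does not commit to a particular representation or metric.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

STYLE_VECTORS = {            # hypothetical style feature vectors
    "neutral":      [1.0, 0.0, 0.0],
    "newscast":     [0.9, 0.3, 0.1],
    "storytelling": [0.2, 0.9, 0.4],
}

def pick_segment(segments, desired_style):
    """segments: list of (segment_id, recorded_style) tuples.
    Returns the segment whose recorded style is most similar
    to the desired style, even when no exact match exists."""
    return max(segments,
               key=lambda s: cosine_similarity(STYLE_VECTORS[s[1]],
                                               STYLE_VECTORS[desired_style]))

segments = [("seg-1", "neutral"), ("seg-2", "storytelling")]
# No "newscast" segment exists; the neutral segment is the closest style.
best = pick_segment(segments, "newscast")
```

Here the "newscast" request falls back to the neutral-style segment because its style vector is the nearest available, which mirrors the patent's idea of borrowing segments across styles when they are similar enough.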

System and method for distributed voice models across cloud and device for embedded text-to-speech

Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify speech units that are required for synthesizing speech. The system can request from a server the text-to-speech unit needed to synthesize the speech. The system can then synthesize speech using text-to-speech units already stored and a received text-to-speech unit from the server.
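The device-side behavior described above, using locally stored units where possible and fetching missing units from a server, is essentially a cache with a network fallback. A minimal sketch; `fetch_from_server` is a placeholder for a real network call, and the unit IDs are invented:

```python
def fetch_from_server(unit_id):
    """Stand-in for requesting a text-to-speech unit from the server."""
    return f"<audio for {unit_id}>"

class UnitCache:
    def __init__(self, preloaded):
        self.units = dict(preloaded)   # units shipped with the device

    def get(self, unit_id):
        if unit_id not in self.units:            # cache miss: ask the server
            self.units[unit_id] = fetch_from_server(unit_id)
        return self.units[unit_id]

def synthesize(cache, required_unit_ids):
    """Concatenate audio for every unit the utterance needs."""
    return "".join(cache.get(u) for u in required_unit_ids)

cache = UnitCache({"hh-ax": "<audio for hh-ax>"})
audio = synthesize(cache, ["hh-ax", "l-ow"])   # one hit, one server fetch
```

Fetched units stay in the cache, so repeated syntheses of common words avoid further round trips, which is the point of keeping the voice model distributed across cloud and device.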

System and method for unit selection text-to-speech using a modified Viterbi approach
10636412 · 2020-04-28

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
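The cost analysis above is a Viterbi-style dynamic program over the ordered lists, where each unit only considers the sublist of next-list units suitable for concatenation. A minimal sketch under invented cost functions (pitch distance as the join cost, plus a per-unit target cost); the patent does not specify these particulars:

```python
def suitable(unit, next_list, window=2.0):
    """Sublist of next-list units close enough in pitch to concatenate."""
    return [n for n in next_list if abs(n["pitch"] - unit["pitch"]) <= window]

def lowest_cost_path(ordered_lists):
    # best maps a unit index in the current list -> (cumulative cost, path).
    best = {j: (u["target_cost"], [j]) for j, u in enumerate(ordered_lists[0])}
    for i in range(len(ordered_lists) - 1):
        nxt = {}
        for j, (cost, path) in best.items():
            unit = ordered_lists[i][j]
            for n in suitable(unit, ordered_lists[i + 1]):
                k = ordered_lists[i + 1].index(n)
                total = cost + abs(n["pitch"] - unit["pitch"]) + n["target_cost"]
                if k not in nxt or total < nxt[k][0]:
                    nxt[k] = (total, path + [k])
        best = nxt
    return min(best.values())[1]   # unit indices along the cheapest path

lists = [   # two ordered lists (ordered by pitch), two candidate units each
    [{"pitch": 100.0, "target_cost": 1.0}, {"pitch": 140.0, "target_cost": 0.5}],
    [{"pitch": 101.0, "target_cost": 0.2}, {"pitch": 139.0, "target_cost": 0.2}],
]
path = lowest_cost_path(lists)
```

The pruning to `suitable` sublists is the "modified" part: instead of scoring every pairwise transition, each unit's forward search is restricted to plausible joins, which keeps the lattice small.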

Accessibility techniques for presentation of symbolic expressions

Methods for presenting symbolic expressions such as mathematical, scientific, or chemical expressions, formulas, or equations are performed by a computing device. One method includes: displaying a first portion of a symbolic expression within a first area of a display screen; while in a first state in which the first area is selected for aural presentation, aurally presenting first information related to the first portion of the symbolic expression; while in the first state, detecting particular user input; in response to detecting the particular user input, performing the steps of: transitioning from the first state to a second state in which a second area, of the display, is selected for aural presentation; determining second information associated with a second portion, of the symbolic expression, that is displayed within the second area; in response to determining the second information, aurally presenting the second information.
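The described interaction is a small state machine: the device tracks which area of the expression is selected, and particular user input transitions the selection and triggers aural presentation of the newly selected portion. A minimal sketch; the class, the swipe-style input, and the example expression portions are all hypothetical:

```python
class ExpressionReader:
    def __init__(self, portions):
        self.portions = portions   # aural descriptions of expression areas
        self.selected = 0          # first area selected initially
        self.spoken = []           # record of aurally presented information

    def present(self):
        """Aurally present the information for the selected area."""
        self.spoken.append(self.portions[self.selected])

    def on_user_input(self):
        """Particular user input (e.g. a swipe): transition to the
        next area and present its associated information."""
        if self.selected + 1 < len(self.portions):
            self.selected += 1
            self.present()

reader = ExpressionReader(["x squared", "plus", "2 x", "plus 1"])
reader.present()        # first state: speak the first portion
reader.on_user_input()  # second state: speak the second portion
```

Real implementations would map each portion to an on-screen area and drive speech through a screen-reader API, but the state transition on user input is the essence of the claimed method.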