G10L2013/083

Methods and systems for interactive online language learning in a pandemic-aware world
11380334 · 2022-07-05

An acoustic-representation-enabled method for teaching an acoustic representation corresponding to a sequence of cantillations of a Hebrew Bible verse, the method comprising: (a) computationally displaying, using a computing device, a Hebrew Bible verse, (b) computationally analyzing the Hebrew Bible verse to identify therein a sequence of cantillation symbols, (c) computationally analyzing the sequence of cantillation symbols to computationally generate a chanting acoustic representation of an ordered sequence of Hebrew Bible trope names corresponding to the sequence of cantillation symbols, (d) changing, computationally by an acoustic-representation-enabled process, the pitch level of the chanting acoustic representation of the ordered sequence of Hebrew Bible trope names corresponding to the sequence of cantillation symbols, and (e) playing, using a computing device, the chanting acoustic representation of the ordered sequence of Hebrew Bible trope names corresponding to the sequence of cantillation symbols.
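A minimal sketch of steps (b) and (c): Hebrew cantillation marks occupy the Unicode block U+0591–U+05AF, so a verse can be scanned character by character and each mark mapped to its trope name. The trope-name table below is a small illustrative subset, not a complete mapping.

```python
# Map a few cantillation code points to trope names (illustrative subset).
TROPE_NAMES = {
    "\u0591": "Etnachta",   # HEBREW ACCENT ETNAHTA
    "\u0596": "Tipcha",     # HEBREW ACCENT TIPEHA
    "\u05A8": "Kadma",      # HEBREW ACCENT QADMA
}

def extract_trope_sequence(verse: str) -> list[str]:
    """Return trope names for the cantillation marks, in reading order."""
    return [
        TROPE_NAMES.get(ch, "unknown")
        for ch in verse
        if "\u0591" <= ch <= "\u05AF"   # Hebrew cantillation block
    ]
```

The resulting name sequence could then be handed to a chanting synthesizer and pitch-shifted as in step (d).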

GENERATING AND PROVIDING INFORMATION OF A SERVICE

A method for generating and providing information of a service includes: generating output text from the information; transferring the output text to a text analysis service which performs: an analysis of complexity of the output text; an analysis of punctuation marks and a determination of text passages of the output text relating to accentuation and pauses; an analysis of formatting of the output text; an analysis of word importance in the output text; and/or a classification of a recipient; outputting the result of the text analysis service in the form of output text analysis metadata; transferring the output text, the output text analysis metadata, and user metadata to a categorization service which selects at least one output medium for presenting the output text to a user; and presenting the output text to the user.
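The analysis steps above can be sketched as a single function returning the output text analysis metadata; the heuristics shown (average sentence length for complexity, punctuation offsets for pauses, word length for importance) are simplifying assumptions, not the patented analyses.

```python
import re

def analyze_output_text(text: str) -> dict:
    """Hypothetical text-analysis service producing output text analysis metadata."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    return {
        # Complexity proxy: average words per sentence.
        "complexity": len(words) / max(len(sentences), 1),
        # Accentuation/pause points: character offsets of pause punctuation.
        "pause_offsets": [m.start() for m in re.finditer(r"[,;:]", text)],
        # Word-importance proxy: unusually long words.
        "important_words": sorted({w for w in words if len(w) >= 8}),
    }
```

This metadata, together with user metadata, would then drive the categorization service's choice of output medium.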

Readout of Communication Content Comprising Non-Latin or Non-Parsable Content Items for Assistant Systems

In one embodiment, a method includes accessing a communication content including zero or more Latin script text strings and one or more non-Latin script content items, determining a readout of the communication content based on parsing rules, wherein the parsing rules specify formats for the readout based on attributes of the non-Latin script content items, and wherein the readout includes the zero or more Latin script text strings and a description of the non-Latin script content items, and sending instructions for presenting an audio rendering of the readout of the communication content to a client system.
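A sketch of attribute-driven parsing rules: the rule keys, attribute names, and readout formats below are assumptions for illustration, since the abstract does not define a concrete schema.

```python
# (item kind, attribute) -> readout format; illustrative rules only.
PARSING_RULES = {
    ("emoji", "single"): "an emoji of {description}",
    ("emoji", "repeated"): "{count} emojis of {description}",
    ("script", "non_latin"): "text in {language}",
}

def build_readout(segments: list[dict]) -> str:
    """Join Latin text verbatim with rule-based descriptions of other items."""
    parts = []
    for seg in segments:
        if seg["kind"] == "latin_text":
            parts.append(seg["text"])
        else:
            fmt = PARSING_RULES[(seg["kind"], seg["attribute"])]
            parts.append(fmt.format(**seg))
    return " ".join(parts)
```

The joined readout string would then be sent for audio rendering on the client system.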

GENERATING AUDIO FOR A PLAIN TEXT DOCUMENT
20210158795 · 2021-05-27

The present disclosure provides method and apparatus for generating audio for a plain text document. At least a first utterance may be detected from the document. Context information of the first utterance may be determined from the document. A first role corresponding to the first utterance may be determined from the context information of the first utterance. Attributes of the first role may be determined. A voice model corresponding to the first role may be selected based at least on the attributes of the first role. Voice corresponding to the first utterance may be generated through the voice model.
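The role-to-voice selection step might look like the sketch below; the attribute names and voice-model identifiers are hypothetical.

```python
# Hypothetical voice-model catalog keyed by role attributes.
VOICE_MODELS = [
    {"id": "voice_child_f", "gender": "female", "age": "child"},
    {"id": "voice_adult_m", "gender": "male",   "age": "adult"},
    {"id": "voice_adult_f", "gender": "female", "age": "adult"},
]

def select_voice_model(role_attributes: dict) -> str:
    """Pick the first model matching all of the role's attributes."""
    for model in VOICE_MODELS:
        if all(model.get(k) == v for k, v in role_attributes.items()):
            return model["id"]
    return VOICE_MODELS[-1]["id"]   # fallback narrator voice
```

The selected model would then synthesize the voice for every utterance attributed to that role.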

Systems and methods for providing non-lexical cues in synthesized speech

Systems and methods are disclosed for providing non-lexical cues in synthesized speech. An example system includes processor circuitry to generate a breathing cue to enhance speech to be synthesized from text; determine a first insertion point of the breathing cue in the text, wherein the breathing cue is identified by a first tag of a markup language; generate a prosody cue to enhance speech to be synthesized from the text; determine a second insertion point of the prosody cue in the text, wherein the prosody cue is identified by a second tag of the markup language; insert the breathing cue at the first insertion point based on the first tag and the prosody cue at the second insertion point based on the second tag; and trigger a synthesis of the speech from the text, the breathing cue, and the prosody cue.
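The tag-based insertion can be sketched as plain string edits. `<prosody>` is a standard SSML element; the self-closing `<breath/>` tag is an assumed custom extension, since the abstract does not name its markup language.

```python
def insert_cues(text: str, breath_at: int, prosody_span: tuple[int, int]) -> str:
    """Insert a breathing cue at one offset and wrap a span in a prosody cue."""
    start, end = prosody_span
    # Apply edits from the rightmost offset first so earlier offsets stay valid.
    edits = sorted(
        [
            (end, "</prosody>"),
            (start, '<prosody rate="slow">'),
            (breath_at, '<breath duration="medium"/>'),
        ],
        reverse=True,
    )
    for pos, tag in edits:
        text = text[:pos] + tag + text[pos:]
    return text
```

The marked-up text, with both cues in place, would then be passed to the synthesizer.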

METHOD AND SYSTEM FOR GENERATING SYNTHETIC SPEECH FOR TEXT THROUGH USER INTERFACE
20210142783 · 2021-05-13

A method for generating synthetic speech for text through a user interface is provided. The method may include receiving one or more sentences, determining a speech style characteristic for the received one or more sentences, and outputting a synthetic speech for the one or more sentences that reflects the determined speech style characteristic. The one or more sentences and the determined speech style characteristic may be inputted to an artificial neural network text-to-speech synthesis model and the synthetic speech may be generated based on the speech data outputted from the artificial neural network text-to-speech synthesis model.

Extracting content from audio files using text files
10957304 · 2021-03-23

Devices and methods are provided for extracting content from audio files. The device may determine starting and ending quotation marks in a text file, and a string between the starting and ending quotation marks. The device may determine that a verb is near the starting quotation mark or the ending quotation mark. The device may determine, based on the verb, that the string is attributed to a character name in the text file. The device may determine a first time in a first audio file including an audio representation of the text file, the first time associated with a first word of the string, and may determine a second time associated with a second word of the string, wherein the first time is before the first word and the second time is after the second word. The device may generate a second audio file by extracting audio from the first audio file based on the first and second times.

Silence calculator
11062693 · 2021-07-13

To provide a more natural sounding set of voice prompts of an interactive voice response (IVR) script, the voice recordings of the prompts may be modified to have a predetermined amount of silence at the end of the recording. The amount of silence required can be determined from the context in which the voice prompt appears in the IVR script. Different contexts may include mid-sentence, terminating in a comma, or a sentence ending context. These contexts may require silence periods of 100 ms, 250 ms and 500 ms respectively. Voice files may be trimmed to remove any existing silence and then the required silence period may be added.
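On raw PCM samples the calculator reduces to a trim-then-pad step. The 16 kHz sample rate and the near-zero amplitude threshold are assumptions; the 100/250/500 ms durations come from the text above.

```python
# Context-dependent trailing-silence durations, in milliseconds.
SILENCE_MS = {"mid_sentence": 100, "comma": 250, "sentence_end": 500}
SAMPLE_RATE = 16_000  # assumed sample rate

def pad_prompt(samples: list[int], context: str, threshold: int = 10) -> list[int]:
    # Trim any existing trailing silence (near-zero samples) ...
    end = len(samples)
    while end > 0 and abs(samples[end - 1]) <= threshold:
        end -= 1
    trimmed = samples[:end]
    # ... then append exactly the silence the context requires.
    pad = SILENCE_MS[context] * SAMPLE_RATE // 1000
    return trimmed + [0] * pad
```

Trimming first means every prompt ends with exactly the context's silence, regardless of how the recording was originally cut.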

APPARATUS FOR MEDIA ENTITY PRONUNCIATION USING DEEP LEARNING
20200357390 · 2020-11-12

Methods, systems, and related products for voice-enabled computer systems are described. A machine-learning model is trained to produce pronunciation output based on text input. The trained machine-learning model is used to produce pronunciation data for text input even where the text input includes numbers, punctuation, emoji, or other non-letter characters. The machine-learning model is further trained based on real-world data from users to improve pronunciation output.

Air writing to speech system using gesture and wrist angle orientation for synthesized speech modulation

A gesture to speech conversion device may receive indications of user gestures via at least one sensor, the indications identifying movement in three dimensions. A 2-dimensional (2D) plane on which a beginning of the movement and an end of the movement are substantially planar, and a third dimension orthogonal to the 2D plane, may be determined. A change of the movement in a direction of the third dimension in a course of the movement occurring on the 2D plane is detected. The change of the movement in the third dimension is mapped to an emphasis in the movement. The movement is transformed into speech with emphasis on a part of the speech corresponding to a part of the movement having the detected change.