IPIQ

G10L13/086

ELECTRONIC MUSICAL INSTRUMENT, METHOD, AND STORAGE MEDIUM

20220076658 · 2022-03-10 ·

Casio Computer Co., Ltd.

An electronic musical instrument includes: a plurality of keys that include at least first keys corresponding to a first pitch range and second keys corresponding to a second pitch range; and at least one processor, configured to perform the following: in accordance with a key operation in the first pitch range, determining a syllable position contained in a phrase; and in accordance with a key operation in the second pitch range, instructing a sound production of a digitally synthesized sound corresponding to the determined syllable position.
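The two-range behavior in this abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the MIDI note boundaries, the class name `LyricKeyboard`, and the rule that a first-range key's offset selects the syllable index.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical split (not from the patent): notes 36-59 select a syllable,
# notes 60-96 ask for that syllable to be sounded at the played pitch.
FIRST_RANGE = range(36, 60)
SECOND_RANGE = range(60, 97)


@dataclass
class LyricKeyboard:
    syllables: list[str]   # the phrase, already split into syllables
    position: int = 0      # currently selected syllable index

    def key_down(self, note: int) -> Optional[tuple[str, int]]:
        """Handle one key press; return (syllable, pitch) if a sound is requested."""
        if note in FIRST_RANGE:
            # First pitch range: determine the syllable position.
            # The key's offset is clamped to the last syllable of the phrase.
            offset = note - FIRST_RANGE.start
            self.position = min(offset, len(self.syllables) - 1)
            return None
        if note in SECOND_RANGE:
            # Second pitch range: request synthesis of the selected syllable.
            return (self.syllables[self.position], note)
        return None  # key outside both ranges is ignored


kb = LyricKeyboard(["twin", "kle", "lit", "tle", "star"])
kb.key_down(38)            # selects index 2 ("lit")
request = kb.key_down(64)  # asks for "lit" at MIDI note 64
```

A real instrument would pass the returned pair to a singing-voice synthesizer; the sketch stops at the dispatch step because the abstract does not describe the synthesis itself.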

Building a Text-to-Speech System from a Small Amount of Speech Data

20220068256 · 2022-03-03 ·

Google LLC

A method of building a text-to-speech (TTS) system from a small amount of speech data includes receiving a first plurality of recorded speech samples from an assortment of speakers and a second plurality of recorded speech samples from a target speaker where the assortment of speakers does not include the target speaker. The method further includes training a TTS model using the first plurality of recorded speech samples from the assortment of speakers. Here, the trained TTS model is configured to output synthetic speech as an audible representation of a text input. The method also includes re-training the trained TTS model using the second plurality of recorded speech samples from the target speaker combined with the first plurality of recorded speech samples from the assortment of speakers. Here, the re-trained TTS model is configured to output synthetic speech resembling speaking characteristics of the target speaker.
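The data flow of the two training stages can be sketched as follows. This is only an illustration of which samples feed each stage, under the assumption that a sample is a `(speaker_id, utterance_id)` pair; the function name `build_stages` is hypothetical and no actual model training is shown.

```python
from typing import Sequence

# Placeholder for a recorded speech sample: (speaker_id, utterance_id).
Sample = tuple[str, str]


def build_stages(
    multi_speaker: Sequence[Sample],
    target: Sequence[Sample],
    target_id: str,
) -> tuple[list[Sample], list[Sample]]:
    """Return the sample sets for initial training and for re-training."""
    # The assortment of speakers must not include the target speaker.
    if any(speaker == target_id for speaker, _ in multi_speaker):
        raise ValueError("assortment of speakers must exclude the target speaker")
    stage1 = list(multi_speaker)                 # train on the assortment only
    stage2 = list(target) + list(multi_speaker)  # re-train on target + assortment
    return stage1, stage2


pool = [("spk_a", "u1"), ("spk_b", "u2"), ("spk_c", "u3")]
few = [("target", "t1"), ("target", "t2")]
stage1, stage2 = build_stages(pool, few, target_id="target")
```

The point of the second set is that the small target-speaker recordings are combined with the original multi-speaker data rather than used alone, as the abstract states.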

SYSTEMS AND METHODS FOR NAME PRONUNCIATION

20210327409 · 2021-10-21 ·

Devang K. Naik

Systems and methods are provided for associating a phonetic pronunciation with a name by receiving the name, mapping the name to a plurality of monosyllabic components that are combinable to construct the phonetic pronunciation of the name, receiving a user input to select one or more of the plurality, and combining the selected one or more of the plurality of monosyllabic components to construct the phonetic pronunciation of the name.
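The map, select, and combine steps in this abstract can be sketched in a few lines. The component table, the example name, and the hyphen separator are all invented for illustration; the patent does not specify how components are derived or how the result is written.

```python
# Hypothetical lookup table: name -> candidate monosyllabic components.
# A real system would derive these; they are hard-coded here as an assumption.
COMPONENTS = {
    "joaquin": ["joh", "wah", "hwa", "keen", "kwin"],
}


def components_for(name: str) -> list[str]:
    """Map a received name to its candidate monosyllabic components."""
    return COMPONENTS[name.lower()]


def combine(components: list[str], selected: list[int], sep: str = "-") -> str:
    """Join the user-selected components, in the order selected."""
    if not selected:
        raise ValueError("select at least one component")
    return sep.join(components[i] for i in selected)


candidates = components_for("Joaquin")
pronunciation = combine(candidates, [1, 3])  # user picked "wah" then "keen"
```

Here the user's input is modelled as a list of indices into the candidate list, which keeps the selection step independent of how the candidates are displayed.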

AUTOMATIC GENERATION OF VIDEOS FOR DIGITAL PRODUCTS

20210319781 · 2021-10-14 ·

Videate, Inc.

David Christian Gullo

A system for generating videos uses a domain-specific instructional language and a video rendering engine that produces videos against a digital product which changes and evolves over time. The video rendering engine uses the instructions in an instruction markup document written in the domain-specific instructional language to generate a video while navigating a web-based document representing the digital product for which the video is generated. The video rendering engine navigates the web-based document, coupled with the instruction markup document, which explains the operations to be performed on the web-based document. The instruction markup document also identifies the special effects that manipulate the underlying product in real-time, includes the spoken text for generating subtitles, and provides formalized change management by design.

Computer generated emulation of a subject

11144597 · 2021-10-12 ·

Kabushiki Kaisha Toshiba

A system for emulating a subject, to allow a user to interact with a computer generated talking head with the subject's face and voice; said system comprising a processor, a user interface and a personality storage section, the user interface being configured to emulate the subject, by displaying a talking head which comprises the subject's face and output speech from the mouth of the face with the subject's voice, the user interface further comprising a receiver for receiving a query from the user, the emulated subject being configured to respond to the query received from the user, the processor comprising a dialogue section and a talking head generation section, wherein said dialogue section is configured to generate a response to a query inputted by a user from the user interface and generate a response to be outputted by the talking head, the response being generated by retrieving information from said personality storage section, said personality storage section comprising content created by or about the subject, and said talking head generation section is configured to: convert said response into a sequence of acoustic units, the talking head generation section further comprising a statistical model, said statistical model comprising a plurality of model parameters, said model parameters being derived from said personality storage section, the model parameters describing probability distributions which relate an acoustic unit to an image vector and speech vector, said image vector comprising a plurality of parameters which define the subject's face and said speech vector comprising a plurality of parameters which define the subject's voice, the talking head generation section being further configured to output a sequence of speech vectors and image vectors which are synchronised such that the head appears to talk.

A SYSTEM AND METHOD FOR MULTILINGUAL CONVERSION OF TEXT DATA TO SPEECH DATA

20210312899 · 2021-10-07 ·

The present invention provides a system and method for converting text data into speech data. Initially, the system enables a user to select a language from a plurality of languages supported by the operating system (OS) of a computing device. Further, on selecting and copying any text data, the system provides the user with options to listen to an audio output of the text data. The user is provided with options to listen to text data in either English or the selected language, when the language of the text data is one among the plurality of languages supported by the OS. Further, the user is provided with options to listen to text data in English, for the text data in any language. Once the user selects the option, the system converts the text data to speech data. The speech data is provided as the audio output to the user.
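The option rule described above reduces to a small decision function. The sketch below assumes languages are identified by short codes such as "en" and "hi", and that the selected language is one the OS supports; the function name `listening_options` is hypothetical.

```python
def listening_options(
    text_lang: str,
    selected_lang: str,
    os_languages: set[str],
) -> list[str]:
    """Return the languages offered for audio output of copied text."""
    # English is offered for text in any language.
    options = ["en"]
    # The selected language is also offered when the text's own language
    # is one of the languages supported by the OS.
    if text_lang in os_languages and selected_lang != "en":
        options.append(selected_lang)
    return options


supported = {"en", "hi", "ta"}
print(listening_options("hi", "hi", supported))  # text in a supported language
print(listening_options("fr", "hi", supported))  # text in an unsupported language
```

The check against `"en"` only avoids listing English twice when the user's selected language is English.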

Generating videos with a character indicating a region of an image

11140459 · 2021-10-05 ·

VIDUBLY LTD

Methods, systems, and computer-readable media for generating videos with characters indicating regions of images are provided. For example, an image containing a first region may be received. At least one characteristic of a character may be obtained. A script containing a first segment of the script may be received. The first segment of the script may be related to the first region of the image. The at least one characteristic of a character and the script may be used to generate a video of the character presenting the script and at least part of the image, where the character visually indicates the first region of the image while presenting the first segment of the script.

Method and System for Parametric Speech Synthesis

20210256961 · 2021-08-19 ·

Embodiments of the present systems and methods may provide techniques for synthesizing speech in any voice in any language in any accent. For example, in an embodiment, a text-to-speech conversion system may comprise a text converter adapted to convert input text to at least one phoneme selected from a plurality of phonemes stored in memory, a machine-learning model storing voice patterns for a plurality of individuals and adapted to receive the at least one phoneme and an identity of a speaker and to generate acoustic features for each phoneme, and a decoder adapted to receive the generated acoustic features and to generate a speech signal simulating a voice of the identified speaker in a language.

Portable computing device having a color detection mode and a game mode for learning colors

11094219 · 2021-08-17 ·

International Business Machines Corporation

Cesar Augusto Rodriguez Bravo

A system and method for assisted-learning with a portable computing device that includes a color detection mode and a game mode.

Systems and methods for name pronunciation

11069336 · 2021-07-20 ·

Apple Inc.

Devang K. Naik
