G10H2250/455

Method and apparatus for rendering lyrics

A method for rendering lyrics is provided, including: acquiring the pronunciation of a polyphonic word to be rendered in target lyrics, and acquiring playback time information of the pronunciation in the process of rendering the target lyrics; determining a first number of furigana characters contained in the pronunciation; and simultaneously rendering, word by word according to the first number and the playback time information of the pronunciation, the polyphonic word to be rendered and each furigana character in its pronunciation, wherein the pronunciation is rendered adjacent and parallel to the polyphonic word to be rendered.
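
One way to read the rendering step is as a timing-allocation problem: the word's playback interval is divided across its furigana so that the word and its reading are highlighted together. The sketch below assumes an even split of the interval (the patent may weight furigana differently); names such as `schedule_furigana` and `RenderStep` are illustrative, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class RenderStep:
    """One simultaneous rendering step: a furigana character and the
    interval during which it (and the word beneath it) is highlighted."""
    furigana: str
    start_ms: int
    end_ms: int

def schedule_furigana(word: str, furigana: str, start_ms: int, end_ms: int):
    """Split the word's playback interval evenly across its furigana,
    so the polyphonic word and each furigana character are rendered
    word by word in parallel with the audio."""
    count = len(furigana)                      # the "first number" of furigana
    step = (end_ms - start_ms) / count
    return [
        RenderStep(furigana[i],
                   round(start_ms + i * step),
                   round(start_ms + (i + 1) * step))
        for i in range(count)
    ]

# Example: the kanji 今日 read as きょう, sung from 1200 ms to 2100 ms.
for s in schedule_furigana("今日", "きょう", 1200, 2100):
    print(s)
```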

Electronic musical instrument, electronic musical instrument control method, and storage medium

An electronic musical instrument includes: an operation unit; a memory that stores lyric data including lyrics for a plurality of timings, pitch data including pitches for said plurality of timings, and a trained model that has learned singing voice features of a singer; and at least one processor, wherein at each of said plurality of timings, the at least one processor: if the operation unit is not operated, obtains, from the trained model, a singing voice feature associated with a lyric indicated by the lyric data and a pitch indicated by the pitch data; if the operation unit is operated, obtains, from the trained model, a singing voice feature associated with the lyric indicated by the lyric data and a pitch indicated by the operation of the operation unit; and synthesizes and outputs singing voice data based on the obtained singing voice feature of the singer.
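
At each timing the instrument essentially chooses between the stored melody pitch and the pitch of whatever key the player presses, then asks the trained model for the corresponding singing-voice feature. A minimal sketch of that per-timing decision, assuming the trained model is available as a callable `acoustic_model(lyric, pitch)` and that `key_events` maps timings to pressed pitches (both hypothetical names):

```python
def pitch_for_timing(timing, lyric_data, pitch_data, pressed_pitch=None):
    """Pick the (lyric, pitch) pair handed to the trained model at one timing:
    the stored pitch when no key is pressed, otherwise the pressed key's pitch."""
    lyric = lyric_data[timing]
    pitch = pitch_data[timing] if pressed_pitch is None else pressed_pitch
    return lyric, pitch

def synthesize(lyric_data, pitch_data, acoustic_model, key_events):
    """Per-timing loop: query the trained model for the singing-voice feature
    of (lyric, pitch) and collect the features for the synthesis step."""
    features = []
    for t in range(len(lyric_data)):
        lyric, pitch = pitch_for_timing(t, lyric_data, pitch_data,
                                        key_events.get(t))
        features.append(acoustic_model(lyric, pitch))   # assumed callable
    return features
```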

Audio translator
11605369 · 2023-03-14

An audio translation system includes a feature extractor and a style transfer machine learning model. The feature extractor generates, for each of a plurality of source voice files, one or more source voice parameters encoded as a collection of source feature vectors, and generates, for each of a plurality of target voice files, one or more target voice parameters encoded as a collection of target feature vectors. The style transfer machine learning model is trained on the collection of source feature vectors for the plurality of source voice files and the collection of target feature vectors for the plurality of target voice files to generate a style transformed feature vector.
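
A minimal sketch of how such a style transfer model might be fitted on the extracted feature vectors, assuming paired source/target vectors, an 80-dimensional feature space, and a simple MSE objective; the actual system may use unpaired data and a different architecture. PyTorch is used here only as a convenient stand-in.

```python
import torch
from torch import nn

class StyleTransferModel(nn.Module):
    """Toy stand-in for the style transfer model: maps a source feature
    vector toward the target speaker's feature space."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, x):
        return self.net(x)

def train(source_vecs, target_vecs, dim=80, epochs=10):
    """Fit the model on (source, target) feature-vector collections; the real
    system may use a different pairing strategy and loss."""
    model = StyleTransferModel(dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(source_vecs), target_vecs)
        loss.backward()
        opt.step()
    return model
```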

APPARATUS AND METHOD FOR A PHONATION SYSTEM
20230145714 · 2023-05-11

A system and method for presenting a phonation system game that includes a graphical animation and a phonation system song to a child using an electronic screen-based device are provided. One embodiment detects sounds of the singing child; identifies a song word sung by the child; identifies a song word presented by the phonation system song, wherein the song word presented by the phonation system is the same as the song word sung by the child; identifies an attribute of interest in the song word sung by the child; retrieves a predefined song word attribute associated with the song word presented by the phonation system from a coded event database, wherein the predefined song word attribute is associated with the song word sung by the child; and compares the predefined song word attribute with the identified attribute of interest in the song word sung by the child.
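
The comparison step reduces to looking up the stored attribute for the word and measuring how far the child's sung attribute deviates from it. A small sketch under that reading, with hypothetical names (`reference_attributes` standing in for the coded event database, `tolerance` for whatever scoring rule the system applies):

```python
def score_word(sung_attribute, reference_attributes, word, tolerance=0.15):
    """Compare the attribute measured from the child's singing (e.g. pitch
    or duration) with the predefined attribute stored for the same word."""
    reference = reference_attributes[word]          # coded-event database lookup
    deviation = abs(sung_attribute - reference) / max(reference, 1e-9)
    return deviation <= tolerance

# Example: the word "star" should be held for 0.8 s; the child held it 0.72 s.
print(score_word(0.72, {"star": 0.8}, "star"))      # True (within 15 %)
```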

Analyzing changes in vocal power within music content using frequency spectrums

Technologies are described for identifying familiar or interesting parts of music content by analyzing changes in vocal power using frequency spectrums. For example, a frequency spectrum can be generated from digitized audio. Using the frequency spectrum, the harmonic content and percussive content can be separated. The vocal content can then be separated from the harmonic and/or percussive content. The vocal content can then be processed to identify surge points in the digitized audio. In some implementations, the vocal content is included in the harmonic content during the separation procedure and is then separated from the harmonic content.
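A sketch of this pipeline using librosa's harmonic/percussive separation as a stand-in for the patent's separation procedure: vocal energy is kept in the harmonic component (as in the implementation mentioned last), and a "surge point" is flagged wherever smoothed harmonic power rises by more than a chosen threshold over a one-second window. The function name, window, and threshold are illustrative choices.

```python
import numpy as np
import librosa

def surge_points(path, hop_length=512, window_s=1.0, jump_db=6.0):
    """Return times (s) where harmonic (vocal-leaning) power rises sharply."""
    y, sr = librosa.load(path, mono=True)
    S = librosa.stft(y, hop_length=hop_length)           # frequency spectrum
    harmonic, _percussive = librosa.decompose.hpss(S)    # separate content
    power_db = librosa.power_to_db(np.abs(harmonic).sum(axis=0) ** 2)
    win = max(1, int(window_s * sr / hop_length))
    smoothed = np.convolve(power_db, np.ones(win) / win, mode="same")
    rise = smoothed[win:] - smoothed[:-win]              # power change per window
    frames = np.flatnonzero(rise > jump_db) + win
    return librosa.frames_to_time(frames, sr=sr, hop_length=hop_length)
```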

SYSTEMS AND METHODS FOR TRANSPOSING SPOKEN OR TEXTUAL INPUT TO MUSIC
20230197058 · 2023-06-22 ·

Described herein are real-time musical translation devices (RETM) and methods of use thereof. Exemplary uses of RETMs include optimizing the understanding and/or recall of an input message for a user and improving a cognitive process in a user.

Method and system for interactive song generation

A method and system may provide for interactive song generation. In one aspect, a computer system may present options for selecting a background track. The computer system may generate suggested lyrics based on parameters entered by the user. User interface elements allow the computer system to receive lyrics input from the user. As the user inputs lyrics, the computer system may update its suggestions based on the previously input lyrics. In addition, the computer system may generate proposed melodies to go with the lyrics and the background track. The user may select from among the melodies created for each portion of lyrics. The computer system may optionally generate computer-synthesized vocals or capture a vocal track of a human voice singing the song. The background track, lyrics, melodies, and vocals may be combined to produce a complete song without requiring musical training or experience by the user.

Real-time speech to singing conversion

A method of converting a frame of a voice sample to a singing frame includes obtaining a pitch value of the frame; obtaining formant information of the frame using the pitch value; obtaining aperiodicity information of the frame using the pitch value; obtaining a tonic pitch and chord pitches; using the formant information, the aperiodicity information, the tonic pitch, and the chord pitches to obtain the singing frame; and outputting or saving the singing frame.
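
The per-frame quantities named in the abstract (pitch, formant information, aperiodicity) map naturally onto a WORLD-style analysis/synthesis pipeline. The sketch below uses the pyworld package as an assumed implementation, snapping each voiced frame's pitch to the nearest of the tonic and chord pitches before resynthesis; the patent's own conversion may differ.

```python
import numpy as np
import pyworld as pw

def speech_to_singing(x, fs, tonic_hz, chord_hz):
    """Convert a speech waveform to a 'sung' one: estimate per-frame pitch,
    spectral envelope (formants) and aperiodicity, then snap each voiced
    frame's pitch to the nearest tonic/chord pitch before resynthesis."""
    x = x.astype(np.float64)
    f0, t = pw.harvest(x, fs)                     # frame pitch values
    sp = pw.cheaptrick(x, f0, t, fs)              # formant information
    ap = pw.d4c(x, f0, t, fs)                     # aperiodicity information
    targets = np.asarray([tonic_hz, *chord_hz], dtype=np.float64)
    voiced = f0 > 0
    idx = np.abs(f0[voiced, None] - targets[None, :]).argmin(axis=1)
    f0_sung = f0.copy()
    f0_sung[voiced] = targets[idx]
    return pw.synthesize(f0_sung, sp, ap, fs)     # the "singing frames"
```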

Unsupervised singing voice conversion with pitch adversarial network
11257480 · 2022-02-22

A method, a computer readable medium, and a computer system are provided for singing voice conversion. Data corresponding to a singing voice is received. One or more features and pitch data are extracted from the received data using one or more adversarial neural networks. One or more audio samples are generated based on the extracted pitch data and the one or more features.
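
One common way to realize adversarial pitch handling is to train a feature encoder against a pitch predictor so that the extracted features carry little pitch information, with pitch supplied separately at generation time. A toy PyTorch sketch under that reading; the class names, dimensions, and losses are assumptions, not the patent's architecture.

```python
import torch
from torch import nn

class Encoder(nn.Module):
    """Extracts content/singer features from singing-voice frames."""
    def __init__(self, n_mels=80, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_mels, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, frames):
        return self.net(frames)

class PitchAdversary(nn.Module):
    """Tries to predict pitch from the features; the encoder is trained
    against it so the extracted features become pitch-independent."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, feats):
        return self.net(feats).squeeze(-1)

def adversarial_losses(encoder, adversary, frames, pitch):
    """Losses for one training step: the adversary minimises pitch error,
    while the encoder is penalised when the adversary succeeds."""
    feats = encoder(frames)
    adv_loss = nn.functional.mse_loss(adversary(feats.detach()), pitch)
    enc_loss = -nn.functional.mse_loss(adversary(feats), pitch)
    return adv_loss, enc_loss
```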

Learning singing from speech
11430431 · 2022-08-30

A method, computer program, and computer system are provided for converting a singing voice of a first person, associated with a first speaker, to a singing voice of a second person using a speaking voice of the second person, associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.
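
A sketch of the recursive (autoregressive) mel-spectrogram generation step, assuming the phoneme context has already been encoded and aligned to the target acoustic frames and that the second speaker's speaking voice is summarized as an embedding vector; the module name and dimensions are illustrative, and PyTorch is used only for concreteness.

```python
import torch
from torch import nn

class RecursiveMelDecoder(nn.Module):
    """Each mel frame is predicted from the aligned phoneme context, the
    target-speaker (speaking-voice) embedding, and the previous frame."""
    def __init__(self, ctx_dim=256, spk_dim=64, n_mels=80):
        super().__init__()
        self.rnn = nn.GRUCell(ctx_dim + spk_dim + n_mels, 256)
        self.out = nn.Linear(256, n_mels)

    def forward(self, aligned_ctx, speaker_emb, n_frames):
        """aligned_ctx: (n_frames, ctx_dim); speaker_emb: (spk_dim,)."""
        h = aligned_ctx.new_zeros(1, 256)
        prev = aligned_ctx.new_zeros(1, self.out.out_features)
        mels = []
        for t in range(n_frames):
            inp = torch.cat([aligned_ctx[t:t + 1],
                             speaker_emb.unsqueeze(0), prev], dim=1)
            h = self.rnn(inp, h)             # recurrent state carries history
            prev = self.out(h)               # next mel frame, fed back in
            mels.append(prev.squeeze(0))
        return torch.stack(mels)             # (n_frames, n_mels)
```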