Patent classifications
G10H2250/455
AUDIO TRANSLATOR
An audio translation system includes a feature extractor and a style transfer machine learning model. The feature extractor generates, for each of a plurality of source voice files, one or more source voice parameters encoded as a collection of source feature vectors, and generates, for each of a plurality of target voice files, one or more target voice parameters encoded as a collection of target feature vectors. The style transfer machine learning model is trained on the collection of source feature vectors for the plurality of source voice files and the collection of target feature vectors for the plurality of target voice files to generate a style transformed feature vector.
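The abstract leaves the feature extractor and the style-transfer model unspecified. A minimal sketch of the pipeline shape, assuming toy per-frame features (mean and RMS energy) and mean-shifting as a crude stand-in for the trained model:

```python
import math

def extract_features(voice_frames):
    """Toy feature extractor: encode each frame as (mean, RMS energy).
    Stands in for the patent's source/target voice parameters."""
    feats = []
    for frame in voice_frames:
        mean = sum(frame) / len(frame)
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        feats.append((mean, rms))
    return feats

def style_transfer(source_feats, target_feats):
    """Shift the source features' global mean toward the target's --
    a crude stand-in for the trained style-transfer model."""
    src_mean = sum(f[0] for f in source_feats) / len(source_feats)
    tgt_mean = sum(f[0] for f in target_feats) / len(target_feats)
    offset = tgt_mean - src_mean
    return [(m + offset, r) for m, r in source_feats]
```

A real system would extract spectral features and learn the transfer mapping; the shape of the data flow (two feature collections in, transformed vectors out) is what this illustrates.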
SYSTEMS AND METHODS FOR TRANSPOSING SPOKEN OR TEXTUAL INPUT TO MUSIC
Described herein are real-time musical translation devices (RETMs) and methods of use thereof. Exemplary uses of RETMs include optimizing the understanding and/or recall of an input message for a user and improving a cognitive process in a user.
LEARNING SINGING FROM SPEECH
A method, computer program, and computer system are provided for converting a singing voice of a first person associated with a first speaker to a singing voice of a second person using a speaking voice of the second person associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.
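The two core steps (duration-based phoneme-to-frame alignment, then recursive mel generation) can be sketched as follows; the alignment rule, the phoneme values, and the speaker offset are all illustrative assumptions, not the patent's actual encoder:

```python
def align_phonemes(phonemes, durations, n_frames):
    """Assign each acoustic frame the phoneme whose relative duration
    covers that frame's position -- a crude stand-in for the patent's
    context-based alignment."""
    total = sum(durations)
    boundaries, acc = [], 0.0
    for d in durations:
        acc += d
        boundaries.append(acc / total)
    aligned, idx = [], 0
    for i in range(n_frames):
        t = (i + 0.5) / n_frames
        while t > boundaries[idx]:
            idx += 1
        aligned.append(phonemes[idx])
    return aligned

def generate_mels(aligned, phoneme_values, speaker_offset):
    """Recursive toy generator: each 'mel' value mixes the previous
    frame with the current phoneme and a speaker-dependent offset."""
    prev, mels = 0.0, []
    for ph in aligned:
        prev = 0.5 * prev + phoneme_values[ph] + speaker_offset
        mels.append(prev)
    return mels
```

The recursion (each frame conditioned on the previous one plus speaker information) mirrors the claim's "recursively generated" mel-spectrogram features.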
SYSTEMS AND METHODS FOR TRANSFORMING AUDIO IN CONTENT ITEMS
Systems, methods, and non-transitory computer-readable media can be configured to obtain source audio based on recorded audio. A tuned audio transform can be generated based on a source audio transform corresponding to the source audio and a recorded audio transform corresponding to the recorded audio. Tuned audio can be generated based on the tuned audio transform.
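The abstract only says the tuned transform is generated "based on" the source and recorded transforms. One plausible reading, sketched with a naive DFT as the audio transform and a linear blend as the tuning rule (both assumptions):

```python
import cmath

def dft_mag(signal):
    """Naive DFT magnitude spectrum (standing in for the 'audio transform')."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) for k in range(n)]

def tuned_transform(source_T, recorded_T, alpha=0.5):
    """Blend the source transform toward the recorded transform bin by bin;
    the blend rule is an assumption -- the abstract does not specify it."""
    return [(1 - alpha) * s + alpha * r for s, r in zip(source_T, recorded_T)]
```

Tuned audio would then be synthesized from `tuned_transform(...)` by an inverse transform, the step the abstract calls generating "tuned audio based on the tuned audio transform".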
Voice synthesis method, voice synthesis device, and storage medium
A voice synthesis method according to an embodiment includes altering a series of synthesis spectra in a partial period of a synthesis voice based on a series of amplitude spectrum envelope contours of a voice expression to obtain a series of altered spectra to which the voice expression has been imparted, and synthesizing a series of voice samples to which the voice expression has been imparted, based on the series of altered spectra.
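The alteration step can be sketched as a per-frame gain applied only inside the partial period, with the gain taken from the expression's amplitude-envelope contour; the multiplicative rule is an assumption, since the abstract only says the spectra are altered "based on" the contours:

```python
def impart_expression(synth_spectra, envelope_contour, start, end):
    """Scale each synthesis spectrum inside the partial period
    [start, end) by the expression's amplitude-envelope value;
    frames outside the period are left untouched."""
    altered = []
    for i, frame in enumerate(synth_spectra):
        if start <= i < end:
            gain = envelope_contour[i - start]
            altered.append([a * gain for a in frame])
        else:
            altered.append(list(frame))
    return altered
```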
SONG GENERATION BASED ON A TEXT INPUT
The disclosure provides a method and an apparatus for song generation. A text input may be received. A topic and an emotion may be extracted from the text input. A melody may be determined according to the topic and the emotion. Lyrics may be generated according to the melody and the text input. A song may be generated at least according to the melody and the lyrics.
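The text-to-song flow (topic and emotion out of the text, melody from those, lyrics from melody and text) can be sketched end to end. The lexicon, the longest-word topic heuristic, and the MIDI melodies are all toy assumptions standing in for the disclosure's learned components:

```python
EMOTION_LEXICON = {"happy": "joy", "love": "joy", "sad": "sorrow"}  # toy lexicon
MELODIES = {"joy": [60, 64, 67], "sorrow": [60, 63, 67], "neutral": [60, 62, 64]}

def extract_topic_emotion(text):
    words = text.lower().split()
    topic = max(words, key=len)  # crude proxy: longest word as the topic
    emotion = next((EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON),
                   "neutral")
    return topic, emotion

def generate_song(text):
    topic, emotion = extract_topic_emotion(text)
    melody = MELODIES[emotion]                         # MIDI note numbers
    lyrics = [f"{topic} ({note})" for note in melody]  # one lyric unit per note
    return {"melody": melody, "lyrics": lyrics}
```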
Keyboard instrument and method performed by computer of keyboard instrument
A keyboard instrument includes at least one processor that determines a first pattern of intonation to be applied to a first time segment of voice data on the basis of a first user operation on a first operation element, causes a first singing voice for the first time segment to be digitally synthesized from the first segment data in accordance with the determined first pattern of intonation, determines a second pattern of intonation to be applied to a second time segment of the voice data on the basis of a second user operation on a second operation element, and causes a second singing voice for the second time segment to be digitally synthesized from the second segment data in accordance with the determined second pattern of intonation.
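The control flow reduces to a per-segment lookup: each operation element (key press) selects an intonation pattern for its time segment. A minimal sketch, where the pattern table and the `sing(...)` output format are illustrative assumptions:

```python
INTONATION = {1: "rising", 2: "falling"}  # operation element -> pattern (assumed)

def synthesize(voice_segments, key_presses):
    """Synthesize each time segment of the voice data with the intonation
    pattern selected by the corresponding operation element."""
    return [f"sing({seg}, intonation={INTONATION[key]})"
            for seg, key in zip(voice_segments, key_presses)]
```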
Real-Time Speech To Singing Conversion
A method of converting a frame of a voice sample to a singing frame includes obtaining a pitch value of the frame; obtaining formant information of the frame using the pitch value; obtaining aperiodicity information of the frame using the pitch value; obtaining a tonic pitch and chord pitches; using the formant information, the aperiodicity information, the tonic pitch, and the chord pitches to obtain the singing frame; and outputting or saving the singing frame.
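The per-frame conversion amounts to keeping the frame's timbre descriptors (formants, aperiodicity) while snapping its pitch to the tonic or a chord pitch. A sketch, where the major-triad ratios are an assumption since the claim leaves the chord open:

```python
def nearest_chord_pitch(pitch_hz, tonic_hz, ratios=(1.0, 1.25, 1.5)):
    """Snap a frame's pitch to the nearest tonic/chord pitch.
    The major-triad ratios here are an illustrative assumption."""
    return min((tonic_hz * r for r in ratios), key=lambda c: abs(c - pitch_hz))

def convert_frame(pitch_hz, formants, aperiodicity, tonic_hz):
    """Resynthesis stand-in: keep the frame's formant and aperiodicity
    information, replace its pitch with the snapped chord pitch."""
    return {"pitch": nearest_chord_pitch(pitch_hz, tonic_hz),
            "formants": formants,
            "aperiodicity": aperiodicity}
```

A real-time implementation would run this frame by frame and resynthesize audio from the returned parameters rather than emitting a dictionary.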
INFORMATION PROCESSING DEVICE, ELECTRONIC MUSICAL INSTRUMENT, AND INFORMATION PROCESSING METHOD
A voice synthesis device includes at least one processor implementing a first voice model and a second voice model different from the first voice model, the at least one processor performing the following: receiving data indicating a specified pitch; causing the first voice model to output first data and the second voice model to output second data; and generating and outputting third data corresponding to the specified pitch based on the first data and the second data.
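The claim only says the third data is generated from the first and second data based on the specified pitch. One plausible rule, sketched as a pitch-dependent crossfade between the two models' outputs (the crossfade and the MIDI range are assumptions):

```python
def third_data(first_out, second_out, pitch, low=48, high=72):
    """Crossfade the two voice models' outputs as the specified (MIDI)
    pitch moves from `low` to `high`. The crossfade rule is an
    assumption; the claim does not specify how the data are combined."""
    w = min(max((pitch - low) / (high - low), 0.0), 1.0)
    return [(1 - w) * a + w * b for a, b in zip(first_out, second_out)]
```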