Patent classifications
G10H2250/455
INFORMATION PROCESSING METHOD, INFORMATION PROCESSING DEVICE, AND PROGRAM
An information processing method is realized by a computer and includes generating a first characteristic transition which is a transition of acoustic characteristics, in accordance with an instruction from a user, generating a second characteristic transition which is a transition of acoustic characteristics of voice that is pronounced in a specific pronunciation style selected from a plurality of pronunciation styles, and generating a combined characteristic transition which is a transition of the acoustic characteristics of synthesized voice by combining the first characteristic transition and the second characteristic transition.
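The combining step in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two transitions are combined, so the frame-by-frame addition below is an assumption made only for illustration. The function name `combine_transitions` and the sample values (a pitch curve in semitones plus a style-dependent deviation) are hypothetical.

```python
from typing import List


def combine_transitions(first: List[float], second: List[float]) -> List[float]:
    """Combine two characteristic transitions frame by frame.

    ASSUMPTION: combination is modeled as simple addition; the abstract
    only says the two transitions are "combined".
    """
    if len(first) != len(second):
        raise ValueError("both transitions must cover the same number of frames")
    return [a + b for a, b in zip(first, second)]


# Hypothetical data: a user-instructed pitch curve (in semitones) and a
# style-specific deviation curve for the selected pronunciation style.
first_transition = [60.0, 60.0, 62.0]
second_transition = [0.0, 0.5, -0.25]

combined = combine_transitions(first_transition, second_transition)
print(combined)
```

Addition is only one plausible choice; a weighted sum or an interpolation would fit the wording of the abstract equally well.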
Method and apparatus for presenting media information, storage medium, and electronic apparatus
The present disclosure describes embodiments of a method, a device, and a non-transitory computer readable storage medium for presenting media information. The method includes displaying, by a device, an interaction interface. The device includes a memory storing instructions and a processor in communication with the memory. The method includes obtaining, by the device, an image set through the interaction interface, the image set comprising at least one image. The method includes obtaining, by the device, target media based on the image set through the interaction interface, the target media comprising a first audio generated according to an image feature of the image set. The method includes presenting, by the device, the target media.
Music synthesis method, system, terminal and computer-readable storage medium
A music synthesis method, a system, a terminal and a computer-readable storage medium are provided. The method includes: receiving a track selected by a user; obtaining a text; receiving speech data recorded by the user on the basis of the text; and forming a music file in accordance with the selected track and the speech data. With the music synthesis method of the present application, the speech of a user can be combined with the track and an optimal musical effect can be simulated, so that the user can participate in the singing and presentation of the music, thereby making music more entertaining.
Singing voice separation with deep U-Net convolutional networks
A system, method and computer product for estimating a component of a provided audio signal. The method comprises converting the provided audio signal to an image, processing the image with a neural network trained to estimate one of vocal content and instrumental content, and storing a spectral mask output from the neural network as a result of the image being processed by the neural network. The neural network is a U-Net. The method also comprises providing the spectral mask to a client media playback device, which applies the spectral mask to a spectrogram of the provided audio signal, to provide a masked spectrogram. The media playback device also transforms the masked spectrogram to an audio signal, and plays back that audio signal via an output user interface.
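The client-side masking step described in this abstract can be sketched as an element-wise product between the spectral mask and the spectrogram. This is a minimal illustration, not the patented implementation: the function name `apply_spectral_mask`, the nested-list representation, and the toy values are assumptions, and the transform from the masked spectrogram back to an audio signal is omitted.

```python
from typing import List

Matrix = List[List[float]]


def apply_spectral_mask(spectrogram: Matrix, mask: Matrix) -> Matrix:
    """Scale every time-frequency bin of the spectrogram by its mask value."""
    same_shape = len(spectrogram) == len(mask) and all(
        len(s_row) == len(m_row) for s_row, m_row in zip(spectrogram, mask)
    )
    if not same_shape:
        raise ValueError("spectrogram and mask must have the same shape")
    return [
        [s * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(spectrogram, mask)
    ]


# Hypothetical 2x2 magnitude spectrogram and a mask with values in [0, 1],
# standing in for the output of the trained network.
magnitudes = [[1.0, 2.0], [4.0, 0.5]]
vocal_mask = [[1.0, 0.5], [0.0, 1.0]]

masked = apply_spectral_mask(magnitudes, vocal_mask)
print(masked)
```

A mask value near 1 keeps a bin and a value near 0 suppresses it, so the same routine yields vocal or instrumental content depending on which component the network was trained to estimate.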
SYSTEMS AND METHODS FOR TRANSPOSING SPOKEN OR TEXTUAL INPUT TO MUSIC
Described herein are real-time musical translation devices (RETM) and methods of use thereof. Exemplary uses of RETMs include optimizing the understanding and/or recall of an input message for a user and improving a cognitive process in a user.
RESPONSIVE LIVE MUSICAL SOUND GENERATION
Predetermined musical data for a song is received. The predetermined musical data includes chords, lyrics, and rhythmic structures of the song. Audio data of a band generating music of the song is received. Real-time vocal audio is generated that is in rhythm with the audio data and in harmony with the chords. The vocal audio includes the lyrics and is of a predetermined voice.
Electronic musical instrument, electronic musical instrument control method, and storage medium
An electronic musical instrument includes an operation unit that receives a user performance, and at least one processor, wherein the at least one processor performs the following: in accordance with a user operation specifying a chord on the operation unit, obtaining lyric data of a lyric and obtaining a plurality of pieces of waveform data respectively corresponding to a plurality of pitches indicated by the specified chord; inputting the obtained lyric data to a trained model that has learned singing voices of a singer so as to cause the trained model to output acoustic feature data in response thereto; synthesizing each of the plurality of pieces of waveform data with the acoustic feature data so as to generate a plurality of pieces of synthesized waveform data; and outputting a polyphonic synthesized singing voice based on the generated plurality of pieces of synthesized waveform data.
METHOD AND APPARATUS FOR RENDERING LYRICS
A method for rendering lyrics is provided, including: acquiring pronunciation of a polyphonic word to be rendered in target lyrics, and acquiring playback time information of the pronunciation in the process of rendering the target lyrics; determining a first number of furiganas contained in the pronunciation; and word-by-word rendering, according to the first number and the playback time information of the pronunciation of the polyphonic word to be rendered, the polyphonic word to be rendered and each furigana in the pronunciation of the polyphonic word to be rendered simultaneously, wherein the pronunciation of the polyphonic word to be rendered is adjacent to and parallel to the polyphonic word to be rendered.
ELECTRONIC MUSICAL INSTRUMENT, ELECTRONIC MUSICAL INSTRUMENT CONTROL METHOD, AND STORAGE MEDIUM
An electronic musical instrument includes at least one processor that, in accordance with a user operation on an operation unit, obtains lyric data and waveform data corresponding to a first tone color; inputs the obtained lyric data to a trained model so as to cause the trained model to output acoustic feature data in response thereto; generates waveform data corresponding to a singing voice of a singer and corresponding to a second tone color that is different from the first tone color, based on the acoustic feature data outputted from the trained model and the obtained waveform data corresponding to the first tone color; and outputs a singing voice based on the generated waveform data corresponding to the second tone color.
AUTOMATIC TRANSLATION USING DEEP LEARNING
Audio data of an original work is received. Text in the audio data is translated to a target language. The audio data is passed to a first deep learning model to learn voice features in the audio data. The audio data is passed to a second deep learning model to learn audio properties in the audio data. The translated text is synchronized to play in the position of the original text of the original work in a synthesized voice. Translated audio data of the original work is created by combining the synchronized translated text in the synthesized voice with music of the audio data.