Patent classifications
G10H2250/145
Context-dependent piano music transcription with convolutional sparse coding
The present disclosure presents a novel approach to automatic transcription of piano music in a context-dependent setting. Embodiments described herein may employ an efficient algorithm for convolutional sparse coding to approximate a music waveform as a summation of piano note waveforms convolved with associated temporal activations. The piano note waveforms may be pre-recorded for the particular piano to be transcribed, optionally in the specific environment where the performance will take place. During transcription, the note waveforms may be held fixed while the associated temporal activations are estimated and post-processed to obtain the pitch and onset transcription. Experiments have shown that embodiments of the disclosure significantly outperform state-of-the-art music transcription methods trained in the same context-dependent setting, in both transcription accuracy and time precision, across synthetic, anechoic, noisy, and reverberant environments.
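The decomposition described above — a waveform approximated as a sum of fixed note templates convolved with sparse temporal activations — can be sketched with a generic proximal-gradient (ISTA) solver. This is a minimal illustration of convolutional sparse coding, not the patent's efficient algorithm; the function name, step-size bound, and parameters are illustrative assumptions.

```python
import numpy as np

def csc_transcribe(mixture, templates, n_iter=200, lam=0.1):
    """Estimate nonnegative temporal activations X[k] such that
    mixture ~= sum_k circular_conv(templates[k], X[k]).

    A generic ISTA (proximal-gradient) sketch of convolutional
    sparse coding with fixed note templates -- illustrative only,
    not the disclosure's algorithm.
    """
    T = len(mixture)
    # zero-padded template spectra, shape (num_templates, num_freqs)
    D = np.stack([np.fft.rfft(d, T) for d in templates])
    M = np.fft.rfft(mixture)
    X = np.zeros((len(templates), T))
    # step size from the Lipschitz constant of the gradient:
    # for circular convolution this is max over frequency of sum_k |D_k(f)|^2
    L = np.max(np.sum(np.abs(D) ** 2, axis=0)) + 1e-9
    for _ in range(n_iter):
        R = (np.fft.rfft(X, axis=1) * D).sum(axis=0) - M   # residual spectrum
        grad = np.fft.irfft(np.conj(D) * R, T, axis=1)     # correlate each template with residual
        X = np.maximum(X - (grad + lam) / L, 0.0)          # soft-threshold + nonnegativity
    return X
```

The peaks of each recovered activation row then indicate note onsets, mirroring the post-processing step the abstract mentions (here reduced to a simple argmax over time).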
AUDIO GENERATION USING GENERATIVE ARTIFICIAL INTELLIGENCE MODEL
A method includes receiving a prompt describing desired characteristics of audio; generating, using a set of machine learning models and based on the prompt, a latent space representation of the audio at a latent rate of less than 40 Hz; and generating, using the set of machine learning models and the latent space representation of the audio, an audio file at an output rate greater than the latent rate. The audio file includes the audio based on the latent space representation, and the audio has a length greater than 90 seconds.
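The two-stage structure of this claim — a low-rate latent sequence generated from a prompt, then decoded to audio at a much higher sample rate — can be sketched as follows. The specific rates (25 Hz latents, 16 kHz output), the latent dimension, and the sinusoid-bank decoder are stand-in assumptions; the patent's generative models are not described here, only the rate relationship the claim specifies (latent rate below 40 Hz, output rate above it, duration over 90 seconds).

```python
import numpy as np

LATENT_RATE = 25        # latent frames per second (< 40 Hz, per the claim)
OUTPUT_RATE = 16_000    # audio samples per second (> latent rate)

def generate_latents(duration_s, dim=8, seed=0):
    """Stand-in for the prompt-conditioned generative model:
    emits one latent vector per 1/LATENT_RATE seconds."""
    rng = np.random.default_rng(seed)
    n_frames = int(duration_s * LATENT_RATE)
    return rng.standard_normal((n_frames, dim))

def decode_to_audio(latents):
    """Stand-in decoder: upsamples the latent sequence to OUTPUT_RATE
    by treating each latent as amplitudes for a small sinusoid bank
    over one latent frame (a hypothetical decoder, for illustration)."""
    n_frames, dim = latents.shape
    hop = OUTPUT_RATE // LATENT_RATE                   # samples per latent frame
    t = np.arange(hop) / OUTPUT_RATE
    freqs = 110.0 + 55.0 * np.arange(dim)              # hypothetical carrier frequencies (Hz)
    bank = np.sin(2 * np.pi * freqs[:, None] * t)      # (dim, hop)
    frames = latents @ bank                            # (n_frames, hop)
    return frames.reshape(-1) / dim

# e.g. 95 seconds of audio, satisfying the >90-second length in the claim
audio = decode_to_audio(generate_latents(duration_s=95.0))
```

The key point the sketch preserves is the rate asymmetry: the generative stage operates on 25 vectors per second while the decoded output carries 16,000 samples per second, so each latent frame governs 640 output samples.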