Patent classifications
G10H2210/086
Context-dependent piano music transcription with convolutional sparse coding
The present disclosure presents a novel approach to automatic transcription of piano music in a context-dependent setting. Embodiments described herein may employ an efficient algorithm for convolutional sparse coding to approximate a music waveform as a summation of piano note waveforms convolved with associated temporal activations. The piano note waveforms may be pre-recorded for a particular piano that is to be transcribed and may optionally be pre-recorded in the specific environment where the piano performance is to be performed. During transcription, the note waveforms may be fixed and associated temporal activations may be estimated and post-processed to obtain the pitch and onset transcription. Experiments have shown that embodiments of the disclosure significantly outperform state-of-the-art music transcription methods trained in the same context-dependent setting, in both transcription accuracy and time precision, in various scenarios including synthetic, anechoic, noisy, and reverberant environments.
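The signal model in this abstract can be illustrated with a short sketch. It assumes the common convolutional sparse coding formulation, in which the waveform is approximated by a sum of note waveforms convolved with their activations and the fit is scored by a squared error plus an L1 sparsity penalty. The function names, the `sparsity_weight` parameter, and the toy data are hypothetical and are not taken from the patent; the solver that actually estimates the activations is not shown.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def reconstruct(note_waveforms, activations):
    """Sum over pitches of (activation convolved with the note waveform)."""
    length = max(len(note_waveforms[p]) + len(activations[p]) - 1
                 for p in note_waveforms)
    signal = [0.0] * length
    for pitch, waveform in note_waveforms.items():
        for n, value in enumerate(convolve(activations[pitch], waveform)):
            signal[n] += value
    return signal


def objective(signal, note_waveforms, activations, sparsity_weight):
    """Squared reconstruction error plus an L1 penalty on the activations."""
    approx = reconstruct(note_waveforms, activations)
    n = max(len(signal), len(approx))
    s = list(signal) + [0.0] * (n - len(signal))   # zero-pad to equal length
    a = approx + [0.0] * (n - len(approx))
    data_term = 0.5 * sum((si - ai) ** 2 for si, ai in zip(s, a))
    l1_term = sum(abs(v) for x in activations.values() for v in x)
    return data_term + sparsity_weight * l1_term


# Toy example: two "notes" (MIDI 60 and 64) with very short waveforms.
note_waveforms = {60: [1.0, 0.5], 64: [2.0]}
activations = {60: [1.0, 0.0, 0.0, 1.0], 64: [0.0, 1.0, 0.0, 0.0]}
mixture = reconstruct(note_waveforms, activations)
```

In a transcription setting the note waveforms would stay fixed while an optimizer searches for activations that minimize `objective`; the nonzero entries of each activation then indicate candidate onsets for that pitch.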
METHOD FOR GENERATING MUSICAL SCORE, ELECTRONIC DEVICE, AND READABLE STORAGE MEDIUM
A method for generating a musical score, a device, and a computer-readable storage medium. The method includes: obtaining target audio; generating a chromagram of the target audio corresponding to each pitch class, and using the chromagram to identify chords of the target audio to obtain chord information; performing mode detection on the target audio to obtain original key information; performing rhythm detection on the target audio to obtain the beats per minute; identifying a beat type of each audio frame of the target audio, and determining an audio time signature on the basis of a correspondence relationship between beat type and time signature; and rendering a musical score using the chord information, the original key information, the beats per minute, and the audio time signature to obtain a target musical score.
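The chord identification step can be sketched as template matching on a single 12-bin chroma vector. This is an assumption made for illustration: the abstract does not say how the chromagram is used, and the names `chord_templates` and `identify_chord` are hypothetical. Only major and minor triads are considered.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """Binary 12-bin templates for the 24 major and minor triads."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            template = [0.0] * 12
            for interval in intervals:
                template[(root + interval) % 12] = 1.0
            templates[PITCH_CLASSES[root] + suffix] = template
    return templates


def identify_chord(chroma):
    """Return the triad whose template has the largest dot product with chroma."""
    best_name, best_score = None, float("-inf")
    for name, template in chord_templates().items():
        score = sum(c * t for c, t in zip(chroma, template))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Chroma frame with energy on C, E and G (indices 0, 4, 7).
c_major_frame = [1.0, 0, 0, 0, 0.9, 0, 0, 0.8, 0, 0, 0, 0]
print(identify_chord(c_major_frame))
```

Because every template has three active bins, the plain dot product needs no normalization here; a larger chord vocabulary would need templates scaled to comparable norms.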
Mapping characteristics of music into a visual display
A method and system for visualizing music using a perceptually conformal mapping system are provided. A music source file is input into a processor configured to carry out a series of steps on audio cues identified within the music and ultimately generate a simultaneous visual representation on a display device. The series of steps include application of one or more perceptually conformal mapping systems that essentially induce a synesthetic experience in which a person can experience music both acoustically and visually at the same time. The device extracts cues from the music that are designed to specifically capture fundamentals of human appreciation, maps them into visual cues, then presents those visual cues synchronized with the source music.
Spiral curve type music sheet, apparatus and method for providing spiral curve type music sheet
Disclosed are a spiral curve type music sheet in which different notes are displayed at different positions on a spiral curve based on the pitches of notes, and an apparatus and method for providing a spiral curve type music sheet. The apparatus for providing a spiral curve type music sheet may include a memory configured to store a spiral curve type music sheet in which different notes are displayed at different positions on a spiral curve based on the pitch of the note and note data, and a processor configured to determine the note symbol position related to the note data on the spiral curve in the spiral curve type music sheet based on the frequency of the note data.
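One way to place a note on a spiral from its frequency is sketched below. The mapping is an assumption for illustration only, since the abstract does not define the curve: one full turn per octave, so that notes of the same pitch class share an angle, with the radius growing linearly with the number of octaves above a reference. The function name, the reference of 27.5 Hz (A0), and the radius parameters are hypothetical.

```python
import math


def spiral_position(freq_hz, ref_hz=27.5, base_radius=1.0, radius_per_octave=1.0):
    """Map a frequency to (x, y) on an Archimedean-style spiral.

    Assumed mapping: angle = 2*pi * fractional octave above ref_hz,
    radius = base_radius + radius_per_octave * octaves above ref_hz.
    """
    if freq_hz < ref_hz:
        raise ValueError("frequency must not be below the reference")
    octaves = math.log2(freq_hz / ref_hz)
    angle = 2.0 * math.pi * (octaves % 1.0)
    radius = base_radius + radius_per_octave * octaves
    return radius * math.cos(angle), radius * math.sin(angle)


# A0 and A1 land on the same ray, one octave step apart in radius.
a0 = spiral_position(27.5)
a1 = spiral_position(55.0)
```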
MUSICAL ANALYSIS PLATFORM
A platform or system is disclosed for performing musical analysis to detect musical properties in received live or pre-recorded audio data. The analysis can include a synchronous analysis for generating one or more estimated transitory musical properties and an asynchronous analysis for generating one or more aggregate musical properties, which can be applied to the transitory musical properties to generate confirmed musical properties; these can be stored as metadata associated with an audio file. In some cases, live audio data can be received, recorded, and dynamically analyzed to provide realtime metadata (e.g., to a display), and the realtime metadata can then be analyzed to provide confirmed, updated, or validated metadata. In some cases, initial analysis (e.g., dynamic analysis) can determine chord estimates, usable in further analysis (e.g., offline analysis) to estimate a musical key, which can then be applied to the chord estimates to determine the most likely chord estimates and determine chord progressions.
Audio signal processing methods and systems
Described are methods and systems of identifying one or more fundamental frequency component(s) of an audio signal. The methods and systems may include any one or more of an audio event receiving step, a signal discretization step, a masking step, and/or a transcription step.
SYSTEM AND METHOD FOR UNALIGNED SUPERVISION FOR AUTOMATIC MUSIC TRANSCRIPTION
Disclosed herein is a system that includes a memory storing computer-readable instructions and at least one processor to execute the instructions to perform pre-training of a machine learning model using synthetic data including random Musical Instrument Digital Interface (MIDI) files, receive a first library of audio files, receive a second library of MIDI files, each MIDI file having a corresponding audio file in the first library, align each MIDI file in the second library with the corresponding audio file in the first library, feed the first library and the second library into the machine learning model to train the machine learning model to perform musical transcription of at least one musical instrument in an audio file, and receive an audio file and perform automatic transcription of at least one musical instrument in the audio file using the machine learning model based on expectation maximization.
Beatboxing transcription
Methods, systems, and storage media for generating a beatbox transcript are disclosed. Some examples may include: receiving an audio signal having a plurality of beatbox sounds; generating a spectrogram of the audio signal; processing the spectrogram of the audio signal with a neural network model trained on training samples including beatbox sounds; generating, by the neural network model, a beatbox sound activation map including a plurality of activation times for a plurality of beatbox sounds; decoding the beatbox sound activation map into a beatbox transcript; and providing the beatbox transcript as an output.
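The decoding step can be sketched as thresholded peak picking over a per-sound activation map. This is an assumed decoding rule, not one stated in the abstract, and the neural network that would produce the map is not shown. The function name, the dictionary layout (label to per-frame activation), and the default threshold are hypothetical.

```python
def decode_activation_map(activation_map, frame_rate_hz, threshold=0.5):
    """Turn per-frame activations into a time-ordered list of (seconds, label).

    A frame counts as an event when its activation reaches the threshold
    and is a local maximum; a flat plateau yields one event at its first frame.
    """
    events = []
    for label, frames in activation_map.items():
        for n, value in enumerate(frames):
            left = frames[n - 1] if n > 0 else float("-inf")
            right = frames[n + 1] if n + 1 < len(frames) else float("-inf")
            if value >= threshold and value > left and value >= right:
                events.append((n / frame_rate_hz, label))
    events.sort()  # order by time, then by label
    return events


# Toy activation map at 100 frames per second.
activation_map = {
    "kick":  [0.9, 0.1, 0.0, 0.8],
    "snare": [0.0, 0.2, 0.95, 0.1],
}
transcript = decode_activation_map(activation_map, frame_rate_hz=100)
```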