G10H1/361

Method and apparatus for generating digital score file of song, and storage medium

A method and an information processing apparatus to generate a digital score file of a song are described. The information processing apparatus includes processing circuitry. The processing circuitry is configured to obtain a candidate audio file satisfying a first condition from audio files of unaccompanied singing of the song without instrumental accompaniment. The processing circuitry is configured to divide the candidate audio file into valid audio segments based on timing information of the song, and extract pieces of music note information from the valid audio segments. Each of the pieces of music note information includes at least one data set of a music note in the song. The data set includes an onset time, a duration, and a music note value of the music note. The processing circuitry is configured to generate the digital score file based on the pieces of music note information.
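
The note data set described above (onset time, duration, music note value) can be illustrated with a minimal sketch. It assumes, hypothetically, that each valid audio segment has already been reduced to one pitch estimate per fixed-length frame, expressed as a MIDI note number or `None` for unvoiced frames. The names `Note`, `notes_from_segment`, and `frame_ms` are illustrative and do not come from the abstract, which does not say how pitch is detected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    onset_ms: int     # onset time, measured from the start of the song
    duration_ms: int  # how long the note is held
    value: int        # music note value (assumed here to be a MIDI number)


def notes_from_segment(pitches: List[Optional[int]],
                       segment_start_ms: int,
                       frame_ms: int = 10) -> List[Note]:
    """Group runs of identical frame pitches into note data sets.

    Assumption: `pitches` holds one pitch estimate per frame of a valid
    audio segment; None marks a frame with no detected pitch.
    """
    notes: List[Note] = []
    run_start = 0
    for i in range(1, len(pitches) + 1):
        # Close the current run at the end of the segment or on a pitch change.
        if i == len(pitches) or pitches[i] != pitches[run_start]:
            if pitches[run_start] is not None:
                notes.append(Note(
                    onset_ms=segment_start_ms + run_start * frame_ms,
                    duration_ms=(i - run_start) * frame_ms,
                    value=pitches[run_start],
                ))
            run_start = i
    return notes


# Usage: a segment starting 1000 ms into the song.
example = notes_from_segment([60, 60, 60, None, 62, 62], segment_start_ms=1000)
```

A score file could then be built by concatenating the note lists of all segments in time order; the serialization format is left open here because the abstract does not specify one.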

Systems and methods for transferring musical drum samples from slow memory to fast memory
10923088 · 2021-02-16

An electronic-drum module for connection to one or more electronic-drum pads is provided. The module includes an electronic display; a first memory storing audio files for playback when the playback is triggered by a signal received from a pad; and one or more processors coupled to the display and the memory. The processors are configured to receive an instruction to transfer a set of samples. The set of samples is associated with a priority-instruction and includes a first subset of samples and a second subset of samples. The processors are also configured to transfer the first subset of samples from a second memory to the first memory based on the priority-instruction before transferring the second subset of samples and to transfer the second subset of samples from the second memory to the first memory.
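
The prioritized transfer can be sketched as follows. This is only an illustration under stated assumptions: the two memories are modeled as dictionaries, and the priority-instruction is modeled as a set of sample names that must be copied first. The names `transfer_samples`, `slow_memory`, and `fast_memory` are hypothetical.

```python
from typing import Dict, List, Set


def transfer_samples(sample_names: List[str],
                     priority: Set[str],
                     slow_memory: Dict[str, bytes],
                     fast_memory: Dict[str, bytes]) -> List[str]:
    """Copy samples from slow to fast memory, prioritized subset first.

    Assumption: `priority` stands in for the priority-instruction and
    names the first subset; every other sample forms the second subset.
    Returns the order in which samples were transferred.
    """
    first_subset = [n for n in sample_names if n in priority]
    second_subset = [n for n in sample_names if n not in priority]
    order = first_subset + second_subset
    for name in order:
        fast_memory[name] = slow_memory[name]
    return order


# Usage: kick and snare become playable before the cymbals are copied.
slow = {"ride": b"r", "kick": b"k", "crash": b"c", "snare": b"s"}
fast: Dict[str, bytes] = {}
order = transfer_samples(["ride", "kick", "crash", "snare"],
                         {"kick", "snare"}, slow, fast)
```

Splitting the set this way lets the most frequently triggered pads respond from the first memory while the remaining samples are still loading.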

Method and apparatus for correcting delay between accompaniment audio and unaccompanied audio, and storage medium

A method and apparatus for correcting a delay between accompaniment audio and unaccompanied audio, and a storage medium are provided. The method includes: acquiring original audio of a target song, and extracting original vocal audio from the original audio; determining a first delay between the original vocal audio and the unaccompanied audio, and determining a second delay between the accompaniment audio and the original audio; and correcting a delay between the accompaniment audio and the unaccompanied audio based on the first delay and the second delay. Thus, the correction efficiency of the delay between accompaniment audio and unaccompanied audio is improved, and correction mistakes possibly caused by human factors are eliminated, thereby improving the accuracy.
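
One way to picture the two-delay correction is sketched below. The abstract does not state how the delays are estimated or combined, so this sketch makes two assumptions: each delay is estimated by brute-force cross-correlation, and a positive lag means the second signal lags the first. Under that convention the delay of the unaccompanied audio relative to the accompaniment is the first delay minus the second. The function names are hypothetical.

```python
from typing import Sequence


def estimate_lag(ref: Sequence[float], sig: Sequence[float], max_lag: int) -> int:
    """Return the lag in [-max_lag, max_lag] maximizing sum(ref[n] * sig[n + lag]).

    A positive result means `sig` lags `ref` by that many samples.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def relative_delay(first_delay: int, second_delay: int) -> int:
    """Delay of the unaccompanied audio relative to the accompaniment.

    Assumed convention: first_delay is how far the unaccompanied audio lags
    the original vocal; second_delay is how far the accompaniment lags the
    original audio. Both are measured against the same original timeline.
    """
    return first_delay - second_delay


# Usage with toy signals (sample values are made up for illustration).
original_vocal = [0, 0, 1, 3, -2, 0, 0, 0, 0, 0]
unaccompanied = [0, 0, 0, 0, 1, 3, -2, 0, 0, 0]   # vocal delayed by 2 samples
original = [0, 2, -1, 4, 0, 0, 0, 0, 0, 0]
accompaniment = [0, 0, 0, 0, 2, -1, 4, 0, 0, 0]   # original delayed by 3 samples

d1 = estimate_lag(original_vocal, unaccompanied, max_lag=4)
d2 = estimate_lag(original, accompaniment, max_lag=4)
offset = relative_delay(d1, d2)
```

Here a negative `offset` indicates that the unaccompanied audio runs ahead of the accompaniment and would need to be delayed by that many samples to line up.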

Non-linear media segment capture and edit platform

User interface techniques provide user vocalists with mechanisms for forward and backward traversal of audiovisual content, including pitch cues, waveform- or envelope-type performance timelines, lyrics and/or other temporally-synchronized content at record-time, during edits, and/or in playback. Recapture of selected performance portions, coordination of group parts, and overdubbing may all be facilitated. Direct scrolling to arbitrary points in the performance timeline, lyrics, pitch cues and other temporally-synchronized content allows a user to conveniently move through a capture or audiovisual edit session. In some cases, a user vocalist may be guided through the performance timeline, lyrics, pitch cues and other temporally-synchronized content in correspondence with group part information such as in a guided short-form capture for a duet. A scrubber allows user vocalists to conveniently move forward and backward through the temporally-synchronized content.

Singing voice separation with deep u-net convolutional networks

A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce a result of the comparing, for training the neural network system to estimate one of the vocal component and the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate vocal or instrumental components of an audio signal, depending on which type of component the system is trained to estimate.
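
The training loop in the abstract (apply a signal, compare the output to a target, adjust a parameter to reduce the difference) can be shown in miniature. The sketch below is deliberately not a U-Net: it swaps in a single scalar gain as the only trainable parameter and uses mean squared error with plain gradient descent, all of which are assumptions made to keep the example self-contained. The function name `train_gain` is hypothetical.

```python
from typing import Sequence


def train_gain(mixture: Sequence[float],
               target: Sequence[float],
               steps: int = 200,
               lr: float = 0.1) -> float:
    """Fit one gain w so that w * mixture approximates target.

    Stand-in for a neural network: the "output" is w * x, the comparison
    is mean squared error against the target, and w is adjusted by
    gradient descent to reduce that error.
    """
    w = 0.0
    n = len(mixture)
    for _ in range(steps):
        # Gradient of mean((w*x - t)^2) with respect to w.
        grad = sum(2.0 * (w * x - t) * x for x, t in zip(mixture, target)) / n
        w -= lr * grad
    return w


# Usage: the target (e.g. the vocal component) is 60% of the toy mixture.
mixture = [1.0, -1.0, 0.5, -0.5]
vocal_target = [0.6 * x for x in mixture]
learned = train_gain(mixture, vocal_target)
```

Training toward the vocal target yields a vocal estimator, and training the same loop toward the non-vocal target would yield an instrumental estimator, which mirrors the choice described in the abstract.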

Singing voice separation with deep U-Net convolutional networks

A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce a result of the comparing, for training the neural network system to estimate one of the vocal component and the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate vocal or instrumental components of an audio signal, depending on which type of component the system is trained to estimate.

SHORT SEGMENT GENERATION FOR USER ENGAGEMENT IN VOCAL CAPTURE APPLICATIONS

User interface techniques provide user vocalists with mechanisms for solo audiovisual capture and for seeding subsequent performances by other users (e.g., joiners). Audiovisual capture may be against a full-length work or seed spanning much or all of a pre-existing audio (or audiovisual) work and in some cases may mix, to seed further contributions of one or more joiners, a user's captured media content for at least some portions of the audio (or audiovisual) work. A short seed or short segment may span less than all (and in some cases, much less than all) of the audio (or audiovisual) work. For example, a verse, chorus, refrain, hook or other limited chunk of an audio (or audiovisual) work may constitute a short seed or short segment. Computational techniques are described that allow a system to automatically identify suitable short seeds or short segments. After audiovisual capture against the short seed or short segment, a resulting, solo or group, full-length or short-form performance may be posted, livestreamed, or otherwise disseminated in a social network.

Harmony generation device and storage medium

A harmony generation device and a program for the same which can generate a natural harmony sound are provided. The harmony generation device (1) generates first and second harmony tones to which a voice input through a microphone (M) is shifted in pitch by first and second shift amounts calculated based on both the voice input through the microphone (M) and a chord determined from performance information of an electric guitar (G) input through an input device (34). That is, since the first and second harmony tones can be tones based on the chord of the electric guitar (G) that changes from moment to moment, the harmony sound obtained by mixing the first and second harmony tones with the voice input through the microphone (M) can be a natural harmony sound that is rich in variation according to the chord of the electric guitar (G).
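
A rough sketch of deriving two shift amounts from the voice and the current chord follows. The abstract does not give the selection rule, so this sketch assumes one: the harmony tones are the two nearest chord tones above the sung pitch. The sung pitch is taken as a MIDI note number and the chord as a set of pitch classes (0 = C, 4 = E, 7 = G, and so on). The name `harmony_shifts` is hypothetical.

```python
from typing import Set, Tuple


def harmony_shifts(voice_midi: int, chord_pitch_classes: Set[int]) -> Tuple[int, ...]:
    """Return up to two upward shifts, in semitones, that land on chord tones.

    Assumed rule: scan upward from the sung pitch, one semitone at a time
    within one octave, and take the first two pitches whose pitch class
    belongs to the chord. Fewer than two shifts are returned when the
    chord offers fewer than two reachable tones.
    """
    shifts = []
    for shift in range(1, 13):
        if (voice_midi + shift) % 12 in chord_pitch_classes:
            shifts.append(shift)
            if len(shifts) == 2:
                break
    return tuple(shifts)


def shift_ratio(semitones: int) -> float:
    """Frequency ratio for an equal-tempered shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)


# Usage: singing C4 (MIDI 60) and then D4 (MIDI 62) over a C major chord.
c_major = {0, 4, 7}
on_c = harmony_shifts(60, c_major)
on_d = harmony_shifts(62, c_major)
```

Because the shifts are recomputed from the chord at each moment, the same sung note produces different harmony tones as the guitar chord changes.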

LOOP SWITCHER, CONTROLLERS THEREFOR AND METHODS FOR CONTROLLING AN ARRAY OF AUDIO EFFECT DEVICES
20210049991 · 2021-02-18

The present invention relates to a loop switcher, controllers therefor, and methods for controlling an array of audio effect devices. A controller for controlling a loop switcher with an input, an output, and a plurality of re-orderable loops comprises a plurality of switches, wherein each of the plurality of switches controls a corresponding one of the plurality of re-orderable loops for coupling the corresponding one of the plurality of re-orderable loops between the input and the output in a sequence; and a plurality of display elements, wherein each of the plurality of switches has a corresponding one of the plurality of display elements, and each display element visually indicates the sequence order in which the plurality of re-orderable loops are coupled between the input and the output.
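
The relationship between switches, loop order, and display elements can be sketched as a small state model. The abstract does not say how the sequence is built, so this sketch assumes that pressing a switch appends its loop to the end of the chain and pressing it again removes the loop. The class name `LoopController` and its methods are hypothetical.

```python
from typing import List


class LoopController:
    """Toy model: one switch and one display element per re-orderable loop."""

    def __init__(self, loop_count: int) -> None:
        self.loop_count = loop_count
        self.sequence: List[int] = []  # loops in input-to-output order

    def press(self, loop: int) -> None:
        """Toggle a loop (assumed behavior): append if absent, remove if present."""
        if not 0 <= loop < self.loop_count:
            raise IndexError("no such loop")
        if loop in self.sequence:
            self.sequence.remove(loop)
        else:
            self.sequence.append(loop)

    def display(self) -> List[str]:
        """Text for each display element: the loop's position, or blank if bypassed."""
        labels = [""] * self.loop_count
        for position, loop in enumerate(self.sequence, start=1):
            labels[loop] = str(position)
        return labels


# Usage: engage loops 2, 0, 3 in that order, then drop loop 0.
controller = LoopController(4)
for loop in (2, 0, 3):
    controller.press(loop)
before = controller.display()
controller.press(0)
after = controller.display()
```

Each display element shows where its loop sits in the chain, so removing one loop renumbers the loops that follow it.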

TECHNIQUES FOR LEARNING EFFECTIVE MUSICAL FEATURES FOR GENERATIVE AND RETRIEVAL-BASED APPLICATIONS
20210049989 · 2021-02-18

A method includes receiving a non-linguistic input associated with an input musical content. The method also includes, using a model that embeds multiple musical features describing different musical content and relationships between the different musical content in a latent space, identifying one or more embeddings based on the input musical content. The method further includes at least one of: (i) identifying stored musical content based on the one or more identified embeddings or (ii) generating derived musical content based on the one or more identified embeddings. In addition, the method includes presenting at least one of: the stored musical content or the derived musical content. The model is generated by training a machine learning system having one or more first neural network components and one or more second neural network components such that embeddings of the musical features in the latent space have a predefined distribution.
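
The retrieval branch (identifying stored musical content from embeddings) can be illustrated with a nearest-neighbor lookup. This sketch assumes the embeddings have already been produced by some trained model and are plain lists of floats, and it assumes cosine similarity as the comparison; the abstract names neither. The names `cosine_similarity` and `retrieve` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float],
             library: Dict[str, Sequence[float]],
             k: int = 1) -> List[str]:
    """Return the names of the k stored items closest to the query embedding."""
    ranked = sorted(library,
                    key=lambda name: cosine_similarity(query, library[name]),
                    reverse=True)
    return ranked[:k]


# Usage with made-up two-dimensional embeddings.
stored = {"track_a": [1.0, 0.0], "track_b": [0.0, 1.0], "track_c": [0.9, 0.1]}
matches = retrieve([1.0, 0.05], stored, k=2)
```

The generative branch would instead feed the identified embeddings to a decoder, which is outside the scope of this sketch.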