Patent classifications
G10H2210/061
SOUND SOURCE FILE STRUCTURE, RECORDING MEDIUM RECORDING THE SAME, AND METHOD OF PRODUCING SOUND SOURCE FILE
The present disclosure relates to a sound source file structure that outputs lyrics as audible sounds right before the melodies corresponding to the lyrics start, to help a user recall the lyrics from the accompaniment once the accompaniment for a song starts playing, and to help the user sing the correct lyrics for the melodies. The sound source file structure may include one or more backing sound source layers in which backing sounds based on beats and rhythms are placed, a melody sound source layer in which melody notes corresponding to lyrics based on beats and rhythms and a rest section corresponding to a rest are placed, and a lyric voice source layer in which a lyric voice is placed at a position corresponding to a rest section.
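The layered structure described above can be pictured with a minimal sketch. The class names, the `Event` fields, and the `place_lyric_voices` helper are hypothetical and chosen for illustration; the abstract does not specify a concrete file layout.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    start_beat: float    # position on the beat grid
    length_beats: float  # duration in beats
    label: str           # note name, "rest", or lyric text (assumed encoding)


@dataclass
class SoundSourceFile:
    # One or more backing layers, one melody layer, one lyric voice layer.
    backing_layers: list[list[Event]] = field(default_factory=list)
    melody_layer: list[Event] = field(default_factory=list)
    lyric_voice_layer: list[Event] = field(default_factory=list)


def place_lyric_voices(f: SoundSourceFile, phrase_lyrics: list[str]) -> SoundSourceFile:
    """Place each phrase's lyric voice at the position of a melody rest.

    Assumption: the i-th rest in the melody layer precedes the i-th phrase.
    """
    rests = [e for e in f.melody_layer if e.label == "rest"]
    for rest, lyric in zip(rests, phrase_lyrics):
        f.lyric_voice_layer.append(Event(rest.start_beat, rest.length_beats, lyric))
    return f


melody = [
    Event(0, 2, "rest"), Event(2, 1, "C4"), Event(3, 1, "D4"),
    Event(4, 2, "rest"), Event(6, 2, "E4"),
]
song = place_lyric_voices(SoundSourceFile(melody_layer=melody),
                          ["twinkle twinkle", "little star"])
```

Here each spoken lyric occupies the rest immediately before its melody phrase, so the singer hears the words just before they are due.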
MUSIC COVER IDENTIFICATION WITH LYRICS FOR SEARCH, COMPLIANCE, AND LICENSING
Embodiments cover identifying an unidentified media content item as a cover of a known media content item using lyrical contents. In an example, a processing device receives an unidentified media content item and determines lyrical content associated with the unidentified media content item. The processing device then determines a lyrical similarity between the lyrical content associated with the unidentified media content item and additional lyrical content associated with a known media content item of a plurality of known media content items. The processing device then identifies the unidentified media content item as a cover of the known media content item based at least in part on the lyrical similarity, resulting in an identified cover-media content item.
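A minimal sketch of the lyric-based comparison follows. The abstract does not name a similarity measure, so the Jaccard overlap of word sets, the 0.6 threshold, and all function names here are assumptions made for illustration.

```python
import re


def word_set(lyrics: str) -> set[str]:
    """Lowercase the lyrics and return the set of distinct words."""
    return set(re.findall(r"[a-z']+", lyrics.lower()))


def lyrical_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two word sets (an assumed measure)."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def identify_cover(unknown_lyrics: str, known_items: dict[str, str],
                   threshold: float = 0.6):
    """Return the title of the best-matching known item, or None."""
    best_title, best_score = None, 0.0
    for title, lyrics in known_items.items():
        score = lyrical_similarity(unknown_lyrics, lyrics)
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= threshold else None


known = {
    "Song A": "hello darkness my old friend",
    "Song B": "we will rock you",
}
match = identify_cover("Hello, darkness, my old friend!", known)
no_match = identify_cover("completely different words here", known)
```

The abstract says identification is based "at least in part" on lyrical similarity, so a real system would likely combine this score with other signals.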
SYSTEMS AND METHODS FOR GENERATING AUDIBLE VERSIONS OF TEXT SENTENCES FROM AUDIO SNIPPETS
A method is performed at a server system of a media-providing service. The server system has one or more processors and memory storing instructions for execution by the one or more processors. The method includes receiving a text sentence including a plurality of words from a device of a first user and extracting a plurality of audio snippets from one or more audio tracks. A respective audio snippet in the plurality of audio snippets corresponds to one or more words in the plurality of words of the text sentence. The method also includes assembling the plurality of audio snippets in a first order to produce an audible version of the text sentence. The method further includes providing, for playback at the device of the first user, the audible version of the text sentence including the plurality of audio snippets in the first order.
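The assembly step can be sketched as a lookup followed by ordered concatenation. The `word_index` mapping from a word to a `(track_id, start_seconds, end_seconds)` tuple is a hypothetical data layout; the abstract does not say how snippets are located or stored.

```python
def assemble_snippets(sentence: str, word_index: dict) -> list:
    """Return the snippets for each word of the sentence, in sentence order.

    word_index maps a lowercase word to a (track_id, start_s, end_s) tuple
    (an assumed representation of an audio snippet).
    """
    playlist = []
    for word in sentence.lower().split():
        snippet = word_index.get(word)
        if snippet is None:
            raise KeyError(f"no audio snippet found for word: {word}")
        playlist.append(snippet)
    return playlist


word_index = {
    "hello": ("track1", 12.0, 12.4),
    "world": ("track2", 3.5, 4.1),
}
playlist = assemble_snippets("Hello world", word_index)
```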
METHOD AND SYSTEM FOR PROCESSING AUDIO STEMS
A method and system for processing an audio stem/loop, including dividing a stem into a plurality of stem slices, classifying each of the plurality of stem slices into at least a first group or a second group, and applying a stem effect that includes replacing at least one stem slice with an all-zero stem slice, replacing at least one stem slice belonging to the first group or the second group with a stem slice belonging to the first group or the second group.
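The slice, classify, and replace steps can be sketched on a plain list of samples. The classification rule (mean absolute amplitude against a threshold) and every function name are assumptions; the abstract does not state how slices are assigned to groups.

```python
def slice_stem(samples: list, slice_len: int) -> list:
    """Divide the stem into consecutive slices of slice_len samples."""
    return [samples[i:i + slice_len] for i in range(0, len(samples), slice_len)]


def classify(slices: list, threshold: float) -> list:
    """Assign group 1 to loud slices and group 2 to quiet ones (assumed rule)."""
    return [1 if sum(abs(x) for x in s) / len(s) >= threshold else 2
            for s in slices]


def mute_slice(slices: list, index: int) -> list:
    """Replace one slice with an all-zero slice of the same length."""
    out = [list(s) for s in slices]
    out[index] = [0.0] * len(out[index])
    return out


def replace_slice(slices: list, groups: list, index: int, from_group: int) -> list:
    """Replace one slice with the first other slice found in from_group."""
    out = [list(s) for s in slices]
    for i, g in enumerate(groups):
        if g == from_group and i != index:
            out[index] = list(slices[i])
            break
    return out


stem = [0.9, 0.8, 0.1, 0.0, 0.7, 0.9, 0.05, 0.1]
slices = slice_stem(stem, 2)
groups = classify(slices, threshold=0.5)
muted = mute_slice(slices, 0)
swapped = replace_slice(slices, groups, index=1, from_group=1)
```

Both effect functions copy the slice list rather than editing it in place, so the original stem stays available for applying a different effect.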
Music technique responsible for versioning
Systems and methods for versioning audio elements used in generation of music are provided. An example method includes receiving musical format data associated with a plurality of audio elements of a melody; determining, based on the musical format data, harmonic and melodic characteristics of each of the plurality of audio elements; matching the harmonic and melodic characteristics to a plurality of chord progressions using predetermined music theory rules, counterpoint rules, and rhythm matching rules; deriving, based on the matching and predetermined melodic movement rules, from the plurality of chord progressions, melodic movement characteristics applicable for use in versioning; and creating, based on the predetermined music theory rules and the melodic movement characteristics, versions of the audio elements that match the chord progressions.
METHOD AND SYSTEM FOR LEARNING AND USING LATENT-SPACE REPRESENTATIONS OF AUDIO SIGNALS FOR AUDIO CONTENT-BASED RETRIEVAL
A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which take care of selecting salient attributes of the signals that represent psychoacoustic differences between the signals.
System and method for generation of musical notation from audio signal
A system for generation of a musical notation from an audio signal, the system comprising at least one processor configured to: obtain the audio signal from an audio source or a data repository; process the audio signal using first machine learning (ML) model(s) to generate a recognition result, wherein the recognition result is indicative of a pitch and a duration of a plurality of notes in the audio signal and their corresponding confidence scores; generate a preliminary musical notation using the recognition result; process the preliminary musical notation using second ML model(s) to determine whether the preliminary musical notation includes one or more errors; and when it is determined that the preliminary musical notation includes one or more errors, modify the preliminary musical notation to generate the musical notation that is error-free or has fewer errors as compared to the preliminary musical notation.
Methods and apparatus to segment audio and determine audio segment similarities
Methods, apparatus, and systems are disclosed to segment audio and determine audio segment similarities. An example apparatus includes at least one memory storing instructions and processor circuitry to execute instructions to at least select an anchor index beat of digital audio, identify a first segment of the digital audio based on the anchor index beat to analyze, the first segment having at least two beats and a respective center beat, concatenate time-frequency data of the at least two beats and the respective center beat to form a matrix of the first segment, generate a first deep feature based on the first segment, the first deep feature indicative of a descriptor of the digital audio, and train internal coefficients to classify the first deep feature as similar to a second deep feature based on the descriptor of the first deep feature and a descriptor of a second deep feature.
SYSTEMS AND METHODS FOR GENERATING A MIXED AUDIO FILE IN A DIGITAL AUDIO WORKSTATION
An electronic device receives a source audio file from a user of a digital audio workstation and a target MIDI file, the target MIDI file comprising digital representations for a series of notes. The electronic device generates a series of sounds from the target MIDI file, each respective sound in the series of sounds corresponding to a respective note in the series of notes. The electronic device divides the source audio file into a plurality of segments. For each sound in the series of sounds, the electronic device matches a segment from the plurality of segments to the sound based on a weighted combination of features identified for the corresponding sound. The electronic device generates an audio file in which the series of sounds from the target MIDI file are replaced with the matched segment corresponding to each sound.
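The matching step can be sketched as a nearest-neighbor search under a weighted feature distance. The feature names (`pitch`, `duration`), the weights, and the absolute-difference distance are assumptions; the abstract only says that matching uses a weighted combination of features.

```python
def weighted_distance(sound: dict, segment: dict, weights: dict) -> float:
    """Weighted sum of absolute feature differences (an assumed distance)."""
    return sum(w * abs(sound[k] - segment[k]) for k, w in weights.items())


def match_segments(sounds: list, segments: list, weights: dict) -> list:
    """For each sound, return the index of the closest source segment."""
    return [
        min(range(len(segments)),
            key=lambda i: weighted_distance(sound, segments[i], weights))
        for sound in sounds
    ]


# Sounds rendered from the target MIDI file (hypothetical feature values).
sounds = [{"pitch": 60, "duration": 0.5}, {"pitch": 72, "duration": 1.0}]
# Segments cut from the source audio file (hypothetical feature values).
segments = [{"pitch": 71, "duration": 0.9}, {"pitch": 61, "duration": 0.4}]
weights = {"pitch": 1.0, "duration": 2.0}

matches = match_segments(sounds, segments, weights)
```

The output audio file would then be built by placing `segments[matches[i]]` where the i-th MIDI sound occurs.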
METHOD FOR IDENTIFYING A SONG
A computer-implemented method for identifying a song includes: providing audio data including musical notation information for songs, receiving a real-time audio signal of a user performing on an instrument, detecting playing activity in successive segments, detecting notes and/or chords from the audio signal, storing user play history information including information on songs the user has played before and the number of plays, calculating a first probability for a song based on the play history information, and estimating the song being performed based on the first probabilities for a number of songs and on the detected playing activity and the detected notes and/or chords. The estimation includes calculating a second probability for different songs. The second probabilities are defined by the audio signal corresponding with a particular song of the play history combined with the first probability associated with the song. The method further includes providing the song the user is performing or related information.
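The two-stage probability estimate can be sketched as a prior from play counts multiplied by a match score from the detected notes. Treating the first probability as a normalized play count, scoring the audio by the fraction of detected notes found in a song's notation, and all names below are assumptions; the abstract does not give the formulas.

```python
def first_probabilities(play_counts: dict) -> dict:
    """First probability: share of the user's total plays (assumed form)."""
    total = sum(play_counts.values())
    if total == 0:
        return {}
    return {song: n / total for song, n in play_counts.items()}


def note_match(detected: list, song_notes: set) -> float:
    """Fraction of detected notes that occur in the song's notation."""
    if not detected:
        return 0.0
    return sum(1 for n in detected if n in song_notes) / len(detected)


def estimate_song(detected: list, songs: dict, play_counts: dict):
    """Return (best song, second probabilities), or (None, {}) if no evidence."""
    prior = first_probabilities(play_counts)
    scores = {title: note_match(detected, notes) * prior.get(title, 0.0)
              for title, notes in songs.items()}
    total = sum(scores.values())
    if total == 0:
        return None, {}
    second = {title: s / total for title, s in scores.items()}
    return max(second, key=second.get), second


songs = {"A": {"C", "E", "G"}, "B": {"C", "F", "A"}}
play_counts = {"A": 1, "B": 3}
best, second = estimate_song(["C", "E", "G", "E"], songs, play_counts)
```

In this example song B has been played three times as often, but the detected notes fit song A far better, so A still ends up with the higher second probability.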