G10H2240/141

Techniques for learning effective musical features for generative and retrieval-based applications

A method includes receiving a non-linguistic input associated with an input musical content. The method also includes, using a model that embeds multiple musical features describing different musical content and relationships between the different musical content in a latent space, identifying one or more embeddings based on the input musical content. The method further includes at least one of: (i) identifying stored musical content based on the one or more identified embeddings or (ii) generating derived musical content based on the one or more identified embeddings. In addition, the method includes presenting at least one of: the stored musical content or the derived musical content. The model is generated by training a machine learning system having one or more first neural network components and one or more second neural network components such that embeddings of the musical features in the latent space have a predefined distribution.
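As a sketch of the retrieval branch: once musical features are embedded in a shared latent space, identifying stored content given a query embedding reduces to a nearest-neighbour lookup. Everything below (the feature names, vectors, and the cosine metric) is illustrative, not taken from the patent.

```python
import math

# Hypothetical embedding table: stored musical content -> latent vector.
# Names and vectors are invented for illustration.
EMBEDDINGS = {
    "slow_minor_piano": [0.9, 0.1, 0.0],
    "fast_major_guitar": [0.1, 0.9, 0.2],
    "ambient_pad": [0.5, 0.2, 0.8],
}

def cosine(a, b):
    """Cosine similarity between two latent vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest_embedding(query):
    """Return the stored item whose embedding lies closest to the query."""
    return max(EMBEDDINGS, key=lambda name: cosine(EMBEDDINGS[name], query))
```

The same lookup supports the generative branch: instead of returning the stored item, its embedding could seed a decoder that produces derived content.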

Method and system for accelerated decomposing of audio data using intermediate data

A method for processing audio data comprises providing song identification data identifying a particular song from among a plurality of songs, or a particular position within a particular song, and loading intermediate data associated with the song identification data from a storage medium or from a remote device. The method also comprises obtaining input audio data representing audio signals of the song identified by the song identification data. The audio signals comprise a mixture of different musical timbres, including at least a first musical timbre and a second musical timbre different from the first. The method further comprises combining the input audio data and the intermediate data to obtain output audio data, which represent audio signals of the first musical timbre separated from the second musical timbre.
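The abstract does not specify what form the intermediate data takes; one plausible form is a precomputed per-sample separation mask, in which case "combining" is an elementwise product. The sketch below assumes that form.

```python
def apply_separation_mask(mixture, mask):
    """Combine mixed audio samples with precomputed per-sample mask values
    (standing in for the patent's 'intermediate data') to recover one
    timbre from the mixture. A soft mask in [0, 1] is an assumption here;
    the abstract does not state the intermediate data's actual contents."""
    if len(mixture) != len(mask):
        raise ValueError("mixture and mask must have the same length")
    return [s * m for s, m in zip(mixture, mask)]
```

Loading a mask computed once per song is what makes the decomposition "accelerated": the expensive source-separation model need not run again on playback.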

Media content identification on mobile devices

A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content on its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted and audio signal global onset detection is used for input audio frame alignment. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints to be combined with audio fingerprints for more reliable content identification.
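Two of the audio features the abstract lists can be sketched directly: frame frequency-domain entropy, and a global onset detector used for frame alignment. The frame sizes, threshold, and energy-jump heuristic below are illustrative choices, not the patent's.

```python
import math

def frame_entropy(magnitudes):
    """Shannon entropy of a frame's normalized magnitude spectrum, one of
    the signature features named in the abstract."""
    total = sum(magnitudes)
    probs = [m / total for m in magnitudes if m > 0]
    return -sum(p * math.log2(p) for p in probs)

def global_onset(energies, threshold=2.0):
    """Index of the first frame whose energy exceeds `threshold` times the
    previous frame's: a crude global onset detector, used here only to
    illustrate how input audio frames could be aligned."""
    for i in range(1, len(energies)):
        if energies[i] > threshold * energies[i - 1]:
            return i
    return 0
```

A flat spectrum gives maximum entropy; a single dominant bin gives zero. Signatures built from such features stay stable under the noisy microphone capture the abstract targets.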

Assigning audible alerts among co-located appliances
11334749 · 2022-05-17 ·

Smart appliances communicate over a local area network. A selected sound palette includes multiple soundfile groups that each include multiple soundfiles. Every soundfile within a given sound palette shares a first common sound attribute. The soundfiles within a given soundfile group share a variation of a second common sound attribute, and that variation is unique to the group among the soundfile groups. Within a group, each soundfile has its own unique variation of a third common sound attribute. One soundfile group may be assigned to each of the appliances, and one soundfile within each soundfile group may be assigned to an alert type. The assigned soundfile group is sent to the appliance with identification of the alert type to which each soundfile is assigned, such that the appliance may use an assigned soundfile for audible alerts.
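The assignment step can be sketched as a mapping from appliances to soundfile groups and, within each group, from alert types to soundfiles. The palette structure (`{group_name: [soundfile, ...]}`) and the alphabetical group ordering are assumptions for illustration.

```python
def assign_palette(palette, appliances, alert_types):
    """Assign one soundfile group per appliance and, within that group,
    one soundfile per alert type. `palette` maps group name -> list of
    soundfiles; structure and ordering are illustrative guesses."""
    if len(appliances) > len(palette):
        raise ValueError("not enough soundfile groups for all appliances")
    assignments = {}
    for appliance, (group, files) in zip(appliances, sorted(palette.items())):
        if len(alert_types) > len(files):
            raise ValueError("not enough soundfiles for all alert types")
        # Pair each alert type with one soundfile from this group.
        assignments[appliance] = dict(zip(alert_types, files))
    return assignments
```

Because each group carries a group-unique variation of one attribute, a listener can tell *which* appliance is alerting; the per-soundfile variation then distinguishes *which* alert it is.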

Method for making music recommendations and related computing device, and medium thereof

This application discloses a method for making music recommendations, performed by a server device. The method includes obtaining a material to which background music is to be added; determining at least one visual semantic tag of the material, the at least one visual semantic tag describing at least one characteristic of the material; identifying, from a candidate music library, matched music that matches the at least one visual semantic tag; sorting the matched music according to user assessment information of a user corresponding to the material; screening the matched music based on the sorting result and according to a preset music screening condition; and recommending the matched music obtained through the screening as candidate music for the material.
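The match–sort–screen pipeline can be sketched in a few lines. The data shapes (tag sets per track, a per-user score table) and the "top-k" screening condition are assumptions; the abstract does not specify them.

```python
def recommend(material_tags, library, user_scores, top_k=2):
    """Match candidate tracks whose tags overlap the material's visual
    semantic tags, sort by the user's assessment score, then screen to
    the top_k. `library` maps track -> set of tags; `user_scores` maps
    track -> score. All names and shapes are illustrative."""
    matched = [t for t, tags in library.items() if tags & set(material_tags)]
    matched.sort(key=lambda t: user_scores.get(t, 0), reverse=True)
    return matched[:top_k]
```

Separating matching (content-driven) from sorting (user-driven) lets the candidate library be indexed by tag once, while scores stay per-user.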

Methods and Apparatus to Segment Audio and Determine Audio Segment Similarities
20230245645 · 2023-08-03 ·

Methods, apparatus, and systems are disclosed to segment audio and determine audio segment similarities. An example apparatus includes at least one memory storing instructions and processor circuitry to execute the instructions to at least: select an anchor index beat of digital audio; identify a first segment of the digital audio to analyze based on the anchor index beat, the first segment having at least two beats and a respective center beat; concatenate time-frequency data of the at least two beats and the respective center beat to form a matrix of the first segment; generate a first deep feature based on the first segment, the first deep feature indicative of a descriptor of the digital audio; and train internal coefficients to classify the first deep feature as similar to a second deep feature based on the descriptor of the first deep feature and a descriptor of the second deep feature.
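The segment-building step can be sketched as gathering the per-beat time-frequency vectors around an anchor beat into one matrix. The window half-width and the row-per-beat layout are illustrative assumptions.

```python
def segment_matrix(beat_features, anchor, half_width=1):
    """Concatenate per-beat time-frequency vectors around an anchor beat
    into one segment matrix (one row per beat, anchor at the centre),
    mirroring the abstract's segment construction. The window size is a
    guess; `beat_features` is a list of per-beat feature vectors."""
    lo, hi = anchor - half_width, anchor + half_width
    if lo < 0 or hi >= len(beat_features):
        raise IndexError("segment extends past the audio boundaries")
    return [beat_features[i] for i in range(lo, hi + 1)]
```

Each such matrix would then be fed to the network that produces the deep feature; training pulls matrices with matching descriptors toward similar features.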

Method and system for AI controlled loop based song construction

According to an embodiment, a system and method are provided for automatic, AI-controlled, loop-based song construction. A machine-learning AI within an audio loop selection engine generates a song structure and selects fitting audio loops from a database of audio loops. In one embodiment, the method provides a music generation process that utilizes an AI system, trained and validated on a music item database, to complete the creation of a music item given an incomplete song that was started but not finished by a user.
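At its simplest, the construction step fills each section of a generated structure with a loop drawn from the database. The sketch below substitutes a seeded random choice for the trained selection model; the structure labels and database layout are invented.

```python
import random

def build_song(structure, loop_db, seed=0):
    """Fill a song structure with loops from a loop database, keyed by
    section type. A seeded random choice stands in for the AI's learned
    fit score; all names here are illustrative."""
    rng = random.Random(seed)
    return [(section, rng.choice(loop_db[section])) for section in structure]
```

A trained system would replace `rng.choice` with a model scoring each candidate loop against the surrounding, user-provided material.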

MUSIC COVER IDENTIFICATION WITH LYRICS FOR SEARCH, COMPLIANCE, AND LICENSING
20210357451 · 2021-11-18 ·

Embodiments cover identifying an unidentified media content item as a cover of a known media content item using lyrical contents. In an example, a processing device receives an unidentified media content item and determines lyrical content associated with the unidentified media content item. The processing device then determines a lyrical similarity between the lyrical content associated with the unidentified media content item and additional lyrical content associated with a known media content item of a plurality of known media content items. The processing device then identifies the unidentified media content item as a cover of the known media content item based at least in part on the lyrical similarity, resulting in an identified cover media content item.
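One simple stand-in for the lyrical-similarity measure is word-set Jaccard similarity over the two transcripts; the abstract does not say which measure the embodiments use, and the 0.5 threshold below is likewise an assumption.

```python
def lyrical_similarity(lyrics_a, lyrics_b):
    """Word-set Jaccard similarity between two lyric transcripts: a simple
    illustrative stand-in, not the patent's actual measure."""
    a, b = set(lyrics_a.lower().split()), set(lyrics_b.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def identify_cover(query_lyrics, known, threshold=0.5):
    """Return the known item most lyrically similar to the query if its
    similarity clears the threshold, else None. `known` maps item -> lyrics."""
    best = max(known, key=lambda k: lyrical_similarity(query_lyrics, known[k]))
    if lyrical_similarity(query_lyrics, known[best]) >= threshold:
        return best
    return None
```

Lyrics survive the re-recording that defeats audio fingerprints, which is why covers are matched on lyrical content rather than on the waveform.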

METHOD AND SYSTEM FOR PROCESSING AUDIO STEMS
20210350778 · 2021-11-11 ·

A method and system for processing an audio stem/loop, including dividing a stem into a plurality of stem slices, classifying each of the stem slices into at least a first group or a second group, and applying a stem effect that includes replacing at least one stem slice with an all-zero stem slice, or replacing at least one stem slice belonging to the first group or the second group with another stem slice belonging to the first group or the second group.
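The divide–classify–replace pipeline can be sketched directly. The fixed slice length, the level-based classification criterion, and the mute/swap parameters are all illustrative assumptions; the abstract specifies none of them.

```python
def slice_stem(samples, slice_len):
    """Divide a stem's samples into fixed-length slices (trailing remainder
    dropped). Fixed-length slicing is an assumption for illustration."""
    return [samples[i:i + slice_len]
            for i in range(0, len(samples) - slice_len + 1, slice_len)]

def classify_slice(s, threshold=0.5):
    """Classify a slice into the first or second group by mean absolute
    level. The abstract does not state the real classification criterion."""
    return "first" if sum(abs(x) for x in s) / len(s) >= threshold else "second"

def apply_stem_effect(slices, mute_index=None, swap=None):
    """Apply the described stem effect: replace one slice with an all-zero
    slice (mute_index), and/or replace one slice with another from the
    classified groups (swap=(dst, src)). Indices are illustrative."""
    out = [list(s) for s in slices]
    if mute_index is not None:
        out[mute_index] = [0.0] * len(out[mute_index])
    if swap is not None:
        dst, src = swap
        out[dst] = list(slices[src])
    return out
```

Muting with an all-zero slice and swapping slices within a group are what turn a single stem into rhythmic variations of itself.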