G10H2210/041

IDENTIFYING MUSIC ATTRIBUTES BASED ON AUDIO DATA
20230022947 · 2023-01-26 ·

The present disclosure describes techniques for identifying music attributes. The described techniques comprise receiving audio data of a piece of music and determining at least one attribute of the piece of music based on the audio data using a model, the model comprising a convolutional neural network and a transformer, the model being pre-trained using training data, wherein the training data comprise labelled data associated with a first plurality of music samples and unlabelled data associated with a second plurality of music samples, the labelled data comprise audio data of the first plurality of music samples and label information indicative of attributes of the first plurality of music samples, and the unlabelled data comprise audio data of the second plurality of music samples.
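For illustration only, a minimal sketch of a convolutional-front-end-plus-transformer attribute model of the kind the abstract describes; the mel-spectrogram input, layer names, and sizes are assumptions rather than details from the patent, and the semi-supervised pre-training on labelled plus unlabelled samples is elided:

```python
# Minimal sketch (PyTorch); layer sizes, mel input, and the attribute head
# are illustrative assumptions, not the patent's architecture.
import torch
import torch.nn as nn

class MusicAttributeModel(nn.Module):
    def __init__(self, n_mels=128, d_model=256, n_heads=4, n_layers=4,
                 n_attributes=10):
        super().__init__()
        # Convolutional stage: local time-frequency patterns.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.proj = nn.Linear(64 * (n_mels // 4), d_model)
        # Transformer stage: long-range temporal structure.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_attributes)

    def forward(self, mel):                  # mel: (batch, 1, n_mels, frames)
        x = self.conv(mel)                   # (batch, 64, n_mels//4, frames//4)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)
        x = self.encoder(self.proj(x))       # (batch, frames//4, d_model)
        return self.head(x.mean(dim=1))      # one logit per attribute
```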

Media content identification on mobile devices

A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content with its microphone, camera, or both, and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a remote search server or used with a search function on the mobile device for content search and identification. Audio features are extracted, and global onset detection on the audio signal is used to align input audio frames. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency-domain entropy, and the maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints, which are combined with audio fingerprints for more reliable content identification.
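As a rough, non-authoritative illustration of two of the per-frame signatures named above, frequency-domain entropy and maximum change in spectral coefficients; the frame length, hop, and windowing are arbitrary assumptions:

```python
# Illustrative only; frame length, hop, and windowing are assumptions.
import numpy as np

def frame_signatures(audio, frame_len=1024, hop=512):
    window = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    entropy, max_delta = [], []
    prev_mag = None
    for i in range(n_frames):
        frame = audio[i * hop:i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        p = mag / (mag.sum() + 1e-12)          # spectrum as a distribution
        entropy.append(-(p * np.log2(p + 1e-12)).sum())
        if prev_mag is not None:               # max change vs. previous frame
            max_delta.append(np.max(np.abs(mag - prev_mag)))
        prev_mag = mag
    return np.array(entropy), np.array(max_delta)
```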

HARMONY-AWARE HUMAN MOTION SYNTHESIS WITH MUSIC
20230005201 · 2023-01-05 ·

A method and device for harmony-aware audio-driven motion synthesis are provided. The method includes determining a plurality of testing meter units according to an input audio, each testing meter unit corresponding to an input audio sequence of the input audio, obtaining an auditory input corresponding to each testing meter unit, obtaining an initial pose of each testing meter unit as a visual input based on a visual motion sequence synthesized for a previous testing meter unit, and automatically generating a harmony-aware motion sequence corresponding to the input audio using a generator of a generative adversarial network (GAN) model. The GAN model is trained by incorporating a hybrid loss function. The hybrid loss function includes a multi-space pose loss, a harmony loss, and a GAN loss. The harmony loss is determined according to beat consistencies of audio-visual beat pairs.
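A hedged sketch of how a beat-consistency harmony loss could be scored over audio-visual beat pairs; the nearest-beat pairing and the Gaussian kernel with width sigma are assumptions, not the patent's formulation:

```python
# Illustrative harmony loss; pairing rule and kernel are assumptions.
import numpy as np

def harmony_loss(audio_beats, motion_beats, sigma=0.1):
    # audio_beats, motion_beats: 1-D arrays of beat times in seconds.
    consistencies = []
    for t in audio_beats:
        dt = np.min(np.abs(motion_beats - t))          # nearest visual beat
        consistencies.append(np.exp(-dt ** 2 / (2 * sigma ** 2)))
    return 1.0 - float(np.mean(consistencies))         # lower = more in sync
```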

SYSTEMS AND METHODS FOR AN IMMERSIVE AUDIO EXPERIENCE
20220343923 · 2022-10-27 ·

A computer-implemented method for creating an immersive audio experience. The method includes receiving a selection of an audio track via a user interface and receiving audio track metadata for the audio track. The method includes querying an audio database based on the track metadata and determining that audio data for the audio track is not stored in the audio database. The method includes analyzing the audio track to determine one or more audio track characteristics and generating vibe data based on the one or more audio track characteristics, wherein the vibe data includes time-coded metadata. Based on the vibe data, the method generates visualization instructions for one or more A/V devices in communication with a user computing device and transmits the generated visualization instructions and the audio track to the user computing device.
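Purely for illustration, a sketch of turning analyzed track characteristics into time-coded vibe data; the field names and schema here are hypothetical stand-ins, not the application's format:

```python
# Hypothetical vibe-data schema; field names are invented for illustration.
import json

def build_vibe_data(track_characteristics):
    # track_characteristics: e.g. [{"time": 0.0, "energy": 0.4, "tempo": 120}, ...]
    vibe = [{
        "t": c["time"],                 # time code in seconds
        "intensity": c["energy"],       # could drive lighting brightness
        "pulse_hz": c["tempo"] / 60.0,  # beat-synchronized pulsing rate
    } for c in track_characteristics]
    return json.dumps(vibe)             # time-coded metadata payload
```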

MULTI-LEVEL AUDIO SEGMENTATION USING DEEP EMBEDDINGS

Embodiments are disclosed for generating an audio segmentation of an audio sequence using deep embeddings. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving an input including an audio sequence and extracting features for each frame of the audio sequence, where each frame is associated with a beat of the audio sequence. The method may further comprise clustering frames of the audio sequence into one or more clusters based on the extracted features and generating segments of the audio sequence based on the clustered frames, where each segment includes frames of the audio sequence from a same cluster. The method may further comprise constructing a multi-level audio segmentation of the audio sequence and performing a segment fusion process that merges shorter segments with neighboring segments based on cluster assignments.
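A simplified sketch of the cluster-then-fuse idea, assuming per-beat embeddings are already extracted; scikit-learn's KMeans and the fusion rule below stand in for whatever the embodiments actually use:

```python
# Sketch only: KMeans and the fusion rule are stand-in assumptions.
import numpy as np
from sklearn.cluster import KMeans

def segment_beats(embeddings, n_clusters=4, min_beats=4):
    # embeddings: (n_beats, dim) array, one row per beat-aligned frame.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    segments, start = [], 0                  # runs of equal cluster labels
    for i in range(1, len(labels)):
        if labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    segments.append((start, len(labels), labels[start]))
    # Fusion pass: absorb segments shorter than min_beats into the
    # preceding segment (the first segment is kept as-is).
    fused = []
    for s in segments:
        if fused and s[1] - s[0] < min_beats:
            prev = fused.pop()
            fused.append((prev[0], s[1], prev[2]))
        else:
            fused.append(s)
    return fused                             # (start_beat, end_beat, label)
```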

Digital audio workstation with audio processing recommendations
11687314 · 2023-06-27 ·

Presentation of a recommendation to a user for individual processing of audio tracks in a digital audio workstation. Training audio tracks are provided to a human sound mixer and, responsive to the training audio tracks, individually processed training audio tracks are received from the human sound mixer. The training audio tracks and the individually processed training audio tracks are input to a machine to train the machine. Audio processing operations are output from the trained machine and stored in a record of a database.
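A hedged sketch of the supervised setup implied above: features of raw training tracks paired with processing parameters recovered from the human mixer's versions; the 128-dim features and 8 parameters are placeholders, not the patent's representation:

```python
# Sketch; feature and parameter dimensions are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(              # maps track features -> processing params
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 8))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(raw_features, target_params):
    # raw_features:  (batch, 128) features of an unprocessed training track
    # target_params: (batch, 8) parameters inferred from the mixer's version
    optimizer.zero_grad()
    loss = loss_fn(model(raw_features), target_params)
    loss.backward()
    optimizer.step()
    return loss.item()
```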

Audio processing method and audio processing apparatus, and training method

An audio processing method, an audio processing apparatus, and a training method are described. According to embodiments of the application, an accent identifier is used to identify accent frames among a plurality of audio frames, resulting in an accent sequence composed of probability scores of accent and/or non-accent decisions with respect to the plurality of audio frames. A tempo estimator is then used to estimate a tempo sequence of the plurality of audio frames based on the accent sequence. The embodiments adapt well to changes in tempo and can further be used to track beats properly.
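As a non-authoritative illustration of the second stage, a simple autocorrelation estimate over an accent-probability sequence; the frame rate and BPM search range are assumptions, and this yields a single global tempo rather than the per-frame tempo sequence the application describes:

```python
# Illustrative tempo estimate; not the application's estimator.
import numpy as np

def estimate_tempo(accent_probs, frame_rate=100.0, bpm_range=(60, 180)):
    x = accent_probs - accent_probs.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation
    min_lag = int(frame_rate * 60.0 / bpm_range[1])      # fastest tempo
    max_lag = int(frame_rate * 60.0 / bpm_range[0])      # slowest tempo
    best_lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / best_lag                  # beats per minute
```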

METHOD AND APPARATUS FOR MAKING MUSIC SELECTION BASED ON ACOUSTIC FEATURES
20170330540 · 2017-11-16 ·

A method of making audio music selection and creating a mixtape, comprising importing song files from a song repository; sorting and filtering the song files based on selection criteria; and creating the mixtape from the sorted and filtered song files. The sorting and filtering of the song files comprise: spectrally analyzing each of the song files to extract low-level acoustic feature parameters of the song file; determining, from the low-level acoustic feature parameter values, the high-level acoustic feature parameters of the analyzed song file; determining a similarity score for each of the analyzed song files by comparing the acoustic feature parameter values of the analyzed song file against desired acoustic feature parameter values determined from the selection criteria; sorting the analyzed song files according to their similarity scores; and filtering out the analyzed song files with similarity scores lower than a filter threshold.
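A minimal sketch of the similarity scoring, sorting, and threshold filtering; the feature vectors and the inverse-distance similarity measure are assumptions, not the patent's scoring:

```python
# Sketch; the inverse-distance score is an assumed similarity measure.
import numpy as np

def rank_songs(song_features, desired, threshold=0.5):
    # song_features: {title: feature vector}; desired: vector derived
    # from the selection criteria.
    scored = []
    for title, feats in song_features.items():
        score = 1.0 / (1.0 + np.linalg.norm(feats - desired))
        if score >= threshold:            # filter out low-similarity songs
            scored.append((score, title))
    scored.sort(reverse=True)             # sort by similarity, best first
    return scored
```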

Real-time speech to singing conversion

A method of converting a frame of a voice sample to a singing frame includes obtaining a pitch value of the frame; obtaining formant information of the frame using the pitch value; obtaining aperiodicity information of the frame using the pitch value; obtaining a tonic pitch and chord pitches; using the formant information, the aperiodicity information, the tonic pitch, and the chord pitches to obtain the singing frame; and outputting or saving the singing frame.
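One step implied above, sketched under assumptions: snapping a frame's detected pitch to the nearest tonic or chord tone before resynthesis. The just-intonation chord ratios and octave range are illustrative, and the synthesis from formant and aperiodicity data is elided:

```python
# Illustrative pitch snapping; chord ratios and octave range are assumptions.
import numpy as np

def snap_pitch(frame_pitch_hz, tonic_hz, chord_ratios=(1.0, 5 / 4, 3 / 2)):
    # Candidate targets: tonic and chord tones across neighboring octaves.
    candidates = np.array([tonic_hz * r * 2.0 ** k
                           for r in chord_ratios for k in range(-1, 2)])
    return float(candidates[np.argmin(np.abs(candidates - frame_pitch_hz))])
```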

METHOD, APPARATUS AND SYSTEM
20170301354 · 2017-10-19 ·

A method including decomposing a magnitude part of a signal spectrum of a mixture signal into spectral components, each spectral component including a frequency part and a time activation part; and clustering the spectral components to obtain one or more clusters of spectral components, wherein the clustering of the spectral components is computed in the time domain.
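For illustration, non-negative matrix factorization standing in for the decomposition into frequency and time-activation parts, with components then clustered by their time activations; NMF is an assumption, since the abstract does not name the decomposition:

```python
# Sketch; NMF is an assumed decomposition, not necessarily the patent's.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF

def decompose_and_cluster(mag_spectrogram, n_components=8, n_clusters=2):
    # mag_spectrogram: non-negative (freq, time) magnitude spectrum.
    nmf = NMF(n_components=n_components, max_iter=500)
    W = nmf.fit_transform(mag_spectrogram)   # frequency parts  (freq x comp)
    H = nmf.components_                      # time activations (comp x time)
    # Cluster components by their time activations (time-domain clustering).
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(H)
    return W, H, labels
```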