Patent classifications
G10H2250/215
MODELING OF THE LATENT EMBEDDING OF MUSIC USING DEEP NEURAL NETWORK
Methods and systems are provided for detecting and cataloging qualities in music. Because both the volume and the heterogeneity of digital music content are huge, it has become increasingly important and convenient to build recommendation or search systems that surface this content to the user or consumer community. Embodiments use a deep convolutional neural network to imitate how the human brain processes hierarchical structures in auditory signals, such as music and speech, at various timescales. This approach can be used to discover latent factor models of music based upon acoustic hyper-images that are extracted from the raw audio waveforms. These latent embeddings can be used as features for subsequent models, such as collaborative filtering, to build similarity metrics between songs, or to classify music based on training labels such as genre, mood, or sentiment.
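As an illustration of the similarity-metric use mentioned in this abstract, the following minimal sketch ranks songs by the cosine similarity of their latent embeddings. The abstract does not name a specific metric, so cosine similarity is an assumption, and the function names, song ids, and 3-dimensional vectors are hypothetical; real embeddings would come from the trained network.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as dissimilar to everything
    return dot / (norm_a * norm_b)

def most_similar(query, catalog):
    """Return the song ids in catalog, most similar to the query first."""
    return sorted(catalog,
                  key=lambda song_id: cosine_similarity(query, catalog[song_id]),
                  reverse=True)

# Hypothetical embeddings standing in for the network's output.
catalog = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
}
ranking = most_similar([1.0, 0.0, 0.0], catalog)
```

Cosine similarity ignores vector length, which is often convenient when embedding norms vary between tracks.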
Systems, devices, and methods for assigning mood labels to musical compositions
Computer-based systems, devices, and methods for assigning mood labels to musical compositions are described. A mood classifier is trained based on mood-labeled musically-coherent segments of musical compositions and subsequently applied to automatically assign mood labels to musically-coherent segments of musical compositions. In both cases, the musically-coherent segments are generated using automated segmentation algorithms.
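The train-then-apply flow described in this abstract can be sketched with a nearest-centroid classifier. This is only a stand-in: the abstract does not specify the classifier type, the two-element feature vectors are hypothetical, and the automated segmentation step is assumed to have already produced the segments.

```python
def train_centroids(labeled_segments):
    """labeled_segments: list of (feature_vector, mood_label) pairs.

    Returns a dict mapping each mood label to the mean feature vector
    of the segments carrying that label.
    """
    sums, counts = {}, {}
    for features, label in labeled_segments:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def assign_mood(features, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical per-segment features (for example, normalized tempo and brightness).
training = [
    ([0.9, 0.8], "happy"), ([0.8, 0.9], "happy"),
    ([0.1, 0.2], "sad"), ([0.2, 0.1], "sad"),
]
centroids = train_centroids(training)
label = assign_mood([0.7, 0.7], centroids)
```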
Systems and methods for audio based synchronization using sound harmonics
Multiple audio files may be synchronized using harmonic sound included in audio content obtained from audio tracks. Individual audio tracks are partitioned into multiple temporal windows of a first and second temporal window length. Individual audio waveforms for individual temporal windows of the first and second window length are transformed into frequency space in which energy is represented as a function of frequency. Individual pitches and magnitudes of harmonic sound determined for individual temporal windows may be compared using a multi-resolution framework to correlate pitches and harmonic energy of multiple audio tracks to one another.
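The windowing and frequency-transform steps in this abstract can be sketched as follows, assuming NumPy is available. The sketch partitions a track into windows of two lengths, transforms each window into frequency space, and records the strongest pitch and its magnitude per window. The function name, the Hann window, and the window lengths 512 and 2048 are assumptions; the multi-resolution comparison between tracks is not shown.

```python
import numpy as np

def dominant_pitches(signal, sample_rate, window_len):
    """Split signal into non-overlapping windows of window_len samples and
    return the peak frequency (Hz) and peak magnitude of each window."""
    n_windows = len(signal) // window_len
    freqs = np.fft.rfftfreq(window_len, d=1.0 / sample_rate)
    taper = np.hanning(window_len)
    pitches, magnitudes = [], []
    for i in range(n_windows):
        frame = signal[i * window_len:(i + 1) * window_len] * taper
        spectrum = np.abs(np.fft.rfft(frame))
        spectrum[0] = 0.0  # ignore the DC component
        k = int(np.argmax(spectrum))
        pitches.append(freqs[k])
        magnitudes.append(spectrum[k])
    return np.array(pitches), np.array(magnitudes)

# One second of a 500 Hz test tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 500.0 * t)

coarse_pitch, coarse_mag = dominant_pitches(tone, sample_rate, 512)   # first window length
fine_pitch, fine_mag = dominant_pitches(tone, sample_rate, 2048)      # second window length
```

Shorter windows give finer time resolution and longer windows give finer frequency resolution, which is one plausible reason to keep both.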
CHARACTERIZING AUDIO USING TRANSCHROMAGRAMS
Methods, systems, and apparatus to characterize audio using transchromagrams are disclosed. An example method includes: generating, by executing one or more instructions on a processor, a set of transition matrices based on a plurality of time frames of audio data, each transition matrix being generated from a different pair of time frames and indicating probabilities that anterior musical notes in the anterior time frame of the pair transition to posterior musical notes in the posterior time frame of the pair; generating, by executing one or more instructions on a processor, a data structure representing how the audio data changes statistically between the time frames based on the set of transition matrices; and causing, by executing one or more instructions on a processor, a database to store the data structure within metadata that describes the audio data.
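A minimal sketch of the transition-matrix idea, assuming NumPy and 12-bin chroma vectors as the per-frame note representation. Three choices here are assumptions rather than details from the abstract: each matrix is the outer product of the two normalized chroma vectors, only consecutive frames are paired, and the summary data structure is the element-wise mean of the matrices. All names are hypothetical.

```python
import numpy as np

def transition_matrix(anterior, posterior):
    """Entry [i, j] estimates the probability that note i in the anterior
    frame is followed by note j in the posterior frame (assumed model:
    outer product of the two normalized chroma vectors)."""
    a = np.asarray(anterior, dtype=float)
    p = np.asarray(posterior, dtype=float)
    if a.sum() <= 0 or p.sum() <= 0:
        return np.zeros((len(a), len(p)))  # a silent frame carries no transitions
    return np.outer(a / a.sum(), p / p.sum())

def transchromagram(chroma_frames):
    """Average the transition matrices of all consecutive frame pairs."""
    if len(chroma_frames) < 2:
        raise ValueError("need at least two time frames")
    matrices = [transition_matrix(chroma_frames[i], chroma_frames[i + 1])
                for i in range(len(chroma_frames) - 1)]
    return np.mean(matrices, axis=0)

def one_hot(note_index):
    """Chroma vector with all energy in one pitch class."""
    v = np.zeros(12)
    v[note_index] = 1.0
    return v

# Three frames containing C, then E, then G (pitch classes 0, 4, 7).
frames = [one_hot(0), one_hot(4), one_hot(7)]
summary = transchromagram(frames)
```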
METHOD AND SYSTEM FOR DETERMINING AND PROVIDING SENSORY EXPERIENCES
A method including: receiving a music input; determining values of musical parameters based on the input; generating a spatial representation of the music input based on the values; and at a plurality of haptic actuators defining a spatial distribution, cooperatively producing a haptic output based on the spatial representation. A method including: mechanically coupling haptic actuators defining a multidimensional array to a user; receiving a music input; generating a spatial representation of the music input defined on a multidimensional space, wherein the multidimensional space and the multidimensional array have equal dimensionality; and, for each haptic actuator: based on the haptic actuator location within the multidimensional array, determining a corresponding location within the multidimensional space; based on a value of the spatial representation associated with the corresponding location, determining an actuation intensity; and controlling the haptic actuator to actuate based on the actuation intensity.
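The per-actuator mapping in the second method can be sketched for the two-dimensional case. The sketch assumes the spatial representation is a 2D grid of values in the range 0 to 1 and that each actuator reads the nearest grid cell. The function name, the nearest-cell lookup, and the clamping are assumptions for illustration.

```python
def actuator_intensities(spatial, rows, cols, max_intensity=1.0):
    """Map a 2D spatial representation onto a rows x cols actuator array.

    For each actuator, its (row, col) position in the array is scaled to a
    corresponding cell of the spatial representation, and the value found
    there (clamped to [0, 1]) sets the actuation intensity.
    """
    src_rows, src_cols = len(spatial), len(spatial[0])
    intensities = []
    for r in range(rows):
        src_r = min(src_rows - 1, r * src_rows // rows)
        row = []
        for c in range(cols):
            src_c = min(src_cols - 1, c * src_cols // cols)
            value = min(1.0, max(0.0, spatial[src_r][src_c]))
            row.append(value * max_intensity)
        intensities.append(row)
    return intensities

# Hypothetical 4 x 4 spatial representation of a music input.
spatial = [
    [0.0, 0.2, 0.4, 0.6],
    [0.1, 0.3, 0.5, 0.7],
    [0.2, 0.4, 0.6, 0.8],
    [0.3, 0.5, 0.7, 1.2],
]
grid = actuator_intensities(spatial, 2, 2)   # 2 x 2 actuator array
full = actuator_intensities(spatial, 4, 4)   # one actuator per cell
```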
CHAINED AUTHENTICATION USING MUSICAL TRANSFORMS
A service receives a request from a user of a group of users to perform one or more operations requiring group authentication in order for the operations to be performed. In response, the service provides a first user of the group with an image seed and an ordering of the group of users. Each user of the group applies a transformation algorithm to the seed to create an authentication claim. The service receives this claim and determines, based at least in part on the ordering of the group of users, an ordered set of transformations, which are used to create a reference image file. If the received claim matches the reference image file, the service enables performance of the requested one or more operations.
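The chained check in this abstract can be sketched as follows. The transformation algorithm is not specified in the abstract, so a keyed HMAC-SHA256 stands in for it here; the seed bytes, user names, and keys are hypothetical.

```python
import hashlib
import hmac

def apply_transform(data: bytes, user_key: bytes) -> bytes:
    """Stand-in per-user transformation: HMAC-SHA256 of the data under the user's key."""
    return hmac.new(user_key, data, hashlib.sha256).digest()

def chain(seed: bytes, ordered_keys) -> bytes:
    """Apply each user's transformation in order, feeding each result to the next."""
    result = seed
    for key in ordered_keys:
        result = apply_transform(result, key)
    return result

def verify_claim(seed: bytes, ordered_keys, claim: bytes) -> bool:
    """Service side: rebuild the reference from the ordering and compare it to the claim."""
    reference = chain(seed, ordered_keys)
    return hmac.compare_digest(reference, claim)

# Hypothetical group, keys, and ordering.
seed = b"seed-bytes"
keys = {"alice": b"k1", "bob": b"k2", "carol": b"k3"}
ordering = ["bob", "alice", "carol"]

# Users produce the claim by transforming the seed in the given order.
claim = chain(seed, [keys[user] for user in ordering])

accepted = verify_claim(seed, [keys[user] for user in ordering], claim)
rejected = verify_claim(seed, [keys[user] for user in ["alice", "bob", "carol"]], claim)
```

Because each transformation consumes the previous output, applying the same transformations in a different order yields a different result, so the claim only matches when every user participated in the expected order.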
Addition of virtual bass in the time domain
Provided are, among other things, systems, methods and techniques for processing an audio signal to add virtual bass. In one representative embodiment, an apparatus includes: (a) an input line that inputs an original audio signal in the time domain; (b) a bass extraction filter that extracts a bass portion of the original audio signal, which also is in the time domain; (c) an estimator that estimates a fundamental frequency of a bass sound within the bass portion; (d) a frequency translator that shifts the bass portion by a positive frequency increment that is an integer multiple of the fundamental frequency estimated by the estimator, thereby providing a virtual bass signal; (e) an adder having (i) inputs coupled to the original audio signal and to the virtual bass signal and (ii) an output; and (f) an audio output device coupled to the output of the adder.
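Elements (b) through (e) can be sketched as a short signal chain, assuming NumPy. Several choices are assumptions rather than details from the abstract: the windowed-sinc FIR filter, the autocorrelation-based estimator, the single-sideband shift, and all names and parameter values. The analytic signal is computed with an FFT here only for brevity; a strictly time-domain design would use an FIR Hilbert transformer instead.

```python
import numpy as np

def lowpass_fir(cutoff_hz, sample_rate, num_taps=401):
    """Windowed-sinc low-pass FIR taps, normalized to unity gain at DC."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    taps = np.sinc(2.0 * cutoff_hz / sample_rate * n) * np.hamming(num_taps)
    return taps / taps.sum()

def estimate_fundamental(x, sample_rate, min_hz=40.0, max_hz=200.0):
    """Pick the autocorrelation peak among lags corresponding to min_hz..max_hz."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    scores = [np.dot(x[:-lag], x[lag:]) for lag in range(min_lag, max_lag + 1)]
    best_lag = min_lag + int(np.argmax(scores))
    return sample_rate / best_lag

def frequency_shift(x, shift_hz, sample_rate):
    """Single-sideband shift: multiply the analytic signal by a complex exponential."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[1:(n + 1) // 2] = 2.0      # keep positive frequencies, doubled
    if n % 2 == 0:
        weights[n // 2] = 1.0          # Nyquist bin for even lengths
    analytic = np.fft.ifft(np.fft.fft(x) * weights)
    t = np.arange(n) / sample_rate
    return np.real(analytic * np.exp(2j * np.pi * shift_hz * t))

def add_virtual_bass(signal, sample_rate, cutoff_hz=150.0, multiple=2, gain=0.5):
    bass = np.convolve(signal, lowpass_fir(cutoff_hz, sample_rate), mode="same")  # (b)
    f0 = estimate_fundamental(bass, sample_rate)                                  # (c)
    virtual = frequency_shift(bass, multiple * f0, sample_rate)                   # (d)
    return signal + gain * virtual, virtual, f0                                   # (e)

# Test input: a 50 Hz bass tone plus a 1 kHz tone, one second at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 50.0 * t) + 0.5 * np.sin(2 * np.pi * 1000.0 * t)
output, virtual, f0 = add_virtual_bass(signal, sample_rate)
```

With `multiple=2`, a 50 Hz bass tone is shifted up by about 100 Hz, so the virtual bass lands near 150 Hz, which small loudspeakers reproduce more readily than the original fundamental.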
SYSTEM AND METHOD FOR PROCESSING SIGNALS REPRESENTATIVE OF A BIOLOGICAL INFORMATION
The invention relates to a method of processing a biological signal including peaks, said biological signal being recorded by at least one sensor, the method comprising: extracting a peak from the biological signal, processing the peak with a mathematical transform in order to model the peak as a solution to a differential equation, converting the model of the peak into a sound, repeating the extracting, processing and converting steps for a plurality of peaks of the biological signal so as to obtain a plurality of sounds, each sound corresponding to a respective peak, generating a melody including the plurality of sounds.
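The extract, convert, and repeat loop in this abstract can be sketched as follows. The mathematical transform and the differential equation are not specified in the abstract, so this sketch substitutes a simple mapping: each peak's height selects the frequency of a damped sinusoid, which is itself a solution of the damped harmonic oscillator equation. The peak detector, frequency range, decay rate, and all names are assumptions.

```python
import math

def find_peaks(samples, threshold):
    """Indices of local maxima whose value exceeds threshold."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i] > threshold
            and samples[i - 1] < samples[i] >= samples[i + 1]]

def peak_to_tone(height, max_height, sample_rate=8000, num_samples=800,
                 low_hz=220.0, high_hz=880.0, decay=20.0):
    """Damped sinusoid whose frequency grows linearly with the peak height."""
    freq = low_hz + (high_hz - low_hz) * (height / max_height)
    return [math.exp(-decay * k / sample_rate)
            * math.sin(2 * math.pi * freq * k / sample_rate)
            for k in range(num_samples)]

def melody_from_signal(samples, threshold):
    """Convert every detected peak into a tone and concatenate the tones."""
    peaks = find_peaks(samples, threshold)
    if not peaks:
        return [], peaks
    max_height = max(samples[i] for i in peaks)
    melody = []
    for i in peaks:
        melody.extend(peak_to_tone(samples[i], max_height))
    return melody, peaks

# Hypothetical biological signal with two peaks above the threshold.
signal = [0.0, 0.2, 1.0, 0.3, 0.0, 0.1, 0.6, 0.1, 0.0]
melody, peaks = melody_from_signal(signal, threshold=0.5)
```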