Patent classifications
G10H2240/141
SYSTEMS AND METHODS FOR ANALYZING COMPONENTS OF AUDIO TRACKS
A method is described comprising receiving a stem signal and an audio mix signal, wherein the audio mix signal comprises information of the stem signal. The method includes applying a first transform to the stem signal to provide a first stem spectrum, applying a second transform to the stem signal to provide a second stem spectrum, generating a plurality of mix signals using the audio mix signal, applying a first transform to each mix signal of the plurality of mix signals to provide a corresponding first mix signal spectrum, applying a second transform to each mix signal of the plurality of mix signals to provide a corresponding second mix signal spectrum, and using information of the first stem spectrum, the second stem spectrum, a first mix signal spectrum, or a second mix signal spectrum to detect the information of the stem signal in the audio mix signal.
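The dual-transform comparison described above can be sketched minimally. Here the "first" and "second" transforms are assumed to be magnitude FFTs at two different window lengths, and detection is a cosine-similarity threshold applied at both resolutions; the patent does not specify these choices, so all function names and thresholds are illustrative.

```python
import numpy as np

def two_spectra(x, n_short=256, n_long=1024):
    """Two magnitude spectra at different resolutions -- stand-ins for the
    abstract's 'first' and 'second' transforms (an assumption)."""
    s1 = np.abs(np.fft.rfft(x[:n_short]))
    s2 = np.abs(np.fft.rfft(x[:n_long]))
    return s1, s2

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def stem_in_mix(stem, mix, hop=512, thresh=0.9):
    """Slide over the mix; report a detection only when both the short- and
    long-window spectra resemble the stem's spectra."""
    st1, st2 = two_spectra(stem)
    n_long = 1024
    for start in range(0, len(mix) - n_long + 1, hop):
        m1, m2 = two_spectra(mix[start:start + n_long])
        if cosine(st1, m1) > thresh and cosine(st2, m2) > thresh:
            return True
    return False

rng = np.random.default_rng(0)
stem = rng.standard_normal(1024)
mix = np.concatenate([rng.standard_normal(2048), stem])  # stem embedded at the end
found = stem_in_mix(stem, mix)
```

Requiring agreement at two spectral resolutions is what makes the scheme more robust than a single transform: a chance match at one window length is unlikely to repeat at the other.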
System and method for automatically remixing digital music
Systems and methods augment a target media with a plurality of source media. The target media and source media are processed to form time frequency distributions (TFDs). Target features are extracted from the associated TFD and source features are extracted from each of the associated source TFDs. The target features are segmented into temporal portions that are compared with each of the plurality of source features to determine one or more matched source features having nearest matches to the target feature segments. Portions of the source media associated with the matched source features are mixed with the target media to form an augmented target media, wherein the mixing is based upon a probabilistic mixing algorithm that uses the distance between each target feature segment and its matched source feature to define the amplitude of each portion of the source media.
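The distance-to-amplitude mixing step can be illustrated with a toy sketch. Raw sample segments stand in for TFD features, and a deterministic exp(-distance) mapping stands in for the probabilistic mixing rule; both simplifications, along with every function name, are assumptions for illustration only.

```python
import numpy as np

def mix_augment(target, sources, seg_len=4):
    """For each target segment, find the nearest source by feature distance
    and add it in at an amplitude that decays with that distance."""
    out = target.copy()
    for start in range(0, len(target) - seg_len + 1, seg_len):
        seg = target[start:start + seg_len]
        # feature here is just the raw segment (stand-in for TFD features)
        dists = [np.linalg.norm(seg - s[:seg_len]) for s in sources]
        best = int(np.argmin(dists))
        amp = np.exp(-dists[best])          # nearer match -> louder mix-in
        out[start:start + seg_len] += amp * sources[best][:seg_len]
    return out

augmented = mix_augment(np.zeros(8), [np.zeros(4), np.ones(4)])
```

The key idea survives the simplification: close matches are mixed in at near-unit amplitude, while distant matches fade toward silence.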
METHODS AND SYSTEMS FOR DETERMINING COMPACT SEMANTIC REPRESENTATIONS OF DIGITAL AUDIO SIGNALS
A method and system for determining a compact semantic representation of a digital audio signal using a computer-based system by calculating at least one low-level feature matrix from the digital audio signal; processing the low-level feature matrix or matrices using pre-trained machine learning engines including an ensemble of modules, wherein each module in the ensemble is trained to predict one of a plurality of high-level feature values; and concatenating the obtained plurality of high-level feature values into a descriptor vector. The calculated descriptor vectors can be used alone, or in an arbitrary or temporally ordered combination with further descriptor vectors calculated from different audio signals extracted from the same music track, as a compact semantic representation of the respective music track.
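The ensemble-and-concatenate structure is simple to sketch. The three lambda "modules" below are hypothetical stand-ins for pre-trained predictors, each emitting one high-level value from the low-level feature matrix:

```python
import numpy as np

def descriptor(feature_matrix, ensemble):
    """Run each pre-trained module on the low-level feature matrix and
    concatenate the predicted high-level values into one descriptor vector."""
    return np.concatenate([np.atleast_1d(m(feature_matrix)) for m in ensemble])

# Hypothetical modules; real ones would be trained models.
ensemble = [
    lambda F: F.mean(),              # e.g. an overall-energy proxy
    lambda F: F.std(),               # e.g. a dynamics proxy
    lambda F: F.max(axis=0).mean(),  # e.g. a spectral-peak proxy
]
F = np.arange(12, dtype=float).reshape(3, 4)
d = descriptor(F, ensemble)
```

Because each module contributes a fixed slot, descriptors from different signals of the same track line up component-by-component, which is what makes the temporally ordered combinations mentioned above meaningful.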
Media Content Identification on Mobile Devices
A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content on its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted and audio signal global onset detection is used for input audio frame alignment. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints to be combined with audio fingerprints for more reliable content identification.
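Two of the audio feature families named above, global onset detection for frame alignment and maximum change in spectral coefficients, can be sketched as follows. The energy-jump onset rule and all parameter values are assumptions, not the patented method:

```python
import numpy as np

def global_onset(x, frame=256, jump=4.0):
    """Index of the first frame whose energy exceeds `jump` times the
    previous frame's energy -- a simple global-onset alignment point."""
    e = [np.sum(x[i:i + frame] ** 2) for i in range(0, len(x) - frame + 1, frame)]
    for k in range(1, len(e)):
        if e[k] > jump * (e[k - 1] + 1e-12):
            return k * frame
    return 0

def max_spectral_change(x, frame=256):
    """Per-frame feature: max absolute change between consecutive frames'
    magnitude spectra (one of the signature families in the abstract)."""
    specs = [np.abs(np.fft.rfft(x[i:i + frame]))
             for i in range(0, len(x) - frame + 1, frame)]
    return [float(np.max(np.abs(specs[k] - specs[k - 1])))
            for k in range(1, len(specs))]

x = np.concatenate([np.zeros(512), np.ones(512)])  # silence, then a step
```

Aligning query and reference fingerprints to a shared onset is what lets short captured fragments be compared frame-for-frame against the reference database.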
ASSIGNING AUDIBLE ALERTS AMONG CO-LOCATED APPLIANCES
Smart appliances communicate over a local area network. A selected sound palette includes multiple soundfile groups that each include multiple soundfiles. Each soundfile within a given sound palette has a first common sound attribute, each soundfile within a given soundfile group has a second common sound attribute, each soundfile within a given soundfile group has a variation of the second common sound attribute that is unique among the soundfile groups, and each soundfile within a given soundfile group has a unique variation of a third common sound attribute. One soundfile group may be assigned to each of the appliances, and one soundfile within each soundfile group may be assigned to an alert type. The assigned soundfile group is sent to the appliance with identification of the alert type to which each soundfile is assigned, such that the appliance may use an assigned soundfile for audible alerts.
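The palette/group/soundfile hierarchy and the two assignments (group per appliance, soundfile per alert type) can be modeled with plain mappings. The palette contents and filenames below are invented for illustration:

```python
# One palette: groups share a first attribute (e.g. instrument family);
# within a group each soundfile varies a further attribute (e.g. pitch).
palette = {
    "chimes": {"done": "chime_low.wav", "error": "chime_high.wav"},
    "bells":  {"done": "bell_low.wav",  "error": "bell_high.wav"},
}

def assign(appliances, palette):
    """Give each appliance its own soundfile group, keyed by alert type,
    so co-located appliances remain distinguishable by ear."""
    groups = list(palette)
    return {a: palette[groups[i % len(groups)]]
            for i, a in enumerate(appliances)}

table = assign(["washer", "oven"], palette)
```

Each appliance then needs only its own group plus the alert-type keys; the palette-wide attribute guarantees the alerts still sound like one coherent family.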
System and method for speaker identification in audio data
A system for identifying audio data includes a feature extraction module receiving unknown input audio data and dividing the unknown input audio data into a plurality of segments of unknown input audio data. A similarity module receives the plurality of segments of the unknown input audio data and receives known audio data from a known source, the known audio data being divided into a plurality of segments of known audio data. The similarity module performs comparisons between the segments of unknown input audio data and respective segments of known audio data and generates a respective plurality of similarity values representative of similarity between the segments of the comparisons, the comparisons being performed serially. The similarity module terminates the comparisons if the similarity values indicate insufficient similarity between the segments of the comparisons, prior to completing comparisons for all segments of the unknown input audio data.
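The serial comparison with early termination is the distinctive step here. A minimal sketch, assuming cosine similarity per segment pair and a running-mean floor as the termination rule (both assumptions; the patent leaves the similarity measure open):

```python
import numpy as np

def serial_match(unknown_segs, known_segs, floor=0.5):
    """Compare segment pairs serially; abort as soon as the accumulated
    similarity falls below `floor` (the abstract's early termination)."""
    sims = []
    for u, k in zip(unknown_segs, known_segs):
        s = float(np.dot(u, k) / (np.linalg.norm(u) * np.linalg.norm(k) + 1e-12))
        sims.append(s)
        if np.mean(sims) < floor:       # insufficient similarity so far
            return False, sims          # terminated before all segments
    return True, sims

segs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
```

Early termination is what makes scanning a large known-speaker database cheap: a clearly wrong speaker is rejected after one or two segments rather than after the full recording.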
Audio matching based on harmonogram
Apparatus, articles of manufacture, and systems for audio matching based on a harmonogram are disclosed. An example apparatus includes memory, and hardware to execute instructions to determine a first dominant frequency in a time slice of audio data based on a segment of a first spectrogram associated with the audio data, the first dominant frequency indicative of a first harmonic component of the time slice, determine a second dominant frequency indicative of a second harmonic component of the time slice, the second harmonic component less dominant than the first, generate a query harmonogram of the audio data, different segments of the query harmonogram representative of aggregate energy values of dominant frequencies in different time slices of the audio data, the dominant frequencies including at least one of the first or second dominant frequencies, and identify a query sound based on a comparison of the query harmonogram to a reference harmonogram.
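A toy version of the harmonogram construction: per time slice, keep the top spectral peaks as the dominant frequencies and record their aggregate energy, then match by distance between harmonograms. The peak-picking rule and Euclidean matching are simplifying assumptions:

```python
import numpy as np

def harmonogram(x, frame=512, top=2):
    """Per time slice, sum the energy of the `top` dominant spectral peaks --
    a toy stand-in for the harmonogram described in the abstract."""
    cols = []
    for i in range(0, len(x) - frame + 1, frame):
        mag = np.abs(np.fft.rfft(x[i:i + frame]))
        peaks = np.argsort(mag)[-top:]           # dominant frequency bins
        cols.append(float(np.sum(mag[peaks])))   # aggregate energy
    return np.array(cols)

def match(query, reference):
    """Identify by distance between query and reference harmonograms."""
    return float(np.linalg.norm(query - reference))

t = np.arange(1024) / 8000.0
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
h = harmonogram(tone)
```

Because the harmonogram tracks only the strongest harmonic components, it stays stable under broadband noise that would perturb a full spectrogram comparison.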
Karaoke query processing system
Computer systems and methods are provided for processing audio queries. An electronic device receives an audio clip and performs a matching process on the audio clip. The matching process includes comparing at least a portion of the audio clip to a plurality of reference audio tracks and identifying, based on the comparing, a first portion of a particular reference track that corresponds to the audio clip. Upon identifying the matching portion, the electronic device provides a backing track for playback which corresponds to the particular reference track, and an initial playback position of the backing track.
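The match-then-seek flow reduces to: find which reference contains the query, and where. In this sketch strings stand in for fingerprint sequences and exact substring search stands in for fuzzy audio matching; real systems would match fingerprints, not raw content:

```python
def find_backing_track(clip, tracks):
    """Locate the reference track containing the clip and return
    (track id, offset) as the initial playback position."""
    for track_id, fingerprint in tracks.items():
        pos = fingerprint.find(clip)   # toy exact match; real systems are fuzzy
        if pos >= 0:
            return track_id, pos
    return None, -1

tracks = {"songA": "abcdefg", "songB": "xyzuvw"}
hit = find_backing_track("cde", tracks)
```

Returning the offset alongside the track id is the karaoke-specific part: playback of the backing track can resume from wherever in the song the user was singing.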
AUTOMATIC CONVERSION OF SPEECH INTO SONG, RAP OR OTHER AUDIBLE EXPRESSION HAVING TARGET METER OR RHYTHM
Captured vocals may be automatically transformed using advanced digital signal processing techniques that provide captivating applications, and even purpose-built devices, in which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing tracks and pitch corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
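Of the automated transformations listed, the temporal-alignment step is the easiest to sketch: snap each detected vocal-segment onset to the nearest beat of the target rhythm. The grid-quantization rule below is an assumption standing in for the patent's alignment techniques:

```python
def align_to_grid(segment_times, beat_period):
    """Snap spoken-vocal segment onsets (seconds) to the nearest beat of a
    target rhythm -- the temporal-alignment step of speech-to-song/rap."""
    return [round(t / beat_period) * beat_period for t in segment_times]

aligned = align_to_grid([0.1, 0.48, 1.02], 0.5)
```

For the rap-style variant described above, this alignment (without pitch correction) is essentially the whole transformation: segments land on the beat but keep their spoken pitch contour.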
METHOD FOR DETECTING MELODY OF AUDIO SIGNAL AND ELECTRONIC DEVICE
A method for detecting a melody of an audio signal, including: dividing the audio signal into a plurality of audio segments based on a beat, detecting a pitch frequency of each frame of audio sub-signal in each of the audio segments, and estimating a pitch value of each of the audio segments based on the pitch frequency; determining a pitch name corresponding to each of the audio segments based on a frequency range of the pitch value; acquiring a musical scale of the audio signal by estimating a tonality of the audio signal based on the pitch name of each of the audio segments; and determining a melody of the audio signal based on a frequency interval of the pitch value of each of the audio segments in the musical scale.
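The pitch-value-to-pitch-name step in the method above is standard twelve-tone arithmetic: semitone distance from a reference frequency, reduced modulo 12. A minimal sketch, assuming equal temperament with A4 = 440 Hz:

```python
import numpy as np

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_name(freq_hz):
    """Map a pitch frequency to its name via semitone distance from A4=440."""
    semis = int(round(12 * np.log2(freq_hz / 440.0)))
    return NAMES[(semis + 9) % 12]   # A sits at index 9, so offset back to C

def segment_melody(pitch_values):
    """One pitch name per beat-segment, from its estimated pitch value."""
    return [pitch_name(f) for f in pitch_values]
```

With one estimated pitch value per beat-segment, `segment_melody` yields the per-segment pitch names from which the method's tonality and scale estimates proceed.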