Patent classifications
G10H2210/056
AUDIO ANALYSIS METHOD AND AUDIO ANALYSIS DEVICE
An audio analysis method is realized by a computer and includes generating key information which represents a key, by inputting a time series of a feature amount of an audio signal into a learned model that has learned a relationship between keys and time series of feature amounts of audio signals.
METHOD, SYSTEM, AND COMPUTER-READABLE MEDIUM FOR CREATING SONG MASHUPS
A system, method and computer product for combining audio tracks. In one example embodiment herein, the method comprises determining at least one music track that is musically compatible with a base music track, aligning those tracks in time, and combining the tracks. In one example embodiment herein, the tracks may be music tracks of different songs, the base music track can be an instrumental accompaniment track, and the at least one music track can be a vocal track. Also in one example embodiment herein, the determining is based on musical characteristics associated with at least one of the tracks, such as an acoustic feature vector distance between tracks, a likelihood of at least one track including a vocal component, a tempo, or musical key. Also, determining of musical compatibility can include determining at least one of a vertical musical compatibility or a horizontal musical compatibility among tracks.
KARAOKE QUERY PROCESSING SYSTEM
Computer systems and methods are provided for processing audio queries. An electronic device receives an audio clip and performs a matching process on the audio clip. The matching process includes comparing at least a portion of the audio clip to a plurality of reference audio tracks and identifying, based on the comparing, a first portion of a particular reference track that corresponds to the audio sample. Upon identifying the matching portion, the electronic device provides a backing track for playback which corresponds to the particular reference track, and an initial playback position of the backing track.
METHODS AND APPARATUS TO EXTRACT A PITCH-INDEPENDENT TIMBRE ATTRIBUTE FROM A MEDIA SIGNAL
Methods and apparatus to extract a pitch-independent timbre attribute from a media signal are disclosed. An example apparatus includes an audio characteristic extractor to determine a logarithmic spectrum of an audio signal; transform the logarithmic spectrum of the audio signal into a frequency domain to generate a transform output; determine a magnitude of the transform output; and determine a timbre attribute of the audio signal based on an inverse transform of the magnitude.
BEAT DECOMPOSITION TO FACILITATE AUTOMATIC VIDEO EDITING
The disclosed technology relates to a process for detecting musical artifacts within a musical composition. The detection of musical artifacts is based on analyzing the energy and frequency of the digital signal of the musical composition. The identification of musical artifacts within a musical composition would be used in connection with audio-video editing.
METHOD AND DEVICE FOR FOCUSING SOUND SOURCE
Disclosed are a sound source focus method and device in which the sound source focus device, in a 5G communication environment by amplifying and outputting a sound source signal of a user's object of interest extracted from an acoustic signal included in video content by executing a loaded artificial intelligence (AI) algorithm and/or machine learning algorithm. The sound source focus method includes playing video content including a video signal including at least one moving object and the acoustic signal in which sound sources output by the object are mixed, determining the user's object of interest from the video signal, acquiring unique sound source information about the user's object of interest, extracting an actual sound source for the user's object of interest corresponding to the unique sound source information from the acoustic signal, and outputting the actual sound source extracted for the user's object of interest.
Singing voice separation with deep U-Net convolutional networks
A system, method and computer product for estimating a component of a provided audio signal. The method comprises converting the provided audio signal to an image, processing the image with a neural network trained to estimate one of vocal content and instrumental content, and storing a spectral mask output from the neural network as a result of the image being processed by the neural network. The neural network is a U-Net. The method also comprises providing the spectral mask to a client media playback device, which applies the spectral mask to a spectrogram of the provided audio signal, to provide a masked spectrogram. The media playback device also transforms the masked spectrogram to an audio signal, and plays back that audio signal via an output user interface.
Method and device for focusing sound source
Disclosed are a sound source focus method and device in which the sound source focus device, in a 5G communication environment by amplifying and outputting a sound source signal of a user's object of interest extracted from an acoustic signal included in video content by executing a loaded artificial intelligence (AI) algorithm and/or machine learning algorithm. The sound source focus method includes playing video content including a video signal including at least one moving object and the acoustic signal in which sound sources output by the object are mixed, determining the user's object of interest from the video signal, acquiring unique sound source information about the user's object of interest, extracting an actual sound source for the user's object of interest corresponding to the unique sound source information from the acoustic signal, and outputting the actual sound source extracted for the user's object of interest.
Automatic isolation of multiple instruments from musical mixtures
A system, method and computer product for training a neural network system. The method comprises inputting an audio signal to the system to generate plural outputs f(X, ). The audio signal includes one or more of vocal content and/or musical instrument content, and each output f(X, ) corresponds to a respective one of the different content types. The method also comprises comparing individual outputs f(X, ) of the neural network system to corresponding target signals. For each compared output f(X, ), at least one parameter of the system is adjusted to reduce a result of the comparing performed for the output f(X, ), to train the system to estimate the different content types. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate various different types of vocal and/or instrument components of an audio signal, depending on which type of component(s) the system is trained to estimate.
METHOD AND SYSTEM FOR ENERGY-BASED SONG CONSTRUCTION
According to an embodiment, there is provided a system and method for automatic AI-based song construction based on ideas of a user. It provides and benefits from a combination of expert knowledge resident in an expert engine which contains rules for a musically correct song generation and machine learning in an AI-based audio loop selection engine for the selection of fitting audio loops from a database of audio loops. Additionally, in some embodiments there is provided a method of energy-based song construction where the tracks of a multi-track work are balanced depending on the desired output volume level of the final project.