Patent classifications
G10H2210/056
APPARATUS AND METHOD FOR AUDIO SOURCE SEPARATION BASED ON CONVOLUTIONAL NEURAL NETWORK
A method for receiving a mono sound-source audio signal that includes phase information and separating it into a plurality of signals may comprise: performing initial convolution and down-sampling on the input mono audio signal; generating an encoded signal by encoding the input signal using at least one first dense block and at least one down-transition layer; generating a decoded signal by decoding the encoded signal using at least one second dense block and at least one up-transition layer; and performing final convolution and resizing on the decoded signal.
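The abstract gives no layer sizes. To illustrate how dense blocks and transition layers interact in an encoder-decoder of this kind, the sketch below tracks only channel counts and time-axis length. It assumes DenseNet-style concatenation (each layer in a dense block adds `growth_rate` channels), a channel compression factor of 0.5 in every transition layer, and a factor-of-2 change in length per transition. The names `Shape`, `dense_block`, `down_transition`, `up_transition`, and `encode_decode` and all numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shape:
    channels: int
    length: int  # number of frames along the time axis


def dense_block(x: Shape, num_layers: int, growth_rate: int) -> Shape:
    # Each layer's output is concatenated with all earlier feature maps,
    # so channels grow by growth_rate per layer; the length is unchanged.
    return Shape(x.channels + num_layers * growth_rate, x.length)


def down_transition(x: Shape, compression: float = 0.5) -> Shape:
    # Hypothetical: compress channels (e.g. 1x1 conv), then halve the length (stride 2).
    return Shape(int(x.channels * compression), x.length // 2)


def up_transition(x: Shape, compression: float = 0.5) -> Shape:
    # Hypothetical mirror of down_transition: compress channels, double the length.
    return Shape(int(x.channels * compression), x.length * 2)


def encode_decode(x: Shape, depth: int, num_layers: int, growth_rate: int) -> list[Shape]:
    """Return the shape after every encoder and decoder stage."""
    trace = [x]
    for _ in range(depth):  # encoder: dense block + down-transition
        x = down_transition(dense_block(x, num_layers, growth_rate))
        trace.append(x)
    for _ in range(depth):  # decoder: dense block + up-transition
        x = up_transition(dense_block(x, num_layers, growth_rate))
        trace.append(x)
    return trace
```

For example, `encode_decode(Shape(32, 1024), depth=2, num_layers=4, growth_rate=12)` shows the length shrinking to 256 frames at the bottleneck and returning to 1024. The channel count at the output is arbitrary, which is one reason a final convolution (and, if lengths do not divide evenly, a resize) would be needed.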
Vibrotactile control systems and methods
Methods and systems are disclosed to facilitate creating the sensation of vibrotactile movement on the body of a user. Vibratory motors are used to generate a haptic language for music or other stimuli that is integrated into wearable technology. In certain embodiments, the disclosed system enables a family of devices that allow people, such as those with hearing impairments, to experience sounds such as music or other input to the system. For example, a sound vest or other wearable array transforms musical input into haptic signals so that users can experience their favorite music in a unique way. The device can also recognize auditory or other cues in the user's real or virtual environment and convey this information to the user through haptic signals.
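The abstract does not say how sound is mapped onto the motors. One simple possibility, assumed here purely for illustration, is to split each audio frame's spectrum into as many equal-width frequency bands as there are motors and drive each motor with its band's energy, normalized to the loudest band. The function name `band_intensities`, the equal-width banding, and the 0 to 1 intensity scale are hypothetical.

```python
import cmath
import math


def band_intensities(frame: list[float], num_motors: int) -> list[float]:
    """Map one audio frame to per-motor vibration intensities in [0, 1]."""
    n = len(frame)
    # Magnitude spectrum of the positive-frequency bins via a direct DFT.
    mags = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]
    # Equal-width bands, one per motor: low frequencies drive the first motors.
    band_size = len(mags) // num_motors
    energies = [
        sum(m * m for m in mags[i * band_size:(i + 1) * band_size])
        for i in range(num_motors)
    ]
    peak = max(energies)
    # Normalize so the strongest band vibrates at full intensity; silence gives all zeros.
    return [e / peak if peak > 0 else 0.0 for e in energies]
```

A pure low tone then drives only the first motor, while broadband music spreads vibration across the array. A real device would likely use logarithmic (perceptual) bands rather than equal-width ones.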
Method and apparatus for correcting delay between accompaniment audio and unaccompanied audio, and storage medium
A method and apparatus for correcting a delay between accompaniment audio and unaccompanied audio, and a storage medium, are provided. The method includes: acquiring the original audio of a target song and extracting the original vocal audio from it; determining a first delay between the original vocal audio and the unaccompanied audio, and a second delay between the accompaniment audio and the original audio; and correcting the delay between the accompaniment audio and the unaccompanied audio based on the first delay and the second delay. This makes the correction more efficient and eliminates errors that manual correction could introduce, thereby improving accuracy.
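The abstract does not say how each delay is measured or how the two are combined. A minimal sketch, assuming each delay is the lag that maximizes the cross-correlation between two signals, and that the extracted vocal shares the original audio's timeline so both delays are measured from the same origin. Under those assumptions, the offset to correct is the first delay minus the second. The function names are hypothetical.

```python
def estimate_lag(reference: list[float], delayed: list[float], max_lag: int) -> int:
    """Return the lag d (in samples) that best explains delayed[n] ~ reference[n - d]."""
    def score(d: int) -> float:
        # Cross-correlation at lag d over the samples where both signals overlap.
        return sum(
            reference[n - d] * delayed[n]
            for n in range(max(0, d), min(len(delayed), len(reference) + d))
        )
    return max(range(-max_lag, max_lag + 1), key=score)


def relative_delay(original_vocal, unaccompanied, original, accompaniment, max_lag=50):
    """Samples by which the unaccompanied audio lags the accompaniment (negative = leads)."""
    # First delay: unaccompanied audio relative to the vocal extracted from the original.
    first = estimate_lag(original_vocal, unaccompanied, max_lag)
    # Second delay: accompaniment relative to the original audio.
    second = estimate_lag(original, accompaniment, max_lag)
    # Both delays are relative to the original's timeline, so their difference
    # is the offset between the two tracks that must be corrected.
    return first - second
```

Shifting the accompaniment later by the returned number of samples (or the unaccompanied track earlier) would then align the two tracks.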
MACHINE LEARNING METHOD, AUDIO SOURCE SEPARATION APPARATUS, AND ELECTRONIC INSTRUMENT
A machine learning method for training a learning model includes: transforming audio data of a first audio type, in which a first audio component and a second audio component are mixed, into corresponding image data of a first image type; transforming audio data of a second audio type, which contains the first audio component without the second audio component, into corresponding image data of a second image type; and performing machine learning on the learning model with training data that includes sets of first-image-type and second-image-type image data.
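The abstract does not name the audio-to-image transform. A common choice, assumed here, is a magnitude spectrogram computed with a short-time Fourier transform. The sketch below turns a mixture and its isolated first component into such "images" (rows are frequency bins, columns are time frames) and pairs them as one training example. The frame length, hop size, Hann window, and the function names are illustrative assumptions.

```python
import cmath
import math


def spectrogram(signal: list[float], frame_len: int = 64, hop: int = 32) -> list[list[float]]:
    """Return a magnitude spectrogram laid out as image[freq_bin][frame]."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        # Magnitude of each DFT bin from 0 up to the Nyquist bin.
        columns.append([
            abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / frame_len) for t in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    # Transpose so rows are frequency bins and columns are time frames, like an image.
    return [list(row) for row in zip(*columns)]


def make_training_pair(mixture: list[float], isolated_component: list[float]):
    # Input image: both components mixed; target image: first component alone.
    return spectrogram(mixture), spectrogram(isolated_component)
```

Discarding phase in this way is what lets an image-oriented model be trained on the pairs; a separate step would be needed to turn a predicted image back into audio.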
MACHINE LEARNING METHOD AND MACHINE LEARNING APPARATUS
A machine learning apparatus includes a memory storing instructions and a processor that executes the stored instructions to perform a plurality of tasks. The tasks include: an obtaining task that obtains a mixture signal containing a first component and a second component; a first generating task that generates a first signal emphasizing the first component by inputting the mixture signal to a neural network; a second generating task that generates a second signal by modifying the first signal; a calculating task that calculates an evaluation index from the second signal; and a training task that trains the neural network using the evaluation index.
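The abstract specifies neither the modification nor the evaluation index. Purely as an illustration, the sketch below assumes the modification hard-clips samples to [-1, 1] and the evaluation index is a signal-to-distortion ratio (SDR) in decibels against a reference recording of the first component. A training task could then, for example, minimize the negative of this score. The names `modify`, `sdr_db`, and `evaluation_index` are hypothetical.

```python
import math


def modify(signal: list[float]) -> list[float]:
    # Hypothetical modification step: hard-clip to the valid sample range.
    return [max(-1.0, min(1.0, s)) for s in signal]


def sdr_db(estimate: list[float], reference: list[float]) -> float:
    """Signal-to-distortion ratio: 10 * log10(||ref||^2 / ||ref - est||^2)."""
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_power == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal_power / error_power)


def evaluation_index(first_signal: list[float], reference: list[float]) -> float:
    # Score the modified (second) signal, not the raw network output.
    second_signal = modify(first_signal)
    return sdr_db(second_signal, reference)
```

Scoring the modified signal means the network is judged on what a listener would actually receive after post-processing. If the modification is not differentiable, training on this index would need a gradient-free or surrogate method.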
AUDIO STEM IDENTIFICATION SYSTEMS AND METHODS
Methods, systems and computer program products are provided for identifying an audio stem. Audio stems (t_1, …, t_N) are stored in a stem database, and songs (S_1, …, S_P) made with at least a subset of the audio stems (t_1, …, t_N) are stored in a song database. An at least partially composed song (S*) having a predetermined number (k) of pre-selected stems is received. In turn, a probability vector (or relevance value or ranking) is produced indicating how likely each stem (t_1, …, t_N) is to be complementary to the at least partially composed song (S*).
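The abstract does not say how the probability vector is computed. One simple baseline, assumed here for illustration and not necessarily the patented model, scores each candidate stem by how often it co-occurs in the song database with the k stems already in S*. The scores are then normalized into a probability vector over the stems not yet in S*. Representing stems as string IDs and songs as sets of stem IDs is a hypothetical choice, as is the function name.

```python
def complementarity(songs: list[set[str]], all_stems: list[str], partial_song: set[str]) -> dict[str, float]:
    """Probability that each stem not already in partial_song complements it."""
    scores = {}
    for stem in all_stems:
        if stem in partial_song:
            continue  # stems already in S* are not candidates
        # Over every song using this stem, count how many pre-selected stems also appear.
        scores[stem] = sum(len(song & partial_song) for song in songs if stem in song)
    total = sum(scores.values())
    if total == 0:
        # No co-occurrence evidence: fall back to a uniform distribution.
        return {stem: 1 / len(scores) for stem in scores} if scores else {}
    return {stem: s / total for stem, s in scores.items()}
```

Sorting the returned dictionary by value gives the ranking form mentioned in the abstract.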
Beat decomposition to facilitate automatic video editing
The disclosed technology relates to a process for detecting musical artifacts within a musical composition. Detection is based on analyzing the energy and frequency content of the composition's digital signal. The musical artifacts identified within a composition can then be used in audio-video editing.
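The abstract states only that detection uses the signal's energy and frequency content. The following energy-only sketch uses a classic simple beat detector, not necessarily the patented method. It flags a frame as a beat candidate when its energy exceeds a multiple of the average energy of the preceding frames. The frame size, history length, threshold, and the name `detect_beats` are illustrative assumptions.

```python
def detect_beats(samples: list[float], frame_len: int = 1024,
                 history: int = 43, threshold: float = 1.5) -> list[int]:
    """Return indices of frames whose energy spikes above the recent local average."""
    # Energy of each non-overlapping frame.
    energies = [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    beats = []
    for i, e in enumerate(energies):
        window = energies[max(0, i - history):i]  # preceding frames only
        if window and e > threshold * (sum(window) / len(window)):
            beats.append(i)
    return beats
```

Multiplying a frame index by `frame_len` and dividing by the sample rate gives a timestamp that a video editor could use as a cut point. Adding the frequency analysis the abstract mentions would mean running the same test per frequency band.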
Lyrics analyzer
A lyrics analyzer generates tags and explicitness indicators for a set of tracks. These tags may indicate the genre, mood, occasion, or other features of each track. The lyrics analyzer does so by generating an n-dimensional vector relating to a set of topics extracted from the lyrics and then using those vectors to train a classifier to determine whether each tag applies to each track. The lyrics analyzer may also generate playlists for a user based on a single seed song by comparing the lyrics vector or the lyrics and acoustics vectors of the seed song to other songs to select songs that closely match the seed song. Such a playlist generator may also take into account the tags generated for each track.
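The abstract says playlist candidates are selected by comparing the seed song's lyrics vector, or its lyrics and acoustics vectors, with those of other songs. A minimal sketch, assuming cosine similarity as the comparison and, for the combined case, simple concatenation of the two vectors. The song IDs, vectors, and function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def playlist(seed: str, vectors: dict[str, list[float]], size: int) -> list[str]:
    """Rank other songs by cosine similarity of their vectors to the seed's."""
    candidates = [song for song in vectors if song != seed]
    candidates.sort(key=lambda song: cosine(vectors[seed], vectors[song]), reverse=True)
    return candidates[:size]
```

To include acoustics, each song's entry could be `lyrics_vec + acoustic_vec`. Scaling the two parts before concatenating would control how much each contributes to the match.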
Singing voice separation with deep U-Net convolutional networks
A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce the result of the comparison, thereby training the neural network system to estimate either the vocal component or the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate the vocal or instrumental components of an audio signal, depending on which type of component it was trained to estimate.
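The training procedure above (apply the mixture, compare the output with a target, adjust parameters to reduce the comparison) can be illustrated with a deliberately tiny stand-in for the U-Net. In this stand-in, a learned soft mask per frequency bin is applied to a mixture magnitude spectrum and trained by gradient descent on an L1 (mean absolute error) comparison. The mask model, learning rate, epoch count, and toy data are assumptions for illustration only.

```python
import math


def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def l1_loss(mask: list[float], mixtures: list[list[float]], targets: list[list[float]]) -> float:
    # Mean absolute difference between the masked mixture and the target.
    total = sum(abs(m * x - t) for mix, tgt in zip(mixtures, targets)
                for m, x, t in zip(mask, mix, tgt))
    return total / sum(len(t) for t in targets)


def train_mask(mixtures, targets, lr: float = 0.5, epochs: int = 200) -> list[float]:
    """Learn one mask logit per bin so that mask * mixture approximates the target."""
    num_bins = len(mixtures[0])
    logits = [0.0] * num_bins  # every mask starts at 0.5
    for _ in range(epochs):
        grads = [0.0] * num_bins
        for mix, tgt in zip(mixtures, targets):
            for k in range(num_bins):
                m = sigmoid(logits[k])
                diff = m * mix[k] - tgt[k]
                sign = (diff > 0) - (diff < 0)  # derivative of |diff|
                grads[k] += sign * mix[k] * m * (1 - m)  # chain rule through the sigmoid
        # Adjust parameters to reduce the comparison result (gradient descent).
        for k in range(num_bins):
            logits[k] -= lr * grads[k] / len(mixtures)
    return [sigmoid(w) for w in logits]
```

With a target that keeps bin 0 and silences bin 1, the learned mask approaches 1 in bin 0 and 0 in bin 1. A U-Net plays the same role but predicts the mask from the input spectrogram instead of learning one fixed mask.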