G10H2210/056

METHODS AND SYSTEMS FOR SUPPRESSING VOCAL TRACKS

The methods and systems described herein aid users by modifying the presentation of content to users. For example, the methods and systems suppress the dialogue track of a movie when the user engages with the content by reciting a line of the movie as it is presented to the user. Words spoken by the user are detected and compared with the words in the movie. When the user is not engaging with the movie by reciting the lines or humming tunes while watching the movie, the audio track of the movie is not modified. Content can be modified in response to engagement by a single user or by multiple users (e.g., each reciting lines of a different character in a movie). Accordingly, the methods and systems described herein provide increased interest in and engagement with content.
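The abstract does not say how the user's words are compared with the words in the movie. The sketch below is one simple possibility. It assumes the user's speech has already been transcribed to text by some speech recognizer, and that the dialogue line currently being presented is available as text (for example, from subtitles). The function names and the 0.6 match threshold are hypothetical choices for illustration.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def user_is_reciting(user_speech: str, current_line: str, threshold: float = 0.6) -> bool:
    """Return True if enough of the current dialogue line was spoken by the user.

    In-order match: each script word is searched for in the user's words,
    starting after the position of the previous match.
    """
    user_words = normalize(user_speech)
    line_words = normalize(current_line)
    if not line_words:
        return False
    matched, pos = 0, 0
    for word in line_words:
        try:
            idx = user_words.index(word, pos)
        except ValueError:
            continue  # word not recited; keep looking for later words
        matched += 1
        pos = idx + 1
    return matched / len(line_words) >= threshold


def dialogue_gain(user_speech: str, current_line: str) -> float:
    """Mute the dialogue track (gain 0.0) while the user recites; otherwise leave it at 1.0."""
    return 0.0 if user_is_reciting(user_speech, current_line) else 1.0
```

The in-order match tolerates filler words between recited words, so "uh I'll be um back" still counts as reciting "I'll be back." When nothing matches, the gain stays at 1.0 and the audio track is not modified.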

Device and method for generating a real time music accompaniment for multi-modal music
10600398 · 2020-03-24

A device for generating a real-time music accompaniment includes a music input interface; a music mode classifier that classifies pieces of music received at the music input interface into one of several music modes, including at least a solo mode, a bass mode, and a harmony mode; a music storage; and a music output interface. A music selector selects one or more recorded pieces of music as real-time accompaniment to a piece of music actually being played and received at the music input interface, the selected pieces being in a different music mode than the played piece. The music output interface outputs the selected pieces of music.
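The selection rule (accompaniment must be in a different mode than the live input) can be sketched as follows. This assumes the classifier has already labelled the live input and every stored piece with a mode; the `Piece` type, the `select_accompaniment` name, and the "at most one piece per other mode" policy are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SOLO = "solo"
    BASS = "bass"
    HARMONY = "harmony"


@dataclass
class Piece:
    name: str
    mode: Mode  # assumed to be assigned earlier by the music mode classifier


def select_accompaniment(played_mode: Mode, storage: list[Piece]) -> list[Piece]:
    """Pick at most one stored piece for each mode other than the played one."""
    chosen: dict[Mode, Piece] = {}
    for piece in storage:
        if piece.mode != played_mode and piece.mode not in chosen:
            chosen[piece.mode] = piece
    return list(chosen.values())
```

For example, if the live input is classified as a solo, this returns the first stored bass piece and the first stored harmony piece, in storage order.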

Method and apparatus for generating audio content

In the method, the following steps are performed: receiving input audio content representing mixed audio sources; separating the mixed audio sources to obtain separated audio source signals and a residual signal; and generating output audio content by mixing the separated audio source signals and the residual signal.
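The final remixing step can be sketched as a weighted sum of time-domain signals. The separation itself is not shown. Two assumptions not stated in the abstract: the residual is taken to be whatever the separated sources do not explain (mixture minus their sum), and each source gets an adjustable gain. The `remix` name and the gain parameters are hypothetical.

```python
def remix(sources: list[list[float]], residual: list[float],
          gains: list[float], residual_gain: float = 1.0) -> list[float]:
    """Mix separated source signals and the residual signal into one output signal."""
    n = len(residual)
    if len(gains) != len(sources) or any(len(s) != n for s in sources):
        raise ValueError("need one gain per source and equal-length signals")
    out = [residual_gain * r for r in residual]
    for gain, src in zip(gains, sources):
        for i, x in enumerate(src):
            out[i] += gain * x
    return out
```

With all gains at 1.0, this reproduces the original mixture. Setting one gain to 0.0 (for example, a vocal source) yields a karaoke-style output while keeping the residual, so content that the separator did not capture is not lost.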

Method, system and artificial neural network

A method is disclosed that comprises obtaining a target spectrum; obtaining a set of non-target spectra comprising one or more non-target spectra; summing the target spectrum and the set of non-target spectra to obtain a mixture spectrum; and training an artificial neural network using the mixture spectrum as the network's input and a spectrum based on the target spectrum as its desired output.
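Building one training example, as described above, amounts to a bin-by-bin sum. This is a minimal sketch in which spectra are plain lists of bin values. The desired output is simply a copy of the target, which is one possible reading of "a spectrum which is based on the target spectrum". The function name is hypothetical.

```python
def make_training_pair(target: list[float],
                       non_targets: list[list[float]]) -> tuple[list[float], list[float]]:
    """Return (network input, desired output) for one training example.

    Input: mixture spectrum = target + sum of all non-target spectra, bin by bin.
    Desired output: here simply a copy of the target spectrum.
    """
    if not non_targets:
        raise ValueError("the set of non-target spectra must contain at least one spectrum")
    mixture = list(target)
    for spectrum in non_targets:
        if len(spectrum) != len(target):
            raise ValueError("all spectra must have the same number of bins")
        for k, value in enumerate(spectrum):
            mixture[k] += value
    desired = list(target)
    return mixture, desired
```

Because the mixture is synthesized from known parts, the target is known exactly, so many training pairs can be generated by drawing different non-target sets for the same target.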

METHODS AND APPARATUS TO EXTRACT A PITCH-INDEPENDENT TIMBRE ATTRIBUTE FROM A MEDIA SIGNAL
20200051538 · 2020-02-13

Methods and apparatus to classify media based on a pitch-independent timbre attribute from a media signal are disclosed. An example apparatus includes an interface to access a media signal, and an audio characteristic extractor to determine a spectrum of audio corresponding to the media signal and to determine a timbre-independent pitch attribute of the audio based on an inverse transform of a complex argument of a transform of the spectrum.
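The "inverse transform of a complex argument of a transform of the spectrum" can be sketched with a naive DFT. Here it is read as phase-only reconstruction: each transform coefficient is replaced by a unit-magnitude phasor with the same complex argument, and then the result is inverse-transformed. That reading, the use of a plain magnitude spectrum as input, and the function names are assumptions. One common intuition is that discarding the magnitudes removes the overall spectral shape, which is associated with timbre, while the phases keep track of where spectral peaks sit, which is associated with pitch.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def pitch_attribute(spectrum):
    """Phase-only reading of 'inverse transform of the complex argument'.

    1. Transform the magnitude spectrum.
    2. Keep only the complex argument of each coefficient, as a unit phasor.
    3. Inverse-transform and return the real part (the imaginary part is
       only rounding noise for a real input).
    """
    magnitudes = [abs(s) for s in spectrum]
    coeffs = dft(magnitudes)
    phasors = [cmath.exp(1j * cmath.phase(c)) for c in coeffs]
    return [v.real for v in idft(phasors)]
```

A useful property of this construction: scaling the whole spectrum by a positive constant does not change the result, and a single spectral peak at bin m maps to a single spike at position m.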

AUTOMATIC ISOLATION OF MULTIPLE INSTRUMENTS FROM MUSICAL MIXTURES

A system, method and computer product for training a neural network system. The method comprises inputting an audio signal to the system to generate plural outputs f(X, Θ). The audio signal includes one or more of vocal content and musical instrument content, and each output f(X, Θ) corresponds to a respective one of the different content types. The method also comprises comparing individual outputs f(X, Θ) of the neural network system to corresponding target signals. For each compared output f(X, Θ), at least one parameter of the system is adjusted to reduce the result of the comparison performed for that output, thereby training the system to estimate the different content types. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate various types of vocal and/or instrument components of an audio signal, depending on which type of component(s) the system is trained to estimate.
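The per-output comparison and parameter adjustment can be written compactly. This is a sketch under assumptions that the abstract does not state. Each output f_j(X, Θ), for content type j = 1, …, J, is treated as a soft mask applied element-wise to a magnitude spectrogram X. Θ denotes the network parameters, Y_j is the target spectrogram for content type j, the comparison is an L1 norm, and the adjustment is a gradient step with learning rate η.

```latex
\mathcal{L}_j(\Theta) = \bigl\lVert f_j(X, \Theta) \odot X - Y_j \bigr\rVert_1,
\qquad
\Theta \leftarrow \Theta - \eta \, \nabla_{\Theta} \mathcal{L}_j(\Theta),
\qquad j = 1, \dots, J
```

A common alternative is to sum the per-output losses and take one step on the total; the abstract does not specify which variant is used.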

SINGING VOICE SEPARATION WITH DEEP U-NET CONVOLUTIONAL NETWORKS

A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce a result of the comparing, for training the neural network system to estimate one of the vocal component and the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate vocal or instrumental components of an audio signal, depending on which type of component the system is trained to estimate.
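The compare-and-adjust loop can be illustrated with a deliberately tiny model. This sketch replaces the U-Net with one learnable mask value per frequency bin (shared across frames). It assumes that the comparison is the L1 distance between the masked mixture and the target, and that the adjustment is projected subgradient descent with the mask clamped to [0, 1]. All names and hyperparameters are hypothetical.

```python
Spectrogram = list[list[float]]  # frames x frequency bins (magnitudes)


def l1_loss(mask: list[float], mixture: Spectrogram, target: Spectrogram) -> float:
    """Compare the masked mixture with the target signal (sum of absolute errors)."""
    return sum(abs(m * x - y)
               for frame_x, frame_y in zip(mixture, target)
               for m, x, y in zip(mask, frame_x, frame_y))


def train_mask(mixture: Spectrogram, target: Spectrogram,
               steps: int = 200, lr: float = 0.01) -> list[float]:
    """Adjust the per-bin mask to reduce the L1 comparison result."""
    n_bins = len(mixture[0])
    mask = [0.5] * n_bins
    for _ in range(steps):
        grads = [0.0] * n_bins
        for frame_x, frame_y in zip(mixture, target):
            for k in range(n_bins):
                err = mask[k] * frame_x[k] - frame_y[k]
                # Subgradient of |err| with respect to mask[k].
                if err > 0:
                    grads[k] += frame_x[k]
                elif err < 0:
                    grads[k] -= frame_x[k]
        # Gradient step, then clamp so the mask stays a valid soft mask.
        mask = [min(1.0, max(0.0, m - lr * g)) for m, g in zip(mask, grads)]
    return mask
```

In a toy mixture where bin 0 contains only vocals and bin 1 contains only accompaniment, training against the vocal target drives the mask toward [1.0, 0.0]. Training against the accompaniment target instead would give the complementary mask, which corresponds to the "vocal or instrumental" choice mentioned in the abstract.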

SINGING VOICE SEPARATION WITH DEEP U-NET CONVOLUTIONAL NETWORKS
20200043517 · 2020-02-06

A system, method and computer product for estimating a component of a provided audio signal. The method comprises converting the provided audio signal to an image, processing the image with a neural network trained to estimate one of vocal content and instrumental content, and storing a spectral mask output from the neural network as a result of the image being processed by the neural network. The neural network is a U-Net. The method also comprises providing the spectral mask to a client media playback device, which applies the spectral mask to a spectrogram of the provided audio signal, to provide a masked spectrogram. The media playback device also transforms the masked spectrogram to an audio signal, and plays back that audio signal via an output user interface.
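The client-side steps (apply the spectral mask to a spectrogram of the audio, then transform the masked spectrogram back to audio) can be sketched with a naive DFT. Several simplifications are assumed: the mask is given (the U-Net that produces it is not shown), frames are rectangular and non-overlapping rather than a windowed, overlapping STFT, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectrogram(signal, frame_len):
    """Complex spectrogram from non-overlapping rectangular frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [dft(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def apply_mask_and_resynthesize(signal, mask, frame_len):
    """Mask the spectrogram, then transform each frame back to time samples.

    mask[f][k] is a real gain for frame f and bin k. A real gain scales a bin's
    magnitude and keeps the mixture phase. For a real output, each mask row
    should be symmetric (mask[f][k] == mask[f][frame_len - k]); only the real
    part of the inverse transform is kept.
    """
    spec = spectrogram(signal, frame_len)
    if len(mask) != len(spec):
        raise ValueError("mask must have one row per spectrogram frame")
    out = []
    for gains, frame in zip(mask, spec):
        masked = [g * z for g, z in zip(gains, frame)]
        out.extend(v.real for v in idft(masked))
    return out
```

An all-ones mask returns the original samples unchanged. A mask that keeps only bin 0 replaces each frame with its mean value, which is an easy way to check that the masking and inverse transform line up.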

Terminal device, apparatus and method for transmitting an image
10547392 · 2020-01-28

Embodiments of the present disclosure relate to a method and apparatus for transmitting an image. The method includes: converting, by a first device, an image to be transmitted into a number of sets of feature data according to a preset conversion rule; performing, by the first device, music composition according to preset music composition rules, making each set of feature data correspond to one musical element, to obtain music that conforms to musical tone rules; and playing, by the first device, the music to a second device. This enables image transmission using sound waves that are audible to the human ear.
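The "one set of feature data per musical element" idea can be sketched as a reversible mapping from pixels to notes. Every specific here is a hypothetical stand-in for the unspecified preset rules. The image is a sequence of 8-bit grayscale pixels. Each 4-bit half of a pixel is one feature set. Each feature set maps to one note drawn from a 16-note C major pentatonic table, given as MIDI note numbers starting at middle C (60). Synthesizing the audio and detecting pitch on the second device are not shown.

```python
PENTATONIC_OFFSETS = [0, 2, 4, 7, 9]  # C, D, E, G, A within one octave

# 16 MIDI note numbers: C major pentatonic, ascending from middle C (60).
NOTE_TABLE = [60 + 12 * (i // 5) + PENTATONIC_OFFSETS[i % 5] for i in range(16)]
NOTE_TO_NIBBLE = {note: i for i, note in enumerate(NOTE_TABLE)}


def image_to_notes(pixels: bytes) -> list[int]:
    """Sender side: each pixel becomes two notes (high nibble, then low nibble)."""
    notes = []
    for p in pixels:
        notes.append(NOTE_TABLE[p >> 4])
        notes.append(NOTE_TABLE[p & 0x0F])
    return notes


def notes_to_image(notes: list[int]) -> bytes:
    """Receiver side: invert the mapping, assuming the notes were detected correctly."""
    if len(notes) % 2:
        raise ValueError("expected an even number of notes")
    return bytes((NOTE_TO_NIBBLE[hi] << 4) | NOTE_TO_NIBBLE[lo]
                 for hi, lo in zip(notes[::2], notes[1::2]))
```

Restricting the notes to a pentatonic scale is one way to make arbitrary data sound consonant, since any sequence of pentatonic notes avoids the harshest semitone clashes. A real system would also need timing, error detection, and robustness to room acoustics.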