Patent classifications
G10H2210/056
Methods and apparatus to extract a pitch-independent timbre attribute from a media signal
Methods and apparatus to extract a pitch-independent timbre attribute from a media signal are disclosed. An example apparatus includes an audio characteristic extractor to determine a logarithmic spectrum of an audio signal; transform the logarithmic spectrum of the audio signal into a frequency domain to generate a transform output; determine a magnitude of the transform output; and determine a timbre attribute of the audio signal based on an inverse transform of the magnitude.
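The four steps map closely onto a cepstrum-style computation. Below is a minimal numpy sketch of that pipeline; the windowing, the FFT size, and the reading of "logarithmic spectrum" as the log of the magnitude spectrum are assumptions, since the abstract does not fix them:

```python
import numpy as np

def pitch_independent_timbre(frame, n_fft=2048):
    # Logarithmic spectrum of one windowed frame (epsilon avoids log(0)).
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft))
    log_spectrum = np.log(spectrum + 1e-10)
    # Transform the log spectrum itself into a frequency domain.
    transform_out = np.fft.fft(log_spectrum)
    # Taking the magnitude discards the phase of the transform output;
    # a shift of the spectrum lives largely in that phase.
    magnitude = np.abs(transform_out)
    # Inverse transform of the magnitude yields the timbre attribute.
    return np.real(np.fft.ifft(magnitude))
```

Calling `pitch_independent_timbre(signal[:2048])` on a mono float array returns a vector intended to be insensitive to pitch shifts, since shifting the spectrum mostly changes the discarded phase.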
BEATBOXING TRANSCRIPTION
Methods, systems, and storage media for generating a beatbox transcript are disclosed. Some examples may include: receiving an audio signal having a plurality of beatbox sounds; generating a spectrogram of the audio signal; processing the spectrogram of the audio signal with a neural network model trained on training samples including beatbox sounds; generating, by the neural network model, a beatbox sound activation map including a plurality of activation times for a plurality of beatbox sounds; decoding the beatbox sound activation map into a beatbox transcript; and providing the beatbox transcript as an output.
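The abstract leaves the decoding step open. A minimal sketch of one common approach — thresholded peak picking over the activation map — follows; the function name, frame_rate, and threshold are all assumed for illustration:

```python
import numpy as np

def decode_activation_map(activations, sound_names, frame_rate, threshold=0.5):
    """Decode a (num_sounds, num_frames) activation map into a sorted
    transcript of (time_seconds, sound_name) onset events."""
    events = []
    for i, row in enumerate(activations):
        for t in range(1, len(row) - 1):
            # A local maximum above threshold counts as one onset.
            if row[t] > threshold and row[t] >= row[t - 1] and row[t] > row[t + 1]:
                events.append((t / frame_rate, sound_names[i]))
    return sorted(events)

acts = np.zeros((2, 100))
acts[0, 10], acts[1, 50] = 0.9, 0.8
print(decode_activation_map(acts, ["kick", "hi-hat"], frame_rate=100))
# [(0.1, 'kick'), (0.5, 'hi-hat')]
```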
VIBROTACTILE CONTROL SYSTEMS AND METHODS
Methods and systems are disclosed to facilitate creating the sensation of vibrotactile movement on the body of a user. Vibratory motors integrated into wearable technology are used to generate a haptic language for music or other stimuli. In certain embodiments, the disclosed system enables the creation of a family of devices that allow people, such as those with hearing impairments, to experience sounds such as music or other input to the system. For example, a “sound vest” or other wearable array transforms musical input into haptic signals so that users can experience their favorite music in a unique way; the system can also recognize auditory or other cues in the user's real or virtual reality environment and convey this information to the user through haptic signals.
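One simple way to realize the music-to-haptics mapping described above is to assign each motor in the wearable array a frequency band and drive it by that band's energy. The sketch below is an assumed scheme, not the patent's:

```python
import numpy as np

def audio_frame_to_motor_levels(frame, n_motors=8):
    """Map one audio frame to n_motors vibration levels in [0, 1],
    one frequency band per motor (low bands -> low-placed motors)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bands = np.array_split(spectrum, n_motors)
    energy = np.array([band.mean() for band in bands])
    peak = energy.max()
    return energy / peak if peak > 0 else energy
```

Streaming this frame by frame (e.g., at 20 ms hops) into PWM duty cycles for the vest's motors gives the basic "feel the music" behavior.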
AUDIO ANALYSIS SYSTEM, ELECTRONIC MUSICAL INSTRUMENT, AND AUDIO ANALYSIS METHOD
An audio analysis system includes at least one memory configured to store instructions; and at least one processor configured to execute the instructions to: receive an instruction indicative of a target timbre; acquire a first audio signal containing a plurality of audio components corresponding to different timbres; and select at least one reference signal from among a plurality of reference signals respectively representative of different pieces of audio based on the target timbre and the first audio signal, in which: the at least one reference signal has an intensity with a temporal change, the temporal change in the intensity of the at least one reference signal is represented by a reference rhythm pattern, the plurality of audio components include audio components corresponding to the target timbre, the audio components corresponding to the target timbre have an intensity with a temporal change, the temporal change in the intensity of the audio components corresponding to the target timbre is represented by an analysis rhythm pattern, and the reference rhythm pattern is similar to the analysis rhythm pattern.
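The selection step boils down to comparing rhythm patterns, i.e., intensity-over-time vectors. A minimal sketch using cosine similarity (the similarity measure is an assumption; the claim requires only that the chosen reference's pattern be similar to the analysis pattern):

```python
import numpy as np

def select_reference(analysis_pattern, reference_patterns):
    """Return the index of the reference whose rhythm pattern is most
    similar to the analysis rhythm pattern, by cosine similarity."""
    def cosine(a, b):
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))
    return int(np.argmax([cosine(analysis_pattern, r)
                          for r in reference_patterns]))

analysis = np.array([1.0, 0.0, 0.5, 0.0])        # target-timbre envelope
refs = [np.array([0.0, 1.0, 0.0, 1.0]),          # off-beat pattern
        np.array([1.0, 0.1, 0.6, 0.0])]          # near match
print(select_reference(analysis, refs))  # -> 1
```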
NEURAL NETWORK MODEL FOR AUDIO TRACK LABEL GENERATION
Systems and methods directed to identifying music theory labels for audio tracks are described. More specifically, a first training set of audio portions may be generated from a plurality of audio tracks, with segments within the plurality of audio tracks labeled according to a plurality of music theory labels. A deep neural network model may then be trained using the first training set as an input, a first loss function for music theory label identifications of audio portions of the first training set, and a second loss function for segment boundary identifications within the audio portions of the first training set. In examples, the music theory label identifications and the segment boundary identifications are generated by the deep neural network model. When a first audio track is received, segment boundary identifications and music theory labels for segments within the first audio track are generated using the deep neural network model.
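The abstract pins down only the training signal: one loss for label identification and one for boundary identification, both produced by the same model. The PyTorch sketch below shows that two-headed setup; the GRU encoder, head shapes, and loss weight `alpha` are assumptions, not the patent's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrackLabeler(nn.Module):
    """Shared encoder with two heads: per-frame music theory label
    logits and per-frame segment boundary logits."""
    def __init__(self, n_features, n_labels):
        super().__init__()
        self.encoder = nn.GRU(n_features, 64, batch_first=True)
        self.label_head = nn.Linear(64, n_labels)
        self.boundary_head = nn.Linear(64, 1)

    def forward(self, x):                    # x: (batch, time, features)
        h, _ = self.encoder(x)
        return self.label_head(h), self.boundary_head(h).squeeze(-1)

def combined_loss(label_logits, labels, boundary_logits, boundaries, alpha=1.0):
    # First loss: music theory label identification per frame.
    label_loss = F.cross_entropy(label_logits.transpose(1, 2), labels)
    # Second loss: segment boundary identification per frame.
    boundary_loss = F.binary_cross_entropy_with_logits(boundary_logits, boundaries)
    return label_loss + alpha * boundary_loss
```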
SYNCHRONIZED AUDIOVISUAL WORK
The teachings described herein are generally directed to a system, method, and apparatus for separating and mixing tracks within music. The system can have a video that is synchronized with the variations in the musical tempo through a variable timing reference track designed and provided for a user of a prerecorded, preselected performance, wherein designing the variable timing reference track includes creating a tempo map having variable tempos, rhythms, and beats using notes from the preselected performance.
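A tempo map with variable tempos is, at minimum, a list of (start_beat, bpm) change points; converting it into per-beat timestamps is what lets video events lock to the music. A small sketch under that assumed representation:

```python
def beat_times(tempo_map, total_beats):
    """tempo_map: list of (start_beat, bpm) change points, sorted and
    starting at beat 0. Returns the timestamp in seconds of each beat."""
    times, t, seg = [], 0.0, 0
    for beat in range(total_beats):
        while seg + 1 < len(tempo_map) and tempo_map[seg + 1][0] <= beat:
            seg += 1
        times.append(t)
        t += 60.0 / tempo_map[seg][1]   # this beat's duration at current bpm
    return times

# 120 bpm for four beats, then 90 bpm:
print(beat_times([(0, 120), (4, 90)], 8))
# [0.0, 0.5, 1.0, 1.5, 2.0, 2.667, 3.333, 4.0] (approx.)
```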
Methods and systems for suppressing vocal tracks
The methods and systems described herein aid users by modifying the presentation of content to users. For example, the methods and systems suppress the dialogue track of a movie when the user engages with the content by reciting a line of the movie as it is presented to the user. Words spoken by the user are detected and compared with the words in the movie. When the user is not engaging with the movie by reciting the lines or humming tunes while watching the movie, the audio track of the movie is not modified. Content can be modified in response to engagement by a single user or by multiple users (e.g., each reciting lines of a different character in a movie). Accordingly, the methods and systems described herein provide increased interest in and engagement with content.
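The core decision — duck the dialogue track when the user's recited words match the current line — can be sketched with a plain string-similarity check. `difflib` here is a stand-in for whatever matcher the system actually uses, and the threshold is assumed:

```python
import difflib

def dialogue_gain(user_words, script_words, match_threshold=0.7):
    """Return the gain for the dialogue track: 0.0 (suppressed) when the
    user's recognized words match the current line closely enough,
    else 1.0 (unmodified)."""
    ratio = difflib.SequenceMatcher(
        None,
        " ".join(user_words).lower(),
        " ".join(script_words).lower()).ratio()
    return 0.0 if ratio >= match_threshold else 1.0

# User recites the line -> the dialogue track is suppressed.
print(dialogue_gain(["you", "had", "me", "at"],
                    ["you", "had", "me", "at", "hello"]))  # 0.0
# Unrelated speech -> the audio track is not modified.
print(dialogue_gain(["pass", "the", "salt"],
                    ["you", "had", "me", "at", "hello"]))  # 1.0
```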
AUDIO PROCESSING METHOD, APPARATUS, AND SYSTEM
An audio processing method, apparatus, and system are disclosed. The processing method is applied to a sound pickup that is connected to a controller. The method comprises: receiving timbre instruction information sent by the controller (S101); and analyzing the timbre instruction information and carrying out, according to the timbre instruction information, timbre processing on the picked-up audio, wherein the timbre instruction information comprises timbre addition instruction information and/or timbre change instruction information (S102). This enables the selection of various timbres during audio processing, meets the requirement for integrating related functions, and improves the user experience.
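A toy dispatcher for the two instruction kinds in S102 might look like the sketch below; the instruction format and both effects — a tremolo layer for "addition" and a spectral tilt for "change" — are stand-ins for real DSP, not the patent's processing:

```python
import numpy as np

def apply_timbre_instruction(audio, instruction, sr=44100):
    """audio: mono float array. instruction: {'type': 'add'|'change', ...}."""
    if instruction.get("type") == "add":
        # Timbre addition: layer a 6 Hz tremolo onto the picked-up audio.
        t = np.arange(len(audio)) / sr
        return audio * (1.0 + 0.3 * np.sin(2 * np.pi * 6 * t))
    if instruction.get("type") == "change":
        # Timbre change: tilt the spectrum toward the high end.
        spectrum = np.fft.rfft(audio)
        tilt = np.linspace(1.0, 2.0, len(spectrum))
        return np.fft.irfft(spectrum * tilt, n=len(audio))
    return audio  # unknown instruction: pass audio through unchanged
```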
SONG GENERATION BASED ON A TEXT INPUT
The disclosure provides a method and an apparatus for song generation. A text input may be received. A topic and an emotion may be extracted from the text input. A melody may be determined according to the topic and the emotion. Lyrics may be generated according to the melody and the text input. A song may be generated at least according to the melody and the lyrics.
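A toy end-to-end sketch of that pipeline follows. Every stage is a keyword-level placeholder — a real system would use trained models for topic and emotion extraction, melody generation, and lyric generation:

```python
EMOTION_WORDS = {"happy": "joy", "sad": "sorrow", "love": "love"}

def generate_song(text):
    words = text.lower().split()
    topic = max(words, key=len)                          # crude topic pick
    emotion = next((EMOTION_WORDS[w] for w in words if w in EMOTION_WORDS),
                   "neutral")
    # Melody from the emotion: brighter scale for 'joy'.
    scale = [0, 2, 4, 5, 7] if emotion == "joy" else [0, 2, 3, 5, 7]
    melody = [60 + scale[i % len(scale)] for i in range(8)]   # MIDI notes
    # Lyrics from the melody and the text: one input word per note.
    lyrics = [words[i % len(words)] for i in range(len(melody))]
    return {"topic": topic, "emotion": emotion,
            "melody": melody, "lyrics": lyrics}

print(generate_song("A happy song about summer"))
```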