LIVE BROADCAST ROOM DISPLAY METHOD, APPARATUS AND DEVICE, AND STORAGE MEDIUM
Provided is a live broadcast room display method, apparatus and device, and a storage medium. The method includes: acquiring a speech signal within a set duration from at least one live broadcast room under a target classification label; inputting the speech signal within the set duration of the at least one live broadcast room into a speech detection model to obtain a speech signal that satisfies a set type condition; adding a display identifier to a live broadcast room corresponding to the speech signal that satisfies the set type condition; and arranging and displaying the at least one live broadcast room in a display interface corresponding to the target classification label according to the display identifier.
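The final arrangement step can be sketched as follows. This is an illustrative toy, not the patented implementation: the speech-detection model is stubbed out with fixed confidence scores, and all names (`rank_rooms`, `flagged`, the threshold) are assumptions.

```python
# Rooms whose recent audio satisfies the set type condition get a display
# identifier ("flagged") and are arranged ahead of the rest.

def rank_rooms(rooms, scores, threshold=0.5):
    """Attach a display flag to rooms whose model score passes the threshold,
    then sort flagged rooms first (higher score first within each group)."""
    flagged = [dict(room, flagged=(scores[room["id"]] >= threshold))
               for room in rooms]
    return sorted(flagged,
                  key=lambda r: (not r["flagged"], -scores[r["id"]]))

rooms = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
scores = {"a": 0.2, "b": 0.9, "c": 0.7}   # stand-in model confidences
ordered = rank_rooms(rooms, scores)
print([r["id"] for r in ordered])  # ['b', 'c', 'a']
```

Rooms "b" and "c" pass the threshold, receive the identifier, and are displayed before "a".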
Determining that Audio Includes Music and then Identifying the Music as a Particular Song
In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.
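The two-stage structure (a cheap music/no-music gate before the costlier song-matching process) can be sketched as below. Both stages are crude stand-ins chosen for illustration, not the patent's methods: spectral flatness for the gate, and a unit-norm magnitude spectrum as the "reference song characterization data".

```python
import numpy as np

def is_music(audio):
    """Stage 1 stand-in: tonal signals have low spectral flatness."""
    spec = np.abs(np.fft.rfft(audio)) + 1e-10
    return np.exp(np.mean(np.log(spec))) / np.mean(spec) < 0.3

def fingerprint(audio):
    """Coarse spectral fingerprint (unit-norm magnitude spectrum)."""
    mag = np.abs(np.fft.rfft(audio))
    return mag / (np.linalg.norm(mag) + 1e-12)

def recognize(audio, reference_fps):
    """Run the cheap music gate first; only then match against references."""
    if not is_music(audio):
        return None
    fp = fingerprint(audio)
    return min(reference_fps,
               key=lambda name: np.linalg.norm(fp - reference_fps[name]))
```

Non-music input returns `None` without ever touching the reference data, which is the point of the two-stage design.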
Creative GAN generating art deviating from style norms
A method and system for generating art uses artificial intelligence to analyze existing art forms and then creates art that deviates from the learned styles. Known art created by humans is presented in digitized form along with a style designator to a computer for analysis, including recognition of artistic elements and association of particular styles. A graphics processor generates a draft graphic image for similar analysis by the computer. The computer ranks such draft image for correlation with artistic elements and known styles. The graphics processor modifies the draft image using an iterative process until the resulting image is recognizable as art but is distinctive in style.
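The ranking criterion ("recognizable as art but distinctive in style") can be illustrated with a toy scoring function, assuming the computer's analysis yields an art probability and a distribution over known styles; rewarding high entropy across the style classes favors drafts that deviate from learned style norms. This is a sketch of the idea, not the patented ranking.

```python
import numpy as np

def creativity_score(art_prob, style_probs):
    """Reward images judged to be art while penalizing a confident style
    assignment: high entropy across style classes means the image is hard
    to place in any known style."""
    style_probs = np.asarray(style_probs, dtype=float)
    entropy = -np.sum(style_probs * np.log(style_probs + 1e-12))
    max_entropy = np.log(len(style_probs))
    return art_prob + entropy / max_entropy   # both terms in [0, 1]

# A draft judged as art but unplaceable in style ranks above one that
# clearly matches a known style.
novel = creativity_score(0.9, [0.25, 0.25, 0.25, 0.25])
derivative = creativity_score(0.9, [0.97, 0.01, 0.01, 0.01])
```

The iterative process would keep modifying the draft until this kind of score stops improving.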
Apparatus and method for decomposing an audio signal using a variable threshold
An apparatus for decomposing an audio signal into a background component signal and a foreground component signal, has: a block generator for generating a time sequence of blocks of audio signal values; an audio signal analyzer for determining a characteristic of a current block of the audio signal and for determining a variability of the characteristic within a group of blocks having at least two blocks of the sequence of blocks; and a separator for separating the current block into a background portion and a foreground portion wherein the separator is configured to determine a separation threshold based on the variability and to separate the current block into the background component signal and the foreground component signal, when the characteristic of the current block is in a predetermined relation to the separation threshold.
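The block-wise separation with a variability-dependent threshold can be sketched as below, using RMS energy as the block characteristic and the standard deviation over a sliding group of blocks as the variability; these concrete choices are illustrative assumptions, not the patent's.

```python
import numpy as np

def separate_blocks(blocks, group=4, k=1.5):
    """Compute a per-block characteristic (RMS energy), estimate its
    variability over a sliding group of blocks, derive the separation
    threshold from that variability, and route each block accordingly."""
    rms = np.sqrt(np.mean(np.asarray(blocks, dtype=float) ** 2, axis=1))
    background, foreground = [], []
    for i in range(len(blocks)):
        window = rms[max(0, i - group + 1): i + 1]
        threshold = window.mean() + k * window.std()  # variability-dependent
        (foreground if rms[i] > threshold else background).append(i)
    return background, foreground

blocks = [np.full(100, 0.1)] * 3 + [np.full(100, 1.0)] + [np.full(100, 0.1)] * 2
background, foreground = separate_blocks(blocks)
print(foreground)  # the loud transient block stands out: [3]
```

Because the threshold tracks local variability, a steady loud passage raises the bar while an isolated transient still exceeds it.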
Apparatuses and methods for audio classifying and processing
Apparatuses and methods for audio classifying and processing are disclosed. In one embodiment, an audio processing apparatus includes an audio classifier for classifying an audio signal into at least one audio type in real time; an audio improvement device for improving the experience of the audience; and an adjusting unit for adjusting at least one parameter of the audio improvement device in a continuous manner based on a confidence value of the at least one audio type.
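Continuous, confidence-driven adjustment can be sketched as a confidence-weighted blend of per-type parameter presets, so the parameter glides between values instead of switching abruptly when the classification flips. The preset values and names are hypothetical.

```python
def adjust_parameter(confidences, presets, default=0.5):
    """Blend per-type preset parameter values by the classifier's confidence
    in each audio type, so the parameter moves smoothly rather than
    switching hard at classification boundaries."""
    total = sum(confidences.values())
    if total == 0:
        return default   # no confident classification: keep a neutral value
    return sum(confidences[t] * presets[t] for t in confidences) / total

presets = {"speech": 0.9, "music": 0.2, "noise": 0.0}  # hypothetical gains
gain = adjust_parameter({"speech": 0.7, "music": 0.3, "noise": 0.0}, presets)
print(gain)  # 0.69, between the speech and music presets
```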
Toolboxes, systems, kits and methods relating to supplying precisely timed, synchronized music
Systems, devices, and methods herein provide digital audio toolboxes, music kits, and digital audio tracks that supply audio, such as music, for combination and synchronization with pre-existing digital media tracks. The toolkits herein provide users with visual tracks in media to create, provide, and/or synchronize precisely timed tracks used in audio media productions, or otherwise to provide multiple precisely timed and synced tracks, where a music/sound-design track from the toolkits is added to a pre-made media track such as visual footage.
COMPLEX LINEAR PROJECTION FOR ACOUSTIC MODELING
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
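The frequency-domain processing step can be sketched as follows: complex FFT frames are multiplied by a complex projection matrix, and the log-magnitude of the result serves as real-valued input features for the acoustic model. Here the matrix is random for illustration; in the actual method it would be learned jointly with the network.

```python
import numpy as np

def complex_linear_projection(frames, n_out=8, seed=0):
    """Project complex FFT frames with a (here: random, untrained) complex
    matrix, then take the log-magnitude to obtain real-valued features."""
    rng = np.random.default_rng(seed)
    n_bins = frames.shape[1]
    W = (rng.standard_normal((n_out, n_bins))
         + 1j * rng.standard_normal((n_out, n_bins)))
    projected = frames @ W.T                  # the complex linear projection
    return np.log(np.abs(projected) + 1e-6)  # compress like a log filterbank

audio = np.random.default_rng(1).standard_normal((4, 256))  # 4 waveform frames
features = complex_linear_projection(np.fft.rfft(audio, axis=1))
print(features.shape)  # (4, 8)
```

Each 256-sample frame is reduced to 8 real features, which would then be fed to the neural acoustic model.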
Viseme data generation
Systems and methods for viseme data generation are disclosed. Uncompressed audio data is generated and/or utilized to determine the beats per minute of the audio data. Visemes are associated with the audio data utilizing a Viterbi algorithm and the beats per minute. A time-stamped list of viseme data is generated that associates the visemes with the portions of the audio data that they correspond to. An animatronic toy and/or an animation is caused to lip sync using the viseme data while audio corresponding to the audio data is output.
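The Viterbi step can be sketched with a generic decoder over per-frame viseme scores; the two-viseme toy model, the probabilities, and the frame times below are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def viterbi(emission_logp, transition_logp):
    """Most likely viseme sequence given per-frame emission scores and a
    viseme-to-viseme transition matrix (all in log space)."""
    T, S = emission_logp.shape
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0] = emission_logp[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + transition_logp   # [prev, cur]
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(S)] + emission_logp[t]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):           # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 2 visemes, "sticky" transitions favor holding a viseme.
emis = np.log(np.array([[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]]))
trans = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))
path = viterbi(emis, trans)
frame_times = [0.00, 0.04, 0.08]
timed = list(zip(frame_times, path))   # the time-stamped viseme list
print(timed)
```

Zipping the decoded path with frame timestamps yields exactly the kind of time-stamped viseme list the animatronic or animation would consume.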
Smart voice enhancement architecture for tempo tracking among music, speech, and noise
Audio data describing an audio signal may be received and used to determine a set of frames of the audio signal. A plurality of note onsets in the set of frames may be identified based on spectral energy of the audio signal in the set of frames. One or more tempos may be computed based on the identified plurality of note onsets. The one or more tempos may be validated based on a tempo validation condition. One or more music states of the audio signal may be determined based on the validated one or more tempos. Audio enhancement of the audio signal may be modified based on the one or more determined states of the audio signal.
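The tempo computation and validation steps can be sketched as below, assuming onsets have already been identified; the median inter-onset interval and the BPM range check stand in for the disclosure's tempo estimation and validation condition.

```python
import numpy as np

def estimate_tempo(onset_times, bpm_range=(40, 240)):
    """Tempo from the median inter-onset interval, validated against a
    plausible BPM range; returns None when validation fails."""
    intervals = np.diff(np.sort(onset_times))
    if len(intervals) == 0:
        return None                       # need at least two onsets
    bpm = 60.0 / np.median(intervals)
    return bpm if bpm_range[0] <= bpm <= bpm_range[1] else None

onsets = [0.0, 0.5, 1.0, 1.5, 2.0]        # a note onset every 0.5 s
print(estimate_tempo(onsets))             # 120.0
```

A validated tempo like this would drive the music-state decision, which in turn gates how aggressively speech-oriented enhancement is applied.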
Music detection and identification
A sensor processing unit comprises a sensor processor. The sensor processor is configured to communicatively couple with a microphone. The sensor processor is configured to acquire, from the microphone, an audio sample captured by the microphone from an environment in which the microphone is disposed. The sensor processor is configured to perform music activity detection on the audio sample to detect for music within the audio sample. Responsive to detection of music within the audio sample, the sensor processor is configured to send a music detection signal to an external processor located external to the sensor processing unit, the music detection signal indicating that music has been detected in the environment.
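The detect-then-notify pattern can be sketched as below. The periodicity-based detector (a strong normalized autocorrelation peak away from lag zero) is an illustrative stand-in for the unit's music activity detection, and the notification callback stands in for the signal sent to the external processor.

```python
import numpy as np

def detect_music(sample, min_lag=32, thresh=0.8):
    """Stand-in music-activity detector: strong periodicity, seen as a high
    normalized autocorrelation peak away from lag zero, suggests music."""
    x = sample - sample.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac = ac / (ac[0] + 1e-12)
    return float(np.max(ac[min_lag:])) > thresh

def process_sample(sample, notify):
    """Gate the notification: only signal the external processor when music
    is detected, so it can stay idle otherwise."""
    if detect_music(sample):
        notify("MUSIC_DETECTED")

events = []
sr = 16000
t = np.arange(1600) / sr
process_sample(np.sin(2 * np.pi * 440 * t), events.append)   # periodic: music
process_sample(np.random.default_rng(1).standard_normal(1600), events.append)
print(events)  # ['MUSIC_DETECTED']
```

Only the periodic sample triggers a notification, mirroring how the sensor processor wakes the external processor solely on detected music.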