Patent classifications
G10H2250/235
Sound signal synthesis method, generative model training method, sound signal synthesis system, and recording medium
A method generates first pitch data indicating a pitch of a first sound signal to be synthesized; and uses a generative model to estimate output data indicative of the first sound signal based on the generated first pitch data. The generative model has been trained to learn a relationship between second pitch data indicating a pitch of a second sound signal and the second sound signal. The first pitch data includes a first plurality of pieces of pitch notation data corresponding to pitch names, and is generated by setting, from among the first plurality of pieces of pitch notation data, a first piece of pitch notation data that corresponds to the pitch of the first sound signal as a hot value based on a difference between a reference pitch of a pitch name corresponding to the first piece of pitch notation data and the pitch of the first sound signal.
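The pitch encoding described in this abstract can be illustrated with a minimal sketch. The abstract does not say how many pitch names are used or how the difference from the reference pitch maps to the hot value, so everything below is an assumption made for illustration: one slot per MIDI note number, equal temperament with A4 = 440 Hz, and a hot value of 1.0 plus the deviation in semitones. The function name `pitch_vector` is hypothetical.

```python
import math

NUM_NOTES = 128   # assumption: one slot per MIDI note number
A4_HZ = 440.0     # assumption: equal temperament, A4 = 440 Hz


def pitch_vector(f0_hz):
    """Return a one-hot-like vector whose single non-zero entry encodes
    how far f0_hz lies from the reference pitch of the nearest note."""
    # Continuous MIDI pitch of the input frequency.
    midi = 69.0 + 12.0 * math.log2(f0_hz / A4_HZ)
    # Nearest pitch name (index of the slot that becomes "hot").
    note = int(math.floor(midi + 0.5))
    if not 0 <= note < NUM_NOTES:
        raise ValueError("pitch outside the supported note range")
    # Difference from the reference pitch, in semitones, within [-0.5, 0.5).
    deviation = midi - note
    vector = [0.0] * NUM_NOTES
    # Assumed mapping: 1.0 means exactly on the reference pitch.
    vector[note] = 1.0 + deviation
    return vector
```

With this mapping, a tone exactly at 440 Hz sets slot 69 to 1.0, while a tone a quarter of a semitone sharp sets the same slot to about 1.25; every other slot stays at 0.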
Video control device and video control method
This video control device includes: a detection unit that detects a beat timing of audio; and a control unit that updates a display mode of a video on the basis of the beat timing and change information indicating a change in a display mode of a video displayed on a display device.
Apparatus and method for decomposing an audio signal using a ratio as a separation characteristic
An apparatus for decomposing an audio signal into a background component signal and a foreground component signal includes: a block generator for generating a time sequence of blocks of audio signal values; an audio signal analyzer for determining a block characteristic of a current block of the audio signal and for determining an average characteristic for a group of blocks, the group of blocks including at least two blocks; and a separator for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal includes the background portion of the current block and the foreground component signal includes the foreground portion of the current block.
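The ratio-based separation can be sketched roughly as follows. The abstract leaves the block characteristic, the group of blocks, and the separation rule open, so this sketch assumes mean-square energy as the characteristic, a group made of the current block and its neighbours, and a hard threshold on the ratio. The names `split_blocks`, `group_radius`, and `threshold` are hypothetical.

```python
def split_blocks(samples, block_size=4, group_radius=2, threshold=2.0):
    """Split a list of samples into (foreground, background) signals of the
    same length, deciding block by block from an energy ratio."""
    # Block generator: a time sequence of consecutive blocks.
    blocks = [samples[i:i + block_size]
              for i in range(0, len(samples), block_size)]
    # Assumed block characteristic: mean-square energy.
    energies = [sum(x * x for x in b) / len(b) for b in blocks]

    foreground, background = [], []
    for i, block in enumerate(blocks):
        # Group of blocks: the current block plus its neighbours.
        lo = max(0, i - group_radius)
        hi = min(len(blocks), i + group_radius + 1)
        average = sum(energies[lo:hi]) / (hi - lo)
        ratio = energies[i] / average if average > 0 else 0.0
        # Assumed rule: a block well above the local average is foreground.
        is_foreground = ratio > threshold
        zeros = [0.0] * len(block)
        foreground.extend(block if is_foreground else zeros)
        background.extend(zeros if is_foreground else block)
    return foreground, background
```

A hard threshold sends each block entirely to one component. A soft gain derived from the ratio would also fit the wording of the abstract, which only requires that the split respond to the ratio.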
Apparatus and method for pitch-shifting audio signal with low complexity
An apparatus and method for pitch-shifting an audio signal with low complexity are disclosed. The method includes identifying a distance between an audio object included in the audio signal and a listener, checking whether that distance decreases, and, when it does, performing stepwise stretching pitch-shifting that repeatedly uses at least one of the frequency components of the audio signal.
Methods and apparatus to extract a pitch-independent timbre attribute from a media signal
Methods and apparatus to extract a pitch-independent timbre attribute from a media signal are disclosed. An example apparatus includes an audio characteristic extractor to determine a logarithmic spectrum of an audio signal; transform the logarithmic spectrum of the audio signal into a frequency domain to generate a transform output; determine a magnitude of the transform output; and determine a timbre attribute of the audio signal based on an inverse transform of the magnitude.
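The four steps in this abstract (logarithmic spectrum, transform, magnitude, inverse transform) can be sketched as below. The sketch assumes the input is already a spectrum sampled on a logarithmic frequency axis, so that a change in pitch appears as a shift along that axis, and it assumes the transform is a discrete Fourier transform. The magnitude of a Fourier transform is unchanged by a circular shift, and this is the property the sketch relies on for pitch independence. The helper names `dft` and `timbre_attribute` are hypothetical, and the naive transform is used only to keep the example dependency-free.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), forward or inverse."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def timbre_attribute(log_spectrum):
    """Transform the log spectrum, keep only the magnitude, and
    inverse-transform it to obtain a shift-invariant description."""
    transformed = dft(log_spectrum)
    magnitude = [abs(c) for c in transformed]
    return [c.real for c in dft(magnitude, inverse=True)]


# Illustrative data: the same spectral shape at two positions on the
# log-frequency axis, standing in for the same timbre at two pitches.
log_spec = [0.0, 1.0, 3.0, 1.0, 0.5, 0.0, 0.0, 0.0]
shifted = log_spec[-2:] + log_spec[:-2]
same = all(abs(a - b) < 1e-9
           for a, b in zip(timbre_attribute(log_spec),
                           timbre_attribute(shifted)))
```

Here `same` compares the two attributes element by element. The shift only changes the phase of the transform output, and taking the magnitude discards that phase.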
Sound signal generation method, generative model training method, sound signal generation system, and recording medium
A computer-implemented sound signal generation method includes: obtaining a first sound source spectrum of a sound signal to be generated; obtaining a first spectral envelope of the sound signal; and estimating fragment data representative of samples of the sound signal based on the obtained first sound source spectrum and the obtained first spectral envelope.
Media content identification on mobile devices
A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content on its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted and audio signal global onset detection is used for input audio frame alignment. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints to be combined with audio fingerprints for more reliable content identification.
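One of the audio features named above, frequency-domain entropy of an audio frame, can be sketched as follows. The abstract does not define the entropy, so this assumes Shannon entropy of the frame's power spectrum after normalizing it to sum to one. The function name `spectral_entropy` is hypothetical.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy, in bits, of a power spectrum treated as a
    probability distribution over frequency bins."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    # Bins with zero power contribute nothing to the entropy.
    return -sum((p / total) * math.log2(p / total)
                for p in power_spectrum if p > 0)
```

Under this definition a flat spectrum gives the highest entropy for its number of bins, and a spectrum with all its power in a single bin gives zero.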
Mutating spectral resynthesizer system and methods
A method of and system for generating audio having pitch attributes of an incoming audio stream are disclosed. The method comprises receiving a digital audio input. The audio spectrum is analyzed and integrated over segments of digital audio data upon receiving analysis triggers, which can be synced with the audio tempo. The integrated spectrum is processed to find peak frequencies in the spectrum, and these are stored with their associated gains in a peaks array. The peak frequencies are used to program the oscillators' controllable attributes and characteristics. The synthesis is performed upon receiving an analysis clock. A number of digital oscillators are configured with the associated frequency parameters and gain parameters from the peaks array. The oscillators are configured according to the audio pitch analysis and generate an oscillator output at the frequency and gain specified in the peaks array. These oscillator outputs are summed together to generate the synthesized audio.
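The peak-picking and oscillator-bank stages can be sketched as below. The sketch assumes that an integrated magnitude spectrum is already available as a list with evenly spaced bins, that peaks are simple local maxima, and that the oscillators are sine oscillators. The analysis triggers, tempo syncing, and analysis clock are left out. The names `find_peaks`, `synthesize`, `bin_hz`, and `max_peaks` are hypothetical.

```python
import math


def find_peaks(magnitudes, bin_hz, max_peaks=4):
    """Return up to max_peaks (frequency_hz, gain) pairs, strongest first."""
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        # Assumed peak test: a local maximum of the magnitude spectrum.
        if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] >= magnitudes[k + 1]:
            peaks.append((k * bin_hz, magnitudes[k]))
    peaks.sort(key=lambda peak: peak[1], reverse=True)
    return peaks[:max_peaks]


def synthesize(peaks, sample_rate, num_samples):
    """Sum one sine oscillator per entry of the peaks array."""
    return [sum(gain * math.sin(2.0 * math.pi * freq * n / sample_rate)
                for freq, gain in peaks)
            for n in range(num_samples)]
```

Sorting by gain before truncating means that when the spectrum has more peaks than oscillators, the weakest peaks are the ones dropped. That ordering is a design choice of this sketch and is not stated in the abstract.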