G10H2210/046

Viseme data generation for presentation while content is output

Systems and methods for viseme data generation are disclosed. Uncompressed audio data is generated and/or utilized to determine the beats per minute of the audio data. Visemes are associated with the audio data utilizing a Viterbi algorithm and the beats per minute. A time-stamped list of viseme data is generated that associates the visemes with the portions of the audio data that they correspond to. An animatronic toy and/or an animation is caused to lip sync using the viseme data while audio corresponding to the audio data is output.
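
The abstract's pipeline (tempo estimation, viseme association, time-stamped output) can be cartooned in a few lines of Python. The phoneme-to-viseme table, the half-beat spacing, and the fixed BPM below are illustrative assumptions, and the Viterbi association step is replaced by a direct table lookup:

```python
# Hypothetical phoneme-to-viseme table; real systems use much fuller maps
# and a Viterbi pass to choose the best viseme sequence.
PHONEME_TO_VISEME = {"AA": "open", "B": "closed", "F": "teeth-lip", "OW": "round"}

def viseme_timeline(phonemes, bpm, start=0.0):
    """Return a time-stamped list of (seconds, viseme) pairs, placing
    one viseme per half-beat of the estimated tempo."""
    step = 60.0 / bpm / 2.0            # half a beat, in seconds
    timeline = []
    t = start
    for p in phonemes:
        timeline.append((round(t, 3), PHONEME_TO_VISEME.get(p, "rest")))
        t += step
    return timeline

print(viseme_timeline(["B", "AA", "F", "OW"], bpm=120))
# → [(0.0, 'closed'), (0.25, 'open'), (0.5, 'teeth-lip'), (0.75, 'round')]
```

An animation or animatronic driver would then consume each `(time, viseme)` pair as audio playback reaches that timestamp.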

Apparatuses and methods for audio classifying and processing

Apparatus and methods for audio classifying and processing are disclosed. In one embodiment, an audio processing apparatus includes an audio classifier for classifying an audio signal into at least one audio type in real time; an audio improving device for improving the experience of an audience; and an adjusting unit for adjusting at least one parameter of the audio improving device in a continuous manner based on the confidence value of the at least one audio type.
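
The "continuous manner" adjustment can be sketched as a confidence-weighted blend of per-type parameter targets, so the parameter glides between settings rather than switching abruptly. The function and the gain values below are illustrative assumptions, not the patent's actual formula:

```python
def blend_parameter(confidences, targets, neutral=1.0):
    """Continuously adjust one audio-improvement parameter by blending
    per-type target values with the classifier's confidence values."""
    total = sum(confidences.values())
    if total == 0:
        return neutral                      # nothing detected: neutral setting
    return sum(confidences[t] * targets[t] for t in confidences) / total

confidence = {"music": 0.8, "speech": 0.2}   # classifier output (illustrative)
target_gain = {"music": 1.5, "speech": 1.0}  # per-type tuning targets
print(blend_parameter(confidence, target_gain))
```

As the classifier's confidence shifts between audio types frame by frame, the blended parameter tracks it smoothly.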

Illumination device, and frame provided with the same
09807854 · 2017-10-31

An illumination device includes a signal reception unit capable of receiving an audio signal from outside, a musical piece extraction unit capable of extracting continuous consonant sounds in the audio signal as a musical piece, a performance state detection unit capable of detecting start/end of performance of the musical piece according to a result of extraction by the musical piece extraction unit, a first illumination lamp capable of radiating ultraviolet rays, a second illumination lamp capable of radiating white light, and an illumination control unit capable of controlling on/off of the first illumination lamp, and of controlling on/off and illuminance of the second illumination lamp. The first and the second illumination lamps are turned on in response to detection of the start of performance, and the first and the second illumination lamps are turned off in response to detection of the end of performance.
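
The control unit's on/off behaviour reduces to a small state machine driven by the performance detector's events. The class, event names, and illuminance scale below are illustrative:

```python
class IlluminationControl:
    """Toy controller: both lamps on at performance start, both off at
    performance end, mirroring the abstract's illumination control unit."""
    def __init__(self):
        self.uv_lamp_on = False      # first lamp (ultraviolet)
        self.white_lamp_on = False   # second lamp (white light)
        self.white_illuminance = 0   # 0..100, illustrative scale

    def handle(self, event):
        if event == "performance_start":
            self.uv_lamp_on = True
            self.white_lamp_on = True
            self.white_illuminance = 100
        elif event == "performance_end":
            self.uv_lamp_on = False
            self.white_lamp_on = False
            self.white_illuminance = 0

ctrl = IlluminationControl()
ctrl.handle("performance_start")
print(ctrl.uv_lamp_on, ctrl.white_illuminance)   # → True 100
```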

Determining that audio includes music and then identifying the music as a particular song

In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.
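
The two-stage structure (a cheap music/no-music gate, then a song-recognition pass against stored reference data) can be sketched as follows. Both stages are toy stand-ins: a real gate would use a trained classifier on spectral features, and real recognition uses robust audio fingerprints, not the 2-D vectors assumed here:

```python
import math

def looks_like_music(frame_energies, min_active=0.6):
    """Stage 1 (toy gate): treat the clip as music if most frames
    carry audible energy."""
    active = sum(1 for e in frame_energies if e > 0.1)
    return active / len(frame_energies) >= min_active

def identify_song(fingerprint, reference_db):
    """Stage 2 (toy matcher): nearest reference song by Euclidean
    distance between fixed-length fingerprints."""
    return min(reference_db, key=lambda name: math.dist(fingerprint, reference_db[name]))

references = {"song_a": (0.1, 0.9), "song_b": (0.8, 0.2)}   # illustrative fingerprints
clip_energy = [0.3, 0.4, 0.35, 0.5]
if looks_like_music(clip_energy):
    print(identify_song((0.2, 0.8), references))   # → song_a
```

Gating on the cheap stage first means the expensive matching process only runs when music is actually present.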

Creative GAN generating music deviating from style norms

A method and system for generating music uses artificial intelligence to analyze existing musical compositions and then creates a musical composition that deviates from the learned styles. Known musical compositions created by humans are presented in digitized form along with a style designator to a computer for analysis, including recognition of musical elements and association of particular styles. A music generator generates a draft musical composition for similar analysis by the computer. The computer ranks such draft musical composition for correlation with known musical elements and known styles. The music generator modifies the draft musical composition using an iterative process until the resulting musical composition is recognizable as music but is distinctive in style.
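
The analyze/rank/modify loop can be sketched numerically. Here a "composition" is reduced to a 2-D vector, and the style centroids, the music test, and the hill-climbing acceptance rule are all illustrative stand-ins for the GAN's learned networks:

```python
import random

random.seed(0)

KNOWN_STYLES = [(0.9, 0.1), (0.1, 0.9)]   # hypothetical learned style centroids

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def is_music(vec):
    """Toy 'still recognizable as music' check: stay inside the unit square."""
    return all(0.0 <= v <= 1.0 for v in vec)

def creative_loop(draft, steps=200, jitter=0.05):
    """Iteratively modify the draft, keeping a change only when it stays
    music-like yet moves further from every known style centroid."""
    vec = list(draft)
    for _ in range(steps):
        cand = [v + random.uniform(-jitter, jitter) for v in vec]
        if is_music(cand) and \
           min(sqdist(cand, s) for s in KNOWN_STYLES) > min(sqdist(vec, s) for s in KNOWN_STYLES):
            vec = cand
    return vec

final = creative_loop((0.8, 0.2))
```

The two acceptance conditions mirror the abstract's twin goals: the result must remain recognizable as music, yet be distinctive in style.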

Apparatus and method for decomposing an audio signal using a ratio as a separation characteristic

An apparatus for decomposing an audio signal into a background component signal and a foreground component signal includes: a block generator for generating a time sequence of blocks of audio signal values; an audio signal analyzer for determining a block characteristic of a current block of the audio signal and for determining an average characteristic for a group of blocks, the group of blocks including at least two blocks; and a separator for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal includes the background portion of the current block and the foreground component signal includes the foreground portion of the current block.
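
The ratio-based separation is directly implementable. Below, the block characteristic is energy, and the soft foreground weighting and threshold value are illustrative assumptions (the patent claims only that separation responds to the ratio):

```python
def decompose_blocks(blocks, group_size=4, ratio_threshold=1.5):
    """Split each block of samples into background and foreground portions
    based on the ratio of its energy to the average energy of a group of
    recent blocks."""
    background, foreground = [], []
    for i, block in enumerate(blocks):
        group = blocks[max(0, i - group_size + 1): i + 1]
        block_energy = sum(x * x for x in block)
        avg_energy = sum(sum(x * x for x in b) for b in group) / len(group)
        ratio = block_energy / avg_energy if avg_energy else 0.0
        # A block much louder than its neighbourhood is mostly foreground.
        fg_weight = min(1.0, max(0.0, (ratio - 1.0) / (ratio_threshold - 1.0)))
        foreground.append([x * fg_weight for x in block])
        background.append([x * (1 - fg_weight) for x in block])
    return background, foreground

quiet, loud = [0.1, 0.1], [1.0, 1.0]
bg, fg = decompose_blocks([quiet, quiet, loud, quiet])
```

Because the two portions are complementary weightings of the same samples, background and foreground always sum back to the original block.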

Complex linear projection for acoustic modeling

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
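
The frequency-domain projection step can be cartooned with a naive DFT followed by complex-weighted sums and a log-magnitude, which is the shape of a complex-linear-projection frontend. The one-hot filter weights below are illustrative; in the actual approach the complex weights are learned jointly with the acoustic model:

```python
import cmath
import math

def dft(frame):
    """Naive DFT: time-domain frame -> list of complex frequency bins."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def complex_linear_projection(bins, weights):
    """Project the complex spectrum with complex weight rows, then take
    the log-magnitude, yielding one real feature per filter."""
    feats = []
    for w_row in weights:
        z = sum(w * b for w, b in zip(w_row, bins))
        feats.append(math.log(abs(z) + 1e-6))   # epsilon avoids log(0)
    return feats

frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]   # pure tone at bin 1
bins = dft(frame)
weights = [[1 if k == 1 else 0 for k in range(8)],   # filter picking bin 1
           [1 if k == 2 else 0 for k in range(8)]]   # filter picking bin 2
feats = complex_linear_projection(bins, weights)
```

The resulting real-valued feature vector is what would be fed to the neural acoustic model.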

Music classifier and related methods

An audio device that includes a music classifier that determines when music is present in an audio signal is disclosed. The audio device is configured to receive audio, process the received audio, and output the processed audio to a user. The processing may be adjusted based on the output of the music classifier. The music classifier utilizes a plurality of decision making units, each operating on the received audio independently. The decision making units are simplified to reduce the processing, and therefore the power, necessary for operation. Accordingly, each decision making unit may be insufficient to detect music on its own, but in combination the units may accurately detect music while consuming power at a rate suitable for a mobile device, such as a hearing aid.
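
The "weak units, strong ensemble" idea can be sketched as a majority vote over cheap per-frame statistics. The three units and their thresholds below are illustrative assumptions:

```python
def unit_energy(frames):
    """Weak unit 1: sustained average energy."""
    return sum(frames) / len(frames) > 0.2

def unit_steady(frames):
    """Weak unit 2: low frame-to-frame variation (music is often steadier
    than speech or noise bursts)."""
    diffs = [abs(a - b) for a, b in zip(frames, frames[1:])]
    return sum(diffs) / len(diffs) < 0.15

def unit_nonsilent(frames):
    """Weak unit 3: few near-silent frames."""
    return sum(1 for f in frames if f < 0.05) / len(frames) < 0.3

def detect_music(frames, units=(unit_energy, unit_steady, unit_nonsilent)):
    """Majority vote over independent, individually weak decision units."""
    votes = sum(1 for u in units if u(frames))
    return votes >= 2

steady = [0.5, 0.55, 0.5, 0.52, 0.5]
print(detect_music(steady))   # → True
```

Each unit is too crude to be trusted alone, but the combined vote is more robust, and every unit is cheap enough to run continuously on a power-constrained device.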

APPARATUS AND METHOD FOR DECOMPOSING AN AUDIO SIGNAL USING A VARIABLE THRESHOLD
20210295854 · 2021-09-23

An apparatus for decomposing an audio signal into a background component signal and a foreground component signal has: a block generator for generating a time sequence of blocks of audio signal values; an audio signal analyzer for determining a characteristic of a current block of the audio signal and for determining a variability of the characteristic within a group of blocks having at least two blocks of the sequence of blocks; and a separator for separating the current block into a background portion and a foreground portion, wherein the separator is configured to determine a separation threshold based on the variability and to separate the current block into the background component signal and the foreground component signal when the characteristic of the current block is in a predetermined relation to the separation threshold.
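
A minimal sketch of the variable-threshold idea: the block characteristic is energy, the variability is the group's standard deviation, and the threshold is mean plus a scaled deviation. The scale factor `k` and the hard background/foreground labelling are illustrative assumptions:

```python
import statistics

def separate_with_variable_threshold(blocks, group_size=5, k=1.0):
    """Label each block background or foreground using a threshold that
    tracks the variability of the block characteristic (here: energy)."""
    energies = [sum(x * x for x in b) for b in blocks]
    labels = []
    for i, e in enumerate(energies):
        group = energies[max(0, i - group_size + 1): i + 1]
        mean = statistics.mean(group)
        spread = statistics.pstdev(group)        # variability within the group
        threshold = mean + k * spread            # variable separation threshold
        labels.append("foreground" if e > threshold else "background")
    return labels

labels = separate_with_variable_threshold([[0.1, 0.1]] * 4 + [[1.0, 1.0]])
print(labels)
```

Tying the threshold to the group's variability makes the separator adapt: in turbulent passages a block must stand out more before it is treated as foreground.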
