Patent classifications
G10H2210/046
COMPLEX EVOLUTION RECURRENT NEURAL NETWORKS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex evolution recurrent neural networks. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A first vector sequence comprising audio features determined from the audio data is generated. A second vector sequence is generated, as output of a first recurrent neural network in response to receiving the first vector sequence as input, where the first recurrent neural network has a transition matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary. An output vector sequence of a second recurrent neural network is generated. A transcription for the utterance is generated based on the output vector sequence generated by the second recurrent neural network. The transcription for the utterance is provided.
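The transition matrix described above can be illustrated with a small numerical sketch. Assuming, as one plausible instantiation, that the unitary operators are diagonal phase rotations, a unitary DFT, and a fixed permutation, with a real diagonal scaling as the one non-unitary operator; the function and parameter names are illustrative, not taken from the patent.

```python
import numpy as np

def unitary_cascade(h, theta, perm, sigma):
    """Apply a cascade of linear operators to the complex hidden state h:
    two diagonal phase rotations, a unitary DFT, and a fixed permutation
    (all unitary), followed by a non-unitary real diagonal scaling.
    The specific operators chosen here are illustrative assumptions."""
    n = len(h)
    h = np.exp(1j * theta[0]) * h        # unitary: diagonal phase rotation
    h = np.fft.fft(h) / np.sqrt(n)       # unitary: DFT scaled to preserve norm
    h = h[perm]                          # unitary: fixed permutation
    h = np.exp(1j * theta[1]) * h        # unitary: second phase rotation
    h = sigma * h                        # non-unitary: diagonal scaling
    return h

rng = np.random.default_rng(0)
n = 8
h = rng.standard_normal(n) + 1j * rng.standard_normal(n)
theta = rng.uniform(-np.pi, np.pi, size=(2, n))
perm = rng.permutation(n)
sigma = rng.uniform(0.5, 1.5, size=n)
out = unitary_cascade(h, theta, perm, sigma)
```

With `sigma` set to ones the cascade is entirely unitary and the hidden-state norm is preserved exactly; the non-unitary scaling is what lets the norm grow or decay.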
MUSIC CLASSIFIER AND RELATED METHODS
An audio device that includes a music classifier that determines when music is present in an audio signal is disclosed. The audio device is configured to receive audio, process the received audio, and output the processed audio to a user. The processing may be adjusted based on the output of the music classifier. The music classifier utilizes a plurality of decision-making units, each operating on the received audio independently. The decision-making units are simplified to reduce the processing, and therefore the power, necessary for operation. Accordingly, each decision-making unit may be insufficient to detect music on its own, but in combination the units may accurately detect music while consuming power at a rate suitable for a mobile device, such as a hearing aid.
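The combination of individually weak but cheap decision units can be sketched as a majority vote over simple spectral tests. The three tests and their thresholds below are illustrative assumptions, not the patented units:

```python
import numpy as np

def music_vote(frame, sr=16000):
    """Three cheap, independent decision units; each alone is a weak
    indicator of music, but a majority vote is more reliable.
    Thresholds are illustrative, not from the patent."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    # unit 1: spectral flatness (tonal signals are far less flat than noise)
    flatness = np.exp(np.mean(np.log(spectrum + 1e-12))) / (np.mean(spectrum) + 1e-12)
    votes = [flatness < 0.3]
    # unit 2: energy concentrated below 4 kHz
    low = spectrum[freqs < 4000].sum() / (spectrum.sum() + 1e-12)
    votes.append(low > 0.7)
    # unit 3: a strong dominant peak (sustained tone)
    votes.append(spectrum.max() > 5 * np.mean(spectrum))
    return sum(votes) >= 2

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)                    # sustained tone: music-like
noise = np.random.default_rng(1).standard_normal(sr)  # white noise: not music
```

Each test is a few arithmetic operations per frame, so the vote stays within a power budget where any single, more elaborate classifier might not.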
CREATIVE GAN GENERATING ART DEVIATING FROM STYLE NORMS
A method and system for generating art uses artificial intelligence to analyze existing art forms and then creates art that deviates from the learned styles. Known art created by humans is presented in digitized form along with a style designator to a computer for analysis, including recognition of artistic elements and association of particular styles. A graphics processor generates a draft graphic image for similar analysis by the computer. The computer ranks such draft image for correlation with artistic elements and known styles. The graphics processor modifies the draft image using an iterative process until the resulting image is recognizable as art but is distinctive in style.
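The ranking step — recognizable as art, yet not matching any one known style — can be sketched as a score combining an "is art" probability with the entropy of a style classifier's output. The scoring rule below is an illustrative assumption, not the patented criterion:

```python
import numpy as np

def creativity_score(p_art, p_style):
    """Score a draft image from two classifier outputs:
    p_art   -- probability the image is recognizable as art at all,
    p_style -- probability distribution over known style labels.
    A creative draft is confidently art (high p_art) but hard to assign
    to any single known style (high entropy of p_style)."""
    entropy = -np.sum(p_style * np.log(p_style + 1e-12))
    max_entropy = np.log(len(p_style))
    return p_art * (entropy / max_entropy)

# a draft clearly in one known style scores lower than a style-ambiguous one
conventional = creativity_score(0.95, np.array([0.90, 0.05, 0.03, 0.02]))
novel = creativity_score(0.95, np.array([0.25, 0.25, 0.25, 0.25]))
```

In the iterative process the abstract describes, the graphics processor would modify the draft to increase such a score until the image is art-like but stylistically distinctive.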
APPARATUS AND METHOD FOR DECOMPOSING AN AUDIO SIGNAL USING A RATIO AS A SEPARATION CHARACTERISTIC
An apparatus for decomposing an audio signal into a background component signal and a foreground component signal includes: a block generator for generating a time sequence of blocks of audio signal values; an audio signal analyzer for determining a block characteristic of a current block of the audio signal and for determining an average characteristic for a group of blocks, the group of blocks including at least two blocks; and a separator for separating the current block into a background portion and a foreground portion in response to a ratio of the block characteristic of the current block and the average characteristic of the group of blocks, wherein the background component signal includes the background portion of the current block and the foreground component signal includes the foreground portion of the current block.
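The ratio-based separation can be sketched with block energy as the characteristic. The choice of energy as the characteristic, the group of blocks, and the gate value are illustrative assumptions, not the claimed parameters:

```python
import numpy as np

def separate_block(blocks, i, gate=2.0):
    """Split block i into background and foreground portions based on
    the ratio of the block's energy (the block characteristic) to the
    average energy of a group of recent blocks. The two portions sum
    back to the original block."""
    block = blocks[i]
    block_energy = np.mean(block ** 2)
    lo = max(0, i - 2)                   # group of at least two blocks
    group_energy = np.mean([np.mean(b ** 2) for b in blocks[lo:i + 1]])
    ratio = block_energy / (group_energy + 1e-12)
    # a ratio well above 1 indicates a sudden foreground event
    fg_gain = min(1.0, max(0.0, (ratio - 1.0) / (gate - 1.0)))
    foreground = fg_gain * block
    background = block - foreground
    return background, foreground

# three steady (background) blocks followed by a sudden loud block
blocks = [np.full(4, 0.1), np.full(4, 0.1), np.full(4, 0.1), np.full(4, 1.0)]
```

Because the comparison is a ratio against a local average rather than a fixed level, the separation adapts automatically to the overall loudness of the signal.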
APPARATUS AND METHOD FOR DECOMPOSING AN AUDIO SIGNAL USING A VARIABLE THRESHOLD
An apparatus for decomposing an audio signal into a background component signal and a foreground component signal has: a block generator for generating a time sequence of blocks of audio signal values; an audio signal analyzer for determining a characteristic of a current block of the audio signal and for determining a variability of the characteristic within a group of blocks having at least two blocks of the sequence of blocks; and a separator for separating the current block into a background portion and a foreground portion, wherein the separator is configured to determine a separation threshold based on the variability and to separate the current block into the background component signal and the foreground component signal when the characteristic of the current block is in a predetermined relation to the separation threshold.
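The variable-threshold idea can be sketched with block energy as the characteristic and standard deviation as the variability measure; the `mean + k·std` rule and the group size are illustrative assumptions, not the claimed design:

```python
import numpy as np

def is_foreground(energies, i, k=2.0):
    """Decide whether block i is foreground by comparing its energy
    characteristic against a separation threshold derived from the
    variability (std) of the characteristic over a group of recent
    blocks. In a highly variable passage the threshold rises, so only
    genuinely prominent events are separated out."""
    lo = max(0, i - 4)
    group = energies[lo:i]               # group of at least two blocks assumed
    threshold = np.mean(group) + k * np.std(group)
    return energies[i] > threshold

# four steady blocks, then one block with a clear energy jump
energies = np.array([1.0, 1.1, 0.9, 1.0, 5.0])
```

The key difference from the ratio-based apparatus above is that the threshold itself moves with the signal's variability rather than with its average level alone.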
COMPLEX LINEAR PROJECTION FOR ACOUSTIC MODELING
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
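The core projection step can be sketched in a few lines: frequency-domain frames are multiplied by a complex matrix, then compressed to real-valued features with a log-magnitude nonlinearity. The random `W` and the `log1p` compression are illustrative stand-ins; in a trained system the projection would be a learned parameter:

```python
import numpy as np

def complex_linear_projection(frames, W):
    """Project complex frequency-domain frames with a complex matrix W,
    then compress magnitudes to real-valued features suitable as input
    to a neural network acoustic model."""
    projected = frames @ W.T             # complex linear projection per frame
    return np.log1p(np.abs(projected))   # real-valued compressed features

rng = np.random.default_rng(0)
audio = rng.standard_normal(4 * 512).reshape(4, 512)   # 4 windows of audio
frames = np.fft.rfft(audio, axis=1)                    # frequency-domain data
W = (rng.standard_normal((40, 257)) +
     1j * rng.standard_normal((40, 257))) / np.sqrt(257)
features = complex_linear_projection(frames, W)
```

Each 512-sample window yields 257 complex frequency bins, which the projection reduces to 40 real features per frame for the acoustic model.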
Identifying Music as a Particular Song
In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products for indicating a reference song. A computing device stores reference song characterization data that identifies a plurality of audio characteristics for each reference song in a plurality of reference songs. The computing device receives digital audio data that represents audio recorded by a microphone, converts the digital audio data from time-domain format into frequency-domain format, and uses the digital audio data in the frequency-domain format in a music-characterization process. In response to determining that characterization values for the digital audio data are most relevant to characterization values for a particular reference song, the computing device outputs an indication of the particular reference song.
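The characterization-and-matching flow can be sketched as: convert time-domain audio to frequency-domain frames, reduce them to a compact vector of characterization values, and pick the reference song whose stored values are closest. Per-band mean energy and Euclidean distance are illustrative stand-ins for the richer characteristics a real system would use:

```python
import numpy as np

def characterize(spectrogram, n_bands=8):
    """Reduce a magnitude spectrogram to a compact vector of
    characterization values: mean energy per frequency band."""
    bands = np.array_split(spectrogram, n_bands, axis=1)
    return np.array([band.mean() for band in bands])

def best_match(query, reference_db):
    """Return the reference song whose stored characterization values
    are most relevant (closest in Euclidean distance) to the query."""
    return min(reference_db,
               key=lambda name: np.linalg.norm(reference_db[name] - query))

# time-domain audio -> frequency-domain frames -> characterization values
rng = np.random.default_rng(0)
audio = rng.standard_normal(4096)                           # "recorded" audio
spec = np.abs(np.fft.rfft(audio.reshape(-1, 256), axis=1))  # frequency domain
query = characterize(spec)

reference_db = {
    "song_a": query + 0.01,        # near-identical stored reference
    "song_b": query[::-1] * 3.0,   # clearly different reference song
}
```

The compact vectors make the comparison against a large library of reference songs cheap, since only the characterization values, not the raw audio, need to be stored and compared.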
Determining that Audio Includes Music and then Identifying the Music as a Particular Song
In general, the subject matter described in this disclosure can be embodied in methods, systems, and program products. A computing device stores reference song characterization data and receives digital audio data. The computing device determines whether the digital audio data represents music and then performs a different process to recognize that the digital audio data represents a particular reference song. The computing device then outputs an indication of the particular reference song.
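The two-stage structure — a cheap music/no-music decision gating the more expensive song-matching process — can be sketched as follows; the function names and stand-in stages are purely illustrative, since the abstract leaves the classifiers themselves open:

```python
def recognize(audio_features, music_detector, song_matcher):
    """Two-stage pipeline: a lightweight music detector gates the more
    expensive song-matching process, which runs only when the audio is
    determined to represent music."""
    if not music_detector(audio_features):
        return None                      # not music: skip song matching
    return song_matcher(audio_features)  # music: identify the song

# usage with trivial stand-in stages
result = recognize([0.9], lambda f: f[0] > 0.5, lambda f: "song_a")
silence = recognize([0.1], lambda f: f[0] > 0.5, lambda f: "song_a")
```

Running the detector first means the matching process, and its lookups against the reference song database, are skipped entirely for non-musical audio.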
MUSIC DETECTION AND IDENTIFICATION
A sensor processing unit comprises a microphone and a sensor processor. The sensor processor is coupled with the microphone. The sensor processor is configured to operate the microphone to capture an audio sample from an environment in which the microphone is disposed. The sensor processor is configured to perform music activity detection on the audio sample to detect for music within the audio sample. Responsive to detection of music within the audio sample, the sensor processor is configured to send a music detection signal to an external processor located external to the sensor processing unit, the music detection signal indicating that music has been detected in the environment.
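The always-on behavior of the sensor processor can be sketched as a loop that runs low-power detection on each captured sample and signals the external processor only on a positive detection; `detect_music` and `send_signal` are hypothetical stand-ins for the detector and the interrupt line to the host:

```python
def sensor_loop(samples, detect_music, send_signal):
    """Sensor-processor loop: perform music activity detection on each
    captured audio sample and send a music detection signal to the
    external processor only when music is detected, leaving the host
    asleep otherwise."""
    for sample in samples:
        if detect_music(sample):
            send_signal(sample)

wakeups = []
sensor_loop(["noise", "music", "noise", "music"],
            detect_music=lambda s: s == "music",
            send_signal=wakeups.append)
```

Keeping detection on the sensor processor and waking the external processor only on a detection signal is what makes the always-on arrangement power-efficient.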