Patent classifications
G10L25/18
ABNORMAL SOUND DIAGNOSIS SYSTEM
An abnormal sound diagnosis system includes a sound acquisition unit configured to acquire data of a sound generated from an object, an inquiry information acquisition unit configured to acquire inquiry information regarding an abnormal sound generated in the object, an arithmetic processing unit configured to acquire a spectrogram indicating a relationship among a time, a frequency, and sound pressure from the data of the sound, an extraction unit configured to acquire, based on the inquiry information acquired by the inquiry information acquisition unit, an inferred frequency range of the abnormal sound generated in the object and extract an extracted range corresponding to the inferred frequency range of the spectrogram acquired by the arithmetic processing unit, and a diagnosis unit configured to diagnose, based on the extracted range of the spectrogram extracted by the extraction unit, a cause of the abnormal sound generated in the object.
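The spectrogram and extraction steps described above can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the function names, the `INQUIRY_TO_RANGE` lookup table, the frame sizes, and the test tone are all assumptions made for the example, and the levels are relative decibels rather than calibrated sound pressure.

```python
import numpy as np

def spectrogram(samples, sample_rate, frame_len=1024, hop=512):
    """Return (times, freqs, level_db) from a short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([samples[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, frame_len // 2 + 1)
    level_db = 20.0 * np.log10(magnitude + 1e-12)     # relative level, not calibrated SPL
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return times, freqs, level_db

def extract_band(freqs, level_db, f_low, f_high):
    """Keep only the spectrogram columns inside [f_low, f_high] Hz."""
    keep = (freqs >= f_low) & (freqs <= f_high)
    return freqs[keep], level_db[:, keep]

# Hypothetical mapping from an inquiry answer to an inferred frequency range (Hz).
INQUIRY_TO_RANGE = {"squeal": (2000.0, 8000.0), "rumble": (20.0, 200.0)}

# Synthetic one-second recording: a strong 3 kHz tone plus a weak 100 Hz tone.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
sound = np.sin(2 * np.pi * 3000.0 * t) + 0.1 * np.sin(2 * np.pi * 100.0 * t)

times, freqs, level_db = spectrogram(sound, sample_rate)
f_low, f_high = INQUIRY_TO_RANGE["squeal"]
band_freqs, band_level = extract_band(freqs, level_db, f_low, f_high)

# Frequency with the highest average level inside the extracted range.
peak_freq = band_freqs[np.argmax(band_level.mean(axis=0))]
```

A diagnosis step would then work only on `band_level`, which is smaller than the full spectrogram and excludes sound outside the range suggested by the inquiry answers.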
Neural Network Audio Scene Classifier for Hearing Implants
An audio scene classifier classifies an audio input signal from an audio scene and includes a pre-processing neural network configured for pre-processing the audio input signal based on initial classification parameters to produce an initial signal classification, and a scene classifier neural network configured for processing the initial signal classification based on scene classification parameters to produce an audio scene classification output. The initial classification parameters reflect neural network training based on a first set of initial audio training data, and the scene classification parameters reflect neural network training on a second set of classification audio training data separate and different from the first set of initial audio training data. A hearing implant signal processor is configured for processing the audio input signal and the audio scene classification output to generate stimulation signals for a hearing implant, for perception by a patient as sound.
PERCEPTUAL OPTIMIZATION OF MAGNITUDE AND PHASE FOR TIME-FREQUENCY AND SOFTMASK SOURCE SEPARATION SYSTEMS
A method comprises: obtaining softmask values for frequency bins of time-frequency tiles representing an audio signal; reducing, or expanding and limiting, the softmask values; and applying the reduced, or expanded and limited, softmask values to the frequency bins to create a time-frequency representation of an estimated target source. An alternative method comprises, for each time-frequency tile: obtaining softmask values; applying the softmask values to the frequency bins to create a time-frequency domain representation of an estimated target source; obtaining a panning parameter estimate and a source phase concentration estimate for the target source; determining, using the panning parameter estimate and the softmask values, a magnitude for the time-frequency representation of the estimated target source; determining, using the panning parameter estimate and the source phase concentration estimate, a phase for the time-frequency representation of the estimated target source; and combining the magnitude and the phase.
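The first method above (adjusting softmask values, then applying them to the frequency bins) might look something like the sketch below. The abstract does not say how the values are reduced or expanded, so the multiplicative `factor` and `gain`, the cap at 1.0, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def reduce_softmask(mask, factor=0.8):
    """Scale softmask values down; factor is assumed to lie in (0, 1]."""
    return mask * factor

def expand_and_limit_softmask(mask, gain=1.5, limit=1.0):
    """Scale softmask values up, then cap them at `limit`."""
    return np.minimum(mask * gain, limit)

def apply_softmask(mix_stft, mask):
    """Multiply every frequency bin of every tile by its softmask value."""
    return mix_stft * mask

# Toy time-frequency representation: 4 tiles (frames) of 5 frequency bins each.
rng = np.random.default_rng(0)
mix_stft = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
mask = rng.uniform(0.0, 1.0, size=(4, 5))

reduced_estimate = apply_softmask(mix_stft, reduce_softmask(mask))
expanded_mask = expand_and_limit_softmask(mask)
expanded_estimate = apply_softmask(mix_stft, expanded_mask)
```

Because the mask is real-valued, this step changes only the magnitude of each bin; the alternative method in the abstract goes further and derives the phase separately.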
IMPROVED PEAK DETECTOR
A method of operating an encoder or a decoder comprises receiving an analysis signal of an audio signal and a filtered analysis signal, and combining the filtered analysis signal with the analysis signal to generate a combined signal using a maximum function that provides at least one of a maximum positive value at each index i of the combined signal and a maximum negative value at each index i of the combined signal. The method further comprises identifying broad peaks and narrow peaks of the combined signal.
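The combining step can be illustrated with an element-wise maximum and minimum. This sketch covers only that step, not the identification of broad and narrow peaks. It assumes that "maximum negative value" means the more negative of the two samples at each index; the function name and the sample values are invented for the example.

```python
import numpy as np

def combine_with_max(analysis, filtered):
    """Combine two equally long signals index by index.

    upper[i] is the larger of the two samples at index i.
    lower[i] is the more negative of the two samples at index i
    (an assumed reading of "maximum negative value").
    """
    upper = np.maximum(analysis, filtered)
    lower = np.minimum(analysis, filtered)
    return upper, lower

analysis = np.array([0.0, 1.0, -2.0, 3.0, -1.0])
filtered = np.array([0.5, 0.5, -1.0, 1.0, -3.0])

upper, lower = combine_with_max(analysis, filtered)
# upper is [0.5, 1.0, -1.0, 3.0, -1.0]
# lower is [0.0, 0.5, -2.0, 1.0, -3.0]
```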
Methods and apparatus to perform audio watermarking and watermark detection and extraction
Methods and apparatus to perform audio watermarking and watermark detection and extraction are disclosed. Example apparatus disclosed herein are to select frequency components to be used to represent a code, different sets of frequency components to represent respectively different information, respective ones of the frequency components in the sets of frequency components located in respective code bands, there being multiple code bands and spacing between adjacent code bands being equal to or less than the spacing between adjacent frequency components in the code bands. Disclosed example apparatus are also to synthesize the frequency components to be used to represent the code, combine the synthesized frequency components with an audio block of an audio signal, and output the audio signal and a video signal associated with the audio signal.
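The synthesize-and-combine steps can be sketched as adding a few low-level sinusoids to one audio block. This is a simplified illustration under stated assumptions: the `CODE_TABLE`, the bin indices, the block length, and the amplitude are made up, and the sketch does not attempt to reproduce the band-spacing rule from the abstract. The `strongest_bins` helper is only a sanity check on the toy data, not the detection method of the patent.

```python
import numpy as np

BLOCK_LEN = 512

# Hypothetical code table: each symbol maps to one frequency bin per code band.
CODE_TABLE = {
    "A": (40, 60, 80),
    "B": (42, 62, 82),
}

def synthesize(symbol, amplitude=0.01):
    """Build the sum of sinusoids that represents `symbol` in one block."""
    n = np.arange(BLOCK_LEN)
    code = np.zeros(BLOCK_LEN)
    for k in CODE_TABLE[symbol]:
        code += amplitude * np.sin(2.0 * np.pi * k * n / BLOCK_LEN)
    return code

def embed(audio_block, symbol):
    """Combine the synthesized frequency components with an audio block."""
    return audio_block + synthesize(symbol)

def strongest_bins(block, count):
    """Return the indices of the `count` largest FFT magnitudes, sorted."""
    magnitude = np.abs(np.fft.rfft(block))
    return tuple(sorted(int(k) for k in np.argsort(magnitude)[-count:]))

# Toy audio block: very quiet noise, so the embedded tones dominate the spectrum.
rng = np.random.default_rng(1)
audio_block = 0.001 * rng.standard_normal(BLOCK_LEN)

watermarked = embed(audio_block, "B")
detected = strongest_bins(watermarked, 3)
```

With real program audio the code components would sit well below the surrounding signal, so a practical detector could not simply pick the strongest bins as this check does.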
LOW LATENCY AUDIO PACKET LOSS CONCEALMENT
The invention provides a method for concealing errors in audio data packets in real time. A Long Short-Term Memory (LSTM) neural network with a plurality of nodes is provided and pre-trained with audio data. A sequence of packets is received, each packet comprising a set of modified discrete cosine transform (MDCT) coefficients associated with a frame comprising time-domain samples of the audio signal. These MDCT coefficients are applied to the LSTM neural network, and if a received packet is identified as erroneous, an output from the LSTM neural network is used to generate estimated MDCT coefficients that provide a concealment packet to replace the erroneous packet. Preferably, the MDCT coefficients are normalized before being applied to the LSTM neural network. The method can be performed in real time, achieving low latency while retaining high audio quality.
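The receive loop described above can be sketched with a placeholder in place of the neural network. The `LastFramePredictor` class below is not an LSTM: it is a hypothetical stand-in that simply repeats the previous frame, used here so the control flow can run without a trained model. The peak normalization, the use of `None` to mark an erroneous packet, and feeding concealment packets back into the predictor are also assumptions.

```python
import numpy as np

def normalize(coeffs, eps=1e-9):
    """Scale one frame of MDCT coefficients to unit peak; return (frame, scale)."""
    scale = max(float(np.max(np.abs(coeffs))), eps)
    return coeffs / scale, scale

class LastFramePredictor:
    """Stand-in for the pre-trained LSTM: predicts a repeat of the last frame."""

    def __init__(self, frame_size):
        self.prediction = np.zeros(frame_size)
        self.scale = 1.0

    def update(self, coeffs):
        # Normalize before handing the frame to the (placeholder) model.
        self.prediction, self.scale = normalize(coeffs)

    def predict(self):
        # Undo the normalization to get estimated MDCT coefficients.
        return self.prediction * self.scale

def conceal(packets, frame_size):
    """packets: list of MDCT coefficient arrays, with None for an erroneous packet."""
    model = LastFramePredictor(frame_size)
    output = []
    for packet in packets:
        if packet is None:
            packet = model.predict()   # concealment packet replaces the bad one
        output.append(packet)
        model.update(packet)           # assumption: estimates are fed back too
    return output

frame_a = np.array([0.5, -2.0, 1.0, 0.25])
frame_b = np.array([1.0, 1.0, -1.0, 0.0])
received = [frame_a, None, frame_b]
repaired = conceal(received, frame_size=4)
```

Because each estimate depends only on frames already received, the loop needs no look-ahead, which is the property that keeps latency low in this kind of scheme.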
METHOD AND DEVICE FOR MANAGING AUDIO BASED ON SPECTROGRAM
Various embodiments herein provide a method for managing audio based on a spectrogram. The method includes generating, by a transmitter device, the spectrogram of the audio. The method includes identifying, from the spectrogram of the audio, a first spectrogram corresponding to vocals in the audio and a second spectrogram corresponding to music in the audio, and extracting a music feature from the second spectrogram. The method includes transmitting a signal comprising the first spectrogram, the second spectrogram, the music feature, and the audio to a receiver device. The method includes determining, by the receiver device, whether an audio drop is occurring in the received signal based on a parameter associated with the received signal. The method includes generating the audio using the first spectrogram, the second spectrogram, and the music feature, in response to determining that the audio drop is occurring in the received signal.