Patent classifications
G10L19/26
Digital encapsulation of audio signals
Encoding and decoding systems are described for the provision of high quality digital representations of audio signals with particular attention to the correct perceptual rendering of fast transients at modest sample rates. This is achieved by optimising downsampling and upsampling filters to minimise the length of the impulse response while adequately attenuating alias products that have been found perceptually harmful.
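The trade-off this abstract describes — a short impulse response to preserve transients versus enough stopband attenuation to suppress aliases — can be illustrated with a windowed-sinc FIR decimation filter. This is a generic sketch, not the patented optimisation; the tap count, cutoff, and Hamming window are illustrative choices.

```python
import numpy as np

def short_antialias_fir(num_taps=15, cutoff=0.45):
    """Windowed-sinc lowpass for 2x downsampling.

    A short impulse response limits time-domain smearing of fast
    transients, at the cost of gentler alias attenuation; longer
    filters attenuate aliases better but ring longer.
    cutoff is in cycles/sample (Nyquist = 0.5).
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n)   # ideal lowpass impulse response
    h *= np.hamming(num_taps)                  # taper to control the stopband
    return h / h.sum()                         # normalise to unity DC gain

def downsample_2x(x, h):
    """Anti-alias filter, then keep every second sample."""
    y = np.convolve(x, h, mode="same")
    return y[::2]
```

A patented design would instead optimise the coefficients jointly for impulse-response length and alias attenuation, rather than fixing a window shape.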
DATA PROCESSING METHOD AND TERMINAL THEREOF
The present application discloses a data processing method and a terminal thereof. The method includes: obtaining, in real time, target audio data from an on-line source; processing the target audio data using a first audio data processing approach and playing the processed target audio data at the terminal; while playing the processed target audio data: obtaining an audio data processing approach transition instruction, the audio data processing approach transition instruction including a second audio data processing approach and a real-time window for switching from the first audio data processing approach to the second audio data processing approach; in response to the audio data processing approach transition instruction, processing the target audio data received in the real-time window using the first audio data processing approach and the second audio data processing approach separately; and determining, from the separately processed data, the output audio data to be played at the terminal during the real-time window.
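One natural way to combine the two separately processed streams inside the switching window is a crossfade. The abstract leaves the combination rule open, so the linear blend below is purely an illustrative assumption; `process_a` and `process_b` stand in for the two processing approaches.

```python
import numpy as np

def switch_with_crossfade(frames, process_a, process_b):
    """Process each frame in the switching window with both approaches
    and blend the results with a linear crossfade, so playback moves
    from approach A to approach B without an audible discontinuity.
    """
    n = len(frames)
    out = []
    for i, frame in enumerate(frames):
        w = i / max(n - 1, 1)   # ramps 0 -> 1 across the window
        out.append((1 - w) * process_a(frame) + w * process_b(frame))
    return out
```

Equal-power (square-root) crossfade weights would be another common choice when the two streams are uncorrelated.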
Low-frequency emphasis for LPC-based coding in frequency domain
The invention provides an audio encoder including a combination of a linear predictive coding filter having a plurality of linear predictive coding coefficients and a time-frequency converter, wherein the combination is configured to filter and to convert a frame of the audio signal into a frequency domain in order to output a spectrum based on the frame and on the linear predictive coding coefficients; a low frequency emphasizer configured to calculate a processed spectrum based on the spectrum, wherein spectral lines of the processed spectrum representing a lower frequency than a reference spectral line are emphasized; and a control device configured to control the calculation of the processed spectrum by the low frequency emphasizer depending on the linear predictive coding coefficients of the linear predictive coding filter.
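The core spectral operation — boosting lines below a reference line — can be sketched independently of the LPC machinery. In the encoder the emphasis gain would be derived from the LPC coefficients by the control device; here it is simply a parameter, and the linear gain ramp is an illustrative choice.

```python
import numpy as np

def emphasize_low_frequencies(spectrum, ref_bin, gain):
    """Scale spectral lines below ref_bin by a gain that ramps from
    `gain` at DC down to 1.0 at the reference spectral line, leaving
    higher-frequency lines untouched.
    """
    out = np.asarray(spectrum, dtype=float).copy()
    out[:ref_bin] *= np.linspace(gain, 1.0, ref_bin)
    return out
```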
COLORLESS GENERATION OF ELEVATION PERCEPTUAL CUES USING ALL-PASS FILTER NETWORKS
A system includes one or more computing devices that encode spatial perceptual cues into a monaural channel to generate a plurality of output channels. A computing device determines a target amplitude response for the mid and side channels of the plurality of output channels, defining a spatial perceptual cue associated with one or more frequency-dependent phase shifts. The computing device determines a transfer function of a single-input, multi-output allpass filter based on the target amplitude response, determines coefficients of the allpass filter based on the transfer function, and processes the monaural channel with the coefficients of the allpass filter to generate the plurality of channels having the encoded spatial perceptual cues. The allpass filter is configured to be colorless with respect to the individual output channels, allowing the placement of spatial cues into the audio stream to be decoupled from the overall coloration of the audio.
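The "colorless" property rests on the defining feature of an allpass filter: unit magnitude at every frequency, so only phase is altered. A first-order section illustrates this; the patented system would cascade such sections into a single-input, multi-output network matched to the target response, which this sketch does not attempt.

```python
import numpy as np

def first_order_allpass(x, a):
    """First-order allpass H(z) = (a + z^-1) / (1 + a z^-1),
    i.e. y[n] = a*x[n] + x[n-1] - a*y[n-1].
    It imposes a frequency-dependent phase shift while leaving the
    magnitude spectrum untouched (colorless)."""
    y = np.zeros(len(x), dtype=float)
    x_prev = y_prev = 0.0
    for n, xn in enumerate(x):
        y[n] = a * xn + x_prev - a * y_prev
        x_prev, y_prev = xn, y[n]
    return y

def magnitude_response(a, num=256):
    """|H(e^jw)| of the section above; identically 1 for real a."""
    w = np.linspace(0.0, np.pi, num)
    z = np.exp(1j * w)
    return np.abs((a + 1 / z) / (1 + a / z))
```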
METHOD AND APPARATUS FOR DETERMINING PARAMETERS OF A GENERATIVE NEURAL NETWORK
Described herein is a method of determining parameters for a generative neural network for processing an audio signal, wherein the generative neural network includes an encoder stage mapping to a coded feature space and a decoder stage, each stage including a plurality of convolutional layers with one or more weight coefficients, the method comprising a plurality of cycles with sequential processes of: pruning the weight coefficients of either or both stages based on pruning control information, the pruning control information determining the number of weight coefficients that are pruned for respective convolutional layers; training the pruned generative neural network based on a set of training data; determining a loss for the trained and pruned generative neural network based on a loss function; and determining updated pruning control information based on the determined loss and a target loss. Further described are corresponding apparatus, programs, and computer-readable storage media.
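Two pieces of the described cycle can be sketched concretely: pruning a layer's weights according to control information, and updating that control information from the measured loss against a target. Magnitude-based pruning and the fixed step size below are assumptions; the abstract does not commit to either.

```python
import numpy as np

def prune_smallest(weights, ratio):
    """Zero (approximately) the fraction `ratio` of weights with the
    smallest magnitude, a common stand-in for per-layer pruning
    driven by control information."""
    w = weights.copy()
    k = int(ratio * w.size)
    if k:
        thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        w[np.abs(w) <= thresh] = 0.0
    return w

def update_ratio(ratio, loss, target_loss, step=0.05):
    """Prune more aggressively while the measured loss stays below
    the target, back off otherwise -- a simple feedback rule in the
    spirit of the described control loop."""
    ratio += step if loss < target_loss else -step
    return min(max(ratio, 0.0), 0.95)
```

In the full method these two steps alternate with retraining of the pruned network on the training set.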
IMPROVED PEAK DETECTOR
A method of operating an encoder or a decoder. The method comprises receiving an analysis signal of an audio signal and a filtered analysis signal, and combining the filtered analysis signal with the analysis signal to generate a combined signal using a maximum function that provides at least one of a maximum positive value at each index i of the combined signal and a maximum negative value at each index i of the combined signal. The method further comprises identifying broad peaks and narrow peaks of the combined signal.
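The two stages of the claim — per-index maximum combination, then broad/narrow peak classification — can be sketched directly. The width rule (runs above a threshold, with a 3-sample broad/narrow split) is an illustrative assumption, not the claimed criterion.

```python
import numpy as np

def combine_max(analysis, filtered):
    """Per-index maximum of the analysis signal and its filtered
    version (the positive-value branch of the claimed maximum
    function)."""
    return np.maximum(analysis, filtered)

def find_peaks_with_width(combined, threshold):
    """Local maxima above `threshold`, labelled by the width of the
    above-threshold run containing them: more than 3 samples wide is
    called 'broad', otherwise 'narrow' (illustrative split)."""
    peaks = []
    for i in range(1, len(combined) - 1):
        if (combined[i] > threshold
                and combined[i] >= combined[i - 1]
                and combined[i] > combined[i + 1]):
            lo = hi = i
            while lo > 0 and combined[lo - 1] > threshold:
                lo -= 1
            while hi < len(combined) - 1 and combined[hi + 1] > threshold:
                hi += 1
            width = hi - lo + 1
            peaks.append((i, "broad" if width > 3 else "narrow"))
    return peaks
```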