Patent classifications
G10L19/13
AUDIO SIGNAL COMPRESSION METHOD AND APPARATUS USING DEEP NEURAL NETWORK-BASED MULTILAYER STRUCTURE AND TRAINING METHOD THEREOF
A method, executed by a processor for compressing an audio signal in multiple layers, may comprise: (a) restoring, in a highest layer, an input audio signal as a first signal; (b) restoring, in at least one intermediate layer, a signal obtained by subtracting an upsampled signal, which is obtained by upsampling the audio signal restored in the highest layer or an immediately previous intermediate layer, from the input audio signal as a second signal; and (c) restoring, in a lowest layer, a signal obtained by subtracting an upsampled signal, which is obtained by upsampling the audio signal restored in an intermediate layer immediately before the lowest layer, from the input audio signal as a third signal, wherein the first signal, the second signal, and the third signal are combined to output a final restoration audio signal.
AUDIO ENCODER FOR ENCODING A MULTICHANNEL SIGNAL AND AUDIO DECODER FOR DECODING AN ENCODED AUDIO SIGNAL
Audio encoder for encoding a multichannel signal is shown. The audio encoder includes a downmixer for downmixing the multichannel signal to obtain a downmix signal, a linear prediction domain core encoder for encoding the downmix signal, wherein the downmix signal has a low band and a high band, wherein the linear prediction domain core encoder is configured to apply a bandwidth extension processing for parametrically encoding the high band, a filterbank for generating a spectral representation of the multichannel signal, and a joint multichannel encoder configured to process the spectral representation including the low band and the high band of the multichannel signal to generate multichannel information.
AUDIO ENCODER FOR ENCODING A MULTICHANNEL SIGNAL AND AUDIO DECODER FOR DECODING AN ENCODED AUDIO SIGNAL
Audio encoder for encoding a multichannel signal is shown. The audio encoder includes a downmixer for downmixing the multichannel signal to obtain a downmix signal, a linear prediction domain core encoder for encoding the downmix signal, wherein the downmix signal has a low band and a high band, wherein the linear prediction domain core encoder is configured to apply a bandwidth extension processing for parametrically encoding the high band, a filterbank for generating a spectral representation of the multichannel signal, and a joint multichannel encoder configured to process the spectral representation including the low band and the high band of the multichannel signal to generate multichannel information.
ENCODING METHOD, DECODING METHOD, ENCODER FOR PERFORMING ENCODING METHOD, AND DECODER FOR PERFORMING DECODING METHOD
An encoding method, a decoding method, an encoder for performing the encoding method, and a decoder for performing the decoding method are provided. The encoding method includes outputting LP coefficients bitstream and a residual signal by performing an LP analysis on an input signal, outputting a first latent signal obtained by encoding a periodic component of the residual signal, a second latent signal obtained by encoding a non-periodic component of the residual signal, and a weight vector for each of the first latent signal and the second latent signal, using a first neural network module, and outputting a first bitstream obtained by quantizing the first latent signal, a second bitstream obtained by quantizing the second latent signal, and a weight bitstream obtained by quantizing the weight vector, using a quantization module.
ENCODING METHOD, DECODING METHOD, ENCODER FOR PERFORMING ENCODING METHOD, AND DECODER FOR PERFORMING DECODING METHOD
An encoding method, a decoding method, an encoder for performing the encoding method, and a decoder for performing the decoding method are provided. The encoding method includes outputting LP coefficients bitstream and a residual signal by performing an LP analysis on an input signal, outputting a first latent signal obtained by encoding a periodic component of the residual signal, a second latent signal obtained by encoding a non-periodic component of the residual signal, and a weight vector for each of the first latent signal and the second latent signal, using a first neural network module, and outputting a first bitstream obtained by quantizing the first latent signal, a second bitstream obtained by quantizing the second latent signal, and a weight bitstream obtained by quantizing the weight vector, using a quantization module.
AUDIO SIGNAL ENCODING AND DECODING METHOD USING LEARNING MODEL, TRAINING METHOD OF LEARNING MODEL, AND ENCODER AND DECODER THAT PERFORM THE METHODS
An audio signal encoding and decoding method using a learning model, a training method of the learning model, and an encoder and decoder that perform the method, are disclosed. The audio signal decoding method may include extracting a first residual signal and a first linear prediction coefficient by decoding a bitstream received from an encoder, generating a first audio signal from the first residual signal using the first linear prediction coefficient, generating a second linear prediction coefficients and a second residual signal from the first audio signal, obtaining a third linear prediction coefficient by inputting the second linear prediction coefficient into a trained learning model, and generating a second audio signal from the second residual signal using the third linear prediction coefficient.
AUDIO SIGNAL ENCODING AND DECODING METHOD USING LEARNING MODEL, TRAINING METHOD OF LEARNING MODEL, AND ENCODER AND DECODER THAT PERFORM THE METHODS
An audio signal encoding and decoding method using a learning model, a training method of the learning model, and an encoder and decoder that perform the method, are disclosed. The audio signal decoding method may include extracting a first residual signal and a first linear prediction coefficient by decoding a bitstream received from an encoder, generating a first audio signal from the first residual signal using the first linear prediction coefficient, generating a second linear prediction coefficients and a second residual signal from the first audio signal, obtaining a third linear prediction coefficient by inputting the second linear prediction coefficient into a trained learning model, and generating a second audio signal from the second residual signal using the third linear prediction coefficient.
Artificial intelligence based audio coding
Techniques are described for coding audio signals. For example, using a neural network, a residual signal is generated for a sample of an audio signal based on inputs to the neural network. The residual signal is configured to excite a long-term prediction filter and/or a short-term prediction filter. Using the long-term prediction filter and/or the short-term prediction filter, a sample of a reconstructed audio signal is determined. The sample of the reconstructed audio signal is determined based on the residual signal generated using the neural network for the sample of the audio signal.
Artificial intelligence based audio coding
Techniques are described for coding audio signals. For example, using a neural network, a residual signal is generated for a sample of an audio signal based on inputs to the neural network. The residual signal is configured to excite a long-term prediction filter and/or a short-term prediction filter. Using the long-term prediction filter and/or the short-term prediction filter, a sample of a reconstructed audio signal is determined. The sample of the reconstructed audio signal is determined based on the residual signal generated using the neural network for the sample of the audio signal.
AUDIO ENCODER FOR ENCODING A MULTICHANNEL SIGNAL AND AUDIO DECODER FOR DECODING AN ENCODED AUDIO SIGNAL
A schematic block diagram of an audio encoder for encoding a multichannel audio signal is shown. The audio encoder includes a linear prediction domain encoder, a frequency domain encoder, and a controller for switching between the linear prediction domain encoder and the frequency domain encoder. The controller is configured such that a portion of the multichannel signal is represented either by an encoded frame of the linear prediction domain encoder or by an encoded frame of the frequency domain encoder. The linear prediction domain encoder includes a downmixer for downmixing the multichannel signal to obtain a downmixed signal. The linear prediction domain encoder further includes a linear prediction domain core encoder for encoding the downmix signal and furthermore, the linear prediction domain encoder includes a first joint multichannel encoder for generating first multichannel information from the multichannel signal.