Patent classifications
G10L19/08
AUDIO SIGNAL ENCODING METHOD AND APPARATUS, AND AUDIO SIGNAL DECODING METHOD AND APPARATUS
An audio signal encoding method and apparatus, and an audio signal decoding method and apparatus, are described. The encoding method includes obtaining a target frequency-domain coefficient of a current frame and a reference target frequency-domain coefficient of the current frame. The encoding method further includes calculating a cost function based on the target frequency-domain coefficient and the reference target frequency-domain coefficient of the current frame, where the cost function is for determining whether to perform long-term prediction (LTP) processing on the current frame during encoding of the target frequency-domain coefficient of the current frame. Additionally, the method includes encoding the target frequency-domain coefficient of the current frame based on the cost function.
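The abstract does not say what form the cost function takes. As a minimal sketch, assume the cost compares the energy left after predicting the target coefficients from the reference coefficients against the energy of the target itself. The names `ltp_decision`, `threshold`, and the least-squares gain are illustrative assumptions, not the claimed method.

```python
def ltp_decision(target, reference, threshold=0.9):
    """Decide whether LTP looks worthwhile for one frame.

    Assumption: LTP is enabled when the prediction residual carries less
    than `threshold` times the energy of the target coefficients.
    Returns (use_ltp, gain).
    """
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        # Nothing to predict from, so LTP cannot help.
        return False, 0.0
    # Least-squares gain that best scales the reference onto the target.
    gain = sum(x * r for x, r in zip(target, reference)) / ref_energy
    target_energy = sum(x * x for x in target)
    residual_energy = sum((x - gain * r) ** 2 for x, r in zip(target, reference))
    return residual_energy < threshold * target_energy, gain
```

With a reference that is an exact scaled copy of the target, the residual energy is zero and LTP is selected; with a reference orthogonal to the target, the gain is zero and LTP is skipped.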
SIGNAL TRANSFORMATION BASED ON UNIQUE KEY-BASED NETWORK GUIDANCE AND CONDITIONING
A method comprises receiving input audio and target audio having a target audio characteristic. The method includes estimating key parameters that represent the target audio characteristic based on one or more of the target audio and the input audio. The method further comprises configuring a neural network, trained to be configured by the key parameters, with the key parameters to cause the neural network to perform a signal transformation of the input audio, to produce output audio having an output audio characteristic that corresponds to and matches the target audio characteristic.
Method and apparatus for adaptively encoding and decoding high frequency band
Provided are a method and apparatus for encoding and decoding an audio signal. According to the present application, a signal of a high frequency band above a preset frequency band is adaptively encoded or decoded in the time domain or in the frequency domain by using a signal of a low frequency band below the preset frequency band. As such, the sound quality of a high frequency signal does not deteriorate even when an audio signal is encoded or decoded by using a small number of bits, and thus coding efficiency may be maximized.
Method and apparatus for high frequency decoding for bandwidth extension
Disclosed are a method and an apparatus for high frequency decoding for bandwidth extension. The method for high frequency decoding for bandwidth extension comprises the steps of: decoding an excitation class; transforming a decoded low frequency spectrum on the basis of the excitation class; and generating a high frequency excitation spectrum on the basis of the transformed low frequency spectrum. The method and apparatus for high frequency decoding for bandwidth extension according to an embodiment can transform a restored low frequency spectrum and generate a high frequency excitation spectrum, thereby improving the restored sound quality without an excessive increase in complexity.
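The three decoding steps can be sketched as follows. The abstract does not define the excitation classes or the transform, so this sketch assumes two hypothetical classes: class 0 keeps the decoded low-frequency spectrum unchanged, and class 1 flattens it to unit magnitude while keeping the signs. The high-frequency excitation is then assumed to be built by repeating the transformed low band upward.

```python
def transform_low_band(low_spec, excitation_class):
    """Transform the decoded low-frequency spectrum (hypothetical classes)."""
    if excitation_class == 0:
        # Assumed class 0: keep the spectral shape as decoded.
        return list(low_spec)
    # Assumed class 1: keep only the sign of each coefficient.
    return [1.0 if c > 0 else -1.0 if c < 0 else 0.0 for c in low_spec]


def generate_high_band_excitation(low_spec, excitation_class, n_high):
    """Build n_high excitation coefficients from the transformed low band."""
    transformed = transform_low_band(low_spec, excitation_class)
    if not transformed:
        return [0.0] * n_high
    # Repeat the transformed low band until the high band is filled.
    return [transformed[i % len(transformed)] for i in range(n_high)]
```

For example, `generate_high_band_excitation([0.5, -2.0, 0.0], 1, 5)` returns `[1.0, -1.0, 0.0, 1.0, -1.0]`.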
PHASE RECONSTRUCTION IN A SPEECH DECODER
Innovations in phase quantization during speech encoding and phase reconstruction during speech decoding are described. For example, to encode a set of phase values, a speech encoder omits higher-frequency phase values and/or represents at least some of the phase values as a weighted sum of basis functions. Or, as another example, to decode a set of phase values, a speech decoder reconstructs at least some of the phase values using a weighted sum of basis functions and/or reconstructs lower-frequency phase values then uses at least some of the lower-frequency phase values to synthesize higher-frequency phase values. In many cases, the innovations improve the performance of a speech codec in low bitrate scenarios, even when encoded data is delivered over a network that suffers from insufficient bandwidth or transmission quality problems.
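As an illustration of reconstructing phase values from a weighted sum of basis functions, the sketch below assumes sine-shaped basis functions. The abstract does not specify the basis, so that choice and the name `reconstruct_phases` are hypothetical.

```python
import math


def reconstruct_phases(weights, n_phases):
    """Rebuild n_phases phase values from a short list of basis weights.

    Assumption: basis function j is a half-sample-shifted sine with
    j + 1 half cycles across the phase vector.
    """
    phases = []
    for k in range(n_phases):
        value = sum(
            w * math.sin(math.pi * (j + 1) * (k + 0.5) / n_phases)
            for j, w in enumerate(weights)
        )
        phases.append(value)
    return phases
```

Only the weights need to be transmitted, so a frame with many phase values can be described by far fewer numbers when the phase curve is smooth.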
METHOD, APPARATUS AND SYSTEM FOR HYBRID SPEECH SYNTHESIS
A method of decoding an original speech signal for hybrid adversarial-parametric speech synthesis comprising: (a) receiving quantized original linear prediction coding parameters estimated by applying linear prediction coding analysis filtering to an original speech signal and a quantized compressed representation of a residual of the original speech signal; (b) dequantizing the original linear prediction coding parameters and the compressed representation of the residual; (c) inputting the dequantized compressed representation of the residual into a decoder part of a Generator for applying adversarial mapping from the compressed residual domain to a fake (first) signal domain; (d) outputting, by the decoder part of the Generator, a fake speech signal; (e) applying linear prediction coding analysis filtering to the fake speech signal for obtaining a corresponding fake residual; (f) reconstructing the original speech signal by applying linear prediction coding cross-synthesis filtering to the fake residual and the dequantized original linear prediction coding analysis parameters.
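Steps (e) and (f) can be sketched with plain linear prediction filters. The sketch assumes the predictor convention e[n] = s[n] - sum_k a_k s[n-k] and takes the coefficients of the fake speech signal as an input. How those coefficients are estimated, and the Generator itself, are outside this example. All function names are illustrative.

```python
def lpc_analysis(signal, coeffs):
    """Analysis filter: remove the predictable part, leaving the residual."""
    residual = []
    for n, s in enumerate(signal):
        pred = sum(a * signal[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(s - pred)
    return residual


def lpc_synthesis(residual, coeffs):
    """Synthesis filter: rebuild a signal from a residual and coefficients."""
    out = []
    for n, e in enumerate(residual):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out


def cross_synthesis(fake_speech, fake_coeffs, original_coeffs):
    """Shape the fake residual with the dequantized original coefficients."""
    fake_residual = lpc_analysis(fake_speech, fake_coeffs)
    return lpc_synthesis(fake_residual, original_coeffs)
```

When `fake_coeffs` and `original_coeffs` are identical, cross-synthesis returns the fake speech signal unchanged up to rounding, which is a quick sanity check for the filter pair.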
Audio signal encoding and decoding method, and audio signal encoding and decoding apparatus
An audio signal encoding and decoding method, an audio signal encoding and decoding apparatus, a transmitter, a receiver, and a communications system are provided that can improve encoding and/or decoding performance. The audio signal encoding method includes dividing a to-be-encoded time domain signal into a low band signal and a high band signal; encoding the low band signal to obtain a low frequency encoding parameter; calculating a voiced degree factor, and predicting a high band excitation signal; weighting the high band excitation signal and random noise using the voiced degree factor, so as to obtain a synthesized excitation signal; and obtaining a high frequency encoding parameter based on the synthesized excitation signal and the high band signal. Technical solutions in the embodiments of the present invention can improve an encoding or decoding effect.
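The weighting step can be sketched as a per-sample mix of the predicted high band excitation and random noise. The abstract does not give the weighting formula, so the linear mix below, the [0, 1] range of the voiced degree factor, and the uniform noise source are all assumptions.

```python
import random


def synthesize_excitation(high_band_excitation, voiced_factor, noise=None, seed=0):
    """Mix the predicted excitation with noise using the voiced degree factor.

    Assumption: voiced_factor lies in [0, 1]; 1 means fully voiced (no
    noise) and 0 means fully unvoiced (noise only).
    """
    if not 0.0 <= voiced_factor <= 1.0:
        raise ValueError("voiced_factor must lie in [0, 1]")
    if noise is None:
        # Seeded generator so that the sketch is reproducible.
        rng = random.Random(seed)
        noise = [rng.uniform(-1.0, 1.0) for _ in high_band_excitation]
    return [
        voiced_factor * x + (1.0 - voiced_factor) * n
        for x, n in zip(high_band_excitation, noise)
    ]
```

For example, `synthesize_excitation([1.0, 2.0], 0.5, noise=[5.0, 5.0])` returns `[3.0, 3.5]`.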