Patent classifications
G10L19/03
APPARATUS AND METHOD FOR ENCODING OR DECODING AN AUDIO SIGNAL WITH INTELLIGENT GAP FILLING IN THE SPECTRAL DOMAIN
An apparatus for decoding an encoded audio signal includes a spectral domain audio decoder for generating a first decoded representation of a first set of first spectral portions, the first decoded representation having a first spectral resolution; a parametric decoder for generating a second decoded representation of a second set of second spectral portions having a second spectral resolution being lower than the first spectral resolution; a frequency regenerator for regenerating a reconstructed second spectral portion having the first spectral resolution using a first spectral portion and spectral envelope information for the second spectral portion; and a spectrum time converter for converting the first decoded representation and the reconstructed second spectral portion into a time representation.
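The frequency regeneration step can be illustrated with a minimal sketch. It assumes the spectrum is a plain list of coefficients, that a source tile is copied from the decoded first spectral portion, and that a single target energy per gap stands in for the spectral envelope information. The helper name `fill_gap` and all parameters are hypothetical and do not come from the abstract.

```python
import math

def fill_gap(spectrum, src_start, dst_start, width, target_energy):
    """Copy a decoded source tile into a gap and scale it to a target energy.

    Hypothetical helper: target_energy stands in for the transmitted
    spectral envelope information of the second spectral portion.
    """
    tile = spectrum[src_start:src_start + width]
    tile_energy = sum(c * c for c in tile)
    # Avoid dividing by zero when the source tile is silent.
    gain = math.sqrt(target_energy / tile_energy) if tile_energy > 0 else 0.0
    out = list(spectrum)
    out[dst_start:dst_start + width] = [gain * c for c in tile]
    return out

# Bins 0-1 were waveform-coded; bins 2-3 form the gap to be regenerated.
decoded = [3.0, 4.0, 0.0, 0.0]
filled = fill_gap(decoded, src_start=0, dst_start=2, width=2, target_energy=1.0)
```

The regenerated bins keep the fine structure of the source tile, while their overall level follows the transmitted envelope value.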
Apparatus and method for decoding or encoding an audio signal using energy information values for a reconstruction band
An apparatus for decoding an encoded audio signal having an encoded representation of a first set of first spectral portions and an encoded representation of parametric data indicating spectral energies for a second set of second spectral portions has: an audio decoder for decoding the encoded representation of the first set of the first spectral portions to obtain a first set of first spectral portions and for decoding the encoded representation of the parametric data to obtain decoded parametric data for the second set of second spectral portions indicating, for individual reconstruction bands, individual energies; and a frequency regenerator for reconstructing spectral values in a reconstruction band having a second spectral portion using a first spectral portion of the first set of the first spectral portions and an individual energy for the reconstruction band, the reconstruction band having a first spectral portion and the second spectral portion.
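Because a reconstruction band here contains both waveform-coded bins and bins to be regenerated, one plausible reading is that the band energy covers the whole band and the energy already present in the coded bins is subtracted before scaling. The sketch below follows that reading as an assumption; the abstract does not spell out the computation, and `regenerate_band` and its arguments are hypothetical.

```python
import math

def regenerate_band(band, is_coded, source_tile, band_energy):
    """Fill the non-coded bins of one reconstruction band.

    band:        spectral values of the band (non-coded bins hold 0.0)
    is_coded:    True where a bin belongs to the first spectral portion
    source_tile: values taken from elsewhere in the decoded spectrum
    band_energy: energy for the whole band (assumed to be a sum of squares)
    """
    # Energy already present in the waveform-coded bins of this band.
    survived = sum(v * v for v, c in zip(band, is_coded) if c)
    # Energy that the regenerated bins still have to contribute.
    missing = max(band_energy - survived, 0.0)
    tile_energy = sum(t * t for t, c in zip(source_tile, is_coded) if not c)
    gain = math.sqrt(missing / tile_energy) if tile_energy > 0 else 0.0
    return [v if c else gain * t
            for v, t, c in zip(band, source_tile, is_coded)]

band = [2.0, 0.0, 0.0, 0.0]
is_coded = [True, False, False, False]
source_tile = [9.0, 1.0, 2.0, 2.0]
result = regenerate_band(band, is_coded, source_tile, band_energy=13.0)
```

Coded bins pass through unchanged, so the total energy of the band matches the transmitted value whenever the source tile is not silent.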
Audio decoding device, audio coding device, audio decoding method, audio coding method, audio decoding program, and audio coding program
An objective of the present invention is to correct a temporal envelope shape of a decoded signal with a small information volume and to reduce perceptible distortions. An audio decoding device which decodes a coded audio signal and outputs an audio signal comprises: a coded series analysis unit that analyzes a coded series which contains the coded audio signal; an audio decoding unit that receives from the coded series analysis unit the coded series which contains the coded audio signal and decodes it to obtain an audio signal; a temporal envelope shape establishment unit that receives information from the coded series analysis unit and/or the audio decoding unit, and, on the basis of the information, establishes a temporal envelope shape of the decoded audio signal; and a temporal envelope correction unit that, on the basis of the temporal envelope shape which is established with the temporal envelope shape establishment unit, corrects the temporal envelope shape of the decoded audio signal and outputs the corrected signal.
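As an illustration of a temporal envelope correction, the sketch below assumes the established shape is "flat" and equalizes the energy of fixed-length segments while keeping the total energy. The abstract does not say which shapes are available or how the correction is applied, so `flatten_envelope` and the segment layout are assumptions.

```python
import math

def flatten_envelope(signal, num_segments):
    """Scale each segment so that all segments carry the same energy.

    Assumes len(signal) is a multiple of num_segments. Silent segments
    are left untouched, so total energy is only preserved when every
    segment has non-zero energy.
    """
    seg_len = len(signal) // num_segments
    total = sum(x * x for x in signal)
    target = total / num_segments
    out = list(signal)
    for s in range(num_segments):
        lo, hi = s * seg_len, (s + 1) * seg_len
        energy = sum(x * x for x in signal[lo:hi])
        gain = math.sqrt(target / energy) if energy > 0 else 1.0
        out[lo:hi] = [gain * x for x in signal[lo:hi]]
    return out

decoded = [1.0, 1.0, 3.0, 3.0]
corrected = flatten_envelope(decoded, num_segments=2)
```

Only the shape decision would need to be signalled or derived, which is consistent with the stated aim of a small information volume.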
Encoding method, decoding method, encoding apparatus, and decoding apparatus
An encoding method, a decoding method, an encoding apparatus, a decoding apparatus, a transmitter, a receiver, and a communications system. The encoding method includes: dividing a to-be-encoded time-domain signal into a low band signal and a high band signal; performing encoding on the low band signal to obtain a low frequency encoding parameter; performing encoding on the high band signal to obtain a high frequency encoding parameter, and obtaining a synthesized high band signal; performing short-time post-filtering processing on the synthesized high band signal to obtain a short-time filtering signal; and calculating a high frequency gain based on the high band signal and the short-time filtering signal. A technical solution according to the embodiments of the present invention can improve an encoding and/or decoding effect.
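The gain calculation at the end of the encoding method can be sketched as an energy ratio between the original high band signal and the post-filtered synthesized signal. Both the ratio and the first-order filter below are assumptions: the abstract names neither the gain formula nor the filter structure, and `short_time_postfilter`, `high_frequency_gain`, and `mu` are hypothetical names.

```python
import math

def short_time_postfilter(x, mu=0.5):
    """Placeholder first-order FIR filter: y[n] = x[n] - mu * x[n-1].

    Stands in for the short-time post-filter, whose real structure is
    not given in the abstract.
    """
    prev = 0.0
    y = []
    for sample in x:
        y.append(sample - mu * prev)
        prev = sample
    return y

def high_frequency_gain(high_band, filtered):
    """Gain that matches the filtered signal's energy to the high band's."""
    e_ref = sum(s * s for s in high_band)
    e_syn = sum(s * s for s in filtered)
    return math.sqrt(e_ref / e_syn) if e_syn > 0 else 0.0

high_band = [2.0, 2.0]        # original high band signal
synthesized = [1.0, 1.0]      # synthesized high band signal
filtered = short_time_postfilter(synthesized, mu=0.0)  # mu=0 leaves it unchanged
gain = high_frequency_gain(high_band, filtered)
```

Computing the gain against the filtered signal rather than the raw synthesized signal means the transmitted gain already accounts for any level change the post-filter introduces.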
DURATION INFORMED ATTENTION NETWORK (DURIAN) FOR AUDIO-VISUAL SYNTHESIS
A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.
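One common way to use per-component durations is to repeat each text component once for every spectrogram frame it should span, so that text and frames are aligned before frame generation. The sketch below assumes durations are already expressed as whole frame counts; `expand_by_duration` is a hypothetical helper and the duration values are made up for illustration.

```python
def expand_by_duration(components, durations):
    """Repeat each text component once per predicted frame.

    components: sequence of text components (for example phonemes)
    durations:  frame counts, as a duration model might produce them
    """
    if len(components) != len(durations):
        raise ValueError("need exactly one duration per text component")
    frames = []
    for component, count in zip(components, durations):
        frames.extend([component] * count)
    return frames

# Two components lasting 2 and 3 frames respectively.
aligned = expand_by_duration(["h", "i"], [2, 3])
```

The length of the expanded sequence equals the number of spectrogram frames to generate, which also fixes the length of the audio waveform and the accompanying video.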
Apparatus for post-processing an audio signal using a transient location detection
Apparatus for post-processing an audio signal, including: a converter for converting the audio signal into a time-frequency representation; a transient location estimator for estimating a location in time of a transient portion using the audio signal or the time-frequency representation; and a signal manipulator for manipulating the time-frequency representation, wherein the signal manipulator is configured to reduce or eliminate a pre-echo in the time-frequency representation at a location in time before the transient location or to perform a shaping of the time-frequency representation at the transient location to amplify an attack of the transient portion.
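The pre-echo branch of this post-processor can be sketched as two steps: locate the transient, then attenuate the frames just before it. The sketch assumes the time-frequency representation is a list of frames of magnitudes and picks the frame with the largest energy rise as the transient; both functions, the `lookback` window, and the fixed `attenuation` factor are hypothetical simplifications.

```python
def estimate_transient_frame(tf):
    """Return the index of the frame with the largest energy rise."""
    energies = [sum(v * v for v in frame) for frame in tf]
    best, best_rise = 0, 0.0
    for i in range(1, len(energies)):
        rise = energies[i] - energies[i - 1]
        if rise > best_rise:
            best, best_rise = i, rise
    return best

def reduce_pre_echo(tf, transient, lookback=2, attenuation=0.5):
    """Attenuate up to `lookback` frames immediately before the transient."""
    out = [list(frame) for frame in tf]
    for i in range(max(0, transient - lookback), transient):
        out[i] = [attenuation * v for v in out[i]]
    return out

tf = [[0.1, 0.1], [0.2, 0.2], [0.2, 0.2], [4.0, 4.0], [1.0, 1.0]]
transient = estimate_transient_frame(tf)
cleaned = reduce_pre_echo(tf, transient)
```

Frames at and after the transient are left unchanged here; the attack-amplifying alternative described in the abstract would instead modify the transient frame itself.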