Patent classifications
G10L19/02
Audio Signal Encoding Method, Decoding Method, Encoding Device, and Decoding Device
An audio signal encoding method includes obtaining a current frame of an audio signal, where the current frame includes a high frequency band signal and a low frequency band signal; obtaining a parameter of bandwidth extension of the current frame based on the high frequency band signal, the low frequency band signal, and configuration information of the bandwidth extension; obtaining tile information, where the tile information indicates a first frequency range in which tonal component detection needs to be performed on the high frequency band signal; performing tonal component detection in the first frequency range to obtain information about a tonal component of the high frequency band signal; and performing bitstream multiplexing on the parameter of the bandwidth extension and the information of the tonal component to obtain a payload bitstream.
SWITCHING BETWEEN STEREO CODING MODES IN A MULTICHANNEL SOUND CODEC
A method and device for encoding a stereo sound signal comprise stereo encoders using stereo modes operating in time domain (TD), in frequency domain (FD) or in modified discrete Fourier transform (MDCT) domain. A controller controls switching between the TD, FD and MDCT stereo modes. Upon switching from one stereo mode to the other, the switching controller may (a) recalculate at least one length of down-processed/mixed signal in a current frame of the stereo sound signal, (b) reconstruct a down-processed/mixed signal and also other signals related to the other stereo mode in the current frame, (c) adapt data structures and/or memories for coding the stereo sound signal in the current frame using the other stereo mode, and/or (d) alter a TD stereo channel down-mixing to maintain a correct phase of left and right channels of the stereo sound signal. Corresponding stereo sound signal decoding method and device are described.
Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream
An apparatus for decoding data segments representing a time-domain data stream, a data segment being encoded in the time domain or in the frequency domain, a data segment being encoded in the frequency domain having successive blocks of data representing successive and overlapping blocks of time-domain data samples. The apparatus includes a time-domain decoder for decoding a data segment being encoded in the time domain and a processor for processing the data segment being encoded in the frequency domain and output data of the time-domain decoder to obtain overlapping time-domain data blocks. The apparatus further includes an overlap/add-combiner for combining the overlapping time-domain data blocks to obtain a decoded data segment of the time-domain data stream.
Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream
An apparatus for decoding data segments representing a time-domain data stream, a data segment being encoded in the time domain or in the frequency domain, a data segment being encoded in the frequency domain having successive blocks of data representing successive and overlapping blocks of time-domain data samples. The apparatus includes a time-domain decoder for decoding a data segment being encoded in the time domain and a processor for processing the data segment being encoded in the frequency domain and output data of the time-domain decoder to obtain overlapping time-domain data blocks. The apparatus further includes an overlap/add-combiner for combining the overlapping time-domain data blocks to obtain a decoded data segment of the time-domain data stream.
Reconstruction of audio scenes from a downmix
Audio objects are associated with positional metadata. A received downmix signal comprises downmix channels that are linear combinations of one or more audio objects and are associated with respective positional locators. In a first aspect, the downmix signal, the positional metadata and frequency-dependent object gains are received. An audio object is reconstructed by applying the object gain to an upmix of the downmix signal in accordance with coefficients based on the positional metadata and the positional locators. In a second aspect, audio objects have been encoded together with at least one bed channel positioned at a positional locator of a corresponding downmix channel. The decoding system receives the downmix signal and the positional metadata of the audio objects. A bed channel is reconstructed by suppressing the content representing audio objects from the corresponding downmix channel on the basis of the positional locator of the corresponding downmix channel.
Signal processing method and device
A signal processing method and device includes obtaining spectral coefficients of a current frame of an audio signal, in which N sub-bands of the current frame comprises at least one of the spectral coefficients. A total energy of M successive sub-bands of the N sub-bands, a total energy of K successive sub-bands of the N sub-bands, and an energy of a first sub-band are obtained to determine whether to modify original envelope values of the M sub-bands. When the original envelope values of the M sub-bands are modified, encoding bits are allocated to each of the N sub-bands according to the modified envelope values of the M sub-bands.
Signal processing method and device
A signal processing method and device includes obtaining spectral coefficients of a current frame of an audio signal, in which N sub-bands of the current frame comprises at least one of the spectral coefficients. A total energy of M successive sub-bands of the N sub-bands, a total energy of K successive sub-bands of the N sub-bands, and an energy of a first sub-band are obtained to determine whether to modify original envelope values of the M sub-bands. When the original envelope values of the M sub-bands are modified, encoding bits are allocated to each of the N sub-bands according to the modified envelope values of the M sub-bands.
AUDIO SIGNAL CODING METHOD AND APPARATUS
An audio signal coding method is provided that includes: obtaining a current frame of an audio signal; obtaining a coding parameter based on a power spectrum ratio of a current frequency in a current frequency area of at least a part of signals of the current frame, where the coding parameter indicates tonal component information of the at least a part of signals, the tonal component information includes at least one of location information of a tonal component, quantity information of tonal components, amplitude information of the tonal component, or energy information of the tonal component, and the power spectrum ratio of the current frequency is a ratio of a value of a power spectrum of the current frequency to a mean value of power spectrums of the current frequency area; and performing bitstream multiplexing on the coding parameter to obtain a coded bitstream.
AUDIO SIGNAL CODING METHOD AND APPARATUS
An audio signal coding method is provided that includes: obtaining a current frame of an audio signal; obtaining a coding parameter based on a power spectrum ratio of a current frequency in a current frequency area of at least a part of signals of the current frame, where the coding parameter indicates tonal component information of the at least a part of signals, the tonal component information includes at least one of location information of a tonal component, quantity information of tonal components, amplitude information of the tonal component, or energy information of the tonal component, and the power spectrum ratio of the current frequency is a ratio of a value of a power spectrum of the current frequency to a mean value of power spectrums of the current frequency area; and performing bitstream multiplexing on the coding parameter to obtain a coded bitstream.
Reducing Perceived Effects of Non-Voice Data in Digital Speech
Non-voice data is embedded in a voice bit stream that includes frames of voice bits by selecting a frame of voice bits to carry the non-voice data, placing non-voice identifier bits in a first portion of the voice bits in the selected frame, and placing the non-voice data in a second portion of the voice bits in the selected frame. The non-voice identifier bits are employed to reduce a perceived effect of the non-voice data on audible speech produced from the voice bit stream.