G10L19/0212

Post-Quantization Gain Correction in Audio Coding
20170330573 · 2017-11-16

A gain adjustment apparatus for use in decoding of audio that has been encoded with separate gain and shape representations includes an accuracy meter configured to estimate an accuracy measure of the shape representation, and to determine a gain correction based on the estimated accuracy measure. An envelope adjuster further included in the apparatus is configured to adjust the gain representation based on the determined gain correction.
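
As a rough illustration of the idea in this abstract, the sketch below decodes one band from a gain and a unit-norm shape, and attenuates the gain when the shape is judged inaccurate. The abstract does not say how the accuracy measure or the correction is computed, so the pulses-per-dimension measure, the linear mapping, and the `floor` parameter are all hypothetical choices made for this example only.

```python
import math

def split_gain_shape(band):
    """Split a band into a scalar gain (its norm) and a unit-norm shape."""
    gain = math.sqrt(sum(c * c for c in band))
    if gain == 0.0:
        return 0.0, [0.0] * len(band)
    return gain, [c / gain for c in band]

def accuracy_measure(pulses, dimension):
    """Hypothetical accuracy measure: pulses spent per coefficient, capped at 1."""
    return min(1.0, pulses / dimension)

def gain_correction(accuracy, floor=0.7):
    """Hypothetical mapping: full gain at accuracy 1, `floor` at accuracy 0."""
    return floor + (1.0 - floor) * accuracy

def decode_band(gain, shape, pulses):
    """Adjust the decoded gain by the correction, then rescale the shape."""
    correction = gain_correction(accuracy_measure(pulses, len(shape)))
    adjusted_gain = gain * correction
    return [adjusted_gain * s for s in shape]
```

With this mapping a band whose shape received at least one pulse per coefficient keeps its transmitted gain, while a band with no pulses is scaled by `floor`.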

DOWNSCALED DECODING

A downscaled version of an audio decoding procedure may be achieved more effectively and/or with better compliance maintenance if the synthesis window used for downscaled audio decoding is a downsampled version of the reference synthesis window used in the non-downscaled audio decoding procedure. The reference window is downsampled by the downsampling factor by which the downsampled sampling rate and the original sampling rate deviate, using a segmental interpolation in segments of ¼ of the frame length.

Encoding device and decoding device

An encoding device (200) includes an MDCT unit (202) that transforms an input signal in a time domain into a frequency spectrum including a lower frequency spectrum, a BWE encoding unit (204) that generates extension data which specifies a higher frequency spectrum at a higher frequency than the lower frequency spectrum, and an encoded data stream generating unit (205) that encodes and outputs the lower frequency spectrum obtained by the MDCT unit (202) and the extension data obtained by the BWE encoding unit (204). The BWE encoding unit (204) generates as the extension data (i) a first parameter which specifies a lower subband which is to be copied as the higher frequency spectrum from among a plurality of the lower subbands which form the lower frequency spectrum obtained by the MDCT unit (202) and (ii) a second parameter which specifies a gain of the lower subband after being copied.
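
The decoder-side use of the two parameters might look like the minimal sketch below: each higher subband is rebuilt by copying the lower subband named by the first parameter and scaling it by the gain given by the second. The function name `extend_spectrum` and the layout of the extension data as `(source_index, gain)` pairs are assumptions for illustration; the abstract does not define a data format.

```python
def extend_spectrum(lower_subbands, extension_data):
    """Rebuild higher subbands from copied, gain-scaled lower subbands.

    lower_subbands: list of subbands, each a list of spectral coefficients.
    extension_data: one (source_index, gain) pair per higher subband
    (hypothetical layout): source_index plays the role of the first
    parameter, gain the role of the second.
    """
    higher_subbands = []
    for source_index, gain in extension_data:
        source = lower_subbands[source_index]
        higher_subbands.append([gain * c for c in source])
    return higher_subbands

# Three lower subbands of two coefficients each (made-up values).
lower = [[1.0, -2.0], [0.5, 0.25], [4.0, 0.0]]
# Two higher subbands: copy subband 2 at half gain, subband 0 at a tenth.
higher = extend_spectrum(lower, [(2, 0.5), (0, 0.1)])
```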

Speech decoder with high-band generation and temporal envelope shaping
09779744 · 2017-10-03

A linear prediction coefficient of a signal represented in a frequency domain is obtained by performing linear prediction analysis in a frequency direction by using a covariance method or an autocorrelation method. After the filter strength of the obtained linear prediction coefficient is adjusted, filtering may be performed in the frequency direction on the signal by using the adjusted coefficient, whereby the temporal envelope of the signal is shaped. This reduces the occurrence of pre-echo and post-echo and improves the subjective quality of the decoded signal, without significantly increasing the bit rate in a bandwidth extension technique in the frequency domain represented by SBR.
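
The three steps in this abstract (prediction analysis along frequency, strength adjustment, filtering along frequency) can be sketched as follows. This is an illustration under several assumptions: the coefficients of one time slot are taken as real-valued (SBR subband samples are complex), the autocorrelation method is solved with the Levinson-Durbin recursion, the strength adjustment is modelled as the common scaling of the k-th coefficient by `rho**k`, and the filter is shown in its FIR (prediction-error) form. The abstract does not fix any of these details.

```python
def autocorrelation(x, order):
    """Autocorrelation of x for lags 0..order, taken along the frequency index."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for a[0..order] with a[0] = 1, so that the prediction error is
    e[k] = sum_j a[j] * x[k - j]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input, keep the coefficients found so far
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a

def adjust_strength(a, rho):
    """Weaken the filter by scaling coefficient j by rho**j, 0 <= rho <= 1."""
    return [c * rho ** j for j, c in enumerate(a)]

def filter_along_frequency(spectrum, a):
    """Apply the FIR filter a across frequency bins of one time slot."""
    return [sum(a[j] * spectrum[k - j] for j in range(len(a)) if k - j >= 0)
            for k in range(len(spectrum))]
```

Setting `rho` to 1 keeps the full filter and setting it to 0 disables the filtering, so `rho` acts as the strength control in this sketch.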

Method and apparatus for processing an audio signal

The present invention relates to a method for processing an audio signal, comprising: a step of performing a frequency conversion process on an audio signal to obtain a plurality of frequency transform coefficients; a step of selecting either a general mode or a non-general mode, on the basis of a pulse ratio, for the frequency transform coefficients having a high frequency band from among the plurality of frequency transform coefficients; and a step of performing, if the non-general mode is selected, the following steps: extracting a predetermined number of pulses from the frequency transform coefficients having the high frequency band, and generating pulse information; generating an original noise signal from the frequency transform coefficients having the high frequency band, excluding the pulses; generating a reference noise signal using the frequency transform coefficient having a low frequency band from among the plurality of frequency transform coefficients; and generating noise position information and noise energy information using the original noise signal and the reference noise signal.

Methods, encoder and decoder for handling envelope representation coefficients

A method performed by an encoder. The method comprises determining envelope representation residual coefficients as first compressed envelope representation coefficients subtracted from the input envelope representation coefficients. The method comprises transforming the envelope representation residual coefficients into a warped domain so as to obtain transformed envelope representation residual coefficients. The method comprises applying at least one of a plurality of gain-shape coding schemes on the transformed envelope representation residual coefficients in order to achieve gain-shape coded envelope representation residual coefficients, where the plurality of gain-shape coding schemes have mutually different trade-offs in one or more of gain resolution and shape resolution for one or more of the transformed envelope representation residual coefficients. The method comprises transmitting, over a communication channel to a decoder, a representation of the first compressed envelope representation coefficients, the gain-shape coded envelope representation residual coefficients, and information on the at least one applied gain-shape coding scheme.

Methods and apparatus to identify sources of network streaming services using windowed sliding transforms

Methods and apparatus to identify sources of network streaming services using windowed sliding transforms are disclosed. An example apparatus includes a windowed sliding transformer to perform a first time-frequency analysis of a first block of a first received audio signal according to a first trial compression configuration, and perform a second time-frequency analysis of the first block of the first audio signal according to a second trial compression configuration, wherein the windowed sliding transformer includes a multiplier to multiply a vector including a first frequency-domain representation and a matrix including a third frequency-domain representation, a coding format identifier to identify, from the received first audio signal representing a decompressed second audio signal, an audio compression configuration used to compress a third audio signal to form the second audio signal, wherein the audio compression configuration is the first trial compression configuration or the second trial compression configuration, and a source identifier to identify a source of the second audio signal based on the identified audio compression configuration.

ENCODING OF MULTIPLE AUDIO SIGNALS

A device includes an encoder and a transmitter. The encoder is configured to determine a mismatch value indicative of an amount of temporal mismatch between a reference channel and a target channel. The encoder is also configured to determine whether to perform a first temporal-shift operation on the target channel at least based on the mismatch value and a coding mode to generate an adjusted target channel. The encoder is further configured to perform a first transform operation on the reference channel to generate a frequency-domain reference channel and perform a second transform operation on the adjusted target channel to generate a frequency-domain adjusted target channel. The encoder is also configured to estimate one or more stereo cues based on the frequency-domain reference channel and the frequency-domain adjusted target channel. The transmitter is configured to transmit the one or more stereo cues to a receiver.
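
One way to picture the mismatch value and the temporal-shift operation is the sketch below, which picks the lag that maximises the cross-correlation between the two channels and then shifts the target by that lag. Cross-correlation is an assumed estimator chosen for illustration; the abstract does not say how the mismatch is measured. The decision on whether to shift (which the abstract ties to the coding mode), the transforms, and the stereo cue estimation are left out.

```python
def estimate_mismatch(reference, target, max_shift):
    """Return the lag (in samples) that best aligns target with reference.

    A positive lag means the target is delayed relative to the reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_shift, max_shift + 1):
        score = sum(reference[n] * target[n + lag]
                    for n in range(len(reference))
                    if 0 <= n + lag < len(target))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def shift_target(target, lag):
    """Temporal-shift operation: adjusted[n] = target[n + lag], zero padded."""
    return [target[n + lag] if 0 <= n + lag < len(target) else 0.0
            for n in range(len(target))]

# Made-up example: the target is the reference delayed by three samples.
reference = [0.0, 0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
target = [0.0, 0.0, 0.0] + reference[:-3]
lag = estimate_mismatch(reference, target, max_shift=5)
adjusted_target = shift_target(target, lag)
```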

Method for audio source separation and corresponding apparatus

Separation of speech and background from an audio mixture by using a speech example, generated from a source associated with a speech component in the audio mixture, to guide the separation process.

Spectrally orthogonal audio component processing
11432069 · 2022-08-30

A system processes an audio signal using spectrally orthogonal sound components. The system includes circuitry that generates a mid component and a side component from a left channel and a right channel of the audio signal. The circuitry generates a hyper mid component including spectral energy of the side component removed from spectral energy of the mid component. The circuitry filters the hyper mid component, such as to provide spatial cue processing including panning or binaural processing, dynamic range processing, or other types of processing. The circuitry generates a left output channel and a right output channel using the filtered hyper mid component.
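
The mid, side, and hyper mid components can be illustrated with the sketch below. It assumes the usual half-sum and half-difference definitions of mid and side, and it interprets "spectral energy of the side component removed from spectral energy of the mid component" as a per-bin magnitude subtraction that keeps the phase of the mid component. That interpretation, the naive DFT, and all function names are assumptions for this example; the filtering stage and the generation of the output channels are omitted.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (adequate for short examples)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                for n in range(n_pts)) for k in range(n_pts)]

def idft_real(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n_pts = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
                 for k in range(n_pts)) / n_pts).real for n in range(n_pts)]

def mid_side(left, right):
    """Mid is the half-sum of the channels, side the half-difference."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def hyper_mid(mid, side):
    """Subtract the side magnitude from the mid magnitude in each bin,
    clamp at zero, and keep the phase of the mid component."""
    out = []
    for m, s in zip(dft(mid), dft(side)):
        magnitude = max(abs(m) - abs(s), 0.0)
        out.append(cmath.rect(magnitude, cmath.phase(m)))
    return idft_real(out)
```

Under these assumptions a signal with identical left and right channels has a zero side component, so its hyper mid equals its mid, while a signal present in only one channel has equal mid and side magnitudes and therefore an all-zero hyper mid.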