AUDIO DECODER, AUDIO ENCODER, AND RELATED METHODS USING JOINT CODING OF SCALE PARAMETERS FOR CHANNELS OF A MULTI-CHANNEL AUDIO SIGNAL

Abstract

Audio decoder for decoding an encoded audio signal having multi-channel audio data having data for two or more audio channels, and information on jointly encoded scale parameters, having: a scale parameter decoder for decoding the information on the jointly encoded scale parameters to obtain a first and a second set of scale parameters for a first channel and a second channel, respectively, of a decoded audio signal; and a signal processor for applying the first and second sets of scale parameters to a first and second channel representation, respectively, derived from the multi-channel audio data to obtain the first and second channels of the decoded audio signal, wherein the jointly encoded scale parameters have information on a first group and on a second group of jointly encoded scale parameters, and wherein the scale parameter decoder is configured to combine a jointly encoded scale parameter of the first group and one of the second group using a first and a second combination rule, respectively, to obtain a scale parameter of the first and second sets of scale parameters.
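The combination rules in the abstract are made concrete in claims 2 and 20, which name an addition for the mid (first) group and a subtraction for the side (second) group. The Python sketch below is illustrative only. It assumes that scale parameters are held as plain lists, and it applies an encoder-side factor of 0.5, which is an assumption because the claims fix only addition and subtraction. It also passes `None` to stand in for the all-zero side information of claims 8 and 15. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def encode_joint(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Encoder side: mid group by addition, side group by subtraction.

    The 0.5 factor is an assumption; it makes the decoder rules exact inverses.
    """
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def decode_joint(mid: List[float], side: Optional[List[float]]) -> Tuple[List[float], List[float]]:
    """Decoder side: first combination rule adds, second rule subtracts.

    side=None models a hypothetical 'all zero' side flag: the side values are
    treated as zeros, so both channels receive the mid scale parameters.
    """
    if side is None:
        side = [0.0] * len(mid)
    first = [m + s for m, s in zip(mid, side)]
    second = [m - s for m, s in zip(mid, side)]
    return first, second
```

For example, `encode_joint([1.0, 2.0], [3.0, 2.0])` yields mid values `[2.0, 2.0]` and side values `[-1.0, 0.0]`, and `decode_joint` on that pair recovers the two original sets. When the channels have similar spectral envelopes, the side values cluster near zero, which is what makes a coarser or flag-only coding of the second group attractive.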

Claims

1. An audio decoder for decoding an encoded audio signal comprising multi-channel audio data comprising data for two or more audio channels, and information on jointly encoded scale parameters, comprising: a scale parameter decoder for decoding the information on the jointly encoded scale parameters to acquire a first set of scale parameters for a first channel of a decoded audio signal and a second set of scale parameters for a second channel of the decoded audio signal; and a signal processor for applying the first set of scale parameters to a first channel representation derived from the multi-channel audio data and for applying the second set of scale parameters to a second channel representation derived from the multi-channel audio data to acquire the first channel and the second channel of the decoded audio signal, wherein the jointly encoded scale parameters comprise information on a first group of jointly encoded scale parameters and information on a second group of jointly encoded scale parameters, and wherein the scale parameter decoder is configured to combine a jointly encoded scale parameter of the first group and a jointly encoded scale parameter of the second group using a first combination rule to acquire a scale parameter of the first set of scale parameters, and using a second combination rule being different from the first combination rule to acquire a scale parameter of the second set of scale parameters.

2. The audio decoder of claim 1, wherein the first group of jointly encoded scale parameters comprises mid scale parameters and the second group of jointly encoded scale parameters comprises side scale parameters, and wherein the scale parameter decoder is configured to use, in the first combination rule, an addition, and to use, in the second combination rule, a subtraction.

3. The audio decoder of claim 1, wherein the encoded audio signal is organized in a sequence of frames, wherein a first frame comprises the multi-channel audio data and the information on the jointly encoded scale parameters, and wherein a second frame comprises separately encoded scale parameter information, and wherein the scale parameter decoder is configured to detect that the second frame comprises the separately encoded scale parameter information and to calculate the first set of scale parameters and the second set of scale parameters.

4. The audio decoder of claim 3, wherein the first frame and the second frame each comprise a state side information indicating, in a first state, that the first frame comprises the information on the jointly encoded scale parameters and, in a second state, that the second frame comprises the separately encoded scale parameter information, and wherein the scale parameter decoder is configured to read the state side information of the second frame, to detect that the second frame comprises the separately encoded scale parameter information based on the state side information read, or to read the state side information of the first frame, and to detect that the first frame comprises the information on the jointly encoded scale parameters using the state side information read.

5. The audio decoder of claim 1, wherein the signal processor is configured to decode the multi-channel audio data to derive the first channel representation and the second channel representation, wherein the first channel representation and the second channel representation are spectral domain representations comprising spectral sampling values, and wherein the signal processor is configured to apply each scale parameter of the first set and the second set to a corresponding plurality of the spectral sampling values to acquire a shaped spectral representation of the first channel and a shaped spectral representation of the second channel.

6. The audio decoder of claim 5, wherein the signal processor is configured to convert the shaped spectral representation of the first channel and the shaped spectral representation of the second channel into a time domain to acquire a time domain representation of the first channel and a time domain representation of the second channel of the decoded audio signal.

7. The audio decoder of claim 1, wherein the first channel representation comprises a first number of bands, wherein the first set of scale parameters comprises a second number of scale parameters, the second number being lower than the first number, and wherein the signal processor is configured to interpolate the second number of scale parameters to acquire a number of interpolated scale parameters being greater than or equal to the first number of bands, and wherein the signal processor is configured to scale the first channel representation using the interpolated scale parameters, or wherein the first channel representation comprises a first number of bands, wherein the information on the first group of jointly encoded scale parameters comprises a second number of jointly encoded scale parameters, the second number being lower than the first number, wherein the scale parameter decoder is configured to interpolate the second number of jointly encoded scale parameters to acquire a number of interpolated jointly encoded scale parameters being greater than or equal to the first number of bands, and wherein the scale parameter decoder is configured to process the interpolated jointly encoded scale parameters to determine the first set of scale parameters and the second set of scale parameters.

8. The audio decoder of claim 1, wherein the encoded audio signal is organized in a sequence of frames, wherein the information on the second group of jointly encoded scale parameters comprises, in a certain frame, a zero side information, wherein the scale parameter decoder is configured to detect the zero side information to determine that the second group of jointly encoded scale parameters are all zero for the certain frame, and wherein the scale parameter decoder is configured to derive the scale parameters of the first set of scale parameters and the second set of scale parameters only from the first group of jointly encoded scale parameters, or to set, in the combining of the jointly encoded scale parameter of the first group and the jointly encoded scale parameter of the second group, the jointly encoded scale parameters of the second group to zero values or to values smaller than a noise threshold.

9. The audio decoder of claim 1, wherein the scale parameter decoder is configured to de-quantize the information on the first group of jointly encoded scale parameters using a first de-quantization mode, and to de-quantize the information on the second group of jointly encoded scale parameters using a second de-quantization mode, the second de-quantization mode being different from the first de-quantization mode.

10. The audio decoder of claim 9, wherein the scale parameter decoder is configured to use the second de-quantization mode with an associated quantization precision that is lower or higher than that of the first de-quantization mode.

11. The audio decoder of claim 9, wherein the scale parameter decoder is configured to use, as the first de-quantization mode, a first de-quantization stage and a second de-quantization stage and a combiner, the combiner receiving, as an input, a result of the first de-quantization stage and a result of the second de-quantization stage, and to use, as the second de-quantization mode, the second de-quantization stage of the first de-quantization mode receiving, as an input, the information on the second group of jointly encoded scale parameters.

12. The audio decoder of claim 11, wherein the first de-quantization stage is a vector de-quantization stage and wherein the second de-quantization stage is an algebraic vector de-quantization stage, or wherein the first de-quantization stage is a fixed rate de-quantization stage and wherein the second de-quantization stage is a variable rate de-quantization stage.

13. The audio decoder of claim 11, wherein the information on the first group of jointly encoded scale parameters comprises, for a frame of the encoded audio signal, two or more indexes and wherein the information on the second group of jointly encoded scale parameters comprises a single index or a lower number of indexes or the same number of indexes as in the first group, and wherein the scale parameter decoder is configured to determine, in the first de-quantization stage, e.g., for each index of the two or more indexes, intermediate jointly encoded scale parameters of the first group, and wherein the scale parameter decoder is configured to calculate, in the second de-quantization stage, residual jointly encoded scale parameters of the first group, e.g., from the single or lower or the same number of indexes of the information on the first group of jointly encoded scale parameters, and to calculate, by the combiner, the first group of jointly encoded scale parameters from the intermediate jointly encoded scale parameters of the first group and the residual jointly encoded scale parameters of the first group.

14. The audio decoder of claim 11, wherein the first de-quantization stage comprises using an index for a first codebook comprising a first number of entries or using an index representing a first precision, wherein the second de-quantization stage comprises using an index for a second codebook comprising a second number of entries or using an index representing a second precision, and wherein the second number is lower or higher than the first number or the second precision is lower or higher than the first precision.

15. The audio decoder of claim 1, wherein the information on the second group of jointly encoded scale parameters indicates that the second group of jointly encoded scale parameters are all zero or at a certain value for a frame of the encoded audio signal, and wherein the scale parameter decoder is configured to use, in the combining using the first rule or the second rule, a jointly encoded scale parameter being zero or being at the certain value or being a synthesized jointly encoded scale parameter, or wherein, for the frame comprising the all zero or certain value information, the scale parameter decoder is configured to determine the second set of scale parameters only using the first group of jointly encoded scale parameters without a combining operation.

16. The audio decoder of claim 9, wherein the scale parameter decoder is configured to use, as the first de-quantization mode, a first de-quantization stage, a second de-quantization stage, and a combiner, the combiner receiving, as an input, a result of the first de-quantization stage and a result of the second de-quantization stage, and to use, as the second de-quantization mode, the first de-quantization stage of the first de-quantization mode.

17. An audio encoder for encoding a multi-channel audio signal comprising two or more channels, comprising: a scale parameter calculator for calculating a first group of jointly encoded scale parameters and a second group of jointly encoded scale parameters from a first set of scale parameters for a first channel of the multi-channel audio signal and from a second set of scale parameters for a second channel of the multi-channel audio signal; a signal processor for applying the first set of scale parameters to the first channel of the multi-channel audio signal and for applying the second set of scale parameters to the second channel of the multi-channel audio signal and for deriving multi-channel audio data; and an encoded signal former for using the multi-channel audio data and information on the first group of jointly encoded scale parameters and information on the second group of jointly encoded scale parameters to acquire an encoded multi-channel audio signal.

18. The audio encoder of claim 17, wherein the signal processor is configured, in the applying, to encode the first group of jointly encoded scale parameters and the second group of jointly encoded scale parameters to acquire the information on the first group of jointly encoded scale parameters and the information on the second group of jointly encoded scale parameters, to locally decode the information on the first and the second groups of jointly encoded scale parameters to acquire a locally decoded first set of scale parameters and a locally decoded second set of scale parameters, and to scale the first channel using the locally decoded first set of scale parameters and to scale the second channel using the locally decoded second set of scale parameters, or wherein the signal processor is configured, in the applying, to quantize the first group of jointly encoded scale parameters and the second group of jointly encoded scale parameters to acquire a quantized first group of jointly encoded scale parameters and a quantized second group of jointly encoded scale parameters, to locally decode the quantized first and the second groups of jointly encoded scale parameters to acquire a locally decoded first set of scale parameters and a locally decoded second set of scale parameters, and to scale the first channel using the locally decoded first set of scale parameters and to scale the second channel using the locally decoded second set of scale parameters.

19. The audio encoder of claim 17, wherein the scale parameter calculator is configured to combine a scale parameter of the first set of scale parameters and a scale parameter of the second set of scale parameters using a first combination rule to acquire a jointly encoded scale parameter of the first group of jointly encoded scale parameters, and using a second combination rule different from the first combination rule to acquire a jointly encoded scale parameter of the second group of jointly encoded scale parameters.

20. The audio encoder of claim 19, wherein the first group of jointly encoded scale parameters comprises mid scale parameters and the second group of jointly encoded scale parameters comprises side scale parameters, and wherein the scale parameter calculator is configured to use, in the first combination rule, an addition, and to use, in the second combination rule, a subtraction.

21. The audio encoder of claim 17, wherein the scale parameter calculator is configured to process a sequence of frames of the multi-channel audio signal, wherein the scale parameter calculator is configured to calculate first and second groups of jointly encoded scale parameters for a first frame of the sequence of frames, and to analyze a second frame of the sequence of frames to determine a separate coding mode for the second frame, and wherein the encoded signal former is configured to introduce a state side information into the encoded audio signal indicating a separate encoding mode for the second frame or a joint encoding mode for the first frame, and information on the first set and the second set of separately encoded scale parameters for the second frame.

22. The audio encoder of claim 17, wherein the scale parameter calculator is configured to calculate the first set of scale parameters for the first channel and the second set of scale parameters for the second channel, to downsample the first and the second sets of scale parameters to acquire a downsampled first set and a downsampled second set; and to combine a scale parameter from the downsampled first set and the downsampled second set using different combination rules to acquire a jointly encoded scale parameter of the first group and a jointly encoded scale parameter of the second group, or wherein the scale parameter calculator is configured to calculate the first set of scale parameters for the first channel and the second set of scale parameters for the second channel, to combine a scale parameter from the first set and a scale parameter from the second set using different combination rules to acquire a jointly encoded scale parameter of the first group and a jointly encoded scale parameter of the second group, and to downsample the first group of jointly encoded scale parameters to acquire a downsampled first group of jointly encoded scale parameters, and to downsample the second group of jointly encoded scale parameters to acquire a downsampled second group of jointly encoded scale parameters, wherein the downsampled first group and the downsampled second group represent the information on the first group of jointly encoded scale parameters and the information on the second group of jointly encoded scale parameters.

23. The audio encoder of claim 21, wherein the scale parameter calculator is configured to calculate a similarity of the first channel and the second channel in the second frame and to determine the separate encoding mode in case a calculated similarity is in a first relation to a threshold or to determine the joint encoding mode in case the calculated similarity is in a different second relation to the threshold.

24. The audio encoder of claim 23, wherein the scale parameter calculator is configured to calculate, for the second frame, a difference between the scale parameter of the first set and the scale parameter of the second set for each band, to process each difference of the second frame so that negative signs are removed to acquire processed differences of the second frame, to combine the processed differences to acquire a similarity measure, to compare the similarity measure to the threshold, and to decide in favor of the separate coding mode, when the similarity measure is greater than the threshold, or to decide in favor of the joint coding mode, when the similarity measure is lower than the threshold.

25. The audio encoder of claim 17, wherein the signal processor is configured to quantize the first group of jointly encoded scale parameters using a first stage quantization function to acquire one or more first quantization indexes as a first stage result and to acquire an intermediate first group of jointly encoded scale parameters, to calculate a residual first group of jointly encoded scale parameters from the first group of jointly encoded scale parameters and the intermediate first group of jointly encoded scale parameters, and to quantize the residual first group of jointly encoded scale parameters using a second stage quantization function to acquire one or more quantization indexes as a second stage result.

26. The audio encoder of claim 17, wherein the signal processor is configured to quantize the second group of jointly encoded scale parameters using a single stage quantization function to acquire one or more quantization indexes as the single stage result, or wherein the signal processor is configured for quantizing the first group of jointly encoded scale parameters using at least a first stage quantization function and a second stage quantization function, and wherein the signal processor is configured for quantizing the second group of jointly encoded scale parameters using a single stage quantization function, wherein the single stage quantization function is selected from the first stage quantization function and the second stage quantization function.

27. The audio encoder of claim 21, wherein the scale parameter calculator is configured to quantize the first set of scale parameters using a first stage quantization function to acquire one or more first quantization indexes as a first stage result and to acquire an intermediate first set of scale parameters, to calculate a residual first set of scale parameters from the first set of scale parameters and the intermediate first set of scale parameters, and to quantize the residual first set of scale parameters using a second stage quantization function to acquire one or more quantization indexes as a second stage result, or wherein the scale parameter calculator is configured to quantize the second set of scale parameters using a first stage quantization function to acquire one or more first quantization indexes as a first stage result and to acquire an intermediate second set of scale parameters, to calculate a residual second set of scale parameters from the second set of scale parameters and the intermediate second set of scale parameters, and to quantize the residual second set of scale parameters using a second stage quantization function to acquire one or more quantization indexes as a second stage result.

28. The audio encoder of claim 25, wherein the second stage quantization function uses an amplification or weighting value lower than 1 to increase the residual first group of jointly encoded scale parameters or the residual first or second set of scale parameters before performing a vector quantization, wherein the vector quantization is performed using the increased residual values, and/or wherein, exemplarily, the weighting or amplification value is used to divide a scale parameter by the weighting or amplification value, wherein the weighting value is advantageously between 0.1 and 0.9, or more advantageously between 0.2 and 0.6, or even more advantageously between 0.25 and 0.4, and/or wherein the same amplification value is used for all scale parameters of the residual first group of jointly encoded scale parameters or the residual first or second set of scale parameters.

29. The audio encoder of claim 25, wherein the first stage quantization function comprises at least one codebook with a first number of entries corresponding to a first size of the one or more quantization indexes, wherein the second stage quantization function or the single stage quantization function comprises at least one codebook with a second number of entries corresponding to a second size of the one or more quantization indexes, and wherein the first number is greater or lower than the second number or the first size is greater or lower than the second size, or wherein the first stage quantization function is a fixed rate quantization function and wherein the second stage quantization function is a variable rate quantization function.

30. The audio encoder of claim 17, wherein the scale parameter calculator is configured to receive a first MDCT representation for the first channel and a second MDCT representation for the second channel, to receive a first MDST representation for the first channel and a second MDST representation for the second channel, to calculate a first power spectrum for the first channel from the first MDCT representation and the first MDST representation and a second power spectrum for the second channel from the second MDCT representation and the second MDST representation, and to calculate the first set of scale parameters for the first channel from the first power spectrum and to calculate the second set of scale parameters for the second channel from the second power spectrum.

31. The audio encoder of claim 30, wherein the signal processor is configured to scale the first MDCT representation using information derived from the first set of scale parameters, and to scale the second MDCT representation using information derived from the second set of scale parameters.

32. The audio encoder of claim 17, wherein the signal processor is configured to further process a scaled first channel representation and a scaled second channel representation using a joint multi-channel processing to derive a multi-channel processed representation of the multi-channel audio signal, to optionally further process using a spectral band replication processing or an intelligent gap filling processing or a bandwidth enhancement processing and to quantize and encode a representation of the channels of the multi-channel audio signal to acquire the multi-channel audio data.

33. The audio encoder of claim 17, being configured to determine, for a frame of the multi-channel audio signal, the information on the second group of jointly encoded scale parameters as an all zero or all certain value information indicating the same value or a zero value for all jointly encoded scale parameters of the frame and wherein the encoded signal former is configured to use the all zero or all certain value information to acquire the encoded multi-channel audio signal.

34. The audio encoder of claim 17, wherein the scale parameter calculator is configured for calculating the first group of jointly encoded scale parameters and the second group of jointly encoded scale parameters for a first frame, and for calculating the first group of jointly encoded scale parameters for a second frame, wherein, in the second frame, the jointly encoded scale parameters of the second group are not calculated or encoded, and wherein the encoded signal former is configured to use a flag as the information on the second group of jointly encoded scale parameters indicating that, in the second frame, no jointly encoded scale parameters of the second group are comprised in the encoded multi-channel audio signal.

35. A method of decoding an encoded audio signal comprising multi-channel audio data comprising data for two or more audio channels, and information on jointly encoded scale parameters, comprising: decoding the information on the jointly encoded scale parameters to acquire a first set of scale parameters for a first channel of a decoded audio signal and a second set of scale parameters for a second channel of the decoded audio signal; and applying the first set of scale parameters to a first channel representation derived from the multi-channel audio data and applying the second set of scale parameters to a second channel representation derived from the multi-channel audio data to acquire the first channel and the second channel of the decoded audio signal, wherein the jointly encoded scale parameters comprise information on a first group of jointly encoded scale parameters and information on a second group of jointly encoded scale parameters, and wherein the decoding comprises combining a jointly encoded scale parameter of the first group and a jointly encoded scale parameter of the second group using a first combination rule to acquire a scale parameter of the first set of scale parameters, and using a second combination rule being different from the first combination rule to acquire a scale parameter of the second set of scale parameters.

36. A method of encoding a multi-channel audio signal comprising two or more channels, comprising: calculating a first group of jointly encoded scale parameters and a second group of jointly encoded scale parameters from a first set of scale parameters for a first channel of the multi-channel audio signal and from a second set of scale parameters for a second channel of the multi-channel audio signal; applying the first set of scale parameters to the first channel of the multi-channel audio signal and applying the second set of scale parameters to the second channel of the multi-channel audio signal and deriving multi-channel audio data; and using the multi-channel audio data and information on the first group of jointly encoded scale parameters and information on the second group of jointly encoded scale parameters to acquire an encoded multi-channel audio signal.

37. A non-transitory digital storage medium having stored thereon a computer program for performing a method of decoding an encoded audio signal comprising multi-channel audio data comprising data for two or more audio channels, and information on jointly encoded scale parameters, comprising: decoding the information on the jointly encoded scale parameters to acquire a first set of scale parameters for a first channel of a decoded audio signal and a second set of scale parameters for a second channel of the decoded audio signal; and applying the first set of scale parameters to a first channel representation derived from the multi-channel audio data and applying the second set of scale parameters to a second channel representation derived from the multi-channel audio data to acquire the first channel and the second channel of the decoded audio signal, wherein the jointly encoded scale parameters comprise information on a first group of jointly encoded scale parameters and information on a second group of jointly encoded scale parameters, and wherein the decoding comprises combining a jointly encoded scale parameter of the first group and a jointly encoded scale parameter of the second group using a first combination rule to acquire a scale parameter of the first set of scale parameters, and using a second combination rule being different from the first combination rule to acquire a scale parameter of the second set of scale parameters, when said computer program is run by a computer.

38. A non-transitory digital storage medium having stored thereon a computer program for performing a method of encoding a multi-channel audio signal comprising two or more channels, comprising: calculating a first group of jointly encoded scale parameters and a second group of jointly encoded scale parameters from a first set of scale parameters for a first channel of the multi-channel audio signal and from a second set of scale parameters for a second channel of the multi-channel audio signal; applying the first set of scale parameters to the first channel of the multi-channel audio signal and applying the second set of scale parameters to the second channel of the multi-channel audio signal and deriving multi-channel audio data; and using the multi-channel audio data and information on the first group of jointly encoded scale parameters and information on the second group of jointly encoded scale parameters to acquire an encoded multi-channel audio signal, when said computer program is run by a computer.
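Claims 24, 25, 26 and 28 describe two encoder-side steps: choosing between joint and separate coding from a similarity measure, and quantizing scale parameters in two stages with an amplified residual. The Python sketch below illustrates both under stated assumptions. It uses a toy codebook for the first-stage vector quantizer, a weight of 0.3 (taken from the 0.25 to 0.4 range of claim 28), and a sum of absolute differences as the similarity measure. Plain scalar rounding stands in for the algebraic vector quantizer of claim 12, so it is a simplified substitute, not that technique. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def choose_mode(first: Sequence[float], second: Sequence[float], threshold: float) -> str:
    """Per-band differences, signs removed (abs), combined (sum), compared to a threshold."""
    measure = sum(abs(a - b) for a, b in zip(first, second))
    # Equality is mapped to the joint mode here; the claim leaves that case open.
    return "separate" if measure > threshold else "joint"


def quantize_two_stage(params: Sequence[float], codebook: Sequence[Sequence[float]],
                       weight: float = 0.3) -> Tuple[int, List[int]]:
    """Stage 1: nearest-neighbour VQ. Stage 2: weighted residual, scalar-rounded."""
    def sq_err(entry: Sequence[float]) -> float:
        return sum((p - c) ** 2 for p, c in zip(params, entry))

    idx = min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))
    residual = [p - c for p, c in zip(params, codebook[idx])]
    # Dividing by a weight below 1 enlarges the residual before the second stage.
    q = [round(r / weight) for r in residual]
    return idx, q


def dequantize_two_stage(idx: int, q: Sequence[int], codebook: Sequence[Sequence[float]],
                         weight: float = 0.3) -> List[float]:
    """Combiner: intermediate (stage 1) plus re-weighted residual (stage 2)."""
    return [c + weight * k for c, k in zip(codebook[idx], q)]


def quantize_single_stage(params: Sequence[float], weight: float = 0.3) -> List[int]:
    """Side group: only the (substituted) second stage is used."""
    return [round(p / weight) for p in params]


def dequantize_single_stage(q: Sequence[int], weight: float = 0.3) -> List[float]:
    return [weight * k for k in q]
```

With the codebook `[[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]`, the parameters `[1.1, 0.8]` select entry 1. The residual then rounds to `[0, -1]`, and the reconstruction `[1.0, 0.7]` lies within half a weight step of the input. Reusing one stage for the side group keeps a single quantizer implementation while spending fewer indexes on the typically smaller side values.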

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0065] Embodiments of the present invention are subsequently discussed with respect to the accompanying drawings, in which:

[0066] FIG. 1 illustrates a decoder in accordance with the first aspect;

[0067] FIG. 2 illustrates an encoder in accordance with the first aspect;

[0068] FIG. 3a illustrates another encoder in accordance with the first aspect;

[0069] FIG. 3b illustrates another implementation of an encoder in accordance with the first aspect;

[0070] FIG. 4a illustrates a further embodiment of a decoder in accordance with the first aspect;

[0071] FIG. 4b illustrates another embodiment of a decoder;

[0072] FIG. 5 illustrates a further embodiment of an encoder;

[0073] FIG. 6 illustrates a further embodiment of an encoder;

[0074] FIG. 7a illustrates an implementation of a vector quantizer in accordance with the first or second aspect;

[0075] FIG. 7b illustrates a further quantizer in accordance with the first or second aspect;

[0076] FIG. 8a illustrates a decoder in accordance with the first aspect of the present invention;

[0077] FIG. 8b illustrates an encoder in accordance with the first aspect of the present invention;

[0078] FIG. 9a illustrates an encoder in accordance with the second aspect of the present invention;

[0079] FIG. 9b illustrates a decoder in accordance with the second aspect of the present invention;

[0080] FIG. 10 illustrates an implementation of a decoder in accordance with the first or second aspect;

[0081] FIG. 11 is a block diagram of an apparatus for encoding an audio signal;

[0082] FIG. 12 is a schematic representation of an implementation of the scale factor calculator of FIG. 1;

[0083] FIG. 13 is a schematic representation of an implementation of the downsampler of FIG. 1;

[0084] FIG. 14 is a schematic representation of the scale factor encoder of FIG. 4;

[0085] FIG. 15 is a schematic illustration of the spectral processor of FIG. 1;

[0086] FIG. 16 illustrates a general representation of an encoder on the one hand and a decoder on the other hand implementing spectral noise shaping (SNS);

[0087] FIG. 17 illustrates a more detailed representation of the encoder-side on the one hand and the decoder-side on the other hand where temporal noise shaping (TNS) is implemented together with spectral noise shaping (SNS);

[0088] FIG. 18 illustrates a block diagram of an apparatus for decoding an encoded audio signal;

[0089] FIG. 19 is a schematic illustration of details of the scale factor decoder, the spectral processor and the spectrum decoder of FIG. 8;

[0090] FIG. 20 illustrates a subdivision of the spectrum into 64 bands;

[0091] FIG. 21 is a schematic illustration of the downsampling operation on the one hand and the interpolation operation on the other hand;

[0092] FIG. 22a illustrates a time-domain audio signal with overlapping frames;

[0093] FIG. 22b illustrates an implementation of the converter of FIG. 1;

[0094] FIG. 22c is a schematic illustration of the converter of FIG. 8;

[0095] FIG. 23 illustrates a histogram comparing different inventive procedures;

[0096] FIG. 24 illustrates an embodiment of an encoder; and

[0097] FIG. 25 illustrates an embodiment of a decoder.

DETAILED DESCRIPTION OF THE INVENTION

[0098] FIG. 8a illustrates an audio decoder for decoding an encoded audio signal comprising multi-channel audio data comprising data for two or more audio channels, and information on jointly encoded scale parameters. The decoder comprises a scale parameter decoder 220 and a signal processor 210, 212, 213, illustrated in FIG. 8a as a single item. The scale parameter decoder 220 receives the information on the jointly encoded first group and second group of scale parameters where, advantageously, the first group of scale parameters are mid scale parameters and the second group of scale parameters are side scale parameters. Advantageously, the signal processor receives the first channel representation and the second channel representation of the multi-channel audio data, applies the first set of scale parameters to the first channel representation and applies the second set of scale parameters to the second channel representation to obtain the first channel and the second channel of the decoded audio signal at the output of block 210, 212, 213 of FIG. 8a. Advantageously, the jointly encoded scale parameters comprise information on the first group of jointly encoded scale parameters, such as mid scale parameters, and information on the second group of jointly encoded scale parameters, such as side scale parameters. Furthermore, the scale parameter decoder 220 is configured to combine a jointly encoded scale parameter of the first group and a jointly encoded scale parameter of the second group using a first combination rule to obtain a scale parameter of the first set of scale parameters, and to combine the same two jointly encoded scale parameters using a second combination rule, different from the first combination rule, to obtain a scale parameter of the second set of scale parameters. 
Thus, the scale parameter decoder 220 applies two different combination rules.

[0099] In an embodiment, the two different combination rules are an addition (plus) combination rule on the one hand and a subtraction (difference) combination rule on the other hand. In other embodiments, the first combination rule can be a multiplication combination rule and the second combination rule can be a quotient (division) combination rule. Other pairs of combination rules can be used as well, depending on the representation of the corresponding scale parameters of the first group and the second group or of the first set and the second set of scale parameters.
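The two pairs of combination rules can be illustrated with a short Python sketch. The normalizations used here (M = (L + R)/2, S = (L - R)/2 for the additive pair, and M = sqrt(L*R), S = sqrt(L/R) for the multiplicative pair on positive, linear-domain parameters) are assumptions chosen for illustration; the description does not fix them, and all function names are hypothetical.

```python
import math

# Additive pair (assumed normalization: M = (L + R) / 2, S = (L - R) / 2).
def joint_additive(left, right):
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def separate_additive(mid, side):
    left = [m + s for m, s in zip(mid, side)]    # first rule: addition
    right = [m - s for m, s in zip(mid, side)]   # second rule: subtraction
    return left, right

# Multiplicative pair for positive linear-domain parameters
# (assumed: M = sqrt(L * R), S = sqrt(L / R)).
def joint_multiplicative(left, right):
    mid = [math.sqrt(l * r) for l, r in zip(left, right)]
    side = [math.sqrt(l / r) for l, r in zip(left, right)]
    return mid, side

def separate_multiplicative(mid, side):
    left = [m * s for m, s in zip(mid, side)]    # first rule: multiplication
    right = [m / s for m, s in zip(mid, side)]   # second rule: division
    return left, right
```

In both pairs, the decoder applies the first rule to obtain the left parameter and the second rule to obtain the right parameter from the same two jointly encoded values.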

[0100] FIG. 8b illustrates a corresponding audio encoder for encoding a multi-channel audio signal comprising two or more channels. The audio encoder comprises a scale parameter calculator 140, a signal processor 120 and an encoded signal former 1480, 1500. The scale parameter calculator 140 is configured for calculating a first group of jointly encoded scale parameters and a second group of jointly encoded scale parameters from a first set of scale parameters for a first channel of the multi-channel audio signal and from a second set of scale parameters for a second channel of the multi-channel audio signal. Additionally, the signal processor is configured for applying the first set of scale parameters to the first channel of the multi-channel audio signal and for applying the second set of scale parameters to the second channel of the multi-channel audio signal for deriving encoded multi-channel audio data. The multi-channel audio data are derived from the scaled first and second channels and the multi-channel audio data are used by the encoded signal former 1480, 1500 together with the information on the first and the second group of jointly encoded scale parameters to obtain the encoded multi-channel audio signal at the output of block 1500 in FIG. 8b.

[0101] FIG. 1 illustrates a further implementation of the decoder of FIG. 8a. Particularly, the bitstream is input into the signal processor 210 that typically performs entropy decoding and inverse quantization together with intelligent gap filling (IGF) procedures and inverse stereo processing of the scaled or whitened channels. The output of block 210 consists of scaled or whitened decoded left and right channels or, generally, several decoded channels of a multi-channel signal. The bitstream comprises side information bits for the left and right scale parameters in the case of separate encoding, and side information bits for the jointly encoded scale parameters, illustrated as M, S scale parameters in FIG. 1, in the case of joint encoding. This data is input into the scale parameter or scale factor decoder 220 that, at its output, generates the decoded left scale factors and the decoded right scale factors that are then applied in the shape spectrum block 212, 230 to finally obtain, advantageously, an MDCT spectrum for left and right, which can then be converted into the time domain using an inverse MDCT operation.

[0102] The corresponding encoder-side implementation is given in FIG. 2. FIG. 2 starts from an MDCT spectrum having a left and a right channel that are input into a spectrum shaper 120a, and the output of the spectrum shaper 120a is input into a processor 120b that, for example, performs stereo processing, encoder-side intelligent gap filling operations and the corresponding quantization and (entropy) coding operations. Thus, blocks 120a, 120b together represent the signal processor 120 of FIG. 8b. Furthermore, for the calculation of the scale factors, which is performed in the block “compute SNS (spectral noise shaping) scale factors” 110b, an MDST spectrum is provided as well, and the MDST spectrum together with the MDCT spectrum is forwarded into a power spectrum calculator 110a. Alternatively, the power spectrum calculator 110a can operate directly on the input signal without an MDCT or MDST procedure. Another option is to calculate the power spectrum from a DFT rather than from an MDCT and an MDST, for example. Furthermore, the scale parameter calculator 140 is illustrated in FIG. 2 as a block “quantization encoding of scale factors”. Particularly, block 140 outputs, depending on the similarity between the first and the second channel, either separately encoded scale factors for left and right or jointly encoded scale factors for M and S. This is illustrated in FIG. 2 to the right of block 140. Thus, in this implementation, block 110b calculates the scale factors for left and right, and block 140 then determines whether separate encoding, i.e., encoding of the left and right scale factors, is better or worse than joint encoding, i.e., encoding of M and S scale factors derived from the separate scale factors by the two different combination rules, such as an addition on the one hand and a subtraction on the other hand.

[0103] The result of block 140 consists of side information bits for L, R or M, S that are, together with the result of block 120b, introduced into an output bitstream, as illustrated in FIG. 2.

[0104] FIG. 3a illustrates an implementation of the encoder of FIG. 2 or FIG. 8b. The first channel is input into a block 1100a that determines the separate scale parameters for the first channel, i.e., for channel L. Additionally, the second channel is input into block 1100b that determines the separate scale parameters for the second channel, i.e., for R. Then, the scale parameters for the left channel and the scale parameters for the right channel are correspondingly downsampled by a downsampler 130a for the first channel and a downsampler 130b for the second channel. The results are downsampled parameters (DL) for the left channel and downsampled parameters (DR) for the right channel.
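The downsampling rule of the downsamplers 130a, 130b is not specified in this paragraph. As a hypothetical stand-in, the sketch below reduces 64 band-wise scale parameters to 16 by averaging each group of four neighboring bands; an actual implementation may use a different, for example smoothed or overlapping, filter.

```python
def downsample(scale_params, factor=4):
    """Hypothetical downsampler: mean of each group of `factor` consecutive bands."""
    if len(scale_params) % factor:
        raise ValueError("number of scale parameters must be a multiple of factor")
    return [sum(scale_params[i:i + factor]) / factor
            for i in range(0, len(scale_params), factor)]
```

Applied to the 64 scale parameters of one channel, this yields the 16 downsampled parameters (DL or DR) that enter the joint scale parameter determiner 1200.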

[0105] Then, both DL and DR are input into a joint scale parameter determiner 1200. The joint scale parameter determiner 1200 generates the first group of jointly encoded scale parameters, such as mid or M scale parameters, and the second group of jointly encoded scale parameters, such as side or S scale parameters. Both groups are input into corresponding vector quantizers 140a, 140b to obtain quantized values that are then entropy encoded in a final entropy encoder 140c to obtain the information on the jointly encoded scale parameters.

[0106] The entropy encoder 140c may be implemented to perform an arithmetic entropy encoding algorithm or an entropy encoding algorithm with one-dimensional or multi-dimensional Huffman code tables.

[0107] Another implementation of the encoder is illustrated in FIG. 3b, where the downsampling is not performed on the separate scale parameters, such as for left and right, as illustrated at 130a, 130b in FIG. 3a. Instead, the order of the joint scale parameter determination and the subsequent downsampling by the corresponding downsamplers 130a, 130b is changed. Whether the implementation of FIG. 3a or of FIG. 3b is used depends on the specific implementation; the implementation of FIG. 3a is of advantage, since the joint scale parameter determination 1200 is performed on the already downsampled scale parameters, i.e., the two different combination rules performed by the scale parameter calculator 140 are typically applied to a lower number of inputs than in FIG. 3b.
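Assuming that both the downsampling and the mid/side computation are linear (here, 4:1 averaging and M = (L + R)/2, S = (L - R)/2, both hypothetical choices), the orders of FIG. 3a and FIG. 3b give the same jointly encoded scale parameters up to floating-point rounding, while FIG. 3a evaluates the combination rules on 16 instead of 64 values per channel. The sketch below checks this; with a non-linear downsampler the two orders would generally differ.

```python
def downsample(x, factor=4):
    # hypothetical linear downsampler: mean of each group of `factor` bands
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x), factor)]

def mid_side(left, right):
    # assumed joint determination: M = (L + R) / 2, S = (L - R) / 2
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def fig3a(left, right):
    # downsample first, then combine 16 values per channel
    return mid_side(downsample(left), downsample(right))

def fig3b(left, right):
    # combine all 64 values per channel, then downsample M and S
    mid, side = mid_side(left, right)
    return downsample(mid), downsample(side)
```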

[0108] FIG. 4a illustrates the implementation of a decoder for decoding an encoded audio signal having multi-channel audio data comprising data for two or more audio channels and information on jointly encoded scale parameters. The decoder in FIG. 4a, however, is only part of the whole decoder of FIG. 8a, since only a part of the signal processor and, particularly, the corresponding channel scalers 212a, 212b are illustrated in FIG. 4a. The scale parameter decoder 220 comprises an entropy decoder 2200 reversing the procedure performed by the corresponding block 140c in FIG. 3a. The entropy decoder outputs quantized jointly encoded scale parameters, such as quantized M scale parameters and quantized S scale parameters. The corresponding groups of scale parameters are input into dequantizers 2202 and 2204 in order to obtain dequantized values for M and S. These dequantized values are then input into a separate scale parameter determiner 2206 that outputs the separate scale parameters for left and right. These scale parameters are input into interpolators 222a, 222b to obtain interpolated scale parameters for left (IL) and interpolated scale parameters for right (IR), which are input into the channel scalers 212a and 212b, respectively. Additionally, channel scaler 212a receives the first channel representation obtained after the whole procedure performed by block 210 in FIG. 1, for example, and channel scaler 212b correspondingly receives the second channel representation output by block 210 in FIG. 1. Then, a final channel scaling or “shape spectrum”, as it is named in FIG. 1, takes place to obtain a shaped spectral channel for left and right, illustrated as “MDCT spectrum” in FIG. 1. 
Then, a final frequency domain to time domain conversion for each channel, illustrated at 240a, 240b, can be performed to finally obtain a decoded first channel and a decoded second channel of a multi-channel audio signal in a time domain representation.
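The chain of FIG. 4a, from dequantization to channel scaling, can be sketched as follows. Entropy decoding (block 2200) is omitted, and several details are assumptions not taken from the description: a uniform dequantizer with step size `step`, the rules L = M + S and R = M - S in block 2206, linear 16-to-64 interpolation in the interpolators 222a, 222b, and a channel scaling that multiplies each spectral line by 2**p, where p is the interpolated scale parameter of its band.

```python
def dequantize(indices, step):
    # dequantizers 2202, 2204 (hypothetical uniform quantizer)
    return [q * step for q in indices]

def interpolate(params, factor=4):
    """Linear interpolation to len(params) * factor values (needs len(params) >= 2)."""
    n = len(params)
    out = []
    for b in range(n * factor):
        t = (b + 0.5) / factor - 0.5        # position of band b on the coarse grid
        t = min(max(t, 0.0), n - 1.0)       # hold the end values
        i = min(int(t), n - 2)
        frac = t - i
        out.append(params[i] * (1.0 - frac) + params[i + 1] * frac)
    return out

def shape(spectrum, band_params, width):
    # channel scaler (assumed): line k of band k // width is multiplied by 2**p
    return [x * 2.0 ** band_params[k // width] for k, x in enumerate(spectrum)]

def decode_fig4a(mid_idx, side_idx, step, spec_left, spec_right, width=1):
    mid = dequantize(mid_idx, step)
    side = dequantize(side_idx, step)
    left = [m + s for m, s in zip(mid, side)]         # separate scale parameter determiner 2206
    right = [m - s for m, s in zip(mid, side)]
    il, ir = interpolate(left), interpolate(right)    # interpolators 222a, 222b
    return shape(spec_left, il, width), shape(spec_right, ir, width)  # scalers 212a, 212b
```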

[0109] Particularly, the scale parameter decoder 220 illustrated in the left portion of FIG. 4a can be included within an audio decoder as shown in FIG. 1 or collectively shown in FIG. 4a, but can also be included as a local decoder within an encoder, as shown explicitly in FIG. 5, where the local scale parameter decoder 220 is placed at the output of the scale parameter encoder 140.

[0110] FIG. 4b illustrates a further implementation in which, compared with FIG. 4a, the order of the interpolation and of the determination of the separate scale parameters is exchanged. Particularly, the interpolation takes place with the jointly encoded scale parameters M and S using interpolators 222a, 222b of FIG. 4b, and the interpolated jointly encoded scale parameters, such as IM and IS, are input into the separate scale parameter determiner 2206. Then, the output of block 2206 consists of the upsampled scale parameters, i.e., the scale parameters for each of the, for example, 64 bands illustrated in FIG. 21.

[0111] FIG. 5 illustrates a further implementation of the encoder of FIG. 8b, FIG. 2 or FIG. 3a, FIG. 3b. The first channel and the second channel are both introduced into an optional time domain-to-frequency domain converter such as 100a, 100b of FIG. 5. The spectral representation output by blocks 100a, 100b is input into a channel scaler 120a that individually scales the spectral representation for the left and the right channel. Thus, the channel scaler 120a performs a shape spectrum operation illustrated in 120a of FIG. 2. The output of the channel scaler is input into a channel processor 120b of FIG. 5, and the processed channels output of the block 120b are input into the encoded signal former 1480, 1500 to obtain the encoded audio signal.

[0112] Furthermore, for the purpose of the determination of the separately or jointly encoded scale parameters, a similarity calculator 1400 is provided that receives, as an input, the first channel and the second channel directly in the time domain. Alternatively, the similarity calculator can receive the first channel and the second channel at the output of the time domain-to-frequency domain converters 100a, 100b, i.e., the spectral representation.

[0113] Although it will be outlined with respect to FIG. 6 that the similarity between the two channels is calculated based on the second group of jointly encoded scale parameters, i.e., based on the side scale parameters, it is to be noted that this similarity can also be calculated directly based on the time domain or spectral domain channels without an explicit calculation of the jointly encoded scale parameters. Alternatively, the similarity can also be determined based on the first group of jointly encoded scale parameters, i.e., based on the mid scale parameters. Particularly, when the energy of the side scale parameters is lower than a threshold, it is determined that joint encoding can be performed. Analogously, the energy of the mid scale parameters in a frame can also be measured, and joint encoding can be chosen when the energy of the mid scale parameters is greater than another threshold, for example. Thus, many different ways of determining the similarity between the first channel and the second channel can be implemented in order to decide between joint coding and separate coding of the scale parameters. Nevertheless, the decision for joint or separate coding of the scale parameters does not necessarily have to be identical to the decision on joint stereo coding of the channels, i.e., whether two channels are jointly coded using a mid/side representation or are separately coded in an L, R representation. The decision on joint encoding of the scale parameters is made independently of the decision on stereo processing for the actual channels, since any kind of stereo processing performed in block 120b in FIG. 2 is decided after, and subsequent to, a scaling or shaping of the spectrum using scale factors obtained from mid and side. Particularly, as illustrated in FIG. 2, block 140 can determine a joint coding. Thus, as illustrated by the arrow in FIG. 2 pointing to block 140, the calculation of the scale factors for M and S can take place within this block. When a local scale parameter decoder 220 is applied within the encoder of FIG. 5, the scale parameters actually used for shaping the spectrum, although being scale parameters for left and right, are nevertheless derived from the encoded and decoded scale parameters for mid and side.
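A minimal sketch of the side-energy criterion described above, assuming S = (L - R)/2 and a caller-supplied, hypothetical threshold value. The mid-energy criterion mentioned as an alternative would compare the energy of M against a second threshold in the same way.

```python
def side_parameters(left, right):
    # second group of jointly encoded scale parameters (assumed S = (L - R) / 2)
    return [(l - r) / 2 for l, r in zip(left, right)]

def frame_energy(values):
    return sum(v * v for v in values)

def decide_joint_coding(left, right, side_threshold):
    """True means: jointly encode the scale parameters of this frame (hypothetical rule)."""
    return frame_energy(side_parameters(left, right)) < side_threshold
```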

[0114] With respect to FIG. 5, a mode decider 1402 is provided. The mode decider 1402 receives the output of the similarity calculator 1400 and decides for a separate coding of the scale parameters when the channels are not sufficiently similar. When, however, it is determined that the channels are similar, a joint coding of the scale parameters is determined by block 1402, and the information on whether separate or joint coding of the scale parameters is applied is signaled by a corresponding side information or flag 1403 illustrated in FIG. 5, which is provided from block 1402 to the encoded signal former 1480, 1500. Furthermore, the encoder comprises the scale parameter encoder 140 that receives the scale parameters for the first channel and the scale parameters for the second channel and encodes them either separately or jointly, as controlled by the mode decider 1402. In one embodiment, the scale parameter encoder 140 may output the scale parameters for the first and the second channel, as indicated by the broken lines, so that the channel scaler 120a performs a scaling with the corresponding first and second channel scale parameters. However, it is of advantage to apply a local scale parameter decoder 220 within the encoder so that the channel scaling in the encoder is performed with the locally encoded and decoded, i.e., dequantized, scale parameters. This has the advantage that exactly the same scale parameters are used for channel scaling or spectrum shaping in the encoder and in the decoder.
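The benefit of the local scale parameter decoder 220 can be shown with a hypothetical uniform scalar quantizer (step size 0.5, chosen only for illustration; the description does not specify this quantizer). When the encoder scales with the quantized and dequantized values, it uses exactly the values the decoder reconstructs from the transmitted indices; when it scales with the unquantized values, encoder and decoder differ.

```python
STEP = 0.5  # hypothetical quantizer step size

def quantize(params):
    return [round(p / STEP) for p in params]

def dequantize(indices):
    return [q * STEP for q in indices]

def encoder_scaling_params(params, use_local_decoder=True):
    """Scale parameters used by the encoder's channel scaler 120a."""
    if use_local_decoder:
        return dequantize(quantize(params))   # local scale parameter decoder 220
    return list(params)                        # unquantized values (broken-line path)

def decoder_scaling_params(indices):
    # the decoder only sees the transmitted indices
    return dequantize(indices)
```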

[0115] FIG. 6 illustrates a further embodiment of the present invention with respect to the audio encoder. An MDCT spectrum calculator 100 is provided that can, for example, be a time domain to frequency domain converter applying an MDCT algorithm. Furthermore, a power spectrum calculator 110a is provided, as illustrated in FIG. 2. The separate scale parameters are calculated by a corresponding calculator 1100, and an addition block 1200a and a subtraction block 1200b are provided for calculating the jointly encoded scale parameters. Then, for determining the similarity, an energy per frame is calculated from the side parameters, i.e., from the second group of jointly encoded scale parameters. In block 1406, a comparison to a threshold is performed, and this block, which is similar to the mode decider 1402 of FIG. 5, outputs the mode flag or stereo flag for the corresponding frame. Additionally, this information is given to the controllable encoder 140 that performs a separate or joint coding in the current frame. To this end, the controllable encoder 140 receives the scale parameters calculated by block 1100, i.e., the separate scale parameters, and, additionally, the jointly encoded scale parameters, i.e., the ones determined by blocks 1200a and 1200b.

[0116] Block 140 advantageously generates a zero flag for the frame when it determines that all side parameters of the frame are quantized to 0. This occurs when the first and the second channel are very close to each other, so that the differences between the channels and, therefore, the differences between the scale factors are smaller than the lowest quantization threshold applied by the quantizer included in block 140. Block 140 outputs the information on the jointly encoded or separately encoded scale parameters for the corresponding frame.
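A sketch of the zero flag decision of block 140, assuming a uniform scalar quantizer with step size `step` for the side parameters and a dictionary standing in for the bitstream fields; both are hypothetical.

```python
def encode_side(side, step):
    """Return the (hypothetical) bitstream fields for the side scale parameters of one frame."""
    q = [round(s / step) for s in side]
    if all(v == 0 for v in q):
        return {"zero_flag": 1}               # nothing else is transmitted for S
    return {"zero_flag": 0, "side_indices": q}
```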

[0117] FIG. 9a illustrates an audio quantizer for quantizing a plurality of audio information items. The audio quantizer comprises a first stage vector quantizer 141, 143 for quantizing the plurality of audio information items, such as scale factors, scale parameters or spectral values, to determine a first stage vector quantization result 146. Additionally, block 141, 143 generates a plurality of intermediate quantized items corresponding to the first stage vector quantization result. The intermediate quantized items are, for example, the values associated with the first stage result. When the first stage result identifies a certain codebook vector with, for example, 16 (quantized) values, the intermediate quantized items are the 16 values associated with the codebook vector index being the first stage result 146. The intermediate quantized items and the audio information items at the input of the first stage vector quantizer 141, 143 are input into a residual item determiner for calculating a plurality of residual items from the plurality of intermediate quantized items and the plurality of audio information items. This is done, e.g., by calculating, for each item, the difference between the original item and the intermediate quantized item. The residual items are input into a second stage vector quantizer 145 for quantizing the plurality of residual items to obtain the second stage vector quantization result. Then, the first stage vector quantization result at the output of block 141, 143 and the second stage result at the output of block 145 together represent the quantized representation of the plurality of audio information items, which is encoded by an optional encoded signal former 1480, 1500 that outputs the quantized audio information items; in the embodiment, these are not only quantized but additionally entropy encoded.
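The two-stage structure of FIG. 9a can be sketched as follows. The 4-entry codebook, the nearest-neighbor search in the first stage and the uniform scalar quantizer used in place of the second stage vector quantizer 145 are all illustrative assumptions; in particular, the stand-in is not the algebraic vector quantizer referred to later in the description.

```python
CODEBOOK = [                # hypothetical 4-entry, 4-dimensional first stage codebook
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 1.0, 0.0, -1.0],
    [-1.0, -1.0, 0.0, 0.0],
]
STEP = 0.25                 # hypothetical step of the second stage stand-in quantizer

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_stage_quantize(items):
    # first stage (141, 143): index of the nearest codebook vector
    index = min(range(len(CODEBOOK)), key=lambda i: sq_dist(items, CODEBOOK[i]))
    intermediate = CODEBOOK[index]          # intermediate quantized items
    # residual item determiner: item-wise difference
    residual = [x - c for x, c in zip(items, intermediate)]
    # second stage (145): uniform scalar quantizer standing in for the vector quantizer
    second = [round(r / STEP) for r in residual]
    return index, second
```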

[0118] A corresponding audio dequantizer is illustrated in FIG. 9b. The audio dequantizer comprises a first stage vector dequantizer 2220 for dequantizing a first stage quantization result included in the quantized plurality of audio information items to obtain a plurality of intermediate quantized audio information items. Furthermore, a second stage vector dequantizer 2260 is provided and is configured for dequantizing a second stage vector quantization result included in the quantized plurality of audio information items to obtain a plurality of residual items. The intermediate items from block 2220 and the residual items from block 2260 are combined by a combiner 2240 to obtain a dequantized plurality of audio information items. Particularly, the intermediate quantized items at the output of block 2220 are separately encoded scale parameters, such as for L and R, or the first group of the jointly encoded scale parameters, e.g., for M, and the residual items may represent, for example, the jointly encoded side scale parameters, i.e., the second group of jointly encoded scale parameters.
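The dequantizer of FIG. 9b reduces to a codebook lookup, a residual reconstruction and an item-wise addition. The sketch below assumes a hypothetical first stage codebook passed in as a list of vectors and a uniform second stage step size; neither is specified in the description.

```python
def two_stage_dequantize(index, second, codebook, step):
    """Dequantize a (first stage index, second stage indices) pair."""
    intermediate = codebook[index]                         # first stage vector dequantizer 2220
    residual = [q * step for q in second]                  # second stage dequantizer 2260 (uniform stand-in)
    return [c + r for c, r in zip(intermediate, residual)]  # combiner 2240
```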

[0119] FIG. 7a illustrates an implementation of the first stage vector quantizer 141, 143 of FIG. 9a. In step 701, a vector quantization of a first subset of scale parameters is performed to obtain a first quantization index. In step 702, a vector quantization of a second subset of scale parameters is performed to obtain a second quantization index. Furthermore, depending on the implementation, a vector quantization of a third subset of scale parameters is performed, as illustrated in block 703, to obtain a third quantization index, which is optional. The procedure in FIG. 7a is applied in the case of a split level quantization. For example, the audio input signal is separated into the 64 bands illustrated in FIG. 21. These 64 bands are downsampled to 16 bands or scale factors, so that the whole band is covered by 16 scale factors. These 16 scale factors are quantized by the first stage vector quantizer 141, 143 in the split-level mode illustrated in FIG. 7a. The first 8 of the 16 scale factors of FIG. 21, which are obtained by downsampling the original 64 scale factors, represent the first subset of scale parameters and are vector-quantized in step 701. The remaining 8 scale parameters for the 8 upper bands represent the second subset of scale parameters, which are vector-quantized in step 702. Depending on the implementation, the whole set of scale parameters or audio information items does not necessarily have to be separated into exactly two subsets, but can also be separated into three or even more subsets.
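The split-level first stage of FIG. 7a can be sketched as one codebook search per subset. The 32-entry, 8-dimensional codebooks below are hypothetical random tables generated with a fixed seed, used only to make the example runnable.

```python
import random

def make_codebook(size, dim, seed):
    # hypothetical codebook: `size` random vectors of dimension `dim`
    rng = random.Random(seed)
    return [[rng.uniform(-4.0, 4.0) for _ in range(dim)] for _ in range(size)]

def nearest(vec, codebook):
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))

def split_vq(params, codebooks):
    """Quantize consecutive subsets of `params`, one codebook per subset (steps 701, 702, ...)."""
    indices, intermediate, start = [], [], 0
    for cb in codebooks:
        dim = len(cb[0])
        i = nearest(params[start:start + dim], cb)
        indices.append(i)
        intermediate.extend(cb[i])     # intermediate quantized items for this subset
        start += dim
    return indices, intermediate

CB_LOW = make_codebook(32, 8, seed=1)    # hypothetical codebook for downsampled bands 0..7
CB_HIGH = make_codebook(32, 8, seed=2)   # hypothetical codebook for downsampled bands 8..15
```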

[0120] Independently of how many splits are performed, the indexes for each split level together represent the first stage result. As discussed with respect to FIG. 14, these indexes can be combined via an index combiner to obtain a single first stage index. Alternatively, the first stage result can consist of the first index, the second index, a potential third index and possibly even more indexes that are not combined but are entropy encoded as they are.
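The index combiner of FIG. 14 is not detailed in this section. One straightforward possibility, shown here purely as an assumption, is a mixed-radix combination in which each split index is a digit whose base is the size of its codebook, so that the combined index can be split again without loss.

```python
def combine_indices(indices, sizes):
    """Hypothetical mixed-radix index combiner."""
    combined = 0
    for i, n in zip(indices, sizes):
        if not 0 <= i < n:
            raise ValueError("index out of range for its codebook")
        combined = combined * n + i
    return combined

def split_index(combined, sizes):
    # inverse of combine_indices
    indices = []
    for n in reversed(sizes):
        combined, i = divmod(combined, n)
        indices.append(i)
    return indices[::-1]
```

For two 32-entry codebooks, the combined index ranges over 32 * 32 = 1024 values, i.e., 10 bits, the same as the two separate 5-bit indexes.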

[0121] In addition to the corresponding indexes forming the first stage result, steps 701, 702, 703 also provide the intermediate scale parameters that are used in block 704 for calculating the residual scale parameters for the frame. Hence, step 704, which is performed by, for example, block 142 of FIG. 9a, results in the residual scale parameters that are then processed by an (algebraic) vector quantization performed in step 705 in order to generate the second stage result. Thus, the first stage result and the second stage result are generated for the separate scale parameters L, the separate scale parameters R and the first group of joint scale parameters M. However, as illustrated in FIG. 7b, the second group of jointly coded scale parameters, i.e., the side scale parameters, is only subjected to the (algebraic) vector quantization of step 706, which in an implementation is identical to step 705 and is again performed by block 145 of FIG. 9a.

[0122] In a further embodiment, the information on the jointly encoded scale parameters for one of the two groups, such as the second group, advantageously related to the side scale parameters, does not comprise quantization indices or other quantization bits, but only information such as a flag or a single bit indicating that the scale parameters of the second group are all zero, or all at a certain value such as a small value, for a portion or frame of the audio signal. This information is determined by the encoder by an analysis or by other means. The decoder uses it either to synthesize the second group of scale parameters, such as by generating zero scale parameters for the time portion or frame of the audio signal, by generating scale parameters having the certain value, or by generating small random scale parameters that are, e.g., all smaller than the smallest or first quantization step, or to calculate the first and the second set of scale parameters using only the first group of jointly encoded scale parameters. Hence, instead of performing step 705 in FIG. 7a, only the all-zero flag for the second group of jointly encoded scale parameters is written as the second stage result. The calculation in block 704 can be omitted as well in this case and can be replaced by a decider for deciding whether the all-zero flag is to be activated and transmitted. This decider can be controlled by a user input indicating that the coding of the S parameters is to be skipped altogether, or by bitrate information, or it can actually perform an analysis of the residual items. Hence, for a frame having the all-zero bit set, the scale parameter decoder does not perform any combination, but calculates the first and second sets of scale parameters using only the first group of jointly encoded scale parameters, such as by dividing the encoded scale parameters of the first group by two or by weighting them with another predetermined value.
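A sketch of the decoder behavior for a frame with the all-zero flag set. It assumes the non-normalized convention M = L + R, S = L - R, under which dividing M by two yields the left and right parameters when S is zero; the convention, the function name and the `weight` parameter are illustrative assumptions.

```python
def decode_lr(mid, side=None, zero_side_flag=False, weight=0.5):
    """Recover left/right scale parameters, assuming M = L + R and S = L - R (hypothetical)."""
    if zero_side_flag:
        # no combination: both sets are derived from the first group only
        left = [weight * m for m in mid]
        return left, list(left)
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right
```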

[0123] In a further embodiment, the second group of jointly encoded scale parameters is quantized only using the second quantization stage of the two stage quantizer, which may be a variable rate quantizer stage. In this case, it is assumed that the first stage results in all zero quantized values, so that only the second stage is effective. This case is illustrated in FIG. 7b.

[0124] In an even further embodiment, only the first quantization stage, such as 701, 702, 703 of the two-stage quantizer in FIG. 7a, which may be a fixed rate quantization stage, is applied, and the second stage 705 is not used at all for a time portion or frame of the audio signal. This case corresponds to a situation where all the residual items are assumed to be zero or smaller than the smallest or first quantization step size of the second quantization stage. Then, item 706 of FIG. 7b would correspond to items 701, 702, 703 of FIG. 7a, and item 704 could be omitted as well and replaced by a decider for deciding whether only the first stage quantization is used. This decider can be controlled by a user input or by bitrate information, or it can actually perform an analysis of the residual items to determine that they are small enough so that the accuracy of the second group of jointly encoded scale parameters quantized by the single stage only is sufficient.

[0125] In an implementation of the present invention that is additionally illustrated in FIG. 14, the algebraic vector quantizer 145 additionally performs a split level calculation and, advantageously, performs the same split level operation as the first stage vector quantizer. Thus, the subsets of the residual values correspond, with respect to the band number, to the subsets of scale parameters. For the case of two split levels, the algebraic vector quantizer 145 generates a first level result for the first 8 downsampled bands of FIG. 21 and a second level result for the upper 8 downsampled scale factors or scale parameters or, generally, audio information items.

[0126] Advantageously, the algebraic vector quantizer 145 is implemented as the algebraic vector quantizer defined in section 5.2.3.1.6.9 of ETSI TS 126 445 V13.2.0 (2016-08), mentioned as reference (4), where the result of the corresponding split multi-rate lattice vector quantization is, for each 8 items, (a) a codebook number, (b) a vector index in the base codebook and (c) an 8-dimensional Voronoi index. However, when only a single codebook is used, the codebook number can be omitted, and the vector index in the base codebook together with the corresponding n-dimensional Voronoi index is sufficient. Thus, items a, b and c, or only items b and c, for each level of the algebraic vector quantization represent the second-stage quantization result.

[0127] Subsequently, reference is made to FIG. 10, which illustrates a decoding operation matching the encoding of FIG. 7a, FIG. 7b or FIG. 14 in accordance with the first aspect, the second aspect or both aspects of the present invention.

[0128] In step 2221 of FIG. 10, the quantized mid scale factors, i.e., the first group of jointly encoded scale factors, are retrieved. This is done when the stereo mode flag (item 1403 of FIG. 5) indicates a true value. Then, a first-stage decoding 2223 and a second-stage decoding 2261 are performed in order to undo the procedures carried out by the encoder of FIG. 14 and, particularly, by the algebraic vector quantizer 145 described with respect to FIG. 14 or FIG. 7a. In step 2225, the side scale factors are assumed to be all 0. In step 2261, the zero flag value is checked to determine whether non-zero quantized side scale factors are actually present for the frame. If the zero flag value indicates that there are non-zero side scale factors for the frame, the quantized side scale factors are retrieved and decoded using only the second-stage decoding 2261, i.e., by performing only block 706 of FIG. 7b. In block 2207, the jointly encoded scale parameters are transformed back into separately encoded scale parameters in order to output the quantized left and right scale parameters, which can then be used for inverse scaling of the spectrum in the decoder.

[0129] When the stereo mode flag indicates a value of zero, or when it is determined that separate coding has been used within the frame, only the first-stage decoding 2223 and the second-stage decoding 2261 are performed for the left and right scale factors. No transformation such as block 2207 is required, since the left and right scale factors are already in the separately encoded representation. The process of efficiently coding and decoding the SNS scale factors, which are needed for scaling the spectrum before the stereo processing on the encoder side and after the inverse stereo processing on the decoder side, is described below as exemplary pseudo code with comments to show an advantageous implementation of the present invention.

TABLE-US-00001 Joint quantization and coding of scale factors

    /* Compute side from the M scale factors of each channel, snsl and snsr,
       and compute the total energy of side, ener_side. */
    ener_side = 0;
    for (i = 0; i < M; i++) {
        side[i] = snsl[i] - snsr[i];
        ener_side = ener_side + side[i] * side[i];
    }
    /* If ener_side is lower than a certain threshold, the two signals are highly
       correlated and coding is done jointly, otherwise independently. */
    if (ener_side < threshold) {
        /* Code scale factors jointly: signal MS coding to bitstream */
        /* Compute mid from the M scale factors of each channel, snsl and snsr */
        for (i = 0; i < M; i++) {
            mid[i] = (snsl[i] + snsr[i]) * 0.5f;
        }
        /* Quantize mid with first-stage vector quantization (VQ); the function
           returns the index of the stochastic codebook, indexl_1, and the
           intermediate quantized mid parameters, mid_q. */
        indexl_1 = sns_1st_cod(mid, mid_q);
        /* Quantize mid with second-stage algebraic vector quantization (AVQ); the
           function returns the indices of the split dimensions and the final
           quantized mid, mid_q. */
        indexl_2 = sns_2st_cod(mid, mid_q);
        /* Quantize side: assume coarse quantization and set all "quantized"
           parameters to zero, i.e., skip the first stage. */
        for (i = 0; i < M; i++) {
            side_q[i] = 0.f;
        }
        /* Quantize side with second-stage AVQ only; the function returns the
           indices of the split dimensions and the final quantized side, side_q. */
        indexr_2 = sns_2st_cod(side, side_q);
        /* Detect whether the quantized side scale factors are all zero; if so,
           signal it to the bitstream with one bit. */
        if (flag_zero) {
            /* send signal bit to bitstream */
        }
        /* Transform the quantized scale factors back to the L-R representation */
        for (i = 0; i < M; i++) {
            snsl_q[i] = mid_q[i] + side_q[i] * 0.5f;
            snsr_q[i] = mid_q[i] - side_q[i] * 0.5f;
        }
    } else {
        /* Code scale factors independently: signal LR coding to bitstream */
        /* Left channel, first-stage VQ: returns the stochastic codebook index
           indexl_1 and the quantized snsl parameters, snsl_q */
        indexl_1 = sns_1st_cod(snsl, snsl_q);
        /* Left channel, second-stage AVQ: returns the indices of the split
           dimensions and the final quantized snsl, snsl_q */
        indexl_2 = sns_2st_cod(snsl, snsl_q);
        /* Right channel, first-stage VQ: returns indexr_1 and snsr_q */
        indexr_1 = sns_1st_cod(snsr, snsr_q);
        /* Right channel, second-stage AVQ: returns indexr_2 and the final snsr_q */
        indexr_2 = sns_2st_cod(snsr, snsr_q);
    }
    /* Output the quantized SNS scale factors snsl_q and snsr_q to perform the
       scaling of the spectrum. */
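The decision and the mid/side mapping in the pseudocode above can be isolated into a small, compilable C sketch. It covers only the side-energy test and the L-R/M-S transforms, leaving out the quantizers; the function names and the `threshold` argument are illustrative assumptions, since the text does not fix a threshold value.

```c
#include <stddef.h>

/* Hypothetical helper: total energy of side = sum of (l[i] - r[i])^2. */
float side_energy(const float *snsl, const float *snsr, size_t m)
{
    float ener_side = 0.0f;
    for (size_t i = 0; i < m; i++) {
        float side = snsl[i] - snsr[i];
        ener_side += side * side;
    }
    return ener_side;
}

/* Returns 1 for joint (MS) coding, 0 for independent (LR) coding.
   The threshold value is an assumption supplied by the caller. */
int use_joint_coding(const float *snsl, const float *snsr, size_t m,
                     float threshold)
{
    return side_energy(snsl, snsr, m) < threshold;
}

/* Forward transform: mid = (l + r) / 2, side = l - r. */
void lr_to_ms(const float *l, const float *r, float *mid, float *side, size_t m)
{
    for (size_t i = 0; i < m; i++) {
        mid[i] = (l[i] + r[i]) * 0.5f;
        side[i] = l[i] - r[i];
    }
}

/* Inverse transform: l = mid + side / 2, r = mid - side / 2. */
void ms_to_lr(const float *mid, const float *side, float *l, float *r, size_t m)
{
    for (size_t i = 0; i < m; i++) {
        l[i] = mid[i] + side[i] * 0.5f;
        r[i] = mid[i] - side[i] * 0.5f;
    }
}
```

Because mid is the average and side the plain difference, the inverse weights side by 0.5. When the quantized side is all zero, both reconstructed channels collapse to mid, which is the case signalled by the zero flag.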

[0130] Any sort of quantization, e.g., uniform or non-uniform scalar quantization, and entropy or arithmetic coding can be used to represent the parameters. In the described implementation, as can be seen in the algorithm description, a 2-stage vector quantization scheme is implemented:
[0131] First stage: 2 splits (8 dimensions each) with 5 bits each, therefore coded with 10 bits.
[0132] Second stage: algebraic vector quantization (AVQ), again with 2 splits and with scaling of the residual, where the codebook indices are entropy coded, so that this stage uses a variable bitrate.

[0133] Since the side signal of highly correlated channels can be considered small, using only the 2nd-stage AVQ, e.g., with a reduced scale, is sufficient to represent the corresponding SNS parameters. Skipping the 1st-stage VQ for these signals achieves a significant saving in complexity and bits for coding the SNS parameters.

[0134] Pseudo code for each implemented quantization stage is given below. First stage, with 2-split vector quantization using 5 bits for each split:

TABLE-US-00002
    codebook index = sns_1st_cod(
        input:  sns, SNS parameter vector to quantize
        output: snsq, quantized SNS scale parameters
    )
    {
        /* Split the vector of coefficients in half */
        j0 = 0;
        j1 = M / 2;
        /* Initialize the minimum distance */
        dist_min = 1.0e30f;
        /* Pointer to the memory location of the stored codebook */
        p = sns_vq_cdbk1;
        index0 = 0;
        /* Split vector quantization: search all 32 = 2^5 entries (5-bit
           representation) for the index with the minimum distance */
        for (i = 0; i < 32; i++) {
            dist = 0.0;
            for (j = j0; j < j1; j++) {
                /* Difference between the SNS parameters and each of the
                   8-dimensional codevectors stored sequentially in memory */
                temp = sns[j] - *p++;
                /* Accumulate the squared-error distance */
                dist = dist + temp * temp;
            }
            /* Keep the index of the codevector with the minimum distance */
            if (dist < dist_min) {
                dist_min = dist;
                index0 = i;
            }
        }
        /* Having found the optimal index, get the quantized values of the first
           M/2 SNS scale factors from the selected codevector */
        p = &sns_vq_cdbk1[index0 * (M / 2)];
        for (j = j0; j < j1; j++) {
            snsq[j] = *p++;
        }
        /* Repeat the procedure for the second split of the vector */
        j0 = M / 2;
        j1 = M;
        dist_min = 1.0e30f;
        p = sns_vq_cdbk2;
        index1 = 0;
        for (i = 0; i < 32; i++) {
            dist = 0.0;
            for (j = j0; j < j1; j++) {
                temp = sns[j] - *p++;
                dist += temp * temp;
            }
            if (dist < dist_min) {
                dist_min = dist;
                index1 = i;
            }
        }
        /* Get the quantized values of the remaining factors from the codebook */
        p = &sns_vq_cdbk2[index1 * (M / 2)];
        for (j = j0; j < j1; j++) {
            snsq[j] = *p++;
        }
        /* The final index is the index of the first split plus the index of the
           second split multiplied by 2^5 = 32, so only one index needs to be
           multiplexed into the bitstream */
        index = index0 + (index1 << 5);
        return index;
    }

[0135] Second Stage Algebraic Vector Quantization:

TABLE-US-00003
    sns_2st_cod(
        input:        sns, normalized vector to quantize
        input/output: snsq, in: 1st-stage result, out: 1st+2nd-stage result
        output:       indx[]
    )
    {
        scale = 1.0 / 2.5;
        /* Compute the residual of the first-stage quantization and scale it
           (divide by scale, i.e., multiply by 2.5) for finer quantization */
        for (i = 0; i < M; i++) {
            x[i] = (sns[i] - snsq[i]) / scale;
        }
        /* Quantize the residual using the AVQ (algebraic codevector) used in EVS
           for the second-stage quantization of the LPC coefficients [4], where x
           is the residual, xq is the quantized residual returned by the function,
           2 selects the 2-split process and indx is an array that contains the
           codebook indices for each split */
        AVQ_cod_lpc(x, xq, indx, 2);
        /* Refine the quantized SNS scale factors by adding the scaled quantized
           residual, concluding the second stage of quantization */
        for (i = 0; i < M; i++) {
            snsq[i] = snsq[i] + scale * xq[i];
        }
    }

[0136] The indices that are output from the coding process are finally packed to the bitstream and sent to the decoder.

[0137] The AVQ procedure disclosed above for the second stage may be implemented as outlined in EVS for the high-rate LPC quantization (subclause 5.3.3.2.1.3) in the MDCT-based TCX chapter. The second-stage algebraic vector quantizer used there is specified in subclause 5.3.3.2.1.3.4 (Algebraic vector quantizer), and the algebraic VQ used for quantizing the refinement is described in subclause 5.2.3.1.6.9. In an embodiment, there is, for each index, a set of codewords for the base codebook index and a set of codewords for the Voronoi index, all of which are entropy coded and therefore have a variable bitrate. Hence, the AVQ parameters in each sub-band j consist of the codebook number, the vector index in the base codebook and the n-dimensional (such as 8-dimensional) Voronoi index.

[0138] Decoding of Scale Factors

[0139] At the decoder end, the indices are extracted from the bitstream and used to decode and derive the quantized values of the scale factors.

[0140] The overall 2-stage decoding procedure, including the switch between joint and independent decoding, is described in detail in the pseudocode below.

TABLE-US-00004
    /* Read the bit signaling stereo coding from the bitstream */
    if (stereo_mode is true) {
        /* Read indices to retrieve the quantized mid scale factors */
        /* First-stage decoding: input indexl_1, return the quantized mid, mid_q */
        sns_1st_dec(indexl_1, mid_q);
        /* Second-stage decoding: input indices indexl_2, return the final mid_q */
        sns_2st_dec(mid_q, indexl_2);
        /* Assume the quantized side scale factors are zero (no first stage) */
        for (i = 0; i < M; i++) {
            side_q[i] = 0.f;
        }
        /* If the bitstream signals that the side scale factors are non-zero,
           perform second-stage decoding */
        if (flag_zero is false) {
            /* Input second-stage indices indexr_2, return the quantized side, side_q */
            sns_2st_dec(side_q, indexr_2);
        }
        /* Transform the mid-side SNS quantized scale factors to L-R */
        for (i = 0; i < M; i++) {
            SNS_Ql[i] = mid_q[i] + side_q[i] * 0.5f;
            SNS_Qr[i] = mid_q[i] - side_q[i] * 0.5f;
        }
    } else {
        /* Two-stage decoding to retrieve the L-R SNS quantized scale factors */
        sns_1st_dec(indexl_1, SNS_Ql);    /* first-stage decoding L */
        sns_2st_dec(SNS_Ql, indexl_2);    /* second-stage decoding L */
        sns_1st_dec(indexr_1, SNS_Qr);    /* first-stage decoding R */
        sns_2st_dec(SNS_Qr, indexr_2);    /* second-stage decoding R */
    }
    /* Return the quantized scale factors of each channel to scale the decoded spectrum */

[0141] The first-stage decoding is described in detail in the pseudocode below.

TABLE-US-00005
    sns_1st_dec(
        input:  index, codebook index
        output: snsq, quantized SNS parameters
    )
    {
        /* Retrieve index0 and index1, the indices of each split, by inverting the
           packing of the encoder; % denotes the remainder of the division by 32 */
        index0 = index % 32;
        index1 = index / 32;
        /* Pointer to the first codebook for the first half of the quantized SNS parameters */
        p = &sns_vq_cdbk1[index0 * (M / 2)];
        /* Retrieve the vector of quantized values stored sequentially in memory */
        for (i = 0; i < M / 2; i++) {
            snsq[i] = *p++;
        }
        /* Pointer to the second codebook to retrieve the second half of the SNS parameters */
        p = &sns_vq_cdbk2[index1 * (M / 2)];
        /* Retrieve the vector of quantized values stored sequentially in memory */
        for (i = M / 2; i < M; i++) {
            snsq[i] = *p++;
        }
    }
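The packing of the two 5-bit split indices into a single 10-bit first-stage index in `sns_1st_cod`, and its inversion in `sns_1st_dec`, can be checked with a short C sketch; the two helper names are hypothetical.

```c
/* Packs two 5-bit split indices (0..31) into one 10-bit index (0..1023). */
unsigned pack_1st_stage_index(unsigned index0, unsigned index1)
{
    return index0 + (index1 << 5);   /* index1 * 32 + index0 */
}

/* Inverse of the packing, as done in the first-stage decoder. */
void unpack_1st_stage_index(unsigned index, unsigned *index0, unsigned *index1)
{
    *index0 = index % 32;   /* remainder: first split */
    *index1 = index / 32;   /* quotient: second split */
}
```

Every pair of split indices maps to a distinct value below 1024, which is why exactly 10 bits suffice for the first stage.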

[0142] The quantized SNS scale factors retrieved from the first stage are refined by decoding the residual in the second stage. The procedure is given in the pseudocode below:

TABLE-US-00006
    sns_2st_dec(
        input/output: snsq, in: 1st-stage result, out: 1st+2nd-stage result
        input:        indx, index[] (4 bits per word)
    )
    {
        float scale = 1.0 / 2.5;
        /* Derive the M quantized residuals xq from the indices indx using the
           2-split AVQ decoding function */
        AVQ_dec_lpc(indx, xq, 2);
        /* Reconstruct the final quantized SNS parameters by adding the scaled residuals */
        for (i = 0; i < M; i++) {
            snsq[i] = snsq[i] + scale * (float) xq[i];
        }
    }

[0143] Regarding the scaling, i.e., amplification or weighting, of the residual on the encoder side and the corresponding scaling, i.e., attenuation or weighting, on the decoder side, the weighting factors are not calculated separately for each value or split. Instead, a single weight or a small number of different weights is used to scale all the parameters, as an approximation that avoids complexity. This scaling factor determines the trade-off between bitrate savings through coarser quantization (more values quantized to zero) and quantization precision (with the respective spectral distortion). It can be predetermined in the encoder, so that it does not have to be transmitted to the decoder but can be fixedly set or initialized in the decoder, saving transmission bits. A higher scaling of the residual therefore requires more bits but yields minimal spectral distortion, while reducing the scale saves bits and, if the spectral distortion is kept within an acceptable range, can serve as a means of additional bitrate saving.

Advantages of Embodiments

[0144] Substantial bit savings when two channels are correlated and the SNS parameters are coded jointly.
[0145] An example of the bit savings per frame achieved with the system described in the previous section is shown below:
[0146] Independent: 88.1 bits on average
[0147] New-independent: 72.0 bits on average
[0148] New-joint: 52.1 bits on average
[0149] where
[0150] “Independent” is the MDCT stereo implementation described in [8], using SNS [6] for the FDNS and coding the two channels only independently with a 2-stage VQ:
[0151] First stage: 8-bit trained codebook (16 dimensions)
[0152] Second stage: AVQ of the residual scaled with a factor of 4 (variable bitrate)
[0153] “New-independent” refers to the previously described embodiment of the invention in which the correlation of the two channels is not high enough and they are coded separately, using the new 2-stage VQ approach described above, with the residual scaled with a reduced factor of 2.5.
[0154] “New-joint” refers to the jointly coded case, also described above, in which the residual in the second stage is again scaled with a reduced factor of 2.5.
[0155] Another advantage of the proposed method is a saving in computational complexity. As shown in [6], the new SNS is computationally less complex than the LPC-based FDNS described in [5], because the latter requires autocorrelation computations to estimate the LPCs. Therefore, comparing the MDCT-based stereo system from [8], which uses the improved LPC-based FDNS [5], with an implementation in which the new SNS [6] replaces the LPC-based approach yields savings of approx. 6 WMOPS at a 32 kHz sampling rate.
[0156] Additionally, the new two-stage quantization, with VQ for the first stage and AVQ with a reduced scale for the second stage, further reduces computational complexity. For the embodiment described in the previous section, computational complexity is reduced further by approx. 1 WMOPS at a 32 kHz sampling rate, at the cost of an acceptable spectral distortion.
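Expressed as relative savings, the average bit counts listed in [0146] to [0148] work out as follows (simple arithmetic on the reported averages, rounded to one decimal): new-independent versus independent, new-joint versus independent, and new-joint versus new-independent.

```latex
\frac{88.1 - 72.0}{88.1} \approx 18.3\,\%, \qquad
\frac{88.1 - 52.1}{88.1} \approx 40.9\,\%, \qquad
\frac{72.0 - 52.1}{72.0} \approx 27.6\,\%
```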

Summary of Embodiments or Aspects

[0157] 1. Joint coding of spectral noise shaping parameters, where a mid/side representation of the parameters is calculated, mid is coded using quantization and entropy coding, and side is coded using a coarser quantization scheme.
[0158] 2. Adaptively determining whether the noise shaping parameters should be coded independently or jointly based on channel correlation or coherence.
[0159] 3. A signaling bit sent to indicate whether the parameters were coded independently or jointly.
[0160] 4. Applications based on the MDCT stereo implementation:
[0161] signaling with bits that the side coefficients are zero,
[0162] that the SNS is used,
[0163] that the power spectrum is used for calculating the SNS,
[0164] that 2 splits with 5 bits each are used in the first stage.
[0165] Adjusting the scaling of the residual of the second-stage AVQ may further reduce the number of bits for the second-stage quantization.

[0166] FIG. 23 compares the number of bits for both channels for a currently known implementation (described as “independent” above), for the new independent implementation in accordance with the second aspect of the present invention and for the new joint implementation in accordance with the first aspect of the present invention. FIG. 23 shows a histogram in which the vertical axis represents the frequency of occurrence and the horizontal axis shows bins of the total number of bits for coding the parameters of both channels.

[0167] Subsequently, further embodiments are illustrated with specific emphasis on the calculation of the scale factors for each audio channel and on the specific application of downsampling and upsampling of the scale parameters, which is applied either before or after the calculation of the jointly encoded scale parameters, as illustrated with respect to FIG. 3a and FIG. 3b.

[0168] FIG. 11 illustrates an apparatus for encoding an audio signal 160. The audio signal 160 may be available in the time domain, although other representations of the audio signal, such as a prediction domain or any other domain, would in principle also be useful. The apparatus comprises a converter 100, a scale factor calculator 110, a spectral processor 120, a downsampler 130, a scale factor encoder 140 and an output interface 150. The converter 100 is configured for converting the audio signal 160 into a spectral representation. The scale factor calculator 110 is configured for calculating a first set of scale parameters or scale factors from the spectral representation. The other channel is received at block 120, and the scale parameters of the other channel are received by block 140.

[0169] Throughout the specification, the term “scale factor” or “scale parameter” is used in order to refer to the same parameter or value, i.e., a value or parameter that is, subsequent to some processing, used for weighting some kind of spectral values. This weighting, when performed in the linear domain is actually a multiplying operation with a scaling factor. However, when the weighting is performed in a logarithmic domain, then the weighting operation with a scale factor is done by an actual addition or subtraction operation. Thus, in the terms of the present application, scaling does not only mean multiplying or dividing but also means, depending on the certain domain, addition or subtraction or, generally means each operation, by which the spectral value, for example, is weighted or modified using the scale factor or scale parameter.

[0170] The downsampler 130 is configured for downsampling the first set of scale parameters to obtain a second set of scale parameters, wherein a second number of the scale parameters in the second set of scale parameters is lower than a first number of scale parameters in the first set of scale parameters. This is also outlined in the box in FIG. 11 stating that the second number is lower than the first number. As illustrated in FIG. 11, the scale factor encoder is configured for generating an encoded representation of the second set of scale factors, and this encoded representation is forwarded to the output interface 150. Due to the fact that the second set of scale factors has a lower number of scale factors than the first set of scale factors, the bitrate for transmitting or storing the encoded representation of the second set of scale factors is lower compared to a situation, in which the downsampling of the scale factors performed in the downsampler 130 would not have been performed.

[0171] Furthermore, the spectral processor 120 is configured for processing the spectral representation output by the converter 100 in FIG. 11 using a third set of scale parameters, the third set having a third number of scale factors that is greater than the second number of scale factors. For this spectral processing, the spectral processor 120 can use the first set of scale factors as already available from block 110 via line 171. Alternatively, the spectral processor 120 uses the second set of scale factors as output by the downsampler 130 for calculating the third set of scale factors, as illustrated by line 172. In a further implementation, the spectral processor 120 uses the encoded representation output by the scale factor/parameter encoder 140 for calculating the third set of scale factors, as illustrated by line 173 in FIG. 11. Advantageously, the spectral processor 120 does not use the first set of scale factors but uses either the second set of scale factors as calculated by the downsampler or, even more advantageously, the encoded representation or, generally, the quantized second set of scale factors. It then performs an interpolation operation on the quantized second set of scale parameters to obtain the third set of scale parameters, which has a higher number of scale parameters due to the interpolation.

[0172] Thus, the encoded representation of the second set of scale factors that is output by block 140 comprises either a codebook index for an advantageously used scale parameter codebook or a set of corresponding codebook indices. In other embodiments, the encoded representation comprises the quantized scale parameters or quantized scale factors that are obtained when the codebook index, the set of codebook indices or, generally, the encoded representation is input into a decoder-side vector decoder or any other decoder.

[0173] Advantageously, the spectral processor 120 uses the same set of scale factors that is also available at the decoder-side, i.e., uses the quantized second set of scale parameters together with an interpolation operation to finally obtain the third set of scale factors.

[0174] In an embodiment, the third number of scale factors in the third set of scale factors is equal to the first number of scale factors. However, a smaller number of scale factors is also useful. For example, one could derive 64 scale factors in block 110 and then downsample the 64 scale factors to 16 scale factors for transmission. One could then perform an interpolation in the spectral processor 120 not necessarily to 64 scale factors but, e.g., to 32 scale factors. Alternatively, one could interpolate to an even higher number, such as more than 64 scale factors, as long as the number of scale factors transmitted in the encoded output signal 170 is smaller than the number of scale factors calculated in block 110 or calculated and used in block 120 of FIG. 11.
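The interpolation from a transmitted set of 16 scale parameters back to 64 can be sketched in C as a simple linear interpolation. Placing each transmitted value at the centre of a group of four output bands, and the edge handling, are assumptions for this example rather than details taken from the text; the function name is hypothetical.

```c
#include <stddef.h>

/* Linearly interpolates n_in scale parameters (e.g. 16) to 4 * n_in (e.g. 64).
   Assumption: input value n sits at output position 4n + 1.5. Positions below
   the first centre repeat in[0]; positions above the last centre extrapolate
   linearly from the last two values. Requires n_in >= 2. */
void interpolate_x4(const float *in, size_t n_in, float *out)
{
    for (size_t k = 0; k < 4 * n_in; k++) {
        float pos = ((float)k - 1.5f) / 4.0f;   /* position on the input grid */
        if (pos < 0.0f) {
            out[k] = in[0];
            continue;
        }
        size_t n = (size_t)pos;                 /* floor, since pos >= 0 */
        if (n > n_in - 2) {
            n = n_in - 2;                       /* keep n + 1 in range */
        }
        float frac = pos - (float)n;
        out[k] = in[n] + frac * (in[n + 1] - in[n]);
    }
}
```

The same routine applied to the quantized values on both encoder and decoder side keeps the third set of scale parameters identical at both ends.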

[0175] Advantageously, the scale factor calculator 110 is configured to perform several operations illustrated in FIG. 12. These operations refer to a calculation 111 of an amplitude-related measure per band, where the spectral representation for one channel is input into block 111. The calculation for the other channel takes place in a similar manner. An advantageous amplitude-related measure per band is the energy per band, but other amplitude-related measures can be used as well, for example, the sum of the magnitudes per band or the sum of the squared amplitudes, which corresponds to the energy. Apart from the power of 2 used for calculating the energy per band, other powers, such as a power of 3 that would reflect the loudness of the signal, could also be used, and even non-integer powers such as 1.5 or 2.5 can be used to calculate amplitude-related measures per band. Even powers less than 1.0 can be used, as long as it is ensured that the values raised to such powers are positive-valued.

[0176] A further operation performed by the scale factor calculator can be an inter-band smoothing 112. This inter-band smoothing may be used to smooth out possible instabilities in the vector of amplitude-related measures obtained in step 111. Without this smoothing, these instabilities would be amplified by the later conversion to a log domain, illustrated at 115, especially for spectral values whose energy is close to 0. However, in other embodiments, inter-band smoothing is not performed.
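Inter-band smoothing can be sketched in C as a short FIR filter across bands. The 3-tap kernel [0.25, 0.5, 0.25] and the edge rule (a missing neighbour is replaced by the band itself) are assumptions for illustration; the text does not specify the filter.

```c
#include <stddef.h>

/* Smooths band measures e[0..n-1] into es[0..n-1] with an assumed
   [0.25, 0.5, 0.25] kernel; at the edges the band itself stands in for
   the missing neighbour. */
void smooth_bands(const float *e, size_t n, float *es)
{
    for (size_t b = 0; b < n; b++) {
        float left = e[b > 0 ? b - 1 : b];
        float right = e[b + 1 < n ? b + 1 : b];
        es[b] = 0.25f * left + 0.5f * e[b] + 0.25f * right;
    }
}
```

An isolated peak is spread into its neighbours, so near-zero bands next to it no longer produce very large negative values after the log conversion.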

[0177] A further operation performed by the scale factor calculator 110 is the pre-emphasis operation 113. This pre-emphasis operation has a purpose similar to that of the pre-emphasis operation used in the LPC-based perceptual filter of the MDCT-based TCX processing, as discussed before with respect to the known technology. It increases the amplitude of the shaped spectrum at low frequencies, which results in reduced quantization noise at low frequencies.

[0178] However, depending on the implementation, the pre-emphasis operation, like the other specific operations, does not necessarily have to be performed.

[0179] A further optional processing operation is the noise-floor addition processing 114. This procedure improves the quality of signals with very high spectral dynamics, such as a glockenspiel, by limiting the amplitude amplification of the shaped spectrum in the valleys. This has the indirect effect of reducing the quantization noise at the peaks, at the cost of increased quantization noise in the valleys. There, the quantization noise is not perceptible anyway due to the masking properties of the human ear, such as the absolute listening threshold, pre-masking, post-masking or the general masking threshold: typically, a quite low-volume tone relatively close in frequency to a high-volume tone is not perceptible at all, i.e., it is fully masked, or it is only roughly perceived by the human hearing mechanism, so that this spectral contribution can be quantized quite coarsely.

[0180] The noise-floor addition operation 114, however, does not necessarily have to be performed.

[0181] Furthermore, block 115 indicates a log-like domain conversion. Advantageously, the output of one of blocks 111, 112, 113, 114 in FIG. 12 is transformed into a log-like domain. A log-like domain is a domain in which values close to 0 are expanded and high values are compressed. Advantageously, the log domain is a domain with base 2, but other log domains can be used as well. However, a log domain with base 2 is better suited for an implementation on a fixed-point signal processor.

[0182] The output of the scale factor calculator 110 is a first set of scale factors.

[0183] As illustrated in FIG. 12, each of the blocks 112 to 115 can be bypassed, i.e., the output of block 111, for example, could already be the first set of scale factors. However, all the processing operations and, particularly, the log-like domain conversion are advantageous. Thus, one could even implement the scale factor calculator by performing only steps 111 and 115, without the procedures of steps 112 to 114, for example. At the output of block 115, a set of scale parameters for one channel (such as L) is obtained, and a set of scale parameters for the other channel (such as R) can be obtained by a similar calculation.

[0184] Thus, the scale factor calculator is configured for performing one or two or more of the procedures illustrated in FIG. 12 as indicated by the input/output lines connecting several blocks.

[0185] FIG. 13 illustrates an implementation of the downsampler 130 of FIG. 11, again for a single channel. The data for the other channel is calculated in a similar way. Advantageously, a low-pass filtering or, generally, a filtering with a certain window w(k) is performed in step 131, and then a downsampling/decimation operation is performed on the result of the filtering. Since the low-pass filtering 131 and, in embodiments, the downsampling/decimation operation 132 are both arithmetic operations, the filtering 131 and the downsampling 132 can be performed within a single operation, as will be outlined later on. Advantageously, the downsampling/decimation operation is performed such that the individual groups of scale parameters of the first set of scale parameters overlap. Advantageously, an overlap of one scale factor is used in the filtering operation between two decimated parameters. Thus, step 131 applies a low-pass filter to the vector of scale parameters before decimation. This low-pass filter has an effect similar to the spreading function used in psychoacoustic models: it reduces the quantization noise at the peaks, at the cost of increased quantization noise around the peaks, where it is perceptually masked anyway, at least to a higher degree than quantization noise at the peaks.

[0186] Furthermore, the downsampler additionally performs a mean value removal 133 and an additional scaling step 134. However, the low-pass filtering operation 131, the mean value removal step 133 and the scaling step 134 are only optional steps. Thus, the downsampler illustrated in FIG. 13 or illustrated in FIG. 11 can be implemented to only perform step 132 or to perform two steps illustrated in FIG. 13 such as step 132 and one of the steps 131, 133 and 134. Alternatively, the downsampler can perform all four steps or only three steps out of the four steps illustrated in FIG. 13 as long as the downsampling/decimation operation 132 is performed.
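Steps 131 to 134 of FIG. 13 can be combined into one C sketch for 64 input and 16 output scale parameters, all assumed to be in the log-like domain already. The six-tap window, the border clamping and the value of the scaling factor `gain` are assumptions for illustration. With a hop of four, each window reaches one value beyond its own group of four on each side, which is one reading of the overlap described above.

```c
#include <stddef.h>

#define N_IN 64
#define N_OUT 16

/* 131+132: filter with an assumed 6-tap window and decimate by 4.
   133: remove the mean. 134: scale by gain (hypothetical value). */
void downsample_scf(const float in[N_IN], float gain, float out[N_OUT])
{
    static const float w[6] = {1.0f / 12, 2.0f / 12, 3.0f / 12,
                               3.0f / 12, 2.0f / 12, 1.0f / 12};
    float mean = 0.0f;
    for (int n = 0; n < N_OUT; n++) {
        float acc = 0.0f;
        for (int k = 0; k < 6; k++) {
            int idx = 4 * n + k - 1;           /* window covers 4n-1 .. 4n+4 */
            if (idx < 0) idx = 0;              /* clamp at the lower border */
            if (idx > N_IN - 1) idx = N_IN - 1; /* clamp at the upper border */
            acc += w[k] * in[idx];
        }
        out[n] = acc;
        mean += acc;
    }
    mean /= N_OUT;
    for (int n = 0; n < N_OUT; n++) {
        out[n] = gain * (out[n] - mean);
    }
}
```

For a linear ramp, interior outputs land at 4n + 1.5, i.e., at the centre of each group of four input bands.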

[0187] As outlined in FIG. 13, all operations performed by the downsampler in FIG. 13 are performed in the log-like domain in order to obtain better results.

[0188] FIG. 15 illustrates an implementation of the spectral processor. The spectral processor 120 included in the encoder of FIG. 11 comprises an interpolator 121 that receives the quantized second set of scale parameters for each channel or, alternatively, for a group of jointly encoded scale parameters, and that outputs the third set of scale parameters for a channel or for a group of jointly encoded scale parameters, where the third number is greater than the second number and advantageously equal to the first number. Furthermore, the spectral processor comprises a linear domain converter 122. Then, a spectral shaping is performed in block 123 using, on the one hand, the linear scale parameters and, on the other hand, the spectral representation obtained by the converter 100. Advantageously, a subsequent temporal noise shaping operation, i.e., a prediction over frequency, is performed in order to obtain spectral residual values at the output of block 124, while the TNS side information is forwarded to the output interface as indicated by arrow 129.

[0189] Finally, the spectral processor 125, 120b comprises at least one of a scalar quantizer/encoder configured to receive a single global gain for the whole spectral representation, i.e., for a whole frame, a stereo processing functionality, and an IGF processing functionality. Advantageously, the global gain is derived depending on bitrate considerations: it is set so that the encoded representation of the spectral representation generated by block 125, 120b fulfils certain requirements such as a bitrate requirement, a quality requirement, or both. The global gain can be calculated iteratively or in a feed-forward manner, as the case may be. Generally, the global gain is used together with a quantizer, and a high global gain typically results in a coarser quantization while a low global gain results in a finer quantization. In other words, when a fixed quantizer is used, a high global gain results in a larger quantization step size while a low global gain results in a smaller quantization step size. However, other quantizers can be used together with the global gain functionality as well, such as a quantizer with some kind of non-linear compression functionality for high values, so that, for example, higher values are compressed more than lower values. The above dependency between the global gain and the quantization coarseness is valid when the global gain is multiplied with the values before the quantization in the linear domain, corresponding to an addition in the log domain. If, however, the global gain is applied by a division in the linear domain, or by a subtraction in the log domain, the dependency is the other way round. The same is true when the "global gain" represents an inverse value.

[0190] Subsequently, implementations of the individual procedures described with respect to FIG. 11 to FIG. 15 are given.

[0191] Detailed Step-by-Step Description of Embodiments

[0192] Encoder: [0193] Step 1: Energy per band (111)

[0194] The energies per band $E_{B}(b)$ are computed as follows:

[00001] $E_{B}(b) = \sum_{k = \mathrm{Ind}(b)}^{\mathrm{Ind}(b + 1) - 1} \frac{X(k)^{2}}{\mathrm{Ind}(b + 1) - \mathrm{Ind}(b)} \quad \text{for } b = 0, \ldots, N_{B} - 1$

where X(k) are the MDCT coefficients, $N_{B} = 64$ is the number of bands and Ind(b) are the band indices. The bands are non-uniform and follow the perceptually relevant Bark scale (smaller in the low frequencies, larger in the high frequencies).

[0195] Step 2: Smoothing (112)

[0196] The energy per band $E_{B}(b)$ is smoothed using

[00002] $E_{S}(b) = \begin{cases} 0.75 \cdot E_{B}(0) + 0.25 \cdot E_{B}(1), & \text{if } b = 0 \\ 0.25 \cdot E_{B}(62) + 0.75 \cdot E_{B}(63), & \text{if } b = 63 \\ 0.25 \cdot E_{B}(b - 1) + 0.5 \cdot E_{B}(b) + 0.25 \cdot E_{B}(b + 1), & \text{otherwise} \end{cases}$
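Steps 1 and 2 can be illustrated with the following minimal Python sketch. It assumes that the band indices are supplied as a list `ind` holding N_B + 1 boundaries (so that Ind(b + 1) exists for the last band), and it generalizes the edge rules written above for bands 0 and 63 to the first and last band of the list. The function names `band_energies` and `smooth_energies` are hypothetical and not part of the described codec.

```python
from typing import List, Sequence


def band_energies(X: Sequence[float], ind: Sequence[int]) -> List[float]:
    """Step 1: mean squared MDCT coefficient per band.

    `ind` holds N_B + 1 boundaries; band b covers lines ind[b] .. ind[b+1]-1.
    """
    energies = []
    for b in range(len(ind) - 1):
        start, stop = ind[b], ind[b + 1]
        width = stop - start
        energies.append(sum(X[k] ** 2 for k in range(start, stop)) / width)
    return energies


def smooth_energies(E_B: Sequence[float]) -> List[float]:
    """Step 2: three-tap (0.25, 0.5, 0.25) smoothing with edge rules."""
    last = len(E_B) - 1
    E_S = []
    for b in range(len(E_B)):
        if b == 0:
            E_S.append(0.75 * E_B[0] + 0.25 * E_B[1])
        elif b == last:
            E_S.append(0.25 * E_B[last - 1] + 0.75 * E_B[last])
        else:
            E_S.append(0.25 * E_B[b - 1] + 0.5 * E_B[b] + 0.25 * E_B[b + 1])
    return E_S
```

For a quick check, a uniform layout such as `ind = list(range(0, 321, 5))` (64 bands of 5 lines for a 320-line frame) can be used, although the real band layout is non-uniform.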

[0197] Remark: this step is mainly used to smooth possible instabilities in the vector $E_{B}(b)$. If not smoothed, these instabilities are amplified when converted to the log domain (see Step 5), especially in the valleys where the energy is close to 0.

[0198] Step 3: Pre-emphasis (113)

[0199] The smoothed energy per band $E_{S}(b)$ is then pre-emphasized using

[00003] $E_{P}(b) = E_{S}(b) \cdot 10^{\frac{b \cdot g_{tilt}}{10 \cdot 63}} \quad \text{for } b = 0, \ldots, 63$

where $g_{tilt}$ controls the pre-emphasis tilt and depends on the sampling frequency; it is, for example, 18 at 16 kHz and 30 at 48 kHz. The pre-emphasis used in this step has the same purpose as the pre-emphasis used in the LPC-based perceptual filter of known technology 2: it increases the amplitude of the shaped spectrum in the low frequencies, resulting in reduced quantization noise in the low frequencies.

[0200] Step 4: Noise floor (114)

[0201] A noise floor at −40 dB relative to the mean band energy is applied to $E_{P}(b)$ using

$E_{P}(b) = \max(E_{P}(b), \mathrm{noiseFloor}) \quad \text{for } b = 0, \ldots, 63$

with the noise floor being calculated by

[00004] $\mathrm{noiseFloor} = \max\left(\frac{\sum_{b = 0}^{63} E_{P}(b)}{64} \cdot 10^{-\frac{40}{10}}, 2^{-32}\right)$
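Steps 3 and 4 can be sketched as follows, purely as an illustration. The sketch assumes 64 bands (hence the constant 63 in the exponent), takes $g_{tilt}$ as a caller-supplied argument (e.g. 18 at 16 kHz), and uses the hypothetical function names `pre_emphasis` and `apply_noise_floor`.

```python
from typing import List, Sequence


def pre_emphasis(E_S: Sequence[float], g_tilt: float) -> List[float]:
    """Step 3: E_P(b) = E_S(b) * 10^(b * g_tilt / (10 * 63))."""
    return [e * 10.0 ** (b * g_tilt / (10.0 * 63.0)) for b, e in enumerate(E_S)]


def apply_noise_floor(E_P: Sequence[float]) -> List[float]:
    """Step 4: clamp every band to at least -40 dB below the mean band
    energy, and never below the absolute minimum 2^-32."""
    mean_energy = sum(E_P) / len(E_P)
    noise_floor = max(mean_energy * 10.0 ** (-40.0 / 10.0), 2.0 ** -32)
    return [max(e, noise_floor) for e in E_P]
```

Because the floor is tied to the mean energy, a single strong band (as in a glockenspiel note) lifts the floor of all empty bands instead of leaving them near zero before the logarithm of Step 5.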

[0202] This step improves the quality of signals with very high spectral dynamics, such as glockenspiel, by limiting the amplitude amplification of the shaped spectrum in the valleys. This indirectly reduces the quantization noise in the peaks, at the cost of increased quantization noise in the valleys, where it is not perceptible anyway.

[0203] Step 5: Logarithm (115)

[0204] A transformation into the logarithm domain is then performed using

[00005] $E_{L}(b) = \frac{\log_{2}(E_{P}(b))}{2} \quad \text{for } b = 0, \ldots, 63$

[0205] Step 6: Downsampling (131, 132)

[0206] The vector $E_{L}(b)$ is then downsampled by a factor of 4 using

[00006] $E_{4}(b) = \begin{cases} w(0) E_{L}(0) + \sum_{k = 1}^{5} w(k) E_{L}(4b + k - 1), & \text{if } b = 0 \\ \sum_{k = 0}^{4} w(k) E_{L}(4b + k - 1) + w(5) E_{L}(63), & \text{if } b = 15 \\ \sum_{k = 0}^{5} w(k) E_{L}(4b + k - 1), & \text{otherwise} \end{cases}$

with

$w(k) = \left\{\frac{1}{12}, \frac{2}{12}, \frac{3}{12}, \frac{3}{12}, \frac{2}{12}, \frac{1}{12}\right\}$
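Steps 5 and 6 can be sketched together in Python, as a non-authoritative illustration. The sketch assumes 64 input bands and handles the two special cases of the formula (b = 0 and b = 15) by clamping the out-of-range indices −1 and 64 to the edge bands 0 and 63, which yields exactly the w(0)E_L(0) and w(5)E_L(63) terms. The names `W`, `to_log_domain` and `downsample_by_4` are hypothetical.

```python
import math
from typing import List, Sequence

# Six-tap low-pass window w(k) from Step 6; the weights sum to 1.
W = [1 / 12, 2 / 12, 3 / 12, 3 / 12, 2 / 12, 1 / 12]


def to_log_domain(E_P: Sequence[float]) -> List[float]:
    """Step 5: E_L(b) = log2(E_P(b)) / 2."""
    return [math.log2(e) / 2.0 for e in E_P]


def downsample_by_4(E_L: Sequence[float]) -> List[float]:
    """Step 6: filter with w(k) and decimate by 4 (64 -> 16 values).

    Output b uses bands 4b-1 .. 4b+4; indices outside the valid range are
    clamped to the first/last band, reproducing the b = 0 and b = 15 cases.
    """
    last = len(E_L) - 1
    E_4 = []
    for b in range(len(E_L) // 4):
        acc = 0.0
        for k, w in enumerate(W):
            idx = min(max(4 * b + k - 1, 0), last)
            acc += w * E_L[idx]
        E_4.append(acc)
    return E_4
```

For a linear ramp $E_L(b) = b$, the interior outputs land at $4b + 1.5$, i.e., at the block midpoints 1.5, 5.5, ... of the downsampled grid.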

[0207] This step applies a low-pass filter w(k) to the vector $E_{L}(b)$ before decimation. This low-pass filter has an effect similar to the spreading function used in psychoacoustic models: it reduces the quantization noise at the peaks, at the cost of an increase of quantization noise around the peaks, where it is perceptually masked anyway.

[0208] Step 7: Mean Removal and Scaling (133, 134)

[0209] The final scale factors are obtained after mean removal and scaling by a factor of 0.85

[00007] $scf(n) = 0.85 \left(E_{4}(n) - \frac{\sum_{b = 0}^{15} E_{4}(b)}{16}\right) \quad \text{for } n = 0, \ldots, 15$
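Step 7 reduces to a few lines. The following sketch is illustrative only; the function name `mean_removal_and_scaling` and its `factor` parameter are hypothetical, with the default set to the value 0.85 given above.

```python
from typing import List, Sequence


def mean_removal_and_scaling(E_4: Sequence[float], factor: float = 0.85) -> List[float]:
    """Step 7: scf(n) = factor * (E_4(n) - mean of E_4)."""
    mean = sum(E_4) / len(E_4)
    return [factor * (e - mean) for e in E_4]
```

After this step the scale factors sum to zero (up to rounding); the removed mean is not lost, since the overall level is carried by the global gain.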

[0210] Since the codec has an additional global gain, the mean can be removed without any loss of information. Removing the mean also allows a more efficient vector quantization. The scaling by 0.85 slightly compresses the amplitude of the noise shaping curve. It has a perceptual effect similar to the spreading function mentioned in Step 6: reduced quantization noise at the peaks and increased quantization noise in the valleys.

[0211] Step 8: Quantization (141, 142)

[0212] The scale factors are quantized using vector quantization, producing indices, which are then packed into the bitstream and sent to the decoder, and the quantized scale factors scfQ(n).

[0213] Step 9: Interpolation (121, 122)

[0214] The quantized scale factors scfQ(n) are interpolated using

scfQint(0)=scfQ(0)

scfQint(1)=scfQ(0)

scfQint(4n+2)=scfQ(n)+⅛(scfQ(n+1)−scfQ(n)) for n=0 . . . 14

scfQint(4n+3)=scfQ(n)+⅜(scfQ(n+1)−scfQ(n)) for n=0 . . . 14

scfQint(4n+4)=scfQ(n)+⅝(scfQ(n+1)−scfQ(n)) for n=0 . . . 14

scfQint(4n+5)=scfQ(n)+⅞(scfQ(n+1)−scfQ(n)) for n=0 . . . 14

scfQint(62)=scfQ(15)+⅛(scfQ(15)−scfQ(14))

scfQint(63)=scfQ(15)+⅜(scfQ(15)−scfQ(14))

and transformed back into linear domain using

$g_{SNS}(b) = 2^{scfQint(b)} \quad \text{for } b = 0, \ldots, 63$
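The interpolation and the conversion back to the linear domain can be sketched as below. This is an illustration, not a reference implementation: the function names are hypothetical, and the factor 3/8 used for band 63 is assumed from the interpolation pattern (it places band 63 exactly at position 63 on the grid of downsampled points 1.5, 5.5, ..., 61.5 illustrated in FIG. 21).

```python
from typing import List, Sequence


def interpolate_scf(scfQ: Sequence[float]) -> List[float]:
    """Step 9: expand 16 quantized scale factors to 64 band values."""
    if len(scfQ) != 16:
        raise ValueError("expected 16 quantized scale factors")
    out = [0.0] * 64
    # Bands 0 and 1 simply repeat the first value.
    out[0] = scfQ[0]
    out[1] = scfQ[0]
    # Bands 2..61: linear interpolation between neighbouring points.
    fractions = (1 / 8, 3 / 8, 5 / 8, 7 / 8)
    for n in range(15):
        delta = scfQ[n + 1] - scfQ[n]
        for j, f in enumerate(fractions):
            out[4 * n + 2 + j] = scfQ[n] + f * delta
    # Bands 62 and 63: linear extrapolation from the last two points.
    edge = scfQ[15] - scfQ[14]
    out[62] = scfQ[15] + (1 / 8) * edge
    out[63] = scfQ[15] + (3 / 8) * edge
    return out


def to_linear_gains(scfQint: Sequence[float]) -> List[float]:
    """g_SNS(b) = 2^scfQint(b)."""
    return [2.0 ** v for v in scfQint]
```

With a linear ramp `scfQ = [0.0, 1.0, ..., 15.0]`, every output from band 2 upward equals (b − 1.5)/4, which confirms that the interpolated curve is smooth across all block boundaries.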

[0215] Interpolation is used to obtain a smooth noise shaping curve and thus to avoid large amplitude jumps between adjacent bands.

[0216] Step 10: Spectral Shaping (123)

[0217] The SNS scale factors $g_{SNS}(b)$ are applied to the MDCT frequency lines of each band separately in order to generate the shaped spectrum $X_{s}(k)$:

[00008] $X_{s}(k) = \frac{X(k)}{g_{SNS}(b)} \quad \text{for } k = \mathrm{Ind}(b), \ldots, \mathrm{Ind}(b + 1) - 1, \quad \text{for } b = 0, \ldots, 63$
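A minimal sketch of the encoder-side spectral shaping of Step 10 follows. As in the earlier sketches, it assumes that the band boundaries are given as a list `ind` with one more entry than there are bands; the name `shape_spectrum` is hypothetical.

```python
from typing import List, Sequence


def shape_spectrum(X: Sequence[float], g_sns: Sequence[float],
                   ind: Sequence[int]) -> List[float]:
    """Step 10: X_s(k) = X(k) / g_SNS(b) for every line k of band b."""
    X_s = list(X)
    for b, g in enumerate(g_sns):
        for k in range(ind[b], ind[b + 1]):
            X_s[k] = X[k] / g
    return X_s
```

Dividing by a large $g_{SNS}(b)$ makes the lines of that band small before the uniform quantizer, so the quantization noise re-injected at the decoder (after multiplication by the same gain) follows the noise shaping curve.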

[0218] FIG. 18 illustrates an implementation of an apparatus for decoding an encoded audio signal 250 (a stereo signal encoded as L, R or M, S) comprising information on an encoded spectral representation and information on an encoded representation of a second set of scale parameters (separately or jointly encoded). The decoder comprises an input interface 200, a spectrum decoder 210 (e.g. performing IGF processing, inverse stereo processing or dequantization processing), a scale factor/parameter decoder 220, a spectral processor 230 (e.g. for R, L) and a converter 240 (e.g. for R, L). The input interface 200 is configured for receiving the encoded audio signal 250, for extracting the encoded spectral representation that is forwarded to the spectrum decoder 210, and for extracting the encoded representation of the second set of scale factors that is forwarded to the scale factor decoder 220. Furthermore, the spectrum decoder 210 is configured for decoding the encoded spectral representation to obtain a decoded spectral representation that is forwarded to the spectral processor 230. The scale factor decoder 220 is configured for decoding the encoded second set of scale parameters to obtain a first set of scale parameters that is forwarded to the spectral processor 230. The first set contains more scale factors or scale parameters than the second set. The spectral processor 230 is configured for processing the decoded spectral representation using the first set of scale parameters to obtain a scaled spectral representation. The scaled spectral representation is then converted by the converter 240 to finally obtain the decoded audio signal 260, which is a stereo signal or a multichannel signal with more than two channels.

[0219] Advantageously, the scale factor decoder 220 is configured to operate in substantially the same manner as has been discussed with respect to the spectral processor 120 of FIG. 11 for the calculation of the third set of scale factors or scale parameters, as discussed in connection with blocks 141 or 142 and, particularly, with respect to blocks 121, 122 of FIG. 15. In particular, the scale factor decoder is configured to perform substantially the same procedure for the interpolation and the transformation back into the linear domain as has been discussed before with respect to Step 9. Thus, as illustrated in FIG. 19, the scale factor decoder 220 is configured for applying a decoder codebook 221 to the one or more indices per frame representing the encoded scale parameter representation. Then, an interpolation is performed in block 222 that is substantially the same interpolation as has been discussed with respect to block 121 in FIG. 15. Then, a linear domain converter 223 is used that is substantially the same as the linear domain converter 122 discussed with respect to FIG. 15. However, in other implementations, blocks 221, 222, 223 can operate differently from the corresponding blocks on the encoder side.

[0220] Furthermore, the spectrum decoder 210 illustrated in FIG. 18 or 19 comprises a dequantizer/decoder block that receives, as an input, the encoded spectrum and outputs a dequantized spectrum, which is advantageously dequantized using the global gain that is additionally transmitted, in encoded form, from the encoder side to the decoder side within the encoded audio signal. The block 210 may also perform IGF processing or inverse stereo processing such as MS decoding. The dequantizer/decoder 210 can, for example, comprise an arithmetic or Huffman decoder functionality that receives some kind of codes as input and outputs quantization indices representing spectral values. These quantization indices are then input into a dequantizer together with the global gain, and the output consists of dequantized spectral values that can then be subjected to TNS processing, such as an inverse prediction over frequency, in a TNS decoder processing block 211, which is optional. In particular, the TNS decoder processing block additionally receives the TNS side information that has been generated by block 124 of FIG. 15, as indicated by line 129. The output of the TNS decoder processing step 211 is input into a spectral shaping block 212 operating on each channel separately using the separate scale factors: the first set of scale factors calculated by the scale factor decoder is applied to the decoded spectral representation, which may or may not have been TNS processed, and the output is the scaled spectral representation for each channel, which is then input into the converter 240 of FIG. 18.

[0221] Further procedures of embodiments of the decoder are discussed subsequently.

[0222] Decoder: [0223] Step 1: Quantization (221)

[0224] The vector quantizer indices produced in encoder Step 8 are read from the bitstream and used to decode the quantized scale factors scfQ(n).

[0225] Step 2: Interpolation (222, 223)

[0226] Same as Encoder Step 9.

[0227] Step 3: Spectral Shaping (212)

[0228] The SNS scale factors $g_{SNS}(b)$ are applied to the quantized MDCT frequency lines $\hat{X}_{s}(k)$ of each band separately in order to generate the decoded spectrum $\hat{X}(k)$ as follows:

$\hat{X}(k) = \hat{X}_{s}(k) \cdot g_{SNS}(b) \quad \text{for } k = \mathrm{Ind}(b), \ldots, \mathrm{Ind}(b + 1) - 1, \quad \text{for } b = 0, \ldots, 63$
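The decoder-side shaping is the per-band inverse of encoder Step 10. The sketch below is illustrative only: its input is the dequantized, still shaped spectrum, the band boundaries are assumed to be given as a list `ind` with one more entry than there are bands, and the name `unshape_spectrum` is hypothetical.

```python
from typing import List, Sequence


def unshape_spectrum(X_s_hat: Sequence[float], g_sns: Sequence[float],
                     ind: Sequence[int]) -> List[float]:
    """Decoder Step 3: multiply every line k of band b by g_SNS(b)."""
    X_hat = list(X_s_hat)
    for b, g in enumerate(g_sns):
        for k in range(ind[b], ind[b + 1]):
            X_hat[k] = X_s_hat[k] * g
    return X_hat
```

Since the decoder derives $g_{SNS}(b)$ from the same quantized scale factors scfQ(n) as the encoder, this multiplication exactly undoes the encoder-side division apart from the quantization error of the spectral lines.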

[0229] FIG. 16 and FIG. 17 illustrate a general encoder/decoder setup, where FIG. 16 represents an implementation without TNS processing, while FIG. 17 illustrates an implementation that comprises TNS processing. Functionalities in FIG. 16 and FIG. 17 correspond to the functionalities in the other figures that carry identical reference numerals. In particular, as illustrated in FIG. 16, the input signal 160, e.g. a stereo signal or a multichannel signal, is input into a transform stage 110 and, subsequently, the spectral processing 120 is performed. The spectral processing is reflected by an SNS encoder indicated by reference numerals 123, 110, 130, 140, indicating that the SNS encoder block implements the functionalities indicated by these reference numerals. Subsequent to the SNS encoder block, a quantization/encoding operation 120b, 125 is performed, and the encoded signal is input into the bitstream, as indicated at 180 in FIG. 16. The bitstream 180 is then received at the decoder side and, subsequent to an inverse quantization and decoding illustrated by reference numeral 210, the SNS decoder operations illustrated by blocks 210, 220, 230 of FIG. 18 are performed so that, in the end, subsequent to an inverse transform 240, the decoded output signal 260 is obtained.

[0230] FIG. 17 illustrates a similar representation as in FIG. 16, but it is indicated that, advantageously, the TNS processing is performed subsequent to SNS processing on the encoder-side and, correspondingly, the TNS processing 211 is performed before the SNS processing 212 with respect to the processing sequence on the decoder-side.

[0231] Advantageously, the additional tool TNS is used between Spectral Noise Shaping (SNS) and quantization/coding (see FIG. 17). TNS (Temporal Noise Shaping) also shapes the quantization noise, but in the time domain, as opposed to the frequency-domain shaping of SNS. TNS is useful for signals containing sharp attacks and for speech signals.

[0232] TNS is usually applied (in AAC, for example) between the transform and SNS. Advantageously, however, TNS is applied on the shaped spectrum. This avoids some artifacts that were produced by the TNS decoder when operating the codec at low bitrates.

[0233] FIG. 20 illustrates a subdivision of the spectral coefficients or spectral lines as obtained by block 100 on the encoder-side into bands. Particularly, it is indicated that lower bands have a smaller number of spectral lines than higher bands.

[0234] Particularly, the x-axis in FIG. 20 corresponds to the index of bands and illustrates the embodiment of 64 bands and the y-axis corresponds to the index of the spectral lines illustrating 320 spectral coefficients in one frame. Particularly, FIG. 20 illustrates exemplarily the situation of the super wide band (SWB) case where there is a sampling frequency of 32 kHz.

[0235] In the wide band (WB) case, one frame results in 160 spectral lines at a sampling frequency of 16 kHz, so that in both cases one frame has a duration of 10 milliseconds.

[0236] FIG. 21 illustrates more details on the downsampling performed in the downsampler 130 of FIG. 11 or the corresponding upsampling or interpolation as performed in the scale factor decoder 220 of FIG. 18 or as illustrated in block 222 of FIG. 19.

[0237] Along the x-axis, the index for the bands 0 to 63 is given. Particularly, there are 64 bands going from 0 to 63.

[0238] The 16 downsample points corresponding to scfQ(i) are illustrated as vertical lines 1100. Particularly, FIG. 21 illustrates how a certain grouping of scale parameters is performed to finally obtain the downsampled point 1100. Exemplarily, the first block of four bands consists of (0, 1, 2, 3) and the middle point of this first block is at 1.5 indicated by item 1100 at the index 1.5 along the x-axis.

[0239] Correspondingly, the second block of four bands is (4, 5, 6, 7), and the middle point of the second block is 5.5.

[0240] The windows 1110 correspond to the windows w(k) discussed with respect to the step 6 downsampling described before. It can be seen that these windows are centered at the downsampled points and there is the overlap of one block to each side as discussed before.

[0241] The interpolation step 222 of FIG. 19 recovers the 64 bands from the 16 downsampled points. This is seen in FIG. 21 by computing the position of any of the lines 1120 as a function of the two downsampled points indicated at 1100 around a certain line 1120. The following example exemplifies that.

[0242] The position of the second band is calculated as a function of the two vertical lines around it (1.5 and 5.5): $2 = 1.5 + \frac{1}{8} \times (5.5 - 1.5)$.

[0243] Correspondingly, the position of the third band is calculated as a function of the two vertical lines 1100 around it (1.5 and 5.5): $3 = 1.5 + \frac{3}{8} \times (5.5 - 1.5)$.

[0244] A specific procedure is performed for the first two bands and the last two bands. For these bands, an interpolation cannot be performed, because no vertical lines 1100, and hence no corresponding values, exist outside the range from 0 to 63. To address this issue, an extrapolation is performed for bands 0 and 1 on the one hand and for bands 62 and 63 on the other hand, as described in Step 9 (Interpolation) above.
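The geometric argument of FIG. 21 can be checked with exact rational arithmetic. The sketch below assumes that the downsampled points lie at 4n + 1.5 (1.5, 5.5, ..., 61.5) and reuses the interpolation fractions of Step 9, including the extrapolation factors 1/8 and 3/8 for bands 62 and 63; the function name `band_positions` is hypothetical. Bands 0 and 1 are omitted because they simply repeat the first value.

```python
from fractions import Fraction
from typing import Dict

FRACTIONS = [Fraction(1, 8), Fraction(3, 8), Fraction(5, 8), Fraction(7, 8)]


def band_positions() -> Dict[int, Fraction]:
    """Position implied by the Step 9 weights for bands 2..63."""
    # Midpoints of the 16 blocks of four bands: 1.5, 5.5, ..., 61.5.
    c = [Fraction(3, 2) + 4 * n for n in range(16)]
    pos = {}
    for n in range(15):
        for j, f in enumerate(FRACTIONS):
            pos[4 * n + 2 + j] = c[n] + f * (c[n + 1] - c[n])
    # Extrapolation beyond the last midpoint.
    pos[62] = c[15] + Fraction(1, 8) * (c[15] - c[14])
    pos[63] = c[15] + Fraction(3, 8) * (c[15] - c[14])
    return pos
```

Every computed position coincides with its band index, e.g. $1.5 + \frac{1}{8} \times 4 = 2$ and $61.5 + \frac{3}{8} \times 4 = 63$.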

[0245] Subsequently, an implementation of the converter 100 of FIG. 11 on the one hand and the converter 240 of FIG. 18 on the other hand are discussed.

[0246] Particularly, FIG. 22a illustrates a schedule for indicating the framing performed on the encoder-side within converter 100. FIG. 22b illustrates an implementation of the converter 100 of FIG. 11 on the encoder-side and FIG. 22c illustrates an implementation of the converter 240 on the decoder-side.

[0247] The converter 100 on the encoder-side may be implemented to perform a framing with overlapping frames such as a 50% overlap so that frame 2 overlaps with frame 1 and frame 3 overlaps with frame 2 and frame 4. However, other overlaps or a non-overlapping processing can be performed as well, but it is of advantage to perform a 50% overlap together with an MDCT algorithm. To this end, the converter 100 comprises an analysis window 101 and a subsequently-connected spectral converter 102 for performing an FFT processing, an MDCT processing or any other kind of time-to-spectrum conversion processing to obtain a sequence of frames corresponding to a sequence of spectral representations as input in FIG. 11 to the blocks subsequent to the converter 100.

[0248] Correspondingly, the scaled spectral representation(s) are input into the converter 240 of FIG. 18. In particular, the converter comprises a time converter 241 implementing an inverse FFT operation, an inverse MDCT operation or a corresponding spectrum-to-time conversion operation. The output is input into a synthesis window 242, and the output of the synthesis window 242 is input into an overlap-add processor 243 that performs an overlap-add operation in order to finally obtain the decoded audio signal. For example, the overlap-add processing in block 243 performs a sample-by-sample addition between corresponding samples of the second half of frame 3 and the first half of frame 4, so that the audio sample values for the overlap between frame 3 and frame 4, indicated by item 1200 in FIG. 22a, are obtained. Similar sample-by-sample overlap-add operations are performed to obtain the remaining audio sample values of the decoded audio output signal.
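The sample-by-sample overlap-add of block 243 can be sketched as follows. This is a simplified illustration: it assumes that the frames have already passed the inverse transform 241 and the synthesis window 242 (neither is shown), that all frames have the same length, and that consecutive frames are offset by `hop` samples; with a frame length of 2 × hop this is the 50% overlap of FIG. 22a. The name `overlap_add` is hypothetical.

```python
from typing import List, Sequence


def overlap_add(frames: Sequence[Sequence[float]], hop: int) -> List[float]:
    """Add windowed frames sample by sample, frame i starting at i * hop."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        start = i * hop
        for n, sample in enumerate(frame):
            # Second half of frame i overlaps the first half of frame i+1.
            out[start + n] += sample
    return out
```

For example, two frames of length 4 with a hop of 2 produce six output samples, the middle two of which are the sum of the second half of the first frame and the first half of the second frame.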

[0249] It is to be mentioned here that all alternatives or aspects as discussed before and all aspects as defined by independent claims in the following claims can be used individually, i.e., without any other alternative or object than the contemplated alternative, object or independent claim. However, in other embodiments, two or more of the alternatives or the aspects or the independent claims can be combined with each other and, in other embodiments, all aspects, or alternatives and all independent claims can be combined to each other.

[0250] Although more aspects are described above, the attached claims indicate two different aspects, i.e., an Audio Decoder, an Audio Encoder, and Related Methods Using Joint Coding of Scale Parameters for Channels of a Multi-Channel Audio Signal, or an Audio Quantizer, an Audio Dequantizer, or Related Methods. These two aspects can be combined or used separately, as the case may be, and the inventions in accordance with these aspects are applicable to other applications of audio processing different from the specific applications described above.

[0251] Furthermore, reference is made to the additional FIGS. 3a, 3b, 4a, 4b, 5, 6, 8a, 8b illustrating the first aspect and FIGS. 9a, 9b illustrating the second aspect and FIGS. 7a, 7b illustrating the second aspect as applied within the first aspect.

[0252] An inventively encoded signal can be stored on a digital storage medium or a non-transitory storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.

[0253] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.

[0254] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

[0255] Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.

[0256] Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.

[0257] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.

[0258] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.

[0259] A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.

[0260] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.

[0261] A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.

[0262] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

[0263] In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.

[0264] While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention. Subsequently, further embodiments/examples are summarized: [0265] 1. Audio quantizer for quantizing a plurality of audio information items, comprising: [0266] a first stage vector quantizer (141, 143) for quantizing the plurality of audio information items to determine a first stage vector quantization result and a plurality of intermediate quantized items corresponding to the first stage vector quantization result; [0267] a residual item determiner (142) for calculating a plurality of residual items from the plurality of intermediate quantized items and the plurality of audio information items; and [0268] a second stage vector quantizer (145) for quantizing the plurality of residual items to obtain a second stage vector quantization result, wherein the first stage vector quantization result and the second stage vector quantization result are a quantized representation of the plurality of audio information items. [0269] 2. Audio quantizer of example 1, wherein the residual item determiner (142) is configured to calculate, for each residual item, a difference between a corresponding audio information item and a corresponding intermediate quantized item. [0270] 3. 
Audio quantizer of example 1 or 2, wherein the residual item determiner (142) is configured to amplify or weight, for each residual item, a difference between a corresponding audio information item and a corresponding intermediate quantized item so that the plurality of residual items are greater than the corresponding differences, or to amplify or weight the plurality of audio information items and/or the plurality of intermediate quantized items before calculating a difference between amplified items to obtain the residual items. [0271] 4. Audio quantizer of one of the preceding examples, [0272] wherein the residual item determiner (142) is configured to divide corresponding differences between the plurality of intermediate quantized items and the audio information items by a predetermined factor being lower than 1 or to multiply corresponding differences between the plurality of intermediate quantized items and the audio information items by a predetermined factor being greater than 1. [0273] 5. Audio quantizer of one of the preceding examples, [0274] wherein the first stage vector quantizer (141, 143) is configured to perform the quantization with a first quantization precision, wherein the second stage vector quantizer (145) is configured to perform the quantization with a second quantization precision, and wherein the second quantization precision is lower or higher than the first quantization precision, or [0275] wherein the first stage vector quantizer (141, 143) is configured to perform a fixed rate quantization and wherein the second stage vector quantizer (145) is configured to perform a variable rate quantization. [0276] 6. 
Audio quantizer of one of the preceding examples, wherein the first stage vector quantizer (141, 143) is configured to use a first stage codebook having a first number of entries, wherein the second stage vector quantizer (145) is configured to use a second stage codebook having a second number of entries, and wherein the second number of entries is lower or higher than the first number of entries. [0277] 7. Audio quantizer of one of the preceding examples, [0278] wherein the audio information items are scale parameters for a frame of an audio signal usable for scaling time domain audio samples of an audio signal in a time domain or usable for scaling spectral domain audio samples of an audio signal in a spectral domain, wherein each scale parameter is usable for scaling at least two time domain or spectral domain audio samples, wherein the frame comprises a first number of scale parameters, [0279] wherein the first stage vector quantizer (141, 143) is configured to perform a split of the first number of scale parameters into two or more sets of scale parameters, and wherein the first stage vector quantizer (141, 143) is configured to determine a quantization index for each set of scale parameters to obtain a plurality of quantization indices representing the first quantization result. [0280] 8. Audio quantizer of example 7, wherein the first stage vector quantizer (141, 143) is configured to combine a first quantization index for the first set and a second quantization index for the second set to obtain a single index as the first quantization result. [0281] 9. Audio quantizer of example 8, [0282] wherein the first stage vector quantizer (141, 143) is configured to multiply one of the first and the second index by a number corresponding to the number of bits of the first and the second index and to add a multiplied index and a non-multiplied index to obtain the single index. [0283] 10. 
Audio quantizer of one of the preceding examples, [0284] wherein the second stage vector quantizer (145) is an algebraic vector quantizer, wherein each index comprises a base codebook index and a Voronoi extension index. [0285] 11. Audio quantizer of one of the preceding examples, [0286] wherein the first stage vector quantizer (141, 143) is configured to perform a first split of the plurality of audio information items, [0287] wherein the second stage vector quantizer (145) is configured to perform a second split of the plurality of residual items, [0288] wherein the first split results in a first number of subsets of audio information items and the second split results in a second number of subsets of residual items, wherein the first number of subsets is equal to the second number of subsets. [0289] 12. Audio quantizer of one of the preceding examples, [0290] wherein the first vector quantizer is configured to output, from a first codebook search, a first index having a first number of bits, [0291] wherein the second vector quantizer is configured to output, for a second codebook search, a second index having a second number of bits, the second number of bits being lower or higher than the first number of bits. [0292] 13. Audio quantizer of example 12, [0293] wherein the first number of bits is a number of bits between 4 and 7, and wherein the second number of bits is a number of bits between 3 and 6. [0294] 14. 
Audio quantizer of one of the preceding examples,
[0295] wherein the audio information items comprise, for a first frame of a multi-channel audio signal, a first plurality of scale parameters for a first channel of the multi-channel audio signal, and a second plurality of scale parameters for a second channel of the multi-channel audio signal,
[0296] wherein the audio quantizer is configured to apply the first and the second stage vector quantizers to the first plurality and the second plurality of the first frame,
[0297] wherein the audio information items comprise, for a second frame of the multi-channel audio signal, a third plurality of mid scale parameters and a fourth plurality of side scale parameters, and
[0298] wherein the audio quantizer is configured to apply the first and the second stage vector quantizers to the third plurality of mid scale parameters, to apply the second stage vector quantizer (145) to the fourth plurality of side scale parameters, and to not apply the first stage vector quantizer (141, 143) to the fourth plurality of side scale parameters.
[0299] 15. Audio quantizer of example 14,
[0300] wherein the residual item determiner (142) is configured to amplify or weight, for the second frame, the fourth plurality of side scale parameters, and wherein the second stage vector quantizer (145) is configured to process the amplified or weighted side scale parameters for the second frame of the multi-channel audio signal.
[0301] 16.
Audio dequantizer for dequantizing a quantized plurality of audio information items, comprising:
[0302] a first stage vector dequantizer (2220) for dequantizing a first stage vector quantization result included in the quantized plurality of audio information items to obtain a plurality of intermediate quantized audio information items;
[0303] a second stage vector dequantizer (2260) for dequantizing a second stage vector quantization result included in the quantized plurality of audio information items to obtain a plurality of residual items; and
[0304] a combiner (2240) for combining the plurality of intermediate quantized audio information items and the plurality of residual items to obtain a dequantized plurality of audio information items.
[0305] 17. Audio dequantizer of example 16, wherein the combiner (2240) is configured to calculate, for each dequantized audio information item, a sum of a corresponding intermediate quantized audio information item and a corresponding residual item.
[0306] 18.
Audio dequantizer of example 16 or 17,
[0307] wherein the combiner (2240) is configured to attenuate or weight the plurality of residual items, so that the attenuated residual items are smaller than the corresponding residual items before the attenuation, and
[0308] wherein the combiner (2240) is configured to add the attenuated residual items to the corresponding intermediate quantized audio information items,
[0309] or
[0310] wherein the combiner (2240) is configured to use an attenuation or weighting value lower than 1 to attenuate the plurality of residual items or jointly encoded scaling parameters before performing a combination, wherein the combination is performed using attenuated residual values, and/or
[0311] wherein, exemplarily, a scaling parameter is multiplied by the weighting or attenuation value, wherein the weighting value is advantageously between 0.1 and 0.9, more advantageously between 0.2 and 0.6, or even more advantageously between 0.25 and 0.4, and/or
[0312] wherein the same attenuation or weighting value is used for all scaling parameters of the plurality of residual items or any jointly encoded scaling parameters.
[0313] 19. Audio dequantizer of example 18, wherein the combiner (2240) is configured to multiply a corresponding residual item by a weighting factor being lower than one or to divide a corresponding residual item by a weighting factor being greater than one.
[0314] 20. Audio dequantizer of one of examples 16 to 19,
[0315] wherein the first stage vector dequantizer (2220) is configured to perform the dequantization with a first precision,
[0316] wherein the second stage vector dequantizer (2260) is configured to perform the dequantization with a second precision, wherein the second precision is lower or higher than the first precision.
[0317] 21.
Audio dequantizer of one of examples 16 to 20,
[0318] wherein the first stage vector dequantizer (2220) is configured to use a first stage codebook having a first number of entries, wherein the second stage vector dequantizer (2260) is configured to use a second stage codebook having a second number of entries, and wherein the second number of entries is lower than or higher than the first number of entries, or
[0319] wherein the first stage vector dequantizer (2220) is configured to receive, for a first codebook retrieval, a first index having a first number of bits,
[0320] wherein the second stage vector dequantizer (2260) is configured to receive, for a second codebook retrieval, a second index having a second number of bits, the second number of bits being lower or higher than the first number of bits, or wherein, exemplarily, the first number of bits is a number of bits between 4 and 7, and wherein, exemplarily, the second number of bits is a number of bits between 3 and 6.
[0321] 22. Audio dequantizer of one of examples 16 to 21,
[0322] wherein the dequantized plurality of audio information items are scale parameters for a frame of an audio signal usable for scaling time domain audio samples of an audio signal in a time domain or usable for scaling spectral domain audio samples of an audio signal in a spectral domain, wherein each scale parameter is usable for scaling at least two time domain or spectral domain audio samples, wherein the frame comprises a first number of scale parameters,
[0323] wherein the first stage vector dequantizer (2220) is configured to determine, from two or more result indices for the first stage vector quantization result, a first set and a second set of scale parameters, and
[0324] wherein the first stage vector dequantizer (2220) or the combiner (2240) is configured to put together the first set of scale parameters and the second set of scale parameters into a vector to obtain the first number of intermediate quantized scale parameters.
[0325] 23.
Audio dequantizer of example 22,
[0326] wherein the first stage vector dequantizer (2220) is configured to retrieve, as the first stage vector quantization result, a single combined index and to process the single combined index to obtain the two or more result indices.
[0327] 24. Audio dequantizer of example 23,
[0328] wherein the first stage vector dequantizer (2220) is configured to retrieve the first result index by determining a remainder from a division and to retrieve the second result index by determining an integer result from the division.
[0329] 25. Audio dequantizer of one of examples 16 to 24, wherein the second stage vector dequantizer (2260) is an algebraic vector dequantizer, wherein each index comprises a base codebook index and a Voronoi extension index.
[0330] 26. Audio dequantizer of one of examples 16 to 25,
[0331] wherein the first stage vector dequantizer (2220) or the combiner (2240) is configured to put together a first set of scale parameters and a second set of scale parameters from a quantization split in a frame of an audio signal,
[0332] wherein the second stage vector dequantizer (2260) is configured to put together a first set of residual parameters and a second set of residual parameters from a split of residual parameters, and
[0333] wherein a number of splits addressed by the first stage vector dequantizer (2220) and another number of splits addressed by the second stage vector dequantizer (2260) are the same.
[0334] 27. Audio dequantizer of one of examples 16 to 26,
[0335] wherein the first stage vector dequantizer (2220) is configured to use a first index having a first number of bits to generate the plurality of intermediate quantized audio information items, and
[0336] wherein the second stage vector dequantizer (2260) is configured to use a second index having a second number of bits to obtain the plurality of residual items, wherein the second number of bits is lower than or higher than the first number of bits.
[0337] 28.
Audio dequantizer of example 27, wherein the first number of bits is between four and seven, and the second number of bits is between three and six.
[0338] 29. Audio dequantizer of one of examples 16 to 28,
[0339] wherein the quantized plurality of audio information items comprises, for a first frame of a multi-channel audio signal, a first plurality of scale parameters for a first channel of the multi-channel audio signal and a second plurality of scale parameters for a second channel of the multi-channel audio signal,
[0340] wherein the audio dequantizer is configured to apply the first stage vector dequantizer (2220) and the second stage vector dequantizer (2260) to the first plurality and the second plurality of the first frame,
[0341] wherein the quantized plurality of audio information items comprises, for a second frame of the multi-channel audio signal, a third plurality of mid scale parameters and a fourth plurality of side scale parameters, and
[0342] wherein the audio dequantizer is configured to apply the first stage vector dequantizer (2220) and the second stage vector dequantizer (2260) to the third plurality of mid scale parameters, to apply the second stage vector dequantizer (2260) to the fourth plurality of side scale parameters, and to not apply the first stage vector dequantizer (2220) to the fourth plurality of side scale parameters.
[0343] 30. Audio dequantizer of example 29,
[0344] wherein the combiner (2240) is configured to attenuate, for the second frame, the fourth plurality of side scale parameters before further using or further processing the fourth plurality of side scale parameters.
[0345] 31.
A method of quantizing a plurality of audio information items, comprising:
[0346] first stage vector quantizing the plurality of audio information items to determine a first stage vector quantization result and a plurality of intermediate quantized items corresponding to the first stage vector quantization result;
[0347] calculating a plurality of residual items from the plurality of intermediate quantized items and the plurality of audio information items; and
[0348] second stage vector quantizing the plurality of residual items to obtain a second stage vector quantization result, wherein the first stage vector quantization result and the second stage vector quantization result are a quantized representation of the plurality of audio information items.
[0349] 32. A method of dequantizing a quantized plurality of audio information items, comprising:
[0350] first stage vector dequantizing a first stage vector quantization result included in the quantized plurality of audio information items to obtain a plurality of intermediate quantized audio information items;
[0351] second stage vector dequantizing a second stage vector quantization result included in the quantized plurality of audio information items to obtain a plurality of residual items; and
[0352] combining the plurality of intermediate quantized audio information items and the plurality of residual items to obtain a dequantized plurality of audio information items.
[0353] 33. Computer program for performing, when running on a computer or a processor, the method of example 31 or the method of example 32.
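The two-stage structure of examples 7 to 13, 16 to 28, 31 and 32 can be illustrated with a short Python sketch. It is a toy under stated assumptions, not the codec described here: the codebooks are random, both stages use exhaustive nearest-neighbour search over stored codebooks (a plain codebook is swapped in for the algebraic vector quantizer with base codebook and Voronoi extension indices named in examples 10 and 25), the frame has 16 scale parameters split into two halves, and the multiplier in example 9 is read as 2**bits of the first index. The residual amplification by 1/WEIGHT at the encoder is an assumed counterpart of the decoder attenuation in examples 18 and 19. All names (`two_stage_encode`, `pack`, `CB1`, and so on) are hypothetical.

```python
import numpy as np

# Hypothetical toy configuration (not specified in the examples): 16 scale
# parameters per frame, split into 2 subvectors in both stages (example 11).
N_PARAMS, N_SPLITS = 16, 2
BITS_1, BITS_2 = 5, 3   # inside the 4..7 and 3..6 bit ranges of example 13
WEIGHT = 0.3            # inside the 0.25..0.4 range of example 18
DIM = N_PARAMS // N_SPLITS

rng = np.random.default_rng(0)
CB1 = rng.normal(0.0, 2.0, size=(1 << BITS_1, DIM))  # first-stage codebook (toy, random)
# Second-stage codebook (toy); row 0 is the zero vector, so stage 2 can never
# increase the error left by stage 1.
CB2 = np.vstack([np.zeros(DIM), rng.normal(0.0, 1.0, size=((1 << BITS_2) - 1, DIM))])


def nearest(codebook, x):
    """Exhaustive search: index of the codebook row closest to x (squared error)."""
    return int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))


def pack(indices, bits):
    """Combine split indices into one integer, idx0 + idx1 * 2**bits + ... (examples 8, 9)."""
    single = 0
    for k, idx in enumerate(indices):
        single += idx * (1 << (k * bits))
    return single


def unpack(single, bits, count):
    """Inverse of pack: each remainder is one index, the quotient holds the rest (example 24)."""
    indices = []
    for _ in range(count):
        single, idx = divmod(single, 1 << bits)
        indices.append(idx)
    return indices


def two_stage_encode(params):
    """Split first-stage VQ, residual computation, amplified split second-stage VQ."""
    first, second = [], []
    for sub in np.split(np.asarray(params, dtype=float), N_SPLITS):
        i1 = nearest(CB1, sub)
        residual = sub - CB1[i1]               # residual items
        i2 = nearest(CB2, residual / WEIGHT)   # amplified before stage 2 (assumed pairing)
        first.append(i1)
        second.append(i2)
    return pack(first, BITS_1), second


def two_stage_decode(first_single, second):
    """Dequantize both stages, attenuate the residual by WEIGHT and add (examples 17 to 19)."""
    first = unpack(first_single, BITS_1, N_SPLITS)
    intermediate = np.concatenate([CB1[i] for i in first])  # put the split sets together
    residual = np.concatenate([CB2[i] for i in second])
    return intermediate + WEIGHT * residual
```

For instance, `pack([5, 3], 5)` yields 5 + 3 * 32 = 101, and `unpack(101, 5, 2)` recovers `[5, 3]` through the remainder and the integer quotient, as in example 24. Keeping the zero vector in the second-stage codebook is a design choice of this sketch that guarantees the second stage never makes the reconstruction worse than the first stage alone.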

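Examples 14, 15, 29 and 30 route the scale parameters differently in a left/right frame and in a mid/side frame. The following sketch assumes M = (L + R) / 2 and S = (L - R) / 2, so that the decoder applies two different combination rules, L = M + S and R = M - S, to the same pair of jointly coded parameters. The stage quantizers are passed in as hypothetical callables returning an index and a reconstruction, and the weight 0.3 is an assumed value; none of these names come from the examples.

```python
import numpy as np


def encode_frame(left, right, mode, stage1, stage2, weight=0.3):
    """Return the stage results per coded channel for one frame.

    stage1(x) and stage2(r) are hypothetical callables returning
    (index, reconstruction); the M/S definition below is an assumption.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    def both_stages(x):
        i1, x1 = stage1(x)
        i2, _ = stage2(x - x1)   # second stage codes the first-stage residual
        return i1, i2

    if mode == "LR":   # examples 14/29, first frame: both stages for both channels
        return {"left": both_stages(left), "right": both_stages(right)}
    if mode == "MS":   # second frame: mid uses both stages, side only the second
        mid = 0.5 * (left + right)
        side = 0.5 * (left - right)
        side_index, _ = stage2(side / weight)   # side amplified first (example 15)
        return {"mid": both_stages(mid), "side": (None, side_index)}
    raise ValueError(f"unknown mode {mode!r}")


def decode_ms(mid_hat, side_stage2_hat, weight=0.3):
    """Attenuate the side parameters (example 30), then apply the two combination rules."""
    mid_hat = np.asarray(mid_hat, dtype=float)
    side_hat = weight * np.asarray(side_stage2_hat, dtype=float)
    return mid_hat + side_hat, mid_hat - side_hat   # L = M + S, R = M - S
```

With `weight=0.5`, a dequantized mid vector `[1.0, 2.0]` and a dequantized (amplified) side vector `[1.0, -1.0]` give left `[1.5, 1.5]` and right `[0.5, 2.5]`.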

CPC classification: G10L19/18; G10L2019/0005; G10L19/038; G10L19/02; G10L19/035; G10L19/008; G10L19/22 (all PHYSICS)

International classification: G10L19/008; G10L19/035; G10L19/22 (all PHYSICS)
