Compatible multi-channel coding/decoding

Abstract

In processing a multi-channel audio signal having at least three original channels, first and second downmix channels derived from the original channels are provided. For a selected original channel of the original channels, channel side information are calculated such that a downmix channel or a combined downmix channel including the first and second downmix channels, when weighted using the channel side information, results in an approximation of the selected original channel. The channel side information and the first and second downmix channels form output data to be transmitted to a low-level decoder, which only decodes the first and second downmix channels, or to a high-level decoder, which provides a full multi-channel audio signal based on the downmix channels and the channel side information. Since the channel side information occupy few bits only and since the decoder does not use dematrixing, an efficient and high quality multi-channel extension for stereo players and enhanced multi-channel players is acquired.

Claims

1. An apparatus for processing a multi-channel audio signal, the multi-channel audio signal comprising at least three original audio channels, comprising: a provider for providing a first downmix channel and a second downmix channel, the first and the second downmix channels being derived from the at least three original audio channels; a calculator for calculating channel side information for a selected original channel of the at least three original audio channels, the calculator being operative to calculate the channel side information such that a downmix channel or a combined downmix channel comprising the first and the second downmix channels, when weighted using the channel side information, results in an approximation of the selected original channel; and a generator for generating output data, the output data comprising the channel side information; the multi-channel audio signal including a left channel, a left surround channel, a right channel and a right surround channel; said provider being operative to provide the first downmix channel as a left downmix channel and to provide the second downmix channel as a right downmix channel, the left and the right downmix channels being formed such that a result, when played, is a stereo representation of the multi-channel audio signal, and said calculator being operative to calculate the channel side information for the left channel as the selected original channel using the left downmix channel, to calculate the channel side information for the right channel as the selected original channel using the right downmix channel, to calculate the channel side information for the left surround channel as the selected original channel using the left downmix channel, and to calculate the channel side information for the right surround channel as the selected original channel using the right downmix channel; wherein the output data are formed as an output bitstream, and wherein the apparatus is configured for transmitting the output bitstream to a bitstream decoder.

2. The apparatus in accordance with claim 1, wherein the generator is operative to generate the output data such that the output data additionally comprise the first downmix channel or a signal derived from the first downmix channel and the second downmix channel or a signal derived from the second downmix channel.

3. The apparatus in accordance with claim 1, wherein the calculator is operative to determine the channel side information as parametric data not comprising time domain samples or spectral values.

4. The apparatus in accordance with claim 1, wherein the calculator is operative to perform joint stereo coding using the first downmix channel or the second downmix channel as a carrier channel and using, as an input channel, the selected original channel, to generate joint stereo parameters as channel side information for the selected original channel.

5. The apparatus in accordance with claim 3, in which the calculator is operative to perform intensity stereo coding or binaural cue coding, such that the channel side information represent an energy distribution or binaural cue parameters for the selected original channel, wherein the first downmix channel or the second downmix channel or a combined downmix channel is usable as a carrier channel.

6. An apparatus for processing a multi-channel audio signal, the multi-channel audio signal comprising at least three original audio channels, comprising: a provider for providing a first downmix channel and a second downmix channel, the first and the second downmix channels being derived from the at least three original audio channels; a calculator for calculating channel side information for a selected original channel of the at least three original audio channels, the calculator being operative to calculate the channel side information such that a downmix channel or a combined downmix channel comprising the first and the second downmix channels, when weighted using the channel side information, results in an approximation of the selected original channel; and a generator for generating output data, the output data comprising the channel side information; the at least three original audio channels including a center channel; a combiner for combining the first downmix channel and the second downmix channel to acquire the combined downmix channel; said calculator being configured for calculating the channel side information for an original center channel as the selected original channel such that the combined downmix channel when weighted using the channel side information results in an approximation of the original center channel; and wherein the output data are formed as an output bitstream, and wherein the apparatus is configured for transmitting the output bitstream to a bitstream decoder.

7. The apparatus in accordance with claim 1, wherein the provider is operative to receive the first and the second downmix channels as externally supplied downmix channels.

8. The apparatus in accordance with claim 6, wherein the provider is operative to derive the first downmix channel and the second downmix channel from the original channels using a first predetermined linear weighted combination for the first downmix channel and using a second predetermined linear weighted combination for the second downmix channel.

9. The apparatus in accordance with claim 8, wherein the first predetermined linear weighted combination is defined as follows:
Lc = t·(L + a·Ls + b·C); and wherein the second predetermined linear weighted combination is defined as follows:
Rc = t·(R + a·Rs + b·C), wherein Lc is the first downmix channel, wherein Rc is the second downmix channel, wherein t, a and b are weighting factors smaller than 1, wherein L is an original left channel, wherein C is an original center channel, wherein R is an original right channel, wherein Ls is an original left surround channel, and wherein Rs is an original right surround channel.

10. The apparatus in accordance with claim 1, wherein the first downmix channel and the second downmix channel are composite channels being composed of at least two of the at least three original audio channels in varying degrees, wherein the calculator is operative to use, for calculating the channel side information, the downmix channel of the first and the second downmix channels which is more strongly influenced by the selected original channel when compared to the other downmix channel of the first and the second downmix channels.

11. The apparatus in accordance with claim 1, wherein the generator is operative to form the output data such that the output data are in compliance with an output data syntax to be used by a low level decoder for processing the first downmix channel or a signal derived from the first downmix channel or the second downmix channel or a signal derived from the second downmix channel to acquire a decoded stereo representation of the multi-channel audio signal.

12. The apparatus in accordance with claim 11, wherein the output data syntax is structured such that same comprises a special data field to be ignored by the low level decoder, and in which the generator is operative to insert the channel side information into the special data field.

13. The apparatus in accordance with claim 12, wherein the output data syntax is an mp3 syntax and the special data field is an ancillary data field.

14. The apparatus in accordance with claim 11, wherein the generator is operative to insert the channel side information into the output data such that the channel side information are only used by a high level decoder but are ignored by the low level decoder.

15. The apparatus in accordance with claim 2, which further comprises an encoder for encoding the first downmix channel to acquire the signal derived from the first downmix channel or for encoding the second downmix channel to acquire the signal derived from the second downmix channel.

16. The apparatus in accordance with claim 15, wherein the encoder is a perceptual encoder which comprises a converter for converting a signal to be encoded into a spectral representation, a quantizer for quantizing the spectral representation using a psychoacoustic model, and an entropy encoder for entropy encoding a quantized spectral representation to acquire an entropy encoded quantized spectral representation as the signal derived from the first downmix channel or the signal derived from the second downmix channel.

17. The apparatus in accordance with claim 16, wherein the perceptual encoder is an encoder in accordance with MPEG-1/2 layer III (mp3) or MPEG-2/4 advanced audio coding (AAC).

18. The apparatus in accordance with claim 1, wherein the calculator is operative: to calculate a downmix energy value for the first downmix channel or the second downmix channel or the combined downmix channel, to calculate an original energy value for the selected original channel, and to calculate a gain factor as the channel side information, the gain factor being derived from the downmix energy value and the original energy value.

19. The apparatus in accordance with claim 1, wherein the calculator is operative to calculate frequency dependent channel side information parameters such that for a plurality of frequency bands, a plurality of different channel side information parameters are acquired.

20. A method of processing a multi-channel audio signal, the multi-channel audio signal comprising at least three original audio channels, comprising: providing a first downmix channel and a second downmix channel, the first and the second downmix channels being derived from the at least three original audio channels, the at least three original audio channels including a center channel; calculating channel side information for a selected original channel of the at least three original audio channels such that a downmix channel or a combined downmix channel comprising the first and the second downmix channels, when weighted using the channel side information, results in an approximation of the selected original channel; and generating output data, the output data comprising the channel side information; combining the first downmix channel and the second downmix channel to acquire the combined downmix channel; wherein the step of calculating the channel side information is performed for an original center channel as the selected original channel such that the combined downmix channel when weighted using the channel side information results in an approximation of the original center channel; and wherein the output data are formed as an output bitstream, and wherein the method is operative for transmitting the output bitstream to a bitstream decoder.

21. An apparatus for inverse processing of input data, the input data comprising channel side information, a first downmix channel or a signal derived from the first downmix channel, and a second downmix channel or a signal derived from the second downmix channel, wherein the first downmix channel and the second downmix channel are derived from at least three original audio channels of a multi-channel audio signal, and wherein the channel side information are calculated such that a downmix channel or a combined downmix channel comprising the first downmix channel and the second downmix channel, when weighted using the channel side information, results in an approximation of a selected original channel, the apparatus comprising: an input data reader for reading the input data to acquire the first downmix channel or a signal derived from the first downmix channel and the second downmix channel or a signal derived from the second downmix channel and the channel side information; a channel reconstructor for reconstructing the approximation of the selected original channel using the channel side information and the first downmix channel or the second downmix channel or the combined downmix channel to acquire the approximation of the selected original channel; said channel reconstructor being operative to reconstruct an approximation for a center channel using the channel side information for the center channel and the combined downmix channel; and wherein the apparatus is configured for playing back the approximation for the center channel.

22. The apparatus in accordance with claim 21, further comprising a perceptual decoder for decoding the signal derived from the first downmix channel to acquire a decoded version of the first downmix channel and for decoding the signal derived from the second downmix channel to acquire a decoded version of the second downmix channel.

23. The apparatus in accordance with claim 21, further comprising a combiner for combining the first downmix channel and the second downmix channel to acquire the combined downmix channel.

24. An apparatus for inverse processing of input data, the input data comprising channel side information, a first downmix channel or a signal derived from the first downmix channel and a second downmix channel or a signal derived from the second downmix channel, wherein the first downmix channel and the second downmix channel are derived from at least three original audio channels of a multi-channel audio signal, and wherein the channel side information are calculated such that a downmix channel or a combined downmix channel comprising the first downmix channel and the second downmix channel, when weighted using the channel side information, results in an approximation of a selected original channel, the apparatus comprising: an input data reader for reading the input data to acquire the first downmix channel or a signal derived from the first downmix channel and the second downmix channel or a signal derived from the second downmix channel and the channel side information; a channel reconstructor for reconstructing the approximation of the selected original channel using the channel side information and the first or the second downmix channel or the combined downmix channel to acquire the approximation of the selected original channel; wherein the at least three original audio channels include a left channel, a left surround channel, a right channel, a right surround channel and a center channel; wherein the first downmix channel and the second downmix channel are a left downmix channel and a right downmix channel, respectively; and wherein the input data comprise channel side information for at least three of the left channel, the left surround channel, the right channel, the right surround channel and the center channel; wherein the channel reconstructor is operative to reconstruct an approximation for the left channel using channel side information for the left channel and the left downmix channel, to reconstruct an approximation for the left surround channel using channel side information for the left surround channel and the left downmix channel, to reconstruct an approximation for the right channel using channel side information for the right channel and the right downmix channel, and to reconstruct an approximation for the right surround channel using channel side information for the right surround channel and the right downmix channel; and wherein the apparatus is configured for playing back the approximation for the left channel, the approximation for the left surround channel, the approximation for the right channel and the approximation for the right surround channel.

25. A method of inverse processing of input data, the input data comprising channel side information, a first downmix channel or a signal derived from the first downmix channel and a second downmix channel or a signal derived from the second downmix channel, wherein the first downmix channel and the second downmix channel are derived from at least three original audio channels of a multi-channel audio signal, and wherein the channel side information are calculated such that a downmix channel or a combined downmix channel comprising the first downmix channel and the second downmix channel, when weighted using the channel side information, results in an approximation of a selected original channel, the method comprising: reading the input data to acquire the first downmix channel or a signal derived from the first downmix channel and the second downmix channel or a signal derived from the second downmix channel and the channel side information; and reconstructing the approximation of the selected original channel using the channel side information and the first downmix channel or the second downmix channel or the combined downmix channel to acquire the approximation of the selected original channel; wherein the reconstructing step comprises reconstructing an approximation for a center channel using channel side information for the center channel and the combined downmix channel; and wherein the method is operative for playing back the approximation for the center channel.

26. A non-transitory digital storage medium having a computer program stored thereon to perform the method of processing a multi-channel audio signal, the multi-channel audio signal having at least three original audio channels, which method comprises: providing a first downmix channel and a second downmix channel, the first and the second downmix channels being derived from the at least three original audio channels, the at least three original audio channels including a center channel; calculating channel side information for a selected original channel of the at least three original audio channels such that a downmix channel or a combined downmix channel comprising the first and the second downmix channels, when weighted using the channel side information, results in an approximation of the selected original channel; generating output data, the output data comprising the channel side information; combining the first downmix channel and the second downmix channel to acquire the combined downmix channel; and wherein the step of calculating the channel side information is performed for an original center channel as the selected original channel such that the combined downmix channel when weighted using the channel side information results in an approximation of the original center channel; and wherein the output data are formed as an output bitstream, and wherein the method is operative for transmitting the output bitstream to a bitstream decoder; when said computer program is run by a computer.

27. A non-transitory digital storage medium having a computer program stored thereon to perform the method for inverse processing of input data, the input data comprising channel side information, a first downmix channel or a signal derived from the first downmix channel and a second downmix channel or a signal derived from the second downmix channel, wherein the first downmix channel and the second downmix channel are derived from at least three original audio channels of a multi-channel audio signal, and wherein the channel side information are calculated such that a downmix channel or a combined downmix channel comprising the first downmix channel and the second downmix channel, when weighted using the channel side information, results in an approximation of a selected original channel, which method comprises: reading the input data to acquire the first downmix channel or a signal derived from the first downmix channel and the second downmix channel or a signal derived from the second downmix channel and the channel side information; and reconstructing the approximation of the selected original channel using the channel side information and the first downmix channel or the second downmix channel or the combined downmix channel to acquire the approximation of the selected original channel; wherein the step of reconstructing includes reconstructing an approximation for a center channel using channel side information for the center channel and the combined downmix channel; and wherein the method is operative for playing back the approximation for the center channel; when said computer program is run by a computer.

28. A method of inverse processing of input data, the input data comprising channel side information, a first downmix channel or a signal derived from the first downmix channel and a second downmix channel or a signal derived from the second downmix channel, wherein the first downmix channel and the second downmix channel are derived from at least three original audio channels of a multi-channel audio signal, and wherein the channel side information are calculated such that a downmix channel or a combined downmix channel comprising the first downmix channel and the second downmix channel, when weighted using the channel side information, results in an approximation of a selected original channel, the method comprising: reading the input data to acquire the first downmix channel or a signal derived from the first downmix channel and the second downmix channel or a signal derived from the second downmix channel and the channel side information; reconstructing the approximation of the selected original channel using the channel side information and the first or the second downmix channel or the combined downmix channel to acquire the approximation of the selected original channel, wherein the at least three original audio channels comprise a left channel, a left surround channel, a right channel, a right surround channel, and a center channel, wherein the first downmix channel and the second downmix channel are a left downmix channel and a right downmix channel, respectively, and wherein the input data comprise channel side information for at least three of the left channel, the left surround channel, the right channel, the right surround channel, and the center channel; and the step of reconstructing including: reconstructing an approximation for the left channel using channel side information for the left channel and the left downmix channel; reconstructing an approximation for the left surround channel using channel side information for the left surround channel and the left downmix channel; reconstructing an approximation for the right channel using channel side information for the right channel and the right downmix channel; and reconstructing an approximation for the right surround channel using channel side information for the right surround channel and the right downmix channel; and wherein the method is operative for playing back the approximation for the left channel, the approximation for the left surround channel, the approximation for the right channel and the approximation for the right surround channel.

29. A non-transitory digital storage medium having a computer program stored thereon to perform, when said computer program is run by a computer, the method for inverse processing of input data, the input data comprising channel side information, a first downmix channel or a signal derived from the first downmix channel and a second downmix channel or a signal derived from the second downmix channel, wherein the first downmix channel and the second downmix channel are derived from at least three original audio channels of a multi-channel audio signal, and wherein the channel side information are calculated such that a downmix channel or a combined downmix channel comprising the first downmix channel and the second downmix channel, when weighted using the channel side information, results in an approximation of a selected original channel, which method comprises: reading the input data to acquire the first downmix channel or a signal derived from the first downmix channel and the second downmix channel or a signal derived from the second downmix channel and the channel side information; reconstructing the approximation of the selected original channel using the channel side information and the first or the second downmix channel or the combined downmix channel to acquire the approximation of the selected original channel; wherein the at least three original audio channels comprise a left channel, a left surround channel, a right channel, a right surround channel, and a center channel, wherein the first downmix channel and the second downmix channel are a left downmix channel and a right downmix channel, respectively, and wherein the input data comprise channel side information for at least three of the left channel, the left surround channel, the right channel, the right surround channel, and the center channel; and wherein the step of reconstructing comprises: reconstructing an approximation for the left channel using channel side information for the left channel and the left downmix channel; reconstructing an approximation for the left surround channel using channel side information for the left surround channel and the left downmix channel; reconstructing an approximation for the right channel using channel side information for the right channel and the right downmix channel; and reconstructing an approximation for the right surround channel using channel side information for the right surround channel and the right downmix channel; and wherein the method is operative for playing back the approximation for the left channel, the approximation for the left surround channel, the approximation for the right channel, and the approximation for the right surround channel.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:

(2) FIG. 1 is a block diagram of an advantageous embodiment of the inventive encoder;

(3) FIG. 2 is a block diagram of an advantageous embodiment of the inventive decoder;

(4) FIG. 3A is a block diagram for an advantageous implementation of the means for calculating to obtain frequency selective channel side information;

(5) FIG. 3B is an advantageous embodiment of a calculator implementing joint stereo processing such as intensity coding or binaural cue coding;

(6) FIG. 4 illustrates another advantageous embodiment of the means for calculating channel side information, in which the channel side information are gain factors;

(7) FIG. 5 illustrates an advantageous embodiment of an implementation of the decoder, when the encoder is implemented as in FIG. 4;

(8) FIG. 6 illustrates an advantageous implementation of the means for providing the downmix channels;

(9) FIG. 7 illustrates groupings of original and downmix channels for calculating the channel side information for the respective original channels;

(10) FIG. 8 illustrates another advantageous embodiment of an inventive encoder;

(11) FIG. 9 illustrates another implementation of an inventive decoder; and

(12) FIG. 10 illustrates a joint stereo encoder of conventional technology.

DETAILED DESCRIPTION OF THE INVENTION

(13) FIG. 1 shows an apparatus for processing a multi-channel audio signal 10 having at least three original channels such as R, L and C. Advantageously, the original audio signal has more than three channels, such as five channels in the surround environment, which is illustrated in FIG. 1. The five channels are the left channel L, the right channel R, the center channel C, the left surround channel Ls and the right surround channel Rs. The inventive apparatus includes means 12 for providing a first downmix channel Lc and a second downmix channel Rc, the first and the second downmix channels being derived from the original channels. For deriving the downmix channels from the original channels, there exist several possibilities. One possibility is to derive the downmix channels Lc and Rc by means of matrixing the original channels using a matrixing operation as illustrated in FIG. 6. This matrixing operation is performed in the time domain.

(14) The matrixing parameters a, b and t are selected such that they are lower than or equal to 1. Advantageously, a and b are 0.7 or 0.5. The overall weighting parameter t is advantageously chosen such that channel clipping is avoided. Alternatively, as it is indicated in FIG. 1, the downmix channels Lc and Rc can also be externally supplied. This may be done, when the downmix channels Lc and Rc are the result of a “hand mixing” operation. In this scenario, a sound engineer mixes the downmix channels by himself rather than by using an automated matrixing operation. The sound engineer performs creative mixing to get optimized downmix channels Lc and Rc which give the best possible stereo representation of the original multi-channel audio signal.
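By way of illustration only, the automated matrixing operation may be sketched as follows in Python, using the linear weighted combinations recited in claim 9. The function name downmix, the default parameter values and, in particular, the rule t = 1/(1 + a + b) are assumptions made for this sketch; the description only states that t is chosen such that channel clipping is avoided.

```python
def downmix(L, R, C, Ls, Rs, a=0.7, b=0.7, t=None):
    """Matrix five time-domain channels into a stereo pair (Lc, Rc).

    Lc = t * (L + a * Ls + b * C)
    Rc = t * (R + a * Rs + b * C)
    """
    if t is None:
        # Assumption of this sketch: worst-case scaling, so that inputs
        # bounded by full scale can never exceed full scale after mixing.
        t = 1.0 / (1.0 + a + b)
    Lc = [t * (l + a * ls + b * c) for l, ls, c in zip(L, Ls, C)]
    Rc = [t * (r + a * rs + b * c) for r, rs, c in zip(R, Rs, C)]
    return Lc, Rc
```

With a = b = 0.5 this rule gives t = 0.5, so that five full-scale input samples are mixed to exactly full scale. A less conservative choice of t is equally conceivable when the original channels rarely peak at the same time.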

(15) In case of an external supply of the downmix channels, the means for providing does not perform a matrixing operation but simply forwards the externally supplied downmix channels to a subsequent calculating means 14.

(16) The calculating means 14 is operative to calculate the channel side information such as l_i, ls_i, r_i or rs_i for selected original channels such as L, Ls, R or Rs, respectively. In particular, the means 14 for calculating is operative to calculate the channel side information such that a downmix channel, when weighted using the channel side information, results in an approximation of the selected original channel.

(17) Alternatively or additionally, the means for calculating channel side information is further operative to calculate the channel side information for a selected original channel such that a combined downmix channel including a combination of the first and second downmix channels, when weighted using the calculated channel side information results in an approximation of the selected original channel. To show this feature in the figure, an adder 14a and a combined channel side information calculator 14b are shown.
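One conceivable realization of the calculating means 14, the adder 14a and the combined channel side information calculator 14b is sketched below in Python. In line with claim 18, it assumes that the channel side information is a gain factor derived from a downmix energy value and an original energy value. The specific rule used here, the square root of the ratio of the two energies, and all function names are assumptions of this sketch rather than features taken from the description. The assignment of carrier channels follows claims 1 and 6: Lc for L and Ls, Rc for R and Rs, and the combined downmix channel for C.

```python
import math


def energy(x):
    """Sum of squared samples of one channel block."""
    return sum(v * v for v in x)


def gain_factor(original, carrier):
    """Gain g such that g * carrier has the same energy as original."""
    e_down = energy(carrier)
    if e_down == 0.0:
        return 0.0  # silent carrier: nothing can be reconstructed from it
    return math.sqrt(energy(original) / e_down)


def side_info(L, R, C, Ls, Rs, Lc, Rc):
    """Channel side information for all five original channels."""
    combined = [x + y for x, y in zip(Lc, Rc)]  # role of the adder 14a
    return {
        "l": gain_factor(L, Lc),
        "ls": gain_factor(Ls, Lc),
        "r": gain_factor(R, Rc),
        "rs": gain_factor(Rs, Rc),
        "c": gain_factor(C, combined),  # role of the calculator 14b
    }
```

Each entry is a single number per block, which reflects why the channel side information occupies only few bits compared with the downmix channels themselves.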

(18) It is clear to those skilled in the art that these elements do not have to be implemented as distinct elements. Instead, the whole functionality of the blocks 14, 14a, and 14b can be implemented by means of a processor, which may be a general purpose processor or any other means suitable for performing this functionality.

(19) Additionally, it is to be noted here that channel signals being subband samples or frequency domain values are indicated in capital letters. Channel side information are, in contrast to the channels themselves, indicated by small letters. The channel side information c_i is, therefore, the channel side information for the original center channel C.

(20) The channel side information as well as the downmix channels Lc and Rc or an encoded version Lc′ and Rc′ as produced by an audio encoder 16 are input into an output data formatter 18. Generally, the output data formatter 18 acts as means for generating output data, the output data including the channel side information for at least one original channel, the first downmix channel or a signal derived from the first downmix channel (such as an encoded version thereof) and the second downmix channel or a signal derived from the second downmix channel (such as an encoded version thereof).

(21) The output data or output bitstream 20 can then be transmitted to a bitstream decoder or can be stored or distributed. Advantageously, the output bitstream 20 is a compatible bitstream which can also be read by a lower scale decoder not having a multi-channel extension capability. Such lower scale decoders, such as most existing state of the art mp3 decoders, will simply ignore the multi-channel extension data, i.e., the channel side information. They will only decode the first and second downmix channels to produce a stereo output. Higher scale decoders, such as multi-channel enabled decoders, will read the channel side information and will then generate an approximation of the original audio channels such that a multi-channel audio impression is obtained.

(22) FIG. 8 shows an advantageous embodiment of the present invention in the environment of five channel surround/mp3. Here, it is advantageous to write the surround enhancement data into the ancillary data field in the standardized mp3 bit stream syntax such that an “mp3 surround” bit stream is obtained.

(23) FIG. 2 shows an illustration of an inventive decoder acting as an apparatus for inverse processing input data received at an input data port 22. The data received at the input data port 22 is the same data as output at the output data port 20 in FIG. 1. Alternatively, when the data are not transmitted via a wired channel but via a wireless channel, the data received at data input port 22 are data derived from the original data produced by the encoder.

(24) The decoder input data are input into a data stream reader 24 for reading the input data to finally obtain the channel side information 26 and the left downmix channel 28 and the right downmix channel 30. In case the input data includes encoded versions of the downmix channels, which corresponds to the case, in which the audio encoder 16 in FIG. 1 is present, the data stream reader 24 also includes an audio decoder, which is adapted to the audio encoder used for encoding the downmix channels. In this case, the audio decoder, which is part of the data stream reader 24, is operative to generate the first downmix channel Lc and the second downmix channel Rc, or, stated more exactly, a decoded version of those channels. For ease of description, a distinction between signals and decoded versions thereof is only made where explicitly stated.

(25) The channel side information 26 and the left and right downmix channels 28 and 30 output by the data stream reader 24 are fed into a multi-channel reconstructor 32 for providing a reconstructed version 34 of the original audio signals, which can be played by means of a multi-channel player 36. In case the multi-channel reconstructor is operative in the frequency domain, the multi-channel player 36 will receive frequency domain input data, which have to be decoded in some way, for example converted into the time domain, before being played. To this end, the multi-channel player 36 may also include decoding facilities.

(26) It is to be noted here that a lower scale decoder will only have the data stream reader 24, which only outputs the left and right downmix channels 28 and 30 to a stereo output 38. An enhanced inventive decoder will, however, extract the channel side information 26 and use these side information and the downmix channels 28 and 30 for reconstructing reconstructed versions 34 of the original channels using the multi-channel reconstructor 32.
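The compatibility property described above can be illustrated with a minimal sketch. The `Frame` container, its field names and both functions are hypothetical and chosen only for illustration; they do not reflect the actual mp3 bitstream syntax. The point is that both decoder types read the same data, and the lower scale decoder never looks at the extension data.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical decoded frame: two downmix channels plus optional extension data."""
    lc: list
    rc: list
    extension: dict = field(default_factory=dict)  # channel side information, if present


def decode_stereo_only(frame):
    """Lower scale decoder: returns only the downmix channels, ignoring the extension."""
    return frame.lc, frame.rc


def read_for_reconstruction(frame):
    """Enhanced decoder front end: returns the downmix channels and the side information."""
    return frame.lc, frame.rc, frame.extension


# The same frame is readable by both decoder types.
frame = Frame(lc=[0.1, 0.2], rc=[0.3, 0.4], extension={"l": [0.8], "ls": [0.3]})
stereo = decode_stereo_only(frame)
lc, rc, side_info = read_for_reconstruction(frame)
```

In this sketch the enhanced path would hand `lc`, `rc` and `side_info` to a reconstruction stage, while the stereo path plays `stereo` directly.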

(27) FIG. 3A shows an embodiment of the inventive calculator 14 for calculating the channel side information, in which an audio encoder on the one hand and the channel side information calculator on the other hand operate on the same spectral representation of the multi-channel signal. FIG. 1, however, shows the other alternative, in which the audio encoder on the one hand and the channel side information calculator on the other hand operate on different spectral representations of the multi-channel signal. When computing resources are not as important as audio quality, the FIG. 1 alternative is advantageous, since filterbanks individually optimized for audio encoding and side information calculation can be used. When, however, computing resources are an issue, the FIG. 3A alternative is advantageous, since this alternative involves less computing power because of a shared utilization of elements.

(28) The device shown in FIG. 3A is operative for receiving two channels A, B. The device is operative to calculate side information for channel B such that, using this channel side information for the selected original channel B, a reconstructed version of channel B can be calculated from the channel signal A. Additionally, the device shown in FIG. 3A is operative to form frequency domain channel side information, such as parameters for weighting (by multiplying or by time processing as in BCC coding, for example) spectral values or subband samples. To this end, the inventive calculator includes windowing and time/frequency conversion means 140a to obtain a frequency domain representation of channel A at an output 140b and a frequency domain representation of channel B at an output 140c.

(29) In the advantageous embodiment, the side information determination (by means of the side information determination means 140f) is performed using quantized spectral values. Then, a quantizer 140d is also present, which advantageously is controlled using a psychoacoustic model having a psychoacoustic model control input 140e. Nevertheless, a quantizer is not required when the side information determination means 140f uses a non-quantized representation of the channel A for determining the channel side information for channel B.

(30) In case the channel side information for channel B are calculated by means of a frequency domain representation of the channel A and the frequency domain representation of the channel B, the windowing and time/frequency conversion means 140a can be the same as used in a filterbank-based audio encoder. In this case, when AAC (ISO/IEC 13818-7) is considered, means 140a is implemented as an MDCT filter bank (MDCT=modified discrete cosine transform) with 50% overlap-and-add functionality.

(31) In such a case, the quantizer 140d is an iterative quantizer such as used when mp3 or AAC encoded audio signals are generated. The frequency domain representation of channel A, which is advantageously already quantized, can then be directly used for entropy encoding using an entropy encoder 140g, which may be a Huffman based encoder or an entropy encoder implementing arithmetic encoding.

(32) When compared to FIG. 1, the output of the device in FIG. 3A is the side information such as l.sub.i for one original channel (corresponding to the side information for B at the output of device 140f). The entropy encoded bitstream for channel A corresponds, for example, to the encoded left downmix channel Lc′ at the output of block 16 in FIG. 1. From FIG. 3A it becomes clear that element 14 (FIG. 1), i.e., the calculator for calculating the channel side information, and the audio encoder 16 (FIG. 1) can be implemented as separate means or can be implemented as a shared version such that both devices share several elements such as the MDCT filter bank 140a, the quantizer 140d and the entropy encoder 140g. Naturally, in case a different transform etc. is needed for determining the channel side information, the encoder 16 and the calculator 14 (FIG. 1) will be implemented in different devices such that both elements do not share the filter bank etc.

(33) Generally, the actual determinator for calculating the side information (or generally stated the calculator 14) may be implemented as a joint stereo module as shown in FIG. 3B, which operates in accordance with any of the joint stereo techniques such as intensity stereo coding or binaural cue coding.

(34) In contrast to conventional intensity stereo encoders, the inventive determination means 140f does not have to calculate the combined channel. The “combined channel” or carrier channel, as one can say, already exists and is the left compatible downmix channel Lc or the right compatible downmix channel Rc or a combined version of these downmix channels such as Lc+Rc. Therefore, the inventive device 140f only has to calculate the scaling information for scaling the respective downmix channel such that the energy/time envelope of the respective selected original channel is obtained when the downmix channel is weighted using the scaling information or, as one can say, the intensity directional information.

(35) Therefore, the joint stereo module 140f in FIG. 3B is illustrated such that it receives, as an input, the “combined” channel A, which is the first or second downmix channel or a combination of the downmix channels, and the original selected channel. This module, naturally, outputs the “combined” channel A and the joint stereo parameters as channel side information such that, using the combined channel A and the joint stereo parameters, an approximation of the original selected channel B can be calculated.

(36) Alternatively, the joint stereo module 140f can be implemented for performing binaural cue coding.

(37) In the case of BCC, the joint stereo module 140f is operative to output the channel side information such that the channel side information are quantized and encoded ICLD or ICTD parameters. Here, the selected original channel serves as the channel actually to be processed, while the respective downmix channel used for calculating the side information, such as the first, the second or a combination of the first and second downmix channels, is used as the reference channel in the sense of the BCC coding/decoding technique.

(38) Referring to FIG. 4, a simple energy-directed implementation of element 140f is given. This device includes a frequency band selector 40 selecting a frequency band from channel A and a corresponding frequency band of channel B. Then, in both frequency bands, an energy is calculated by means of an energy calculator 42 for each branch. The detailed implementation of the energy calculator 42 will depend on whether the output signal from block 40 is a subband signal or a set of frequency coefficients. In other implementations, where scale factors for scale factor bands are calculated, one can already use scale factors of the first and second channel A, B as energy values E.sub.A and E.sub.B or at least as estimates of the energy. In a gain factor calculating device 44, a gain factor g.sub.B for the selected frequency band is determined based on a certain rule such as the gain determining rule illustrated in block 44 in FIG. 4. Here, the gain factor g.sub.B can directly be used for weighting time domain samples or frequency coefficients, as will be described later with reference to FIG. 5. To this end, the gain factor g.sub.B, which is valid for the selected frequency band, is used as the channel side information for channel B as the selected original channel. This selected original channel B will not be transmitted to the decoder but will be represented by the parametric channel side information as calculated by the calculator 14 in FIG. 1.
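The energy-directed gain calculation can be sketched as follows. The text refers only to "a certain rule" shown in FIG. 4 and does not spell it out, so the rule used here, g.sub.B = sqrt(E.sub.B / E.sub.A), is an assumption chosen so that the weighted carrier band has the same energy as the original band. The function names and the small `eps` guard against division by zero are likewise illustrative.

```python
import math


def band_energy(coeffs):
    """Energy of one frequency band: sum of squared coefficients or subband samples."""
    return sum(c * c for c in coeffs)


def gain_factor(band_a, band_b, eps=1e-12):
    """Gain g_B for one band, ASSUMING the rule g_B = sqrt(E_B / E_A).

    band_a: selected band of the carrier (downmix) channel A
    band_b: corresponding band of the selected original channel B
    """
    e_a = band_energy(band_a)
    e_b = band_energy(band_b)
    return math.sqrt(e_b / max(e_a, eps))


# Example: the original band is the carrier band at half amplitude.
carrier_band = [0.5, -1.0, 0.25, 0.75]
original_band = [0.25, -0.5, 0.125, 0.375]
g_b = gain_factor(carrier_band, original_band)

# Weighting the carrier band with g_b restores the energy of the original band.
weighted = [g_b * c for c in carrier_band]
```

Under this assumed rule, only `g_b` per band would need to be transmitted for channel B, not the band itself.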

(39) It is to be noted here that it is not necessary to transmit gain values as channel side information. It is also sufficient to transmit frequency dependent values related to the absolute energy of the selected original channel. Then, the decoder has to calculate the actual energy of the downmix channel and the gain factor based on the downmix channel energy and the transmitted energy for channel B.
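The alternative described in this paragraph can be sketched for the decoder side. As a minimal illustration, it assumes that the transmitted value is the band energy of channel B itself and that the gain is again derived as the square root of an energy ratio; neither detail is fixed by the text, and the function name is hypothetical.

```python
import math


def gain_from_transmitted_energy(decoded_carrier_band, transmitted_energy_b, eps=1e-12):
    """Decoder-side gain for one band.

    The decoder measures the energy of the decoded downmix band itself and
    combines it with the transmitted energy value for channel B.
    ASSUMPTION: gain = sqrt(E_B / E_A).
    """
    e_a = sum(c * c for c in decoded_carrier_band)
    return math.sqrt(transmitted_energy_b / max(e_a, eps))


# Example: the carrier band has energy 8.0, the transmitted energy for B is 2.0.
decoded_band = [2.0, 0.0, -2.0]
g_b = gain_from_transmitted_energy(decoded_band, 2.0)
```

The trade-off is that the decoder does slightly more work, while the encoder can transmit a quantity that does not depend on the downmix channel.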

(40) FIG. 5 shows a possible implementation of a decoder set up in connection with a transform-based perceptual audio encoder. Compared to FIG. 2, the functionalities of the entropy decoder and inverse quantizer 50 (FIG. 5) will be included in block 24 of FIG. 2. The functionality of the frequency/time converting elements 52a, 52b (FIG. 5) will, however, be implemented in item 36 of FIG. 2. Element 50 in FIG. 5 receives an encoded version of the first or the second downmix signal Lc′ or Rc′. At the output of element 50, an at least partly decoded version of the first or the second downmix channel is present, which is subsequently called channel A. Channel A is input into a frequency band selector 54 for selecting a certain frequency band from channel A. This selected frequency band is weighted using a multiplier 56. The multiplier 56 receives, for multiplying, a certain gain factor g.sub.B, which is assigned to the frequency band selected by the frequency band selector 54, which corresponds to the frequency band selector 40 in FIG. 4 at the encoder side. At the input of the frequency/time converter 52a, there exists, together with other bands, a frequency domain representation of channel A. At the output of multiplier 56 and, in particular, at the input of frequency/time conversion means 52b, there will be a reconstructed frequency domain representation of channel B. Therefore, at the output of element 52a, there will be a time domain representation of channel A, while, at the output of element 52b, there will be a time domain representation of reconstructed channel B.
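The band selection and weighting performed by elements 54 and 56 can be sketched as follows. The band layout (`band_edges`) and the function name are illustrative assumptions, and the frequency/time conversion of elements 52a, 52b is left out; the sketch only shows how one gain factor per band turns the spectrum of channel A into a reconstructed spectrum of channel B.

```python
def reconstruct_spectrum(carrier_spectrum, band_edges, gains):
    """Weight every band of the carrier spectrum with its own gain factor.

    carrier_spectrum: frequency domain values of channel A
    band_edges:       band b covers indices band_edges[b] .. band_edges[b + 1] - 1
    gains:            one gain factor g_B per band (the channel side information)
    """
    if len(gains) != len(band_edges) - 1:
        raise ValueError("need exactly one gain per band")
    reconstructed = [0.0] * len(carrier_spectrum)
    for b, g in enumerate(gains):
        for k in range(band_edges[b], band_edges[b + 1]):
            reconstructed[k] = g * carrier_spectrum[k]  # multiplier 56
    return reconstructed


# Example with six spectral values split into two bands.
spectrum_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
edges = [0, 2, 6]
gains_b = [0.5, 0.25]
spectrum_b = reconstruct_spectrum(spectrum_a, edges, gains_b)
```

Both `spectrum_a` and `spectrum_b` would then be passed to separate frequency/time converters to obtain time domain signals.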

(41) It is to be noted here that, depending on the certain implementation, the decoded downmix channel Lc or Rc is not played back in a multi-channel enhanced decoder. In such a multi-channel enhanced decoder, the decoded downmix channels are only used for reconstructing the original channels. The decoded downmix channels are only replayed in lower scale stereo-only decoders.

(42) To this end, reference is made to FIG. 9, which shows the advantageous implementation of the present invention in a surround/mp3 environment. An mp3 enhanced surround bitstream is input into a standard mp3 decoder 24, which outputs decoded versions of the original downmix channels. These downmix channels can then be directly replayed by means of a low level decoder. Alternatively, these two channels are input into the advanced joint stereo decoding device 32, which also receives the multi-channel extension data, which are advantageously written into the ancillary data field of an mp3 compliant bitstream.

(43) Subsequently, reference is made to FIG. 7 showing the grouping of the selected original channel and the respective downmix channel or combined downmix channel. In this regard, the right column of the table in FIG. 7 corresponds to channel A in FIGS. 3A, 3B, 4 and 5, while the column in the middle corresponds to channel B in these figures. In the left column in FIG. 7, the respective channel side information is explicitly stated. In accordance with the FIG. 7 table, the channel side information l.sub.i for the original left channel L is calculated using the left downmix channel Lc. The left surround channel side information ls.sub.i is determined from the selected original left surround channel Ls, with the left downmix channel Lc as the carrier. The right channel side information r.sub.i for the original right channel R are determined using the right downmix channel Rc. Additionally, the channel side information rs.sub.i for the right surround channel Rs are determined using the right downmix channel Rc as the carrier. Finally, the channel side information c.sub.i for the center channel C are determined using the combined downmix channel, which is obtained by means of a combination of the first and the second downmix channel, which can be easily calculated in both an encoder and a decoder and which does not require any extra bits for transmission.

(44) Naturally, one could also calculate the channel side information for the left channel based on, for example, a combined downmix channel or even a downmix channel obtained by a weighted addition of the first and second downmix channels such as 0.7 Lc and 0.3 Rc, as long as the weighting parameters are known to a decoder or transmitted accordingly. For most applications, however, it will be advantageous to only derive channel side information for the center channel from the combined downmix channel, i.e., from a combination of the first and second downmix channels.
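The carrier assignment of the FIG. 7 table and the weighted alternative mentioned above can be sketched together. The table is expressed here as a pair of weights (for Lc and Rc) per original channel; this representation, the names `CARRIER_WEIGHTS` and `carrier`, and the plain sum Lc+Rc for the center channel are illustrative assumptions.

```python
# Default carrier per original channel as (weight for Lc, weight for Rc).
CARRIER_WEIGHTS = {
    "L": (1.0, 0.0),   # left channel          -> Lc
    "Ls": (1.0, 0.0),  # left surround channel -> Lc
    "R": (0.0, 1.0),   # right channel         -> Rc
    "Rs": (0.0, 1.0),  # right surround channel -> Rc
    "C": (1.0, 1.0),   # center channel        -> combined downmix, here Lc + Rc
}


def carrier(channel, lc, rc, weights=None):
    """Return the carrier signal for an original channel.

    weights overrides the default assignment, e.g. (0.7, 0.3); such weights
    must be known to the decoder or transmitted.
    """
    w_l, w_r = weights if weights is not None else CARRIER_WEIGHTS[channel]
    return [w_l * a + w_r * b for a, b in zip(lc, rc)]


lc = [1.0, 2.0]
rc = [3.0, 4.0]
center_carrier = carrier("C", lc, rc)                       # Lc + Rc
surround_carrier = carrier("Ls", lc, rc)                    # Lc only
weighted_carrier = carrier("L", lc, rc, weights=(0.7, 0.3))  # 0.7 Lc + 0.3 Rc
```

Because the carrier is computed from the two downmix channels alone, encoder and decoder can both derive it without any additional transmitted data beyond the weights, if non-default weights are used.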

(45) To show the bit saving potential of the present invention, the following typical example is given. In case of a five channel audio signal, a normal encoder needs a bit rate of 64 kbit/s for each channel, amounting to an overall bit rate of 320 kbit/s for the five channel signal. The left and right stereo signals may use a bit rate of 128 kbit/s. Channel side information for one channel need between 1.5 and 2 kbit/s. Thus, even in a case in which channel side information for each of the five channels are transmitted, these additional data add up to only 7.5 to 10 kbit/s. Thus, the inventive concept allows transmission of a five channel audio signal using a bit rate of at most 138 kbit/s (compared to 320 kbit/s) with good quality, since the decoder does not use the problematic dematrixing operation. Probably even more important is the fact that the inventive concept is fully backward compatible, since each of the existing mp3 players is able to replay the first downmix channel and the second downmix channel to produce a conventional stereo output.
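The arithmetic of this example can be checked with a few lines. All figures are taken from the paragraph above; only the variable names are introduced here.

```python
PER_CHANNEL_KBPS = 64          # discrete coding of one channel
STEREO_KBPS = 128              # left and right downmix channels together
SIDE_INFO_KBPS = (1.5, 2.0)    # lower and upper bound per channel
NUM_CHANNELS = 5

# Discrete coding of all five channels.
discrete_total = NUM_CHANNELS * PER_CHANNEL_KBPS

# Side information for all five channels, lower and upper bound.
side_low, side_high = (NUM_CHANNELS * rate for rate in SIDE_INFO_KBPS)

# Stereo downmix plus side information.
total_low = STEREO_KBPS + side_low
total_high = STEREO_KBPS + side_high

print(discrete_total, side_low, side_high, total_low, total_high)
```

The upper bound of 138 kbit/s corresponds to the figure quoted in the text; with the lower bound for the side information the total is 135.5 kbit/s.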

(46) Depending on the application environment, the inventive method for processing or inverse processing can be implemented in hardware or in software. The implementation can be a digital storage medium such as a disk or a CD having electronically readable control signals, which can cooperate with a programmable computer system such that the inventive method for processing or inverse processing is carried out. Generally stated, the invention, therefore, also relates to a computer program product having a program code stored on a machine-readable carrier, the program code being adapted for performing the inventive method when the computer program product runs on a computer. In other words, the invention, therefore, also relates to a computer program having a program code for performing the method when the computer program runs on a computer.

(47) While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
