Audio decoder for audio channel reconstruction
11647333 · 2023-05-09
Assignee
Inventors
- Heiko Purnhagen (Sundbyberg, DE)
- Lars Villemoes (Järfälla, SE)
- Jonas Engdegard (Ekerö, SE)
- Jonas Roeden (Solna, SE)
- Kristofer Kjoerling (Solna, SE)
Cpc classification
H04S2400/03
ELECTRICITY
H04S3/02
ELECTRICITY
H04S5/00
ELECTRICITY
G10L19/008
PHYSICS
G10L19/167
PHYSICS
H04S2400/01
ELECTRICITY
International classification
G10L19/008
PHYSICS
G10L19/02
PHYSICS
H04S3/02
ELECTRICITY
Abstract
A method and apparatus for reconstructing N audio channels from M audio channels is disclosed. The method includes receiving a bitstream containing an encoded audio signal representing the M audio channels and decoding the encoded audio signal to obtain a frequency domain representation of the M audio channels. The method further includes extracting a parameter from the bitstream and reconstructing at least one of the N audio channels using the parameter. The parameter represents an angle between two signals, at least one of which is included in the M audio channels.
Claims
1. A method performed in an audio decoder for reconstructing a plurality of original audio channels from two or more audio channels, the method comprising: receiving a bitstream containing a parameter φ and the two or more audio channels; decoding the two or more audio channels to obtain a frequency domain representation of the two or more audio channels; extracting the parameter φ from the bitstream; and reconstructing at least one of the plurality of original audio channels from the two or more audio channels based on:
a′=m cos φ+s sin φ, wherein a′ is at least one reconstructed audio channel, m is one of the two or more audio channels, s is derived from a weighted or unweighted combination of the plurality of original audio channels, the parameter φ is an angle representing amounts of signal m and s present in a′, and wherein m and s are decorrelated.
2. The method of claim 1, wherein the parameter φ is quantized.
3. The method of claim 1, further comprising denormalizing the at least one reconstructed audio channel by multiplying the at least one reconstructed audio channel by a square root of an energy of m or s.
4. A non-transitory computer readable medium comprising instructions that when executed by a processor perform the method of claim 1.
5. An apparatus for reconstructing a plurality of original audio channels from two or more audio channels, the apparatus comprising: an input interface for receiving a bitstream containing a parameter φ and a representation of the two or more audio channels; a decoder for obtaining a frequency domain representation of the two or more audio channels; an extractor for obtaining the parameter φ from the bitstream; and a reconstructor for reconstructing at least one of the plurality of original audio channels from the two or more audio channels based on:
a′=m cos φ+s sin φ, wherein a′ is at least one reconstructed audio channel, m is one of the two or more audio channels, s is derived from a weighted or unweighted combination of the plurality of original channels, the parameter φ is an angle representing amounts of signal m and s present in a′, and wherein m and s are decorrelated.
6. The apparatus of claim 5, wherein the parameter φ is quantized.
7. The apparatus of claim 5, further comprising a denormalizer for multiplying the at least one reconstructed audio channel using a square root of an energy of m or s.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) These and other objects and features of the present invention will become clear from the following description taken in conjunction with the accompanying drawings.
DESCRIPTION OF PREFERRED EMBODIMENTS
(15) The below-described embodiments are merely illustrative of the principles of the present invention on multi-channel representation of audio signals. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
(16) In the following description of the present invention outlining how to parameterize IID and ICC parameters, and how to apply them in order to re-create a multi-channel representation of audio signals, it is assumed that all referred signals are subband signals in a filterbank, or some other frequency selective representation of a part of the whole frequency range for the corresponding channel. It is therefore understood, that the present invention is not limited to a specific filterbank, and that the present invention is outlined below for one frequency band of the subband representation of the signal, and that the same operations apply to all of the subband signals.
(17) Although a balance parameter is also termed an “inter-channel intensity difference (IID)” parameter, it is to be emphasized that a balance parameter between a channel pair does not necessarily have to be the ratio between the energy or intensity in the first channel of the channel pair and the energy or intensity of the second channel in the channel pair. Generally, the balance parameter indicates the localization of a sound source between the two channels of the channel pair. Although this localization is usually given by energy/level/intensity differences, other characteristics of a signal can be used, such as a power measure for both channels or time or frequency envelopes of the channels.
(18) In
(19) Assuming that we define the expectancy operator as
(20)
and thus the energies for the channels outlined above can be defined according to (here exemplified by the left surround channel):
$$A = E[a^{2}(t)].$$
(21) On the encoder side, the five channels are down-mixed to a two-channel representation or a one-channel representation. This can be done in several ways; one commonly used is the ITU down-mix, defined according to:
(22) The 5.1 to two channel down-mix:
$$l_d(t) = \alpha b(t) + \beta a(t) + \gamma c(t) + \delta f(t)$$
$$r_d(t) = \alpha d(t) + \beta e(t) + \gamma c(t) + \delta f(t)$$
(23) And the 5.1 to one channel down-mix:
(24)
(25) Commonly used values for the constants α, β, γ and δ are
(26) $$\alpha = 1, \quad \beta = \gamma = \sqrt{\tfrac{1}{2}}, \quad \delta = 0.$$
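As a minimal sketch of the two-channel down-mix above, the following Python function applies the four weights to one block of samples. The function name `itu_downmix`, the argument order and the use of plain lists are illustrative assumptions; the default constants follow the ITU recommended values stated in this description (α=1, β and γ equal to the square root of 0.5, δ=0).

```python
import math

def itu_downmix(a, b, c, d, e, f,
                alpha=1.0, beta=math.sqrt(0.5), gamma=math.sqrt(0.5), delta=0.0):
    """Down-mix one block of 5.1 samples to two channels.

    a: left surround, b: left front, c: center,
    d: right front, e: right surround, f: LFE.
    """
    l_d = [alpha * bb + beta * aa + gamma * cc + delta * ff
           for aa, bb, cc, ff in zip(a, b, c, f)]
    r_d = [alpha * dd + beta * ee + gamma * cc + delta * ff
           for dd, ee, cc, ff in zip(d, e, c, f)]
    return l_d, r_d

# A source present only in the center channel lands equally in both down-mix channels.
silence = [0.0, 0.0]
l_d, r_d = itu_downmix(silence, silence, [1.0, -1.0], silence, silence, silence)
```

Other weights, including different weights for the left and right down-mix, can be modelled by calling the function with non-default constants.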
(27) The IID parameters are defined as energy ratios of two arbitrarily chosen channels or weighted groups of channels. Given the energies of the channels outlined above for the 5.1 channel configuration several sets of IID parameters can be defined.
(28)
(29) In an ITU recommended down-mix, α is set to 1, β and γ are both set to the square root of 0.5, and δ is set to 0. Generally, the factor α can vary between 0.5 and 1.5. Additionally, the factors β and γ can differ from each other and vary between 0 and 1. The same is true for the low frequency enhancement channel f(t): the factor δ for this channel can vary between 0 and 1. Additionally, the factors for the left down-mix and the right down-mix do not have to be equal to each other. This becomes clear when a non-automatic down-mix is considered, which is, for example, performed by a sound engineer. The sound engineer tends to perform a creative down-mix rather than a down-mix guided by mathematical laws, and is instead guided by his own creative feeling. When this “creative” down-mixing is recorded by a certain parameter set, it will be used in accordance with the present invention by an inventive up-mixer as shown in
(30) When a linear down-mix has been performed as in
(31) Given the 5.1 channel configuration outlined in
(32) The present invention defines IID parameters that apply to all these channels, i.e. the four channel subset of the 5.1 channel configuration has a corresponding subset within the IID parameter set describing the 5.1 channels.
(33) The following IID parameter set solves this problem:
(34)
(35) It is evident that the r.sub.1 parameter corresponds to the energy ratio between the left down-mix channel and the right down-mix channel. The r.sub.2 parameter corresponds to the energy ratio between the center channel and the left and right front channels. The r.sub.3 parameter corresponds to the energy ratio between the three front channels and the two surround channels. The r.sub.4 parameter corresponds to the energy ratio between the two surround channels. The r.sub.5 parameter corresponds to the energy ratio between the LFE channel and all other channels.
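The verbal description of the five balance parameters can be illustrated with a short Python sketch. It is not the parameter set of the embodiment: it assumes that all down-mix weights equal 1, that cross terms between channels are ignored when forming the down-mix energies, and that each ratio is taken as "first named quantity divided by second named quantity". The function name `iid_parameters` is hypothetical.

```python
def iid_parameters(A, B, C, D, E, F):
    """Balance parameters from channel energies.

    A: left surround, B: left front, C: center,
    D: right front, E: right surround, F: LFE.
    Assumption: unit down-mix weights, ratios taken as first / second.
    """
    left_dmx = B + A + C + F      # energy of the left down-mix, cross terms ignored
    right_dmx = D + E + C + F     # energy of the right down-mix, cross terms ignored
    return {
        "r1": left_dmx / right_dmx,       # left vs. right down-mix channel
        "r2": C / (B + D),                # center vs. left and right front
        "r3": (B + C + D) / (A + E),      # three front vs. two surround channels
        "r4": A / E,                      # left vs. right surround
        "r5": F / (A + B + C + D + E),    # LFE vs. all other channels
    }

params = iid_parameters(A=1.0, B=4.0, C=2.0, D=4.0, E=1.0, F=0.5)
```

For this symmetric example the left/right and surround balances come out as 1, while r2 and r3 describe how the energy is shared between center, front pair and surround pair.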
(36) In
(37) Given the parameterization above and the energy of the transmitted single down-mixed channel:
(38)
the energies of the reconstructed channels can be expressed as:
(39)
(40) Hence the energy of the M signal can be distributed to the re-constructed channels resulting in re-constructed channels having the same energies as the original channels.
(41) The above-preferred up-mixing scheme is illustrated in
(42) When
(43) Given the above IID parameters it is evident that the problem of defining a parameter set of IID parameters that can be used for several channel configurations has been solved as will be obvious from the below. As an example, observing the three channel configuration (i.e. recreating three front channels from one available channel), it is evident that the r.sub.3, r.sub.4 and r.sub.5 parameters are obsolete since the A, E and F channels do not exist. It is also evident that the parameters r.sub.1 and r.sub.2 are sufficient to recreate the three channels from a downmixed single channel since r.sub.1 describes the energy ratio between the left and right front channels, and r.sub.2 describes the energy ratio between the center channel and the left and right front channels.
(44) In the more general case it is easily seen that the IID parameters (r.sub.1 . . . r.sub.5) as defined above apply to all subsets of recreating n channels from m channels where m<n≤6. Observing
(45) The above described scalability feature is illustrated by the table in
(46) The inventive concept is especially advantageous in that the left and right channels can be easily reconstructed from a single balance parameter r.sub.1 without knowledge or extraction of any other balance parameter. To this end, in the equations for B, D in
(47) Alternatively, when only the balance parameter r.sub.2 is considered, the reconstructed channels are the sum between the center channel and the low frequency channel (when this channel is not set to zero) on the one hand and the sum between the left and right channels on the other hand. Thus, the center channel on the one hand and the mono signal on the other hand can be reconstructed using only a single parameter. This feature can already be useful for a simple 3-channel representation, where the left and right signals are derived from the sum of left and right such as by halving, and where the energy between the center and the sum of left and right is exactly determined by the balance parameter r.sub.2.
(48) In this context, the balance parameters r.sub.1 or r.sub.2 are situated in a lower scaling layer.
(49) As to the second entry in the
(50) When the equations in
(51) When a 4-channel representation is to be up-mixed, it is sufficient to extract only the parameters r.sub.1, r.sub.2, and r.sub.3 from the parameter data stream. In this context, r.sub.3 could be in a next-higher scaling layer than the other parameters r.sub.1 or r.sub.2. The 4-channel configuration is especially suitable in connection with the super-balance parameter representation of the present invention, since, as it will be described later on in connection with
(52) Thus, the combined channel energy of both surround channels is automatically obtained without any further separate calculation and subsequent combination, as would be the case in a single reference channel set-up.
(53) When 5 channels have to be recreated from a single channel, the further balance parameter r.sub.4 is necessary. This parameter r.sub.4 can again be in a next-higher scaling layer.
(54) When a 5.1 reconstruction has to be performed, each balance parameter is required. Thus, a next-higher scaling layer including the next balance parameter r.sub.5 will have to be transmitted to a receiver and evaluated by the receiver.
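The layering described in the preceding paragraphs can be summarized in a small Python sketch. It assumes a strictly nested ordering r.sub.1, r.sub.2, ..., r.sub.5, which is a simplification (the description also allows r.sub.2 to be used on its own); the function name and the layout strings are hypothetical.

```python
def reconstructable_layout(available):
    """Return the largest channel layout that the received balance
    parameters allow, assuming strictly nested scaling layers."""
    layers = [
        ("r1", "stereo (left, right)"),
        ("r2", "3 channels (left, center, right)"),
        ("r3", "4 channels (left, center, right, combined surround)"),
        ("r4", "5 channels (left, center, right, left surround, right surround)"),
        ("r5", "5.1 channels"),
    ]
    layout = "mono (down-mix only)"
    for name, description in layers:
        if name not in available:
            break          # in this simplified model a missing layer stops the up-mix
        layout = description
    return layout

core_only = reconstructable_layout({"r1", "r2"})
full = reconstructable_layout({"r1", "r2", "r3", "r4", "r5"})
```

A receiver that only obtains the lower layers still produces a valid, smaller channel configuration from the same parameter stream.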
(55) However, using the same approach of extending the IID parameters in accordance with the extended number of channels, the above IID parameters can be extended to cover channel configurations with a larger number of channels than the 5.1 configuration. Hence the present invention is not limited to the examples outlined above.
(56) Now consider the case where the channel configuration is a 5.1 channel configuration, this being one of the most commonly used cases. Furthermore, assume that the 5.1 channels are recreated from two channels. For this case, a different set of parameters can be defined by replacing the parameters r.sub.3 and r.sub.4 by:
(57)
(58) The parameters q.sub.3 and q.sub.4 represent the energy ratio between the front and back left channels, and the energy ratio between the front and back right channels. Several other parameterizations can be envisioned.
(59) In
(60) The present invention teaches that several parameter sets can be used to represent the multi-channel signals. An additional feature of the present invention is that different parameterizations can be chosen dependent on the type of quantization of the parameters that is used.
(61) As an example, in a system that quantizes the parameters coarsely because of tight bit rate constraints, a parameterization should be used that does not amplify errors during the upmixing process.
(62) Observing two of the expressions above for the reconstructed energies in a system that re-creates 5.1 channels from one channel:
(63)
(64) It is evident that the subtractions can yield large variations of the B and D energies due to quite small quantization effects of the M, A, C, and F parameters.
(65) According to the present invention a different parameterization should be used that is less sensitive to quantization of the parameters. Hence, if coarse quantization is used, the r.sub.1 parameter as defined above:
(66)
can be replaced by the alternative definition according to:
(67)
This yields equations for the reconstructed energies according to:
(68)
and the equations for the reconstructed energies of A, E, C and F stay the same as above. It is evident that this parameterization represents a more well-conditioned system from a quantization point of view.
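How a well-conditioned parameter set distributes the down-mix energy can be sketched as a sequence of two-way splits. The sketch below is an illustration under stated assumptions, not the equations of the embodiment: it takes r.sub.1 as the front left/right ratio B/D, r.sub.2 = C/(B+D), r.sub.3 = (B+C+D)/(A+E), r.sub.4 = A/E and r.sub.5 = F/(A+B+C+D+E), and it assumes that M equals the sum of all six channel energies. The function names are hypothetical.

```python
def split(total, ratio):
    """Split `total` into (x, y) with x / y == ratio and x + y == total.
    Only products and quotients are used, so no differences of transmitted
    values are formed."""
    y = total / (1.0 + ratio)
    x = total * ratio / (1.0 + ratio)
    return x, y

def reconstruct_energies(M, r1, r2, r3, r4, r5):
    F, rest = split(M, r5)              # LFE vs. everything else
    front, surround = split(rest, r3)   # three front vs. two surround channels
    A, E = split(surround, r4)          # left vs. right surround
    C, lr = split(front, r2)            # center vs. left + right front
    B, D = split(lr, r1)                # left front vs. right front
    return A, B, C, D, E, F

# Round trip: derive the ratios from known energies, then redistribute M.
A, B, C, D, E, F = 1.0, 4.0, 2.0, 3.0, 0.5, 0.25
M = A + B + C + D + E + F
rec = reconstruct_energies(M, r1=B / D, r2=C / (B + D),
                           r3=(B + C + D) / (A + E), r4=A / E,
                           r5=F / (A + B + C + D + E))
```

Because every step multiplies by a factor between 0 and 1, a small error in one ratio only shifts energy between the two branches of that split.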
(69) In
(70) Another important noteworthy feature of the present invention is that when observing the parameterization
(71)
it is not only a more well-conditioned system from a quantization point of view. The above parameterization also has the advantage that the parameters used to reconstruct the three front channels are derived without any influence of the surround channels. One could envision a parameter r.sub.2 that describes the relation between the center channel and all other channels. However, this would have the drawback that the surround channels would be included in the estimation of the parameters describing the front channels.
(72) Remembering that the parameterization described in the present invention can also be applied to measurements of correlation or coherence between channels, it is evident that including the back channels in the calculation of r.sub.2 can have a significant negative influence on the success of re-creating the front channels accurately.
(73) As an example, one could imagine a situation with the same signal in all the front channels, and completely uncorrelated signals in the back channels. This is not uncommon, given that the back channels are frequently used to re-create ambience information of the original sound.
(74) If the center channel is described in relation to all other channels, the correlation measure between the center and the sum of all other channels will be rather low, since the back channels are completely uncorrelated. The same will be true for a parameter estimating the correlation between the front left/right channels, and the back left/right channels.
(75) Hence, we arrive at a parameterization that can reconstruct the energies correctly, but that does not include the information that all front channels were identical, i.e. strongly correlated. It does include the information that the left and right front channels are decorrelated from the back channels, and that the center channel is also decorrelated from the back channels. However, the fact that all front channels are the same is not derivable from such a parameterization.
(76) This is overcome by using the parameterization
(77)
as taught by the present invention, since the back channels are not included in the estimation of the parameters used on the decoder side to re-create the front channels.
(78) The energy distribution between the center channel 103 and the left front 102 and right front 104 channels is indicated by r.sub.2 according to the present invention. The energy distribution between the left surround channel 101 and the right surround channel 105 is illustrated by r.sub.4. Finally, the energy distribution between the left front channel 102 and the right front channel 104 is given by r.sub.1. As is evident all parameters are the same as outlined in
(79)
(80)
(81) In a two-base channel situation, the parameters r.sub.3 and r.sub.4, i.e. the front/back balance parameter and the rear-left/right balance parameter, are replaced by two single-sided front/rear parameters. The first single-sided front/rear parameter q.sub.3 can also be regarded as the first balance parameter, which is derived from the channel pair consisting of the left surround channel A and the left channel B. The second single-sided front/rear balance parameter is the parameter q.sub.4, which can be regarded as the second parameter, which is based on the second channel pair consisting of the right channel D and the right surround channel E. Again, both channel pairs are independent from each other. The same is true for the center/left-right balance parameter r.sub.2, which has, as a first channel, a center channel C, and as a second channel, the sum of the left and right channels B and D.
(82) Another parameterization that lends itself well to coarse quantization for a system re-creating 5.1 channels from one or two channels is defined according to the present invention below.
(83) For the one to 5.1 channels:
(84)
(85) And for the two to 5.1 channels case:
(86)
(87) It is evident that the above parameterizations include more parameters than are required from the strictly theoretical point of view to correctly re-distribute the energy of the transmitted signals to the re-created signals. However, the parameterization is very insensitive to quantization errors.
(88) The above-referenced parameter set for a two-base channel set-up, makes use of several reference channels. In contrast to the parameter configuration in
(89) Although several inventive embodiments have been described, in which the channel pairs for deriving balance parameters include only original channels (
(90) In order to be completely safe against such energy variations, an additional level parameter is transmitted for each block and frequency band for every downmix channel in accordance with the present invention. When the balance parameters are based on the original signal rather than the down-mix signal, a single correction factor is sufficient for each band, since any energy correction will not influence a balance situation between the original channels. Even when no additional level parameter is transmitted, any down-mix channel energy variations will not result in a distorted localization of sound sources in the audio image but will only result in a general loudness variation, which is not as annoying as a migration of a sound source caused by varying balance conditions.
(91) It is important to note that care needs to be taken so that the energy M (of the down-mixed channels) is the sum of the energies B, D, A, E, C and F as outlined above. This is not always the case due to phase dependencies between the different channels being down-mixed into one channel. The energy correction factor can be transmitted as an additional parameter r.sub.M, and the energy of the downmixed signal received on the decoder side is thus defined as:
(92)
(93) In
(94)
(95) There can be the case, for example, that a broadcaster wishes to transmit not the parameter down-mix but the master down-mix from a transmitter to a receiver. Additionally, for upgrading the master down-mix to a multi-channel representation, the broadcaster also transmits a parametric representation of the original multi-channel signal. Since the energy (in one band and in one block) can (and typically will) vary between the master down-mix and the parameter down-mix, a relative level parameter r.sub.M is generated in block 900 and transmitted to the receiver as an additional parameter. The level parameter is derived from the master down-mix and the parameter down-mix and is preferably a ratio between the energies within one block and one band of the master down-mix and the parameter down-mix.
(96) Generally, the level parameter is calculated as the ratio of the sum of the energies (E.sub.orig) of the original channels and the energy of the downmix channel(s), wherein this downmix channel(s) can be the parameter downmix (E.sub.PD) or the master downmix (E.sub.MD) or any other downmix signal. Typically, the energy of the specific downmix signal is used, which is transmitted from an encoder to a decoder.
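The level parameter can be illustrated with a short Python sketch that follows the verbal definition above: the summed energy of the original channels divided by the energy of the transmitted down-mix. The helper names, the use of mean-square energy over a block, and the way the decoder applies the factor (multiplying the measured down-mix energy by r.sub.M) are assumptions made for illustration.

```python
def energy(x):
    """Mean-square energy of one block of samples."""
    return sum(v * v for v in x) / len(x)

def level_parameter(original_channels, downmix_channels):
    """Ratio of the summed original-channel energies to the down-mix energy."""
    e_orig = sum(energy(ch) for ch in original_channels)
    e_dmx = sum(energy(ch) for ch in downmix_channels)
    return e_orig / e_dmx

# Two channels that are partly out of phase lose energy when summed to mono.
x = [1.0, 0.0, -1.0, 0.0]
y = [-0.5, 1.0, 0.5, -1.0]
mono = [p + q for p, q in zip(x, y)]

r_m = level_parameter([x, y], [mono])
corrected_energy = r_m * energy(mono)   # assumed decoder-side use of the level parameter
```

In this example the mono down-mix carries less energy than the two originals together, and the level parameter restores the total that the balance parameters are meant to distribute.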
(97)
(98) Although
(99) Studying the case when re-creating 5.1 channels from 2 channels, the following observation is made.
(100) If the present invention is used with an underlying audio codec as outlined in
(101)
this parameter is implicitly available on the decoder side since the system is re-creating 5.1 channels from 2 channels, provided that the two transmitted channels are the stereo downmix of the surround channels.
(102) However, the audio codec operating under a bit rate constraint may modify the spectral distribution so that the L and R energies as measured on the decoder differ from their values on the encoder side. According to the present invention such influence on the energy distribution of the re-created channels vanishes by transmitting the parameter
(103)
also for the case when reconstructing 5.1 channels from two channels.
(104) If signaling means are provided, the encoder can code the present signal segment using different parameter sets and choose the set of IID parameters that gives the lowest overhead for the particular signal segment being processed. It is possible that the energy levels of the right front and back channels are similar, and that the energy levels of the front and back left channels are similar but significantly different from the levels in the right front and back channels. Given delta coding of parameters and subsequent entropy coding, it can be more efficient to use parameters q.sub.3 and q.sub.4 instead of r.sub.3 and r.sub.4. For another signal segment with different characteristics, a different parameter set may give a lower bit rate overhead. The present invention allows free switching between different parameter representations in order to minimize the bit rate overhead for the presently encoded signal segment, given the characteristics of the signal segment. The ability to switch between different parameterizations of the IID parameters in order to obtain the lowest possible bit rate overhead, and to provide signaling means to indicate which parameterization is presently used, is an essential feature of the present invention.
(105) Furthermore, the delta coding of the parameters can be done in either the frequency direction or in the time direction, as well as delta coding between different parameters. According to the present invention, a parameter can be delta coded with respect to any other parameter, given that signaling means are provided indicating the particular delta coding used.
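Delta coding in the frequency direction and in the time direction can be sketched as follows. The quantized parameter indices, the frame/band layout and the function names are hypothetical; entropy coding of the resulting differences and the signaling of the chosen direction are left out.

```python
def delta_encode(values):
    """Keep the first value, then store differences to the previous value.
    `values` must be non-empty."""
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

# Quantized parameter indices: rows are time frames, columns are frequency bands.
frames = [[3, 3, 4, 4, 5],
          [3, 4, 4, 4, 5]]

# Frequency direction: each frame is coded relative to its lower neighbour band.
freq_deltas = [delta_encode(row) for row in frames]

# Time direction: each band is coded relative to the same band in the previous frame.
time_deltas = [frames[0]] + [[c - p for p, c in zip(prev, cur)]
                             for prev, cur in zip(frames, frames[1:])]
```

For slowly varying parameters the time-direction differences are mostly zero, which is what makes the subsequent entropy coding efficient.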
(106) An interesting feature for any coding scheme is the ability to do scalable coding. This means that the coded bitstream can be divided into several different layers. The core layer is decodable by itself, and the higher layers can be decoded to enhance the decoded core layer signal. For different circumstances the number of available layers may vary, but as long as the core layer is available the decoder can produce output samples. The parameterization for the multi-channel coding as outlined above using the r.sub.1 to r.sub.5 parameters lends itself very well to scalable coding. Hence, it is possible to store the data for e.g. the two surround channels (A and E) in an enhancement layer, i.e. the parameters r.sub.3 and r.sub.4, and the parameters corresponding to the front channels in a core layer, represented by parameters r.sub.1 and r.sub.2.
(107) In
(108) Another important aspect of the present invention is the usage of decorrelators in a multi-channel configuration. The concept of using a decorrelator was elaborated on for the one to two channel case in the PCT/SE02/01372 document. However, when extending this theory to more than two channels several problems arise that the present invention solves.
(109) Elementary mathematics shows that in order to achieve M mutually decorrelated signals from N signals, M-N decorrelators are required, where all the different decorrelators are functions that create mutually orthogonal output signals from a common input signal. A decorrelator is typically an allpass or near allpass filter that, given an input x(t), produces an output y(t) with E[|y|.sup.2]=E[|x|.sup.2] and almost vanishing cross-correlation E[yx*]. Further perceptual criteria enter into the design of a good decorrelator. Examples of design goals are to minimize the comb-filter character when adding the original signal to the decorrelated signal, and to minimize the effect of a sometimes too long impulse response at transient signals. Some prior art decorrelators utilize an artificial reverberator to decorrelate. Prior art also includes fractional delays, e.g. by modifying the phase of the complex subband samples, to achieve higher echo density and hence more time diffusion.
(110) The present invention suggests methods of modifying a reverberation based decorrelator in order to achieve multiple decorrelators creating mutually decorrelated output signals from a common input signal. Two decorrelators are mutually decorrelated if their outputs y.sub.1(t) and y.sub.2(t) have vanishing or almost vanishing cross-correlation given the same input. Assuming the input is stationary white noise it follows that the impulse responses h.sub.1 and h.sub.2 must be orthogonal in the sense that E[h.sub.1h.sub.2*] is vanishing or almost vanishing. Sets of pair wise mutually decorrelated decorrelators can be constructed in several ways. An efficient way of doing such modifications is to alter the phase rotation factor q that is part of the fractional delay.
(111) The present invention stipulates that the phase rotation factors can be part of the delay lines in the all-pass filters or just an overall fractional delay. In the latter case this method is not limited to all-pass or reverberation like filters, but can also be applied to e.g. simple delays including a fractional delay part. An all-pass filter link in the decorrelator can be described in the Z-domain as:
(112)
where q is the complex-valued phase rotation factor (|q|=1), m is the delay line length in samples and a is the filter coefficient. For stability reasons, the magnitude of the filter coefficient has to be limited to |a|<1. However, by using the alternative filter coefficient a′=−a, a new reverberator is defined having the same reverberation decay properties but with an output significantly uncorrelated with the output from the non-modified reverberator. Furthermore, the phase rotation factor q can be modified by, e.g., adding a constant phase offset, q′=qe.sup.jC. The constant C can be used as a constant phase offset or could be scaled so that it corresponds to a constant time offset for all frequency bands to which it is applied. The phase offset constant C can also be a random value that is different for all frequency bands.
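The two modifications can be tried out on a single all-pass link with a few lines of Python. The transfer function of the link is not reproduced in this text, so the sketch assumes one common form, H(z) = (q z^{-m} − a) / (1 − a q z^{-m}) with real a, which is all-pass for |q|=1 and |a|<1. The function names and the chosen values of a, q, m and C are illustrative only.

```python
import cmath

def allpass_link(x, a, q, m):
    """Filter x through y[n] = -a*x[n] + q*x[n-m] + a*q*y[n-m]."""
    y = []
    for n, xn in enumerate(x):
        x_del = x[n - m] if n >= m else 0.0
        y_del = y[n - m] if n >= m else 0.0
        y.append(-a * xn + q * x_del + a * q * y_del)
    return y

def energy(h):
    return sum(abs(v) ** 2 for v in h)

def correlation(u, v):
    """Inner product of two (complex) impulse responses."""
    return sum(p * w.conjugate() for p, w in zip(u, v))

impulse = [1.0] + [0.0] * 299
q = cmath.exp(1j * 0.3)                                            # |q| = 1
h1 = allpass_link(impulse, a=0.5, q=q, m=3)                        # reference link
h2 = allpass_link(impulse, a=-0.5, q=q, m=3)                       # a' = -a
h3 = allpass_link(impulse, a=0.5, q=q * cmath.exp(1j * 1.0), m=3)  # q' = q e^{jC}, C = 1
```

All three impulse responses carry unit energy, as expected of an all-pass filter, while their mutual inner products have magnitude below one. How close to zero the correlation gets depends on a, m, C and on how many such links are cascaded, which this single-link sketch does not explore.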
(113) According to the present invention, the generation of n channels from m channels is performed by applying an upmix matrix H of size n×(m+p) to a column vector of size (m+p)×1 of signals
(114)
wherein m are the m downmixed and coded signals, and the p signals in S are both mutually decorrelated and decorrelated from all signals in m. These decorrelated signals are produced from the signals in m by decorrelators. The n reconstructed signals a′, b′, . . . are then contained in the column vector
x′=Hy
The above is illustrated by
(115) Let R=E[xx*] be the correlation matrix of the original signal vector and let R′=E[x′x′*] be the correlation matrix of the reconstructed signal. Here and in the following, for a matrix or a vector X with complex entries, X* denotes the adjoint matrix, the complex conjugate transpose of X.
(116) The diagonal of R contains the energy values A, B, C, . . . and can be decoded up to a total energy level from the energy quotas defined above. Since R*=R, there are only n(n−1)/2 different off-diagonal cross-correlation values containing information that is to be reconstructed fully or partly by adjusting the upmix matrix H. A reconstruction of the full correlation structure corresponds to the case R′=R. Reconstruction of correct energy levels only corresponds to the case where R′ and R are equal on their diagonals.
(117) In the case of n channels from m=1 channel, a reconstruction of the full correlation structure is achieved by using p=n−1 mutually decorrelated decorrelators and an upmix matrix H which satisfies the condition
(118) $$HH^{*} = \frac{1}{M}R$$
where M is the energy of the single transmitted signal. Since R is positive semidefinite it is well known that such a solution exists. Moreover, n(n−1)/2 degrees of freedom are left over for the design of H, which are used in the present invention to obtain further desirable properties of the upmix matrix. A central design criterion is that the dependence of H on the transmitted correlation data shall be smooth.
(119) One convenient way of parametrizing the upmix matrix is H=UDV where U and V are orthogonal matrices and D is a diagonal matrix. The squares of the absolute values of D can be chosen equal to the eigenvalues of R/M. Omitting V and sorting the eigenvalues so that the largest value is applied to the first coordinate will minimize the overall energy of decorrelated signals in the output. The orthogonal matrix U is in the real case parameterized by n(n−1)/2 rotation angles. Transmitting correlation data in the form of those angles and the n diagonal values of D would immediately give the desired smooth dependence of H. However, since energy data has to be transformed into eigenvalues, scalability is sacrificed by this approach.
(120) A second method taught by the present invention consists of separating the energy part from the correlation part in R by defining a normalized correlation matrix R.sub.0 by R=GR.sub.0G, where G is a diagonal matrix with the diagonal values equal to the square roots of the diagonal entries of R, that is, √A, √B, . . . , and R.sub.0 has ones on the diagonal. Let H.sub.0 be an orthogonal upmix matrix defining the preferred normalized upmix in the case of totally uncorrelated signals of equal energy. Examples of such preferred upmix matrices are
(121)
(122) The upmix is then defined by H=GSH.sub.0/√M, where the matrix S solves SS*=R.sub.0. The dependence of this solution on the normalized cross-correlation values in R.sub.0 is chosen to be continuous and such that S is equal to the identity matrix I in the case R.sub.0=I.
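For two output channels generated from one down-mix channel and one decorrelated signal, this construction can be written out in a few lines of Python. The sketch assumes real-valued signals, takes S as the symmetric square root of R.sub.0 (which is continuous in the cross-correlation and equals I when the cross-correlation is zero), and uses a normalized 2×2 Hadamard matrix as an assumed example of H.sub.0. The function names are hypothetical.

```python
import math

def upmix_matrix_2ch(A, B, rho, M):
    """H = G S H0 / sqrt(M) for output energies A, B and normalized
    cross-correlation rho, given a down-mix of energy M."""
    # Symmetric square root of R0 = [[1, rho], [rho, 1]].
    c = (math.sqrt(1 + rho) + math.sqrt(1 - rho)) / 2
    s = (math.sqrt(1 + rho) - math.sqrt(1 - rho)) / 2
    S = [[c, s], [s, c]]
    # Assumed orthogonal H0: a normalized 2x2 Hadamard matrix.
    k = math.sqrt(0.5)
    H0 = [[k, k], [k, -k]]
    G = [math.sqrt(A), math.sqrt(B)]      # diagonal entries of G
    SH0 = [[sum(S[i][t] * H0[t][j] for t in range(2)) for j in range(2)]
           for i in range(2)]
    return [[G[i] * SH0[i][j] / math.sqrt(M) for j in range(2)] for i in range(2)]

def reconstructed_correlation(H, M):
    """R' = M * H H^T, valid when the down-mix and the decorrelated signal
    are uncorrelated and both have energy M."""
    return [[M * sum(H[i][t] * H[j][t] for t in range(2)) for j in range(2)]
            for i in range(2)]

H = upmix_matrix_2ch(A=4.0, B=1.0, rho=0.6, M=5.0)
R_rec = reconstructed_correlation(H, M=5.0)
```

The reconstructed matrix has the target energies 4 and 1 on its diagonal and the target cross term √A·√B·ρ = 1.2 off the diagonal, so both the energy and the correlation structure are matched in this two-channel case.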
(123) Dividing the n channels into groups of fewer channels is a convenient way to reconstruct partial cross-correlation structure. According to the present invention, a particularly advantageous grouping for the case of 5.1 channels from 1 channel is {a,e},{c},{b,d},{f}, where no decorrelation is applied for the groups {c},{f}, and the groups {a,e},{b,d} are produced by upmix of the same downmixed/decorrelated pair. For these two subsystems, the preferred normalized upmixes in the totally uncorrelated case are to be chosen as
(124)
respectively. Thus, only two of the totality of 15 cross-correlations will be transmitted and reconstructed, namely the one between channels a and e and the one between channels b and d. In the terminology used above, this is an example of a design for the case n=6, m=1, and p=1. The upmix matrix H is of size 6×2, with zeros at the two entries in the second column at rows 3 and 6, corresponding to outputs c′ and f′.
(125) A third approach taught by the present invention for incorporating decorrelated signals is the simpler point of view that each output channel has a different decorrelator giving rise to decorrelated signals s.sub.a, s.sub.b, . . . . The reconstructed signals are then formed as
a′=√(A/M)(m cos φ.sub.a+s.sub.a sin φ.sub.a),
b′=√(B/M)(m cos φ.sub.b+s.sub.b sin φ.sub.b), etc. . . .
(126) The parameters φ.sub.a, φ.sub.b, . . . control the amount of decorrelated signal present in output channels a′, b′, . . . . The correlation data is transmitted in the form of these angles. It is easy to compute that the resulting normalized cross-correlation between, for instance, channels a′ and b′ is equal to the product cos φ.sub.a cos φ.sub.b. As the number of pairwise cross-correlations is n(n−1)/2 and there are only n decorrelators, it will not in general be possible with this approach to match a given correlation structure if n>3, but the advantages are a very simple and stable decoding method and direct control over the amount of decorrelated signal present in each output channel. This allows the mixing of decorrelated signals to be based on perceptual criteria incorporating, for instance, energy level differences of pairs of channels.
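The angle-based reconstruction and the cos φ.sub.a cos φ.sub.b relation can be checked numerically. The sketch below is illustrative only: the function names, the four-sample toy signals, and the values of A, B and the angles are assumptions. The toy signals are chosen to be exactly orthogonal with equal mean square, standing in for a downmix m and two ideal decorrelator outputs.

```python
import math

def reconstruct(m, s, target_energy, mono_energy, phi):
    """Return sqrt(A/M) * (m*cos(phi) + s*sin(phi)), sample by sample."""
    gain = math.sqrt(target_energy / mono_energy)
    c, d = math.cos(phi), math.sin(phi)
    return [gain * (c * mi + d * si) for mi, si in zip(m, s)]

def mean_product(x, y):
    """Sample estimate of E[x*y] for real signals."""
    return sum(xi * yi for xi, yi in zip(x, y)) / len(x)

def normalized_correlation(x, y):
    """E[x*y] divided by the geometric mean of the two energies."""
    return mean_product(x, y) / math.sqrt(mean_product(x, x) * mean_product(y, y))

# Toy signals: mutually orthogonal sequences, each with mean square 1.
m = [1.0, 1.0, 1.0, 1.0]        # transmitted signal
s_a = [1.0, -1.0, 1.0, -1.0]    # stand-in for decorrelator output for channel a
s_b = [1.0, 1.0, -1.0, -1.0]    # stand-in for decorrelator output for channel b

M = mean_product(m, m)          # energy of the transmitted signal
A, B = 4.0, 9.0                 # made-up target energies
phi_a, phi_b = 0.3, 0.8         # made-up transmitted angles (radians)

a_rec = reconstruct(m, s_a, A, M, phi_a)
b_rec = reconstruct(m, s_b, B, M, phi_b)
rho_ab = normalized_correlation(a_rec, b_rec)
```

With φ=0 an output is a scaled copy of m, and with φ=π/2 it consists of decorrelated signal only; in both cases the output energy stays at its target value.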
(127) For the case of n channels from m>1 channels, the correlation matrix R.sub.y=E[yy*] can no longer be assumed diagonal, and this has to be taken into account in the matching of R′=HR.sub.yH* to the target R. A simplification occurs, since R.sub.y has the block matrix structure
(128)
where R.sub.m=E[mm*] and R.sub.s=E[ss*]. Furthermore, assuming mutually decorrelated decorrelators, the matrix R.sub.s is diagonal. Note that this also affects the upmix design with respect to the reconstruction of correct energies. The solution is to compute in the decoder, or to transmit from the encoder, information about the correlation structure R.sub.m of the downmixed signals.
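The block structure referred to above is not reproduced in this text. As a hedged reconstruction, assuming that y stacks the m downmixed signals above the decorrelated signals s, and that every decorrelated signal is uncorrelated with every downmixed signal, it would take the following form (the symbols σ and k are introduced here for illustration only):

```latex
R_y = E[\,y y^{*}\,] =
\begin{pmatrix}
R_m & 0 \\
0   & R_s
\end{pmatrix},
\qquad
R_m = E[\,m m^{*}\,],
\quad
R_s = E[\,s s^{*}\,] = \operatorname{diag}\bigl(\sigma_1^2, \ldots, \sigma_k^2\bigr).
```

Under these assumptions only R.sub.m carries off-diagonal terms, which is why the correlation structure of the downmixed signals is the additional information the decoder needs.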
(129) For the case of 5.1 channels from 2 channels a preferred method for upmix is
(130)
where s.sub.1 is obtained from decorrelation of m.sub.1=l.sub.d and s.sub.2 is obtained from decorrelation of m.sub.2=r.sub.d.
(131) Here the groups {a,b} and {d,e} are treated as separate 1→2 channel systems, taking into account the pairwise cross-correlations. For channels c and f, the weights are to be adjusted such that
E[|h.sub.31m.sub.1+h.sub.32m.sub.2|.sup.2]=C,
E[|h.sub.61m.sub.1+h.sub.62m.sub.2|.sup.2]=F.
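One simple way to satisfy such an energy constraint is to pick a weight direction and rescale it. The sketch below assumes real-valued signals and that the second-order statistics of m.sub.1 and m.sub.2 are available; the function name, the starting weights, and the numeric values are hypothetical and not taken from the specification.

```python
import math

def scale_weights_to_energy(w1, w2, r11, r12, r22, target):
    """Scale (w1, w2) so that E[|h1*m1 + h2*m2|^2] equals target.

    r11 = E[m1^2], r12 = E[m1*m2], r22 = E[m2^2] for real signals.
    """
    energy = w1 * w1 * r11 + 2.0 * w1 * w2 * r12 + w2 * w2 * r22
    if energy <= 0.0:
        raise ValueError("unscaled weights produce no output energy")
    gain = math.sqrt(target / energy)
    return gain * w1, gain * w2

# Made-up downmix statistics and target energy C for the center channel.
r11, r12, r22 = 2.0, 0.5, 1.0
C = 3.0
h31, h32 = scale_weights_to_energy(1.0, 1.0, r11, r12, r22, C)
achieved = h31 * h31 * r11 + 2.0 * h31 * h32 * r12 + h32 * h32 * r22
```

The cross term 2·h.sub.31·h.sub.32·E[m.sub.1m.sub.2] is the reason the correlation structure R.sub.m of the downmixed signals has to be known or transmitted: ignoring it would misestimate the output energy whenever the two downmix channels are correlated.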
(132) The present invention can be implemented in both hardware chips and DSPs, for various kinds of systems, for storage or transmission of signals, analogue or digital, using arbitrary codecs.
(133) In
(134) Although the present invention has mainly been described with reference to the generation and usage of balance parameters, it is to be emphasized here that preferably the same grouping of channel pairs for deriving balance parameters is also used for calculating inter-channel coherence parameters or “width” parameters between these two channel pairs. Additionally, inter-channel time differences or a kind of “phase cues” can also be derived using the same channel pairs as used for the balance parameter calculation. On the receiver-side, these parameters can be used in addition or as an alternative to the balance parameters to generate a multi-channel reconstruction. Alternatively, the inter-channel coherence parameters or even the inter-channel time differences can also be used in addition to other inter-channel level differences determined by other reference channels. In view of the scalability feature of the present invention as discussed in connection with
(135) Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disk or a CD having electronically readable control signals stored thereon, which cooperate with a programmable computer system such that the inventive methods are performed. Generally, the present invention is, therefore, a computer program product with a program code stored on a machine readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer.
(136) While this invention has been described in terms of several preferred embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.