Downmixer and method for downmixing at least two channels and multichannel encoder and multichannel decoder
11670307 · 2023-06-06
Assignee
Inventors
- Christian Borss (Erlangen, DE)
- Bernd Edler (Fuerth, DE)
- Guillaume Fuchs (Bubenreuth, DE)
- Jan Buethe (Erlangen, DE)
- Sascha Disch (Fuerth, DE)
- Florin Ghido (Nuremberg, DE)
- Stefan Bayer (Nuremberg, DE)
- Markus Multrus (Nuremberg, DE)
Cpc classification
H04S2400/03
ELECTRICITY
G10L19/008
PHYSICS
H04S3/008
ELECTRICITY
International classification
Abstract
A downmixer for downmixing at least two channels of a multichannel signal having the two or more channels includes: a processor for calculating a partial downmix signal from the at least two channels; a complementary signal calculator for calculating a complementary signal from the multichannel signal, the complementary signal being different from the partial downmix signal; and an adder for adding the partial downmix signal and the complementary signal to obtain a downmix signal of the multichannel signal.
Claims
1. A downmixer for downmixing at least two channels of a multichannel signal comprising two or more channels, comprising: a processor for calculating a partial downmix signal from the at least two channels using adding the two or more channels; a complementary signal calculator for calculating a complementary signal from the multichannel signal, the complementary signal being different from the partial downmix signal; and an adder for adding the partial downmix signal and the complementary signal to acquire a downmix signal of the multichannel signal.
2. The downmixer of claim 1, wherein the processor is configured to calculate the partial downmix signal so that a predefined energy or amplitude relation between the at least two channels of the multichannel signal and the partial downmix signal is fulfilled, when the at least two channels are in phase and so that an energy loss is created in the partial downmix signal with respect to the at least two channels, when the at least two channels are out of phase, and wherein the complementary signal calculator is configured to calculate the complementary signal so that the energy or amplitude loss of the partial downmix signal is partly or fully compensated by the adding of the partial downmix signal and the complementary signal in the adder.
3. The downmixer of claim 1, wherein the complementary signal calculator is configured to calculate the complementary signal so that the complementary signal comprises a coherence index of less than 0.7 with respect to the partial downmix signal, wherein a coherence index of 0.0 shows a full incoherence and a coherence index of 1.0 shows a full coherence.
4. The downmixer of claim 1, wherein the complementary signal calculator is configured to use, for calculating the complementary signal, one signal of the following groups of signals comprising a first channel of the at least two channels, a second channel of the at least two channels, a difference between the first channel and the second channel, a difference between the second channel and the first channel, a further channel of the multichannel signal, when the multichannel signal comprises more channels than the at least two channels, or a decorrelated first channel, a decorrelated second channel, a decorrelated further channel, a decorrelated difference involving the first channel and the second channel or a decorrelated partial downmix signal.
5. The downmixer of claim 1, wherein the processor is configured for: calculating time or frequency-dependent weighting factors for weighting a sum of the at least two channels in accordance with a predefined energy or amplitude relation between the at least two channels and a sum signal of the at least two channels; and comparing a calculated weighting factor to a predefined threshold; and using the calculated weighting factor for calculating the partial downmix signal, when the calculated weighting factor is in a first relation to the predefined threshold, or when the calculated weighting factor is in a second relation to the predefined threshold being different from the first relation, using the predefined threshold instead of the calculated weighting factor for calculating the partial downmix signal, or when the calculated weighting factor is in a second relation to the predefined threshold being different from the first relation, deriving a modified weighting factor using a modification function, wherein the modification function is so that the modified weighting factor is closer to the predefined threshold than the calculated weighting factor.
6. The downmixer of claim 1, wherein the processor is configured for: calculating time or frequency-dependent weighting factors for weighting a sum of the at least two channels in accordance with a predefined energy or amplitude relation between the at least two channels and a sum signal of the at least two channels; and deriving a modified weighting factor using a modification function, wherein the modification function is so that the modified weighting factor results in an energy of the partial downmix signal being smaller than an energy as defined by the predefined energy relation.
7. The downmixer of claim 1, wherein the processor is configured to weight a sum signal of the at least two channels using time or frequency-dependent weighting factors, wherein the weighting factors $W_1$ are calculated so that the weighting factors comprise values being in a range of ±20% of values determined based on the following equation for a frequency bin k and a time index n:
8. The downmixer of claim 1, wherein the complementary signal calculator is configured to use one channel of the at least two channels and to weight the used channel using time or frequency dependent complementary weighting factors W.sub.2, wherein the complementary weighting factors W.sub.2 are calculated so that the complementary weighting factors comprise values being in a range of ±20% of values determined based on the following equation for a frequency bin k and a time index n:
9. The downmixer of claim 1, wherein the complementary signal calculator is configured to use a difference between a first channel of the two or more channels and a second channel of the two or more channels of the multichannel signal and to weight the difference using time and frequency dependent complementary weighting factors, wherein the complementary weighting factors are calculated so that the complementary weighting factors comprise values being in the range of ±20% of values determined based on the following equations:
$W_2 = -p \pm \sqrt{p^2 - q}$, where
10. The downmixer of claim 1, wherein the complementary signal calculator is configured to use a difference between a first channel of the two or more channels and a second channel of the two or more channels of the multichannel signal and to weight the difference using time and frequency dependent complementary weighting factors, wherein the complementary weighting factors are calculated so that the complementary weighting factors comprise values being in the range of ±20% of values determined based on the following equations:
$W_2 = -|p| + \sqrt{p^2 - q}$, where
11. The downmixer of claim 1, wherein the processor is configured: to calculate a sum signal from the at least two channels; to calculate weighting factors for weighting the sum signal in accordance with a predetermined relation between the sum signal and the at least two channels; to modify calculated weighting factors being higher than a predefined threshold, and to apply the modified weighting factors for weighting the sum signal to acquire the partial downmix signal.
12. The downmixer of claim 1, wherein the processor is configured to modify the calculated weighting factors to be in a range of ±20% of the predefined threshold, or to modify the calculated weighting factors so that the calculated weighting factors comprise values being in a range of ±20% of values determined based on the following equations:
13. A method for downmixing at least two channels of a multichannel signal comprising two or more channels, comprising: calculating a partial downmix signal from the at least two channels using adding the two or more channels; calculating a complementary signal from the multichannel signal, the complementary signal being different from the partial downmix signal; and adding the partial downmix signal and the complementary signal to acquire a downmix signal of the multichannel signal.
14. A multichannel encoder, comprising: a parameter calculator for calculating multichannel parameters from at least two channels of a multichannel signal comprising the two or more than two channels, and a downmixer of claim 1; and an output interface for outputting or storing an encoded multichannel signal comprising one or more downmix signals and/or the multichannel parameters.
15. A method for encoding a multichannel signal, comprising: calculating multichannel parameters from at least two channels of a multichannel signal comprising two or more than two channels; downmixing in accordance with the method of claim 13; and outputting or storing an encoded multichannel signal comprising the one or more downmix signals and the multichannel parameters.
16. A non-transitory digital storage medium having a computer program stored thereon to perform the method for downmixing at least two channels of a multichannel signal comprising two or more channels, comprising: calculating a partial downmix signal from the at least two channels using adding the two or more channels; calculating a complementary signal from the multichannel signal, the complementary signal being different from the partial downmix signal; and adding the partial downmix signal and the complementary signal to acquire a downmix signal of the multichannel signal, when said computer program is run by a computer.
17. A non-transitory digital storage medium having a computer program stored thereon to perform the method for encoding a multichannel signal, comprising: calculating multichannel parameters from at least two channels of a multichannel signal comprising two or more than two channels; downmixing in accordance with the method as claimed in claim 13; and outputting or storing an encoded multichannel signal comprising one or more downmix signals and the multichannel parameters, when said computer program is run by a computer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present invention will be detailed subsequently referring to the appended drawings.
DETAILED DESCRIPTION OF THE INVENTION
(26)
(27) Generally, however, the downmix signal has fewer channels than are included in the multichannel signal 12. Thus, when the multichannel signal has, for example, five channels, the downmix signal may have four channels, three channels, two channels or a single channel.
(28) The downmix signal with one or two channels is advantageous as compared to a downmix signal having more than two channels. In the case of a two channel signal as the multichannel signal 12, the downmix signal 40 only has a single channel.
(29) In an embodiment, the processor 10 is configured to calculate the partial downmix signal 14 so that the predefined energy-related or amplitude-related relation between the at least two channels and the partial downmix signal is fulfilled when the at least two channels are in phase, and so that an energy loss is created in the partial downmix signal with respect to the at least two channels when the at least two channels are out of phase. Examples of the predefined relation are that the amplitudes of the downmix signal are in a certain relation to the amplitudes of the input signals, or that the subband-wise energies of the downmix signal are in a predefined relation to the energies of the input signals. One particularly interesting relation is that the energy of the downmix signal, either over the full bandwidth or in subbands, is equal to an average energy of the two input channels or the more than two input channels. Thus, the relation can be with respect to energy or with respect to amplitude. Furthermore, the complementary signal calculator 20 of
(30) Generally, embodiments are based on the controlled energy or amplitude-equalization of the sum signal mixed with the complementary signal also derived from the input channels.
(31) The energy equalization of the sum signal is controlled to avoid problems at the singularity point and also to significantly reduce signal impairments caused by large fluctuations of the gain. The complementary signal compensates for the remaining energy loss, or at least a part of it. The general form of the new downmix can be expressed as
$$M[k,n] = W_1[k,n]\,(L[k,n] + R[k,n]) + W_2[k,n]\,S[k,n]$$
(32) where the complementary signal S[k,n] is ideally as orthogonal as possible to the sum signal but can, in practice, be chosen as
S[k,n]=L[k,n]
or
S[k,n]=R[k,n]
or
S[k,n]=L[k,n]−R[k,n].
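The general form above can be sketched for a single time/frequency bin as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the function name `downmix_bin`, the string labels used to pick S, and the gain values 0.5 are hypothetical and chosen only for the example. It shows that for out-of-phase channels the sum term cancels, while the complementary term keeps the downmix from vanishing.

```python
def downmix_bin(left, right, w1, w2, complement="L"):
    """Return M = w1 * (L + R) + w2 * S for one complex spectral bin.

    `complement` selects S among the three practical choices listed in the
    text: "L", "R" or "L-R". The labels themselves are hypothetical.
    """
    if complement == "L":
        s = left
    elif complement == "R":
        s = right
    elif complement == "L-R":
        s = left - right
    else:
        raise ValueError("complement must be 'L', 'R' or 'L-R'")
    return w1 * (left + right) + w2 * s


# Fully out-of-phase input: R = -L, so the sum channel L + R is zero.
left = 1 + 0j
right = -1 + 0j

# Sum term only (w2 = 0): the downmix vanishes.
sum_only = downmix_bin(left, right, w1=0.5, w2=0.0)

# With a complementary term S = L, the downmix keeps some energy.
with_complement = downmix_bin(left, right, w1=0.5, w2=0.5, complement="L")

print(abs(sum_only), abs(with_complement))
```

The gains are passed in rather than computed here, because the choice of $W_1$ and $W_2$ is what distinguishes the individual embodiments.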
(33) In all cases, the downmixing first generates the sum channel L+R, as is done in conventional passive and active downmixing approaches. The gain $W_1[k,n]$ aims at equalizing the energy of the sum channel to match either the average energy or the average amplitude of the input channels. However, unlike conventional active downmixing approaches, $W_1[k,n]$ is limited to avoid instability problems and to avoid restoring the energy relations from an impaired sum signal.
(34) A second mixing is done with the complementary signal. The complementary signal is chosen such that its energy does not vanish when L[k,n] and R[k,n] are out of phase. $W_2[k,n]$ compensates for the energy-equalization shortfall caused by the limitation introduced in $W_1[k,n]$.
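A hedged sketch of a limited equalizing gain for the sum channel follows. The gain equation itself is not reproduced at this point in the text, so the target used here (downmix energy equal to the average energy of the two input channels), the limit of 1.0, and the function names `equalizing_gain` and `limited_gain` are assumptions made for illustration only.

```python
import math


def equalizing_gain(left, right):
    """Gain g such that |g * (L + R)|^2 equals the average input energy.

    Assumed target: (|L|^2 + |R|^2) / 2. The gain is unbounded when the
    sum L + R vanishes, which is the singularity the text refers to.
    """
    sum_magnitude = abs(left + right)
    target_magnitude = math.sqrt((abs(left) ** 2 + abs(right) ** 2) / 2.0)
    if sum_magnitude == 0.0:
        return math.inf
    return target_magnitude / sum_magnitude


def limited_gain(left, right, limit=1.0):
    """Equalizing gain clipped to a hypothetical upper limit."""
    return min(equalizing_gain(left, right), limit)


# In-phase channels: the unlimited gain is moderate and is used as is.
print(limited_gain(1 + 0j, 1 + 0j))

# Nearly out-of-phase channels: the unlimited gain explodes, the limited
# one does not; the missing energy is left to the complementary signal.
left, right = 1 + 0j, -0.99 + 0j
w1 = limited_gain(left, right)
deficit = (abs(left) ** 2 + abs(right) ** 2) / 2.0 - abs(w1 * (left + right)) ** 2
print(equalizing_gain(left, right), w1, deficit)
```

The positive `deficit` in the second case is the energy shortfall that the weighted complementary signal is meant to cover, partly or fully.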
(35) As illustrated, the complementary signal calculator 20 is configured to calculate the complementary signal so that the complementary signal is different from the partial downmix signal. Quantitatively, it is advantageous that the coherence index of the complementary signal with respect to the partial downmix signal is less than 0.7. On this scale, a coherence index of 0.0 indicates full incoherence and a coherence index of 1.0 indicates full coherence. A coherence index of less than 0.7 has proven useful for keeping the partial downmix signal and the complementary signal sufficiently different from each other. However, coherence indices of less than 0.5 and even less than 0.3 are more advantageous.
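The text does not define how the coherence index is measured. One common choice, assumed here purely for illustration, is the magnitude of the normalized cross-correlation between two sequences of complex spectral values, which lies between 0.0 and 1.0. The function name `coherence_index` and the test signals are hypothetical.

```python
import math


def coherence_index(a, b):
    """Normalized cross-correlation magnitude of two complex sequences.

    Returns a value in [0, 1]; 1.0 means fully coherent, 0.0 fully
    incoherent. This definition is an assumption, not taken from the text.
    """
    cross = abs(sum(x * y.conjugate() for x, y in zip(a, b)))
    energy_a = sum(abs(x) ** 2 for x in a)
    energy_b = sum(abs(y) ** 2 for y in b)
    denominator = math.sqrt(energy_a * energy_b)
    return cross / denominator if denominator > 0.0 else 0.0


rotating = [1 + 0j, 1j, -1 + 0j, -1j]       # phase advances by 90 degrees
constant = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]  # constant phase

same = coherence_index(rotating, rotating)
different = coherence_index(rotating, constant)
print(same, different, different < 0.7)
```

Under this assumed measure, a candidate complementary signal would be checked against the partial downmix signal and accepted when the index stays below the chosen bound such as 0.7.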
(36)
(37) In an embodiment illustrated in
(38) The output of the complementary signal selector 23 is input into a weighting factor calculator 24. The weighting factor calculator typically also receives the two or more signals to be combined by the processor 10 and calculates the weights $W_2$ illustrated at 26. Those weights, together with the signal determined by the complementary signal selector 23, are input into the weighter 25, which weights the signal output from block 23 using the weighting factors from block 26 to finally obtain the complementary signal 22.
(39) The weighting factors can be time-dependent only, so that for a certain block or frame in time, a single weighting factor $W_2$ is calculated. In other embodiments, however, it is advantageous to use time and frequency dependent weighting factors $W_2$ so that, for a certain block or frame of the complementary signal, not only a single weighting factor for this time block is available, but a set of weighting factors $W_2$ for a set of different frequency values or spectral bins of the signal generated or selected by block 23.
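The difference between a time-dependent weight and time and frequency dependent weights can be illustrated with a small sketch. The function name `weight_frame` and the numeric values are hypothetical; the sketch only shows the two ways of applying weights to one frame of spectral bins, not how the weights are derived.

```python
def weight_frame(frame_bins, weights):
    """Weight one frame of complex spectral bins.

    `weights` is either a single number (one weight for the whole frame,
    i.e. time-dependent only) or a sequence with one weight per bin
    (time and frequency dependent).
    """
    if isinstance(weights, (int, float)):
        return [weights * value for value in frame_bins]
    if len(weights) != len(frame_bins):
        raise ValueError("need exactly one weight per spectral bin")
    return [w * value for w, value in zip(weights, frame_bins)]


frame = [1 + 1j, 2 + 0j, 0 - 1j]

# One weight for the whole frame.
frame_wise = weight_frame(frame, 0.5)

# One weight per spectral bin of the same frame.
bin_wise = weight_frame(frame, [1.0, 0.5, 0.0])

print(frame_wise)
print(bin_wise)
```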
(40) A corresponding embodiment for time and frequency dependent weighting factors not only for usage of the complementary signal calculator 20, but also for usage of the processor 10 is illustrated in
(41) Particularly,
(42) The time-spectrum converter 60 is configured for applying an FFT and, advantageously, an overlapping FFT, so that the sequence of spectra obtained by block 60 relates to overlapping blocks of the input channels. However, non-overlapping spectral conversion algorithms and conversions other than an FFT, such as a DCT, can be used as well.
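The idea of a sequence of spectra over overlapping blocks can be sketched as follows. This is an illustration only: it uses a direct DFT instead of a fast FFT routine to stay within the standard library, and the frame length of 8, the hop size of 4 (50% overlap), the sine window, and the function names are all assumptions not taken from the text.

```python
import cmath
import math


def dft(block):
    """Direct O(N^2) discrete Fourier transform of one block."""
    n = len(block)
    return [
        sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def overlapping_spectra(signal, frame_len=8, hop=4):
    """Return one spectrum per block; hop < frame_len gives overlapping blocks."""
    window = [math.sin(math.pi * (t + 0.5) / frame_len) for t in range(frame_len)]
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        block = [signal[start + t] * window[t] for t in range(frame_len)]
        spectra.append(dft(block))
    return spectra


signal = [math.sin(2 * math.pi * t / 8) for t in range(16)]

overlapped = overlapping_spectra(signal, frame_len=8, hop=4)
non_overlapped = overlapping_spectra(signal, frame_len=8, hop=8)
print(len(overlapped), len(non_overlapped), len(overlapped[0]))
```

Each entry of the returned list plays the role of one time index n, and each element within an entry the role of one frequency bin k.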
(43) Particularly, the processor 10 of
(44) The complementary signal calculator 20 of
(45) Furthermore, the processor 10 of
(46) The adder 30 outputs the downmix signal 40, which can be used in several different ways. One way to use the downmix signal 40 is to input it into a frequency domain downmix encoder 64 illustrated in
(47) In embodiments, the processor 10 is configured for calculating time or frequency-dependent weighting factors W.sub.1 as illustrated by block 15 in
(48) The embodiment in
(49) In a further embodiment, the procedure in
(50) In an advantageous embodiment illustrated in
$$M[k,n] = W_1[k,n]\,(L[k,n] + R[k,n]) + W_2[k,n]\,L[k,n]$$
(51) where
(52)
(53) In the above equation, A is a real-valued constant, advantageously equal to the square root of 2, but A can also take other values between 0.5 and 5. Depending on the application, even values different from those mentioned above can be used.
(54) Given that
|L[k,n]+R[k,n]|≤|L[k,n]|+|R[k,n]|,
$W_1[k,n]$ and $W_2[k,n]$ are positive, and $W_1[k,n]$ is limited to
(55)
or e.g. 0.5.
(56) The mixing gains can be computed bin-wise for each index k of the STFT as described in the previous formulas or can be computed band-wise for each non-overlapping sub-band gathering a set of indices b of the STFT. The gains are calculated based on the following equation:
(57)
(58) Since energy preservation during the equalization is not a hard constraint, the energy of the resulting downmix signal varies compared to the average energy of the input channels. The energy relation depends on the ILD and IPD as illustrated in
(59) In contrast to the simple active downmixing method, which preserves a constant relation between the output energy and the average energy of the input channels, the new downmix signal does not show any singularity as illustrated in
(60) Listening test results confirm that the new downmix method results in significantly fewer instabilities and impairments for a large range of stereo signals than conventional active downmixing.
(61) In this context,
(62) Compared to the conventional technology illustrated in
(63)
$$M[k,n] = W_1[k,n]\,(L[k,n] + R[k,n]) + W_2[k,n]\,(L[k,n] - R[k,n])$$
(64) where the gains $W_1[k,n]$ and $W_2[k,n]$ are computed such that the energy relation between the downmixed signal and the input channels holds in every condition.
(65) First, the gain $W_1[k,n]$ is computed to equalize the energy up to a given limit, where A is again a real-valued number equal to $\sqrt{2}$ or different from this value:
(66)
(67) As a consequence, the gain $W_1[k,n]$ of the sum signal is limited to the range [0, 1] as shown in
(68) If the two channels have an IPD greater than $\pi/2$, $W_1$ can no longer compensate for the loss of energy, which then has to be compensated by the gain $W_2$. $W_2$ is computed as one of the roots of the following quadratic equation:
(69)
(70) The roots of the equation are given by:
$$W_2 = -p \pm \sqrt{p^2 - q},$$
where
(71)
(72) One of the two roots can be then selected. For both roots, the energy relation is preserved for all conditions as shown in
(73) If the two channels have an IPD greater than $\pi/2$, $W_1$ can no longer compensate for the loss of energy, which then has to be compensated by the gain $W_2$. $W_2$ is computed as one of the roots of the following quadratic equation:
(74)
(75) The roots of the equation are given by:
$$W_2 = -p \pm \sqrt{p^2 - q},$$
where
(76)
(77) One of the two roots can be then selected. For both roots, the energy relation is preserved for all conditions as shown in
(78) Advantageously, the root with the minimum absolute value is adaptively selected for $W_2[k,n]$. Such an adaptive selection will result in a switch from one root to another for ILD=0 dB, which once again can create a discontinuity.
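The root computation and the minimum-magnitude selection can be sketched as follows. The coefficients p and q are not reproduced in the text at this point, so the ones below are derived here under explicit assumptions: $W_2$ is real, the complementary signal is L − R, and the target is a downmix energy equal to the average input energy $(|L|^2 + |R|^2)/2$. Expanding $|W_1(L+R) + W_2(L-R)|^2$ under these assumptions gives $W_2^2 + 2pW_2 + q = 0$ with $p = W_1(|L|^2 - |R|^2)/|L-R|^2$ and $q = (W_1^2|L+R|^2 - (|L|^2+|R|^2)/2)/|L-R|^2$. The function names are hypothetical.

```python
import math


def complementary_gain_roots(left, right, w1):
    """Both real roots W2 of W2^2 + 2*p*W2 + q = 0 (assumed p and q).

    Assumes S = L - R and a target energy of (|L|^2 + |R|^2) / 2.
    """
    diff_energy = abs(left - right) ** 2
    if diff_energy == 0.0:
        raise ValueError("L - R vanishes, so no complementary energy is available")
    target = (abs(left) ** 2 + abs(right) ** 2) / 2.0
    p = w1 * (abs(left) ** 2 - abs(right) ** 2) / diff_energy
    q = (w1 ** 2 * abs(left + right) ** 2 - target) / diff_energy
    discriminant = p * p - q
    if discriminant < 0.0:
        raise ValueError("no real root: w1 already exceeds the energy target")
    root = math.sqrt(discriminant)
    return (-p + root, -p - root)


def select_min_abs(roots):
    """Pick the root with the smallest absolute value."""
    return min(roots, key=abs)


left, right, w1 = 1 + 0j, -0.5 + 0j, 1.0
roots = complementary_gain_roots(left, right, w1)
w2 = select_min_abs(roots)
target = (abs(left) ** 2 + abs(right) ** 2) / 2.0

for candidate in roots:
    energy = abs(w1 * (left + right) + candidate * (left - right)) ** 2
    print(candidate, energy, target)
print("selected:", w2)
```

Under these assumptions, equal channel levels give p = 0, so the two roots have the same magnitude and opposite signs; this is consistent with the root switch at ILD = 0 dB mentioned above.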
(79) In contrast to the state of the art, this approach solves the comb-filtering effect of the downmix and the spectral bias without introducing any singularity. It maintains the energy relations in all conditions but introduces more instabilities compared to the advantageous embodiment.
(80) Thus,
(81)
(82) Furthermore,
(83)
(84) However,
(85)
(86) The downmixing is given by:
$$M = W_1[k]\,(L[k] + R[k]) + W_2[k]\,(L[k] - R[k])$$
(87) where
(88)
(89) In the equation for x, an alternative implementation is to use the denominator without a square root.
(90) In this case the quadratic equation to solve is:
(91)
(92) This time, the gain $W_2$ is not taken exactly as one of the roots of the quadratic equation, but rather as:
$$W_2 = -|p| + \sqrt{p^2 - q}$$
where
(93)
(94) As a result, the energy relation is not preserved all the time as shown in
(95) Thus,
(96)
(97)
(98) Although the preceding description and certain FIGS. provide detailed equations, it is to be noted that advantages are already obtained even when the equations are not evaluated exactly, for example when the equations are evaluated but the results are subsequently modified. Particularly, the functionalities of the first weighting factor calculator 15 and the second weighting factor calculator 24 of
(99)
(100) An inventively encoded audio signal can be stored on a digital storage medium or a non-transitory storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
(101) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
(102) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
(103) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(104) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
(105) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.
(106) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(107) A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
(108) A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
(109) A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
(110) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(111) In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
(112) While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.