Producing a multichannel sound from stereo audio signals
09820072 · 2017-11-14
Assignee
Inventors
CPC classification
H04S3/00, H04S5/005, H04S7/302, H04S2400/05, H04S2400/07 (ELECTRICITY)
International classification
H04S3/00, H04S7/00 (ELECTRICITY)
Abstract
The invention relates to a method for generating a multichannel audio signal (600, 700) from a stereo audio signal (100, 200), comprising the following steps: determining a first panning coefficient (310) and a second panning coefficient (320) of the stereo audio signal; determining a direct signal (515) as well as a first surround signal (510) and a second surround signal (520) from the first audio signal and the second audio signal and from the first panning coefficient and the second panning coefficient; determining a plurality of repanning coefficients (410, 415, 420) from the panning coefficients, each repanning coefficient of the plurality of repanning coefficients being assigned to an audio channel (580, 585, 590) of a plurality of audio channels of the multichannel audio signal; processing the direct signal with each of the repanning coefficients of the plurality of repanning coefficients; and converting each audio channel into a playback signal (600, 700) of the multichannel audio signal, each playback signal being provided for a respective playback device.
Claims
1. A method for generating a multichannel audio signal from a stereo audio signal, wherein the stereo audio signal comprises: a first audio signal for a left-hand playback device, and a second audio signal for a right-hand playback device; the method comprising the steps of: determining a first power level of the first audio signal and a second power level of the second audio signal; determining the autocorrelation of the first audio signal and the second audio signal, and determining the cross correlation of the first audio signal and the second audio signal; determining a similarity function from the ratio of the expectation value of the cross correlation and the expectation value of the sum of the autocorrelations of the first and second audio signals; determining a first partial similarity function as the ratio of the cross correlation of the first and second audio signals and the autocorrelation of the first audio signal; determining a second partial similarity function as the ratio of the cross correlation of the first and second audio signals and the autocorrelation of the second audio signal; determining a first panning coefficient and a second panning coefficient of the stereo audio signal, wherein the first panning coefficient is determined on the basis of the first and second partial similarity functions, and on the basis of the similarity function of the first and second audio signals, and the second panning coefficient is determined on the basis of the first panning coefficient, wherein the first panning coefficient and the second panning coefficient are adapted so as to position a phantom sound source in a listening region between the left-hand playback device and the right-hand playback device; determining a direct signal and a first surround signal and a second surround signal from the first audio signal and the second audio signal, and from the first panning coefficient and the second panning coefficient; wherein a fourth panning coefficient and a 
fifth panning coefficient are determined from the first surround signal and from the second surround signal; wherein the fourth panning coefficient and the fifth panning coefficient are configured so as to determine components of the direct signal contained in the first surround signal and the second surround signal; wherein the first surround signal and the second surround signal are in each case reduced by the components of the direct signal contained in them; wherein the reduced first surround signal is assigned to a left-hand rear sound channel; and wherein the reduced second surround signal is assigned to a right-hand rear sound channel; determining a multitude of repanning coefficients from the panning coefficients, wherein each repanning coefficient of the multitude of repanning coefficients is assigned to a sound channel of a multitude of sound channels of the multichannel audio signal, wherein the repanning coefficients for the multitude of sound channels are adapted so as to position a phantom sound source in a listening region between a multitude of playback devices for the multichannel audio signal, wherein from the multitude of playback devices for the multichannel audio signal in each case one playback device is assigned to a sound channel of the multitude of sound channels; processing of the direct signal with each of the repanning coefficients of the multitude of repanning coefficients in each case, wherein the direct signal processed with a first repanning coefficient, and the additively added first surround signal, is assigned to a first sound channel, the direct signal processed with a second repanning coefficient, is assigned to a second sound channel, and the direct signal processed with a third repanning coefficient, and the additively added second surround signal, is assigned to a third sound channel, and converting each sound channel in each case into a playback signal of the multichannel audio signal, wherein each playback signal is 
provided in each case for a playback device.
2. The method in accordance with claim 1, further comprising the step of: outputting of each sound channel to one playback device in each case.
3. The method in accordance with claim 2, further comprising the step of: outputting of each sound channel to a recording unit.
4. The method in accordance with claim 3, wherein before the output of each sound channel to one playback device in each case, a power level adjustment of the sound channel takes place such that the power level of the surround signal is reduced relative to that of the processed direct signal.
5. The method in accordance with claim 4, wherein the sum of the first audio signal and the second audio signal is low-pass filtered, and represents a low-frequency sound channel, wherein the low-frequency sound channel is one of the sound channels of the multichannel audio signal.
6. The method in accordance with claim 1, wherein the fourth panning coefficient and the fifth panning coefficient are configured so as to determine components of a sixth surround signal and a seventh surround signal contained in the first surround signal and the second surround signal; wherein the sixth surround signal and the seventh surround signal are in each case assigned to another sound channel of the multichannel audio signal.
7. The method in accordance with claim 1, wherein the fourth panning coefficient and the fifth panning coefficient are iteratively determined from the reduced first surround signal and the reduced second surround signal; wherein before each iteration of the determination of the fourth panning coefficient and the fifth panning coefficient the first surround signal and the second surround signal are reduced by the components of the direct signal contained in them.
8. The method in accordance with claim 7, wherein the first panning coefficient and the second panning coefficient of the first audio signal and the second audio signal are calculated for a multitude of frequencies of the stereo audio signal.
9. The method in accordance with claim 8, wherein the first panning coefficient and the second panning coefficient of the first audio signal and the second audio signal are each recurrently redetermined after a defined period of time.
10. A method for generating a multichannel audio signal from a stereo audio signal, the method comprising the steps: transforming a first audio signal and a second audio signal of the stereo audio signal from the time domain into the frequency domain; determining a first power level of the first audio signal and a second power level of the second audio signal; determining the autocorrelation of the first audio signal and the second audio signal, and determining the cross correlation of the first audio signal and the second audio signal; determining a similarity function from the ratio of the expectation value of the cross correlation and the expectation value of the sum of the autocorrelations of the first and second audio signals; determining a first partial similarity function as the ratio of the cross correlation of the first and second audio signals and the autocorrelation of the first audio signal; determining a second partial similarity function as the ratio of the cross correlation of the first and second audio signals and the autocorrelation of the second audio signal; determining a first panning coefficient and a second panning coefficient of the stereo audio signal, wherein the first panning coefficient is determined on the basis of the first and second partial similarity functions, and on the basis of the similarity function of the first and second audio signals, and the second panning coefficient is determined on the basis of the first panning coefficient, wherein the first panning coefficient and the second panning coefficient are adapted so as to position a phantom sound source in a listening region between the left-hand playback device and the right-hand playback device; determining a direct signal and a first surround signal and a second surround signal from the first audio signal and the second audio signal, and from the first panning coefficient and the second panning coefficient; wherein a fourth panning coefficient and a fifth panning 
coefficient are determined from the first surround signal and from the second surround signal; wherein the fourth panning coefficient and the fifth panning coefficient are configured so as to determine components of the direct signal contained in the first surround signal and the second surround signal; wherein the first surround signal and the second surround signal are in each case reduced by the components of the direct signal contained in them; wherein the reduced first surround signal is assigned to a left-hand rear sound channel; and wherein the reduced second surround signal is assigned to a right-hand rear sound channel; determining a multitude of repanning coefficients from the panning coefficients, wherein each repanning coefficient of the multitude of repanning coefficients is assigned to a sound channel of a multitude of sound channels of the multichannel audio signal, wherein the repanning coefficients for the multitude of sound channels are adapted so as to position a phantom sound source in a listening region between a multitude of playback devices for the multichannel audio signal, wherein from the multitude of playback devices for the multichannel audio signal in each case one playback device is assigned to a sound channel of the multitude of sound channels; processing of the direct signal with each of the repanning coefficients of the multitude of repanning coefficients in each case, wherein the direct signal processed with a first repanning coefficient, and the additively added first surround signal, is assigned to a first sound channel, the direct signal processed with a second repanning coefficient, is assigned to a second sound channel, and the direct signal processed with a third repanning coefficient, and the additively added second surround signal, is assigned to a third sound channel, and converting each sound channel in each case into a playback signal of the multichannel audio signal, wherein each playback signal is provided in each case 
for a playback device; and transforming playback signals from the frequency domain into the time domain.
11. An audio signal processing device for generating a multichannel audio signal from a stereo audio signal, with a first audio signal and a second audio signal, the audio signal processing device comprising: a panning coefficient calculation unit, which is adapted so as to: determine a first panning coefficient and a second panning coefficient from the stereo audio signal, determine a first power level of the first audio signal and a second power level of the second audio signal, determine the autocorrelation of the first audio signal and the second audio signal, and determine the cross correlation of the first audio signal and the second audio signal, determine a similarity function from the ratio of the expectation value of the cross correlation and the expectation value of the sum of the autocorrelations of the first and second audio signals, determine a first partial similarity function as the ratio of the cross correlation of the first and second audio signals and the autocorrelation of the first audio signal, determine a second partial similarity function as the ratio of the cross correlation of the first and second audio signals and the autocorrelation of the second audio signal; wherein the first panning coefficient is determined on the basis of the first and second partial similarity functions, and on the basis of the similarity function of the first and second audio signals, and the second panning coefficient is determined on the basis of the first panning coefficient; and wherein the first panning coefficient and the second panning coefficient are configured so as to position a phantom sound source in a listening region between a left-hand playback device and a right-hand playback device; a signal extraction unit, which is adapted so as to determine a direct signal and a first surround signal and a second surround signal from the first audio signal and the second audio signal, and from the first panning coefficient and the second panning coefficient, 
wherein a fourth panning coefficient and a fifth panning coefficient are determined from the first surround signal and from the second surround signal; wherein the fourth panning coefficient and the fifth panning coefficient are configured so as to determine components of the direct signal contained in the first surround signal and the second surround signal; wherein the first surround signal and the second surround signal are in each case reduced by the components of the direct signal contained in them; wherein the reduced first surround signal is assigned to a left-hand rear sound channel; and wherein the reduced second surround signal is assigned to a right-hand rear sound channel; a repanning coefficient calculation unit, which is adapted so as to determine a multitude of repanning coefficients from the panning coefficients, wherein each repanning coefficient of the multitude of repanning coefficients is adapted so as to be assigned to a sound channel of a multitude of sound channels of the multichannel audio signal; wherein the repanning coefficients for the multitude of sound channels are configured so as to position a phantom sound source in a listening region between a multitude of playback devices for the multichannel audio signal; wherein from the multitude of playback devices for the multichannel audio signal one playback device can be assigned in each case to a sound channel of the multitude of sound channels; a processing unit, which is adapted so as to process the direct signal with each of the repanning coefficients of the multitude of repanning coefficients in each case, wherein the direct signal processed with a first repanning coefficient, and the additively added first surround signal can be assigned to a first sound channel, the direct signal processed with a second repanning coefficient can be assigned to a second sound channel, and the direct signal processed with a third repanning coefficient, and the additively added second surround signal, 
can be assigned to a third sound channel.
12. The audio signal processing device in accordance with claim 11, further comprising: a stereo audio signal transformation unit, which is adapted so as to transform the stereo audio signal from the time domain into the frequency domain; a scaling unit, which is adapted so as to carry out a power level adjustment of each sound channel of the multitude of sound channels; and a multichannel audio signal transformation unit, which is adapted so as to transform the sound channels, power level adjusted by the scaling unit, from the frequency domain into the time domain.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The various embodiments will hereinafter be described in conjunction with the drawing figures, wherein like numerals denote like elements.
DETAILED DESCRIPTION
(5) The following detailed description is merely exemplary in nature and is not intended to limit the disclosed embodiments or the application and uses thereof. Furthermore, there is no intention to be bound by any theory presented in the preceding background or the following detailed description.
(6) The representations in the figures are schematic and not to scale.
(7) Where the same reference numbers are used in the following description of the figures, these indicate the same or similar elements.
(9) The original audio signal 90 is processed with a first panning coefficient 310 or a second panning coefficient 320 and assigned to a first loudspeaker 810 or a second loudspeaker 820, respectively. Processing the original audio signal with the panning coefficients 310 and 320 yields a first audio signal 110 and a second audio signal 120, which together correspond to the stereo audio signal, and one or more phantom sound sources 811, 812 can be positioned in the listening region 890 as a function of their frequencies.
(10) Because the first loudspeaker 810 plays back a signal that deviates from that of the second loudspeaker 820, a listener 1 receives the impression that a phantom sound source 811, 812 is positioned in the listening region 890. The positioning of the phantom sound sources 811, 812 can be controlled via the dimensioning of the panning coefficients 310, 320.
(11) As described above, the determination of individual panning coefficients for various frequencies of the stereo audio signal 100 contributes to the ability to position the signals of various frequencies individually in the listening region 890. This is illustrated by the phantom sound sources 811, 812 that are shown spaced apart from each other.
(13) Each of the loudspeakers receives its own playback signal. By arranging the loudspeakers around the listener, the spatial listening experience can be improved relative to that of a stereo audio configuration.
(15) Firstly the two discrete-time input signals x_L(n) (left-hand stereo audio signal) and x_R(n) (right-hand stereo audio signal), which are sampled with a sampling frequency f_A, are transformed from the time domain into the frequency domain. The signals x_L(n) and x_R(n) are thus transformed to X_L(n,k) = |X_L(n,k)| · e^(jφ_L(n,k)) and X_R(n,k) = |X_R(n,k)| · e^(jφ_R(n,k)), where n corresponds to the time index and k to the frequency index.
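The transformation step above can be sketched with a short-time Fourier transform. This is a minimal illustration, not the patented implementation: the helper name `stft`, the frame length, the hop size and the Hann window are all assumptions introduced here.

```python
import numpy as np

def stft(x, frame_len=1024, hop=512):
    """Transform a time-domain signal x(n) into STFT bins X(n, k).

    Returns a (num_frames, frame_len // 2 + 1) complex array; the first
    index is the time (frame) index n, the second the frequency index k,
    matching the X_L(n,k), X_R(n,k) notation of the text.
    """
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append(np.fft.rfft(x[start:start + frame_len] * window))
    return np.array(frames)

# Example: a 440 Hz tone sampled at f_A = 44100 Hz, louder on the left.
f_A = 44100
n = np.arange(f_A)  # one second of samples
x_L = np.sin(2 * np.pi * 440 * n / f_A)
x_R = 0.5 * np.sin(2 * np.pi * 440 * n / f_A)
X_L = stft(x_L)
X_R = stft(x_R)
```

With a 1024-point frame the bin spacing is f_A/1024 ≈ 43 Hz, so the 440 Hz tone concentrates around bin k = 10.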
(16) The power levels of the input signals are then calculated. Here the operator E{·} denotes the expectation value of its argument. Since an audio signal can be described as a random signal, and can moreover take a positive or negative sign, its first-order expectation value, which corresponds to the average value, is equal to zero. In contrast, the second-order expectation value of an audio signal corresponds to its autocorrelation function and thus to the average power level. The power level of the left-hand input signal P_XL is thus determined as:
P_XL(n,k) = E{|X_L(n,k)|²} = r_LL(n,k) (1)
and corresponds to the autocorrelation r_LL(n,k) of the left-hand input signal. The power level of the right-hand input signal P_XR is determined as:
P_XR(n,k) = E{|X_R(n,k)|²} = r_RR(n,k) (2)
and corresponds to the autocorrelation r_RR(n,k) of the right-hand input signal. The cross correlation r_LR(n,k) = r_RL(n,k) between the left-hand and right-hand input signals is given by:
r_LR(n,k) = E{X_L(n,k) · X_R(n,k)} (3)
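Equations (1) to (3) can be approximated numerically by averaging over the frame index n. One assumption in this sketch: for complex spectra the conjugate product is used (Equation (3) writes the plain product), so that r_LL and r_RR come out as real power levels.

```python
import numpy as np

def correlations(X_L, X_R):
    """Estimate r_LL, r_RR and r_LR per frequency bin k.

    The expectation E{.} is approximated by the average over the time
    (frame) index n. The conjugate product is used for the complex
    spectra, so r_LL and r_RR equal the average power levels P_XL, P_XR.
    """
    r_LL = np.mean(np.abs(X_L) ** 2, axis=0)
    r_RR = np.mean(np.abs(X_R) ** 2, axis=0)
    r_LR = np.mean(X_L * np.conj(X_R), axis=0).real
    return r_LL, r_RR, r_LR
```

For identical inputs the cross correlation equals both autocorrelations, which is the sanity check used below.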
(17) Signal Model:
(18) In order to extract various signal components from a stereo audio signal, a signal model must first be defined. This model reads:
X_L(n,k) = a_L(n,k) · S(n,k) + N_L(n,k)
X_R(n,k) = a_R(n,k) · S(n,k) + N_R(n,k) (4)
(19) The left-hand input signal X_L(n,k) consists of the direct signal S(n,k) and the left-hand ambience signal N_L(n,k), where the direct signal is multiplied by the left-hand panning coefficient a_L(n,k). The right-hand input signal X_R(n,k) likewise consists of the direct signal S(n,k) and the right-hand ambience signal N_R(n,k), where the direct signal is multiplied by the right-hand panning coefficient a_R(n,k). A direct signal is to be understood as any signal emitted directly by a sound source. The ambience signals correspond to the reverberations and reflections of the direct signal in the room and are thus essential for the impression of space across the stereo panorama. Since the direct signal S(n,k), appropriately weighted, is reproduced via both loudspeakers of a stereo configuration, a phantom sound source 811, 812 x_p arises, which can be positioned anywhere between the two loudspeakers in the listening region 890, as can be seen from
a_L²(n,k) + a_R²(n,k) = 1 (5)
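The signal model of Equations (4) and (5) can be illustrated with synthetic data. Everything concrete here is an assumption for the illustration: the signal length, the ambience level 0.1, and the choice a_L = 0.8.

```python
import numpy as np

rng = np.random.default_rng(0)

# Direct signal S and uncorrelated ambience signals N_L, N_R for a
# single bin k, observed over many frames n (hypothetical test data).
S = rng.standard_normal(100000)
N_L = 0.1 * rng.standard_normal(100000)
N_R = 0.1 * rng.standard_normal(100000)

# Panning coefficients obeying a_L^2 + a_R^2 = 1, Equation (5).
a_L = 0.8
a_R = np.sqrt(1.0 - a_L ** 2)

# Stereo signal model of Equation (4).
X_L = a_L * S + N_L
X_R = a_R * S + N_R
```

With unit direct-signal power the sample cross correlation of X_L and X_R should come out near a_L · a_R, matching the model's Equation (12).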
(20) With the aid of the signal extraction process the two ambience signals, together with the direct signal, are now extracted from a stereo audio signal.
(21) Signal Extraction:
(22) In order to extract from the input signals X_L(n,k) and X_R(n,k) the contained direct signal S(n,k), together with the two ambience signals N_L(n,k) and N_R(n,k), the input signals need only be multiplied by the extraction matrix A^+(n,k). For the estimated signals ^S(n,k), ^N_L(n,k) and ^N_R(n,k) the following therefore applies:
(23)
[^S(n,k); ^N_L(n,k); ^N_R(n,k)] = A^+(n,k) · [X_L(n,k); X_R(n,k)] (6)
(24) The extraction matrix A^+(n,k) is composed of the two panning coefficients a_L(n,k) and a_R(n,k), together with a variable parameter r.
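A sketch of the extraction step of Equation (6). Caveat: the patent's A^+(n,k) additionally contains a variable parameter r whose exact form is not reproduced in this excerpt; this illustration substitutes the plain Moore-Penrose pseudo-inverse of the mixing matrix implied by the signal model (4), and is therefore an assumption, not the patented matrix.

```python
import numpy as np

def extraction_matrix(a_L, a_R):
    """Pseudo-inverse of the 2x3 mixing matrix implied by the signal
    model (4): [X_L, X_R]^T = A . [S, N_L, N_R]^T with
    A = [[a_L, 1, 0], [a_R, 0, 1]].

    Note: the patent's A+ includes a variable parameter r that is not
    reproduced here; this sketch uses the plain Moore-Penrose inverse.
    """
    A = np.array([[a_L, 1.0, 0.0],
                  [a_R, 0.0, 1.0]])
    return np.linalg.pinv(A)

def extract(X_L, X_R, a_L, a_R):
    """Estimate ^S, ^N_L, ^N_R per Equation (6)."""
    est = extraction_matrix(a_L, a_R) @ np.vstack([X_L, X_R])
    return est[0], est[1], est[2]
```

Since A has full row rank, remixing the estimates with the model exactly reproduces the inputs, which serves as a consistency check.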
(25) Determination of the Panning Coefficients a_L(n,k) and a_R(n,k):
(26) For the extraction of the signals ^S(n,k), ^N_L(n,k) and ^N_R(n,k), the panning coefficients a_L(n,k) and a_R(n,k) must be determined in order to be able to calculate the pseudo-inverse matrix A^+(n,k) completely. Since the panning coefficients are contained in the input signals, it must be possible to determine them by a suitable analysis of the input signals. Consideration of the power levels of the input signals is convenient for this purpose. In other words, the determination of a first power level of the first audio signal and a second power level of the second audio signal is required. Here the autocorrelations r_LL(n,k) and r_RR(n,k) describe the power levels of the input signals X_L(n,k) and X_R(n,k). The cross correlation r_LR(n,k) describes the similarity of the input signals. A particular measure of similarity is given by the normalised cross correlation. For the determination of the panning coefficients a_L(n,k) and a_R(n,k), the cross correlation of the input signals, normalised by the sum of the two autocorrelations, is used as a so-called similarity function Ψ(n,k):
(27)
Ψ(n,k) = 2 · r_LR(n,k) / (r_LL(n,k) + r_RR(n,k)) (8)
(28) In other words, the autocorrelations of the first audio signal and the second audio signal are determined, together with the cross correlation of the first audio signal and the second audio signal. A similarity function is then determined from the ratio of the cross correlation and the sum of the autocorrelations of the first and second audio signals, as given by Equation (8).
(29) Alternatively, this last-mentioned step can also be carried out by expressing the autocorrelation of the first or second audio signal as the expectation value of the product of that audio signal with itself. The cross correlation can be expressed as the expectation value of the product of the first audio signal and the second audio signal. Thus the similarity function is determined from the ratio of the expectation value of the product of the first audio signal with the second audio signal and the sum of the expectation value of the product of the first audio signal with itself and the expectation value of the product of the second audio signal with itself.
(30) The factor 2 serves to normalise the function: it ensures that Ψ(n,k) has the value one if both input signals possess the same power level. The factor can also be neglected for the following calculations; the equations then change accordingly. The use of the cross correlation normalised by the product of the two autocorrelations has not proved expedient, since that expression entails the risk of a division by zero if one of the two input signals, and therefore its power level, is zero. In this case a correct determination of the panning coefficients would not be possible.
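The similarity function of Equation (8) can be written as a one-line helper. The small `eps` guard is an addition of this sketch (not in the text) to cover the case where both channels are silent.

```python
import numpy as np

def similarity(r_LL, r_RR, r_LR, eps=1e-12):
    """Similarity function Psi of Equation (8): the cross correlation
    normalised by the sum of the autocorrelations. The factor 2 makes
    Psi = 1 for two identical input signals. eps is an added guard
    (not in the text) against division by zero for silent input."""
    return 2.0 * r_LR / (r_LL + r_RR + eps)
```

For identical signals (r_LL = r_RR = r_LR) the function returns one; if one channel is silent the cross correlation vanishes and Ψ is zero.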
(31) A closer consideration of Equation (8) reveals that it cannot be viewed exclusively as a ratio of power levels. If, in accordance with Equations (1) to (3), the correlations are replaced by the corresponding expectation values, one obtains:
(32)
Ψ(n,k) = 2 · E{X_L(n,k) · X_R(n,k)} / (E{X_L(n,k) · X_L(n,k)} + E{X_R(n,k) · X_R(n,k)}) (9)
(33) Equations (8) and (9) deliver the same result; however, Equation (9) now offers the opportunity to show what the normalised cross correlation actually delivers. If one considers Equation (9) together with the signal model from Equation (4), the input signals can be reformulated in accordance with the signal model. Since the panning coefficients are contained in the signal model, an expression as a function of the panning coefficients should be found. Accordingly, one obtains:
(34)
r_LL(n,k) = a_L²(n,k) · P_S(n,k) + P_N(n,k) (10)
r_RR(n,k) = a_R²(n,k) · P_S(n,k) + P_N(n,k) (11)
r_LR(n,k) = a_L(n,k) · a_R(n,k) · P_S(n,k) (12)
(35) Here P_S(n,k) is the power level of the direct signal, and P_N(n,k) is the power level of the ambience signals. It is assumed that P_NL(n,k), the power level of the left-hand ambience signal, and P_NR(n,k), the power level of the right-hand ambience signal, are equal; they are therefore jointly expressed by P_N(n,k), the power level of the ambience signals.
(36) If one inserts Equations (10) to (12) into Equation (9), one obtains:
(37)
Ψ(n,k) = 2 · a_L(n,k) · a_R(n,k) · P_S(n,k) / (P_S(n,k) + 2 · P_N(n,k)) (13)
(38) Under the assumption that no directional ambience components are present, P_N(n,k) is equal to zero, and Equation (13) simplifies to:
Ψ(n,k) = 2 · a_L(n,k) · a_R(n,k) (14)
(39) Replacement of the right-hand panning coefficient via the relationship (5) leads to:
Ψ(n,k) = 2 · a_L(n,k) · √(1 − a_L²(n,k)) (15)
(40) This shows that the normalised cross correlation in accordance with Equation (8) delivers an expression that depends solely on the panning coefficient. If, by further consideration of the power levels, additional expressions containing the panning coefficients can be found, the panning coefficients can finally be determined.
(41) Two partial similarity functions are introduced as further useful functions, since they likewise consist of a ratio of power levels and are merely slight variations of Equation (8). The partial similarity functions consist of the cross correlation of the two input signals normalised by the respective autocorrelation. The left-hand partial similarity function is given by:
(42)
Ψ_L(n,k) = r_LR(n,k) / r_LL(n,k) (16)
and the right-hand function is given by:
(43)
Ψ_R(n,k) = r_LR(n,k) / r_RR(n,k) (17)
(44) In other words a first partial similarity function is determined as the ratio of the cross correlation of the first and second audio signals and the autocorrelation of the first audio signal, see Equation (16). A second partial similarity function is determined as the ratio of the cross correlation of the first and second audio signals and the autocorrelation of the second audio signal, see Equation (17).
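The two partial similarity functions of Equations (16) and (17) translate directly into code. As above, the `eps` guard is an assumption of this sketch to protect against silent channels.

```python
import numpy as np

def partial_similarities(r_LL, r_RR, r_LR, eps=1e-12):
    """Partial similarity functions of Equations (16) and (17):
    Psi_L = r_LR / r_LL and Psi_R = r_LR / r_RR. eps is a small
    guard added here (not in the text) against silent channels."""
    psi_L = r_LR / (r_LL + eps)
    psi_R = r_LR / (r_RR + eps)
    return psi_L, psi_R
```

For the ambience-free model (P_N = 0) with a_L = 0.8 and a_R = 0.6, the correlations are r_LL = 0.64, r_RR = 0.36 and r_LR = 0.48, and the functions reduce to the coefficient ratios a_R/a_L and a_L/a_R of Equations (18) and (19).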
(45) In order to find an expression as a function of the panning coefficients, the autocorrelations are in turn replaced by the corresponding expectation values, and the signal model in accordance with Equation (4) is used for the input signals. With the assumption that P_N(n,k) = 0, this leads to:
(46)
Ψ_L(n,k) = a_R(n,k) / a_L(n,k) (18)
Ψ_R(n,k) = a_L(n,k) / a_R(n,k) (19)
(47) The partial similarity functions accordingly consist of the ratio of the panning coefficients. Equations (18) and (19) can usefully be combined to find a common expression that can be used hereinafter. By forming the sum or the difference of the two partial similarity functions, the resulting expression acquires a common denominator. The formation of the sum has not proved expedient. Forming the difference ΔΨ of the two partial similarity functions and using relationship (5) leads to:
(48)
ΔΨ(n,k) = Ψ_L(n,k) − Ψ_R(n,k) = (1 − 2 · a_L²(n,k)) / (a_L(n,k) · √(1 − a_L²(n,k))) (20)
(49) A comparison of Equation (20) with Equation (15) reveals that the term a_L(n,k) · √(1 − a_L²(n,k)) in the difference of the two partial similarity functions ΔΨ(n,k) can be replaced by means of the similarity function Ψ(n,k). Thus Equation (20) simplifies to:
(50)
ΔΨ(n,k) = 2 · (1 − 2 · a_L²(n,k)) / Ψ(n,k) (21)
(51) This expression can be solved for a_L(n,k), where the negative solution is omitted. The conditional equation for the left-hand panning coefficient can accordingly be expressed as:
(52)
a_L(n,k) = √(1/2 − (ΔΨ(n,k) · Ψ(n,k)) / 4) (22)
(53) and can be determined completely from consideration of the power levels of the two input signals X_L(n,k) and X_R(n,k). In accordance with Equation (5), the right-hand panning coefficient is given by:
a_R(n,k) = √(1 − a_L²(n,k)) (23)
(54) The first panning coefficient is therefore determined on the basis of a difference of the first and second partial similarity functions, together with the similarity function of the first and second audio signals, and the second panning coefficient is determined on the basis of the first panning coefficient.
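The chain from the correlations to the panning coefficients, Equations (8), (16), (17), (22) and (23), can be sketched as one function. Two caveats: the `eps` guard and the clipping of a_L² into [0, 1] are defensive additions of this sketch, and the uncorrected difference ΔΨ is used, i.e. the case-dependent correction of Equation (29) is not applied here.

```python
import numpy as np

def panning_coefficients(r_LL, r_RR, r_LR, eps=1e-12):
    """Determine a_L and a_R from the correlations, following
    Equations (8), (16), (17), (22) and (23). The case-dependent
    smoothing of Equation (29) is deliberately omitted here."""
    psi = 2.0 * r_LR / (r_LL + r_RR + eps)               # Equation (8)
    d_psi = r_LR / (r_LL + eps) - r_LR / (r_RR + eps)    # Psi_L - Psi_R
    a_L_sq = 0.5 - 0.25 * d_psi * psi                    # Equation (22)
    a_L = np.sqrt(np.clip(a_L_sq, 0.0, 1.0))
    a_R = np.sqrt(1.0 - a_L ** 2)                        # Equation (23)
    return a_L, a_R
```

For the ambience-free example r_LL = 0.64, r_RR = 0.36, r_LR = 0.48, one gets Ψ = 0.96 and ΔΨ ≈ −0.583, so a_L² = 0.5 + 0.14 = 0.64, recovering a_L = 0.8 and a_R = 0.6.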
(55) It should be noted that in one embodiment the second panning coefficient can be determined first, and the first panning coefficient can then be determined on this basis. In Formula (22) the difference is then to be replaced by a sum of the named operands. Likewise, it should be noted that the difference between the first and second partial similarity functions, necessary for the determination of the panning coefficients, is case-dependent, as presented further below with reference to Equation (29).
(56) In one embodiment the first panning coefficient is determined on the basis of a product of a difference between the first and second partial similarity functions with a similarity function of the first and second audio signals.
(57) The method, as described above and hereinafter, enables the generation of multichannel sound on the basis of a stereo audio signal; in other words, the stereo signal is upgraded by the method (a so-called up-mix takes place). Through a suitable analysis of the underlying input signals the conversion is resource-saving, i.e. the method is less computationally intensive and requires less computing time on a processor.
(58) In the calculation of the difference ΔΨ(n,k) between the two partial similarity functions Ψ_L(n,k) and Ψ_R(n,k), undesirable ripple can emerge in the signal profile. The panning coefficients a_L(n,k) and a_R(n,k), then not determined correctly, would lead, in the playback of the extracted signals over a multichannel configuration, to fluctuations in the directions of the phantom sound sources. The difference ΔΨ(n,k) between the two partial similarity functions must therefore be reconsidered. If the partial similarity functions Ψ_L(n,k) and Ψ_R(n,k) are written in terms of correlations, then:
(59)
ΔΨ(n,k) = Ψ_L(n,k) − Ψ_R(n,k) = r_LR(n,k) · (r_RR(n,k) − r_LL(n,k)) / (r_LL(n,k) · r_RR(n,k)) (24)
(60) If the amplitude of one of the two channels of the input signal approaches zero, ΔΨ(n,k) assumes a value very much less than one (r_RR(n,k) approaches zero) or very much greater than one (r_LL(n,k) approaches zero). This behaviour is ultimately responsible for the emergence of the ripple. With the aid of Equations (10) and (11) the product of the two autocorrelations can be written as:
(61) r.sub.LL(n,k).Math.r.sub.RR(n,k)=a.sub.L.sup.2(n,k).Math.a.sub.R.sup.2(n,k).Math.P.sub.S.sup.2(n,k)+P.sub.S(n,k).Math.P.sub.N(n,k)+P.sub.N.sup.2(n,k) (25)
(62) With the assumption that P.sub.N(n,k)=0, Equation (25) becomes:
r.sub.LL(n,k).Math.r.sub.RR(n,k)=a.sub.L.sup.2(n,k).Math.a.sub.R.sup.2(n,k).Math.P.sub.S.sup.2(n,k). (26)
(63) In a comparison with Equation (12) one discerns that the cross correlation is equal to the root of the product of the two autocorrelations (for P.sub.N(n,k)=0). It is therefore true that:
r.sub.LR(n,k)=√{square root over (r.sub.LL(n,k).Math.r.sub.RR(n,k))}. (27)
(64) The relationship that has been found can be inserted into Equation (24). Accordingly, it is then true that:
(65) ΔΨ(n,k)=√{square root over (r.sub.RR(n,k)/r.sub.LL(n,k))}−√{square root over (r.sub.LL(n,k)/r.sub.RR(n,k))} (28)
(66) From this, with the above described behaviour of the difference ΔΨ(n,k) in accordance with Equation (24) for the cases r.sub.LL(n,k)≥r.sub.RR(n,k) and r.sub.LL(n,k)<r.sub.RR(n,k), it is possible to find the following corrected expression for ΔΨ(n,k):
(67)
(68) In this manner the profile of the panning coefficients can be smoothed and the appearance of undesirable ripple can be prevented.
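The smoothing described above can be sketched in code. The bounded, case-dependent form below (mapping the difference into the range [−1, 1]) is an assumption consistent with the limiting cases described above, not necessarily the exact form of Equation (29); the function name is illustrative:

```python
import math

def delta_psi_corrected(r_ll, r_rr):
    """Corrected difference of the partial similarity functions (sketch).

    Assumption: using r_LR = sqrt(r_LL * r_RR) (valid for P_N = 0), the raw
    difference sqrt(r_RR/r_LL) - sqrt(r_LL/r_RR) is replaced by a
    case-dependent expression bounded in [-1, 1], which removes the ripple
    when the power of one channel approaches zero.
    """
    if r_ll == 0.0 and r_rr == 0.0:
        return 0.0                      # silent frame: no panning information
    if r_ll >= r_rr:
        return math.sqrt(r_rr / r_ll) - 1.0   # in [-1, 0]
    return 1.0 - math.sqrt(r_ll / r_rr)       # in (0, 1]
```

For equal channel powers the corrected difference is zero; for a fully one-sided signal it saturates at −1 or +1 instead of diverging.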
(69) With the aid of the panning coefficients as determined, the extraction matrix A.sup.+(n,k) can be fully calculated, and the signal components ^S(n,k), ^N.sub.L(n,k) and ^N.sub.R(n,k) can be extracted in accordance with Equation (6).
(70) Repanning and Distribution of the Extracted Signals:
(71) After the direct signal ^S(n,k) together with the ambience signals ^N.sub.L(n,k) and ^N.sub.R(n,k) have been determined, these must be prepared appropriately for playback via a multichannel loudspeaker system, and distributed to the individual loudspeakers. The direct signal ^S(n,k) is reproduced via all three front loudspeakers (left-hand front, right-hand front, centre front) and in each case is weighted with one of the so-called pairs of panning coefficients. These weighting factors g.sub.1(n,k), g.sub.2(n,k) and g.sub.3(n,k) are panning coefficients with which a pair-wise panning of the direct signal is executed. In this manner the direction of the phantom sound source in the multichannel configuration, taking into account the additional central loudspeaker, is designed to correspond to the direction of the phantom sound source in the original stereo configuration. This direction can be determined with the aid of the panning coefficients a.sub.L(n,k) and a.sub.R(n,k) as determined. The coefficients g.sub.1(n,k) and g.sub.2(n,k) ensure a panning between the left-hand front loudspeaker and the central loudspeaker; the coefficients g.sub.2(n,k) and g.sub.3(n,k) ensure a panning between the central loudspeaker and the right-hand front loudspeaker. If a directional signal is fully panned into the centre, g.sub.1(n,k) and g.sub.3(n,k) are equal to zero, while g.sub.2(n,k) is equal to one. If a directional signal is fully panned to the left (or to the right), g.sub.2(n,k) and g.sub.3(n,k) (or g.sub.1(n,k) and g.sub.2(n,k)) are equal to zero, while g.sub.1(n,k) (or g.sub.3(n,k)) is equal to one. If a direct signal is panned between the loudspeakers, one panning coefficient is always equal to zero, since the pair-wise panning is only executed between the central loudspeaker and the left-hand or right-hand front loudspeaker. Accordingly, it is true that:
g.sub.1(n,k).Math.g.sub.2(n,k).Math.g.sub.3(n,k)=0. (30)
(72) Furthermore, the rule from Equation (5) is maintained and thus:
g.sub.1.sup.2(n,k)+g.sub.2.sup.2(n,k)+g.sub.3.sup.2(n,k)=1 (31)
(73) In order to be able to determine the weighting factors g.sub.1(n,k), g.sub.2(n,k) and g.sub.3(n,k), the angle of the phantom sound source φ(n, k) is firstly determined from the panning coefficients a.sub.L(n,k) and a.sub.R(n,k) as:
(74)
(75) Here φ.sub.0(n,k) is the angle between the respective loudspeaker of a stereo speaker configuration and the centre line originating from the listening position. The angle φ.sub.0(n,k) can be set to 30°. However, any other logical angle is also possible, since it is ultimately eliminated from the calculation and has no influence on the pairs of panning coefficients. The angle φ(n,k) is the angle between the phantom sound source and the centre line originating from the listening position. Equation (32) can also be calculated with the use of sinusoidal terms. However, if the listening position, i.e. the head of the listener, is not exactly aligned, the relationship that is specified is more accurate.
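This angle determination can be sketched as follows. The tangent panning law tan(φ)/tan(φ.sub.0)=(a.sub.L−a.sub.R)/(a.sub.L+a.sub.R) is assumed here as the form of Equation (32); the function name is illustrative:

```python
import math

def phantom_angle(a_l, a_r, phi_0_deg=30.0):
    """Angle of the phantom sound source from the panning coefficients.

    Assumed tangent panning law for Equation (32):
        tan(phi) / tan(phi_0) = (a_L - a_R) / (a_L + a_R)
    phi_0_deg is the stereo base angle (30 degrees by default).
    Returns the angle phi in degrees.
    """
    phi_0 = math.radians(phi_0_deg)
    ratio = (a_l - a_r) / (a_l + a_r)
    return math.degrees(math.atan(ratio * math.tan(phi_0)))
```

A centrally panned source (a.sub.L = a.sub.R) yields 0°, and a fully left- or right-panned source yields ±φ.sub.0.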
(76) Hereinafter, the two cases φ.sub.0(n,k)≥φ(n,k)≥0° and −φ.sub.0(n,k)≤φ(n,k)<0° must be differentiated. In the first case the direct signal is panned between the left-hand front loudspeaker and the central loudspeaker (g.sub.3(n,k)=0); in the second case it is panned between the right-hand front loudspeaker and the central loudspeaker (g.sub.2(n,k)=0). Moreover, a new angle φ.sub.0,neu(n,k) is introduced with:
(77)
(78) Each angle in the range φ.sub.0(n,k)≥φ(n,k)≥0° is mapped onto the range φ.sub.0,neu(n,k)≥φ(n,k)≥−φ.sub.0,neu(n,k). A phantom sound source that is fully panned to the left accordingly no longer possesses the angle φ.sub.0(n,k), but φ.sub.0,neu(n,k). A phantom sound source positioned centrally between the left-hand front and central loudspeakers no longer possesses the angle ½ φ.sub.0(n,k), but 0°. A phantom sound source positioned in the centre no longer possesses the angle 0°, but −φ.sub.0,neu(n,k).
(79) Each angle in the range −φ.sub.0(n,k)≤φ(n,k)<0° is mapped onto the range φ.sub.0,neu(n,k)≥φ(n,k)≥−φ.sub.0,neu(n,k). A phantom sound source that is positioned fully to the right accordingly no longer possesses the angle −φ.sub.0(n,k), but −φ.sub.0,neu(n,k). A phantom sound source positioned centrally between the right-hand front and central loudspeakers no longer possesses the angle −½ φ.sub.0(n,k), but 0°. From these considerations it is possible to determine the new angle of the phantom sound source φ.sub.neu(n,k). Thus:
(80)
or, formulated with the left-hand panning coefficient:
(81)
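The remapping of the angle ranges described above can be sketched as follows. The linear map and the choice φ.sub.0,neu = φ.sub.0/2 (the half base angle of each loudspeaker pair) are assumptions consistent with the mapped endpoints described above; the function name is illustrative:

```python
def remap_angles(phi_deg, phi_0_deg=30.0):
    """Map the stereo angle range onto the pair-wise panning range (sketch).

    Assumptions: phi_0_new = phi_0 / 2, and each half-range
    [0, phi_0] resp. [-phi_0, 0) is shifted linearly onto
    [-phi_0_new, phi_0_new]. Returns (phi_0_new, phi_new) in degrees.
    """
    phi_0_new = 0.5 * phi_0_deg
    if phi_deg >= 0.0:                 # panning between left front and centre
        phi_new = phi_deg - phi_0_new
    else:                              # panning between centre and right front
        phi_new = phi_deg + phi_0_new
    return phi_0_new, phi_new
```

As described above, a fully left-panned source (φ = φ.sub.0) maps to φ.sub.0,neu, the midpoint (φ = ½ φ.sub.0) maps to 0°, and the centre (φ = 0°) maps to −φ.sub.0,neu.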
(82) With the aid of the angles φ.sub.0,neu(n,k) and φ.sub.neu(n,k), two auxiliary coefficients a′.sub.L(n,k) and a′.sub.R(n,k), not yet assigned to particular loudspeakers, can now be calculated, in the first instance independently of the case differentiations; these correspond either to the coefficients g.sub.1(n,k) and g.sub.2(n,k), or to g.sub.2(n,k) and g.sub.3(n,k). Equation (32) applies for both the new angles φ.sub.0,neu(n,k) and φ.sub.neu(n,k), together with the panning coefficients a′.sub.L(n,k) and a′.sub.R(n,k). Thus:
(83)
(84) By means of cross-multiplication one obtains:
(85)
(86) Rearrangement and removal of brackets delivers:
a.sub.L′.Math.(tan(φ.sub.neu)−tan(φ.sub.0,neu))=−a.sub.R′.Math.(tan(φ.sub.0,neu)+tan(φ.sub.neu)) (38)
(87) Rearrangement and replacement of the right-hand panning coefficient via the relationship (5) leads to:
(88)
(89) With the abbreviated form:
(90)
(91) squaring of both sides, multiplying out and rearranging gives:
(92)
(93) Rearrangement and resolution for a′.sub.L(n,k) delivers finally:
(94)
(95) where the negative solution is omitted. In accordance with Equation (5) the coefficient a′.sub.R(n,k) is given by:
(96)
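The solution of Equation (38) under the constraint a′.sub.L² + a′.sub.R² = 1 of Equation (5), taking the positive root as stated above, can be sketched as follows; the function name is illustrative:

```python
import math

def pairwise_coefficients(phi_new_deg, phi_0_new_deg):
    """Pair of panning coefficients a'_L, a'_R from the remapped angles (sketch).

    From Equation (38),
        a'_L * (tan(phi_new) - tan(phi_0_new))
            = -a'_R * (tan(phi_0_new) + tan(phi_new)),
    together with a'_L**2 + a'_R**2 = 1 (Equation (5)); the positive
    solution is taken, and the fully panned case is guarded explicitly.
    """
    t_new = math.tan(math.radians(phi_new_deg))
    t0 = math.tan(math.radians(phi_0_new_deg))
    if math.isclose(t_new, t0):     # angles identical: avoid division by zero
        return 1.0, 0.0
    beta = (t0 + t_new) / (t0 - t_new)   # ratio a'_L / a'_R
    norm = math.sqrt(1.0 + beta * beta)
    return beta / norm, 1.0 / norm
```

For φ.sub.neu = −φ.sub.0,neu the result is (0, 1), i.e. the source sits entirely on the second loudspeaker of the pair; for φ.sub.neu = 0° the coefficients are equal.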
(97) With the aid of the case differentiations and equations (42) and (43) it is now possible to formulate the pairs of panning coefficients as follows:
(98)
(99) If, in the case a.sub.L(n,k)≥√(0.5), the angles φ.sub.neu(n,k) and φ.sub.0,neu(n,k) are identical (phantom sound source fully panned to the left), g.sub.1(n,k) is set to one, in order to avoid a division by zero in Equation (40).
(100) With the calculated pairs of panning coefficients it is now possible to generate the estimated signals ^X.sub.FL(n,k), ^X.sub.FR(n,k) and ^X.sub.C(n,k) for the left-hand front loudspeaker, the right-hand front loudspeaker, and the central loudspeaker. As already stated, the direct signal ^S(n,k) is reproduced via all three loudspeakers and is weighted with the respective coefficients g.sub.1(n,k), g.sub.2(n,k) or g.sub.3(n,k). At the same time the ambience signals ^N.sub.L(n,k) and ^N.sub.R(n,k) are also provided to the left-hand and right-hand front loudspeakers so as to maintain the spatial impression in accordance with the original stereo signal. Also in accordance with the signal model the front channels consist of the panned direct signal and the ambience components. By means of the additional central loudspeaker an essentially more stable and higher quality sound impression is achieved compared with the stereo playback. The central loudspeaker contains just the direct signal panned with g.sub.2(n,k), in order to emphasise the phantom sound source from this direction. Moreover, ambience components from this direction are negligible. The three loudspeaker signals ^X.sub.FL(n,k), ^X.sub.FR(n,k) and ^X.sub.C(n,k) are accordingly given by:
(101)
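The composition of the three front loudspeaker signals in accordance with Equation (47), as described in the preceding paragraph, can be sketched per time-frequency bin as follows; the function name is illustrative:

```python
def front_signals(s, n_l, n_r, g1, g2, g3):
    """Compose the three front loudspeaker signals (sketch of Equation (47)).

    The direct signal s is panned pair-wise with g1, g2, g3; the ambience
    signals n_l and n_r go only to the left-hand and right-hand front
    channels; the centre channel carries only the panned direct signal.
    """
    x_fl = g1 * s + n_l    # left-hand front
    x_fr = g3 * s + n_r    # right-hand front
    x_c = g2 * s           # centre: direct signal only
    return x_fl, x_fr, x_c
```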
(102) Post-Scaling:
(103) For power level adjustment of the loudspeaker signals ^X.sub.FL(n,k), ^X.sub.FR(n,k) and ^X.sub.C(n,k) generated in accordance with Equation (47) a post-scaling process is executed, primarily to reduce the ambience components in the front channels and to adapt the waveform. By this means any dominance of the ambience components over the panned direct signals is to be prevented, as are any falsely arising phantom sound sources. The scaled signals ^X′.sub.FL(n,k), ^X′.sub.FR(n,k) and ^X′.sub.C(n,k) are given by:
(104)
(105) Here the power levels P.sub.^XFL(n,k), P.sub.^XFR(n,k) and P.sub.^XC(n,k) are the power levels of the loudspeaker signals estimated in accordance with the conditional Equation (47), and the power levels P.sub.XFL(n,k), P.sub.XFR(n,k) and P.sub.XC(n,k) are the actual power levels of the individual channels. The generated signals are thus to be scaled to the power levels of the actual signals. By means of appropriate consideration of the respective signals with the expression of the power levels via expectation values and the utilisation of signal models and conditional equations, expressions can elegantly be found for the scaling factors.
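The scaling of a single generated loudspeaker signal to the power level of the actual channel signal can be sketched as follows; the guard term eps is an added assumption to avoid division by zero, and the function name is illustrative:

```python
import math

def post_scale(x_hat, p_estimated, p_actual, eps=1e-12):
    """Post-scaling of a generated loudspeaker signal (sketch).

    Scales the generated signal x_hat to the power level of the actual
    channel: x' = x_hat * sqrt(P_actual / P_estimated). The eps guard
    against division by zero is an assumption added here.
    """
    return x_hat * math.sqrt(p_actual / (p_estimated + eps))
```

The same factor is applied to every bin of the respective channel, so that dominance of the ambience components over the panned direct signal is suppressed.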
(106) In order to be able to determine the factors for the scaling process, a reformulation of Equation (6) has proved to be appropriate. Thus:
(107)
(108) The elements of the extraction matrix A.sup.+(n,k) are now expressed accordingly in terms of the coefficients w.sub.1(n,k) to w.sub.6(n,k). The estimated power level of the left-hand front channel P.sub.^XFL(n,k) is given by the second order expectation value of the left-hand front loudspeaker signal ^X.sub.FL(n,k), which is replaced by the conditional Equation (47):
(109)
(110) The three second order expectation values should in the interests of clarity be specified individually. For this purpose the conditional equations for ^S(n,k), ^ N.sub.L(n,k) and ^N.sub.R(n,k) are inserted in accordance with Equation (49), and the input signals are replaced by the signal model in accordance with Equation (4). Thus:
(111)
(112) The expression for E{^N.sup.2.sub.L(n,k)} can be derived directly from Equation (51), since the conditional equations for ^S(n,k) and ^N.sub.L(n,k) differ only in terms of the coefficients (w.sub.3(n,k) and w.sub.4(n,k) instead of w.sub.1(n,k) and w.sub.2(n,k)); in addition, the factor g.sub.1.sup.2(n,k) is not applicable. Thus:
(113)
(114) The expression E{2g.sub.1(n,k) ^S(n,k) ^N.sub.L(n,k)} is given by:
(115)
(116) The estimated power level of the right-hand front channel P.sub.^XFR(n,k) is given by the second order expectation value of the right-hand front loudspeaker signal ^X.sub.FR(n,k), which is replaced by the conditional equation (47):
(117)
(118) The three second order expectation values should in the interests of clarity be specified individually and can be derived directly from Equations (51) to (53), since it is just the coefficients in the conditional equations that differ. Thus:
(119)
(120) The estimated power level of the central channel P.sub.^XC(n,k) is given by the second order expectation value of the central loudspeaker signal ^X.sub.C(n,k), which is replaced by the conditional equation (47). The expression E{(g.sub.2(n,k) ^S(n,k)).sup.2} can be derived directly from Equation (51) or (55), since it is just the panning coefficients that differ. Thus:
(121)
(122) The actual power levels of the loudspeaker signals P.sub.XFL(n,k), P.sub.XFR(n,k) and P.sub.XC(n,k) are given by the second order expectation values of the actual loudspeaker signals following Equation (47). The actual signals are given by:
(123)
(124) The power levels P.sub.XFL(n,k), P.sub.XFR(n,k) and P.sub.XC(n,k) are accordingly determined as:
(125)
(126) In order to be able to fully determine the equations (50) to (62), the power level of the direct signal P.sub.S(n,k), together with the power level of the ambience signals P.sub.N(n,k), must also be determined. With P.sub.NL(n,k) as the power level of the left-hand ambience signal, and P.sub.NR(n,k) as the power level of the right-hand ambience signal, under the assumption that the power levels of the two ambience signals are equal it is true that:
P.sub.N(n,k)=P.sub.N.sub.L(n,k)=P.sub.N.sub.R(n,k) (63)
(127) The power levels P.sub.S(n,k) and P.sub.N(n,k) can be determined in turn from an elegant consideration of the power levels of the input signals. Thus it is possible to derive these from the eigenvalues λ.sub.1(n,k) and λ.sub.2(n,k) of the covariance matrix R(n,k). The covariance matrix R(n,k) consists of the two autocorrelations (r.sub.LL(n,k) and r.sub.RR(n,k)), together with the cross correlation (r.sub.LR(n,k)=r.sub.RL(n,k)) of the input signals, and is composed as follows:
(128)
(129) For the eigenvalues it is true that:
(130)
(131) If one elegantly replaces the correlations by expectation values and expresses the input signals in terms of the signal model, as is the case in Equations (10) to (12), it can be shown that the eigenvalue λ.sub.2(n,k) corresponds directly to the power level of the ambience signals P.sub.N(n,k). The eigenvalue λ.sub.1(n,k) corresponds to the sum of the power level of the direct signal P.sub.S(n,k) and the power level of the ambience signals P.sub.N(n,k). The power level of the direct signal is thus given by:
(132)
and the power level of the ambience signals is given by:
(133)
(134) The power level of the ambience signals and the power level of the direct signal thus ensue from an elegant consideration of the power levels of the input signals.
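The determination of P.sub.S(n,k) and P.sub.N(n,k) described above can be sketched using the closed-form eigenvalues of the symmetric 2×2 covariance matrix R(n,k); the function name is illustrative:

```python
import math

def direct_and_ambience_power(r_ll, r_rr, r_lr):
    """Power of the direct and ambience signals from the covariance matrix.

    The eigenvalues of R = [[r_LL, r_LR], [r_RL, r_RR]] (with r_RL = r_LR)
    are lambda_{1,2} = (r_LL + r_RR)/2 +- sqrt(((r_LL - r_RR)/2)**2 + r_LR**2).
    As described above, P_S = lambda_1 - lambda_2 and P_N = lambda_2.
    Returns the tuple (P_S, P_N).
    """
    mean = 0.5 * (r_ll + r_rr)
    dev = math.sqrt((0.5 * (r_ll - r_rr)) ** 2 + r_lr ** 2)
    lam1, lam2 = mean + dev, mean - dev
    return lam1 - lam2, lam2
```

For example, with a.sub.L² = a.sub.R² = 0.5, P.sub.S = 2 and P.sub.N = 1 the correlations are r.sub.LL = r.sub.RR = 2 and r.sub.LR = 1, and the function recovers (2, 1).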
(135) Subsequent Processing of the Ambience Signals:
(136) In the playback of non-correlated ambience signals via the surround loudspeakers, the lateral phantom sound sources that are generated can result in an enhancement of the stereo panorama beyond the two outer front loudspeakers. While this contributes significantly to the improvement of the spatial listening experience, in order to reduce this effect, together with the dominance of the ambience signals, the surround channels can in general be lowered by 2 dB.
(137) The extracted ambience signals still contain a certain component of the direct signal. The result can likewise be an enhancement of the stereo panorama. In order to reduce the direct component S.sub.A(n,k) that is still contained in the ambience signals, the ambience signals are supplied to the decoder as input signals. Accordingly the following modified signal model is obtained:
{circumflex over (N)}.sub.L(n,k)=a.sub.L.sub.A(n,k).Math.S.sub.A(n,k)+N.sub.L.sub.A(n,k)
{circumflex over (N)}.sub.R(n,k)=a.sub.R.sub.A(n,k).Math.S.sub.A(n,k)+N.sub.R.sub.A(n,k)
(138) Each ambience signal consists accordingly in turn of an ambience component and a direct component provided with the respective panning coefficient. The extraction of the direct component ^S.sub.A(n,k) contained in the ambience signals follows from Equations (6) and (7). With the variable parameter r.sub.A the following is obtained:
(139)
The panning coefficients, which must be recalculated in accordance with Equation (22), are given by:
(140)
(141) Here it is necessary, needless to say, to refer all power level considerations to the ambience signals. With r.sub.LLA(n,k) and r.sub.RRA(n,k) as autocorrelations of the left-hand and right-hand ambience signals, together with the cross correlation r.sub.LRA(n,k) between the two ambience signals, one obtains:
(142)
(143) With the determined direct component ^S.sub.A(n,k), contained in the ambience signals, the reduced ambience signals ^N.sub.LA(n,k) and ^N.sub.RA(n,k) are now determined as:
{circumflex over (N)}.sub.L.sub.A(n,k)={circumflex over (N)}.sub.L(n,k)−a.sub.L.sub.A(n,k).Math.{circumflex over (S)}.sub.A(n,k)
and
{circumflex over (N)}.sub.R.sub.A(n,k)={circumflex over (N)}.sub.R(n,k)−a.sub.R.sub.A(n,k).Math.{circumflex over (S)}.sub.A(n,k)
(144) This process can be iteratively applied as often as required in order to obtain the desired effect in each case. With each iteration step the direct component contained in the ambience signals is reduced.
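The iteration described above can be sketched as follows. Here extract_step is a hypothetical callable standing in for the extraction of the direct component ^S.sub.A(n,k) and the recalculated panning coefficients from the current ambience signals; the function name is likewise illustrative:

```python
def reduce_direct_component(n_l, n_r, extract_step, iterations=1):
    """Iteratively reduce the direct component in the ambience signals (sketch).

    extract_step(n_l, n_r) is assumed to return (s_a, a_la, a_ra): the
    direct component contained in the ambience signals and the recalculated
    left-hand and right-hand panning coefficients. Each iteration subtracts
    the panned direct component from both ambience signals.
    """
    for _ in range(iterations):
        s_a, a_la, a_ra = extract_step(n_l, n_r)
        n_l = n_l - a_la * s_a
        n_r = n_r - a_ra * s_a
    return n_l, n_r
```

With each iteration step the remaining direct component in the ambience signals shrinks, as stated above.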
(145) Also in accordance with Equations (6), (7) and (68) an extraction of the signal components N.sub.NL(n,k) and N.sub.NR(n,k) is possible; these can be used as signals for two additional loudspeakers (for example, the left-hand rear surround and right-hand rear surround of a 7.1 multichannel configuration).
(146) Generation of the Subwoofer Channel:
(147) An explicit signal for the subwoofer of a multichannel configuration is necessary if the system itself is to be prevented from generating such a signal from all the available channels. Some systems generate, for example, no subwoofer signal at all, or none in a particular configuration (for example, with a subwoofer cable connected), and are thus dependent upon an explicit signal.
(148) The subwoofer signal X.sub.LFE(n,k) is obtained from low-pass filtering of the two input signals X.sub.L(n,k) and X.sub.R(n,k). For this purpose these are firstly added and adjusted in power level, and are then multiplied by the low-pass transfer function H.sub.TP(k). Thus:
X.sub.LFE(n,k)=√{square root over (0.5)}.Math.(X.sub.L(n,k)+X.sub.R(n,k)).Math.H.sub.TP(k) (76)
(149) The low-pass filter serves to limit the bandwidth of the subwoofer channel. For this purpose the passband edge frequency f.sub.D is, for example, selected as f.sub.D=120 Hz, and the stopband edge frequency f.sub.S as f.sub.S=160 Hz.
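Equation (76), evaluated per frequency bin, can be sketched as follows; the function name and the list-based representation of one frame are illustrative:

```python
import math

def subwoofer_channel(x_l, x_r, h_tp):
    """Subwoofer signal per Equation (76), evaluated per frequency bin.

    x_l and x_r are the frequency-domain input-channel bins of one frame,
    h_tp the low-pass transfer function sampled on the same bins
    (e.g. passband edge 120 Hz, stopband edge 160 Hz). The channels are
    summed, adjusted in power level by sqrt(0.5), and low-pass weighted.
    """
    return [math.sqrt(0.5) * (l + r) * h for l, r, h in zip(x_l, x_r, h_tp)]
```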
(150) Power Level Normalisation:
(151) In order to ensure that the sum of the power levels of all the extracted channels is equal to the sum of the power levels of the input signals, an optional power level normalisation can be executed. Here all the loudspeaker signals are scaled in the same manner with the factor q(n,k). This is obtained from the power levels of the input and output signals as:
(152)
(153) The normalised loudspeaker signals are accordingly given by:
(154)
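The common factor q(n,k) described above can be sketched as follows, under the assumption that q is the square root of the ratio of summed input power to summed output power; the function name is illustrative:

```python
import math

def normalisation_factor(input_powers, output_powers):
    """Common scaling factor q for all loudspeaker signals (sketch).

    Assumption: q is chosen so that the summed power of the scaled output
    channels equals the summed power of the input signals:
        q = sqrt(sum(P_in) / sum(P_out)).
    Every loudspeaker signal is then multiplied by the same q.
    """
    return math.sqrt(sum(input_powers) / sum(output_powers))
```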
(155) After all the signals have been generated these are transformed from the frequency domain into the time domain.
(156) While at least one exemplary embodiment has been presented in the foregoing detailed description, it should be appreciated that a vast number of variations exist. It should also be appreciated that the exemplary embodiment or exemplary embodiments are only examples, and are not intended to limit the scope, applicability, or configuration of the embodiment in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing an exemplary embodiment, it being understood that various changes may be made in the function and arrangement of elements described in an exemplary embodiment without departing from the scope of the embodiment as set forth in the appended claims and their legal equivalents.
LIST OF REFERENCE SIGNS USED
(157)
1 Listener, listener position
10 Stereo audio signal transformation unit: time domain into frequency domain (STFT)
20 Panning coefficient calculation unit
25 Panning coefficient calculation unit
30 Repanning coefficient calculation unit
40 Signal extraction unit
42 Processing unit
45 Signal extraction unit
50 Scaling unit
51, 52 Damping units
60 Switching unit for determining the iteration steps for the calculation of the fourth and fifth surround signals
70 Low-pass filter
80 Multichannel audio signal transformation unit: frequency domain into time domain (ISTFT)
90 Output audio signal for a stereo audio signal
100 Stereo audio signal
110 First audio signal
120 Second audio signal
200 Stereo audio signal transformed into the frequency domain
210 Transformed first audio signal
220 Transformed second audio signal
260 Low-frequency audio signal
310 First panning coefficient of the stereo audio signal
320 Second panning coefficient of the stereo audio signal
360 Fourth panning coefficient
370 Fifth panning coefficient
410 First repanning coefficient
415 Second repanning coefficient
420 Third repanning coefficient
510 First surround signal
515 Direct signal
520 Second surround signal
560 Sixth surround signal
565 Direct signal components contained in the first and second surround signals
570 Seventh surround signal
580 First sound channel
585 Second sound channel
590 Third sound channel
600 Playback signals
610 First playback signal
615 Second playback signal
620 Third playback signal
630 Fourth playback signal
640 Fifth playback signal
700 Sound channel
710 First sound channel
715 Second sound channel
720 Third sound channel
730 Fourth sound channel
740 Fifth sound channel
760 Low-frequency sound channel
811, 812 Phantom sound source
800, 810, 815, 820, 830, 840, 860 Playback devices, loudspeakers
890 Listening region