Multichannel sound reproduction method and device

09674629 · 2017-06-06

Abstract

Disclosed are methods for selecting auditory signal components for reproduction by means of one or more supplementary sound reproducing transducers, such as loudspeakers, placed between a pair of primary sound reproducing transducers, such as left and right loudspeakers in a stereophonic loudspeaker setup or adjacent loudspeakers in a surround sound loudspeaker setup. Also disclosed are devices for carrying out the above methods and systems of such devices.

Claims

1. A method for selecting auditory signal components for reproduction in a loudspeaker setup having one or more physical supplementary sound reproducing transducers, such as loudspeakers, placed between a pair of primary sound reproducing transducers, such as left and right loudspeakers in a stereophonic loudspeaker setup or adjacent loudspeakers in a surround sound loudspeaker setup, the method comprising the steps of: (i) specifying an azimuth angle range within which one of said physical supplementary sound reproducing transducers is located or is to be located; (ii) based on said azimuth angle range, determining left and right interaural level difference limits and left and right interaural time difference limits from the binaural impulse responses for a source at each extreme azimuthal angle, respectively; (iii) providing a pair of input signals for said pair of primary sound reproducing transducers; (iv) pre-processing each of said input signals with binaural impulse responses for the pair of primary sound reproducing transducers, thereby providing a pair of pre-processed input signals; (v) determining interaural level difference and interaural time difference as a function of frequency between said pre-processed signals; (vi) providing those signal components of said input signals that have interaural level differences and interaural time differences in the interval between said left and right interaural level difference limits, and left and right interaural time difference limits, respectively, to the corresponding physical supplementary sound reproducing transducer; and (vii) reproducing said signal components in said physical supplementary sound reproducing transducers.

2. A method according to claim 1, wherein those signal components that have interaural level and time differences outside said limits are provided to said left and right primary sound reproducing transducers, respectively.

3. A method according to claim 1, wherein those signal components that have interaural differences outside said limits are provided as input signals to means for carrying out the method according to claim 1.

4. A method according to claim 1, wherein said binaural impulse responses comprise head-related transfer functions.

5. A method according to claim 1, further comprising determining the coherence between said pair of input signals, and wherein said signal components are weighted by the coherence before being provided to said one or more supplementary sound reproducing transducers.

6. A method according to claim 1, wherein the frontal direction relative to a listener, and hence the respective processing by said pre-processing means, is chosen by the listener.

7. A method according to claim 1, wherein the frontal direction relative to a listener, and hence the respective processing by said pre-processing means, is controlled by means of head-tracking means attached to a listener.

8. A device for selecting auditory signal components for reproduction in a loudspeaker setup having one or more physical supplementary sound reproducing transducers, such as loudspeakers, placed between a pair of primary sound reproducing transducers, such as left and right loudspeakers in a stereophonic loudspeaker setup or adjacent loudspeakers in a surround sound loudspeaker setup, the device comprising: (i) specification means for specifying an azimuth angle range within which one of said physical supplementary sound reproducing transducers is located or is to be located, (ii) determining means that based on said azimuth angle range determine left and right interaural level difference limits and left and right interaural time difference limits, respectively from the binaural impulse responses for a source at each extreme azimuthal angle; (iii) left and right input terminals providing a pair of input signals for said pair of primary sound reproducing transducers; (iv) pre-processing means for pre-processing each of said input signals provided on said left and right input terminals with binaural impulse responses for the pair of primary sound reproducing transducers, thereby providing a pair of pre-processed input signals; (v) determining means for determining interaural level difference and interaural time difference as a function of frequency between said pre-processed input signals; and (vi) signal processing means for providing those signal components of said input signals that have interaural level differences and interaural time differences in the interval between said left and right interaural level difference limits, and left and right interaural time difference limits, respectively, to a supplementary output terminal for provision to the corresponding physical supplementary sound reproducing transducer.

9. A device according to claim 8, wherein those signal components that have interaural level and time differences outside said limits are provided to said left and right primary sound reproducing transducers, respectively.

10. A device according to claim 8, wherein those signal components that have interaural differences outside said limits are provided as input.

11. A device according to claim 8, wherein said binaural impulse responses comprise head-related transfer functions.

12. A device according to claim 8 further comprising coherence determining means determining the coherence between said pair of input signals, and wherein said signal components of the input signals are weighted by the inter-channel coherence between the input signals before being provided to said one or more physical supplementary sound reproducing transducers via said supplementary output terminal.

13. A device according to claim 8, wherein the frontal direction relative to a listener, and hence the respective processing by said pre-processing means, is chosen by the listener.

14. A device according to claim 8, wherein the frontal direction relative to a listener, and hence the respective processing by said pre-processing means, is controlled by means of head-tracking means attached to a listener or other means for determining the orientation of the listener relative to the set-up of sound reproducing transducers.

15. A system for selecting auditory signal components for reproduction by means of one or more physical supplementary sound reproducing transducers, such as loudspeakers, placed between a pair of primary sound reproducing transducers, such as left and right loudspeakers in a stereophonic loudspeaker setup or adjacent loudspeakers in a surround sound loudspeaker setup, the system comprising at least two of the devices according to claim 9, wherein a first of said devices is provided with first left and right input signals, and wherein the first device provides output signals on a left output terminal, a right output terminal and a supplementary output terminal, the output signal on the supplementary output terminal being provided to a physical supplementary sound reproducing transducer, and the output signals on the left and right output terminals, respectively, being provided to respective inputs of a subsequent device according to claim 8, whereby output signals are provided to respective ones of a number of physical supplementary sound reproducing transducers.

16. The system of claim 15, wherein the physical supplementary sound reproducing transducers are physical loudspeakers, and wherein the pair of primary sound reproducing transducers are left and right loudspeakers in a stereophonic loudspeaker setup or adjacent loudspeakers in a surround sound loudspeaker setup.

17. The method of claim 1, wherein the physical supplementary sound reproducing transducers are physical loudspeakers, and wherein the pair of primary sound reproducing transducers are left and right loudspeakers in a stereophonic loudspeaker setup, and wherein the step of reproducing said signal components in said physical supplementary sound reproducing transducers comprises reproducing said signal components in said physical loudspeakers.

18. The method of claim 1, wherein the physical supplementary sound reproducing transducers are physical loudspeakers, and wherein the pair of primary sound reproducing transducers are adjacent loudspeakers in a surround sound loudspeaker setup, and wherein the step of reproducing said signal components in said physical supplementary sound reproducing transducers comprises reproducing said signal components in said physical loudspeakers.

19. The device of claim 8, wherein the physical supplementary sound reproducing transducers are physical loudspeakers, and wherein the pair of primary sound reproducing transducers are left and right loudspeakers in a stereophonic loudspeaker setup.

20. The device of claim 8, wherein the physical supplementary sound reproducing transducers are physical loudspeakers, and wherein the pair of primary sound reproducing transducers are adjacent loudspeakers in a surround sound loudspeaker setup.

21. The method according to claim 1, wherein a listening direction is specified for auditory rotation of the loudspeaker setup, and wherein said left and right interaural level difference limits and left and right interaural time difference limits are determined also based on said listening direction.

22. The device according to claim 8, wherein said specification means are also for specifying a listening direction, and wherein said determining means determine left and right interaural level difference limits and left and right interaural time difference limits also based on said listening direction.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The invention will be better understood by reading the following detailed description of an embodiment of the invention in conjunction with the figures of the drawing, where:

(2) FIG. 1 illustrates an ideal arrangement of loudspeakers and listeners for reproduction of stereo signals;

(3) FIG. 2 shows (a) Interaural Level Difference (ILD), and (b) Interaural Time Difference (ITD) as functions of frequency for ideal stereo reproduction;

(4) FIG. 3 illustrates the case of off-axis listening position with respect to a stereo loudspeaker pair;

(5) FIG. 4 shows (a) Interaural Level Difference (ILD), and (b) Interaural Time Difference (ITD) as functions of frequency for off-axis listening;

(6) FIG. 5 shows listening area coordinate system and listener's head orientation;

(7) FIG. 6 illustrates an automotive listening scenario;

(8) FIG. 7 shows (a) Position 1 ILD as a function of frequency, (b) Position 1 ITD as a function of frequency, (c) Position 2 ILD as a function of frequency, and (d) Position 2 ITD as a function of frequency;

(9) FIG. 8 shows for in-car listening (a) Position 3 ILD as a function of frequency, (b) Position 3 ITD as a function of frequency, (c) Position 4 ILD as a function of frequency, and (d) Position 4 ITD as a function of frequency;

(10) FIG. 9 shows a block diagram of a stereo to multi-mono converter according to an embodiment of the invention, comprising three output channels for a left loudspeaker, a centre loudspeaker and a right loudspeaker, respectively;

(11) FIG. 10 shows an example of the location of centre loudspeaker and angle limits;

(12) FIG. 11 shows the location of the centre loudspeaker and angle limits after listening direction has been rotated;

(13) FIG. 12 shows (a) Magnitude of H.sub.IAmusic(f), (b) Phase delay of H.sub.IAmusic(f);

(14) FIG. 13 shows (a) ILD.sub.leftlimit, (b) ILD.sub.rightlimit, (c) ITD.sub.leftlimit, and (d) ITD.sub.rightlimit;

(15) FIG. 14 shows the coherence between left and right channels for a block of 512 samples of Bird on a Wire;

(16) FIG. 15 shows ILD thresholds for sources at −10 and +10 degrees and the magnitude of H.sub.IAmusic(f);

(17) FIG. 16 shows mapping of ILD.sub.music to a filter;

(18) FIG. 17 shows mapping of ILD.sub.music to a filter;

(19) FIG. 18 shows ITD thresholds for sources at −10 and +10 degrees and the phase delay of H.sub.IAmusic(f);

(20) FIG. 19 shows mapping of ITD.sub.music to a filter;

(21) FIG. 20 shows mapping of ITD.sub.music to a filter;

(22) FIG. 21 shows the magnitude of H.sub.center(f);

(23) FIG. 22 shows a portion of a 50 Hz sine wave with discontinuities due to time-varying filtering;

(24) FIG. 23 shows the octave smoothed magnitude of H.sub.center(f);

(25) FIG. 24 shows the magnitude of H.sub.center(f) for two adjacent analysis blocks;

(26) FIG. 25 shows the magnitude of H.sub.center(f) for two adjacent analysis blocks after slew rate limiting;

(27) FIG. 26 shows a portion of a 50 Hz sine wave with reduced discontinuities due to slew rate limiting;

(28) FIG. 27 shows the impulse response of H.sub.center(k);

(29) FIG. 28 shows (a) the output of linear convolution, and (b) output of circular convolution;

(30) FIG. 29 shows (a) the output of linear convolution, and (b) output of circular convolution with zero padding;

(31) FIG. 30 shows the location of the centre loudspeaker and angle limits where the listening direction is outside the angular range between the pair of primary loudspeakers.

DETAILED DESCRIPTION OF THE INVENTION

(32) In the following, a specific embodiment of a device according to the invention, also termed a stereo to multi-mono converter, is described. In connection with the detailed description of this embodiment, specific numerical values, for instance relating to respective angles in the loudspeaker set-up, are used in the text, in the figures and occasionally in various mathematical expressions, but it is understood that such specific values only constitute examples and that other parameter values are also covered by the invention. The basic functional principle of this converter will be described with reference to the schematic block diagram shown in FIG. 9. While the embodiment shown in FIG. 9 is scalable to n loudspeakers, and can be applied to auditory scenes encoded with more than two channels, the embodiment described in the following provides extraction of a signal for one supplementary loudspeaker in addition to the left and right loudspeakers (the primary loudspeakers) of the normal stereophonic reproduction system. As shown in FIG. 11, the one supplementary loudspeaker 56 is in the following detailed description generally placed rotated relative to the 0 degrees azimuth direction and in the median plane of the listener. The scenario shown in FIG. 10 constitutes one specific example, wherein v.sub.listen is equal to zero degrees azimuth.

(33) Referring again to FIG. 9, the stereo to multi-mono converter (and the corresponding method) according to this embodiment of the invention comprises five main functions, labelled A to E in the block diagram.

(34) In function block A, a calculation and analysis of binaural signals is performed in order to determine if a specific signal component in the incoming stereophonic signal L.sub.source[n] and R.sub.source[n] (reference numerals 14 and 15, respectively) is attributable to a given azimuth interval comprising the supplementary loudspeaker 56 used to reproduce the audio signal. Such an interval is illustrated in FIGS. 10 and 11 corresponding to the centre loudspeaker 56.

(35) The input signal 14, 15 is in this embodiment converted to a corresponding binaural signal in the HRTF stereo source block 24 and, based on this binaural signal, interaural level difference (ILD) and interaural time difference (ITD) for each signal component in the stereophonic input signal 14, 15 are determined in the blocks termed ILD music 29 and ITD music 30. In boxes 25 and 26, the left and right angle limits, respectively, are set (for instance as shown in FIGS. 10 and 11) based on corresponding input signals at terminals 54 (Left range), 53 (Listening direction) and 55 (Right range), respectively. The corresponding values of the HRTFs are determined in blocks 27 and 28. These HRTF limits are converted to corresponding limits for interaural level difference and interaural time difference in blocks 31, 32, 33 and 34. The output from functional block A (reference numeral 19) is the ILD and ITD 29, 30 for each signal component of the stereophonic signal 14, 15 and the right and left ILD and ITD limits 31, 32, 33, 34. These output signals from functional block A are provided to the mapping function in functional block C (reference numeral 21), as described in the following.

(36) The input stereophonic signal 14, 15 is furthermore provided to a functional block B (reference numeral 20) that calculates the inter-channel coherence between the left 14 and right 15 signals of the input stereophonic signal 14, 15. The resulting coherence is provided to the mapping function in block C.

(37) The function block C (21) maps the interaural differences and coherence calculated in functions A (19) and B (20) into a filter D (22); these interaural differences and the inter-channel coherence are used to extract those components of the input signals l.sub.source[n] and r.sub.source[n] (14, 15) that will be reproduced by the centre loudspeaker. Thus, the basic concept of the extraction is that stereophonic signal components which with a high degree of probability will result in a phantom source being perceived at or in the vicinity of the position at which the supplementary loudspeaker 56 is located will be routed to the supplementary loudspeaker 56. What is meant by vicinity is determined by the angle limits defined in block A (19), and the likelihood of formation of a phantom source is determined by the left and right inter-channel coherence determined in block B (20).

(38) The basic functions of the embodiment of the invention shown in FIG. 9 are described in more detail below. The specific calculations and plots relate to an example wherein a signal is extracted for one additional loudspeaker placed at zero degrees azimuth between a left and right loudspeaker placed at −30 and +30 degrees azimuth, respectively, this set-up corresponding to a traditional stereophonic loudspeaker set-up as shown schematically in FIG. 10. The corresponding values of the Left range, Listening direction, and Right range input signals 54, 53, 55 are here chosen to be −10 degrees, 0 degrees and +10 degrees azimuth, corresponding to the situation shown in FIG. 10.

(39) Function A: Calculation and Analysis of the Binaural Signals

(40) The first step consists of calculating ear input signals l.sub.ear[n] and r.sub.ear[n] by convolving the input stereophonic signals l.sub.source[n] and r.sub.source[n] from the stereo signal source with free-field binaural impulse responses for sources at −30 degrees (h.sub.−30degL[n] and h.sub.−30degR[n]) and at +30 degrees (h.sub.+30degL[n] and h.sub.+30degR[n]). Time-domain convolution is typically formulated as a sum of the product of each sample of the first sequence with a time-reversed version of the second sequence, as shown in the following expression:

(41) l.sub.ear[n] = Σ.sub.k=−∞.sup.+∞ l.sub.source[k] h.sub.−30degL[n−k] + Σ.sub.k=−∞.sup.+∞ r.sub.source[k] h.sub.+30degL[n−k]
r.sub.ear[n] = Σ.sub.k=−∞.sup.+∞ r.sub.source[k] h.sub.+30degR[n−k] + Σ.sub.k=−∞.sup.+∞ l.sub.source[k] h.sub.−30degR[n−k]

(42) These signals correspond to the ear input signals in the case of ideal stereophony as described above.
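As an informal illustration only (not part of the patent disclosure), the two convolution sums above can be sketched in Python with NumPy; the impulse-response array names such as h_m30L are placeholders for h.sub.−30degL[n] and so on:

```python
import numpy as np

def ear_signals(l_source, r_source, h_m30L, h_m30R, h_p30L, h_p30R):
    """Sketch of the ear input signals l_ear[n] and r_ear[n].

    h_m30L/h_m30R: free-field binaural impulse responses for a source
    at -30 degrees; h_p30L/h_p30R: the same for a source at +30 degrees.
    """
    # Each ear receives the left source through one impulse response and
    # the right source through the other (crosstalk paths included).
    l_ear = np.convolve(l_source, h_m30L) + np.convolve(r_source, h_p30L)
    r_ear = np.convolve(r_source, h_p30R) + np.convolve(l_source, h_m30R)
    return l_ear, r_ear
```

With unit-impulse responses the function simply sums the two channels at each ear, which is a convenient sanity check.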

(43) The centre loudspeaker is intended to reproduce a portion of the auditory scene that is located between the Left angle limit, v.sub.Llimit, and the Right angle limit, v.sub.Rlimit, that are calculated from the angle variables Left range, Right range and Listening direction (also referred to as v.sub.Lrange, v.sub.Rrange and v.sub.Listen) as in the following equations:
v.sub.Llimit = v.sub.Lrange − v.sub.Listen
v.sub.Rlimit = v.sub.Rrange − v.sub.Listen

(44) In the present specific example, v.sub.Lrange and v.sub.Rrange are −10 and +10 degrees, respectively, and v.sub.Listen is 0 degrees.

(45) If the playback system contains multiple loudspeakers, then the angle variables Left range, Right range and Listening direction allow the orientation and width of the rendered auditory scene to be manipulated. FIG. 11 shows an example where Listening direction is not zero degrees azimuth, with the result being a rotation of the auditory scene to the left when compared to the scenario in FIG. 10. Changes to these variables could be made explicitly by a listener or could be the result of listener position tracking (for instance a head-tracker worn by a listener).

(46) Furthermore, in FIG. 30 there is shown a more general situation, in which the listening direction is outside the angular range comprising the supplementary loudspeaker 56. Although not described in detail, this situation is also covered by the present invention.

(47) The ILD and ITD limits in each case are calculated from the free-field binaural impulse responses for a source at v.sub.Llimit degrees, h.sub.vLlimitdegL[n] and h.sub.vLlimitdegR[n], and a source at v.sub.Rlimit degrees, h.sub.vRlimitdegL[n] and h.sub.vRlimitdegR[n].

(48) In the present embodiment, the remainder of the signal analysis in functions A through D operates on frequency domain representations of blocks of N samples of the signals described above. A rectangular window is used. In the examples described below N=512.

(49) The frequency domain representations of a block of the ear input signals, music signals and the binaural impulse responses (for a source in the free field at 0 degrees, as this processing is for the centre loudspeaker) are calculated using the DFT as formulated in the equations below:

(50) L.sub.ear[k] = Σ.sub.n=0.sup.N−1 l.sub.ear[n] e.sup.j(2π/N)kn
R.sub.ear[k] = Σ.sub.n=0.sup.N−1 r.sub.ear[n] e.sup.j(2π/N)kn
L.sub.source[k] = Σ.sub.n=0.sup.N−1 l.sub.source[n] e.sup.j(2π/N)kn
R.sub.source[k] = Σ.sub.n=0.sup.N−1 r.sub.source[n] e.sup.j(2π/N)kn
H.sub.LlimitdegL[k] = Σ.sub.n=0.sup.N−1 h.sub.LlimitdegL[n] e.sup.j(2π/N)kn
H.sub.LlimitdegR[k] = Σ.sub.n=0.sup.N−1 h.sub.LlimitdegR[n] e.sup.j(2π/N)kn
H.sub.RlimitdegL[k] = Σ.sub.n=0.sup.N−1 h.sub.RlimitdegL[n] e.sup.j(2π/N)kn
H.sub.RlimitdegR[k] = Σ.sub.n=0.sup.N−1 h.sub.RlimitdegR[n] e.sup.j(2π/N)kn

(51) Next, three interaural transfer functions are calculated as shown below:

(52) H.sub.IAleftlimit[k] = H.sub.LlimitdegL[k] / H.sub.LlimitdegR[k]
H.sub.IArightlimit[k] = H.sub.RlimitdegL[k] / H.sub.RlimitdegR[k]
H.sub.IAmusic[k] = L.sub.ear[k] / R.sub.ear[k]

(53) As mentioned above, ILD.sub.leftlimit, ILD.sub.rightlimit and ILD.sub.music are calculated from the magnitude of the appropriate transfer function. Similarly, ITD.sub.leftlimit, ITD.sub.rightlimit and ITD.sub.music are calculated from the phase of the appropriate transfer function.
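This per-block analysis can be sketched as follows; the dB convention for ILD and the phase-delay convention for ITD are illustrative assumptions, as are the function and variable names:

```python
import numpy as np

def interaural_differences(l_ear_block, r_ear_block, fs):
    """Return per-bin frequency, ILD (dB) and ITD (s) for one block."""
    N = len(l_ear_block)
    L = np.fft.rfft(l_ear_block)
    R = np.fft.rfft(r_ear_block)
    eps = 1e-12                          # guard against division by zero
    H_ia = (L + eps) / (R + eps)         # interaural transfer function
    ild_db = 20.0 * np.log10(np.abs(H_ia) + eps)   # from the magnitude
    f = np.fft.rfftfreq(N, d=1.0 / fs)
    phase = np.unwrap(np.angle(H_ia))
    itd_s = np.zeros_like(f)
    itd_s[1:] = -phase[1:] / (2.0 * np.pi * f[1:])  # phase delay, from the phase
    return f, ild_db, itd_s
```

For a right-ear signal that is simply the left-ear signal scaled by 0.5, every bin reports an ILD of about 6 dB and an ITD of zero, matching the intuition behind the analysis.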

(54) The centre frequencies, f, of each FFT bin, k, are calculated from the FFT size and sample rate. The music signal used for the examples below is samples n=2049:2560 of Bird on a Wire after the music begins. ILD.sub.music and ITD.sub.music are shown in FIG. 12.

(55) ILD.sub.leftlimit and ILD.sub.rightlimit are shown in FIG. 13 (left plot).

(56) These ILD and ITD functions are part of the input to the mapping step in Function Block C (reference numeral 21) in FIG. 9.

(57) Function B: Calculation of the Coherence Between the Signals

(58) The coherence between l.sub.source[n] and r.sub.source[n], which as mentioned above takes a value between 0 and 1, is calculated from the power spectral densities of the two signals and their cross-power spectral density.

(59) The power spectral densities of l.sub.source[n] and r.sub.source[n] can be calculated in the frequency domain as the product of the spectrum with its complex conjugate as shown below:
P.sub.LL[k] = L.sub.source[k] · L.sub.source[k]*
P.sub.RR[k] = R.sub.source[k] · R.sub.source[k]*

(60) The cross-power spectral density of l.sub.source[n] and r.sub.source[n] can be calculated in the frequency domain as a product of L.sub.source[k] and the complex conjugate of R.sub.source[k], as shown below:
P.sub.LR[k] = L.sub.source[k] · R.sub.source[k]*

(61) The coherence can be calculated in the frequency domain by means of the following equation:

(62) C.sub.LR[f] = |P.sub.LR|.sup.2 / (P.sub.LL · P.sub.RR)
C.sub.LR was calculated over 8 blocks in the examples shown here.

(63) C.sub.LR will be equal to 1 at all frequencies if l.sub.source[n]=r.sub.source[n]. If l.sub.source[n] and r.sub.source[n] are two independent random signals, then C.sub.LR will be close to 0 at all frequencies. The coherence between l.sub.source[n] and r.sub.source[n] for the block of music is shown in FIG. 14.
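A sketch of this estimate, averaging the spectral densities over several N-sample blocks as the text does; the averaging is essential, since over a single block the magnitude-squared coherence is identically 1. Names and the block layout are illustrative:

```python
import numpy as np

def coherence(l_source, r_source, N=512, n_blocks=8):
    """Magnitude-squared coherence C_LR[k], averaged over n_blocks blocks."""
    Pll = np.zeros(N // 2 + 1)
    Prr = np.zeros(N // 2 + 1)
    Plr = np.zeros(N // 2 + 1, dtype=complex)
    for b in range(n_blocks):
        seg = slice(b * N, (b + 1) * N)
        L = np.fft.rfft(l_source[seg])
        R = np.fft.rfft(r_source[seg])
        Pll += np.abs(L) ** 2            # auto power spectral densities
        Prr += np.abs(R) ** 2
        Plr += L * np.conj(R)            # cross power spectral density
    return np.abs(Plr) ** 2 / (Pll * Prr + 1e-20)
```

Identical channels give a coherence of 1 at every bin, while independent noise channels give values near 1/n_blocks, consistent with the behaviour described above.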

(64) Function C: Mapping Interaural Differences and Coherence to a Filter

(65) This function block maps the interaural differences and coherence calculated in the functions A and B into a filter that will be used to extract the components of l.sub.source[n] and r.sub.source[n] that will be reproduced by the centre loudspeaker. The basic idea is that the contributions of the ILD, ITD and interchannel coherence functions to the overall filter are determined with respect to some threshold that is determined according to the angular range intended to be covered by the loudspeaker. In the following, the centre loudspeaker is assigned the angular range of −10 to +10 degrees.

(66) Mapping ILD to the Filter Magnitude

(67) The ILD thresholds are determined from the free field interaural transfer function for sources at −10 and +10 degrees. Two different ways of calculating the contribution of ILD to the final filter are briefly described below.

(68) In the first mapping approach, any frequency bins with a magnitude outside of the limits, as can be seen in FIG. 15, are attenuated. Ideally the attenuation should be infinite. In practice, the attenuation is limited to A dB, in the present example 30 dB, to avoid artefacts from the filtering such as clicking. These artefacts will be commented further upon below. This type of mapping of ILD to the filter is shown in FIG. 16.

(69) An alternative method is simply to use the negative absolute value of the magnitude difference between H.sub.IAff[f] for a source at 0 degrees and H.sub.IAmusic[f] as the filter magnitude, as shown in FIG. 17. In this way, the larger the difference between H.sub.IAmusic[f] and H.sub.IAff[f], the more H.sub.IAmusic[f] is attenuated. There are no hard thresholds as in the method above, and therefore some components will bleed into adjacent loudspeakers.
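The first, hard-threshold mapping can be sketched as follows; the 30 dB attenuation matches the example in the text, while the array names and the dB convention are assumptions for illustration:

```python
import numpy as np

def ild_map(ild_music_db, ild_left_limit_db, ild_right_limit_db,
            atten_db=30.0):
    """Per-bin gain: 1 inside the ILD limits, -atten_db dB outside."""
    lo = np.minimum(ild_left_limit_db, ild_right_limit_db)
    hi = np.maximum(ild_left_limit_db, ild_right_limit_db)
    gain = np.ones_like(ild_music_db)
    outside = (ild_music_db < lo) | (ild_music_db > hi)
    gain[outside] = 10.0 ** (-atten_db / 20.0)   # finite, not infinite
    return gain
```

Limiting the attenuation to a finite value, rather than muting the bins outright, is exactly the trade-off against clicking artefacts that the text describes.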

(70) Mapping ITD to the Filter Magnitude

(71) As in the previous section, the ITD thresholds are determined from the free field interaural transfer function for sources at −10 and +10 degrees, respectively. Again, two methods for including the contribution of ITD to the final filter are described below.

(72) The phase difference between H.sub.IAff[f] for a source at 0 degrees and H.sub.IAmusic[f] is plotted with the ITD thresholds for the centre loudspeaker in FIG. 18.

(73) The result of the first hard threshold mapping approach is the filter magnitude shown in FIG. 19. All frequency bins where the ITD is outside of the threshold set by free field sources at −10 and +10 degrees, respectively, are in this example attenuated by 30 dB.

(74) Another approach is to calculate the attenuation at each frequency bin based on its percentage delay compared to free field sources at −30 and +30 degrees, respectively. For example, if the maximum delay at some frequency was 16 samples and the ITD for the block of music was 4 samples, its percentage of the total delay would be 25%. The attenuation then could be 25% of the total. That is, if the total attenuation allowed was 30 dB, then the relevant frequency bin would be attenuated by 7.5 dB.

(75) An example of the filter magnitude designed in this way is shown in FIG. 20.
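Read literally, the proportional mapping attenuates each bin by its delay fraction times the total budget; the exact scaling is a design choice, and this sketch (with assumed names, delays expressed in samples) only illustrates that reading:

```python
import numpy as np

def itd_map(itd_samples, max_delay_samples, total_atten_db=30.0):
    """Per-bin gain from the ITD's fraction of the maximum possible delay."""
    frac = np.clip(np.abs(itd_samples) / max_delay_samples, 0.0, 1.0)
    return 10.0 ** (-(total_atten_db * frac) / 20.0)   # proportional attenuation
```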

(76) Mapping Coherence to the Filter Magnitude

(77) As intensity and time panning function best for coherent signals, the operation of the stereo to multi-mono conversion should preferably take the coherence between l.sub.source[n] and r.sub.source[n] into account. When these signals are completely incoherent, no signal should be sent to the centre channel. If the signals are completely coherent and there is no ILD and ITD, then ideally the entire contents of l.sub.source[n] and r.sub.source[n] should be sent to the centre loudspeaker and nothing should be sent to the left and right loudspeakers.

(78) The coherence is used in this implementation as a scaling factor and is described in the next section.

(79) Function D: Filter Design

(80) The basic filter for the centre loudspeaker, H.sub.centre[f], is calculated as the product of the ILD filter, the ITD filter and the coherence, as formulated in the equation below. It is important to note that this is a linear phase filter: the imaginary part of each frequency bin is set to 0, as it is not desired to introduce phase shifts into the music.
H.sub.center[f] = ILDMAP.sub.centre[f] · ITDMAP.sub.centre[f] · C.sub.LR[f]

(81) The result is a filter with a magnitude like that shown in FIG. 21.

(82) H.sub.centre[f] is updated for every block, i.e. it is a time varying filter. This type of filter introduces distortion which can be audible if the discontinuities between blocks are too large. FIG. 22 shows an example of such a case where discontinuities can be observed in a portion of a 50 Hz sine wave around samples 400 and 900.

(83) Two means to reduce the distortion are applied in the present implementation.

(84) First, across-frequency smoothing is applied to H.sub.centre[f]. This reduces sharp changes in filter magnitude between adjacent frequency bins. The smoothing is implemented by replacing the magnitude of each frequency bin with the mean of the magnitudes over an octave to either side of it, resulting in the filter shown in FIG. 23. Note that the scale of the y-axis is changed compared with FIG. 21.
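A sketch of that smoothing, taking "an octave to either side" to mean the bin-index interval from k/2 to 2k (an assumption about the exact window):

```python
import numpy as np

def octave_smooth(mag):
    """Replace each bin magnitude with the mean over one octave each side."""
    n = len(mag)
    out = np.empty_like(mag)
    for k in range(n):
        lo = k // 2                   # one octave below bin k
        hi = min(2 * k + 1, n)        # one octave above bin k (inclusive)
        out[k] = mag[lo:hi].mean()
    return out
```

A flat magnitude passes through unchanged, while isolated peaks are spread out and lowered, which is the intended reduction of sharp bin-to-bin changes.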

(85) Slew rate limiting is also applied to the magnitude of each frequency bin from one block to the next. FIG. 24 shows H.sub.centre[f] for the present block and the previous block. Magnitude differences of approximately 15 dB can be seen around 1 kHz and 10 kHz.

(86) The magnitude of these differences will cause audible distortion that sounds like clicking. The slew rate limiting is implemented with a conditional logic statement, an example of which is given in the pseudo-code below.

(87) Algorithm 1 (Pseudo-Code for Limiting the Slew Rate of the Filter):

(88)
if new value > (old value + maximum positive change) then
    new value = old value + maximum positive change
else if new value < (old value − maximum negative change) then
    new value = old value − maximum negative change
end if

(89) Choosing the values of maximum positive and negative change is a trade-off between distortion and having a filter that reacts quickly enough to represent the most important time-varying nature of the relationship between l.sub.source[n] and r.sub.source[n]. The values were in this example determined empirically and 1.2 dB was found to be acceptable. FIG. 25 shows the change between H.sub.centre[f] for the present block and the previous block using this 1.2 dB slew rate limit.
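In vector form, the conditional logic above reduces to a clip of the per-bin change in dB; this is a sketch, with the symmetric 1.2 dB default following the empirically chosen value in the text:

```python
import numpy as np

def slew_limit_db(new_db, old_db, max_step_db=1.2):
    """Limit each bin's magnitude change (in dB) from one block to the next."""
    # Equivalent to the if/else-if pseudo-code, applied to all bins at once.
    return np.clip(new_db, old_db - max_step_db, old_db + max_step_db)
```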

(90) Consider again the regions around 1 kHz and 10 kHz. It is clear that only the differences up to the slew rate limit have been preserved. FIG. 26 shows the same portion of a 50 Hz sine wave where across-frequency smoothing and slew rate limiting have been applied to the time-varying filter. The discontinuities that were clearly visible in FIG. 22 are greatly reduced. That the gain of the filter has also changed at this frequency is clear from the changed level of the sine wave. As mentioned above, there is a trade-off between accurately representing the inter-channel relationships in the source material and avoiding artefacts from the time-varying filter.

(91) If fast convolution, which is equivalent to circular convolution, is to be used, the filters must be converted to their time-domain forms so that time-aliasing can be properly controlled (this is described more thoroughly below).

(92) The inverse discrete Fourier transform (IDFT) of H.sub.centre[k], given by the following equation and referred to as the Fourier synthesis equation, yields its impulse response.

(93) h_{center}[n] = \frac{1}{N} \sum_{k=0}^{N-1} H_{center}[k] \, e^{j(2\pi/N)kn}

(94) As H.sub.center[f] is linear phase, h.sub.center[n] is an acausal finite impulse response (FIR) filter, N samples long, meaning that part of its response precedes time zero. This type of filter can be made causal by applying a delay of N/2 samples, as shown in FIG. 27. Note that the filter is symmetrical about sample N/2+1. The tap values have been normalised for plotting purposes only.
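The step from the frequency response to a causal FIR can be sketched as follows in NumPy. The flat magnitude response is a placeholder standing in for the smoothed, slew-limited centre filter; it is not taken from the implementation.

```python
import numpy as np

N = 512
# Placeholder zero-phase (real, even) magnitude response; in the method
# this would be the smoothed, slew-limited H_center[k].
H = np.ones(N)

h_acausal = np.fft.ifft(H).real        # impulse response centred on sample 0
h_causal = np.roll(h_acausal, N // 2)  # N/2-sample delay makes the FIR causal
```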

(95) Function E: Calculate Signals for Each Loudspeaker

(96) Fast Convolution Using the Overlap-Save Method

(97) The time to convolve two sequences in the time domain is proportional to N.sup.2, where N is the length of the longest sequence, whereas the time to convolve two sequences in the frequency domain, that is, by multiplying their frequency responses, is proportional to N log N. This means that for sequences longer than approximately 64 samples, frequency domain convolution is computationally more efficient, hence the term fast convolution. There is an important difference in the output of the two methods: frequency domain convolution is circular. The heavy curve in FIG. 28 is the output sequence of the time domain convolution of the filter in FIG. 27, length N=512, with a 500 Hz sine wave, length M=512. Note the 256 sample pre-ringing that is a consequence of making the linear phase filter causal. In this case the output sequence is (N+M)−1=1023 samples long. The light curve in FIG. 28 is the output sequence of fast convolution of the same filter and sine wave and is only 512 samples long. The samples that should come after sample 512 have been circularly shifted and added to samples 1 to 511, a phenomenon known as time-aliasing.
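The wrap-around can be demonstrated directly in NumPy: multiplying DFTs performs circular convolution, and the tail of the linear convolution folds back onto the head of the output. This is a generic illustration with random sequences, not the filter and sine wave from the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(N)
h = rng.standard_normal(N)

linear = np.convolve(x, h)                                  # length 2N - 1
circular = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real  # length N

# Time-aliasing: the last N-1 samples of the linear result wrap onto the head.
wrapped = linear[:N].copy()
wrapped[:N - 1] += linear[N:]
assert np.allclose(circular, wrapped)
```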

(98) Time-aliasing can be avoided by zero padding the sequence before the Fourier transform, and that is the reason for returning to a time domain representation of the filters mentioned in the section about Function Block D above. The heavy curve in FIG. 29 is the output sequence of the time domain convolution of the filter in FIG. 27, length N=512, with a 500 Hz sine wave, length M=1024. In this case the output sequence is (N+M)−1=1535 samples long. The light curve in FIG. 29 is the output sequence of fast convolution of the same filter zero padded to a length N=1024 samples and the sine wave still with length M=1024. Here the output sequence is 1024 samples long; however, in contrast to the case above, the portion of the output sequence in the same position as the zero padding, samples 512 to 1024, is identical to the output of the time domain convolution.

(99) Saving this portion and repeating the process by shifting 512 samples ahead along the sine wave is called the overlap-save method of fast convolution and is equivalent to time domain convolution, with the exception of an additional 256 sample delay, making the total delay associated with the filtering process filter_delay=512 samples. Reference is made to Oppenheim and Schafer [1999, p. 587] for a thorough explanation of this technique.
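The block-by-block scheme above can be sketched as a simplified overlap-save routine in NumPy. This is an illustrative version assuming the filter is no longer than one block, and it omits the causality-delay bookkeeping described in the text; the function name is hypothetical.

```python
import numpy as np

def overlap_save(x, h, block=512):
    """Minimal overlap-save FFT convolution (assumes len(h) <= block).

    Each FFT frame holds 2*block input samples; only the last `block`
    output samples of each frame are free of time-aliasing and are kept.
    """
    nfft = 2 * block
    H = np.fft.rfft(h, nfft)
    # Prepend `block` zeros so the first frame has valid (silent) history.
    x = np.concatenate([np.zeros(block), x])
    out = []
    for start in range(0, len(x) - block, block):
        frame = x[start:start + nfft]
        if len(frame) < nfft:
            frame = np.pad(frame, (0, nfft - len(frame)))
        y = np.fft.irfft(np.fft.rfft(frame) * H)
        out.append(y[block:])  # keep only the alias-free second half
    return np.concatenate(out)
```

Comparing the result against direct time-domain convolution confirms the equivalence claimed above.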

(100) Calculation of Output Signals

(101) The signal to be reproduced by the Centre loudspeaker, c.sub.output[n], is calculated using the following equations:

(102) l_{filtered}[n] = \frac{1}{N} \sum_{k=0}^{N-1} H_{center}[k] \, L_{source}[k] \, e^{j(2\pi/N)kn}
r_{filtered}[n] = \frac{1}{N} \sum_{k=0}^{N-1} H_{center}[k] \, R_{source}[k] \, e^{j(2\pi/N)kn}
c_{output}[n] = l_{filtered}[n] + r_{filtered}[n]

(103) The signals to be reproduced by the Left and Right loudspeakers, respectively, are then calculated by subtracting l.sub.filtered[n] and r.sub.filtered[n] from l.sub.source[n] and r.sub.source[n], respectively, as shown in the equations below. Note that l.sub.source[n] and r.sub.source[n] are delayed to account for the filter delay filter_delay.
l_{output}[n] = z^{-filter\_delay} \cdot l_{source}[n] - l_{filtered}[n]
r_{output}[n] = z^{-filter\_delay} \cdot r_{source}[n] - r_{filtered}[n]
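With the filtered signals in hand, the three outputs reduce to sums and differences. A sketch in NumPy, with illustrative names and the source blocks assumed to be already delay-compensated:

```python
import numpy as np

def output_signals(l_filtered, r_filtered, l_src_delayed, r_src_delayed):
    """Combine filtered and delay-compensated source blocks into the
    left, centre, and right loudspeaker signals (names illustrative)."""
    c_out = l_filtered + r_filtered
    l_out = l_src_delayed - l_filtered
    r_out = r_src_delayed - r_filtered
    return l_out, c_out, r_out
```

Note that l_out + c_out + r_out equals the sum of the delayed sources sample by sample, so the extraction redistributes signal between loudspeakers rather than adding to the summed output.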

(104) In the special case where r.sub.source[n]=−l.sub.source[n], the signals are negatively correlated, and it is easy to show that all the output signals will be zero. In this case the absolute value of the phase of the cross-power spectral density, P.sub.LR[k], will be equal to π and the coherence, C.sub.LR[k], will be equal to 1. The conditional statement in the pseudo-code below is applied to ensure that l.sub.output[n]=l.sub.source[n], r.sub.output[n]=r.sub.source[n] and c.sub.output[n]=0.

(105) Algorithm 2 (Pseudo-Code for Handling Negatively Correlated Signals):

(106)
    if C.sub.LR[k] = 1 AND |phase(P.sub.LR[k])| = π then
        C.sub.LR[k] = 0
    end if

(107) Also, in the case of silence on either l.sub.source[n] or r.sub.source[n], C.sub.LR[k] should be zero. However, numerical problems can prevent this from happening. In the present implementation, if the value of either P.sub.LL[k] or P.sub.RR[k] falls below −140 dB, then C.sub.LR[k] is set to zero.
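Both guards, the negative-correlation check and the near-silence floor, can be expressed as vectorised masks over the frequency bins. A sketch in NumPy; the function and array names are illustrative, not from the implementation.

```python
import numpy as np

def sanitize_coherence(C, P_LR, P_LL, P_RR, floor_db=-140.0):
    """Zero the coherence where it would wrongly route signal to the centre.

    C: coherence per bin; P_LR: complex cross-power spectral density;
    P_LL, P_RR: auto-power spectral densities. Names are illustrative.
    """
    C = C.copy()
    # Negatively correlated bins: full coherence with a phase of +/- pi.
    neg = np.isclose(C, 1.0) & np.isclose(np.abs(np.angle(P_LR)), np.pi)
    C[neg] = 0.0
    # Near-silence on either channel: power below the -140 dB floor.
    floor = 10.0 ** (floor_db / 10.0)
    C[(P_LL < floor) | (P_RR < floor)] = 0.0
    return C
```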

REFERENCES

(108)
[1] Albert S. Bregman. Auditory Scene Analysis. The MIT Press, Cambridge, Mass., 1994.
[2] Søren Bech. Spatial aspects of reproduced sound in small rooms. J. Acoust. Soc. Am., 103: 434-445, 1998.
[3] Jens Blauert. Spatial Hearing. MIT Press, Cambridge, Mass., 1994.
[4] D. Hammershøi and H. Møller. Sound transmission to and within the human ear canal. J. Acoust. Soc. Am., 100(1): 408-427, 1996.
[5] CIPIC Interface Laboratory. The CIPIC HRTF database, 2004.
[6] Alan V. Oppenheim and Ronald W. Schafer. Discrete-Time Signal Processing. Prentice-Hall, Upper Saddle River, 1999.
[7] H. Tokuno, O. Kirkeby, P. A. Nelson and H. Hamada. Inverse filter of sound reproduction systems using regularization. IEICE Trans. Fundamentals, E80-A(5): 809-829, May 1997.
[8] S. Parkin, G. M. Mackay, and A. Cooper. How drivers sit in cars. Accid. Anal. Prev., 27(6): 777-783, 1995.