Generating sound zones using variable span filters

11516614 · 2022-11-29

Abstract

The invention provides a method for generating output filters to a plurality of loudspeakers at respective positions for playback of a plurality of different input signals in respective spatially different sound zones by means of a processor system. The method comprises computing spatio-temporal correlation matrices in response to spatial information, e.g. measured transfer functions, and in response to desired sound pressures in the plurality of sound zones. A joint eigenvalue decomposition of the spatial correlation matrices, or at least an approximation thereof, is then computed to arrive at eigenvectors accordingly. Next, variable span filters are formed from a linear combination of the eigenvectors in response to a desired trade-off between acoustic contrast and acoustic errors in the sound zones. Finally, one output filter is generated for each of the plurality of loudspeakers, for each of the plurality of input signals, in accordance with the variable span filters. The method is also applicable for optimization in one zone, e.g. for room equalization.

Claims

1. A method for generating output filters to a plurality of loudspeakers at respective positions for playback of a plurality of different input signals in respective spatially different sound zones by a processor system, the method comprising: receiving spatial information, indicative of acoustic sound transmission between the plurality of loudspeaker positions and the sound zones, receiving input indicative of signal characteristics of the input signals, computing spatio-temporal correlation matrices in response to the spatial information, in response to the signal characteristics of the input signals, and in response to desired sound pressures in the plurality of sound zones, computing a joint eigenvalue decomposition of the spatial correlation matrices, to arrive at eigenvectors accordingly, computing variable span filters formed from a linear combination of the eigenvectors in response to a desired trade-off between acoustic contrast and acoustic errors in the sound zones, allowing a user to change the acoustic contrast versus an acoustic error trade-off by entering a trade-off input, and generating one output filter for each of the plurality of loudspeakers, for each of the plurality of input signals, in accordance with the variable span filters.

2. The method according to claim 1, further comprising determining for each of the sound zones a measure of auditory perception in response to the input indicative of signal characteristics of the input signals, and generating the output filters accordingly.

3. The method according to claim 2, wherein the auditory perception for each of the sound zones is updated dynamically in response to real-time analysis of the input signals.

4. The method according to claim 2, wherein the auditory perception is applied as a weighting.

5. The method according to claim 1, wherein the generation of the output filter is performed dynamically in response to analysis of the input signals.

6. The method according to claim 1, wherein the input indicative of signal characteristics of the input signals is based on a general knowledge of typical input signals.

7. The method according to claim 1, wherein the method of generating the output filters is performed off-line.

8. The method according to claim 1, wherein the desired trade-off is taken into account by selecting a Lagrange multiplier value and selecting a number of eigenvectors accordingly in a control filter of an optimization problem.

9. The method according to claim 1, comprising receiving acoustic transfer functions for each of the combinations of loudspeaker positions and sound zones, wherein the sound zones are represented by at least one position.

10. The method according to claim 9, wherein each sound zone is represented by at least one spatial position.

11. The method according to claim 1, further comprising receiving a trade-off input indicative of a desired minimum acoustic contrast and a desired maximum acoustic error in at least one of the sound zones in order to indicate desired trade-off between acoustic contrast and acoustic error.

12. The method according to claim 11, wherein the trade-off input comprises a value indicative of a minimum sound pressure error in one sound zone and a maximum sound pressure level in another sound zone.

13. The method according to claim 1, wherein the eigenvectors are approximated by a Fourier transform.

14. The method according to claim 1, wherein at least part of the method is performed with data represented in a time domain.

15. The method according to claim 1, wherein at least part of the method is performed with data represented in a frequency domain.

16. The method according to claim 1, wherein the input indicative of signal characteristics of the input signals comprises information regarding spectral content of the input signals.

17. The method according to claim 1, further comprising performing a calibration procedure after generation of the output filters, and performing a modification procedure to modify at least one of the output filters accordingly.

18. A device for generating output filters to a plurality of loudspeakers at respective positions for playback of a plurality of different input signals in respective spatially different sound zones, comprising: a memory configured to store computer program instructions; and a processor configured to perform the computer program instructions to: receive spatial information, indicative of acoustic sound transmission between the plurality of loudspeaker positions and the respective sound zones, receive input indicative of signal characteristics of the input signals, compute spatio-temporal correlation matrices in response to the spatial information, in response to the signal characteristics of the input signals, and in response to desired sound pressures in the plurality of sound zones, compute a joint eigenvalue decomposition of the spatial correlation matrices, to arrive at eigenvectors accordingly, compute variable span filters formed from a linear combination of the eigenvectors in response to a desired trade-off between acoustic contrast and acoustic errors in the sound zones, allow a user to change the acoustic contrast versus an acoustic error trade-off by entering a trade-off input, and generate one output filter for each of the plurality of loudspeakers, for each of the plurality of input signals, in accordance with the variable span filters.

19. A system for generating output filters to a plurality of loudspeakers at respective positions for playback of a plurality of different input signals in respective spatially different sound zones, comprising: a device configured to: receive spatial information, indicative of acoustic sound transmission between the plurality of loudspeaker positions and the sound zones, receive input indicative of signal characteristics of the input signals, compute spatio-temporal correlation matrices in response to the spatial information, in response to the signal characteristics of the input signals, and in response to desired sound pressures in the plurality of sound zones, compute a joint eigenvalue decomposition of the spatial correlation matrices, to arrive at eigenvectors accordingly, compute variable span filters formed from a linear combination of the eigenvectors in response to a desired trade-off between acoustic contrast and acoustic errors in the sound zones, allow a user to change the acoustic contrast versus an acoustic error trade-off by entering a trade-off input, and generate one output filter for each of the plurality of loudspeakers, for each of the plurality of input signals, in accordance with the variable span filters; and a plurality of loudspeakers configured to receive the signals and generate an acoustic output accordingly.

20. The method according to claim 1, further comprising: generating sound zones in a car cabin, in a living room, in a public room or in an indoor environment.

Description

BRIEF DESCRIPTION OF THE FIGURES

(1) The invention will now be described in more detail with regard to the accompanying figures of which

(2) FIG. 1 illustrates the basic sound zone concept,

(3) FIG. 2 illustrates in more detail the variables in a sound zone setup,

(4) FIG. 3 illustrates a block diagram of elements of a method embodiment,

(5) FIG. 4 illustrates steps of a method embodiment, and

(6) FIG. 5 illustrates a block diagram of a device embodiment.

(7) The figures illustrate specific ways of implementing the present invention and are not to be construed as being limiting to other possible embodiments falling within the scope of the attached claim set.

DETAILED DESCRIPTION OF THE INVENTION

(8) FIG. 1 illustrates the basic concept of generating sound zones Z1, Z2 in one common acoustic environment, e.g. a room. Different sound input signals S1, S2 are processed in a processor P to generate output signals to a plurality of differently positioned loudspeakers, which generate acoustic outputs accordingly; four loudspeakers are illustrated here as an example. The purpose of the processor P is to process the sound input signals S1, S2 by output filters to each of the loudspeakers, one output filter per input signal per loudspeaker, so that sound corresponding to S1 is primarily generated in zone Z1, while sound corresponding to S2 is primarily generated in zone Z2. Thus, zone Z1 is considered the bright zone for sound S1 while being the dark zone for sound S2, and vice versa for zone Z2. The goal is to provide as high an acoustic contrast between the zones Z1, Z2 as possible with as little sound distortion in the zones Z1, Z2 as possible. In practice, with a limited number of loudspeakers, a compromise or trade-off between acoustic contrast and sound distortion is required.

(9) The present invention provides a method of generating the output filters of the processor P, providing the possibility to take as input, e.g. from a user, a trade-off between acoustic contrast and distortion. Further, the method according to the invention is suited for incorporating auditory perceptual weightings that take advantage of masking effects, so as to obtain a perceptually improved acoustic contrast and distortion performance.

(10) Once the output filters are generated, the processor P can be seen as an audio device with an audio interface to receive the input signals and output the output signals to the loudspeakers accordingly. In particular, the device may have a user input control that allows the user to control the trade-off between acoustic contrast and distortion and to adjust the output filters accordingly.

(11) It is to be understood that the output filters may be generated on a computer and downloaded into a separate audio device implementing the output filters, or a computer or other special device may be capable of receiving inputs to allow generation of the output filters e.g. in response to measured data or generalized or computed data downloaded from a database etc., such as depending on the specific setup of loudspeakers and room, definition of sound zones etc.

(12) Depending on the available processing power, the output filters can be real-time updated in response to the input signals, or the output filters can be computed off-line in response to statistics available for the input signals.

(13) FIG. 2 shows the scenario in more detail for one input signal x(n) as a function of discrete time n, illustrating, for simplicity, the bright zone $M_B$. The input signal x(n) is applied to each of the L loudspeakers via respective output filters q[n]. The various acoustic transfer functions h[n] between the loudspeaker outputs and the pressure p[n] at receiver positions in the bright zone $M_B$ are illustrated. In general, the pressure $p_B$ in the bright zone can be expressed as:

(14) $$p_B[n] = \big[\, p_1[n] \;\cdots\; p_{M_B}[n] \,\big]^T = \big[\, h_1^T \;\cdots\; h_{M_B}^T \,\big]^T \mathbb{X}[n]\, q = H_B^T[n]\, q$$

(15) Correspondingly, for the dark zone:
$$p_D[n] = H_D^T[n]\, q,$$

(16) and for the total zone:

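(17) $$p_C[n] = \begin{bmatrix} p_B[n] \\ p_D[n] \end{bmatrix} = \begin{bmatrix} H_B^T[n] \\ H_D^T[n] \end{bmatrix} q = H_C^T[n]\, q,$$
where
$$H_B[n] = \mathbb{X}^T[n]\,\big[\, h_1 \;\cdots\; h_{M_B} \,\big] \in \mathbb{R}^{LJ \times M_B}, \qquad H_D[n] = \mathbb{X}^T[n]\,\big[\, h_1 \;\cdots\; h_{M_D} \,\big] \in \mathbb{R}^{LJ \times M_D}, \qquad q \in \mathbb{R}^{LJ \times 1}.$$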

(18) Here, L is the number of loudspeakers, J is the length of the time-domain variable span filter, and M is the number of positions in a zone (specified by subscript B=bright zone, D=dark zone).
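The relation p[n] = H^T[n] q can be illustrated with a minimal Python sketch. The text does not spell out the structure of 𝕏[n], so the sketch assumes that each column of H[n] holds delayed copies of the input signal filtered by the impulse responses from each loudspeaker to one receiver position. The function name `filtered_input_matrix`, the array layouts, and the tap ordering `l * J + j` are hypothetical choices made for illustration only.

```python
import numpy as np

def filtered_input_matrix(x, h, J):
    """Build H with shape (N, L*J, M) so that H[n].T @ q gives the pressures p[n].

    x: input signal, shape (N,)
    h: impulse responses, shape (M, L, K), for M receiver positions and L loudspeakers
    J: number of taps of each loudspeaker's control filter
    Assumed layout: q stacks the J taps of loudspeaker 0, then loudspeaker 1, etc.
    """
    M, L, K = h.shape
    N = len(x)
    H = np.zeros((N, L * J, M))
    for m in range(M):
        for l in range(L):
            # y[n] = (h_ml * x)[n], truncated to the first N samples
            y = np.convolve(x, h[m, l])[:N]
            for j in range(J):
                # the copy delayed by j samples multiplies filter tap q_l[j]
                H[j:, l * J + j, m] = y[:N - j]
    return H

def pressures(H, q):
    """Evaluate p[n] = H[n].T @ q for every n; returns shape (N, M)."""
    return np.einsum('nkm,k->nm', H, q)
```

Under these assumptions, `pressures(H, q)` matches filtering x(n) with each loudspeaker filter, propagating through the corresponding impulse response, and summing over loudspeakers.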

(19) Thus, to compute the output filters q accordingly, an optimization problem must be formulated and solved. Once generated, e.g. in the form of Finite Impulse Response (FIR) filters, the output filters q can be used for playback of input signals via the loudspeakers to generate sound zones.

(20) FIG. 3 illustrates a block diagram of elements of a method embodiment of the invention for generating output filters. Spatial information, preferably in the form of measured or computed impulse responses or transfer functions h, is obtained, indicative of acoustic sound transmission between the plurality of loudspeaker positions and the sound zones, as illustrated in FIG. 2. Here each sound zone is represented by one or more spatial positions, e.g. each zone is represented by averaged transfer functions h for several spatial positions in the zone. Statistics of the input signals, such as power spectral densities (PSD) or correlation matrices, are either computed in real time over a period of time for the input signal and updated online, or generated as general knowledge data for typical expected input signals.
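As an illustration of the input signal statistics mentioned above, the following Python sketch estimates a power spectral density and a correlation matrix from a block of samples. The function name `input_statistics`, the Welch segment length, and the use of a biased autocorrelation estimate are assumptions made for the example, not details taken from the method.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import welch

def input_statistics(x, fs, J):
    """Estimate the PSD and a J x J autocorrelation matrix of the signal x."""
    # Welch-averaged power spectral density (segment length is an assumed choice)
    freqs, psd = welch(x, fs=fs, nperseg=min(256, len(x)))
    # Biased autocorrelation estimate r[k] = (1/N) * sum_n x[n] x[n+k], k = 0..J-1
    N = len(x)
    r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(J)])
    # Symmetric Toeplitz correlation matrix built from the lags
    return freqs, psd, toeplitz(r)
```

For online operation, such estimates could be recomputed per analysis block and smoothed over time; for off-line operation, they could be computed once from representative speech or music material.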

(21) Auditory perceptual weighting can be taken into account by filtering the sound reproduction error. In particular, the reproduction error at the m'th receiver position can be described as:
$$\varepsilon_m[n] = w_m[n] * \big(d_m[n] - p_m[n]\big),$$

(22) where $w_m$ is the auditory perceptual weighting and $d_m$ is the desired sound pressure at the m'th receiver position. In particular, $w_m$ can be selected to be the inverse of the auditory masking threshold, which in the most advanced form may be determined from a real-time analysis of the input signals and thus updated dynamically.

(23) The sound reproduction error energy can be expressed as:

(24) $$S_C = \frac{1}{N}\sum_{n=0}^{N-1} \big\lVert \varepsilon_C[n] \big\rVert^2 = \frac{1}{N}\sum_{n=0}^{N-1}\sum_{m=1}^{M_B+M_D} \varepsilon_m^2[n] = S_B + S_D,$$

(25) where the signal distortion energy is:

(26) $$S_B = \frac{1}{N}\sum_{n=0}^{N-1}\sum_{m=1}^{M_B} \big\lvert\, w_m[n] * \big(d_m[n] - p_m[n]\big) \big\rvert^2,$$

(27) and the residual energy is:

(28) $$S_D = \frac{1}{N}\sum_{n=0}^{N-1}\sum_{m=1}^{M_D} \big\lvert -w_m[n] * p_m[n] \big\rvert^2.$$
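The three energies can be evaluated numerically as in the following Python sketch. It assumes that each weighting $w_m$ is available as a short FIR impulse response and that `*` denotes convolution truncated to the N analysed samples; the function name `reproduction_energies` and the array layouts are hypothetical.

```python
import numpy as np

def reproduction_energies(d_B, p_B, p_D, w_B, w_D):
    """Return (S_B, S_D, S_C) for the weighted reproduction errors.

    d_B, p_B: desired and reproduced pressures in the bright zone, shape (M_B, N)
    p_D:      reproduced pressures in the dark zone, shape (M_D, N)
    w_B, w_D: perceptual weighting filters, shapes (M_B, Kw) and (M_D, Kw)
    """
    N = d_B.shape[1]

    def weighted(w, e):
        # eps_m[n] = (w_m * e_m)[n], kept for n = 0..N-1
        return np.array([np.convolve(wm, em)[:N] for wm, em in zip(w, e)])

    eps_B = weighted(w_B, d_B - p_B)   # signal distortion terms
    eps_D = weighted(w_D, -p_D)        # residual terms (desired pressure is zero)
    S_B = np.sum(eps_B ** 2) / N
    S_D = np.sum(eps_D ** 2) / N
    return S_B, S_D, S_B + S_D
```

With unit-impulse weights the quantities reduce to unweighted mean squared pressures, which is a convenient sanity check.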

(29) If such an auditory perceptual weighting $w_m$ is applied, the joint diagonalization described in the following is computed from the filtered/weighted quantities.

(30) Based on the input signals, an auditory perception weighting is computed, e.g. from real-time input signals analysed with windows of length 10-1000 ms. Such auditory perception weighting can take advantage of spectral and/or temporal masking effects. Hereby, it is possible to take into account the auditory perception effect that, for a person in a zone, the desired sound in this zone acts as a masker for interfering sound, i.e. desired sound from other zones. Taking this into account, most preferably by real-time analysis of the input signals and corresponding real-time update of the output filters, an improved perceived acoustic contrast can be obtained.

(31) Based on the above spatial information, auditory perception weighting, input signal statistics, and a desired specification of sound pressure (e.g. silence in the dark zone), spatio-temporal correlation matrices are computed in accordance with the explanation in relation to FIG. 2.

(32) Next, a joint eigenvalue decomposition of the spatio-temporal correlation matrices, or at least an approximation thereof, is performed in order to arrive at eigenvectors accordingly. Still following the notation from FIG. 2 and the explanation thereto, a generalized eigenvalue problem can be formulated as:
$$R_B\, q = \lambda R_D\, q, \qquad R_B, R_D \in \mathbb{R}^{LJ \times LJ}, \qquad \lambda = \kappa^{-2}\gamma,$$ where

(33) $$R_B = \frac{1}{N}\sum_{n=0}^{N-1} H_B[n]\, H_B^T[n].$$

(34) From this, LJ eigenvectors $U_{LJ}$ and eigenvalues $\Lambda_{LJ}$ can be computed so that $U_{LJ}$ jointly diagonalizes $R_B$ and $R_D$. In other words, $R_B$ and $R_D$ can be expressed by $U_{LJ}$ and $\Lambda_{LJ}$. Such computations are known to the skilled person.
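One way to carry out such a joint diagonalization numerically is sketched below in Python, using SciPy's generalized symmetric eigensolver. The sketch assumes that the dark-zone matrix is formed in the same way as the bright-zone matrix, and it adds a small diagonal loading so that the dark-zone matrix is positive definite; the loading, the function names, and the array layout (N, LJ, M) are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def correlation_matrices(H_B, H_D):
    """Average H[n] H[n]^T over n; H_B and H_D have shape (N, LJ, M)."""
    N = H_B.shape[0]
    R_B = np.einsum('nim,njm->ij', H_B, H_B) / N
    R_D = np.einsum('nim,njm->ij', H_D, H_D) / N  # assumed analogous to R_B
    return R_B, R_D

def joint_diagonalization(R_B, R_D, reg=1e-9):
    """Solve R_B u = lambda R_D u; return eigenvalues (descending) and eigenvectors."""
    LJ = len(R_D)
    # Diagonal loading (an assumption) keeps the solver well posed if R_D is singular
    R_D_loaded = R_D + reg * np.trace(R_D) / LJ * np.eye(LJ)
    lam, U = eigh(R_B, R_D_loaded)      # eigenvalues are returned in ascending order
    order = np.argsort(lam)[::-1]
    return lam[order], U[:, order]
```

With this solver the eigenvectors are scaled so that U^T R_D U is (up to the loading) the identity and U^T R_B U is the diagonal matrix of eigenvalues, which is the normalization implied by the filter expression that uses Λ + μI.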

(35) The invention is based on the insight that the optimization problem of computing output filters q for the loudspeakers in a sound zone system can be formulated and solved by setting up a control filter based on a variable span filter; see e.g. “Signal enhancement with variable Span linear filters”, J. Benesty, Mads G. C., et al., 2016, ISBN 978-981-287-738-3. A desired trade-off between acoustic contrast and acoustic error or distortion can be used as input for computing variable span filters formed from a linear combination of the eigenvectors. The variable span filters are then used to solve the optimization problem, thereby resulting in one output filter for each of the plurality of loudspeakers, for each of the plurality of input signals. In particular, the variable span filters can be used to trade off the sound reconstruction error in different zones, where the reconstructed sound is the desired sound minus an error. E.g. this can be used to minimize the pressure error in the bright zone while keeping the sound pressure level below a chosen value in the dark zone.

(36) Using a Lagrange multiplier μ, a VAriable Span Trade-off control filter can be formulated as:

(37) $$q_{\mathrm{VAST}} = U_V\, a_V(\mu) = U_V \big(\Lambda_V + \mu I_V\big)^{-1} U_V^T\, r_B = \sum_{v=1}^{V} \frac{u_v u_v^T}{\mu + \lambda_v}\, r_B$$

(38) Here, the correlation vector $r_B$ is:
$$r_B = N^{-1}\sum_{n=0}^{N-1} H_B[n]\, d_B[n].$$

(39) V is the number of eigenvectors and eigenvalues used in the filter, with 1 ≤ V ≤ LJ.

(40) Both V and μ can be used to control the optimization trade-off, and thus provide an easy way of steering the resulting performance of the output filters towards desired characteristics, given the available number of loudspeakers L.
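The control filter expression and the role of V and μ can be sketched in Python as follows. The sketch assumes that $R_D$ is positive definite and that the eigenvectors are normalized so that U^T R_D U is the identity, which is what `scipy.linalg.eigh` returns for the generalized problem. The function names `vast_filter` and `acoustic_contrast` are hypothetical, and the contrast measure used here (ratio of bright-zone to dark-zone energy) is a commonly used definition assumed for illustration rather than one stated in the text.

```python
import numpy as np
from scipy.linalg import eigh

def vast_filter(R_B, R_D, r_B, V, mu):
    """q = U_V (Lambda_V + mu I)^-1 U_V^T r_B using the V largest eigenvalues."""
    lam, U = eigh(R_B, R_D)                 # ascending generalized eigenvalues
    order = np.argsort(lam)[::-1][:V]       # keep the V dominant eigenpairs
    lam_V, U_V = lam[order], U[:, order]
    return U_V @ ((U_V.T @ r_B) / (lam_V + mu))

def acoustic_contrast(q, R_B, R_D):
    """Assumed measure: bright-zone energy divided by dark-zone energy."""
    return (q @ R_B @ q) / (q @ R_D @ q)
```

Under these assumptions two limiting cases follow directly from the formula. With V = LJ the filter equals the solution of (R_B + μ R_D) q = r_B, i.e. a least-squares fit of the bright-zone pressure with the dark-zone energy penalized by μ. With V = 1 the filter is a scaled copy of the dominant eigenvector, which maximizes the energy ratio above. Intermediate values of V and μ move between these two behaviours.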

(41) FIG. 4 shows steps of a method embodiment for generating output filters to a plurality of loudspeakers at respective positions for playback of a plurality of different input signals in respective spatially different sound zones by means of a processor system. Step 1) is receiving R_SI spatial information indicative of acoustic sound transmission between the plurality of loudspeaker positions and the sound zones. This can be done including a step of measuring transfer functions between actual loudspeaker positions and one or more positions indicating each of the sound zones in a room. Step 2) is receiving R_SC input indicative of signal characteristics of the input signals. This can be done in the form of power spectral densities or correlation matrices for typical input signals, e.g. typical data for speech, music, or a mix thereof. Step 3) is computing C_CM spatio-temporal correlation matrices in response to the spatial information, in response to the signal characteristics of the input signals, and in response to desired sound pressures in the plurality of sound zones (e.g. silence in dark zone(s)). In case of measured transfer functions, these are used. In case of more generalized graphical data indicative of the physical positions of sound zones, the acoustic environment, and the loudspeaker positions therein, database transfer functions can be used, or simulated room impulse responses can be calculated using room acoustic simulation software.

(42) The next step is computing C_EV a joint eigenvalue decomposition of the spatial correlation matrices, as known to the skilled person, to arrive at eigenvectors accordingly. In particular, various approximations to exact solutions can be used, if preferred.

(43) The next step is computing C_VSF variable span filters formed from a linear combination of the eigenvectors in response to a desired trade-off between acoustic contrast and acoustic errors in the sound zones. In particular, this can be done in response to a user input, where a user can input a desired acoustic contrast versus acoustic error trade-off to influence the resulting output filters.

(44) The final step is generating G_OF one output filter for each of the plurality of loudspeakers, for each of the plurality of input signals, in accordance with the variable span filters. These output filters can then be used for filtering audio input signals in order to generate audio output signals to be reproduced via the loudspeakers in order to generate sound zones with different sound. Depending on the desired precision and on the acoustic environment of the sound zone setup, the resulting output filters can each be represented by FIR filters with the desired number of taps.
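Playback with the generated FIR output filters can be sketched in Python as follows: each loudspeaker signal is the sum, over all input signals, of the input filtered by the output filter belonging to that input and that loudspeaker. The function name `render_loudspeaker_signals` and the array layout (S inputs, L loudspeakers, J taps) are assumptions; a stacked filter vector of length LJ would first have to be reshaped to (L, J) according to whatever tap ordering was used when it was computed.

```python
import numpy as np

def render_loudspeaker_signals(inputs, filters):
    """Filter S input signals with one FIR filter per input per loudspeaker.

    inputs:  shape (S, N), one row per input signal
    filters: shape (S, L, J), filters[s, l] is the FIR filter for input s, loudspeaker l
    returns: shape (L, N + J - 1), one row per loudspeaker
    """
    S, N = inputs.shape
    _, L, J = filters.shape
    out = np.zeros((L, N + J - 1))
    for s in range(S):
        for l in range(L):
            # each loudspeaker plays the superposition of all filtered inputs
            out[l] += np.convolve(inputs[s], filters[s, l])
    return out
```

For real-time use, the same superposition would typically be implemented block-wise, e.g. with overlap-add processing, rather than on whole signals.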

(45) FIG. 5 shows a block diagram of a device embodiment. An audio device with an audio input and output interface is capable of receiving a set of output filters, e.g. data representing FIR filter coefficients, which have been generated according to the method described in the foregoing. The audio device is then capable of generating a plurality of audio input signals, real-time filtering the audio input signals with the received output filters, and providing a set of audio output signals accordingly. The audio output signals are suited for being received and converted to acoustic signals by respective loudspeakers, either in a wired or wireless format. The output filters can either be generated by the user's own computer, or they can be generated at a server and provided for downloading to the audio device via the internet.

(46) In general, it is to be understood that the invention is applicable both in situations where one input signal is intended to be heard in one zone and in cases where e.g. two input signals, e.g. a set of stereo audio signals, are intended to be heard in one zone. Thus, in general the invention is applicable for multi-channel audio, e.g. surround sound systems etc.

(47) In a special application, the method according to the invention can be used for equalizing a setup of one or more loudspeakers in a room. For this, only one sound zone is defined, and a number of positions are defined therein. An optimization problem similar to the one described above in general, using variable span filters, can then be set up and solved to arrive at output filters that provide a given desired spectral sound characteristic within the defined zone.

(48) The invention has a plurality of applications where a high degree of acoustic contrast between different sound zones is desired, i.e. where different persons want to be together in one common environment but listen to different sound input signals. E.g. in a living room, one person may watch and listen to TV while another listens to sound from another audio source. This may be even more pronounced in a car cabin. In a museum, narrative speech in one language can be played in one zone, while one or more other zones can be dedicated to narrative speech in other languages at the same time. The invention can also be used in outdoor setups, e.g. for generating acoustic contrast in simultaneous multi-concert environments.

(49) The invention in general solves the problem of providing a framework for generating output filters in a way that allows a user to set up a trade-off or compromise between acoustic contrast and the acoustic error introduced, in a given setup of loudspeakers in a given environment.

(50) To sum up: the invention provides a method for generating output filters to a plurality of loudspeakers at respective positions for playback of a plurality of different input signals in respective spatially different sound zones by means of a processor system. The method comprises computing spatio-temporal correlation matrices in response to spatial information, e.g. measured transfer functions, and in response to desired sound pressures in the plurality of sound zones. A joint eigenvalue decomposition of the spatial correlation matrices, or at least an approximation thereof, is then computed to arrive at eigenvectors accordingly. Next, variable span filters are formed from a linear combination of the eigenvectors in response to a desired trade-off between acoustic contrast and acoustic errors in the sound zones. Finally, one output filter is generated for each of the plurality of loudspeakers, for each of the plurality of input signals, in accordance with the variable span filters. The method is also applicable for optimization in one zone, e.g. for room equalization.

(51) Although the present invention has been described in connection with the specified embodiments, it should not be construed as being in any way limited to the presented examples. The scope of the present invention is to be interpreted in the light of the accompanying claim set. In the context of the claims, the terms “including” or “includes” do not exclude other possible elements or steps. Also, the mentioning of references such as “a” or “an” etc. should not be construed as excluding a plurality. The use of reference signs in the claims with respect to elements indicated in the figures shall also not be construed as limiting the scope of the invention. Furthermore, individual features mentioned in different claims, may possibly be advantageously combined, and the mentioning of these features in different claims does not exclude that a combination of features is not possible and advantageous.