Apparatus and Method for Providing Individual Sound Zones

20190045316 · 2019-02-07

    Abstract

    An apparatus for generating a plurality of loudspeaker signals from two or more audio source signals is provided. Each of the two or more audio source signals shall be reproduced in one or more of two or more sound zones, and at least one of the two or more audio source signals shall not be reproduced in at least one of the two or more sound zones. An audio preprocessor is configured to modify each initial audio signal of the two or more initial audio signals depending on a signal power or a loudness of another initial audio signal of the two or more initial audio signals. A filter is configured to generate the plurality of loudspeaker signals depending on in which of the two or more sound zones the two or more audio source signals shall be reproduced and depending on in which of the two or more sound zones the two or more audio source signals shall not be reproduced.

    Claims

    1. An apparatus for generating a plurality of loudspeaker signals from two or more audio source signals, wherein each of the two or more audio source signals shall be reproduced in one or more of two or more sound zones, and wherein at least one of the two or more audio source signals shall not be reproduced in at least one of the two or more sound zones, wherein the apparatus comprises: an audio preprocessor configured to modify each of two or more initial audio signals to acquire two or more preprocessed audio signals, and a filter configured to generate the plurality of loudspeaker signals depending on the two or more preprocessed audio signals, wherein the audio preprocessor is configured to use the two or more audio source signals as the two or more initial audio signals, or wherein the audio preprocessor is configured to generate for each audio source signal of the two or more audio source signals an initial audio signal of the two or more initial audio signals by modifying said audio source signal, wherein the audio preprocessor is configured to modify each initial audio signal of the two or more initial audio signals depending on a signal power or a loudness of another initial audio signal of the two or more initial audio signals, and wherein the filter is configured to generate the plurality of loudspeaker signals depending on in which of the two or more sound zones the two or more audio source signals shall be reproduced and depending on in which of the two or more sound zones the two or more audio source signals shall not be reproduced.

    2. The apparatus according to claim 1, wherein the audio preprocessor is configured to modify each initial audio signal of the two or more initial audio signals depending on the signal power or the loudness of another initial audio signal of the two or more initial audio signals by modifying said initial audio signal of the two or more initial audio signals depending on a ratio of a first value to a second value, wherein the second value depends on the signal power of said initial audio signal, and the first value depends on the signal power of said another initial audio signal of the two or more initial audio signals, or wherein the second value depends on the loudness of said initial audio signal, and the first value depends on the loudness of said another initial audio signal of the two or more initial audio signals.

    3. The apparatus according to claim 1, wherein the audio preprocessor is configured to modify each initial audio signal of the two or more initial audio signals depending on the signal power or the loudness of another initial audio signal of the two or more initial audio signals by determining a gain for said initial audio signal and by applying the gain on said initial audio signal, wherein the audio preprocessor is configured to determine the gain depending on the ratio between the first value and the second value, said ratio being a ratio between the signal power of said another initial audio signal of the two or more initial audio signals and the signal power of said initial audio signal as the second value, or said ratio being a ratio between the loudness of said another initial audio signal of the two or more initial audio signals and the loudness of said initial audio signal as the second value.

    4. The apparatus according to claim 3, wherein the audio preprocessor is configured to determine the gain depending on a function that monotonically increases with the ratio between the first value and the second value.

    5. The apparatus according to claim 1, wherein the audio preprocessor is configured to modify an initial audio signal of the two or more initial audio signals by determining a gain g.sub.1(k) for said initial audio signal and by applying the gain g.sub.1(k) on said initial audio signal, wherein the audio preprocessor is configured to determine the gain g.sub.1(k) according to
    $$g_1(k) = \begin{cases} 10^{(T_1 + v_1)(R-1)/(20R)} & \text{for } T_1 + v_1 < 0 \\ 1 & \text{otherwise,} \end{cases}$$
    or according to
    $$g_1(k) = \begin{cases} 10^{(T_2 + v_1)(R-1)/(20R)} & \text{for } T_2 + v_1 > 0 \\ 1 & \text{otherwise,} \end{cases}$$
    with
    $$v_1 = -10\log_{10}\bigl(e_1(k)\bigr) + 10\log_{10}\Bigl(\sum_{i=2}^{N} e_i(k)\Bigr),$$
    wherein k is a time index, wherein T.sub.1 indicates a first threshold value and T.sub.2 indicates a second threshold value, wherein e.sub.1(k) indicates a signal power or a loudness of said initial audio signal, wherein N indicates a number of the two or more initial audio signals, wherein e.sub.i(k) indicates a signal power or a loudness of a further initial audio signal of the two or more initial audio signals, and wherein R indicates a number, with 1≤R≤100.

    6. The apparatus according to claim 1, wherein the audio preprocessor is configured to modify each initial audio signal of the two or more initial audio signals depending on the signal power or the loudness of another initial audio signal of the two or more initial audio signals by determining a gain g.sub.1(k) for said initial audio signal and by applying the gain g.sub.1(k) on said initial audio signal, wherein the audio preprocessor is configured to determine the gain g.sub.1(k) according to
    $$g_1(k) = \begin{cases} 10^{(T_1 + v)(R-1)/(20R)} & \text{for } T_1 + v < 0 \\ 1 & \text{otherwise,} \end{cases}$$
    or according to
    $$g_1(k) = \begin{cases} 10^{(T_2 + v)(R-1)/(20R)} & \text{for } T_2 + v > 0 \\ 1 & \text{otherwise,} \end{cases}$$
    with
    $$v = -10\log_{10}\bigl(e_1(k)\bigr) + 10\log_{10}\bigl(e_2(k)\bigr),$$
    wherein k is a time index, wherein T.sub.1 indicates a first threshold value and T.sub.2 indicates a second threshold value, wherein e.sub.1(k) indicates a signal power or a loudness of said initial audio signal, wherein e.sub.2(k) indicates a signal power or a loudness of said another initial audio signal of the two or more initial audio signals, and wherein R indicates a number, with 1≤R≤100.

    7. The apparatus according to claim 1, wherein the audio preprocessor is configured to modify each initial audio signal of the two or more initial audio signals according to
    $$e_1(k) = \lambda_2 e_1(k-1) + (1-\lambda_2)\sum_{l=1}^{L} d_1^2(k,l),$$
    or according to
    $$e_1(k) = \frac{1}{K}\sum_{n=0}^{K-1}\sum_{l=1}^{L} d_1^2(k-n,l),$$
    or according to
    $$e_1(k) = \max_{n=0,1,\dots,K-1;\; l=1,2,\dots,L} d_1^2(k-n,l),$$
    wherein e.sub.1(k) indicates a signal power of said initial audio signal, wherein k indicates a time index, wherein λ.sub.2 is a value in the range 0<λ.sub.2<1, wherein L is a number of audio channels of the initial audio signal, wherein L>1, wherein d.sub.1 indicates said initial audio signal, wherein K indicates a number of samples of a window.

    8. The apparatus according to claim 1, wherein the audio preprocessor is configured to generate the two or more initial audio signals by normalizing a power of each of the two or more audio source signals.

    9. The apparatus according to claim 8, wherein the audio preprocessor is configured to generate each initial audio signal of the two or more initial audio signals by normalizing a power of each audio source signal of the two or more audio source signals according to
    $$d_1(k,l) = c_1(k)\,u_1(k,l),$$
    and according to
    $$c_1(k) = \frac{1}{b_1(k)},$$
    wherein k is a time index, wherein l indicates one of one or more audio channels of said audio source signal, wherein d.sub.1 indicates said initial audio signal, wherein u.sub.1 indicates said audio source signal, wherein b.sub.1 indicates an average of a power of said audio source signal u.sub.1.

    10. The apparatus according to claim 9, wherein the audio preprocessor is configured to determine the average b.sub.1 of the power of said audio source signal u.sub.1 according to
    $$b_1(k) = \lambda_1 b_1(k-1) + (1-\lambda_1)\sum_{l=1}^{L} u_1^2(k,l),$$
    where 0<λ.sub.1<1.

    11. The apparatus according to claim 1, wherein the filter (140) is configured to generate the plurality of loudspeaker signals depending on in which of the two or more sound zones the two or more audio source signals shall be reproduced and depending on in which of the two or more sound zones the two or more audio source signals shall not be reproduced, by determining filter coefficients of an FIR filter.

    12. The apparatus according to claim 11, wherein the filter is configured to generate the plurality of loudspeaker signals depending on in which of the two or more sound zones the two or more audio source signals shall be reproduced and depending on in which of the two or more sound zones the two or more audio source signals shall not be reproduced by determining the filter coefficients of the FIR filter according to the formula
    $$g_q = (H^H W_q^H W_q H)^{-1} H^H W_q^H W_q d_q,$$
    wherein g.sub.q is a vector comprising the filter coefficients of the FIR filter according to
    $$g_q = \bigl(g_{q,1}(0), \dots, g_{q,1}(L_G-1),\; g_{q,2}(0), \dots, g_{q,2}(L_G-1),\; \dots,\; g_{q,N_L}(0), \dots, g_{q,N_L}(L_G-1)\bigr)^T,$$
    wherein H is a convolution matrix depending on a room impulse response, wherein W.sub.q is a weighting matrix, wherein d.sub.q indicates desired impulse responses, wherein g.sub.q,i indicates one of the filter coefficients with 1≤i≤N.sub.L, wherein N.sub.L indicates a number of loudspeakers, and wherein L.sub.G indicates a length of the FIR filter.

    13. The apparatus according to claim 1, wherein the filter is configured to generate the plurality of loudspeaker signals depending on in which of the two or more sound zones the two or more audio source signals shall be reproduced and depending on in which of the two or more sound zones the two or more audio source signals shall not be reproduced, by conducting Wave Field Synthesis.

    14. The apparatus according to claim 1, wherein the apparatus further comprises two or more band splitters being configured to conduct band splitting on the two or more preprocessed audio signals to a plurality of band-splitted audio signals, wherein the filter is configured to generate the plurality of loudspeaker signals depending on the plurality of band-splitted audio signals.

    15. The apparatus according to claim 14, wherein the apparatus further comprises one or more spectral shapers being configured to modify a spectral envelope of one or more of the plurality of band-splitted audio signals to acquire one or more spectrally shaped audio signals, wherein the filter is configured to generate the plurality of loudspeaker signals depending on the one or more spectrally shaped audio signals.

    16. A method for generating a plurality of loudspeaker signals from two or more audio source signals, wherein each of the two or more audio source signals shall be reproduced in one or more of two or more sound zones, and wherein at least one of the two or more audio source signals shall not be reproduced in at least one of the two or more sound zones, wherein the method comprises: modifying each of two or more initial audio signals to acquire two or more preprocessed audio signals, and generating the plurality of loudspeaker signals depending on the two or more preprocessed audio signals, wherein the two or more audio source signals are used as the two or more initial audio signals, or wherein for each audio source signal of the two or more audio source signals an initial audio signal of the two or more initial audio signals is generated by modifying said audio source signal, wherein each initial audio signal of the two or more initial audio signals is modified depending on a signal power or a loudness of another initial audio signal of the two or more initial audio signals, and wherein the plurality of loudspeaker signals is generated depending on in which of the two or more sound zones the two or more audio source signals shall be reproduced and depending on in which of the two or more sound zones the two or more audio source signals shall not be reproduced.

    17. A non-transitory digital storage medium having a computer program stored thereon to perform the method for generating a plurality of loudspeaker signals from two or more audio source signals, wherein each of the two or more audio source signals shall be reproduced in one or more of two or more sound zones, and wherein at least one of the two or more audio source signals shall not be reproduced in at least one of the two or more sound zones, wherein the method comprises: modifying each of two or more initial audio signals to acquire two or more preprocessed audio signals, and generating the plurality of loudspeaker signals depending on the two or more preprocessed audio signals, wherein the two or more audio source signals are used as the two or more initial audio signals, or wherein for each audio source signal of the two or more audio source signals an initial audio signal of the two or more initial audio signals is generated by modifying said audio source signal, wherein each initial audio signal of the two or more initial audio signals is modified depending on a signal power or a loudness of another initial audio signal of the two or more initial audio signals, and wherein the plurality of loudspeaker signals is generated depending on in which of the two or more sound zones the two or more audio source signals shall be reproduced and depending on in which of the two or more sound zones the two or more audio source signals shall not be reproduced, when said computer program is run by a computer.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0102] Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:

    [0103] FIG. 1 illustrates an apparatus for generating a plurality of loudspeaker signals from two or more audio source signals according to an embodiment,

    [0104] FIG. 2 illustrates ideal multizone reproduction,

    [0105] FIG. 3 illustrates a reproduction of multiple signals in reality,

    [0106] FIG. 4 illustrates a minimal example of multizone reproduction with arrays,

    [0107] FIG. 5 illustrates in (a) an exemplary reproduction level in bright and dark zone, and illustrates in (b) a resulting acoustic contrast,

    [0108] FIG. 6 illustrates a general signal model of multizone reproduction with arrays,

    [0109] FIG. 7 illustrates multizone reproduction with arrays according to an embodiment,

    [0110] FIG. 8 illustrates a sample implementation of an audio preprocessor according to an embodiment,

    [0111] FIG. 9 illustrates an exemplary design of the band splitters according to embodiments, wherein (a) illustrates acoustic contrast achieved by different reproduction methods, and wherein (b) illustrates a chosen magnitude response of the audio crossover,

    [0112] FIG. 10 illustrates an exemplary design of the spectral shapers according to embodiments, wherein (a) illustrates acoustic contrast achieved by a specific reproduction method, and wherein (b) illustrates a chosen magnitude response of the spectral shaping filter, and

    [0113] FIG. 11 illustrates an exemplary loudspeaker setup in an enclosure according to an embodiment.

    DETAILED DESCRIPTION OF THE INVENTION

    [0114] FIG. 1 illustrates an apparatus for generating a plurality of loudspeaker signals from two or more audio source signals according to an embodiment. Each of the two or more audio source signals shall be reproduced in one or more of two or more sound zones, and at least one of the two or more audio source signals shall not be reproduced in at least one of the two or more sound zones.

    [0115] The apparatus comprises an audio preprocessor 110 configured to modify each of two or more initial audio signals to obtain two or more preprocessed audio signals. Moreover, the apparatus comprises a filter 140 configured to generate the plurality of loudspeaker signals depending on the two or more preprocessed audio signals. The audio preprocessor 110 is configured to use the two or more audio source signals as the two or more initial audio signals, or the audio preprocessor 110 is configured to generate for each audio source signal of the two or more audio source signals an initial audio signal of the two or more initial audio signals by modifying said audio source signal. Moreover, the audio preprocessor 110 is configured to modify each initial audio signal of the two or more initial audio signals depending on a signal power or a loudness of another initial audio signal of the two or more initial audio signals.

    [0116] The filter 140 is configured to generate the plurality of loudspeaker signals depending on in which of the two or more sound zones the two or more audio source signals shall be reproduced and depending on in which of the two or more sound zones the two or more audio source signals shall not be reproduced.

    [0117] While the approaches of the state of the art can achieve a considerable acoustic contrast, the contrast achieved by conventional methods is typically not sufficient to provide multiple unrelated acoustic scenes to inhabitants of the same enclosure, whenever high-quality audio reproduction may be useful.

    [0118] The acoustic contrast perceived by the listeners shall be improved, which is dependent on the acoustic contrast as defined in Equation (14) above, but not identical to it. It shall be achieved that the acoustic contrast perceived by the listeners is increased rather than maximizing the contrast of acoustic energy. The perceived acoustic contrast will be referred to as subjective acoustic contrast, while the contrast in acoustic energy will be referred to as objective acoustic contrast in the following. Some embodiments employ measures to facilitate directional audio reproduction and measures to shape the acoustic leakage such that it becomes less noticeable.

    [0119] In addition to FIG. 1, the apparatus of FIG. 7 further comprises two (optional) band splitters 121, 122 and four (optional) spectral shapers 131, 132, 133, 134.

    [0120] According to some embodiments the apparatus may, e.g., further comprise two or more band splitters 121, 122 being configured to conduct band splitting on the two or more preprocessed audio signals to a plurality of band-splitted audio signals. The filter 140 may, e.g., be configured to generate the plurality of loudspeaker signals depending on the plurality of band-splitted audio signals.

    [0121] In some embodiments, the apparatus may, e.g., further comprise one or more spectral shapers 131, 132, 133, 134 being configured to modify a spectral envelope of one or more of the plurality of band-splitted audio signals to obtain one or more spectrally shaped audio signals. The filter 140 may, e.g., be configured to generate the plurality of loudspeaker signals depending on the one or more spectrally shaped audio signals.

    [0122] In FIG. 7 a signal model of an implementation according to embodiments is shown. In particular, FIG. 7 illustrates multizone reproduction with arrays according to embodiments. This example has been chosen for conciseness, noting that the method is generally applicable to scenarios with N.sub.S signal sources, N.sub.L loudspeakers, and N.sub.Z listening zones, as described above.

    [0123] There are two signal sources shown in FIG. 7, which provide two independent signals that are fed to a preprocessing stage. This preprocessing stage may, for example, in some embodiments implement a parallel processing for both signals (i.e., no mixing). Unlike the other processing steps, this processing step does not constitute an LTI system (Linear Time-Invariant System). Instead, this processing block determines time-varying gains for all processed source signals, such that their difference in reproduction level is reduced. The rationale behind this is that the acoustic leakage in each zone is linearly dependent on the scenes reproduced in the respective other zones. At the same time, the intentionally reproduced scenes can mask the acoustic leakage. Hence, the perceived acoustic leakage is proportional to the level difference between the scenes that are intentionally reproduced in the respective zones. As a consequence, reducing the level difference of the reproduced scenes will also reduce the perceived acoustic leakage and, hence, increase the subjective acoustic contrast. A more detailed explanation can be found when preprocessing is described below.

    [0124] The (optional) band splitters 121, 122 realize the (optional) processing step of band splitting, and split the signal into multiple frequency bands, just like an audio crossover would do in a multi-way loudspeaker. However, unlike audio crossovers in a loudspeaker, maximizing the radiated acoustic power is only a secondary objective of this band splitter. The primary objective of this band splitter is to distribute the individual frequency bands to individual reproduction measures such that the acoustic contrast is maximized, given certain quality constraints. For example, the signal w.sub.1(k) will later be fed to a single loudspeaker as signal x.sub.1(k). Given this loudspeaker is a directional loudspeaker, w.sub.1(k) would be high-pass filtered because the directivity of this loudspeaker will be low at low frequencies. On the other hand, w.sub.2(k) will later be filtered to obtain x.sub.2(k) and x.sub.3(k) such that the corresponding loudspeakers are used as an electrically steered array. In a more complex scenario, there can be more outputs of the band splitter such that the signals are distributed to multiple reproduction methods according to the needs of the application (see also below, where a loudspeaker-enclosure-microphone system according to embodiments is described).
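
    The two-way split described above can be illustrated with a minimal sketch. It assumes a first-order low-pass with a complementary high-pass (high = input minus low), which is only one of many possible crossover designs and is not prescribed by the text; the function name and the parameters cutoff_hz and sample_rate_hz are hypothetical. The high band would correspond to w.sub.1(k) for a directional loudspeaker, the low band to w.sub.2(k) for the array.

```python
import math


def split_bands(x, cutoff_hz, sample_rate_hz):
    """Split a mono signal into a low band and a complementary high band.

    Assumption: a one-pole low-pass is used; the high band is the residual,
    so low[k] + high[k] reproduces x[k] (up to rounding).
    """
    # One-pole smoothing coefficient derived from the cutoff frequency.
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    low, high = [], []
    state = 0.0
    for s in x:
        state = a * state + (1.0 - a) * s   # low-pass output
        low.append(state)
        high.append(s - state)              # complementary high-pass output
    return low, high
```

    Because the two bands sum back to the input, this split leaves the overall timbre unchanged, which matches the role of the band splitters as opposed to the spectral shapers.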

    [0125] As discussed above, the measures for directional reproduction applied later will exhibit a certain leakage from one zone to the other. This leakage can be measured as a breakdown in acoustic contrast between the zones. In a complex setup, these breakdowns can occur at multiple points in the frequency spectrum for each of the envisaged directional reproduction methods, which constitutes a major obstacle in the application of those methods. It is well-known that timbre variations are acceptable to a certain extent. These degrees of freedom can be used to attenuate contrast-critical frequency bands.

    [0126] Thus, the (optional) spectral shapers 131, 132, 133, 134 are designed in a way such that the signals reproduced later are attenuated in these parts of the frequency spectrum, where a low acoustic contrast is expected. Unlike the band splitters, the spectral shapers are intended to modify the timbre of the reproduced sound. Moreover, this processing stage can also involve delays and gains such that the intentionally reproduced acoustic scene can spatially mask the acoustic leakage.

    [0127] The blocks denoted by G.sub.1(k) and G.sub.2(k) may, e.g., describe linear time-invariant filters that are optimized to maximize the objective acoustic contrast given subjective quality constraints. There are various possibilities to determine those filters, which include (but are not limited to) ACC, pressure matching (see [4] and [6]), and loudspeaker beamforming. It was found that a least-squares pressure matching approach as described below, when a prefilter according to embodiments is described, is especially suitable when measured impulse responses are considered for the filter optimization. This can be an advantageous concept for implementation.
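
    As an illustration of the least-squares pressure matching named above, the following sketch evaluates the formula recited in claim 12, g.sub.q=(H.sup.HW.sub.q.sup.HW.sub.qH).sup.-1H.sup.HW.sub.q.sup.HW.sub.qd.sub.q, under simplifying assumptions that are not taken from the text: all impulse responses are real-valued and of equal length, W.sub.q is diagonal with one scalar weight per microphone, and no regularization is applied (a singular system raises ZeroDivisionError). All function and parameter names are hypothetical.

```python
def conv_matrix(h, L_G):
    """Convolution (Toeplitz) matrix of h for a filter of length L_G."""
    rows = len(h) + L_G - 1
    return [[h[r - c] if 0 <= r - c < len(h) else 0.0 for c in range(L_G)]
            for r in range(rows)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def pressure_matching(rirs, desired, weights, L_G):
    """Weighted least-squares FIR design for one source signal q.

    rirs[m][i]  : impulse response from loudspeaker i to microphone m
    desired[m]  : desired impulse response at microphone m
                  (length len(rirs[m][i]) + L_G - 1)
    weights[m]  : scalar weight of microphone m (diagonal W_q, an assumption)
    Returns the stacked coefficients, loudspeaker by loudspeaker.
    """
    H, d, w2 = [], [], []
    for m, per_mic in enumerate(rirs):
        blocks = [conv_matrix(h, L_G) for h in per_mic]
        for r in range(len(blocks[0])):
            H.append([v for blk in blocks for v in blk[r]])
            d.append(desired[m][r])
            w2.append(weights[m] ** 2)      # entry of W^T W
    n, rows = len(H[0]), range(len(H))
    A = [[sum(w2[r] * H[r][i] * H[r][j] for r in rows) for j in range(n)]
         for i in range(n)]                 # H^T W^T W H
    b = [sum(w2[r] * H[r][i] * d[r] for r in rows) for i in range(n)]
    return solve(A, b)                      # (H^T W^T W H)^-1 H^T W^T W d
```

    A bright-zone microphone would receive a nonzero desired response (for example a delayed unit impulse) and a dark-zone microphone an all-zero one; the weights then trade reproduction accuracy against leakage.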

    [0128] Other embodiments employ the above approach by operating on calculated impulse responses. In particular embodiments, impulse responses are calculated to represent the free-field impulse responses from the loudspeakers to the microphones.

    [0129] Further embodiments employ the above approach by operating on calculated impulse responses that have been obtained using an image source model of the enclosure.

    [0130] It should be noted that the impulse responses are measured once, so that no microphones are needed during operation. Unlike ACC, the pressure matching approach prescribes a given magnitude and phase in the respective bright zone. This results in a high reproduction quality. Traditional beamforming approaches are also suitable when high frequencies should be reproduced.

    [0131] The block denoted by H(k) represents the LEMS, where each input is associated with one loudspeaker. Each of the outputs is associated with an individual listener that receives the superposition of all loudspeaker contributions in his individual sound zone. The loudspeakers that are driven without using the prefilters G.sub.1(k) and G.sub.2(k) are either directional loudspeakers radiating primarily into one sound zone or loudspeakers that are arranged near (or in) an individual sound zone such that they primarily excite sound in that zone. For higher frequencies, directional loudspeakers can be built without significant effort. Hence, these loudspeakers can be used to provide the high-range frequencies to the listeners, where the loudspeakers do not have to be placed directly at the listeners' ears.

    [0132] In the following, embodiments of the present invention are described in more detail.

    [0133] At first, preprocessing according to embodiments is described. In particular, an implementation of the block denoted by Preprocessing in FIG. 7 is presented. For providing a better understanding, the following explanations concentrate on only one mono signal per zone. However, a generalization to multichannel signals is straightforward. Thus, some embodiments exhibit multichannel signals per zone.

    [0134] FIG. 8 illustrates a sample implementation of an audio preprocessor 110 and a corresponding signal model according to an embodiment. As described above, the two input signals u.sub.1(k) and u.sub.2(k) are intended to be primarily reproduced in Zone 1 and Zone 2, respectively. On the other hand, there is some acoustic leakage in the reproduction of u.sub.1(k) to Zone 2 and in the reproduction of u.sub.2(k) to Zone 1.

    [0135] The two input signals u.sub.1(k) and u.sub.2(k) are also referred to as audio source signals in the following.

    [0136] In a first, optional, stage, the power of both input signals, u.sub.1(k) and u.sub.2(k) (the audio source signals), is normalized to simplify the parameter choice for the following processing.

    [0137] Thus, according to an optional embodiment, the audio preprocessor (110) may, e.g., be configured to generate the two or more initial audio signals d.sub.1(k) and d.sub.2(k) by normalizing a power of each of the two or more audio source signals u.sub.1(k) and u.sub.2(k).

    [0138] The obtained power estimates b.sub.1(k) and b.sub.2(k) typically describe a long-term average, in contrast to the estimators used in a later stage, which typically consider a shorter time span. The update of b.sub.1(k) and b.sub.2(k) can be coupled with an activity detection for u.sub.1(k) and u.sub.2(k), respectively, such that the update of b.sub.1(k) or b.sub.2(k) is held when there is no activity in u.sub.1(k) or u.sub.2(k). The signals c.sub.1(k) and c.sub.2(k) may, e.g., be inversely proportional to b.sub.1(k) and b.sub.2(k), respectively, such that a multiplication of c.sub.1(k) and c.sub.2(k) with u.sub.1(k) and u.sub.2(k), respectively, yields the signals d.sub.1(k) and d.sub.2(k), which exhibit comparable signal power. While this first stage is not strictly necessary, it ensures a reasonable working point for the relative processing of the signals d.sub.1(k) and d.sub.2(k), which simplifies finding suitable parameters for the following steps. It should be noted that if multiple instances of this processing block are placed after the Band splitter blocks or the Spectral shaper blocks, the power normalization still has to be applied before the Band splitter blocks.

    [0139] Normalizing the signals already reduces their relative level difference. However, this is typically not enough for the intended effect, because the power estimates are long-term, while the level variations of typical acoustic scenes are rather short-term processes. In the following, it is explained how the difference in relative power of the individual signals is explicitly reduced on a short-term basis, which constitutes the primary objective of the preprocessing block.

    [0140] The two signals d.sub.1(k) and d.sub.2(k) that are supposed to be scaled and reproduced, are also referred to as initial audio signals in the following.

    [0141] As described above, the audio preprocessor 110 may, e.g., be configured to generate for each audio source signal of the two or more audio source signals u.sub.1(k), u.sub.2(k) an initial audio signal of the two or more initial audio signals d.sub.1(k), d.sub.2(k) by modifying said audio source signal, e.g., by conducting power normalization.

    [0142] In alternative embodiments, however, the audio preprocessor 110 may, e.g., be configured to use the two or more audio source signals u.sub.1(k), u.sub.2(k) as the two or more initial audio signals d.sub.1(k), d.sub.2(k).

    [0143] In FIG. 7, the two signals d.sub.1(k) and d.sub.2(k) may, e.g., be fed to further loudness estimators, e.g., of the audio preprocessor 110, which provide the signals e.sub.1(k) and e.sub.2(k), respectively.
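
    A minimal sketch of such short-term estimators is given below, following the three rules recited in claim 7 for e.sub.1(k): recursive averaging with a forgetting factor, a sliding-window mean over K samples, and a sliding-window maximum. The function names, the default values, and the treatment of samples before the start of the signal as zero are assumptions made for illustration only.

```python
from collections import deque


def recursive_power(d, lam=0.9, e_init=0.0):
    """e(k) = lam * e(k-1) + (1 - lam) * sum_l d(k, l)^2, with 0 < lam < 1."""
    e, out = e_init, []
    for frame in d:                      # frame holds the L channel samples
        e = lam * e + (1.0 - lam) * sum(s * s for s in frame)
        out.append(e)
    return out


def windowed_mean_power(d, K):
    """e(k) = (1/K) * sum over the last K frames of sum_l d(k-n, l)^2."""
    window, out = deque(maxlen=K), []
    for frame in d:
        window.append(sum(s * s for s in frame))
        out.append(sum(window) / K)      # missing history counts as zero
    return out


def windowed_peak_power(d, K):
    """e(k) = max over the last K frames and all channels of d(k-n, l)^2."""
    window, out = deque(maxlen=K), []
    for frame in d:
        window.append(max(s * s for s in frame))
        out.append(max(window))
    return out
```

    Each function takes a list of frames, where a frame is the list of channel samples d.sub.1(k,l) at time index k, and returns one estimate per time index.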

    [0144] These signals may, e.g., be used to determine the scaling factors g.sub.1(k) and g.sub.2(k) according to


    $$g_1 = f(e_1, e_2), \tag{17}$$

    $$g_2 = f(e_2, e_1), \tag{18}$$

    [0145] where, in some embodiments, f(x,y) is a function that is monotonically increasing with respect to y and monotonically decreasing with respect to x, while its value may, for example, be limited to an absolute range.

    [0146] As a consequence, the value of f(x,y) may, e.g., also be monotonically increasing with the ratio y/x.
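
    One function with these properties is the gain rule recited in claim 6, sketched below. The level difference v is the level of the other signal relative to the own signal in dB; the first variant attenuates the own signal when it is much louder than the other one, the second variant amplifies it when the other one is louder. The function names, the default values for T.sub.1, T.sub.2 and R, and the eps guard against a logarithm of zero are assumptions for illustration.

```python
import math


def _level_difference_db(e_own, e_other, eps=1e-12):
    """v = -10*log10(e_own) + 10*log10(e_other); eps avoids log10(0)."""
    return 10.0 * math.log10(max(e_other, eps)) - 10.0 * math.log10(max(e_own, eps))


def gain_downward(e_own, e_other, T1=0.0, R=4.0):
    """Gain <= 1, applied when T1 + v < 0 (own signal dominates)."""
    if not 1.0 <= R <= 100.0:
        raise ValueError("R is expected in the range 1 <= R <= 100")
    v = _level_difference_db(e_own, e_other)
    if T1 + v < 0.0:
        return 10.0 ** ((T1 + v) * (R - 1.0) / (20.0 * R))
    return 1.0


def gain_upward(e_own, e_other, T2=0.0, R=4.0):
    """Gain >= 1, applied when T2 + v > 0 (other signal dominates)."""
    if not 1.0 <= R <= 100.0:
        raise ValueError("R is expected in the range 1 <= R <= 100")
    v = _level_difference_db(e_own, e_other)
    if T2 + v > 0.0:
        return 10.0 ** ((T2 + v) * (R - 1.0) / (20.0 * R))
    return 1.0
```

    In both variants the gain grows with the ratio e_other/e_own and stays within a limited range, since R=1 yields a gain of 1 and larger R approaches full compensation of the level difference.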

    [0147] The factors g.sub.1(k) and g.sub.2(k) are then used to scale the signals d.sub.1(k) and d.sub.2(k), respectively, to obtain the output signals h.sub.1(k) and h.sub.2(k). The output signals h.sub.1(k) and h.sub.2(k) may, e.g., be fed into one or more modules which are configured to conduct multizone reproduction, e.g., according to an arbitrary multizone reproduction method.

    [0148] Thus, in some embodiments, the audio preprocessor 110 may, e.g., be configured to modify each initial audio signal of the two or more initial audio signals depending on the signal power or the loudness of another initial audio signal of the two or more initial audio signals by modifying said initial audio signal of the two or more initial audio signals depending on a ratio of a first value (y) to a second value (x). The second value (x) may, e.g., depend on the signal power of said initial audio signal, and the first value (y) may, e.g., depend on the signal power of said another initial audio signal of the two or more initial audio signals. Or, the second value (x) may, e.g., depend on the loudness of said initial audio signal, and the first value (y) may, e.g., depend on the loudness of said another initial audio signal of the two or more initial audio signals.

    [0149] According to some embodiments, the audio preprocessor 110 may, e.g., be configured to modify each initial audio signal of the two or more initial audio signals depending on the signal power or the loudness of another initial audio signal of the two or more initial audio signals by determining a gain for said initial audio signal and by applying the gain on said initial audio signal. Moreover, the audio preprocessor 110 may, e.g., be configured to determine the gain depending on the ratio between the first value and the second value, said ratio being a ratio between the signal power of said another initial audio signal of the two or more initial audio signals and the signal power of said initial audio signal as the second value, or said ratio being a ratio between the loudness of said another initial audio signal of the two or more initial audio signals and the loudness of said initial audio signal as the second value.

    [0150] In some embodiments, the audio preprocessor 110 may, e.g., be configured to determine the gain depending on a function that monotonically increases with the ratio between the first value and the second value.

    [0151] According to some embodiments, e.g., none of the signals u.sub.1(k), d.sub.1(k), or h.sub.1(k) is mixed to any of the signals u.sub.2(k), d.sub.2(k), or h.sub.2(k).

    [0152] In the following, the implementation of the processing step is explained in more detail. Since the processing steps for u.sub.1(k) and u.sub.2(k) are identical, only the processing steps for u.sub.1(k) will be described, which are also applied to u.sub.2(k) by exchanging the indices 1 and 2.

    [0153] A rule to obtain b.sub.1(k) may, e.g., be given by


    $$b_1(k) = \lambda_1 b_1(k-1) + (1-\lambda_1) \sum_{l=1}^{L} u_1^2(k,l), \tag{19}$$

    where λ.sub.1 may, e.g., be chosen close to but less than 1.

    [0154] In the above formula, u.sub.1(k,l) is assumed to comprise one or more audio channels, indexed by l. L indicates the number of audio channels of u.sub.1(k).

    [0155] In a simple case, u.sub.1(k) comprises only a single channel and formula (19) becomes:

    $$b_1(k) = \lambda_1 b_1(k-1) + (1-\lambda_1) u_1^2(k,1). \tag{19a}$$

    [0156] λ.sub.1 may be in the range 0<λ.sub.1<1. Advantageously, λ.sub.1 may, e.g., be close to 1. For example, λ.sub.1 may, e.g., be in the range 0.9<λ.sub.1<1.

    [0157] In other cases, u.sub.1(k), for example, comprises two or more channels.

    [0158] The scaling factor c.sub.1(k) can then be determined according to

    $$c_1(k) = \frac{1}{\sqrt{b_1(k)}}, \tag{20}$$

    [0159] such that

    $$d_1(k,l) = c_1(k)\, u_1(k,l) \tag{21}$$

    [0160] describes the scaled audio signal.
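
    The recursion of formula (19) and the scaling of formulae (20) and (21) may, e.g., be sketched as follows. This is a minimal illustration only, assuming that the scaling factor is the reciprocal square root of the power estimate, so that the scaled signal approaches unit power. The function name normalize_power, the default value of lam, the zero initial estimate, and the small constant eps that guards against division by zero are assumptions made for this sketch.

```python
import math

def normalize_power(u, lam=0.99, eps=1e-12):
    """u: list of frames, each frame a list of L channel samples u_1(k, l).

    Returns (d, b): the scaled frames d_1(k, l) and the power estimates b_1(k).
    """
    b_prev = 0.0  # assumed initial value of the power estimate
    d, b = [], []
    for frame in u:
        inst = sum(x * x for x in frame)          # sum over channels of u_1^2(k, l)
        b_k = lam * b_prev + (1.0 - lam) * inst   # recursive estimate, formula (19)
        c_k = 1.0 / math.sqrt(b_k + eps)          # scaling factor, formula (20); eps is an added guard
        d.append([c_k * x for x in frame])        # scaled signal, formula (21)
        b.append(b_k)
        b_prev = b_k
    return d, b
```

    For a stationary input, the estimate converges to the input power, and the scaled signal then has a power of approximately one. With a zero initial estimate, the first few output samples are scaled up strongly until the estimate has settled.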

    [0161] A rule to obtain e.sub.1(k) may, e.g., be given by


    $$e_1(k) = \lambda_2 e_1(k-1) + (1-\lambda_2) \sum_{l=1}^{L} d_1^2(k,l). \tag{22}$$

    [0162] λ.sub.2 may be in the range 0<λ.sub.2<1.

    [0163] In embodiments, for λ.sub.1 of formula (19) and λ.sub.2 of formula (22): λ.sub.1>λ.sub.2.

    [0164] There is a variety of other options. One of them, according to an embodiment, is the mean square value of d.sub.1(k,l) in a window of K samples, given by

    $$e_1(k) = \frac{1}{K} \sum_{n=0}^{K-1} \sum_{l=1}^{L} d_1^2(k-n,l). \tag{23}$$

    [0165] Another definition, according to another embodiment, is the maximum squared value in such a window:

    $$e_1(k) = \max_{n = 0, 1, \ldots, K-1;\; l = 1, 2, \ldots, L} d_1^2(k-n,l). \tag{24}$$
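
    The two window-based estimates of formulae (23) and (24) may, e.g., be computed as in the following sketch. The function names are hypothetical, and samples before the start of the signal are assumed to be zero.

```python
def mean_square_window(d, k, K):
    """Formula (23): sum of d_1^2(k-n, l) over n = 0..K-1 and all channels l, divided by K."""
    total = 0.0
    for n in range(K):
        if k - n >= 0:  # samples before the start of the signal are treated as zero
            total += sum(x * x for x in d[k - n])
    return total / K

def max_square_window(d, k, K):
    """Formula (24): maximum of d_1^2(k-n, l) over n = 0..K-1 and all channels l."""
    best = 0.0
    for n in range(K):
        if k - n >= 0:
            best = max(best, max(x * x for x in d[k - n]))
    return best
```

    Here, d is a list of frames, each frame holding the L channel samples of one time instant.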

    [0166] According to some embodiments, to determine g.sub.1(k), the value e.sub.2(k) also has to be determined as described above. However, the actual method to determine e.sub.2(k), as well as the parameters, may differ from those chosen for e.sub.1(k) (for example, depending on the needs of the application). The actual gain g.sub.1(k) can, e.g., be determined similarly to the gain rule that would be used for a conventional audio compressor, see: [0167] https://en.wikipedia.org/wiki/Dynamic_range_compression (see [65]),

    [0168] but considering both e.sub.1(k) and e.sub.2(k).

    [0169] According to an embodiment, the gain rule of a corresponding downward compressor for the signal d.sub.1(k) would be

    $$g_1(k) = \begin{cases} 10^{\left(T_1 - 10\log_{10}(e_1(k)) + 10\log_{10}(e_2(k))\right)(R-1)/(20R)} & \text{for } T_1 - 10\log_{10}(e_1(k)) + 10\log_{10}(e_2(k)) < 0, \\ 1 & \text{otherwise,} \end{cases} \tag{25}$$

    or

    $$g_1(k) = \begin{cases} 10^{(T_1 + v)(R-1)/(20R)} & \text{for } T_1 + v < 0, \\ 1 & \text{otherwise,} \end{cases} \quad \text{with } v = -10\log_{10}(e_1(k)) + 10\log_{10}(e_2(k)), \tag{25}$$

    [0170] where T.sub.1 defines the compression threshold in dB and R the compression ratio, as used in a standard audio compressor. E.g., 1≤R≤100. For example, 1<R<100. For example, 2<R<100. E.g., 2<R<50.

    [0171] In contrast to formula (25), in either of its forms, a standard audio compressor according to the state of the art would not consider e.sub.2(k) for determining a gain for d.sub.1(k).
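
    The downward gain rule of formula (25) may, e.g., be sketched as follows. The function name and argument names are chosen for illustration only. The estimates e1 and e2 are assumed to be strictly positive.

```python
import math

def downward_gain(e1, e2, T1_db, R):
    """Gain g_1(k) of the downward rule, formula (25).

    e1, e2: power (or loudness) estimates e_1(k) and e_2(k), both > 0.
    T1_db:  compression threshold T_1 in dB.
    R:      compression ratio.
    """
    # v is the level of e2 relative to e1 in dB
    v = -10.0 * math.log10(e1) + 10.0 * math.log10(e2)
    x = T1_db + v
    if x < 0.0:
        # signal 1 exceeds signal 2 by more than T_1 dB: attenuate
        return 10.0 ** (x * (R - 1.0) / (20.0 * R))
    return 1.0
```

    For example, with T.sub.1 = 10 dB and R = 2, a signal whose estimate is 20 dB above the other one is attenuated by 5 dB, i.e., by half of the 10 dB excess over the threshold.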

    [0172] Other options are an implementation of an upward compressor defined by

    $$g_1(k) = \begin{cases} 10^{\left(T_2 - 10\log_{10}(e_1(k)) + 10\log_{10}(e_2(k))\right)(R-1)/(20R)} & \text{for } T_2 - 10\log_{10}(e_1(k)) + 10\log_{10}(e_2(k)) > 0, \\ 1 & \text{otherwise,} \end{cases} \tag{25a}$$

    or

    $$g_1(k) = \begin{cases} 10^{(T_2 + v)(R-1)/(20R)} & \text{for } T_2 + v > 0, \\ 1 & \text{otherwise,} \end{cases} \quad \text{with } v = -10\log_{10}(e_1(k)) + 10\log_{10}(e_2(k)), \tag{25a}$$

    [0173] which is similar except for the operating range (note the different condition) and different parameters. It should be noted that T.sub.2 defines a lower threshold in contrast to T.sub.1.

    [0174] Some embodiments, where T.sub.2<T.sub.1, combine both gain rules.

    [0175] In embodiments, the resulting rule to obtain g.sub.1(k) and g.sub.2(k) can be any combination of upward and downward compressors, where practical implementations will typically involve setting bounds on the considered ranges of e.sub.1(k) and e.sub.2(k).
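
    A combination of the downward rule (25) and the upward rule (25a) for T.sub.2<T.sub.1 may, e.g., look like the following sketch. The clamping limits e_min and e_max are assumed values that illustrate one possible way of bounding the considered ranges of e.sub.1(k) and e.sub.2(k); they are not taken from the embodiments.

```python
import math

def combined_gain(e1, e2, T1_db, T2_db, R, e_min=1e-6, e_max=1e6):
    """Downward rule (25) and upward rule (25a) combined, assuming T2_db < T1_db."""
    # Bound the estimates so that the logarithm stays finite (assumed limits)
    e1 = min(max(e1, e_min), e_max)
    e2 = min(max(e2, e_min), e_max)
    v = 10.0 * math.log10(e2 / e1)          # level of e2 relative to e1 in dB
    slope = (R - 1.0) / (20.0 * R)
    if T1_db + v < 0.0:                     # downward range: attenuate signal 1
        return 10.0 ** ((T1_db + v) * slope)
    if T2_db + v > 0.0:                     # upward range: amplify signal 1
        return 10.0 ** ((T2_db + v) * slope)
    return 1.0                              # in between: leave the signal unchanged
```

    Because T.sub.2<T.sub.1, the two conditions cannot hold at the same time, so the order of the two tests does not matter.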

    [0176] When more than two signals e.sub.1(k), e.sub.2(k), e.sub.3(k), . . . , e.sub.N(k), for example, N signals, are considered, formula (25) may, e.g., become:

    $$g_1(k) = \begin{cases} 10^{(T_1 + v_1)(R-1)/(20R)} & \text{for } T_1 + v_1 < 0, \\ 1 & \text{otherwise,} \end{cases} \quad \text{with } v_1 = -10\log_{10}(e_1(k)) + 10\log_{10}\left(\sum_{i=2}^{N} e_i(k)\right) \tag{25b}$$

    [0177] For other gains g.sub.2(k), g.sub.3(k), . . . , g.sub.N(k), formula (25) may, e.g., become:

    $$g_r(k) = \begin{cases} 10^{(T_1 + v_2)(R-1)/(20R)} & \text{for } T_1 + v_2 < 0, \\ 1 & \text{otherwise,} \end{cases} \quad \text{with } v_2 = -10\log_{10}(e_r(k)) + 10\log_{10}\left(-e_r(k) + \sum_{i=1}^{N} e_i(k)\right) \tag{25c}$$

    [0178] Formula (25a) may, e.g., become:

    $$g_1(k) = \begin{cases} 10^{(T_2 + v_1)(R-1)/(20R)} & \text{for } T_2 + v_1 > 0, \\ 1 & \text{otherwise,} \end{cases} \quad \text{with } v_1 = -10\log_{10}(e_1(k)) + 10\log_{10}\left(\sum_{i=2}^{N} e_i(k)\right) \tag{25b}$$

    [0179] For other gains g.sub.2(k), g.sub.3(k), . . . , g.sub.N(k), formula (25a) may, e.g., become:

    $$g_r(k) = \begin{cases} 10^{(T_2 + v_2)(R-1)/(20R)} & \text{for } T_2 + v_2 > 0, \\ 1 & \text{otherwise,} \end{cases} \quad \text{with } v_2 = -10\log_{10}(e_r(k)) + 10\log_{10}\left(-e_r(k) + \sum_{i=1}^{N} e_i(k)\right) \tag{25c}$$
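
    For N signals, the downward rule may, e.g., be evaluated for all gains at once, as in the following sketch, in which each estimate is compared with the sum of all other estimates. The function name is hypothetical, and all estimates are assumed to be strictly positive.

```python
import math

def multi_source_gains(e, T1_db, R):
    """Downward gains g_1(k)..g_N(k) for N estimates e_1(k)..e_N(k), all > 0."""
    total = sum(e)
    slope = (R - 1.0) / (20.0 * R)
    gains = []
    for e_r in e:
        others = total - e_r   # sum of all estimates except e_r
        v = -10.0 * math.log10(e_r) + 10.0 * math.log10(others)
        x = T1_db + v
        gains.append(10.0 ** (x * slope) if x < 0.0 else 1.0)
    return gains
```

    The upward variant follows in the same way by using T.sub.2 and the condition x > 0.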

    [0180] Further alternative rules can be defined to reduce the energy difference between both scenes, as given by

    $$g_1(k) = (1 - \alpha) + \alpha \sqrt{\frac{e_2(k)}{e_1(k)}}, \tag{25d}$$

    [0181] where α=1 would cause the signal h.sub.1(k) to have the same energy as the signal d.sub.2(k). On the other hand, α=0 would have no effect. A chosen parameter 0<α<1 can be used to vary the intended influence of that step.

    [0182] Another option is the use of a sigmoid function to limit the energy overshoot of h.sub.2(k) compared to d.sub.1(k):

    $$g_1(k) = \sqrt{\frac{e_2(k)}{e_1(k)}} \cdot f\left(\sqrt{\frac{e_1(k)}{e_2(k)}}\right), \tag{25e}$$

    [0183] where f(x) can be one of

    $$f(x) = \frac{x}{\sqrt{1 + x^2}}, \quad f(x) = \frac{x}{1 + |x|}, \quad f(x) = \tanh(x), \quad f(x) = \frac{2}{\pi} \arctan\left(\frac{\pi}{2} x\right),$$

    [0184] which are all limited by −1<f(x)<1 while f′(0)=1 holds.
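
    The two alternative rules may, e.g., be sketched as follows. The sketch assumes that the energy ratios enter formulae (25d) and (25e) under a square root, so that gains act on amplitudes while e.sub.1(k) and e.sub.2(k) are energies. The function names, the parameter name alpha, and the default choice of tanh as sigmoid are assumptions of this illustration.

```python
import math

def blend_gain(e1, e2, alpha):
    """Rule (25d): blend between no change (alpha = 0) and full energy matching (alpha = 1)."""
    return (1.0 - alpha) + alpha * math.sqrt(e2 / e1)

def sigmoid_gain(e1, e2, f=math.tanh):
    """Rule (25e): sqrt(e2/e1) * f(sqrt(e1/e2)) for a sigmoid f with slope 1 at the origin."""
    r = math.sqrt(e1 / e2)   # amplitude ratio of signal 1 to signal 2
    return f(r) / r          # equals sqrt(e2/e1) * f(r)
```

    Under these assumptions, the sigmoid rule leaves a comparatively weak signal almost unchanged (the gain approaches 1 for small ratios) and limits a comparatively strong signal such that its energy after scaling approaches that of the other signal.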

    [0185] In some embodiments, the audio preprocessor 110 may, e.g., be configured to modify an initial audio signal of the two or more initial audio signals depending on the signal power or the loudness of another initial audio signal of the two or more initial audio signals by determining a gain g.sub.1(k) for said initial audio signal and by applying the gain g.sub.1(k) on said initial audio signal, and the audio preprocessor 110 may, e.g., be configured to determine the gain g.sub.1(k) according to one or more of the above formulae.

    [0186] In the following, further features of preprocessing according to embodiments are described.

    [0187] According to an embodiment, the branch of the signals e.sub.1(k) and e.sub.2(k) that is fed to the respectively opposite side may, e.g., be filtered through a filter describing the actual acoustic coupling of the two zones.

    [0188] Moreover, according to an embodiment, the power estimators may, e.g., operate on signals that have been processed by a weighting filter, for example, that have been processed by a weighting filter described in: [0189] https://en.wikipedia.org/wiki/Weighting_filter (see [66]).

    [0190] According to an embodiment, the power estimators may, e.g., be replaced by loudness estimators as, e.g., described by ITU-R Recommendation BS.1770-4. This will allow for an improved reproduction quality because the perceived loudness is better matched by this model.

    [0191] Furthermore, according to an embodiment, a level threshold may, e.g., be used to exclude silence from being taken into account for the estimates b.sub.1(k) and b.sub.2(k) in the absolute power normalization.
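
    A level threshold of this kind may, e.g., be realized by freezing the recursive estimate whenever the instantaneous power falls below the threshold, as in the following sketch. The function name, the threshold value, and the zero initial estimate are assumptions made for illustration.

```python
def gated_power_estimate(u, lam=0.99, threshold=1e-4):
    """Recursive power estimate that is only updated for frames above a level threshold.

    u: list of frames, each frame a list of channel samples.
    Returns the list of estimates b(k), one per frame.
    """
    b_prev = 0.0
    b = []
    for frame in u:
        inst = sum(x * x for x in frame)   # instantaneous power of the frame
        if inst > threshold:               # silence is excluded from the estimate
            b_prev = lam * b_prev + (1.0 - lam) * inst
        b.append(b_prev)
    return b
```

    With this gating, pauses in the input do not pull the estimate towards zero, so the normalization gain does not rise during silence.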

    [0192] Moreover, in an embodiment, a positive time-derivative of the separately estimated power can be used as an indicator for activity of the input signals u.sub.1(k) and u.sub.2(k). The estimates b.sub.1(k) and b.sub.2(k) are then only updated when activity is detected.

    [0193] In the following, a band splitter according to embodiments is described. In particular, an implementation of the block denoted by Band splitter shown in FIG. 7 is presented. In an embodiment, this block may, e.g., be realized as a digital audio crossover, for example, as a digital audio crossover as described in: [0194] https://en.wikipedia.org/wiki/Audio_crossover#Digital (see [67]).

    [0195] The desired frequency response of the input to output paths may, e.g., be a band pass with a flat frequency response in the pass band and a high attenuation in the stop band. The borders of pass bands and stop bands are chosen depending on the frequency range in which the reproduction measures connected to individual outputs can achieve a sufficient acoustic contrast between the respective sound zones.

    [0196] FIG. 9 illustrates an exemplary design of the one or more band splitters according to embodiments, wherein (a) illustrates acoustic contrast achieved by different reproduction methods, and wherein (b) illustrates a chosen magnitude response of the audio crossover. In particular, FIG. 9 illustrates an exemplary design of the filter magnitude response in relation to the achieved acoustic contrast.

    [0197] As can be seen from FIG. 9, the spectral shaper may, e.g., be configured to modify a spectral envelope of an audio signal depending on the acoustic contrast.

    [0198] Various concepts may be employed to realize the actual implementation of the one or more band splitters. For example, some embodiments employ FIR filters, other embodiments employ an IIR filter, and further embodiments employ analog filters. Any possible concept for realizing band splitters may be employed, for example any concept that is presented in general literature on that topic.

    [0199] Some of the embodiments may, for example, comprise a spectral shaper for conducting spectral shaping. When spectral shaping is conducted on an audio signal, the spectral envelope of that audio signal may, e.g., be modified and a spectrally-shaped audio signal may, e.g., be obtained.

    [0200] In the following, a spectral shaper according to embodiments is described, in particular, a spectral shaper as illustrated in FIG. 7. Spectral shapers constitute filters that exhibit frequency responses similar to those known for equalizers, such as combinations of first-order or second-order filters, see: [0201] https://en.wikipedia.org/wiki/Equalization_(audio)#Filter_functions (see [68]).

    [0202] However, the eventual frequency responses of spectral filters are designed in a completely different way compared to equalizers: Spectral filters consider the maximum spectral distortion that will be accepted by the listener, and the spectral filters are designed such that they attenuate those frequencies which are known to produce acoustic leakage.

    [0203] The rationale behind this is that human perception is differently sensitive to spectral distortions of acoustic scenes at certain frequencies, depending on the excitation of the surrounding frequencies and depending on whether the distortion is an attenuation or an amplification.

    [0204] For example, if a notch filter with a small bandwidth is applied to a broadband audio signal, the listeners will only perceive a small difference, if any. However, if a peak filter with the same bandwidth is applied to the same signal, the listeners will most likely perceive a considerable difference.

    [0205] Embodiments are based on the finding that this fact can be exploited because a band-limited breakdown in acoustic contrast results in a peak in acoustic leakage (see FIG. 5). If the acoustic scene reproduced in the bright zone is filtered by an according notch filter, it will most likely not be perceived by the listeners in this zone. On the other hand, the peak of acoustic leakage that is perceived in the dark zone will be compensated by this measure.

    [0206] An example of the corresponding filter response is shown in FIG. 10. In particular, FIG. 10 illustrates an exemplary design of the spectral shapers according to embodiments, wherein (a) illustrates acoustic contrast achieved by a specific reproduction method, and wherein (b) illustrates a chosen magnitude response of the spectral shaping filter.

    [0207] As outlined above, the filter 140 is configured to generate the plurality of loudspeaker signals depending on in which of the two or more sound zones the two or more audio source signals shall be reproduced and depending on in which of the two or more sound zones the two or more audio source signals shall not be reproduced.

    [0208] In the following, a filter 140, e.g., a prefilter, according to embodiments is described.

    [0209] In an embodiment, for example, one or more audio source signals shall be reproduced in a first sound zone, but not in a second sound zone and at least one further audio source signal shall be reproduced in the second sound zone but not in the first sound zone.

    [0210] See, for example, FIG. 2 and FIG. 3, where a first audio source signal u.sub.1(k) shall be reproduced in sound zone 1, but not in sound zone 2, and where a second audio source signal u.sub.2(k) shall be reproduced in sound zone 2, but not in sound zone 1.

    [0211] As each of the two or more preprocessed audio signals h.sub.1(k), h.sub.2(k) has been generated based on one of the two or more audio source signals u.sub.1(k), u.sub.2(k), it follows that in such an embodiment, one or more preprocessed audio signals h.sub.1(k) shall be reproduced in the sound zone 1, but not in the sound zone 2 (namely those one or more preprocessed audio signals h.sub.1(k) that have been generated by modifying the one or more sound source signals u.sub.1(k) that shall be reproduced in the sound zone 1, but not in the sound zone 2). Moreover, it follows that at least one further preprocessed audio signal h.sub.2(k) shall be reproduced in the sound zone 2, but not in the sound zone 1 (namely those one or more preprocessed audio signals h.sub.2(k) that have been generated by modifying the one or more sound source signals u.sub.2(k) that shall be reproduced in the sound zone 2, but not in the sound zone 1).

    [0212] Suitable means may be employed that achieve that an audio source signal is reproduced in a first sound zone but not in a second sound zone, or that at least achieve that the audio source signal is reproduced in the first sound zone with a greater loudness than in the second sound zone (and/or that at least achieve that the audio source signal is reproduced in the first sound zone with a greater signal energy than in the second sound zone).

    [0213] For example, a filter 140 may be employed, and the filter coefficients may, e.g., be chosen such that a first audio source signal that shall be reproduced in the first sound zone, but not in the second sound zone is reproduced in the first sound zone with a greater loudness (and/or with a greater signal energy) than in the second sound zone. Moreover, the filter coefficients may, e.g., be chosen such that a second audio source signal that shall be reproduced in the second sound zone, but not in the first sound zone is reproduced in the second sound zone with a greater loudness (and/or with a greater signal energy) than in the first sound zone.

    [0214] For example, an FIR filter (finite impulse response filter) may, e.g., be employed and the filter coefficients may, e.g., be suitably chosen, for example, as described below.

    [0215] Or, Wave Field Synthesis (WFS), well-known in the art of audio processing, may, e.g., be employed (for general information on Wave Field Synthesis, see, for example, as one of many examples [69]).

    [0216] Or, Higher-Order Ambisonics, well-known in the art of audio processing, may, e.g., be employed (for general information on Higher-Order Ambisonics, see, for example, as one of many examples [70]).

    [0217] Now, a filter 140 according to some particular embodiments is described in more detail.

    [0218] In particular, an implementation of the block denoted by G.sub.1(k) and G.sub.2(k) shown in FIG. 7 is presented. A prefilter may, e.g., be associated with an array of loudspeakers. A set of multiple loudspeakers is considered as a loudspeaker array, whenever a prefilter feeds at least one input signal to multiple loudspeakers that are primarily excited in the same frequency range. It is possible that an individual loudspeaker is part of multiple arrays and that multiple input signals are fed to one array, which are then radiated towards different directions.

    [0219] There are different well-known methods to determine linear prefilters such that an array of non-directional loudspeakers will exhibit a directional radiation pattern, see, e.g., [1], [3], [4], [5] and [6].

    [0220] Some embodiments realize a pressure matching approach based on measured impulse responses. Some of those embodiments, which employ such an approach, are described in the following, where only a single loudspeaker array is considered. Other embodiments use multiple loudspeaker arrays. The application to multiple loudspeaker arrays is straightforward.

    [0221] For the description of these embodiments, a notation is used that is more suitable to obtain FIR filters compared to the notation above, which would also cover IIR filters. To this end, the filter coefficients g.sub.l,q(k) are captured in the vectors

    $$g_q = \left(g_{q,1}(0), \ldots, g_{q,1}(L_G-1), g_{q,2}(0), \ldots, g_{q,2}(L_G-1), \ldots, g_{q,N_L}(0), \ldots, g_{q,N_L}(L_G-1)\right)^T. \tag{26}$$

    [0222] For the optimization, the convolved impulse response of the prefilters and the room impulse response (RIR) may be considered, which is given by

    $$z_m(k) = \sum_{l=1}^{N_L} \sum_{n=0}^{L_G-1} h_{m,l}(n)\, g_l(k-n), \tag{27}$$

    [0223] where g.sub.l(k) and h.sub.m,l(k) are assumed to be zero for k<0 and for k≥L.sub.G or k≥L.sub.H, respectively.

    [0224] As a result, the overall impulse responses z.sub.m(k) have a length of L.sub.G+L.sub.H−1 samples and can be captured by the vector

    $$z = \left(z_1(0), z_1(1), \ldots, z_1(L_G+L_H-2), z_2(0), z_2(1), \ldots, z_2(L_G+L_H-2), \ldots, z_{N_M}(0), z_{N_M}(1), \ldots, z_{N_M}(L_G+L_H-2)\right)^T. \tag{28}$$

    [0225] Now, it is possible to define the convolution matrix H, such that


    $$\hat{z} = H g \tag{29}$$

    [0226] describes the same convolution as Equation (27) does. For the optimization, the desired impulse response d.sub.m,q(k) can be defined according to the needs of the application.

    [0227] A way to define d.sub.m,q(k) is to consider each loudspeaker as potential source to be reproduced with its original sound field in the bright zone but no radiation to the dark zone. This is described by

    $$d_{m,q}(k) = \begin{cases} h_{m,q}(k - \Delta k) & \text{if } h_{m,q}(k) \text{ belongs to } B_q(k), \\ 0 & \text{if } h_{m,q}(k) \text{ belongs to } D_q(k), \end{cases} \tag{30}$$

    [0228] where the delay Δk is used to ensure causality. A perfect reproduction is described by

    $$d_q = H g_q \tag{31}$$

    [0229] but will typically not be possible due to physical constraints. It should be noted that this definition is just one among many, which has some practical merit due to its simplicity, while other definitions may be more suitable, depending on the application scenario.

    [0230] Now, the least-squares reproduction error can be defined as:

    $$E_q = (\hat{z} - d_q)^H W_q^H W_q (\hat{z} - d_q) \qquad (32)$$

    $$\phantom{E_q} = (g^H H^H - d_q^H)\, W_q^H W_q\, (H g - d_q), \qquad (33)$$

    [0231] where W.sub.q is a matrix that can be chosen such that a frequency-dependent weighting and/or a position-dependent weighting is achieved.

    [0232] When deriving B.sub.q and D.sub.q from B.sub.q(k) and D.sub.q(k), respectively, in the same way as H was derived from H(k), Equation (14) can be represented by

    $$C_q = \frac{g_q^H B_q^H B_q\, g_q}{g_q^H D_q^H D_q\, g_q}. \tag{34}$$

    [0233] It should be noted that maximizing Equation (34) can be solved as a generalized eigenvalue problem [3].
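
    A possible way to carry out this maximization numerically is sketched below. The sketch assumes that NumPy is available, that B.sub.q and D.sub.q are given as matrices B and D, and that D.sup.H D is made positive definite by a small regularization term reg. The function names and the regularization are assumptions of this illustration and are not part of the embodiments.

```python
import numpy as np

def max_contrast_filter(B, D, reg=1e-9):
    """Return a vector g maximizing (g^H B^H B g) / (g^H D^H D g), cf. Equation (34)."""
    A = B.conj().T @ B
    M = D.conj().T @ D + reg * np.eye(D.shape[1])   # regularized to keep M positive definite
    # Reduce the generalized problem A g = lambda M g to a standard Hermitian one
    Lc = np.linalg.cholesky(M)                      # M = Lc Lc^H
    Linv = np.linalg.inv(Lc)
    S = Linv @ A @ Linv.conj().T
    w, V = np.linalg.eigh(S)                        # eigenvalues in ascending order
    return Linv.conj().T @ V[:, -1]                 # eigenvector of the largest eigenvalue

def contrast(g, B, D):
    """Ratio of the energy in the bright zone to the energy in the dark zone."""
    return np.linalg.norm(B @ g) ** 2 / np.linalg.norm(D @ g) ** 2

rng = np.random.default_rng(1)
B = rng.standard_normal((10, 4))
D = rng.standard_normal((10, 4))
g_opt = max_contrast_filter(B, D)
c_opt = contrast(g_opt, B, D)
```

    The largest generalized eigenvalue equals the achievable contrast (up to the effect of the regularization), and any scaled version of the returned vector achieves the same contrast.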

    [0234] The error E.sub.q can be minimized by determining the complex gradient of Equation (33) and setting it to zero [7]. The complex gradient of Equation (33) is given by

    $$\frac{\partial E_q}{\partial g_q^H} = H^H W_q^H W_q H g_q - H^H W_q^H W_q d_q, \tag{35}$$

    resulting in

    $$g_q = \left(H^H W_q^H W_q H\right)^{-1} H^H W_q^H W_q\, d_q \tag{36}$$

    [0235] as the least-squares optimal solution.

    [0236] Although many algorithms are formulated for non-weighted least squares, they can be used to implement weighted least squares by simply replacing H and d.sub.q with W.sub.qH and W.sub.qd.sub.q, respectively.
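
    This substitution may, e.g., be illustrated with a generic least-squares solver. The following sketch assumes that NumPy is available and uses small random matrices as stand-ins for H, W.sub.q and d.sub.q; the function names are hypothetical. It compares the solution obtained from the unweighted solver applied to W.sub.qH and W.sub.qd.sub.q with a direct evaluation of formula (36).

```python
import numpy as np

def weighted_least_squares(H, d, W=None):
    """Minimize ||W (H g - d)||^2 by passing W H and W d to an unweighted solver."""
    if W is None:
        W = np.eye(H.shape[0])
    g, *_ = np.linalg.lstsq(W @ H, W @ d, rcond=None)
    return g

def normal_equations(H, d, W):
    """Direct evaluation of formula (36), shown for comparison only."""
    WH = W @ H
    return np.linalg.solve(WH.conj().T @ WH, WH.conj().T @ (W @ d))

# Random stand-in data (not measured impulse responses)
rng = np.random.default_rng(0)
H = rng.standard_normal((12, 4))
d = rng.standard_normal(12)
W = np.diag(rng.uniform(0.5, 2.0, size=12))

g_lstsq = weighted_least_squares(H, d, W)
g_direct = normal_equations(H, d, W)
```

    Both results agree up to numerical precision. For large problems, a solver that avoids forming the matrix product explicitly is typically preferable to the direct evaluation.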

    [0237] The weighting matrix W.sub.q is in general a convolution matrix similar to H defined by (26) to (29).

    [0238] The matrix H consists of several submatrices H.sub.m,l:

    $$H = \begin{pmatrix} H_{1,1} & H_{1,2} & \cdots & H_{1,N_L} \\ H_{2,1} & H_{2,2} & \cdots & H_{2,N_L} \\ \vdots & \vdots & \ddots & \vdots \\ H_{N_M,1} & H_{N_M,2} & \cdots & H_{N_M,N_L} \end{pmatrix} \tag{36a}$$

    [0239] An example for H.sub.m,l can be given assuming

    $$h_{1,1}(0) = 5, \quad h_{1,1}(1) = 4, \quad h_{1,1}(2) = 3, \quad h_{1,1}(3) = 2, \quad h_{1,1}(4) = 1, \tag{36b}$$

    where

    $$H_{1,1} = \begin{pmatrix} 5 & 0 & 0 & 0 \\ 4 & 5 & 0 & 0 \\ 3 & 4 & 5 & 0 \\ 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{36c}$$

    [0240] From that scheme it is clear to the expert how (27) and (29) define the structure of H.
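
    The structure of such a submatrix may, e.g., be generated as in the following sketch, which builds the convolution matrix for the example impulse response 5, 4, 3, 2, 1 and a prefilter length L.sub.G=4 and checks that the matrix-vector product equals a direct convolution. The helper names and the test vector are chosen for illustration only.

```python
def convolution_matrix(h, L_G):
    """Return the (L_H + L_G - 1) x L_G matrix whose product with g equals the convolution of h and g."""
    L_H = len(h)
    rows = L_H + L_G - 1
    return [[h[i - j] if 0 <= i - j < L_H else 0 for j in range(L_G)]
            for i in range(rows)]

def matvec(H, g):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, g)) for row in H]

def convolve(h, g):
    """Direct linear convolution of two finite sequences."""
    out = [0] * (len(h) + len(g) - 1)
    for i, a in enumerate(h):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

H11 = convolution_matrix([5, 4, 3, 2, 1], 4)   # example impulse response and L_G = 4
g = [1, 2, 3, 4]                               # arbitrary prefilter coefficients
z = matvec(H11, g)
```

    The full matrix H is obtained by arranging one such block per microphone and loudspeaker pair.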

    [0241] To facilitate a frequency-dependent and microphone-dependent weighting through W.sub.q, the impulse responses w.sub.m,q(k) may, e.g., be designed according to well-known filter design methods. Here, w.sub.m,q(k) defines the weight for source q and microphone m. Unlike H, W.sub.q is a block-diagonal matrix:

    $$W_q = \begin{pmatrix} W_{1,q} & 0 & \cdots & 0 \\ 0 & W_{2,q} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W_{N_M,q} \end{pmatrix} \tag{36d}$$

    [0242] where W.sub.m,q is structured like H.sub.m,l.

    [0243] Regarding the computation of the filter coefficients, it should be noted that although (36) gives the filter coefficients explicitly, its computation is very demanding in practice. Due to the similarity of this problem to the problem solved for listening room equalization, the methods used there can also be applied.

    [0244] Hence, a very efficient algorithm to compute (36) is described in [71]: SCHNEIDER, Martin; KELLERMANN, Walter. Iterative DFT-domain inverse filter determination for adaptive listening room equalization. In: Acoustic Signal Enhancement; Proceedings of IWAENC 2012; International Workshop on. VDE, 2012, pp. 1-4.

    [0245] In the following, a loudspeaker-enclosure-microphone system (LEMS) according to embodiments is described. In particular, the design of an LEMS according to embodiments is discussed. In some embodiments, the measures described above may, e.g., rely on the distinct properties of the LEMS.

    [0246] FIG. 11 illustrates an exemplary loudspeaker setup in an enclosure according to an embodiment. In particular, FIG. 11 shows an exemplary LEMS with four sound zones. An individual acoustic scene should be replayed in each of those sound zones. To this end, the loudspeakers shown in FIG. 11 are used in specific ways, depending on their position relative to each other and relative to the sound zones.

    [0247] The two loudspeaker arrays denoted by Array 1 and Array 2 are used in conjunction with accordingly determined prefilters (see above). In this way, it is possible to electrically steer the radiation of those arrays towards Zone 1 and Zone 2. Assuming that both arrays exhibit an inter-loudspeaker distance of a few centimeters while the arrays exhibit an aperture size of a few decimeters, effective steering is possible for midrange frequencies.

    [0248] Although it is not obvious, the omni-directional loudspeakers LS 1, LS 2, LS 3, and LS 4, which may, e.g., be located 1 to 3 meters distant from each other, can also be driven as a loudspeaker array when considering frequencies below, e.g., 300 Hz. Corresponding prefilters can be determined using the method described above.

    [0249] The loudspeakers LS 5 and LS 6 are directional loudspeakers that provide high-frequency audio to Zones 3 and 4, respectively.

    [0250] As described above, measures for directional reproduction may sometimes not lead to sufficient results for the whole audible frequency range. To compensate for this issue, there may, for example, be loudspeakers located in the close vicinity or within the respective sound zones. Although this positioning is suboptimal with respect to the perceived sound quality, the difference in distance of the loudspeakers to the zone assigned compared to the distance to the other zones allows for a spatially focused reproduction, independent of frequency. Thus, these loudspeakers may, e.g., be used in frequency ranges where the other methods do not lead to satisfying results.

    [0251] In the following, further aspects according to some of the embodiments are described:

    [0252] In some of the embodiments, the Preprocessing block is placed after the Band splitter blocks or after the Spectral shaper blocks. In that case, one preprocessing block may, e.g., be implemented for each of the split frequency bands. In the example shown in FIG. 7, one Preprocessing block would consider w.sub.1(k) and w.sub.4(k) and another w.sub.2(k) and w.sub.3(k). Still, one aspect of the preprocessing has to remain at the old position, as described above, where preprocessing is described.

    [0253] Since the acoustic leakage depends on the reproduction method which is chosen differently for each frequency band, such an implementation has the advantage that the preprocessing parameters can be matched to the demands of the reproduction method. Moreover, when choosing such an implementation, compensating for the leakage in one frequency band will not affect another frequency band. Since the Preprocessing block is not an LTI system, this exchange implies a change in the functionality of the overall system, even though the resulting system will still reliably solve the same problem.

    [0254] Additionally, it should be noted that some of the embodiments may use a measuring of the impulse responses from all loudspeakers to multiple microphones prior to operation. Hence, no microphones may be used during operation.

    [0255] The proposed method is generally suitable for any multizone reproduction scenario, for example, in-car scenarios.

    [0256] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.

    [0257] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software or at least partially in hardware or at least partially in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.

    [0258] Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.

    [0259] Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.

    [0260] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.

    [0261] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.

    [0262] A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium is typically tangible and/or non-transitory.

    [0263] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.

    [0264] A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.

    [0265] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

    [0266] A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.

    [0267] In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.

    [0268] The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

    [0269] The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

    [0270] While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.

    REFERENCES

    [0271] [1] W. Druyvesteyn and J. Garas, "Personal sound," Journal of the Audio Engineering Society, vol. 45, no. 9, pp. 685-701, 1997.

    [0272] [2] F. Dowla and ? Spiridon, "Spotforming with an array of ultra-wideband radio transmitters," in Ultra Wideband Systems and Technologies, 2003 IEEE Conference on, November 2003, pp. 172-175.

    [0273] [3] J. W. Choi and Y. H. Kim, "Generation of an acoustically bright zone with an illuminated region using multiple sources," Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1695-1700, 2002.

    [0274] [4] M. Poletti, "An investigation of 2-d multizone surround sound systems," in Audio Engineering Society Convention 125, October 2008. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=14703

    [0275] [5] Y. Wu and T. Abhayapala, "Spatial multizone soundfield reproduction," in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on, April 2009, pp. 93-96.

    [0276] [6] Y. J. Wu and T. D. Abhayapala, "Spatial multizone soundfield reproduction: Theory and design," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 19, no. 6, pp. 1711-1720, 2011.

    [0277] [7] D. Brandwood, "A complex gradient operator and its application in adaptive array theory," Microwaves, Optics and Antennas, IEE Proceedings H, vol. 130, no. 1, pp. 11-16, February 1983.

    [0278] [8] US 2005/0152562 A1.

    [0279] [9] US 2013/170668 A1.

    [0280] [10] US 2008/0071400 A1.

    [0281] [11] US 2006/0034470 A1.

    [0282] [12] US 2011/0222695 A1.

    [0283] [13] US 2009/0232320 A1.

    [0284] [14] US 2015/0256933 A1.

    [0285] [15] U.S. Pat. No. 6,674,865 B1.

    [0286] [16] DE 30 45 722 A1.

    [0287] [17] US 2012/0140945 A1.

    [0288] [18] US 2008/0273713 A1.

    [0289] [19] US 2004/0105550 A1.

    [0290] [20] US 2006/0262935 A1.

    [0291] [21] US 2005/0190935 A1.

    [0292] [22] US 2008/0130922 A1.

    [0293] [23] US 2010/0329488 A1.

    [0294] [24] DE 10 2014 210 105 A1.

    [0295] [25] US 2011/0286614 A1.

    [0296] [26] US 2007/0053532 A1.

    [0297] [27] US 2013/0230175 A1.

    [0298] [28] WO 2016/008621 A1.

    [0299] [29] US 2008/0273712 A1.

    [0300] [30] U.S. Pat. No. 5,870,484.

    [0301] [31] U.S. Pat. No. 5,309,153.

    [0302] [32] US 2006/0034467 A1.

    [0303] [33] US 2003/0103636 A1.

    [0304] [34] US 2003/0142842 A1.

    [0305] [35] JP 5345549.

    [0306] [36] US 2014/0056431 A1.

    [0307] [37] US 2014/0064526 A1.

    [0308] [38] US 2005/0069148 A1.

    [0309] [39] U.S. Pat. No. 5,081,682.

    [0310] [40] DE 90 15 454.

    [0311] [41] U.S. Pat. No. 5,550,922.

    [0312] [42] U.S. Pat. No. 5,434,922.

    [0313] [43] U.S. Pat. No. 6,073,670.

    [0314] [44] U.S. Pat. No. 6,674,865 B

    [0315] [45] DE 100 52 104 A1.

    [0316] [46] US 2005/0135635 A1.

    [0317] [47] DE 102 42 558 A1.

    [0318] [48] US 2010/0046765 A1.

    [0319] [49] DE 10 2010 040 639.

    [0320] [50] US 2008/0103615 A1.

    [0321] [51] U.S. Pat. No. 8,190,438 B1.

    [0322] [52] WO 2007/098916 A1.

    [0323] [53] US 2007/0274546 A1.

    [0324] [54] US 2007/0286426 A1.

    [0325] [55] U.S. Pat. No. 5,013,205.

    [0326] [56] U.S. Pat. No. 4,944,018.

    [0327] [57] DE 103 51 145 A1.

    [0328] [58] JP 2003-255954.

    [0329] [59] U.S. Pat. No. 4,977,600.

    [0330] [60] U.S. Pat. No. 5,416,846.

    [0331] [61] US 2007/0030976 A1.

    [0332] [62] JP 2004-363696.

    [0333] [63] Wikipedia: Angular resolution, https://en.wikipedia.org/wiki/Angular_resolution, retrieved from the Internet on 8 Apr. 2016.

    [0334] [64] Wikipedia: Nyquist-Shannon sampling theorem, https://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem, retrieved from the Internet on 8 Apr. 2016.

    [0335] [65] Wikipedia: Dynamic range compression, https://en.wikipedia.org/wiki/Dynamic_range_compression, retrieved from the Internet on 8 Apr. 2016.

    [0336] [66] Wikipedia: Weighting filter, https://en.wikipedia.org/wiki/Weighting_filter, retrieved from the Internet on 8 Apr. 2016.

    [0337] [67] Wikipedia: Audio crossover#Digital, https://en.wikipedia.org/wiki/Audio_crossover#Digital, retrieved from the Internet on 8 Apr. 2016.

    [0338] [68] Wikipedia: Equalization (audio)#Filter functions, https://en.wikipedia.org/wiki/Equalization_(audio)#Filter_functions, retrieved from the Internet on 8 Apr. 2016.

    [0339] [69] WO 2004/114725 A1.

    [0340] [70] EP 2 450 880 A1.

    [0341] [71] SCHNEIDER, Martin; KELLERMANN, Walter: Iterative DFT-domain inverse filter determination for adaptive listening room equalization. In: Acoustic Signal Enhancement; Proceedings of IWAENC 2012; International Workshop on. VDE, 2012, pp. 1-4.