Method and device for generating an elevated sound impression

Abstract

A sound field device is disclosed that comprises an elevation cue estimator, a low-frequency filter estimator, and a high-frequency filter estimator. The elevation cue estimator is configured to estimate an elevation cue of a head-related transfer function (HRTF) of at least one listener. The low-frequency filter estimator is configured to estimate one or more low-frequency filter elements based on the elevation cue. The high-frequency filter estimator is configured to estimate one or more high-frequency filter elements based on the elevation cue. An estimation method of the low-frequency filter estimator is different from an estimation method of the high-frequency filter estimator. The one or more low-frequency filter elements and the one or more high-frequency filter elements are for driving an array of loudspeakers to generate an elevated sound impression at a bright zone.

Claims

1. A sound field device, comprising: an elevation cue estimator configured to estimate an elevation cue of a head-related transfer function (HRTF) of at least one listener; a low-frequency filter estimator configured to estimate, according to a first estimation technique, one or more low-frequency filter elements based on the elevation cue; and a high-frequency filter estimator configured to estimate, according to a second estimation technique, one or more high-frequency filter elements based on the elevation cue, the first estimation technique being different from the second estimation technique; wherein: the one or more low-frequency filter elements and the one or more high-frequency filter elements are for driving an array of loudspeakers to generate an elevated sound impression at a bright zone; and each of the low-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers and each of the high-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers.

2. The sound field device of claim 1, wherein the low-frequency filter estimator comprises an optimizer configured to determine the one or more low-frequency filter elements by optimizing an error measure between a desired sound field at one or more control points of the bright zone, weighted by the elevation cue, and an estimate of a transfer function that represents a channel from the array of loudspeakers to the one or more control points of the bright zone.

3. The sound field device of claim 2, wherein the optimizer is configured to determine the one or more low-frequency filter elements u(k) as:
$\min_{u(k)} \| H_b(k) u(k) - \mathrm{HRTF}_{el}(\theta, k) P_d \|^2$ subject to $\| u(k) \|^2 \le N_1$ and $\| H_j(k) u(k) \|^2 \le N_j$, where $N_j = \alpha M_1 \| P_d \mathrm{HRTF}_{el}(\theta, k) \|^2 / M_j$ for $j \ge 2$, $N_1$ is a predetermined parameter, $H_b(k)$ is an acoustic transfer function matrix from the array of loudspeakers to the one or more bright zone control points inside the bright zone, $H_j(k)$ is an acoustic transfer function matrix from the array of loudspeakers to one or more quiet zone control points inside at least one quiet zone, $P_d$ is a desired sound field for the one or more control points, $M_1$ is a number of control points within the bright zone and $M_j$ is a number of control points within a j-th quiet zone, wherein $j \ge 2$.

4. The sound field device of claim 2, wherein the low-frequency filter estimator is configured to estimate the transfer function to the one or more control points by evaluating one or more of the following: one or more three-dimensional (3D) Green's functions with free-field assumption; and one or more measurements of a room impulse response.

5. The sound field device of claim 1, wherein the high-frequency filter estimator comprises: a loudspeaker selection unit configured to select one or more active loudspeakers such that locations of the one or more active loudspeakers overlap with a projection of the bright zone on the array of loudspeakers; and a loudspeaker weight assigning unit configured to assign one or more frequency-dependent weights to the one or more active loudspeakers.

6. The sound field device of claim 5, wherein the loudspeaker weight assigning unit is configured to assign weights of $\sqrt{N_1/P}\, \mathrm{HRTF}_{el}(\theta, k)$ to the one or more active loudspeakers, wherein P is a number of active loudspeakers and $N_1$ is a predetermined parameter.

7. The sound field device of claim 1, wherein a cutoff frequency between the one or more low-frequency filter elements and the one or more high-frequency filter elements is chosen as $(Q-1)c/(4\pi r)$, wherein Q is a number of loudspeakers in the array of loudspeakers, r is a radius of the bright zone, and c is a speed of sound.

8. The sound field device of claim 1, wherein the elevation cue estimator is configured to estimate the elevation cue independent of an azimuth angle of a source relative to the bright zone.

9. The sound field device of claim 1, wherein the elevation cue estimator is configured to compute the elevation cue according to: $\mathrm{HRTF}_{el}(\theta, \phi, k) = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{HRTF}_i(\theta, 0, k)}{\mathrm{HRTF}_i(\theta_s, 0, k)}$ wherein $\mathrm{HRTF}_i(\theta, 0, k)$ is an HRTF of an i-th person.

10. An audio system, comprising: a detector configured to determine an elevation of a virtual sound source relative to a listener; a sound field device configured to determine a plurality of filter elements based on the determined elevation of the virtual sound source; a signal generator configured to generate a driving signal weighted with the determined plurality of filter elements; and an array of loudspeakers.

11. The audio system of claim 10, wherein the array of loudspeakers is arranged in a horizontal plane.

12. The audio system of claim 10, wherein: the plurality of filter elements comprise one or more low-frequency filter elements and one or more high-frequency filter elements, the one or more low-frequency filter elements and the one or more high-frequency filter elements are for driving the array of loudspeakers to generate an elevated sound impression at a bright zone; the sound field device comprises: a low-frequency filter estimator configured to estimate, according to a first estimation technique, one or more low-frequency filter elements based on an estimated elevation cue of a head-related transfer function (HRTF) of at least one listener; and a high-frequency filter estimator configured to estimate, according to a second estimation technique, one or more high-frequency filter elements based on the estimated elevation cue, the first estimation technique being different from the second estimation technique.

13. The audio system of claim 12, wherein the high-frequency filter estimator comprises: a loudspeaker selection unit configured to select one or more active loudspeakers such that locations of the one or more active loudspeakers overlap with a projection of the bright zone on the array of loudspeakers; and a loudspeaker weight assigning unit configured to assign one or more frequency-dependent weights to the one or more active loudspeakers.

14. A method, comprising: estimating an elevation cue of a head-related transfer function (HRTF) of at least one listener; estimating, using a first estimation method, one or more low-frequency filter elements based on the elevation cue; and estimating, using a second estimation method that is different from the first estimation method, one or more high-frequency filter elements based on the elevation cue, the one or more low-frequency filter elements and the one or more high-frequency filter elements for driving an array of loudspeakers to generate an elevated sound impression at a bright zone, each of the low-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers and each of the high-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers.

15. The method of claim 14, wherein the method is performed for a plurality of source signals and a plurality of bright zones.

16. The method of claim 14, wherein estimating the one or more low-frequency filter elements comprises determining the one or more low-frequency filter elements by optimizing an error measure between a desired sound field at one or more control points of the bright zone, weighted by the elevation cue, and an estimate of a transfer function that represents a channel from the array of loudspeakers to the one or more control points of the bright zone.

17. A non-transitory computer-readable storage medium storing program code, the program code comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: estimating an elevation cue of a head-related transfer function (HRTF) of at least one listener; estimating, using a first estimation method, one or more low-frequency filter elements based on the elevation cue; and estimating, using a second estimation method that is different from the first estimation method, one or more high-frequency filter elements based on the elevation cue, the one or more low-frequency filter elements and the one or more high-frequency filter elements for driving an array of loudspeakers to generate an elevated sound impression at a bright zone, each of the low-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers and each of the high-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers.

18. The non-transitory computer-readable storage medium of claim 17, wherein the operations are performed for a plurality of source signals and a plurality of bright zones.

19. The non-transitory computer-readable storage medium of claim 17, wherein estimating the one or more low-frequency filter elements comprises determining the one or more low-frequency filter elements by optimizing an error measure between a desired sound field at one or more control points of the bright zone, weighted by the elevation cue, and an estimate of a transfer function that represents a channel from the array of loudspeakers to the one or more control points of the bright zone.

20. The non-transitory computer-readable storage medium of claim 19, wherein the operations further comprise determining the estimate of the transfer function that represents the channel from the array of loudspeakers to the one or more control points of the bright zone by evaluating one or more of the following: one or more three-dimensional (3D) Green's functions with free-field assumption; and one or more measurements of a room impulse response.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) To illustrate the technical features of embodiments of the present application more clearly, the accompanying drawings provided for describing the embodiments are introduced briefly in the following. The accompanying drawings in the following description are merely some embodiments of the present application, but modifications on these embodiments are possible without departing from the scope of the present application as defined in the claims.

(2) FIG. 1 shows a simplified block diagram of a sound field device in accordance with an embodiment of the application,

(3) FIG. 2 shows a simplified block diagram of an audio system in accordance with a further embodiment of the application,

(4) FIG. 3 shows a flow chart of a method in accordance with a further embodiment of the application,

(5) FIG. 4 shows a simplified block diagram of an audio system in accordance with a further embodiment of the application,

(6) FIG. 5 shows a simplified flowchart of a dual-band multi zone sound rendering with elevation cues, in accordance with a further embodiment of the application, and

(7) FIG. 6 is a simplified illustration of an application of a sound system in accordance with the present application in a car.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

(8) FIG. 1 shows a simplified block diagram of a sound field device 100 configured to determine filter elements for driving an array of loudspeakers to generate an elevated sound impression at a bright zone. Sound field device 100 comprises an elevation cue estimator 110 configured to estimate an elevation cue of a head-related transfer function (HRTF) of at least one listener, a low-frequency filter estimator 120 configured to estimate one or more low-frequency filter elements based on the elevation cue, and a high-frequency filter estimator 130 configured to estimate one or more high-frequency filter elements based on the elevation cue.

(9) Elevation cue estimator 110, and low- and high-frequency filter estimators 120, 130 can be implemented in the same physical device, e.g., the same processor can be configured to act as elevation cue estimator 110, low-frequency filter estimator 120 and/or high-frequency filter estimator 130.

(10) A (first) estimation method of low-frequency filter estimator 120 is different from a (second) estimation method of high-frequency filter estimator 130. For example, the first and second method can be different in the sense that they use different computational techniques for determining the low- and high-frequency filter elements.

(11) Sound field device 100 can be configured to further comprise a signal generator (not shown in FIG. 1), which can be configured to generate a drive signal for the plurality of loudspeakers based on the filter elements computed by low- and high-frequency filter estimators 120, 130. For example, the signal generator can be configured to generate a plurality of driving signals for the plurality of loudspeakers by weighting an input signal with the low- and high-frequency filter elements. For example, the low- and high-frequency filter elements can correspond to the plurality of loudspeakers, e.g., each of the filter elements corresponds to one of the loudspeakers.

(12) FIG. 2 shows a simplified block diagram of an audio system 200, which comprises a detector 210, a sound field device 100, a signal generator 220, and an array of loudspeakers 230. Detector 210 is configured to determine an elevation of a virtual sound source relative to a listener. Sound field device 100 (e.g., sound field device 100 of FIG. 1) is configured to determine a plurality of filter elements. Signal generator 220 is configured to generate a driving signal 222 weighted with the determined plurality of filter elements.

(13) Detector 210, sound field device 100, and signal generator 220 can be part of one apparatus.

(14) System 200 can further comprise an amplifier (not shown in FIG. 2), which amplifies drive signal 222 of signal generator 220 in order to drive the plurality of loudspeakers 230.

(15) The array of loudspeakers 230 can be arranged in one horizontal plane. In other embodiments, the array of loudspeakers 230 can be arranged in different height levels. In certain embodiments, system 200 comprises a unit for determining an elevation level of the loudspeakers 230, such that the filter elements and thus the plurality of drive signals 222 can be computed with knowledge of the elevation level of each of the loudspeakers 230. To this end, the unit for determining the elevation level can comprise an input unit where a user can input information about the elevation level of the loudspeakers 230. In other embodiments, the unit for determining the elevation level can comprise a sensor for sensing an elevation level of the loudspeakers 230 without manual input from a user.

(16) FIG. 3 shows a flow chart of a method 300 for determining filter elements for driving an array of loudspeakers to generate an elevated sound impression at a bright zone. In a first step 310 an elevation cue of an HRTF of at least one listener is estimated. In a second step 320, using a first estimation method, one or more low-frequency filter elements based on the elevation cue are estimated. In a third step 330 using a second estimation method that is different from the first estimation method, one or more high-frequency filter elements based on the elevation cue are estimated.

(17) Method 300 may comprise further steps (not shown in FIG. 3) of obtaining an input signal, weighting the input signal with the filter elements to generate a plurality of drive signals, and/or amplifying the generated drive signals.

(18) FIG. 4 shows an audio system 400 in accordance with an embodiment of the application. Audio system 400 comprises a plurality of dual-band multi-zone sound renderers 410. Each of the plurality of dual-band multi-zone sound renderers 410 comprises a low-frequency filter estimator and a high-frequency filter estimator.

(19) As illustrated in FIG. 4, each of the dual-band sound renderers 410 is provided not only with information about n source signals, but also with information about n elevation specifications 424. An elevation specification can, for example, simply comprise an elevation angle relative to a listener. The dual-band sound renderers 410 further receive information about the bright and quiet zones 422a, 423a, 422b, 423b and about a setup of a linear loudspeaker array 430a. Based on this information, the dual-band sound renderers 410 can compute filter elements for each of the source signals. The individual filter elements 412a, 412b can then be combined and applied to an input signal (not shown in FIG. 4) in order to obtain the plurality of loudspeaker driving signals 412, which are used to drive the plurality of loudspeakers 430.

(20) As illustrated in FIG. 4, the same zone 422a that acts as a bright zone for the first source signal 420a can act as a quiet zone 422b for a further source signal 420b. The zone 423a that was a quiet zone for the first source signal 420a is now a bright zone 423b for the further source signal 420b.

(21) FIG. 4 is only meant as an illustration of the processing of a plurality of source signals. For example, the skilled person understands that in practice, a sound rendering device could be configured to iteratively compute filter elements for each of the source signals, e.g., only one rendering device could iteratively compute filter elements for a plurality of source signals.

(22) FIG. 5 shows a simplified flowchart of a method 500 for dual-band multi zone sound rendering with elevation cues. In a first step 510, elevation cues $\mathrm{HRTF}_{el}(\theta, k)$, indicated with reference number 510a, are computed based on a system specification. In a further step 520, the elevation cues are smoothed in an octave smoothing step. Subsequently, the processing is split up, 522, depending on the frequency, and in steps 530, 540 the processing is continued differently for low-pass and high-pass filter elements.
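The octave smoothing of step 520 is not specified further in the text. The following is a minimal sketch of one common variant, assuming that only the magnitude of the elevation cue is smoothed and that each bin is replaced by the mean over a window of one octave (or a fraction of an octave) centred on it. The function name `octave_smooth` and the parameter `fraction` are hypothetical.

```python
import numpy as np

def octave_smooth(magnitude, freqs_hz, fraction=1.0):
    """Average the magnitude over a 1/fraction-octave window around each bin.

    magnitude: (K,) real magnitude spectrum of the elevation cue
    freqs_hz:  (K,) bin centre frequencies in Hz, ascending
    fraction:  1.0 for full-octave, 3.0 for third-octave smoothing (assumption)
    """
    smoothed = np.empty_like(magnitude, dtype=float)
    # Half of the window width expressed as a frequency ratio.
    half = 2.0 ** (1.0 / (2.0 * fraction))
    for i, f in enumerate(freqs_hz):
        if f <= 0.0:
            # The DC bin has no meaningful octave window; keep it unchanged.
            smoothed[i] = magnitude[i]
            continue
        window = (freqs_hz >= f / half) & (freqs_hz <= f * half)
        smoothed[i] = magnitude[window].mean()
    return smoothed

# Synthetic example: a flat spectrum with one narrow peak at 4 kHz.
freqs_hz = np.linspace(0.0, 8000.0, 129)
magnitude = np.ones(129)
magnitude[64] = 5.0
smoothed = octave_smooth(magnitude, freqs_hz)
```

With this window the narrow peak is spread over the neighbouring bins, while a flat spectrum is left unchanged.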

(23) For the generation of the low-frequency filter elements, in step 532 the desired sound field $P_d$ and the transfer matrices $H_b$ and $H_j$ are computed. Subsequently, in step 534 a multi-constraint convex optimization is performed in order to determine the optimal low-frequency filter elements u.

(24) For frequencies with wave number $k \le (Q-1)/(2r)$ (low-pass filtering), wherein $k = 2\pi f/c$, a joint optimization with multiple constraints is formulated. A desired horizontal sound field in vector $P_d$ (dimension: $M_1 \times 1$) is defined for the control points within the bright zone. The desired sound field can be, for example, a plane wave function arriving from the speaker array or simply set to 1. $H_b$ ($M_1 \times Q$) is the acoustic transfer function matrix from each loudspeaker to points inside the bright zone, and $H_j$ ($M_j \times Q$), $j = 2 \ldots n$, is the acoustic transfer function matrix from each loudspeaker to points inside the j-th quiet zone. The acoustic transfer functions of the loudspeakers can be derived following the 3D Green's function with free-field assumption or based on additional microphone measurements of the room impulse responses. The loudspeaker filtering weights are collected in the vector w ($Q \times 1$). $M_1$ represents the number of control points within the selected bright zone and $M_j$ is the number of control points within the j-th quiet zone.
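As an illustration of the free-field option, the transfer matrix from Q loudspeakers to M control points can be sketched with the 3D free-field Green's function exp(-jkd)/(4πd), where d is the loudspeaker-to-point distance. The geometry, the function name `free_field_transfer_matrix`, and the sign convention of the exponent are assumptions made for this example and are not taken from the description.

```python
import numpy as np

def free_field_transfer_matrix(speakers, points, k):
    """Return the (M x Q) matrix of 3D free-field Green's functions.

    speakers: (Q, 3) loudspeaker positions in metres
    points:   (M, 3) control-point positions in metres
    k:        wave number 2*pi*f/c in rad/m
    """
    # Pairwise distances between every control point and every loudspeaker.
    d = np.linalg.norm(points[:, None, :] - speakers[None, :, :], axis=2)
    # G(d) = exp(-j k d) / (4 pi d); the sign of the exponent is a convention.
    return np.exp(-1j * k * d) / (4.0 * np.pi * d)

# Hypothetical geometry: 12 loudspeakers on a line, 5 control points 1 m away.
Q, M1 = 12, 5
speakers = np.column_stack([np.linspace(-0.55, 0.55, Q), np.zeros(Q), np.zeros(Q)])
points = np.column_stack([np.linspace(-0.1, 0.1, M1), np.full(M1, 1.0), np.zeros(M1)])
k = 2.0 * np.pi * 500.0 / 343.0  # 500 Hz with c = 343 m/s
H_b = free_field_transfer_matrix(speakers, points, k)
```

The same function can be called with the quiet-zone control points to obtain the quiet-zone matrices. Measured room impulse responses would replace this model where reflections matter.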

(25) A multi-constraint optimization is formulated with the objective of minimizing the mean square error with respect to the desired sound field, taking the HRTF elevation cue into account over the bright zone:

(26) $\min_{w} \| H_b w - P_d \mathrm{HRTF}_{el}(\theta, k) \|^2$ subject to $\| w \|^2 \le N_1$ and $\| H_j w \|^2 \le N_j$, where $N_j = \alpha M_1 \| P_d \mathrm{HRTF}_{el}(\theta, k) \|^2 / M_j$.
$\alpha$ defines the acceptable level of sound energy leakage into the quiet zone and can be customized by users. $N_1$ specifies the constraint on the loudspeaker array effort.
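The constrained problem above would normally be handed to a convex solver. The sketch below does not do that. It swaps in a simpler penalized least-squares relaxation, in which the effort and leakage constraints are replaced by fixed penalty weights (`lam_effort`, `lam_leak`, both hypothetical), so that a closed-form solution exists for a single frequency bin. The resulting effort and leakage must then be checked against the bounds, and the penalties adjusted if a bound is violated. The random matrices stand in for real transfer functions.

```python
import numpy as np

def penalized_filter(H_b, H_q, target, lam_effort, lam_leak):
    """Minimize ||H_b w - target||^2 + lam_effort ||w||^2 + lam_leak ||H_q w||^2.

    This is a relaxation of the constrained problem, not the constrained
    problem itself: the penalties play the role of fixed Lagrange multipliers.
    """
    Q = H_b.shape[1]
    A = (H_b.conj().T @ H_b
         + lam_effort * np.eye(Q)
         + lam_leak * (H_q.conj().T @ H_q))
    return np.linalg.solve(A, H_b.conj().T @ target)

# Synthetic single-bin example (random stand-ins for measured or modelled data).
rng = np.random.default_rng(0)
Q, M1, M2 = 8, 4, 4
H_b = rng.standard_normal((M1, Q)) + 1j * rng.standard_normal((M1, Q))
H_q = rng.standard_normal((M2, Q)) + 1j * rng.standard_normal((M2, Q))
hrtf_el = 0.8 + 0.3j                 # elevation cue at this bin (made up)
P_d = np.ones(M1, dtype=complex)     # desired field "simply set to 1"
target = P_d * hrtf_el

w = penalized_filter(H_b, H_q, target, lam_effort=1e-2, lam_leak=10.0)
bright_error = np.linalg.norm(H_b @ w - target) ** 2
leakage = np.linalg.norm(H_q @ w) ** 2   # compare against the bound N_j
effort = np.linalg.norm(w) ** 2          # compare against the bound N_1
```

Raising `lam_leak` trades bright-zone accuracy for lower quiet-zone energy, which mirrors the role of the leakage bound in the constrained form.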

(27) The low-frequency filter elements u and the high-frequency filter elements v are merged to obtain a complete set of filter elements w, indicated with reference number 545. The filter elements are applied to a signal in frequency domain and an Inverse Fourier Transform is applied in step 550. On the resulting signal 552, a convolution 560 with speaker impulse responses is applied, which yields the output.

(28) For the generation of the high-frequency filter elements (e.g., with wave numbers $k > (Q-1)/(2r)$, where Q is the number of speakers and r is the radius of each selected zone), in step 542 a loudspeaker selection is performed, and in step 544 weights are assigned to the selected active loudspeakers. This results in high-frequency filter elements v.
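As a worked example of the band split, the wave-number threshold (Q − 1)/(2r) can be converted to a frequency through k = 2πf/c, which gives f = (Q − 1)c/(4πr). The sketch below assumes c = 343 m/s and a zone radius of 0.15 m. Both values are illustrative and not taken from the description.

```python
import math

def cutoff_frequency(num_speakers, zone_radius_m, speed_of_sound=343.0):
    """Frequency in Hz at which k = 2*pi*f/c reaches (Q - 1) / (2 r)."""
    k_cut = (num_speakers - 1) / (2.0 * zone_radius_m)
    return k_cut * speed_of_sound / (2.0 * math.pi)

# 12 loudspeakers and a 0.15 m zone radius (assumed values).
f_c = cutoff_frequency(num_speakers=12, zone_radius_m=0.15)
```

For these assumed values the split falls at roughly 2 kHz. More loudspeakers or a smaller zone push the split upward.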

(29) In the high-pass filtering, the reproduction accuracy may be undermined due to the limited number of employed loudspeakers, which may affect the desired listening experience, especially the sensation of elevation. Therefore, a different filter design strategy may be applied. At high frequencies, as the ratio of the size of the loudspeaker piston to the wavelength of the sound increases, the sound field radiated by the speaker becomes narrower and side lobes appear.

(30) Therefore, suppression of sound leakage at high frequencies can be achieved by exploiting the native directivity of the loudspeakers. The activated loudspeaker array partition may be selected such that it overlaps with the projection of the bright zone on the speaker array. It will be assumed that the number of selected loudspeakers is P. The loudspeaker weights assigned to the activated loudspeakers are $\sqrt{N_1/P}\, \mathrm{HRTF}_{el}(\theta, k)$ in order to satisfy the constraint of $\| w \|^2 \le N_1$.
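The selection and weighting steps can be sketched for a linear array as follows, assuming that the projection of a circular bright zone onto the array axis is the interval of half-width r around the zone centre. The function name, the geometry, and the cue value are hypothetical.

```python
import numpy as np

def high_frequency_weights(speaker_x, zone_center_x, zone_radius, hrtf_el, n1):
    """Activate loudspeakers under the bright zone and weight them equally.

    speaker_x:     (Q,) loudspeaker positions along the array axis in metres
    zone_center_x: bright-zone centre projected onto the array axis
    zone_radius:   bright-zone radius r in metres
    hrtf_el:       complex elevation cue at the current frequency bin
    n1:            array-effort bound N_1
    """
    # Loudspeakers whose position falls inside the projected zone are active.
    active = np.abs(speaker_x - zone_center_x) <= zone_radius
    p = int(np.count_nonzero(active))
    weights = np.zeros(speaker_x.shape, dtype=complex)
    if p > 0:
        weights[active] = np.sqrt(n1 / p) * hrtf_el
    return weights, active

# 12 loudspeakers spaced 0.1 m apart; zone centred at x = -0.3 m (assumed).
speaker_x = np.linspace(-0.55, 0.55, 12)
v, active = high_frequency_weights(
    speaker_x, zone_center_x=-0.3, zone_radius=0.16,
    hrtf_el=0.9 * np.exp(1j * 0.4), n1=1.0)
effort = np.linalg.norm(v) ** 2
```

With these weights the array effort equals N_1 multiplied by the squared magnitude of the elevation cue, so the bound N_1 is met whenever the cue magnitude does not exceed 1 at that bin.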

(31) After the derivation of the loudspeaker filtering gain in the frequency domain using a bin-by-bin approach, the output of the system, which is the finite impulse responses for the speaker array, can be obtained by performing an Inverse Fast Fourier Transform (IFFT). The derivation of the speaker impulse responses can be conducted offline (e.g., once for each car/conference room and its zone/loudspeaker set-up), if appropriate.
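The bin-by-bin merge of the two bands and the final transform can be sketched as below. The sketch assumes a one-sided spectrum, so that a real inverse FFT (`numpy.fft.irfft`) returns real-valued FIR coefficients, and it uses constant placeholder values for the low-band elements u and the high-band elements v. The sample rate, FFT length, and cutoff are illustrative.

```python
import numpy as np

def merge_and_invert(u, v, freqs_hz, cutoff_hz):
    """Pick u below the cutoff and v above it, then transform to FIR filters.

    u, v:     (K, Q) complex filter elements per frequency bin and loudspeaker
    freqs_hz: (K,) one-sided bin frequencies in Hz
    """
    low_band = (freqs_hz <= cutoff_hz)[:, None]  # (K, 1), broadcast over Q
    w = np.where(low_band, u, v)
    # Inverse real FFT along the frequency axis yields one FIR per loudspeaker.
    fir = np.fft.irfft(w, axis=0)
    return w, fir

fs, n_fft, Q = 16000, 256, 4
freqs_hz = np.fft.rfftfreq(n_fft, d=1.0 / fs)   # 129 one-sided bins
u = np.full((freqs_hz.size, Q), 0.5 + 0.0j)     # placeholder low-band elements
v = np.full((freqs_hz.size, Q), 0.25 + 0.0j)    # placeholder high-band elements
w, fir = merge_and_invert(u, v, freqs_hz, cutoff_hz=2000.0)
```

In practice a hard switch between bands may call for some cross-fading around the cutoff. That refinement is not described in the text and is left out here.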

(32) To fulfill the multi zone settings, filters that create n sets of one bright and (n − 1) quiet zones over the selected regions are needed for n (n ≥ 2) source signals (as shown in FIG. 4). The system features a combination of HRTF elevation cue spectral filtering with a horizontal multi zone sound field rendering system. An objective is to deliver the n input source signals simultaneously to n different spatial regions with various elevated sensations and with minimum inter-zone sound leakage via the 2D loudspeaker array.

(33) To achieve this, a dual-band rendering system is provided that aims to accurately reproduce the desired 3D elevated sound, taking the HRTF into account over the selected bright zone. More specifically, at low frequencies a joint optimization with multiple constraints is applied to the filter design to minimize the reproduction error with respect to the desired 3D sound field over multiple listening areas. In contrast, at high frequencies the sound separation is achieved by a selection process of active loudspeakers, and the characteristics of the HRTF elevation cues may be preserved over the selected regions.

(34) The HRTF elevation cues in FIG. 5 can be extracted, for example, from online public HRTF databases (e.g., the Center for Image Processing and Integrated Computing (CIPIC), University of California at Davis, HRTF database). The HRTF elevation cues are considered to be symmetric in azimuth angle and common to all sagittal planes. With this assumption, in certain embodiments, only the set of elevation cues for the median plane (e.g., $\phi = 0$) is needed. It may be advantageous to eliminate the filtering effect produced by a head exposed to sound arriving from the front and to retain only the filtering effects due to elevation cues. For this purpose, the HRTF is normalized as follows:

(35) $\mathrm{HRTF}_{el}(\theta, \phi, k) = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{HRTF}_i(\theta, 0, k)}{\mathrm{HRTF}_i(\theta_s, 0, k)}$
where $\theta_s$ is the elevation angle of the physical sources relative to the plane where the listeners' ears are located. Therefore, in certain embodiments, the loudspeaker array is not limited to the horizontal plane but can also be placed at other height levels (e.g., placed at the ceiling of the room or in a car).
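The normalization above amounts to dividing, per subject and per frequency bin, the median-plane HRTF at the desired elevation by the median-plane HRTF at the elevation of the physical sources, and then averaging over the N subjects. A minimal sketch with made-up numbers follows. Reading real data from an HRTF database is outside its scope, and the function name is hypothetical.

```python
import numpy as np

def elevation_cue(hrtf_target, hrtf_source):
    """Average over subjects of the per-bin ratio of two median-plane HRTFs.

    hrtf_target: (N, K) complex HRTFs at the desired virtual elevation
    hrtf_source: (N, K) complex HRTFs at the physical source elevation
    Returns a (K,) complex elevation cue.
    """
    return np.mean(hrtf_target / hrtf_source, axis=0)

# Two subjects, two frequency bins; the values are synthetic.
hrtf_source = np.array([[1.0 + 0.0j, 2.0 + 0.0j],
                        [1.0 + 0.0j, 4.0 + 0.0j]])
hrtf_target = np.array([[2.0 + 0.0j, 2.0 + 0.0j],
                        [4.0 + 0.0j, 2.0 + 0.0j]])
cue = elevation_cue(hrtf_target, hrtf_source)
```

Note that the ratio is formed per subject before averaging, which is what the formula states. Averaging the numerator and denominator separately would generally give a different result.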

(36) The proposed dual-band rendering system in FIG. 5 may apply different strategies for accurately reconstructing the desired multi zone sound field with the consideration of HRTF cues, especially the features of HRTF elevation cues for both low and high frequency ranges. Important spectral features (e.g., peaks or notches) of the elevation cues appear at both low frequency ranges (e.g., below 2 kHz) and the frequency range beyond 8 kHz.

(37) FIG. 6 illustrates how the audio system can be applied to a car audio system. Due to the limited space in the car cabin, it is convenient to place an array of 12 microspeakers at the ceiling of the car (e.g., over the passengers' heads). The speaker array creates two separate personal zones for the driver and the co-driver seats. Two different input audio signals (e.g., a navigation speech stream for the driver and mono/stereo music for the co-driver) are delivered simultaneously to the two seat areas. Various virtual elevations can also be rendered for the different passengers. Therefore, the passengers not only hear the sound from the ceiling (which may lead to confusion), but can also have the sensation that the sound is coming from right in front of them in a 3D setting.

(38) Advantages of certain embodiments of the application include: In addition to the horizontal multi zone sound rendering, a more immersive elevated sensation can be provided in any location inside the selected zones of interest; The joint-optimization formulation in the dual-band rendering system provides a more accurate reproduction of the desired sound field, taking the HRTF elevation into account over the selected zone, especially in the low frequency range; The application is capable of rendering different elevated virtual sources for various zones simultaneously; No additional loudspeakers or changes to the 2D loudspeaker setup are needed; The additional computational cost is limited.

(39) The described sound field device and audio system can be applied in many scenarios, including, for example: Any sound reproduction system or surround sound system with 2D loudspeaker array (most commonly used in existing products). The elevation rendering in the application addresses the limitation due to 2D speaker setup and provides more immersive 3D virtual sound.

(40) In particular examples, the sound field device and the audio system can be applied in the following scenarios: a TV speaker system, a car entertainment system, a teleconference system, and/or a home cinema system, where personal listening environments for one or multiple listeners are desirable.

(41) The foregoing descriptions are only implementation manners of the present application; the scope of protection of the present application is not limited thereto. Any variations or replacements can be easily made by a person skilled in the art. Therefore, the protection scope of the present application should be subject to the protection scope of the attached claims.

CPC classification

H04S2420/01, H04S2420/13, H04S3/02, H04S7/307, H04S3/002, H04R3/12, H04R2499/13

International classification

H04R3/12, H04S7/00, H04S3/02, H04S3/00
