Method and device for generating an elevated sound impression
10419871 · 2019-09-17
Assignee
Inventors
Cpc classification
H04S2420/01
ELECTRICITY
H04S2420/13
ELECTRICITY
H04S3/02
ELECTRICITY
H04S3/002
ELECTRICITY
International classification
H04S7/00
ELECTRICITY
H04S3/02
ELECTRICITY
Abstract
A sound field device is disclosed that comprises an elevation cue estimator, a low-frequency filter estimator, and a high-frequency filter estimator. The elevation cue estimator is configured to estimate an elevation cue of a head-related transfer function (HRTF) of at least one listener. The low-frequency filter estimator is configured to estimate one or more low-frequency filter elements based on the elevation cue. The high-frequency filter estimator is configured to estimate one or more high-frequency filter elements based on the elevation cue. An estimation method of the low-frequency filter estimator is different from an estimation method of the high-frequency filter estimator. The one or more low-frequency filter elements and the one or more high-frequency filter elements are for driving an array of loudspeakers to generate an elevated sound impression at a bright zone.
Claims
1. A sound field device, comprising: an elevation cue estimator configured to estimate an elevation cue of a head-related transfer function (HRTF) of at least one listener; a low-frequency filter estimator configured to estimate, according to a first estimation technique, one or more low-frequency filter elements based on the elevation cue; and a high-frequency filter estimator configured to estimate, according to a second estimation technique, one or more high-frequency filter elements based on the elevation cue, the first estimation technique being different from the second estimation technique; wherein: the one or more low-frequency filter elements and the one or more high-frequency filter elements are for driving an array of loudspeakers to generate an elevated sound impression at a bright zone; and each of the low-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers and each of the high-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers.
2. The sound field device of claim 1, wherein the low-frequency filter estimator comprises an optimizer configured to determine the one or more low-frequency filter elements by optimizing an error measure between a desired sound field at one or more control points of the bright zone, weighted by the elevation cue, and an estimate of a transfer function that represents a channel from the array of loudspeakers to the one or more control points of the bright zone.
3. The sound field device of claim 2, wherein the optimizer is configured to determine the one or more low-frequency filter elements u(k) as:
$\min_{u(k)} \|H_b(k)\,u(k) - \mathrm{HRTF}_{el}(\theta,k)\,P_d\|^2$ subject to $\|u(k)\|^2 \le N_1$ and $\|H_j(k)\,u(k)\| \le N_j$, where $N_j = M_1 \|P_d\,\mathrm{HRTF}_{el}(\theta,k)\|^2 / M_j$ for $j \ge 2$, $N_1$ is a predetermined parameter, $H_b(k)$ is an acoustic transfer function matrix from the array of loudspeakers to the one or more bright zone control points inside the bright zone, $H_j(k)$ is an acoustic transfer function matrix from the array of loudspeakers to one or more quiet zone control points inside at least one quiet zone, $P_d$ is a desired sound field for the one or more control points, $M_1$ is a number of control points within the bright zone and $M_j$ is a number of control points within a j-th quiet zone, wherein $j \ge 2$.
4. The sound field device of claim 2, wherein the low-frequency filter estimator is configured to estimate the transfer function to the one or more control points by evaluating one or more of the following: one or more three-dimensional (3D) Green's functions with free-field assumption; and one or more measurements of a room impulse response.
5. The sound field device of claim 1, wherein the high-frequency filter estimator comprises: a loudspeaker selection unit configured to select one or more active loudspeakers such that locations of the one or more active loudspeakers overlap with a projection of the bright zone on the array of loudspeakers; and a loudspeaker weight assigning unit configured to assign one or more frequency-dependent weights to the one or more active loudspeakers.
6. The sound field device of claim 5, wherein the loudspeaker weight assigning unit is configured to assign weights of $\sqrt{N_1/P}\,\mathrm{HRTF}_{el}(\theta,k)$ to the one or more active loudspeakers, wherein P is a number of active loudspeakers and $N_1$ is a predetermined parameter.
7. The sound field device of claim 1, wherein a cutoff frequency between the one or more low-frequency filter elements and the one or more high-frequency filter elements is chosen as $(Q-1)c/(4\pi r)$, wherein Q is a number of loudspeakers in the array of loudspeakers, r is a radius of the bright zone, and c is a speed of sound.
8. The sound field device of claim 1, wherein the elevation cue estimator is configured to estimate the elevation cue independent of an azimuth angle of a source relative to the bright zone.
9. The sound field device of claim 1, wherein the elevation cue estimator is configured to compute the elevation cue according to:
10. An audio system, comprising: a detector configured to determine an elevation of a virtual sound source relative to a listener; a sound field device configured to determine a plurality of filter elements based on the determined elevation of the virtual sound source; a signal generator configured to generate a driving signal weighted with the determined plurality of filter elements; and an array of loudspeakers.
11. The audio system of claim 10, wherein the array of loudspeakers is arranged in a horizontal plane.
12. The audio system of claim 10, wherein: the plurality of filter elements comprise one or more low frequency filter elements and one or more high-frequency filter elements, the one or more low-frequency filter elements and the one or more high-frequency filter elements are for driving the array of loudspeakers to generate an elevated sound impression at a bright zone; the sound field device comprises: a low-frequency filter estimator configured to estimate, according to a first estimation technique, one or more low-frequency filter elements based on an estimated elevation cue of a head-related transfer function (HRTF) of at least one listener; and a high-frequency filter estimator configured to estimate, according to a second estimation technique, one or more high-frequency filter elements based on the estimated elevation cue, the first estimation technique being different from the second estimation technique.
13. The audio system of claim 12, wherein the high-frequency filter estimator comprises: a loudspeaker selection unit configured to select one or more active loudspeakers such that locations of the one or more active loudspeakers overlap with a projection of the bright zone on the array of loudspeakers; and a loudspeaker weight assigning unit configured to assign one or more frequency-dependent weights to the one or more active loudspeakers.
14. A method, comprising: estimating an elevation cue of a head-related transfer function (HRTF) of at least one listener; estimating, using a first estimation method, one or more low-frequency filter elements based on the elevation cue; and estimating, using a second estimation method that is different from the first estimation method, one or more high-frequency filter elements based on the elevation cue, the one or more low-frequency filter elements and the one or more high-frequency filter elements for driving an array of loudspeakers to generate an elevated sound impression at a bright zone, each of the low-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers and each of the high-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers.
15. The method of claim 14, wherein the method is performed for a plurality of source signals and a plurality of bright zones.
16. The method of claim 14, wherein estimating the one or more low-frequency filter elements comprises determining the one or more low-frequency filter elements by optimizing an error measure between a desired sound field at one or more control points of the bright zone, weighted by the elevation cue, and an estimate of a transfer function that represents a channel from the array of loudspeakers to the one or more control points of the bright zone.
17. A non-transitory computer-readable storage medium storing program code, the program code comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: estimating an elevation cue of a head-related transfer function (HRTF) of at least one listener; estimating, using a first estimation method, one or more low-frequency filter elements based on the elevation cue; and estimating, using a second estimation method that is different from the first estimation method, one or more high-frequency filter elements based on the elevation cue, the one or more low-frequency filter elements and the one or more high-frequency filter elements for driving an array of loudspeakers to generate an elevated sound impression at a bright zone, each of the low-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers and each of the high-frequency filter elements corresponds to a respective loudspeaker of the array of loudspeakers.
18. The non-transitory computer-readable storage medium of claim 17, wherein the operations are performed for a plurality of source signals and a plurality of bright zones.
19. The non-transitory computer-readable storage medium of claim 17, wherein estimating the one or more low-frequency filter elements comprises determining the one or more low-frequency filter elements by optimizing an error measure between a desired sound field at one or more control points of the bright zone, weighted by the elevation cue, and an estimate of a transfer function that represents a channel from the array of loudspeakers to the one or more control points of the bright zone.
20. The non-transitory computer-readable storage medium of claim 19, wherein determining an estimate of the transfer function that represents a channel from the array of loudspeakers to the one or more control points of the bright zone by evaluating one or more of the following: one or more three-dimensional (3D) Green's functions with free-field assumption; and one or more measurements of a room impulse response.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) To illustrate the technical features of embodiments of the present application more clearly, the accompanying drawings provided for describing the embodiments are introduced briefly in the following. The accompanying drawings in the following description are merely some embodiments of the present application, but modifications on these embodiments are possible without departing from the scope of the present application as defined in the claims.
(2)
(3)
(4)
(5)
(6)
(7)
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
(8)
(9) Elevation cue estimator 110, and low- and high-frequency filter estimators 120, 130 can be implemented in the same physical device, e.g., the same processor can be configured to act as elevation cue estimator 110, low-frequency filter estimator 120 and/or high-frequency filter estimator 130.
(10) A (first) estimation method of low-frequency filter estimator 120 is different from a (second) estimation method of high-frequency filter estimator 130. For example, the first and second methods can differ in that they use different computational techniques for determining the low- and high-frequency filter elements.
(11) Sound field device 100 can be configured to further comprise a signal generator (not shown in
(12)
(13) Detector 210, sound field device 100, and signal generator 220 can be part of one apparatus.
(14) System 200 can further comprise an amplifier (not shown in
(15) The array of loudspeakers 230 can be arranged in one horizontal plane. In other embodiments, the loudspeakers 230 of the array can be arranged at different height levels. In certain embodiments, system 200 comprises a unit for determining an elevation level of the loudspeakers 230, such that the filter elements and thus the plurality of drive signals 222 can be computed with knowledge of the elevation level of each of the loudspeakers 230. To this end, the unit for determining the elevation level can comprise an input unit where a user can input information about the elevation level of the loudspeakers 230. In other embodiments, the unit for determining the elevation level can comprise a sensor for sensing an elevation level of the loudspeakers 230 without manual input from a user.
(16)
(17) Method 300 may comprise further steps (not shown in
(18)
(19) As illustrated in
(20) As illustrated in
(21)
(22)
(23) For the generation of the low-frequency filter elements, in step 532 the desired sound field $P_d$ and the transfer matrices $H_b$ and $H_j$ are computed. Subsequently, in step 534 a multi-constraint convex optimization is performed in order to determine the optimal low-frequency filter elements u.
(24) For frequencies with $k \le 2\pi f/c$ (low-pass filtering), where $k = 2\pi f/c$ is the wave number, a joint optimization with multiple constraints is formulated. A desired horizontal sound field, collected in the vector $P_d$ (dimension: $M_1 \times 1$), is defined for the control points within the bright zone. The desired sound field can be, for example, a plane wave function arriving from the speaker array or simply set to 1. $H_b$ ($M_1 \times Q$) denotes the acoustic transfer function matrix from each loudspeaker to points inside the bright zone, and $H_j$ ($M_j \times Q$, $j = 2, \ldots, n$) denotes the acoustic transfer function matrix from each loudspeaker to points inside the j-th quiet zone. The acoustic transfer functions of the loudspeakers can be derived from the 3D Green's function under a free-field assumption or from additional microphone measurements of the room impulse responses. The loudspeaker filtering weights are collected in the vector w ($Q \times 1$). $M_1$ represents the number of control points within the selected bright zone and $M_j$ is the number of control points within the j-th quiet zone.
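The free-field option for the transfer matrices can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function names, the geometry, the speed of sound of 343 m/s, and the sign convention exp(-jkd) are all assumptions made for the example.

```python
import cmath
import math

def green_3d(src, pt, k):
    """Free-field 3D Green's function exp(-j*k*d) / (4*pi*d) between two points.

    The sign of the exponent depends on the chosen time convention (assumed here).
    """
    d = math.dist(src, pt)
    return cmath.exp(-1j * k * d) / (4.0 * math.pi * d)

def transfer_matrix(speakers, control_points, k):
    """Return an M x Q nested list; row m holds the transfer from each speaker to point m."""
    return [[green_3d(s, p, k) for s in speakers] for p in control_points]

# Hypothetical geometry: Q = 3 loudspeakers on a line, M_1 = 2 bright-zone control points.
speakers = [(-0.5, 0.0, 0.0), (0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
bright_points = [(0.0, 1.0, 0.0), (0.1, 1.0, 0.0)]
k = 2.0 * math.pi * 500.0 / 343.0  # wave number at 500 Hz, c = 343 m/s assumed
H_b = transfer_matrix(speakers, bright_points, k)
```

The same helper would produce each quiet-zone matrix when given the control points of that zone. Measured room impulse responses, the other option named above, would replace `green_3d` entirely.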
(25) A multi-constraint optimization is formulated with the objective of minimizing the mean square error to the desired sound field, taking the HRTF elevation cue into account over the bright zone:
(26)
a defines the acceptable level of sound energy leakage into the quiet zone and can be customized by users. $N_1$ specifies the constraint on the loudspeaker array effort.
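For reference, the optimization recited in claim 3 can be written per wave number k as shown here. This is a restatement under the assumption that the elevation-angle argument of the HRTF elevation cue is denoted $\theta$. The leakage parameter a presumably enters through the quiet-zone bound $N_j$, but its exact placement is not recoverable from the claim and is therefore not shown.

```latex
\begin{aligned}
\min_{u(k)} \quad & \left\| H_b(k)\,u(k) - \mathrm{HRTF}_{el}(\theta,k)\,P_d \right\|^2 \\
\text{subject to} \quad & \|u(k)\|^2 \le N_1, \\
& \|H_j(k)\,u(k)\| \le N_j, \quad j \ge 2, \\
\text{with} \quad & N_j = \frac{M_1 \left\| P_d\,\mathrm{HRTF}_{el}(\theta,k) \right\|^2}{M_j}.
\end{aligned}
```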
(27) The low-frequency filter elements u and the high-frequency filter elements v are merged to obtain a complete set of filter elements w, indicated with reference number 545. The filter elements are applied to a signal in the frequency domain, and an inverse Fourier transform is applied in step 550. The resulting signal 552 is convolved with the speaker impulse responses in step 560, which yields the output.
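One way to picture the merging step is a per-bin switch between the two sets of filter elements. The sketch below assumes a hard switch at a cutoff wave number; the text does not state how the merge is performed, and the function name and placeholder values are hypothetical.

```python
def merge_filter_elements(u, v, wave_numbers, k_cut):
    """Per bin, take the low-frequency elements u[i] up to k_cut and v[i] above it.

    u, v: lists (one entry per frequency bin) of per-loudspeaker complex weights.
    A hard switch at k_cut is an assumption of this sketch.
    """
    return [u_i if k <= k_cut else v_i
            for u_i, v_i, k in zip(u, v, wave_numbers)]

# Hypothetical values: 4 frequency bins, Q = 2 loudspeakers.
wave_numbers = [1.0, 4.0, 8.0, 12.0]
u = [[1 + 0j, 1 + 0j]] * 4   # placeholder low-frequency elements
v = [[0j, 2 + 0j]] * 4       # placeholder high-frequency elements
w = merge_filter_elements(u, v, wave_numbers, k_cut=8.0)
```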
(28) For the generation of the high-frequency filter elements (e.g., with wave numbers $k > (Q-1)/(2r)$, where Q is the number of speakers and r is the radius of each selected zone), in step 542 a loudspeaker selection is performed, and in step 544 weights are assigned to the selected active loudspeakers. This results in high-frequency filter elements v.
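The wave-number threshold and the cutoff frequency of claim 7 are related through $k = 2\pi f/c$: setting $k = (Q-1)/(2r)$ gives $f = (Q-1)c/(4\pi r)$. A small sketch of that arithmetic follows; the function names, the example values, and the default speed of sound of 343 m/s are assumptions.

```python
import math

def cutoff_wave_number(num_speakers, zone_radius):
    """k_c = (Q - 1) / (2 r)."""
    return (num_speakers - 1) / (2.0 * zone_radius)

def cutoff_frequency(num_speakers, zone_radius, speed_of_sound=343.0):
    """f_c = k_c * c / (2 pi) = (Q - 1) c / (4 pi r), in Hz."""
    k_c = cutoff_wave_number(num_speakers, zone_radius)
    return k_c * speed_of_sound / (2.0 * math.pi)

# Hypothetical setup: Q = 9 loudspeakers, bright-zone radius r = 0.5 m.
f_c = cutoff_frequency(num_speakers=9, zone_radius=0.5)
```

With these example values the cutoff falls between 436 Hz and 437 Hz, so the crossover moves up as loudspeakers are added or the zone shrinks.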
(29) In the high-pass filtering, the reproduction accuracy may be undermined due to the limited number of employed loudspeakers, which may affect the desired listening experience, especially the sensation of elevation. Therefore, a different filter design strategy may be applied. At high frequencies, as the ratio of the size of the piston to the wavelength of the sound increases, the sound field radiated by the speaker becomes narrower and side lobes appear.
(30) Therefore, suppression of sound leakage at high frequencies can be achieved by exploiting the native directivity of the loudspeakers. The activated loudspeaker array partition may be selected such that it overlaps with the projection of the bright zone on the speaker array. It will be assumed that the number of selected loudspeakers is P. The loudspeaker weights assigned to the activated loudspeakers are $\sqrt{N_1/P}\,\mathrm{HRTF}_{el}(\theta,k)$ in order to satisfy the constraint $\|w\|^2 \le N_1$.
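The selection and weighting can be sketched for a single frequency bin as follows. The sketch assumes a linear array along one axis and a circular bright zone whose projection onto that axis is an interval; the geometry, function names, and example values are hypothetical.

```python
import cmath
import math

def select_active(speaker_x, zone_center_x, zone_radius):
    """Indices of loudspeakers lying inside the bright zone's projection on the array axis."""
    lo, hi = zone_center_x - zone_radius, zone_center_x + zone_radius
    return [i for i, x in enumerate(speaker_x) if lo <= x <= hi]

def high_frequency_weights(speaker_x, zone_center_x, zone_radius, n1, hrtf_el):
    """Weights sqrt(N_1 / P) * HRTF_el for the P active loudspeakers, zero elsewhere."""
    active = select_active(speaker_x, zone_center_x, zone_radius)
    if not active:
        raise ValueError("no loudspeaker overlaps the bright-zone projection")
    gain = math.sqrt(n1 / len(active)) * hrtf_el
    return [gain if i in active else 0j for i in range(len(speaker_x))]

# Hypothetical array of Q = 5 loudspeakers and a unit-magnitude elevation cue.
speaker_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
v = high_frequency_weights(speaker_x, zone_center_x=0.25, zone_radius=0.3,
                           n1=1.0, hrtf_el=cmath.exp(0.4j))
```

The squared norm of the resulting weights equals $N_1 |\mathrm{HRTF}_{el}|^2$, so the effort constraint is met whenever the elevation cue has magnitude at most one at that bin.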
(31) After the loudspeaker filtering gains have been derived in the frequency domain using a bin-by-bin approach, the output of the system, namely the finite impulse responses for the speaker array, can be obtained by performing an Inverse Fast Fourier Transform (IFFT). The derivation of the speaker impulse responses can be conducted offline (e.g., once for each car/conference room and its zone/loudspeaker set-up), if appropriate.
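The conversion from per-bin gains to a finite impulse response can be illustrated for one loudspeaker with a plain inverse DFT. The sketch uses only the standard library and a direct summation rather than a fast transform; the function name and the assumption that gains are given for bins 0 to N/2 of an even-length transform are choices made for the example.

```python
import cmath
import math

def fir_from_half_spectrum(half_spectrum):
    """Inverse DFT of bins 0..N/2 (N even) into N real filter taps.

    Negative-frequency bins are filled by conjugate symmetry so the taps are real;
    any residual imaginary part is discarded.
    """
    n = 2 * (len(half_spectrum) - 1)
    full = list(half_spectrum) + [half_spectrum[n - m].conjugate()
                                  for m in range(len(half_spectrum), n)]
    return [sum(full[m] * cmath.exp(2j * math.pi * m * t / n)
                for m in range(n)).real / n
            for t in range(n)]

# Flat unit gain for one loudspeaker over 5 bins (N = 8 taps).
taps = fir_from_half_spectrum([1 + 0j] * 5)
```

A flat unit gain yields a unit impulse, which is a quick sanity check. In practice the function would be applied once per loudspeaker to that loudspeaker's column of gains.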
(32) To fulfill the multi-zone settings, filters that create n sets of one bright zone and $(n-1)$ quiet zones over the selected regions are needed for n ($n \ge 2$) source signals (as shown in
(33) To achieve this, a dual-band rendering system is provided that aims to accurately reproduce the desired 3D elevated sound over the selected bright zone while taking the HRTF into account. More specifically, at low frequencies a joint optimization with multiple constraints is applied to the filter design to minimize the reproduction error with respect to the desired 3D sound field over multiple listening areas. At high frequencies, in contrast, the sound separation is achieved by a selection process of active loudspeakers, and the characteristics of the HRTF elevation cues may be preserved over the selected regions.
(34) The HRTF elevation cues in
(35)
where $\theta_s$ is the elevation angle of the physical sources relative to the plane in which the listeners' ears are located. Therefore, in certain embodiments, the loudspeaker array is not limited to the horizontal plane but can also be placed at other height levels (e.g., placed at the ceiling of the room or in a car).
(36) The proposed dual-band rendering system in
(37)
(38) Advantages of certain embodiments of the application include: In addition to the horizontal multi-zone sound rendering, a more immersive elevated sensation can be provided at any location inside the selected zones of interest; The joint-optimization formulation in the dual-band rendering system provides a more accurate reproduction of the desired sound field with the consideration of HRTF elevation over the selected zone, especially in the low-frequency range; The application is capable of rendering different elevated virtual sources for various zones simultaneously; No additional loudspeakers or changes to the 2D loudspeaker setup are needed; Limited additional computational cost.
(39) The described sound field device and audio system can be applied in many scenarios, including, for example: Any sound reproduction system or surround sound system with a 2D loudspeaker array (most commonly used in existing products). The elevation rendering in the application addresses the limitation due to the 2D speaker setup and provides more immersive 3D virtual sound.
(40) In particular examples, the sound field device and the audio system can be applied in the following scenarios, where personal listening environments for one or multiple listeners are desirable: a TV speaker system, a car entertainment system, a teleconference system, and/or a home cinema system.
(41) The foregoing descriptions are only implementation manners of the present application; the protection scope of the present application is not limited thereto. Variations or replacements can be easily made by a person skilled in the art. Therefore, the protection scope of the present application should be subject to the protection scope of the attached claims.