Apparatus and method for driving an array of loudspeakers with drive signals
10375503 · 2019-08-06
Inventors
- Michael Buerger (Erlangen, DE)
- Heinrich Löllmann (Erlangen, DE)
- Walter Kellermann (Erlangen, DE)
- Peter Grosche (Munich, DE)
- Yue Lang (Beijing, CN)
CPC classification
- H04S2420/01 (ELECTRICITY)
- H04S2420/13 (ELECTRICITY)
- H04S2400/11 (ELECTRICITY)
- H04S7/30 (ELECTRICITY)
Abstract
A wave field synthesis apparatus for driving an array of loudspeakers with drive signals includes: a sound field synthesizer for generating sound field drive signals for causing the array of loudspeakers to generate one or more sound fields at one or more audio zones; a binaural renderer for generating binaural drive signals for causing the array of loudspeakers to generate specified sound pressures at two or more positions, wherein the two or more positions are determined based on a detected position and/or orientation of a listener; and a decision unit for deciding whether to generate the drive signals using the sound field synthesizer or using the binaural renderer.
Claims
1. A wave field synthesis apparatus for driving an array of loudspeakers with drive signals, the apparatus comprising: a sound field synthesizer configured to generate sound field drive signals for causing the array of loudspeakers to generate one or more sound fields at one or more audio zones, a binaural renderer configured to generate binaural drive signals for causing the array of loudspeakers to generate specified sound pressures in at least two positions, wherein the at least two positions are determined based on at least one of a detected position or orientation of a listener, and a decision device configured to decide whether to generate the drive signals using the sound field synthesizer or using the binaural renderer based on a virtual position of a virtual sound source at one or more locations of the one or more audio zones; wherein when the one or more audio zones comprises more than one audio zone, the decision device is configured to decide to generate the drive signals for a selected audio zone of the more than one audio zone using the sound field synthesizer when an angular direction from the selected audio zone to a virtual source of one of the one or more sound fields deviates by more than a predefined angle from one or more angular directions from the selected audio zone to one or more remaining audio zones of the more than one audio zone.
2. The apparatus of claim 1, wherein the decision device is configured to decide further based on defined positions of the array of loudspeakers, at least one of a virtual orientation and a virtual extent of a virtual sound source, extent of the one or more audio zones, and at least one of the detected position of a listener or the detected orientation of a listener.
3. The apparatus of claim 1, wherein the decision device is configured to decide to generate the drive signals for a selected audio zone of the one or more audio zones using the sound field synthesizer when a sufficient number of loudspeakers of the array of loudspeakers are located in a virtual tube around a virtual line between a listener position and a virtual position of a virtual source.
4. The apparatus of claim 1, wherein the angular directions are determined based on centers of the selected audio zone and the one or more remaining audio zones.
5. The apparatus of claim 1, wherein the one or more audio zones comprise a dark zone that is substantially circular, and a bright zone that is substantially circular, wherein the decision device is configured to decide to generate the drive signals using the sound field synthesizer when a following condition is met:
6. The apparatus of claim 1, further comprising a splitter for separating a source signal into one or more split signals based on a property of the source signal, wherein the decision device is configured to decide for each of the split signals whether to generate corresponding drive signals using the sound field synthesizer or using the binaural renderer.
7. The apparatus of claim 6, wherein the decision device is configured to set one or more parameters of the splitter.
8. The apparatus of claim 6, wherein the splitter is a filter bank for separating the source signal into one or more bandwidth-limited signals.
9. The apparatus of claim 8, wherein the filter bank is configured to separate the source signal into two or more bandwidth-limited signals that partially overlap in frequency domain.
10. The apparatus of claim 1, wherein the binaural renderer is configured to generate the binaural drive signals based on one or more head-related transfer functions, wherein the one or more head-related transfer functions are retrieved from a database of head-related transfer functions.
11. A method for driving an array of loudspeakers with drive signals to generate one or more local wave fields at one or more audio zones, the method comprising: detecting at least one of a position or an orientation of a listener; deciding whether to generate the drive signals using a sound field synthesizer or whether to generate the drive signals using a binaural renderer based on a virtual position of a virtual sound source at one or more locations of the one or more audio zones, wherein when the one or more audio zones comprises more than one audio zone, a decision device is configured to decide to generate the drive signals for a selected audio zone of the more than one audio zone using the sound field synthesizer when an angular direction from the selected audio zone to a virtual source of one of the one or more sound fields deviates by more than a predefined angle from one or more angular directions from the selected audio zone to one or more remaining audio zones of the more than one audio zone, and implementing one of the following: generating sound field drive signals for causing the array of loudspeakers to generate one or more sound fields at one or more audio zones, and generating binaural drive signals for causing the array of loudspeakers to generate specified sound pressures in at least two positions, wherein the at least two positions are determined based on at least one of the detected position or the detected orientation of the listener.
12. The method of claim 11, wherein the loudspeakers are located in a car.
13. The method of claim 12, wherein detecting at least one of the position or the orientation of the listener comprises: detecting which seat of the car is occupied by the listener.
14. A non-transitory computer-readable storage medium storing program code, the program code comprising processor-readable instructions which when executed by a processor cause the processor to implement operations for driving an array of loudspeakers with drive signals to generate one or more local wave fields at one or more audio zones, the operations including: detecting at least one of a position or an orientation of a listener; deciding whether to generate the drive signals using a sound field synthesizer or whether to generate the drive signals using a binaural renderer based on a virtual position of a virtual sound source at one or more locations of the one or more audio zones, wherein when the one or more audio zones comprises more than one audio zone, a decision device is configured to decide to generate the drive signals for a selected audio zone of the more than one audio zone using the sound field synthesizer when an angular direction from the selected audio zone to a virtual source of one of the one or more sound fields deviates by more than a predefined angle from one or more angular directions from the selected audio zone to one or more remaining audio zones of the more than one audio zone; and implementing one of the following: generating sound field drive signals for causing the array of loudspeakers to generate one or more sound fields at one or more audio zones, and generating binaural drive signals for causing the array of loudspeakers to generate specified sound pressures in at least two positions, wherein the at least two positions are determined based on at least one of the detected position or the detected orientation of the listener.
15. The non-transitory computer-readable storage medium of claim 14, wherein the loudspeakers are located in a car, wherein the operation of detecting at least one of the position or the orientation of the listener comprises: detecting which seat of the car is occupied by the listener.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) To illustrate the technical features of embodiments of the present invention more clearly, the accompanying drawings used to describe the embodiments are briefly introduced below. The accompanying drawings show merely some embodiments of the present invention; modifications of these embodiments are possible without departing from the scope of the present invention as defined in the claims.
DETAILED DESCRIPTION OF THE EMBODIMENTS
(14) Two listeners 222, 232 are surrounded by the array of loudspeakers 210. The first listener 222 is located in a first audio zone 220 and the second listener 232 is located in a second audio zone 230.
(15) Angles φ.sub.S1, φ.sub.12, φ.sub.22, and φ.sub.S2 are defined relative to the x-axis. φ.sub.S1 and φ.sub.S2 indicate the angles of the directions 240, 250 of sound waves 242, 252 from a first and second virtual source (not shown in
(18) In step S22, it is determined whether the array of loudspeakers is unsuited for sound field synthesis (SFS). For example, if no or only an insufficient number of loudspeakers are placed in the angular direction in which virtual sources should be synthesized (from which sound waves should originate), SFS is not reasonable. Then, it is decided that binaural rendering (BR) drive signals should be generated in step S30 as a fallback solution for the entire frequency range.
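The suitability check of step S22 can be illustrated with a minimal sketch. It assumes a two-dimensional geometry and counts the loudspeakers that lie, as seen from the listener, within an angular window around the intended source direction. The function names, the window half-width and the minimum count are hypothetical values chosen for illustration; the description does not specify them.

```python
import math

def speakers_in_direction(speakers, listener, source_angle, half_width):
    """Count loudspeakers whose direction, seen from the listener,
    lies within +/- half_width (radians) of source_angle."""
    count = 0
    for sx, sy in speakers:
        angle = math.atan2(sy - listener[1], sx - listener[0])
        # Wrap the angular difference to (-pi, pi] before comparing.
        diff = math.atan2(math.sin(angle - source_angle),
                          math.cos(angle - source_angle))
        if abs(diff) <= half_width:
            count += 1
    return count

def array_suited_for_sfs(speakers, listener, source_angle,
                         half_width=math.radians(30), min_count=3):
    """Hypothetical S22-style test: enough loudspeakers in the source direction?
    half_width and min_count are illustrative assumptions."""
    n = speakers_in_direction(speakers, listener, source_angle, half_width)
    return n >= min_count

# Circular array of 16 loudspeakers with a radius of 1.5 m (illustrative).
ring = [(1.5 * math.cos(2 * math.pi * k / 16),
         1.5 * math.sin(2 * math.pi * k / 16)) for k in range(16)]
# Loudspeakers only behind the listener (indices 5..11 of the ring).
rear_only = ring[5:12]

front_ok = array_suited_for_sfs(ring, (0.0, 0.0), 0.0)
rear_ok = array_suited_for_sfs(rear_only, (0.0, 0.0), 0.0)
```

With the full ring, three loudspeakers fall inside the window for a source at 0 degrees, so SFS would be considered; with loudspeakers only behind the listener, none do, and the sketch would fall back to binaural rendering.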
(19) In step S24, it is determined whether the position of the virtual sound source is too close to any of the dark zones: If the angular direction .sub.S.sub.
(20) Unless it is decided in steps S22 and S24 that SFS is fundamentally not feasible, SFS and BR are used simultaneously. In step S26, a filter bank is used to separate the source signal into two signals. Below a certain frequency ω.sub.min (e.g., 200 Hz) and above a maximum frequency ω.sub.max (e.g., the spatial aliasing frequency
(21)
of the loudspeaker array, where c and d denote the speed of sound and the loudspeaker spacing, respectively), BR is used. In the remaining frequency range, SFS is used in order to obtain high robustness with respect to head movements and rotations. The transition between SFS and BR is smooth, i.e., there is no abrupt change along the frequency axis; instead, fading is applied.
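The frequency-dependent split with fading can be sketched as a pair of complementary weights. The sketch below works in Hz rather than angular frequency and makes several assumptions that are not taken from the description: a raised-cosine fade of fixed width, a loudspeaker spacing of 10 cm, and the common textbook estimate c/(2d) for the spatial aliasing frequency (the formula of the description is not reproduced in this text).

```python
import math

def sfs_weight(f, f_min, f_max, fade):
    """Weight of the SFS branch at frequency f (Hz); BR receives 1 - weight.
    Raised-cosine fades of width `fade` (Hz) are centred on f_min and f_max.
    Assumes fade < f_max - f_min."""
    def ramp(x):
        # Smooth transition from 0 to 1 for x in [0, 1], clamped outside.
        x = min(max(x, 0.0), 1.0)
        return 0.5 - 0.5 * math.cos(math.pi * x)
    rise = ramp((f - (f_min - fade / 2)) / fade)        # fade in around f_min
    fall = 1.0 - ramp((f - (f_max - fade / 2)) / fade)  # fade out around f_max
    return rise * fall

c = 343.0            # speed of sound in m/s (air at about 20 degrees C)
d = 0.10             # loudspeaker spacing in m (hypothetical)
f_min = 200.0        # lower crossover, as in the example of the description
f_max = c / (2 * d)  # assumed aliasing estimate, about 1715 Hz
fade = 100.0         # fade width in Hz (hypothetical)

w_sfs = sfs_weight(1000.0, f_min, f_max, fade)
w_br = 1.0 - w_sfs
```

Because the two weights always sum to one, the SFS and BR contributions recombine to the original spectrum at every frequency, including inside the fade regions.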
(23) If the source angle φ.sub.s does not lie in the interval [φ.sub.left, φ.sub.right] or if the loudspeaker arrangement is sparse (e.g., if the loudspeaker spacing d exceeds 15 cm to 20 cm), the decision unit determines that SFS is not feasible.
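This feasibility rule can be written as a short predicate. The sketch assumes that the interval limits are given as plain angles with the left limit numerically smaller than the right limit (no wrap-around), and it picks 15 cm, the lower end of the range mentioned above, as a hypothetical spacing threshold.

```python
import math

def sfs_feasible(phi_s, phi_left, phi_right, spacing, max_spacing=0.15):
    """Return True if SFS is considered feasible.

    phi_s, phi_left, phi_right: angles in radians, with phi_left <= phi_right
    (assumption: the interval does not wrap around).
    spacing, max_spacing: loudspeaker spacing and threshold in metres.
    """
    in_interval = phi_left <= phi_s <= phi_right
    dense_enough = spacing <= max_spacing
    return in_interval and dense_enough

left, right = math.radians(-40), math.radians(40)
inside_dense = sfs_feasible(math.radians(10), left, right, spacing=0.10)
outside_dense = sfs_feasible(math.radians(60), left, right, spacing=0.10)
inside_sparse = sfs_feasible(math.radians(10), left, right, spacing=0.25)
```

Only the first case passes both tests; in the other two the decision unit of the sketch would select binaural rendering.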
(27)
For angle , this borderline case corresponds to
(28)
A more flexible decision rule, where an additional parameter 1 is introduced, results in a larger angle |φ.sub.max| and, thus, in a smaller angle φ.sub.min. The corresponding more flexible rule is given by
(29)
where the argument of arccos is upper-bounded by one.
(30) As described above, the proposed system can go beyond a straightforward approach, where a possible combination of BR and SFS merely depends on the frequency. Here, also the number and/or positions of the loudspeakers, the positions and/or extents of the virtual sources, and the local listening areas are taken into account, which are crucial parameters determining whether a certain reproduction approach is feasible or not.
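One of the geometric criteria, the angular separation between the direction of a virtual source and the directions towards the other audio zones (compare claims 1 and 4), could be checked as in the following sketch. It is an interpretation under stated assumptions: zones are represented by their centers in two dimensions, the source direction must differ from the direction to every remaining zone, and the threshold `min_gap` is a hypothetical value.

```python
import math

def direction(from_xy, to_xy):
    """Angle (radians, relative to the x-axis) of the line from from_xy to to_xy."""
    return math.atan2(to_xy[1] - from_xy[1], to_xy[0] - from_xy[0])

def angular_gap(a, b):
    """Absolute difference between two angles, wrapped to [0, pi]."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))

def use_sfs_for_zone(selected_center, other_centers, source_angle, min_gap):
    """Choose SFS only if the source direction deviates by more than min_gap
    from the direction towards each of the remaining zones."""
    return all(
        angular_gap(source_angle, direction(selected_center, center)) > min_gap
        for center in other_centers
    )

bright = (0.0, 0.0)
dark_zones = [(1.0, 0.0)]       # one other zone, located at 0 degrees
min_gap = math.radians(20)      # hypothetical predefined angle

well_separated = use_sfs_for_zone(bright, dark_zones, math.radians(90), min_gap)
too_close = use_sfs_for_zone(bright, dark_zones, math.radians(10), min_gap)
```

A source at 90 degrees is well separated from the other zone, so SFS would be used; a source at 10 degrees points almost directly at it, and the sketch would select binaural rendering instead.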
(32) The spectrum S(ω, t) and the position vector x.sub.src are provided to a decision unit 830. The decision unit 830 comprises a filter bank 832 and a decision diagram unit 834, which is configured to define the bands (e.g., the cut-off frequencies) that are used by the filter bank 832.
(33) Based on the above-described decision rules, the filter bank 832 separates the source spectrum S(ω, t) into a first-band spectrum S.sub.SFS(ω, t) and a second-band spectrum S.sub.BR(ω, t), which are to be reproduced by sound field synthesis and binaural reproduction, respectively.
(34) The second-band spectrum S.sub.BR(ω, t) and the position vector x.sub.src of the virtual source are provided as inputs to a binaural renderer 820. Furthermore, a time-dependent head position x.sub.head(t) and a time-dependent head orientation φ.sub.head(t) are provided to the binaural renderer 820. The binaural renderer 820 comprises a synthesis unit 822 for generating binaural signals s.sub.binaural(ω, t) based on the position x.sub.src of the virtual source as well as the current head position x.sub.head(t) and the current head orientation φ.sub.head(t) of the listener. To this end, the synthesis unit 822 uses Head-Related Transfer Functions (HRTFs) which are either modelled in the synthesis unit 822 or obtained from an HRTF measurement database (not shown in
(35) The wave field synthesis apparatus 800 comprises a head position and orientation detection unit 840 that is configured to detect a head position and orientation of a listener in image frames that are acquired by a camera 842. Furthermore, the wave field synthesis apparatus 800 comprises an object detection unit 844 that also obtains image frames from the camera 842. The object detection unit 844 can, for example, detect the positions x.sub.ldspk of the loudspeakers 210 and provide this information to one or more units of the wave field synthesis apparatus 800, in particular the decision diagram unit 834.
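The detected head position and orientation determine the positions at which the binaural renderer controls the sound pressure. A minimal two-dimensional sketch of this mapping is given below. It assumes that the two control positions are the ear positions, placed at a fixed distance to the left and right of the head center; the head radius of 8.75 cm is a hypothetical default, and the function name is illustrative.

```python
import math

def ear_positions(head_xy, head_orientation, head_radius=0.0875):
    """Return (left_ear, right_ear) coordinates in metres.

    head_xy: detected head center (x, y).
    head_orientation: look direction in radians, measured from the x-axis.
    head_radius: assumed distance from head center to each ear (hypothetical).
    """
    x, y = head_xy
    # The left ear lies 90 degrees counter-clockwise from the look direction,
    # the right ear 90 degrees clockwise.
    left = (x + head_radius * math.cos(head_orientation + math.pi / 2),
            y + head_radius * math.sin(head_orientation + math.pi / 2))
    right = (x + head_radius * math.cos(head_orientation - math.pi / 2),
             y + head_radius * math.sin(head_orientation - math.pi / 2))
    return left, right

# Listener at (1.0, 2.0) looking along the positive x-axis.
left_ear, right_ear = ear_positions((1.0, 2.0), 0.0)
```

Each time the detection unit reports a new head pose, the two positions would be recomputed and passed on as the target points for the specified sound pressures.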
(37) To summarize, an apparatus and a method for driving an array of loudspeakers with drive signals are presented. Embodiments of the invention combine the advantages of sound field synthesis and binaural rendering. For example, rendering can be maintained even in cases where local sound field synthesis is not feasible and/or not reasonable by utilizing less robust binaural rendering. The robustness of binaural rendering can be increased by utilizing more robust sound field synthesis in mid-frequency ranges.
(38) Embodiments of the present invention allow more flexibility in placing the loudspeakers, require fewer loudspeakers and less hardware to achieve the same rendering quality, are less complex and more robust, and improve the frequency range.
(39) In this invention, binaural rendering and sound field synthesis can be combined such that the benefits of both approaches can be exploited. That is, for scenarios and frequency ranges where sound field synthesis is not reasonable, binaural rendering can be utilized as a fallback solution. If sound field synthesis is feasible at certain frequencies, it supports binaural rendering and thereby increases the robustness of the system with respect to head movements.
(40) The invention has been described in conjunction with various embodiments herein. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
(41) Embodiments of the invention may be implemented in a computer program for running on a computer system, at least including code portions for performing steps of a method according to the invention when run on a programmable apparatus, such as a computer system or enabling a programmable apparatus to perform functions of a device or system according to the invention.
(42) A computer program is a list of instructions such as a particular application program and/or an operating system. The computer program may for instance include one or more of: a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
(43) The computer program may be stored internally on a computer readable storage medium or transmitted to the computer system via a computer readable transmission medium. All or some of the computer program may be provided on transitory or non-transitory computer readable media permanently, removably or remotely coupled to an information processing system. The computer readable media may include, for example and without limitation, any number of the following: magnetic storage media including disk and tape storage media; optical storage media such as compact disk media (e.g., CD-ROM, CD-R, etc.) and digital video disk storage media; non-volatile memory storage media including semiconductor-based memory units such as FLASH memory, EEPROM, EPROM, ROM; ferromagnetic digital memories; MRAM; volatile storage media including registers, buffers or caches, main memory, RAM, etc.; and data transmission media including computer networks, point-to-point telecommunication equipment, and carrier wave transmission media, just to name a few.
(44) A computer process typically includes an executing (running) program or portion of a program, current program values and state information, and the resources used by the operating system to manage the execution of the process. An operating system (OS) is the software that manages the sharing of the resources of a computer and provides programmers with an interface used to access those resources. An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system.
(45) The computer system may for instance include at least one processing unit, associated memory and a number of input/output (I/O) devices. When executing the computer program, the computer system processes information according to the computer program and produces resultant output information via I/O devices.
(46) The connections as discussed herein may be any type of connection suitable to transfer signals from or to the respective nodes, units or devices, for example via intermediate devices. Accordingly, unless implied or stated otherwise, the connections may for example be direct connections or indirect connections. The connections may be illustrated or described in reference to being a single connection, a plurality of connections, unidirectional connections, or bidirectional connections. However, different embodiments may vary the implementation of the connections. For example, separate unidirectional connections may be used rather than bidirectional connections and vice versa. Also, a plurality of connections may be replaced with a single connection that transfers multiple signals serially or in a time-multiplexed manner. Likewise, single connections carrying multiple signals may be separated out into various different connections carrying subsets of these signals. Therefore, many options exist for transferring signals.
(47) Those skilled in the art will recognize that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements. Thus, it is to be understood that the architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. For example, the wave field synthesis apparatus 800 may include a virtual source unit 802.
(48) Furthermore, those skilled in the art will recognize that the boundaries between the above-described operations are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed over additional operations, and operations may be executed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.
(49) Also, for example, the examples, or portions thereof, may be implemented as software or code representations of physical circuitry or of logical representations convertible into physical circuitry, such as in a hardware description language of any appropriate type.
(50) Also, the invention is not limited to physical devices or units implemented in nonprogrammable hardware but can also be applied in programmable devices or units able to perform the desired device functions by operating in accordance with suitable program code, such as mainframes, minicomputers, servers, workstations, personal computers, notepads, personal digital assistants, electronic games, automotive and other embedded systems, cell phones and various other wireless devices, commonly denoted in this application as computer systems.