System and method for augmenting an acoustic space
10812902 · 2020-10-20
Assignee
Inventors
- Jonathan S. Abel (Menlo Park, CA)
- Eoin F. Callery (Mountain View, CA, US)
- Elliot Kermit Canfield-Dafilou (Mountain View, CA, US)
Cpc classification
H04S7/305
ELECTRICITY
H04R2227/007
ELECTRICITY
H04R3/02
ELECTRICITY
International classification
H04R3/02
ELECTRICITY
H04S7/00
ELECTRICITY
Abstract
A method and system for real-time auralization is described in which room sounds are reverberated and presented over loudspeakers, thereby augmenting the acoustics of the space. Room microphones are used to capture room sound sources, with their outputs processed in a canceler to remove the synthetic reverberation also present in the room. Doing so gives precise control over the auralization while suppressing feedback. It also allows freedom of movement and creates a more natural acoustic environment for performers or participants in music, theater, gaming, home entertainment, and virtual reality applications. Canceler design methods are described, including techniques for handling varying loudspeaker-microphone transfer functions such as would be present in the context of a performance or installation.
Claims
1. A system for reducing feedback resulting from a sound produced by a speaker being captured by a microphone, the sound including auralization effects, the system comprising: an auralizer for producing the auralization effects; and a canceler, wherein the canceler includes a cancellation filter that is based on an impulse response between the microphone and the speaker, and wherein the impulse response is formed according to acoustics of a live acoustic space in which the microphone and the speaker are separately placed, and wherein the acoustics include at least an acoustic propagation delay between the speaker and the microphone.
2. The system of claim 1, wherein the cancellation filter is calibrated based on relative positions of the microphone and the speaker in the live acoustic space.
3. The system of claim 1, wherein the microphone is one of a plurality of microphones, and wherein the speaker is one of a plurality of speakers, and wherein the cancellation filter is based on impulse responses between each microphone-speaker pair of the plurality of microphones and the plurality of speakers.
4. The system of claim 1, wherein the auralization effects include artificial reverberation.
5. The system of claim 4, wherein the artificial reverberation is performed in accordance with a target acoustic space that is different from the live acoustic space.
6. The system of claim 1, wherein the microphone further captures live sound, the canceler being operative to reduce feedback caused by the acoustics of the live acoustic space before the live sound is processed by the auralizer and output to the speaker as the sound with the auralization effects.
7. The system of claim 1, wherein the auralizer and the canceler are implemented by a digital audio workstation.
8. A method for reducing feedback resulting from a sound produced by a speaker being captured by a microphone, the sound including auralization effects, the system comprising: capturing live sound by the microphone; and performing cancelation on the live sound using a cancellation filter that is based on an impulse response between the microphone and the speaker, the cancelation resulting in a live sound estimate, and wherein the impulse response is formed according to acoustics of a live acoustic space in which the microphone and the speaker are separately placed, and wherein the acoustics include at least an acoustic propagation delay between the speaker and the microphone.
9. The method of claim 8, further including adding the auralization effects to the live sound estimate and providing the live sound estimate with the added auralization effects to the speaker.
10. The method of claim 8, wherein the cancellation filter is calibrated based on relative positions of the microphone and the speaker in the live acoustic space.
11. The method of claim 8, wherein the microphone is one of a plurality of microphones, and wherein the speaker is one of a plurality of speakers, and wherein the cancellation filter is based on impulse responses between each microphone-speaker pair of the plurality of microphones and the plurality of speakers.
12. The method of claim 9, wherein adding the auralization effects includes performing artificial reverberation.
13. The method of claim 12, wherein the artificial reverberation is performed in accordance with a target acoustic space that is different from the live acoustic space.
14. A method for reducing feedback resulting from a sound produced by a speaker being captured by a microphone, the sound including auralization effects, the method comprising: generating a live sound without auralization effects from the speaker in a live acoustic space; capturing the live sound by the microphone; measuring an impulse response between the microphone and the speaker using the captured live sound; and using the measured impulse response to obtain characteristics of a cancellation filter wherein the cancellation filter is configured to reduce effects of at least acoustics of a live acoustic space in which the microphone and the speaker are separately placed, and wherein the acoustics include at least an acoustic propagation delay between the speaker and the microphone.
15. The method of claim 14, wherein the measured impulse response is a function of frequency and time.
16. The method of claim 14, wherein the characteristics of the cancellation filter further include a windowing function.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) These and other aspects and features of the present embodiments will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures, wherein:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
DETAILED DESCRIPTION
(16) The present embodiments will now be described in detail with reference to the drawings, which are provided as illustrative examples of the embodiments so as to enable those skilled in the art to practice the embodiments and alternatives apparent to those skilled in the art. Notably, the figures and examples below are not meant to limit the scope of the present embodiments to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Moreover, where certain elements of the present embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present embodiments will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the present embodiments. Embodiments described as being implemented in software should not be limited thereto, but can include embodiments implemented in hardware, or combinations of software and hardware, and vice-versa, as will be apparent to those skilled in the art, unless otherwise specified herein. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the present disclosure is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present embodiments encompass present and future known equivalents to the known components referred to herein by way of illustration.
(17) According to certain aspects, the present embodiments provide a system and method for real-time auralization that uses standard room microphones, loudspeakers, and inventive signal processing tools to synthesize virtual acoustics while canceling the feedback. The cancellation method described herein uses an adaptive noise cancellation approach (see, e.g., Widrow, B., et al., Adaptive Noise Cancelling: Principles and Applications, Proceedings of the IEEE, 63(12), pp. 1692-1716, 1975, hereinafter [32]) in which a primary signal is the sum of a desired signal and unwanted noise. In that approach, a reference signal, which is correlated with the unwanted noise, is used to estimate and subtract the unwanted noise from the primary signal. Related literature also includes echo cancellation and dereverberation (see, e.g., Emanuel Habets, Fifty years of reverberation reduction: From analog signal processing to machine learning, AES 60th Conference on DREAMS, 2016, hereinafter [12]; Patrick A Naylor and Nikolay D Gaubitch, Eds., Speech Dereverberation, Springer, 2010, hereinafter [13]; and Francis Rumsey, Reverberation . . . and how to remove it, Journal of the Acoustical Society of America, vol. 64, no. 4, pp. 262-6, April 2016, hereinafter [14]).
(18) In one embodiment, a loudspeaker and microphone are configured in a room having a sound source. Room sounds are reverberated according to the acoustics of a desired target space and presented over loudspeakers, thereby augmenting the acoustics of the room. The room microphone captures sound from the room sound sources as well as from the loudspeaker playing the simulated acoustics. Measurements of the impulse response between the loudspeaker and microphone are used to estimate and subtract the simulated acoustics from the microphone signal, thereby eliminating feedback. In another embodiment, impulse responses between a plurality of loudspeakers and microphones are used to cancel simulated acoustics from multiple loudspeakers for each microphone.
(19) In an additional embodiment, multiple impulse response measurements between a loudspeaker and microphone are made, and estimates of the impulse response standard deviation as a function of time and frequency band are formed, and used in designing the processing to cancel the synthesized acoustics from the microphone signals. In a further embodiment, the correlation between a loudspeaker and microphone signal is used to adaptively modify the cancellation processing.
(20)
(21) As shown, example system 100 includes a microphone 102 and speaker 104 that are both connected to an audio interface 106. Audio interface 106 includes an input 108 connected to microphone 102 and an output 110 connected to speaker 104. Audio interface 106 further includes a port 112 connected to computer 114 (e.g. desktop or notebook computer, pad or tablet computer, smart phone, etc.). It should be noted that other embodiments of system 100 can include additional or fewer components than shown in the example of
(22) Moreover, although shown separately for ease of illustration, it should be noted that certain components of system 100 can be implemented together. For example, computer 114 can comprise digital audio workstation software (e.g. implementing auralization and cancelation processing according to embodiments) and be configured with an audio interface such as 106 connected to microphone preamps (e.g. input 108) and microphones (e.g. microphone 102) and a set of powered loudspeakers (e.g. speaker 104). In these and other embodiments, certain components can also be integrated into existing speaker arrays, and can be implemented using inexpensive and readily available software. For example, in virtual, augmented, and mixed reality scenarios, the system allows users to dispense with headphones for more immersive virtual acoustic experiences. Other hardware and software, including special-purpose hardware and custom software, may also be designed and used in accordance with the principles of the present embodiments.
(23) In general operation according to aspects of embodiments, room sounds (e.g. a music performance, voices from a virtual reality game participant, etc.) are captured by microphone 102. The captured sounds (i.e. microphone signals) are provided via interface 106 to computer 114, which processes the signals in real time to perform artificial reverberation according to the acoustics of a desired target space (i.e. auralization). The processed sound signals are then presented via interface 106 over speaker 104, thereby augmenting the acoustics of the room and enriching the experience of performers, game players, etc. As should be apparent, the room microphone 102 will also capture sound from the speaker 104, which is playing the simulated acoustics. According to aspects of the present embodiments, and as will be described in more detail below, computer 114 further estimates and subtracts the simulated acoustics in real time from the microphone signal, thereby eliminating feedback.
(24)
$$ l(t) = h(t) * d(t). \tag{1} $$
(25) Many known auralization techniques can be used to implement auralizer 204, such as those using fast, low-latency convolution methods to save computation (e.g., William G. Gardner, Efficient convolution without latency, Journal of the Audio Engineering Society, vol. 43, pp. 2, 1993, hereinafter [16]; Guillermo Garcia, Optimal filter partition for efficient convolution with short input/output delay, in Proceedings of the 113th Audio Engineering Society Convention, 2002, hereinafter [17]; and Frank Wefers and Michael Vorländer, Optimal filter partitions for real-time FIR filtering using uniformly-partitioned FFT-based convolution in the frequency-domain, in Proceedings of the 14th International Conference on Digital Audio Effects, 2011, pp. 155-61, hereinafter [18]). Another modal reverberator approach is disclosed in U.S. Pat. No. 9,805,704, the contents of which are incorporated herein by reference in their entirety. Although these known techniques can provide a form of impulse response h(t) used by auralizer 204, the difficulty is that the room source signals d(t) are not directly available: as described above, the room microphones also pick up the synthesized acoustics, and would cause feedback if the room microphone signal m(t) were reverberated without additional processing.
(26) According to certain aspects, the present embodiments auralize (e.g. using known techniques such as those mentioned above) an estimate of the room source signals $\hat{d}(t)$, formed by subtracting from the microphone signal $m(t)$ an estimate of the synthesized acoustics (e.g. the output of speaker 104). Assuming the geometry between the loudspeaker and microphone is unchanging, the actual dry signal $d(t)$ is determined by:
$$ d(t) = m(t) - g(t) * l(t), \tag{2} $$
where $g(t)$ is the impulse response between the loudspeaker and microphone. Embodiments design an impulse response $c(t)$, which approximates the loudspeaker-microphone response, and use it to form an estimate of the dry signal, $\hat{d}(t)$, which is determined by:
$$ \hat{d}(t) = m(t) - c(t) * l(t), \tag{3} $$
as shown in the signal flow diagram
(27) The question then becomes how to obtain the canceling filter c(t). A measurement of the impulse response g(t) provides an excellent starting point, though there are time-frequency regions over which the response is not well known due to measurement noise (typically affecting the low frequencies), or changes over time due to air circulation or performers, participants, or audience members moving about the space (typically affecting the latter part of the impulse response). In regions where the impulse response is not well known, it is preferred that the cancellation be reduced so as to not introduce additional reverberation.
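The loop formed by equations (1) through (3) can be checked numerically. The following is a minimal sketch, not the implementation of the embodiments: the function name `simulate_loop`, the short example filters, and the assumption that the loudspeaker-microphone response has at least one sample of propagation delay (so the loop can be evaluated sample by sample) are all illustrative choices made here. It shows that when the canceling filter equals the true response the dry signal is recovered, and that with no cancellation the estimate is contaminated by the synthesized acoustics.

```python
def simulate_loop(d, h, g, c):
    """Simulate the auralization loop one sample at a time.

    d: dry room source signal, h: auralizer impulse response,
    g: true loudspeaker-to-microphone response, c: canceling filter.
    Assumption: g[0] and c[0] are ignored (treated as zero), i.e. there is
    at least one sample of acoustic delay between loudspeaker and microphone.
    Returns (d_hat, l): the dry-signal estimate and the loudspeaker signal.
    """
    n = len(d)
    l = [0.0] * n
    d_hat = [0.0] * n
    for t in range(n):
        # Microphone picks up the source plus the loudspeaker through g.
        m = d[t] + sum(g[k] * l[t - k] for k in range(1, min(len(g), t + 1)))
        # Equation (3): subtract the estimated synthesized acoustics.
        d_hat[t] = m - sum(c[k] * l[t - k] for k in range(1, min(len(c), t + 1)))
        # Equation (1), applied to the estimate: l(t) = (h * d_hat)(t).
        l[t] = sum(h[k] * d_hat[t - k] for k in range(min(len(h), t + 1)))
    return d_hat, l


# Hypothetical example values: an impulsive source and short filters.
d = [1.0] + [0.0] * 31
h = [0.0, 0.5, 0.25]
g = [0.0, 0.0, 0.0, 0.8, 0.3]

d_hat_ideal, _ = simulate_loop(d, h, g, c=g)                # perfect canceler
d_hat_open, _ = simulate_loop(d, h, g, c=[0.0] * len(g))    # no cancellation

err_ideal = max(abs(a - b) for a, b in zip(d_hat_ideal, d))
err_open = max(abs(a - b) for a, b in zip(d_hat_open, d))
print(err_ideal, err_open)
```

In practice c(t) only approximates g(t), so the estimation error falls between these two extremes.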
(28) Here, the cancellation filter 202 impulse response $c(t)$ is preferably chosen to minimize the expected energy in the difference between the actual and estimated room microphone loudspeaker signals. For simplicity of presentation and without loss of generality, assume for the moment that the loudspeaker-microphone impulse response is a unit pulse scaled by $g$, i.e.
$$ g(t) = g\,\delta(t), \tag{4} $$
and that the impulse response measurement $\tilde{g}(t)$ is equal to the sum of the actual impulse response and zero-mean noise with variance $\sigma_g^2$. Consider a canceling filter $c(t)$ which is a windowed version of the measured impulse response $\tilde{g}(t)$,
$$ c(t) = w\,\tilde{g}(t). \tag{5} $$
(29) In this case, the measured impulse response is scaled according to a one-sample-long window w. The expected energy in the difference between the auralization and cancellation signals at time t is
$$ E\left[\left(g\,l(t) - w\,\tilde{g}\,l(t)\right)^2\right] = l^2(t)\left[w^2\sigma_g^2 + g^2(1-w)^2\right]. \tag{6} $$
(30) Minimizing the residual energy over choices of the window w yields
$$ c^*(t) = w^*\,\tilde{g}(t), \qquad w^* = \frac{g^2}{g^2 + \sigma_g^2}. \tag{7} $$
(31) In other words, the optimum canceler response $c^*(t)$ is a Wiener-like weighting of the measured impulse response, $w^*\tilde{g}(t)$. When the loudspeaker-microphone impulse response magnitude is large compared with the impulse response measurement uncertainty, the window $w$ will be near 1, and the cancellation filter will approximate the measured impulse response. By contrast, when the impulse response is poorly known, the window $w$ will be small (roughly the measured impulse response signal-to-noise ratio), and the cancellation filter will be attenuated compared to the measured impulse response. In this way, the optimal cancellation filter impulse response is seen to be the measured loudspeaker-microphone impulse response, scaled by a compressed signal-to-noise ratio (CSNR).
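The optimal one-sample window can be checked with a small Monte Carlo experiment. This is a sketch under stated assumptions: the measurement noise is taken to be Gaussian, the loudspeaker signal is set to l(t) = 1, and the function names and example values (g = 1, noise standard deviation 0.5) are hypothetical. It compares the simulated residual energy for the window g^2/(g^2 + sigma^2) against the closed-form expression w^2 sigma^2 + g^2 (1 - w)^2 and against the unwindowed (w = 1) and fully attenuated (w = 0) alternatives.

```python
import random


def optimal_window(g, sigma):
    """Wiener-like window: g^2 / (g^2 + sigma^2)."""
    return g * g / (g * g + sigma * sigma)


def expected_residual(w, g, sigma):
    """Closed-form residual energy for l(t) = 1: w^2 sigma^2 + g^2 (1 - w)^2."""
    return w * w * sigma * sigma + g * g * (1.0 - w) ** 2


def simulated_residual(w, g, sigma, trials=20000, seed=0):
    """Average of (g - w * g_measured)^2 over noisy measurements of g."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g_measured = g + rng.gauss(0.0, sigma)  # assumed Gaussian noise
        total += (g - w * g_measured) ** 2
    return total / trials


g, sigma = 1.0, 0.5            # hypothetical example values
w_star = optimal_window(g, sigma)

res_star = simulated_residual(w_star, g, sigma)
res_unwindowed = simulated_residual(1.0, g, sigma)
res_muted = simulated_residual(0.0, g, sigma)
print(w_star, res_star, expected_residual(w_star, g, sigma))
```

With these values the window is 0.8, and the residual at the optimum is smaller than with either the raw measurement or no cancellation at all.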
(32) Typically, the loudspeaker-microphone impulse response $g(t)$ will last hundreds of milliseconds, and the window will preferably be a function of time $t$ and frequency $f$ that scales the measured impulse response. Denote by $\tilde{g}(t, f_b)$, $b = 1, 2, \ldots, N$ the measured impulse response $\tilde{g}(t)$ split into a set of $N$ frequency bands $f_b$, for example using a filterbank, such that the sum of the band responses is the original measurement,
$$ \tilde{g}(t) = \sum_{b=1}^{N} \tilde{g}(t, f_b). \tag{8} $$
(33) In this case, the canceler response $c^*(t)$ is the sum of measured impulse response bands $\tilde{g}(t, f_b)$, scaled in each band by a corresponding window $w^*(t, f_b)$. Expressed mathematically,
$$ c^*(t) = \sum_{b=1}^{N} c^*(t, f_b), \tag{9} $$
where
$$ c^*(t, f_b) = w^*(t, f_b)\,\tilde{g}(t, f_b), \tag{10} $$
$$ w^*(t, f_b) = \frac{g^2(t, f_b)}{g^2(t, f_b) + \sigma_g^2(t, f_b)}. \tag{11} $$
(34) Embodiments use the measured impulse $\tilde{g}(t, f_b)$ as a stand-in for the actual impulse $g(t, f_b)$ in computing the window $w(t, f_b)$. Alternatively, repeated measurements of the impulse response $g(t, f_b)$ could be made, with the measurement mean used for $g(t, f_b)$, and the variation in the impulse response measurements as a function of time and frequency used to form $\sigma_g^2(t, f_b)$. Embodiments also perform smoothing of $\tilde{g}^2(t, f_b)$ over time and frequency in computing $w(t, f_b)$ so that the window is a smoothly changing function of time and frequency.
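The band-wise window design can be sketched as follows. This is an illustration under several assumptions that are not taken from the embodiments: a two-band split (a moving-average low band plus its complement) stands in for the filterbank, smoothing is applied over time only, the per-sample variance across repeated measurements is used as the noise variance, and all names and numeric values are hypothetical.

```python
import random


def moving_average(x, width):
    """Centered moving average with zero padding (width should be odd)."""
    half = width // 2
    n = len(x)
    return [sum(x[max(0, t - half):min(n, t + half + 1)]) / width
            for t in range(n)]


def split_two_bands(x, width=5):
    """Stand-in filterbank: low band plus complement, so the bands sum to x."""
    low = moving_average(x, width)
    high = [a - b for a, b in zip(x, low)]
    return [low, high]


def design_canceler(measurements, width=5, smooth=9):
    """Return (c, windows) from repeated impulse response measurements."""
    n = len(measurements[0])
    count = len(measurements)
    banded = [split_two_bands(m, width) for m in measurements]
    c = [0.0] * n
    windows = []
    for b in range(2):
        # Band-wise mean (measured response) and variance across measurements.
        mean = [sum(bm[b][t] for bm in banded) / count for t in range(n)]
        var = [sum((bm[b][t] - mean[t]) ** 2 for bm in banded) / (count - 1)
               for t in range(n)]
        # Smooth signal power and noise power over time.
        power = moving_average([v * v for v in mean], smooth)
        noise = moving_average(var, smooth)
        # Window: signal power over signal-plus-noise power.
        w = [p / (p + s) if p + s > 0.0 else 0.0 for p, s in zip(power, noise)]
        windows.append(w)
        for t in range(n):
            c[t] += w[t] * mean[t]  # windowed band responses, summed over bands
    return c, windows


# Hypothetical data: a decaying response measured 20 times with added noise.
rng = random.Random(1)
true_g = [0.9 ** t for t in range(80)]
measurements = [[v + rng.gauss(0.0, 0.05) for v in true_g] for _ in range(20)]
c, windows = design_canceler(measurements)
print(windows[0][2], windows[0][-1])
```

Early in the response, where the signal dominates the noise, the low-band window stays near 1; in the noisy tail it falls toward 0, so the canceler is attenuated there rather than adding spurious reverberation.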
(35) It should be noted that the principles described above can be extended to cases other than a single microphone-loudspeaker pair, as shown in
$$ l(t) = H(t) * m(t), \tag{12} $$
$$ \hat{d}(t) = m(t) - C(t) * l(t), \tag{13} $$
where H(t) is the matrix of auralizer filters of 304 and C(t) the matrix of canceling filters of 302. As in the single speaker-single microphone case, the canceling filter matrix is the matrix of measured impulse responses, each windowed according to its respective CSNR, which may be a function of both time and frequency.
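The matrix form of the canceler can be illustrated with a small open-loop check. This is a sketch only: the helper names, the two-microphone, two-loudspeaker layout, and the example signals and filters are hypothetical, and the loudspeaker signals are taken as given rather than generated in a feedback loop. Each microphone estimate is the microphone signal minus the sum, over loudspeakers, of the loudspeaker signal convolved with the corresponding canceling filter.

```python
def convolve_at(filt, signal, t):
    """Return (filt * signal)(t) using samples signal[0..t]."""
    return sum(filt[k] * signal[t - k] for k in range(min(len(filt), t + 1)))


def apply_filter_matrix(F, x):
    """y_i(t) = sum_j (F[i][j] * x_j)(t) for a matrix of impulse responses F."""
    n = len(x[0])
    return [[sum(convolve_at(F[i][j], x[j], t) for j in range(len(x)))
             for t in range(n)]
            for i in range(len(F))]


def cancel(m, l, C):
    """Subtract the estimated loudspeaker contribution from each microphone."""
    leak = apply_filter_matrix(C, l)
    return [[mi - li for mi, li in zip(m_row, leak_row)]
            for m_row, leak_row in zip(m, leak)]


# Hypothetical example: two dry sources, two loudspeaker signals.
d = [[1.0, 0.0, 0.5, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0, -1.0, 0.0]]
l = [[0.3, -0.2, 0.1, 0.0, 0.4, 0.0],
     [0.0, 0.5, 0.5, -0.1, 0.0, 0.2]]
# G[i][j]: true response from loudspeaker j to microphone i.
G = [[[0.0, 0.6, 0.1], [0.0, 0.0, 0.4]],
     [[0.0, 0.2], [0.0, 0.7, 0.3]]]

leak = apply_filter_matrix(G, l)
m = [[di + gi for di, gi in zip(d_row, g_row)] for d_row, g_row in zip(d, leak)]

d_hat = cancel(m, l, C=G)  # canceler matrix equal to the true responses
max_err = max(abs(a - b) for row_a, row_b in zip(d_hat, d)
              for a, b in zip(row_a, row_b))
print(max_err)
```

The same `apply_filter_matrix` helper could apply a matrix of auralizer filters to the estimates to form the loudspeaker signals.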
(36) Moreover, a conditioning processor 308, denoted by Q, can be inserted between the microphones and auralizers,
$$ l(t) = H(t) * Q(m(t)), \tag{14} $$
$$ \hat{d}(t) = Q(m(t)) - C(t) * l(t), \tag{15} $$
as seen in
(37) The signal flows of
(38) As shown in
(39) According to certain aspects, a system such as described above in connection with
(40)
(41)
(42) As shown in
(43) In 606, a single or multiple microphone-speaker pair impulse response measurement(s) are made using a sine sweep or other test signal, preferably covering the entire audio band, fed to the speaker(s). In embodiments, this can include dozens of measurements of the empty space or the space with audience member stand-ins to understand the variation over time of the impulse responses between each pair of the microphones and speakers.
(44) In 608, the impulse response measurements are used to derive the cancellation filter as a function of time $t$ and frequency $f_b$. For example, an average of the measured impulse responses can be used to derive $\tilde{g}(t, f_b)$, and the standard deviation of the measured impulse responses can be used to derive $\sigma_g^2(t, f_b)$. The optimal window $w^*(t, f_b)$ may then be derived according to (11) described above. Finally, to find the cancellation filter $c(t, f_b)$, the measured impulse response $\tilde{g}(t, f_b)$ is shifted and scaled according to the amplitude and arrival time of the $c(t) = \delta(t)$ pulse in the measurement system. For example,
(45) An example cancellation impulse response c(t) obtained using the methodology described above is shown in
(46) After obtaining the cancellation filter as described above, the system is configured for run mode in 610, for example in accordance with the signal flows of
(47) It is useful to anticipate the effectiveness of the virtual acoustics cancellation in any given microphone. Substituting the optimal windowing (7) into the expression for the canceler residual energy (6), the virtual acoustics energy in the cancelled microphone signal is expected to be scaled by a factor of
$$ \nu = \frac{\sigma_g^2}{g^2 + \sigma_g^2}, \tag{16} $$
compared to that in the original microphone signal. Note that the reverberation-to-signal energy ratio is improved in proportion to the measurement variance for accurately measured signals, i.e. $\sigma_g^2 \ll g^2$. By contrast, when the impulse response is inaccurately measured, the reverberation-to-signal energy ratio is nearly unchanged, $\nu \approx 1$.
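The expected suppression can be put in decibels with a few lines of code. This is a minimal sketch with hypothetical function names and example values; it evaluates the scaling factor sigma^2 / (g^2 + sigma^2) and confirms that it matches the residual energy expression w^2 sigma^2 + g^2 (1 - w)^2, evaluated at the optimal window and divided by the uncancelled energy g^2.

```python
import math


def residual_factor(g, sigma):
    """Energy scaling of the virtual acoustics after optimal cancellation."""
    return sigma ** 2 / (g ** 2 + sigma ** 2)


def suppression_db(g, sigma):
    """Reduction of the virtual acoustics energy, in decibels."""
    return -10.0 * math.log10(residual_factor(g, sigma))


def residual_from_window(g, sigma):
    """Same factor, computed from the residual energy at the optimal window."""
    w = g ** 2 / (g ** 2 + sigma ** 2)
    return (w ** 2 * sigma ** 2 + g ** 2 * (1.0 - w) ** 2) / g ** 2


# Hypothetical cases: a well measured response and a poorly measured one.
well_measured_db = suppression_db(g=1.0, sigma=0.1)
poorly_measured_db = suppression_db(g=0.01, sigma=1.0)
print(well_measured_db, poorly_measured_db)
```

A measurement whose noise standard deviation is one tenth of the response amplitude gives roughly 20 dB of suppression, while a response buried in measurement noise gives almost none.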
(48) As an example of the performance of the present embodiments, several versions of the system of
(49) In a first test shown in
(50)
(51) To better understand the practical performance of the system, the present applicants made repeated measurements of the loudspeaker-microphone response at the CCRMA Stage in unoccupied and occupied conditions.
(52)
(53)
(54) An example of the ability of a system according to embodiments to suppress feedback resulting from creating a reverberant synthetic acoustic environment is described with reference to
(55) Although the present embodiments have been particularly described with reference to preferred examples thereof, it should be readily apparent to those of ordinary skill in the art that changes and modifications in the form and details may be made without departing from the spirit and scope of the present disclosure. It is intended that the appended claims encompass such changes and modifications.