POLE-ZERO BLOCKING MATRIX FOR LOW-DELAY FAR-FIELD BEAMFORMING
20210233509 · 2021-07-29
Inventors
Cpc classification
G10K11/17881
PHYSICS
H04R2430/25
ELECTRICITY
G10L2021/02165
PHYSICS
H04S7/301
ELECTRICITY
International classification
G10K11/178
PHYSICS
Abstract
A system performs pole-zero or IIR modeling and estimation of an inter-microphone transfer function between first and second microphones that output respective first and second microphone signals. The system includes a first adaptive FIR filter to which the first microphone signal is provided, a delay element that delays the second microphone signal by a predetermined delay amount, and a second adaptive FIR filter to which the delayed second microphone signal is provided. A first coefficient of the second adaptive FIR filter is constrained to a fixed non-zero value. The filters are jointly adapted to minimize an error signal that is the difference of the two filters' outputs. The delay is small, approximately the acoustic propagation delay between the two microphones, and is not determined by the environmental reverberation characteristics. The error signal may serve as a noise reference in a noise canceller for implementing far-field beamforming with low delay.
Claims
1. A system for pole-zero or infinite impulse response (IIR) modeling and estimation of an adaptive blocking matrix (ABM) inter-microphone transfer function between first and second microphones that output respective first and second microphone signals, comprising: a first adaptive finite impulse response (FIR) filter to which the first microphone signal is provided as input; a delay element that delays the second microphone signal by a predetermined delay amount; a second adaptive FIR filter to which the delayed second microphone signal is provided as input; wherein a linear constraint is applied to coefficients of the first and second adaptive FIR filters; and wherein the first and second adaptive FIR filters are jointly adapted to minimize an error signal that is a difference of outputs of the first and second adaptive FIR filters.
2. The system of claim 1, wherein, depending on the linear constraint applied to the coefficients of the first and second adaptive FIR filters, the predetermined delay amount may be zero.
3. The system of claim 1, wherein the predetermined delay is not determined by and is substantially lower than what is dictated by reverberation characteristics of sound received by the first and second microphones in reverberant conditions.
4. The system of claim 1, wherein the ABM inter-microphone transfer function is modeled and estimated by the system with low delay by using a pole-zero representation or IIR filter to exploit a noncausal relationship inherent in the first and second microphone signals.
5. The system of claim 1, wherein applying the linear constraint to coefficients of the first and second adaptive FIR filters comprises: constraining a first coefficient of the second adaptive FIR filter to a fixed non-zero value.
6. The system of claim 5, wherein to constrain the first coefficient of the second adaptive FIR filter to the fixed non-zero value, the system provides a 1-sample-delayed version of the delayed second microphone signal as input to a third adaptive FIR filter whose output is summed with the delayed second microphone signal gained by the fixed non-zero value to generate the output of the second adaptive FIR filter from which the first adaptive FIR filter output is subtracted to generate the error signal.
7. The system of claim 5, wherein to constrain the first coefficient of the second adaptive FIR filter to the fixed non-zero value, the system provides a 1-sample-delayed version of the delayed second microphone signal as input to a third adaptive FIR filter whose output is subtracted from the first adaptive FIR filter output to generate an estimate of the delayed second microphone signal gained by the fixed non-zero value, which is subtracted from the delayed second microphone signal gained by the fixed non-zero value to generate the error signal.
8. The system of claim 5, wherein the fixed non-zero value is unity.
9. The system of claim 5, wherein the predetermined delay is approximately an acoustic propagation delay between the first and second microphones.
10. The system of claim 1, wherein a ratio of transfer functions of the first and second adaptive FIR filters approximates the ABM inter-microphone transfer function.
11. The system of claim 1, wherein the error signal is used as a noise reference in an adaptive noise canceller of a beamformer.
12. The system of claim 11, wherein a pole-zero representation or IIR filter is used to model and estimate the adaptive noise canceller of the beamformer.
13. The system of claim 1, further comprising: one or more additional microphones that output respective additional microphone signals; wherein for each additional microphone of the one or more additional microphones, the system further comprises: a first adaptive FIR filter to which the first microphone signal is provided as input; a delay element that delays the respective additional microphone signal by a predetermined delay amount; a second adaptive FIR filter to which the delayed respective additional microphone signal is provided as input; wherein a linear constraint is applied to coefficients of the first and second adaptive FIR filters; wherein the first and second adaptive FIR filters are jointly adapted to minimize an additional error signal that is a difference of outputs of the first and second adaptive FIR filters; and wherein the additional error signals are included as noise references in an adaptive noise canceller of a beamformer.
14. A method for pole-zero or infinite impulse response (IIR) modeling and estimation of an adaptive blocking matrix (ABM) inter-microphone transfer function between first and second microphones that output respective first and second microphone signals, comprising: providing the first microphone signal as input to a first adaptive finite impulse response (FIR) filter; delaying the second microphone signal by a predetermined delay amount; providing the delayed second microphone signal as input to a second adaptive FIR filter; applying a linear constraint to coefficients of the first and second adaptive FIR filters; and jointly adapting the first and second adaptive FIR filters to minimize an error signal that is a difference of outputs of the first and second adaptive FIR filters.
15. The method of claim 14, wherein, depending on the linear constraint applied to the coefficients of the first and second adaptive FIR filters, the predetermined delay amount may be zero.
16. The method of claim 14, wherein the predetermined delay is not determined by and is substantially lower than what is dictated by reverberation characteristics of sound received by the first and second microphones in reverberant conditions.
17. The method of claim 14, wherein the ABM inter-microphone transfer function is modeled and estimated by the method with low delay by using a pole-zero representation or IIR filter to exploit a noncausal relationship inherent in the first and second microphone signals.
18. The method of claim 14, wherein said applying the linear constraint to coefficients of the first and second adaptive FIR filters comprises: constraining a first coefficient of the second adaptive FIR filter to a fixed non-zero value.
19. The method of claim 18, wherein to constrain the first coefficient of the second adaptive FIR filter to the fixed non-zero value, the method provides a 1-sample-delayed version of the delayed second microphone signal as input to a third adaptive FIR filter whose output is summed with the delayed second microphone signal gained by the fixed non-zero value to generate the output of the second adaptive FIR filter from which the first adaptive FIR filter output is subtracted to generate the error signal.
20. The method of claim 18, wherein to constrain the first coefficient of the second adaptive FIR filter to the fixed non-zero value, the method provides a 1-sample-delayed version of the delayed second microphone signal as input to a third adaptive FIR filter whose output is subtracted from the first adaptive FIR filter output to generate an estimate of the delayed second microphone signal gained by the fixed non-zero value, which is subtracted from the delayed second microphone signal gained by the fixed non-zero value to generate the error signal.
21. The method of claim 18, wherein the fixed non-zero value is unity.
22. The method of claim 18, wherein the predetermined delay is approximately an acoustic propagation delay between the first and second microphones.
23. The method of claim 14, wherein a ratio of transfer functions of the first and second adaptive FIR filters approximates the ABM inter-microphone transfer function.
24. The method of claim 14, further comprising: using the error signal as a noise reference in an adaptive noise canceller of a beamformer.
25. The method of claim 24, wherein a pole-zero representation or IIR filter is used to model and estimate the adaptive noise canceller of the beamformer.
26. The method of claim 14, further comprising: for each additional microphone of one or more additional microphones that output respective additional microphone signals, the method further comprises: providing the first microphone signal to a first adaptive FIR filter; delaying the respective additional microphone signal by a predetermined delay amount; providing the delayed respective additional microphone signal as input to a second adaptive FIR filter; applying a linear constraint to coefficients of the first and second adaptive FIR filters; jointly adapting the first and second adaptive FIR filters to minimize an additional error signal that is a difference of outputs of the first and second adaptive FIR filters; and including the additional error signals as noise references in an adaptive noise canceller of a beamformer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0009]
[0010]
[0011]
[0012]
[0013]
[0014]
DETAILED DESCRIPTION
[0015]
[0016] In the GSC beamformer system 100, the function of the ABM is to block the talker's speech in the secondary microphone signals X.sub.2(z), X.sub.3(z), and X.sub.4(z), and generate noise reference signals, with Z-transforms denoted by E.sub.1(z), E.sub.2(z), and E.sub.3(z), for the SLC. The ABM comprises three adaptive FIR filters, with Z-transforms denoted by Ĥ.sub.12(z), Ĥ.sub.13(z), and Ĥ.sub.14(z), that model and estimate the inter-microphone transfer functions H.sub.12(z), H.sub.13(z), and H.sub.14(z), respectively, for the talker. The filters Ĥ.sub.12(z), Ĥ.sub.13(z), and Ĥ.sub.14(z) receive the primary microphone signal X.sub.1(z) as input. A first delay element, denoted by z.sup.−D.sup.
[0017] In the GSC beamformer system 100, the function of the SLC is to generate a noise-reduced beamformer output signal Y(z) by using the noise reference signals E.sub.1(z), E.sub.2(z), and E.sub.3(z) provided by the ABM to cancel noise from the primary microphone signal X.sub.1(z) while preserving the talker's speech. The SLC comprises three adaptive FIR filters, with Z-transforms denoted by Ĝ.sub.1(z), Ĝ.sub.2(z), and Ĝ.sub.3(z), which receive respective noise reference signals E.sub.1(z), E.sub.2(z), and E.sub.3(z) as input. A fourth summing node sums the outputs of the adaptive filters Ĝ.sub.1(z), Ĝ.sub.2(z), and Ĝ.sub.3(z). A fourth delay element, denoted by z.sup.−D.sub.2, delays the primary microphone signal X.sub.1(z) by an amount D.sub.2. A fifth summing node subtracts the output of the fourth summing node from the delayed version of the primary microphone signal X.sub.1(z) to generate the beamformer output signal Y(z), which is minimized by jointly adapting the filters Ĝ.sub.1(z), Ĝ.sub.2(z), and Ĝ.sub.3(z). The adaptation of the SLC filters is controlled by control logic (not shown), which adapts the SLC filters only when noise is determined to be present. By minimizing the output signal Y(z) during noise activity, the SLC estimates the spatial statistics (inter-microphone correlations) of the noise and reduces the noise at the output of the beamformer.
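The SLC signal flow described above can be sketched in the time domain as y(n) = x1(n − D2) − Σ_k Σ_j g_k(j)·e_k(n − j). The following Python is a minimal illustration with fixed coefficients only; the function name `slc_output`, the argument layout, and the toy signals are assumptions made for this example, and the adaptation and noise-activity control logic are omitted.

```python
def slc_output(x1, refs, g, d2):
    """Sidelobe-canceller output for fixed filters (illustrative sketch).

    x1   : primary microphone samples
    refs : list of noise reference signals e_k, each as long as x1
    g    : list of FIR coefficient lists, one per reference
    d2   : delay (in samples) applied to the primary microphone signal
    """
    y = []
    for n in range(len(x1)):
        # Delayed primary microphone sample; zero before the delay line fills.
        primary = x1[n - d2] if n >= d2 else 0.0
        # Sum of all filtered noise references (the "fourth summing node").
        cancel = 0.0
        for gk, ek in zip(g, refs):
            for j, coeff in enumerate(gk):
                if n - j >= 0:
                    cancel += coeff * ek[n - j]
        # The "fifth summing node": delayed primary minus noise estimate.
        y.append(primary - cancel)
    return y


# Toy case (assumed): the primary microphone picks up 0.5 * noise, and the
# single noise reference is the noise itself. With d2 = 1, the filter
# g = [0, 0.5] reproduces the delayed noise and cancels it exactly.
noise = [1.0, -2.0, 0.5, 3.0, -1.0]
x1_toy = [0.5 * v for v in noise]
y_toy = slc_output(x1_toy, refs=[noise], g=[[0.0, 0.5]], d2=1)
```

In an adaptive system the coefficients in `g` would be updated to minimize the output power during noise-only periods, as the paragraph above describes.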
[0018] In the GSC beamformer system 100, each of the ABM filters Ĥ.sub.1(k+1)(z), k=1, 2, 3, models the corresponding inter-microphone transfer function H.sub.1(k+1)(z) between the primary microphone (Mic 1) and its respective secondary microphone (Mic 2, 3, or 4) for the talker. It is important that the inter-microphone transfer function is modeled accurately, so that the ABM can effectively block the talker's speech from the noise reference and the SLC cancels only the noise and none of the talker's speech at the primary microphone. In the conventional GSC beamformer system 100, each of the ABM filters Ĥ.sub.1(k+1)(z) is implemented using an all-zero representation or FIR filter. As explained in more detail below, for stability the inter-microphone transfer function must be modeled as a noncausal system. The FIR filter implementation models and estimates the noncausal impulse response of the inter-microphone transfer function in a stable manner by introducing a delay D.sub.1 in the secondary microphone signal, as shown in
[0019]
[0020] Although
[0021] In
$$X_1(z) = S(z)\,H_1(z), \tag{1a}$$
$$X_2(z) = S(z)\,H_2(z). \tag{1b}$$
Since the inter-microphone transfer function H.sub.12(z) may be viewed as a system that receives the primary microphone signal as input and outputs the secondary microphone signal, the following mathematical relationship as shown in equation (2) results:
$$H_{12}(z) = \frac{X_2(z)}{X_1(z)} = \frac{H_2(z)}{H_1(z)}. \tag{2}$$
Thus, the inter-microphone transfer function H.sub.12(z) is the ratio of the two source-to-microphone transfer functions H.sub.2(z) and H.sub.1(z). Providing the primary microphone signal X.sub.1(z) as input to the inter-microphone transfer function H.sub.12(z) yields the secondary microphone signal X.sub.2(z), as expressed in equation (3), again taking into account only the talker's speech and ignoring noise in the microphone signals:
$$X_2(z) = X_1(z)\,H_{12}(z). \tag{3}$$
[0022] Sound propagates at a finite speed and an utterance from the talker (and its reflections) can arrive at the microphones only after being spoken by the talker. Thus, the source-to-microphone transfer functions H.sub.1(z) and H.sub.2(z) are modeled by causal systems. However, as is well known (see S. T. Neely and J. B. Allen, “Invertibility of a room impulse response,” The Journal of the Acoustical Society of America, vol. 66, no. 1, pp. 165-169, July 1979), the source-to-microphone transfer functions are, in general, non-minimum-phase. Thus, for stability, the inter-microphone transfer function H.sub.12(z), a system that is the ratio of the source-to-microphone transfer functions H.sub.2(z) and H.sub.1(z), and that receives the primary microphone signal as input and outputs the secondary microphone signal, must be modeled as a noncausal system. In other words, the inter-microphone impulse response h.sub.12 (n) needed to predict the secondary microphone signal from the primary microphone signal is noncausal, consisting of a causal part (the right-hand side representing dependence on past primary microphone signal values) and an anti-causal part (the left-hand side representing dependence on future primary microphone signal values), as shown by the synthetic example in the graph of
[0023] Sound loses energy as it propagates in an environment. In reverberant environments, it may take hundreds of milliseconds for the sound energy to decay to a negligible level. This energy loss is manifested in slow decay of RIR coefficients (long impulse responses). Typically, the source-to-microphone transfer functions H.sub.1(z) and H.sub.2(z) are modeled using all-zero representations or FIR filters with a sufficient number of filter coefficients. All-pole and pole-zero representations or IIR filters have also been applied for modeling the causal source-to-microphone transfer functions (see, for example, Y. Haneda, S. Makino, and Y. Kaneda, “Common Acoustical Pole and Zero Modeling of Room Transfer Functions,” IEEE Transactions on Speech and Audio Processing, vol. 2, no. 2, pp. 320-328, April 1994). However, the inter-microphone transfer function H.sub.12(z) is inherently a rational or pole-zero transfer function because it is the ratio of the two source-to-microphone transfer functions H.sub.2(z) and H.sub.1(z). The zeros of the inter-microphone transfer function H.sub.12(z) are the roots of the numerator polynomial H.sub.2(z), and the poles are the roots of the denominator polynomial H.sub.1(z). This underlying pole-zero structure of the inter-microphone transfer function suggests that a pole-zero representation may be a more suitable model than the conventional all-zero representation. A pole-zero representation realizes an IIR filter that may need fewer numerator and denominator filter coefficients to achieve a given modeling error than an all-zero representation or FIR filter. The pole-zero representation or IIR-filter-based model of the noncausal inter-microphone transfer function described in the present disclosure may be especially advantageous in applications where high efficiency and low delay are critical.
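To make the coefficient-count argument concrete, consider a toy transfer function with a single real pole p inside the unit circle, 1/(1 − p·z⁻¹), whose impulse response is pⁿ. The sketch below is an illustrative assumption and is not taken from the disclosure: it counts how many FIR taps are needed before the first discarded tap drops below a tolerance, compared with the single denominator coefficient of the pole-zero form. The function names, the pole value 0.9, and the truncation criterion are hypothetical choices.

```python
import math


def one_pole_impulse_response(pole, length):
    """Impulse response of y(n) = x(n) + pole * y(n - 1), i.e. pole**n."""
    h, y = [], 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        y = x + pole * y
        h.append(y)
    return h


def fir_taps_needed(pole, tol):
    """Smallest N such that the first discarded tap |pole|**N is below tol."""
    return math.ceil(math.log(tol) / math.log(abs(pole)))


pole = 0.9          # assumed slowly decaying (reverberant-like) pole
tol = 1e-3          # assumed truncation tolerance
n_fir = fir_taps_needed(pole, tol)
h = one_pole_impulse_response(pole, n_fir + 1)
print("FIR taps needed:", n_fir, "| IIR denominator coefficients: 1")
```

The closer the pole is to the unit circle (the slower the decay), the larger the gap between the FIR length and the fixed pole-zero coefficient count becomes.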
[0024] In a conventional GSC beamformer system, such as the system 100 shown in
[0025]
[0026] As described earlier, there exists an underlying pole-zero structure in the inter-microphone transfer function between the primary microphone Mic 1 and the secondary microphone Mic k+1, denoted by H.sub.1(k+1)(z), which is a ratio of the source-to-microphone transfer function H.sub.k+1(z) between the talker and the secondary microphone and the source-to-microphone transfer function H.sub.1(z) between the talker and the primary microphone. Each adaptive FIR filter pair Â.sub.k(z) and B̂.sub.k(z) of the IIR ABM embodiment of
[0027] The conventional FIR ABM based beamformer system 100 shown in
[0028]
$$E(z) = X_2(z)\,z^{-D_1}\,\hat{A}(z) - X_1(z)\,\hat{B}(z). \tag{4}$$
[0029] In the IIR ABM, due to minimization of the error E(z) defined in equation (4), the adaptive FIR filters Â(z) and B̂(z) are configured such that, modulo the delay D.sub.1, the ratio of B̂(z) and Â(z) estimates the ratio of the two source-to-microphone transfer functions H.sub.2(z) and H.sub.1(z), i.e., the inter-microphone transfer function H.sub.12(z), as expressed in equation (5):
$$\frac{\hat{B}(z)}{\hat{A}(z)} \approx z^{-D_1}\,\frac{H_2(z)}{H_1(z)} = z^{-D_1}\,H_{12}(z). \tag{5}$$
The adaptive FIR filters B̂(z) and Â(z) model the numerator and denominator, respectively, of the inter-microphone transfer function H.sub.12(z).
[0030] The estimated pole-zero or IIR ABM filter is stable, because the inter-microphone transfer function H.sub.12(z) is modeled as a noncausal system. Even though the source-to-microphone transfer functions are in general non-minimum-phase, implying that some of the roots of the denominator polynomial Â(z) may be inside the unit circle and some may be outside the unit circle, the roots that are outside the unit circle may be associated with the anti-causal part of the inter-microphone impulse response h.sub.12 (n) and the roots that are inside the unit circle may be associated with the causal part of the inter-microphone impulse response h.sub.12 (n) to produce a stable system.
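The association of outside-the-unit-circle roots with the anti-causal part can be illustrated on a first-order toy denominator 1 − r·z⁻¹. If |r| < 1, the stable expansion of 1/(1 − r·z⁻¹) is causal, h(n) = rⁿ for n ≥ 0; if |r| > 1, the stable expansion is anti-causal, h(−k) = −r⁻ᵏ for k ≥ 1. The sketch below is a minimal illustration under these assumptions; the function names and the root value 2.0 are hypothetical and not from the disclosure.

```python
def stable_inverse_first_order(root, n_taps):
    """Stable (possibly anti-causal) impulse response of 1 / (1 - root * z^-1).

    Returns a dict mapping time index n to h(n), truncated to n_taps samples.
    """
    if abs(root) == 1.0:
        raise ValueError("a root on the unit circle has no stable expansion")
    if abs(root) < 1.0:
        # Root inside the unit circle: causal, decaying for n >= 0.
        return {n: root ** n for n in range(n_taps)}
    # Root outside the unit circle: anti-causal, decaying for n <= -1.
    return {-k: -(root ** -k) for k in range(1, n_taps + 1)}


def apply_denominator(h, root):
    """Convolve h with [1, -root]; the result should approximate a unit impulse."""
    indices = range(min(h), max(h) + 2)
    return {n: h.get(n, 0.0) - root * h.get(n - 1, 0.0) for n in indices}


# Non-minimum-phase toy case (assumed): root at z = 2, outside the unit circle.
h_anti = stable_inverse_first_order(2.0, n_taps=30)
check = apply_denominator(h_anti, 2.0)
```

Here every nonzero sample of `h_anti` lies at a negative time index, which is why a predictor of the secondary microphone signal needs access to "future" primary microphone samples, or equivalently a delay on the secondary path.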
[0031] In order to eliminate sign and scale ambiguity in the estimated Â(z) and B̂(z), in the embodiment of the present disclosure shown in
$$a_0 = 1. \tag{6}$$
The constraint also avoids the trivial solution Â(z)=B̂(z)=0 during minimization of the error E(z).
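A minimal time-domain sketch of the joint adaptation with the constraint a0 = 1 follows. It minimizes the error e(n) = x2(n − D1) + Σ_{i≥1} a_i·x2(n − D1 − i) − Σ_j b_j·x1(n − j), that is, the difference between the output of the filter on the delayed secondary microphone signal and the output of the filter on the primary microphone signal. Several things here are assumptions for illustration only: a sample-by-sample normalized LMS update is used rather than the frequency-domain algorithm of the disclosure, D1 is set to 0, the toy source-to-microphone responses are short and minimum-phase, and the function names are hypothetical.

```python
import random


def convolve(h, x):
    """Causal FIR filtering of x by h (same length as x)."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def adapt_pole_zero_abm(x1, x2, order_a, order_b, d1=0, mu=0.5, eps=1e-8):
    """Jointly adapt A(z) (with a0 fixed to 1) and B(z) by NLMS.

    Returns (a, b, errors) where a holds a_1..a_order_a (a0 = 1 is implicit)
    and b holds b_0..b_order_b.
    """
    a = [0.0] * order_a
    b = [0.0] * (order_b + 1)
    x2d = [0.0] * d1 + list(x2[:len(x2) - d1])   # delayed secondary signal
    errors = []
    for n in range(len(x1)):
        past_x2 = [x2d[n - i] if n - i >= 0 else 0.0
                   for i in range(1, order_a + 1)]
        cur_x1 = [x1[n - j] if n - j >= 0 else 0.0
                  for j in range(order_b + 1)]
        # Error: A-filter output (a0 = 1 term plus adaptable part) minus B-filter output.
        e = (x2d[n] + sum(ai * v for ai, v in zip(a, past_x2))
             - sum(bj * v for bj, v in zip(b, cur_x1)))
        norm = eps + sum(v * v for v in past_x2) + sum(v * v for v in cur_x1)
        step = mu * e / norm
        a = [ai - step * v for ai, v in zip(a, past_x2)]   # de/da_i = +x2d(n - i)
        b = [bj + step * v for bj, v in zip(b, cur_x1)]    # de/db_j = -x1(n - j)
        errors.append(e)
    return a, b, errors


# Toy talker-only scenario (assumed): H1(z) = 1 + 0.5 z^-1, H2(z) = 0.8 z^-1 - 0.3 z^-2.
random.seed(0)
s = [random.gauss(0.0, 1.0) for _ in range(20000)]
x1 = convolve([1.0, 0.5], s)
x2 = convolve([0.0, 0.8, -0.3], s)
a_hat, b_hat, err = adapt_pole_zero_abm(x1, x2, order_a=1, order_b=2)
```

In this toy case the fixed a0 = 1 pins the scale, so the adapted coefficients are expected to approach a_1 = 0.5 and b = (0, 0.8, −0.3), and the ratio of the two filters reproduces H2(z)/H1(z). Without the constraint, setting all coefficients to zero would also drive the error to zero.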
[0032] In the embodiment of the present disclosure shown in
[0033] In yet another embodiment of the present disclosure, a general linear equality constraint on the coefficients of the adaptive FIR filters Â(z) and B̂(z) may be used. For illustration, suppose the adaptive FIR filter Â(z) has polynomial order M with coefficients denoted by {a.sub.0, . . . , a.sub.M} and the adaptive FIR filter B̂(z) has polynomial order M−1 with coefficients denoted by {b.sub.0, . . . , b.sub.M−1}; then the linear constraint is of the form:
$$c_0 a_0 + c_1 a_1 + \cdots + c_M a_M + c_{M+1} b_0 + c_{M+2} b_1 + \cdots + c_{2M} b_{M-1} = d, \tag{7}$$
where {c.sub.0, . . . , c.sub.2M} and d are constants. The constraint on the first coefficient of the adaptive FIR filter Â(z) to equal a fixed non-zero value is a special case of the linear constraint in which c.sub.0=1, c.sub.1=c.sub.2= . . . =c.sub.2M=0, with d being the fixed non-zero value, and the constraint on the first coefficient of the adaptive FIR filter Â(z) to equal unity is a special case of the linear constraint in which c.sub.0=1, c.sub.1=c.sub.2= . . . =c.sub.2M=0, and d=1.
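One generic way to enforce a linear equality constraint of this form during adaptation is to project the stacked coefficient vector back onto the constraint hyperplane after each update. This projection step is an assumption borrowed from linearly constrained adaptive filtering in general; the disclosure does not state how the general constraint is enforced, and the function name and numbers below are hypothetical.

```python
def project_onto_constraint(w, c, d):
    """Orthogonal projection of w onto the hyperplane {w : c . w = d}.

    w : stacked coefficients [a_0, ..., a_M, b_0, ..., b_(M-1)]
    c : constraint weights [c_0, ..., c_2M]
    d : constraint value
    """
    cc = sum(ci * ci for ci in c)
    if cc == 0.0:
        raise ValueError("constraint vector c must be non-zero")
    scale = (d - sum(ci * wi for ci, wi in zip(c, w))) / cc
    return [wi + scale * ci for wi, ci in zip(w, c)]


# Special case from the text with M = 2: c_0 = 1, all other c_i = 0, d = 1,
# which forces a_0 = 1 and leaves the remaining coefficients untouched.
w_before = [0.4, 0.3, -0.2, 0.7, 0.1]      # [a0, a1, a2, b0, b1], assumed values
c_vec = [1.0, 0.0, 0.0, 0.0, 0.0]
w_after = project_onto_constraint(w_before, c_vec, d=1.0)

# A different (assumed) constraint: the a-coefficients must sum to 1.
w_sum = project_onto_constraint(w_before, [1.0, 1.0, 1.0, 0.0, 0.0], d=1.0)
```

For the a0 constraint the projection reduces to simply overwriting a0 with d, which matches the fixed-coefficient structure described earlier.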
[0034] As mentioned earlier, the delay D.sub.1 that needs to be introduced in the pole-zero or IIR modeling and estimation of the ABM inter-microphone transfer function is small. In the embodiments of the present disclosure in which the first coefficient of the adaptive FIR filter Â(z) is constrained to a unity value (shown in
[0035] In the embodiment of
The adaptive FIR filter Â′(z) characterizes the adaptable coefficients of adaptive FIR filter Â(z), as expressed in equation (8). A 1-sample delay element z.sup.−1 delays the delayed secondary microphone signal X.sub.2(z)z.sup.−D.sup.
[0036]
$$\hat{X}_2(z) = X_1(z)\,\hat{B}(z) - X_2(z)\,z^{-(D_1+1)}\,\hat{A}'(z).$$
A second summing element subtracts signal X̂.sub.2(z) from the delayed secondary microphone signal X.sub.2(z)z.sup.−D.sup.
The adaptive FIR filter Â′(z) is adapted jointly with adaptive FIR filter B̂(z) to minimize the error signal E(z). Equation (11) shows that the IIR ABM embodiments of
[0037] Efficient implementation of the adaptive beamformer is crucial for deployment in real-time audio processing systems. The adaptation of the filters described in the systems above may be carried out using the well-known least mean squares (LMS) adaptive filtering algorithm, which is popular due to its low computational complexity and good convergence properties (see B. Widrow and S. D. Stearns, Adaptive Signal Processing, Prentice Hall, 1985). The computational complexity may be further reduced using frequency-domain adaptive filtering techniques, as described in J. J. Shynk, “Frequency-Domain and Multirate Adaptive Filtering,” IEEE Signal Processing Magazine, vol. 9, no. 1, pp. 14-37, January 1992.
[0038] An algorithm for efficient frequency-domain implementation of the pole-zero or IIR ABM in accordance with embodiments of the present disclosure will now be described. The algorithm is described with respect to
Length-4M Fast Fourier Transforms (FFTs) are used to transform the combined coefficient and microphone signal vectors to frequency domain, as expressed in equations (14) and (15):
$$\hat{W}^{(m)} = \mathrm{fft}\big(\hat{w}^{(m)}\big), \tag{14}$$
$$U^{(m)} = \mathrm{fft}\big(u^{(m)}\big). \tag{15}$$
The time-domain estimate of the delayed secondary microphone signal vector at each time m may be obtained efficiently as expressed by equations (16) and (17):
$$z = \mathrm{ifft}\big(\hat{W}^{(m)} \odot U^{(m)}\big) = \mathrm{ifft}\big(\mathrm{fft}(\hat{w}^{(m)}) \odot \mathrm{fft}(u^{(m)})\big), \tag{16}$$
$$\hat{x}_2^{(m)} = [\hat{x}_2(mM-M+1-D_1), \ldots, \hat{x}_2(mM-D_1)]^T = z(3M+1, \ldots, 4M). \tag{17}$$
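The pattern of multiplying spectra element-wise, inverse-transforming, and keeping only the last M samples is the overlap-save method of block convolution. The sketch below illustrates it for a single M-tap filter with a length-2M transform; this is a simplification assumed for illustration and not the combined length-4M vector layout of the disclosure. The naive DFT helper and all names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for a small demo)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def overlap_save_block(w, prev_block, cur_block):
    """Filter cur_block with the M-tap FIR w via one length-2M transform pair.

    The input buffer is [previous block, current block]; the last M samples
    of the circular convolution are free of wrap-around and form the output.
    """
    m = len(cur_block)
    W = dft(list(w) + [0.0] * m)                 # zero-padded coefficients
    U = dft(list(prev_block) + list(cur_block))  # two consecutive input blocks
    z = dft([a * b for a, b in zip(W, U)], inverse=True)
    return [v.real for v in z[m:]]               # keep only the last M samples


M = 4
w = [0.5, -0.25, 0.1, 0.05]
prev_block = [1.0, 2.0, 3.0, 4.0]
cur_block = [5.0, 6.0, 7.0, 8.0]
y_fft = overlap_save_block(w, prev_block, cur_block)

# Direct time-domain convolution for comparison.
u = prev_block + cur_block
y_direct = [sum(w[i] * u[M + j - i] for i in range(M)) for j in range(M)]
```

In practice a fast FFT routine would replace the naive `dft`; the block structure and the selection of the trailing samples stay the same.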
The time-domain error signal vector at time m may be obtained as expressed by equation (18):
$$e^{(m)} = x_2^{(m)} - \hat{x}_2^{(m)}, \tag{18}$$
where
$$x_2^{(m)} = [x_2(mM-M+1-D_1), \ldots, x_2(mM-D_1)]^T. \tag{19}$$
A length-4M FFT of the pre-zero-padded error signal vector is used to transform the error to frequency domain, as expressed by equation (20):
The power spectral density (PSD) of the combined microphone signal vector may be computed using exponential averaging according to equation (21):
$$P_{uu}^{(m)} = \gamma\,P_{uu}^{(m-1)} + (1-\gamma)\,\big|U^{(m)}\big|^2, \tag{21}$$
where γ is a smoothing constant (0≤γ<1).
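The exponential averaging of equation (21) can be sketched per frequency bin as follows. This is a minimal illustration; the function name, the value γ = 0.9, and the constant test spectrum are assumptions, not values from the disclosure.

```python
def update_psd(p_prev, spectrum, gamma):
    """One exponential-averaging step: P(m) = gamma * P(m-1) + (1 - gamma) * |U(m)|^2."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    return [gamma * p + (1.0 - gamma) * abs(u) ** 2
            for p, u in zip(p_prev, spectrum)]


# With a constant spectrum the estimate converges to |U|^2 in every bin.
U_const = [3 + 4j, 1 + 0j]      # |U|^2 = 25 and 1
psd = [0.0, 0.0]
for _ in range(200):
    psd = update_psd(psd, U_const, gamma=0.9)
```

A value of γ closer to 1 gives a smoother but slower-tracking estimate, while γ = 0 reduces the estimate to the instantaneous power of the current block.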
To minimize the error signal adaptively, at each time m, the frequency-domain combined coefficient vector may be updated efficiently using a block normalized LMS update step according to equation (22):
and μ is a step size parameter selected to ensure good convergence and tracking performance. The power normalization in equation (24) jointly pre-whitens and decorrelates the microphone signals in order to achieve further improvement in speed of convergence.
[0039] Although in the present disclosure an embodiment of the pole-zero or IIR ABM implementation based on computationally efficient frequency-domain adaptive filtering is described, other embodiments are contemplated in which the pole-zero or IIR ABM implementation is based on the computationally efficient multidelay or partitioned-block frequency-domain adaptive filtering (PBFDAF) approach with low block processing delay, described in J.-S. Soo and K. K. Pang, “Multidelay Block Frequency Domain Adaptive Filter,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 2, pp. 373-376, February 1990.
[0040] Although in the present disclosure embodiments are described in which the pole-zero or IIR ABM is implemented in the frequency domain, other embodiments are contemplated in which the IIR ABM is implemented in the time domain. Preferably, the systems 300, 400, and 500 include a digital signal processor (DSP) programmed to perform the operations of the FIR filters as well as other operations associated with a beamformer.
[0041] Although in the present disclosure embodiments are described in which a pole-zero representation or IIR filter is used to model and estimate the talker speech adaptive blocking matrix (ABM) of the GSC beamformer, other embodiments are contemplated in which a pole-zero representation or IIR filter is used to model and estimate the adaptive noise canceller or sidelobe canceller (SLC) of the GSC beamformer.
[0042] It should be understood especially by those having ordinary skill in the art with the benefit of this disclosure that the various operations described herein, particularly in connection with the figures, may be implemented by other circuitry or other hardware components. The order in which each operation of a given method is performed may be changed, unless otherwise indicated, and various elements of the systems illustrated herein may be added, reordered, combined, omitted, modified, etc. It is intended that this disclosure embrace all such modifications and changes and, accordingly, the above description should be regarded in an illustrative rather than a restrictive sense.
[0043] Similarly, although this disclosure refers to specific embodiments, certain modifications and changes can be made to those embodiments without departing from the scope and coverage of this disclosure. Moreover, any benefits, advantages, or solutions to problems that are described herein with regard to specific embodiments are not intended to be construed as a critical, required, or essential feature or element.
[0044] Further embodiments, likewise, with the benefit of this disclosure, will be apparent to those having ordinary skill in the art, and such embodiments should be deemed as being encompassed herein. All examples and conditional language recited herein are intended for pedagogical objects to aid the reader in understanding the disclosure and the concepts contributed by the inventor to furthering the art and are construed as being without limitation to such specifically recited examples and conditions.
[0045] This disclosure encompasses all changes, substitutions, variations, alterations, and modifications to the example embodiments herein that a person having ordinary skill in the art would comprehend. Similarly, where appropriate, the appended claims encompass all changes, substitutions, variations, alterations, and modifications to the example embodiments herein that a person having ordinary skill in the art would comprehend. Moreover, reference in the appended claims to an apparatus or system or a component of an apparatus or system being adapted to, arranged to, capable of, configured to, enabled to, operable to, or operative to perform a particular function encompasses that apparatus, system, or component, whether or not it or that particular function is activated, turned on, or unlocked, as long as that apparatus, system, or component is so adapted, arranged, capable, configured, enabled, operable, or operative.