Headset with end-firing microphone array and automatic calibration of end-firing array

Abstract

In one embodiment of the invention, two microphones are attached to the ear cup and are configured as an end-firing array. The end-firing array suppresses unwanted sounds using an adaptive spectral method and spectral subtraction. According to a second embodiment, automatic calibration of an end-firing microphone array is provided.

Claims

1. A microphone system comprising: at least two microphones in a dual microphone end firing array configuration wherein the at least two microphones and delay elements combine to form a front cardioid signal and a rear cardioid signal wherein a null of the rear cardioid is positioned to point in the direction of a desired signal and a null of the front cardioid in an opposite direction; a filter bank configured to separate the front cardioid signal and the rear cardioid signal into a plurality of spectral bands; an amplifier configured to apply a gain to the rear cardioid signal as an adaptive rear reference signal for spectral subtraction from the front cardioid, wherein said amplifier is capable of applying different gain values in different ones of the plurality of spectral bands and changes the gain values when there is no speech detected in the front cardioid; and a subtraction module configured to suppress noise by adaptively subtracting in the plurality of spectral bands spectral signals corresponding to the rear reference signal derived from the rear cardioid signal from the front cardioid signal.

2. The microphone system as recited in claim 1 wherein a determination is made as to whether the amplifier gain values should be updated based on the energy measured in the rear cardioid signal and front cardioid signal.

3. The microphone system as recited in claim 1 wherein an updating of the amplifier gain values provides a time variable gain that suppresses on a subband by subband basis noise.

4. The microphone system as recited in claim 1 wherein the amplifier gain values are updated when no speech is detected and the system is further configured to remove background noise by spectral subtraction of the rear cardioid reference signal from the front cardioid signal.

5. The microphone system as recited in claim 1 wherein the system is further configured to maintain in a buffer a history of the reference signal for canceling reflected noise sounds.

6. The microphone system as recited in claim 1 wherein the end firing array is positioned on a headset ear cup.

7. The microphone system as recited in claim 2 wherein amplifier gain values are updated when the rear signal is determined to be dominant.

8. The microphone system as recited in claim 1 further comprising a module performing a signal detect function that activates when the energy of the front cardioid signal falls below a threshold with respect to energy of the rear cardioid signal.

9. A method for suppressing unwanted sounds using at least 2 microphones configured in a dual microphone end firing configuration comprising: forming a front cardioid signal and a rear cardioid signal from the end firing array; and using an adaptive spectral subtraction, wherein noise is suppressed in selected spectral bands by spectrally subtracting a rear reference signal generated from the rear cardioid signal having its null facing in the direction of the desired signal, wherein the adaptive spectral subtraction involves updating coefficient values derived from the rear reference signal only when no speech is detected from the front cardioid.

10. The method recited in claim 9 wherein the method is performed in an end firing microphone array arranged on a headset and further comprising: using a polyphase filter bank to separate the front cardioid signal and rear cardioid signal into spectral bands.

11. The method recited in claim 9 wherein a determination is made as to whether the coefficient values should be updated based on the energy measured in the rear cardioid signal relative to the front cardioid signal.

12. The method recited in claim 9 wherein the at least two microphones are configured such that the null of the rear cardioid is positioned to point in the direction of the desired signal and the front cardioid's null points in the opposite direction, with the rear facing cardioid signal being used as a reference signal to determine any similarities between the front cardioid signal and the rear cardioid signal.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a block diagram illustrating a directional microphone having dual microphones with plane sound pressure arriving at an angle implemented in various embodiments of the present invention.

(2) FIGS. 2A-2C illustrate directivity patterns of the cardioid obtained by varying the delay according to various embodiments of the present invention.

(3) FIG. 3 is a diagram illustrating an end firing array using two omnidirectional microphones to form front and back dual microphone end firing arrays and the frequency response of a HPF (high-pass filter) according to various embodiments of the present invention.

(4) FIGS. 4A-D illustrate directivity patterns for the front and rear cardioids illustrating the variation in null for different values of b in FIG. 3.

(5) FIG. 5 is a diagram illustrating the implementation of an adaptive signal processor to the dual microphone array according to various embodiments of the present invention.

(6) FIG. 6 is a plot showing the variation of magnitude with frequency for four different angular directions according to various embodiments of the present invention.

(7) FIG. 7 is a plot illustrating the compensation applied by the filter in FIG. 2, according to various embodiments of the present invention.

(8) FIG. 8 is a flowchart illustrating the end firing noise suppression algorithm according to various embodiments of the present invention.

(9) FIG. 9 is a plot illustrating the signal detect switch effect according to various embodiments of the present invention.

(10) FIG. 10 is a plot illustrating the fast, slow, and noise floor signals for the front cardioid according to various embodiments of the present invention.

(11) FIG. 11 is a plot illustrating the switch triggering the updating of the spectral noise floor estimate according to various embodiments of the present invention.

(12) FIG. 12 is a plot illustrating smoothed energies for the front and rear cardioids and SW according to various embodiments of the present invention.

(13) FIG. 13 is a plot illustrating the signal detect function with smoothed energies for front and rear cardioids according to various embodiments of the present invention.

(14) FIG. 14 is a plot illustrating spectral bands for 16 bands with a 256 reconstruction filter according to various embodiments of the present invention.

(15) FIG. 15 is a plot illustrating spectral bands for 16 bands with a 416 reconstruction filter according to various embodiments of the present invention.

(16) FIG. 16 is a plot illustrating spectral bands for 16 bands with a 512 reconstruction filter according to various embodiments of the present invention.

(17) FIG. 17 is a plot illustrating the suppression of the noise and the SW switch according to various embodiments of the present invention.

(18) FIG. 18 is a plot showing the energy for the first 10 bands and the SW switch according to various embodiments of the present invention.

(19) FIG. 19 is a plot showing the energy for the first 10 bands after the 1 ms estimate has been subtracted according to various embodiments of the present invention.

(20) FIG. 20 is a diagram illustrating the end firing automatic calibration device according to various embodiments of the present invention.

(21) FIG. 21 illustrates an apparatus for automatic and continuous calibration in accordance with one embodiment of the invention.

(22) FIG. 22 illustrates a method for providing automatic and continuous calibration in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

(23) Reference will now be made in detail to preferred embodiments of the invention. Examples of the preferred embodiments are illustrated in the accompanying drawings. While the invention will be described in conjunction with these preferred embodiments, it will be understood that it is not intended to limit the invention to such preferred embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.

(24) Current end firing implementations create directional nulls in the directivity pattern of the microphone array. In a reverberant environment the noise may not come from a single direction. We use the energy of the rear and front cardioids to determine whether the adaptive filter should be updated. A polyphase filter bank separates the front and rear cardioid signals into spectral bands. The rear signal is used as a reference and is spectrally subtracted from the desired signal in an adaptive manner. We also keep a history of the reference signal so that we can cancel reflected noise sounds up to the length of this history. In short, in the first embodiment we provide an improved system and method using the rear signal as a reference and a spectral implementation.

(25) To reduce the background noise and improve the near field voice pickup we use an end firing dual microphone array. The microphones are configured to create two cardioid arrays. The null of the rear facing cardioid is positioned to point in the direction of the desired signal and the front cardioid's null points in the opposite direction.

(26) The rear facing cardioid signal is used as a reference signal to determine any similarities with the front cardioid. We then subtract any similarities, knowing that the front facing cardioid is the only signal that contains direct speech. We use a frequency based adaptive method to estimate these similarities, with the adaptation updating only when there is no direct speech detected. For residual suppression we use spectral subtraction. Spectral subtraction is also used when speech is detected to remove background noises. In a previous section we described how to create nulls in the directivity pattern using two cardioids; we also showed how to do this for different frequency bands by band passing the cardioid signals. When the user is in an enclosed environment the noise source is reflected and its reverberant energy can be high, causing the noise to persist and come from multiple directions. In this section we describe a different method where we do not try to steer a null but instead use the rear cardioid as a reference signal which we subtract from the front cardioid. We do this adaptively using a sub band spectral method. Each of the spectral bands has a history which is used to try to suppress reflected sounds and reverberant tails. When the front facing cardioid points towards someone talking (the user), their speech will be in the null of the rear facing cardioid array. Therefore the rear array will pick up ambient noises and reflected user speech. The front facing array picks up user speech, reflected user speech and ambient noise. The rear facing array signal can be used to reduce the ambient noise and reflected speech in the front facing signal to improve speech intelligibility. In this case we are not trying to create a null in the direction of the noise source but are instead using the rear facing end firing signal as a reference signal which we wish to subtract from the front facing signal.

(27) Adaptive End Firing Algorithm

(28) In FIG. 8 we show the control flow of the end firing adaptive algorithm. We start by creating the cardioid signals by introducing a one sample delay. These signals are smoothed in the Update Switch Box 802 using the method described in the following two sections.
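As a rough illustration of the cardioid-forming step, the following sketch builds the two cardioid signals from two omnidirectional microphone signals using a one sample delay. The function name form_cardioids, the argument names, and the assumption that the acoustic travel time between the two microphones is exactly one sample are illustrative and are not taken from the description.

```python
def form_cardioids(front_mic, rear_mic, delay=1):
    """Return (front_cardioid, rear_cardioid) as lists of samples.

    front_cardioid[n] = front_mic[n] - rear_mic[n - delay]   (null toward the rear)
    rear_cardioid[n]  = rear_mic[n]  - front_mic[n - delay]  (null toward the front)
    Samples before the start of the recording are treated as zero.
    """
    def delayed(signal, n):
        return signal[n - delay] if n >= delay else 0.0

    length = min(len(front_mic), len(rear_mic))
    front_cardioid = [front_mic[n] - delayed(rear_mic, n) for n in range(length)]
    rear_cardioid = [rear_mic[n] - delayed(front_mic, n) for n in range(length)]
    return front_cardioid, rear_cardioid


# A talker in front: the wavefront reaches the front microphone first and the
# rear microphone one sample later (assumed microphone spacing).
speech = [0.0, 1.0, -0.5, 0.25, 0.0, 0.0]
front_mic = speech
rear_mic = [0.0] + speech[:-1]
front_c, rear_c = form_cardioids(front_mic, rear_mic)
```

Under these assumptions the rear cardioid contains none of the frontal speech, which is what allows it to serve as a noise reference, while the front cardioid retains the speech.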

(29) UpdateSwitch Signal Detect

(30) The signal detect routine uses the magnitude of the front cardioid signal to calculate $X_s$ and $X_f$, where
$\begin{matrix} g = (X_s < |x(n)|)\ ?\ G0_s : G1_s & [4.1] \end{matrix}$
and
$\begin{matrix} X_s = g\,X_s + (1 - g)\,|x(n)| & [4.2] \end{matrix}$
and for $X_f$
$\begin{matrix} g = (X_f < |x(n)|)\ ?\ G0_f : G1_f & [4.3] \end{matrix}$
and
$\begin{matrix} X_f = g\,X_f + (1 - g)\,|x(n)| & [4.4] \end{matrix}$
where $G0_f \le G1_f = G0_s \le G1_s$, so the signal $X_f$ adapts to variations in x more quickly than $X_s$. When $X_s \le X_f$ there is a signal, see FIG. 9; otherwise there is not. There is an additional condition that the signal magnitude must be above some noise threshold for it to be flagged as active. We determine the noise floor using the following method.
$\begin{matrix} \mathrm{NoiseFloor} = \mathrm{MIN}(X_f, \mathrm{NoiseFloor})\,(1 + \epsilon) & [4.5] \end{matrix}$
where $\epsilon > 0$ is some small positive number used to keep the noise floor from freezing at a particular value, see FIG. 10. We use these signals to determine a switch which we plot in FIG. 9 (showing $X_s$, $X_f$, and signal detect in the plot of the signal detect switch). This switch is the signal detect or VAD. The VAD is given by the following equation
$\begin{matrix} \mathrm{VAD} = (X_f > \mathrm{MAX}(X_s (1 + \epsilon_1),\ \mathrm{MAX}(\mathrm{NoiseFloor}\,(1 + \epsilon_2),\ \mathrm{MAGNITUDETHRESHOLD}))) & [4.6] \end{matrix}$
where $\epsilon_1$ and $\epsilon_2$ are small positive numbers and MAGNITUDETHRESHOLD is the minimum signal magnitude. We also use these signals to determine when the signal is background noise (DBGN),
$\begin{matrix} \mathrm{DBGN} = (X_f < \mathrm{NoiseFloor}\,(1 + \epsilon_3)) & [4.7] \end{matrix}$
where $\epsilon_3$ is some small positive number, see FIG. 11 (showing the switch determination of when to update the spectral noise estimate). These control variables are used to determine when the adaptive filter is to be updated.
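The signal detect logic of equations [4.1] to [4.7] can be sketched as a small per-sample state machine. This is a minimal sketch, not the implementation used in the embodiments: the class name SignalDetect, all numeric constants, and the choice to start the noise floor at infinity (so that the first sample initializes it) are assumptions made for illustration.

```python
class SignalDetect:
    """Fast/slow magnitude trackers with a noise floor, after [4.1]-[4.7]."""

    def __init__(self, g0_f=0.5, g1_f=0.9, g0_s=0.9, g1_s=0.99,
                 eps=1e-3, eps1=0.1, eps2=1.0, eps3=0.5,
                 magnitude_threshold=1e-3):
        # All constants here are hypothetical tuning values.
        self.g0_f, self.g1_f = g0_f, g1_f
        self.g0_s, self.g1_s = g0_s, g1_s
        self.eps, self.eps1, self.eps2, self.eps3 = eps, eps1, eps2, eps3
        self.magnitude_threshold = magnitude_threshold
        self.x_s = 0.0
        self.x_f = 0.0
        self.noise_floor = float("inf")  # assumed initial value

    def update(self, x):
        """Process one front cardioid sample and return (VAD, DBGN)."""
        mag = abs(x)
        g = self.g0_s if self.x_s < mag else self.g1_s            # [4.1]
        self.x_s = g * self.x_s + (1.0 - g) * mag                 # [4.2]
        g = self.g0_f if self.x_f < mag else self.g1_f            # [4.3]
        self.x_f = g * self.x_f + (1.0 - g) * mag                 # [4.4]
        self.noise_floor = min(self.x_f, self.noise_floor) * (1.0 + self.eps)  # [4.5]
        vad = self.x_f > max(self.x_s * (1.0 + self.eps1),
                             max(self.noise_floor * (1.0 + self.eps2),
                                 self.magnitude_threshold))       # [4.6]
        dbgn = self.x_f < self.noise_floor * (1.0 + self.eps3)    # [4.7]
        return vad, dbgn


detector = SignalDetect()
quiet = [detector.update(0.01) for _ in range(500)]   # steady low-level noise
burst = [detector.update(1.0) for _ in range(5)]      # sudden loud onset
```

With these assumed constants the steady low-level input is flagged as background noise and never as speech, and the onset of the loud burst raises the VAD flag on its first sample.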
UpdateSwitch Adaptive Filter Switch

(31) We begin by calculating the energy of the rear and front cardioids to determine whether the sound is in front or behind. The front signal's energy contains the user's speech. Let Ef(m) and Er(m) be the energy of the front and rear at frame m, so

(32) $\begin{matrix} Er(m) = \sum_{n = 0}^{N - 1} cr[m - n]^{2} & [4.8] \\ Ef(m) = \sum_{n = 0}^{N - 1} cf[m - n]^{2} & [4.9] \end{matrix}$
We then smooth these energies
$\begin{matrix} SmR = \alpha\,SmR + (1 - \alpha)\,Er(m) & [4.10] \\ SmF = \alpha\,SmF + (1 - \alpha)\,Ef(m) & [4.11] \end{matrix}$
where $\alpha$ is a smoothing coefficient. When SmR and SmF are similar, both contain ambient noise and there can be little or no user speech. For local speech we estimate that the front energy must be greater than 105% of the rear energy. In FIG. 12 we plot the smoothed energies for the front and rear cardioids together with the adaptive switch SW, which is the blue signal. If the front energy falls below 105% of the rear and no noise is detected then the filter coefficients are adjusted. In FIG. 13 we plot the smoothed signal energies and the signal detection switch.
$\begin{matrix} SW = (SmF \cdot G < SmR)\ ?\ 1 : 0 & [4.12] \end{matrix}$
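The adaptive switch of equations [4.8] to [4.12] can be sketched as follows. The class name AdaptiveSwitch, the smoothing coefficient value, the frame length, and the choice G = 1/1.05 (so that the comparison matches the 105% rule in the text) are assumptions for illustration only.

```python
def frame_energy(samples):
    """Sum of squared samples over one frame, as in [4.8] and [4.9]."""
    return sum(s * s for s in samples)


class AdaptiveSwitch:
    """Smoothed front/rear energy comparison, after [4.10]-[4.12]."""

    def __init__(self, smoothing=0.9, front_ratio=1.05):
        self.smoothing = smoothing       # hypothetical smoothing coefficient
        self.g = 1.0 / front_ratio       # assumed G, derived from the 105% rule
        self.sm_f = 0.0
        self.sm_r = 0.0

    def update(self, front_frame, rear_frame):
        """Return SW: 1 when adaptation is allowed, 0 when it is frozen."""
        a = self.smoothing
        self.sm_r = a * self.sm_r + (1.0 - a) * frame_energy(rear_frame)   # [4.10]
        self.sm_f = a * self.sm_f + (1.0 - a) * frame_energy(front_frame)  # [4.11]
        return 1 if self.sm_f * self.g < self.sm_r else 0                  # [4.12]


switch = AdaptiveSwitch()
noise_only = [switch.update([0.1] * 16, [0.1] * 16) for _ in range(50)]
with_speech = [switch.update([1.0] * 16, [0.1] * 16) for _ in range(50)]
```

When both cardioids carry similar energy the switch stays at 1 and the filter may adapt; once the front energy clearly exceeds the rear energy the switch drops to 0.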
Analysis Filter Bank

(33) The whitened cardioid signals are fed into a polyphase filter bank, creating two spectral sets of data. We whiten the signal first using
$\begin{matrix} w(n) = x(n) - x(n - 1) & [5.1] \end{matrix}$
to help decorrelate it. This helps the LMS algorithm to converge. After the synthesis reconstruction filter we do the inverse, that is
$\begin{matrix} y(n) = y(n - 1) + w(n) & [5.2] \end{matrix}$
to remove this whitening and get the correct time domain signal. The filter bank has been designed to have 16 bands in the Nyquist interval for a sample rate of 16 kHz. In FIG. 14 we show the spectral bands for a filter of length 256 samples. If we increase the length of the prototype filter we can increase the band separation. In FIGS. 15 and 16 we show longer filters, i.e. 416 samples in FIG. 15 and 512 samples in FIG. 16. We can also increase the number of bands if we wish to improve the spectral resolution. In our current implementation we use 16 bands with a filter of 256 samples. We begin by designing a low pass filter and then spectrally shift this filter to obtain the band pass filters, which we implement as a polyphase filter bank.
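The whitening of equation [5.1] and its inverse [5.2] form an exact round trip, which the following minimal sketch demonstrates. The function names and the zero initial conditions x(-1) = 0 and y(-1) = 0 are assumptions.

```python
def whiten(x):
    """First difference, w(n) = x(n) - x(n-1), assuming x(-1) = 0."""
    previous = 0.0
    out = []
    for sample in x:
        out.append(sample - previous)
        previous = sample
    return out


def dewhiten(w):
    """Running sum, y(n) = y(n-1) + w(n), assuming y(-1) = 0."""
    y = 0.0
    out = []
    for sample in w:
        y += sample
        out.append(y)
    return out


signal = [1.0, 3.0, 2.0, -4.0, 0.5]
whitened = whiten(signal)
restored = dewhiten(whitened)
```

In the system described here the filter bank, adaptive filter and synthesis filter sit between the two steps; the sketch only shows that the two operations cancel when nothing is placed between them.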

(34) Let $h_0(n)$ be the prototype filter so its z transform is

(35) $\begin{matrix} H_{0}(z) = \sum_{n = 0}^{N - 1} h_{0}(n) z^{-n} & [5.3] \end{matrix}$
where N is the length of the filter. To create band pass filters at the frequencies $2\pi k/M$ for $0 \le k < M$ we spectrally shift $h_0(n)$ to create $h_k(n)$
$\begin{matrix} h_{k}(n) = h_{0}(n) W_{M}^{kn} & [5.4] \end{matrix}$
where k = 0, 1, . . . , M-1, M is the number of bands, and

(36) $W_{M} = e^{-\frac{2 \pi i}{M}}.$
Taking the z transform of this filter we get

(37) $\begin{matrix} H_{k}(z) = \sum_{n = 0}^{N - 1} h_{k}(n) z^{-n} & [5.4] \end{matrix}$
If we now let n = qM + m, where

(38) $0 \le m < M$ and $0 \le q < Q$ $(Q = \frac{N}{M})$
we can express $H_k(z)$, using $W_{M}^{kqM} = 1$, as

(39) $\begin{matrix} H_{k}(z) = \sum_{m = 0}^{M - 1} \left( \sum_{q = 0}^{Q - 1} h_{0}(qM + m) z^{-Mq} \right) W_{M}^{km} z^{-m} & [5.5] \end{matrix}$

(40) Which we can write as

(41) $\begin{matrix} \begin{pmatrix} H_{0}(z) \\ H_{1}(z) \\ \vdots \\ H_{M - 1}(z) \end{pmatrix} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_{M} & W_{M}^{2} & \cdots & W_{M}^{(M - 1)} \\ 1 & W_{M}^{2} & W_{M}^{4} & \cdots & W_{M}^{2 (M - 1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & W_{M}^{(M - 1)} & W_{M}^{2 (M - 1)} & \cdots & W_{M}^{(M - 1) (M - 1)} \end{bmatrix} \begin{pmatrix} E_{0}(z^{M}) \\ z^{-1} E_{1}(z^{M}) \\ \vdots \\ z^{-(M - 1)} E_{M - 1}(z^{M}) \end{pmatrix} & [5.6] \end{matrix}$
where $E_{m}(z) = \sum_{q = 0}^{Q - 1} h_{0}(qM + m) z^{-q}$.
Thus we can implement the filter bank using polyphase filtering and an FFT. The matrix in the above expression is in a Winograd form.
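The polyphase rearrangement can be checked numerically: evaluating each band filter directly from its spectrally shifted impulse response should give the same value as summing the M polyphase branches. The sketch below does this for a small, hypothetical prototype filter (N = 8, M = 4) at an arbitrary point on the unit circle; the function names, the filter taps and the test point are assumptions for illustration.

```python
import cmath


def direct_response(h0, M, k, z):
    """H_k(z) from the shifted filter h_k(n) = h0(n) * W_M^(k n)."""
    W = cmath.exp(-2j * cmath.pi / M)
    return sum(h0[n] * W ** (k * n) * z ** (-n) for n in range(len(h0)))


def polyphase_response(h0, M, k, z):
    """H_k(z) as a sum over polyphase branches E_m evaluated at z^M."""
    W = cmath.exp(-2j * cmath.pi / M)
    Q = len(h0) // M
    total = 0j
    for m in range(M):
        # E_m(z^M) = sum_q h0(qM + m) * (z^M)^(-q)
        e_m = sum(h0[q * M + m] * (z ** M) ** (-q) for q in range(Q))
        total += W ** (k * m) * z ** (-m) * e_m
    return total


h0 = [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1]   # hypothetical prototype, N = 8
M = 4
z = cmath.exp(0.3j)                              # arbitrary point on the unit circle
max_difference = max(
    abs(direct_response(h0, M, k, z) - polyphase_response(h0, M, k, z))
    for k in range(M)
)
```

The two evaluations agree to within floating point rounding, which is the property that lets the bank be computed with M short branch filters followed by a transform across the branches.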
Adaptive Filter

(42) We only want to update the adaptive coefficients when we detect ambient noise or when the rear signal is dominant; otherwise we might adapt the filters to subtract the user's speech. We therefore freeze the adaptation if we detect local speech, as determined by the adaptive switch. If we let F(k)=Fr(k)+iFi(k) and R(k)=Rr(k)+iRi(k) be the spectral band values for the front and rear cardioids, then the estimated error is

(43) $\begin{matrix} E(m) = \sum_{n = 0}^{M} C(m) * R(m, n) & [5.7] \end{matrix}$
where C(k) are the complex coefficients, which are updated using the normalized LMS method

(44) $\begin{matrix} C(k) = C(k) + \mu(k) \frac{(Err(k)^{*})\,R(k)}{\| R \|^{2}} & [5.8] \end{matrix}$
where $\mu(k)$ can vary as a function of the band number and
$\begin{matrix} Err(k) = F(k) - E(k) & [5.9] \end{matrix}$
In FIG. 17 we show the convergence of the sum of the band energies for the first 10 bands for the adaptive filter, together with the suppression of the noise and the SW switch. From the plot we see that when the filter is allowed to adapt it reduces the noise signal in the front cardioid by 20 dB, and by as much as 30 dB for the noise starting at frame 52000.
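A single-coefficient, single-band version of the normalized LMS update can be sketched as below. This is a simplified illustration under stated assumptions: it keeps one complex coefficient per band instead of a history of reference values, and it uses the common textbook convention (estimate = C times R, update proportional to Err times the conjugate of R), which may differ from equation [5.8] by a conjugate on the coefficients. The function name, the step size, and the small regularization term are hypothetical.

```python
def nlms_band_step(coeff, front, rear, mu=0.5, adapt=True, eps=1e-12):
    """One normalized LMS step for one spectral band.

    coeff : current complex coefficient for this band
    front : complex front cardioid band value F(k)
    rear  : complex rear cardioid band value R(k)
    adapt : the adaptive switch; when False the coefficient is frozen
    Returns (new_coeff, err) where err = front - coeff * rear.
    """
    estimate = coeff * rear
    err = front - estimate
    if adapt:
        # eps is an assumed guard against division by zero for silent bands.
        coeff = coeff + mu * err * rear.conjugate() / (abs(rear) ** 2 + eps)
    return coeff, err


# The front band is a scaled, phase-shifted copy of the rear band (pure noise).
true_coupling = complex(0.5, -0.25)
coeff = 0j
for n in range(60):
    rear = complex(1.0 + 0.1 * n, -0.5 + 0.05 * n)
    front = true_coupling * rear
    coeff, err = nlms_band_step(coeff, front, rear)

# With the switch off (local speech detected) the coefficient does not move.
frozen, _ = nlms_band_step(coeff, complex(3.0, 1.0), complex(1.0, 0.0), adapt=False)
```

In this noise-only example the coefficient converges to the coupling between the rear and front bands and the residual error shrinks toward zero, while freezing the update leaves the learned coefficient untouched during speech.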
Residual Error Suppression

(45) We use the method of spectral subtraction to subtract the rear ambient noise estimate from the front array signal. We use two different noise floor estimates, Ns[band] and Ne[band]. Ns is used when the LMS subtraction has been active and no user speech has been detected. The other estimate, Ne, is used when the speech counter is greater than zero. This counter is decreased each time no speech is detected and set to the maximum every time speech is detected. The counter determines a minimum speech interval, but in that interval the signal may still contain speech pauses. We measure the noise floor and update it for every band during speech pauses, when the BackGroundNoise flag is true. We therefore have the following two cases:
if (BackGroundNoise and SW) { $Ns[band] = \beta\,Ns[band] + (1 - \beta)\,|Err[band]|^{2}$; } otherwise
if (BackGroundNoise) { $Ne[band] = \beta\,Ne[band] + (1 - \beta)\,|Err[band]|^{2}$; }
where $\beta$ is a smoothing coefficient. To subtract this estimate from the bands we use spectral subtraction. If E(k) is the energy of spectral band k we define

(46) $\begin{matrix} g_{S}(k) = 1 - \gamma_{0} \sqrt{\frac{N_{s}(k)}{E(k)}} & [5.10] \\ g_{F}(k) = 1 - \gamma_{1} \sqrt{\frac{N_{e}(k)}{E(k)}} & [5.11] \end{matrix}$
We now smooth these gains using
$\begin{matrix} SmG_{S}(k) = \delta\,SmG_{S}(k) + (1 - \delta)\,g_{S}(k) & [5.12] \end{matrix}$
and
$\begin{matrix} SmG_{F}(k) = \delta\,SmG_{F}(k) + (1 - \delta)\,g_{F}(k) & [5.13] \end{matrix}$
where $0 < \delta < 1$. We then adjust the spectral band k using
$Err(k) = SmG_{S}(k)\,Err(k)$
or
$Err(k) = SmG_{F}(k)\,Err(k)$

(47) We also initialize these gains to typical values to reduce possible artifacts.
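The residual suppression step can be sketched for one band as follows. The function names, the subtraction factor, the smoothing coefficient, and in particular the lower clamp on the gain (which the description does not mention) are assumptions added for illustration.

```python
import math


def subtraction_gain(noise_energy, band_energy, factor=1.0, floor=0.0):
    """Gain 1 - factor * sqrt(noise / energy), as in [5.10] and [5.11].

    The clamp to `floor` is an assumption; it prevents a negative gain
    when the noise estimate exceeds the band energy.
    """
    if band_energy <= 0.0:
        return floor
    gain = 1.0 - factor * math.sqrt(noise_energy / band_energy)
    return max(gain, floor)


def smooth(previous, current, coeff=0.5):
    """Exponential smoothing of the gain, as in [5.12] and [5.13]."""
    return coeff * previous + (1.0 - coeff) * current


noise_estimate = 1.0            # hypothetical Ns for this band
band_energy = 4.0               # hypothetical E(k)
gain = subtraction_gain(noise_estimate, band_energy)   # 1 - sqrt(1/4)
smoothed_gain = smooth(1.0, gain)                       # gain initialized to 1.0
band_value = complex(2.0, -2.0)                         # hypothetical Err(k)
suppressed = smoothed_gain * band_value
```

Starting the smoothed gain at a typical value such as 1.0, as the last paragraph suggests, means the first frames are attenuated gradually instead of abruptly.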

(48) According to a second embodiment, an apparatus and method for performing automatic and continuous calibration of an unmatched pair of microphones arranged in a known configuration and with an input source (human speaker, hereafter talker) in a known location is provided. The amplitudes of the signals from the 2 microphones are continuously monitored. The talker is in a known location relative to the microphone pair, so the expected amplitude difference between the signals at the 2 microphones can be pre-determined, and compensated for. The talker is differentiated from input signals in other locations by applying simple heuristic metrics to the input pair. A compensating gain coefficient is derived from the relative amplitudes of the 2 microphone signals, and averaged over the long term. The averaged compensating gain is applied to one of the microphone signals to provide balanced input from the talker.

(49) FIG. 20 illustrates an apparatus for automatic and continuous calibration in accordance with one embodiment of the invention. A processor 115 (or processors) may be placed in an earcup 106b of the headphone to perform the various calibration functions as well as filtering functions, delay functions, comparative functions, and steering functions described herein.

(50) FIG. 21 illustrates an apparatus for automatic and continuous calibration in accordance with one embodiment of the invention. The headset 104 is shown placed on the head of user 102. The input signal location of the talker here is shown as location 110. The headset 104 includes ear cups 106a and 106b. The dual microphones 108, 109 can be located anywhere within the configuration provided by the in-use headset 104 but preferably on the same ear cup, such as shown here on ear cup 106b. The electronics to perform the processing of the signals to calibrate the headphones can be located either within the headset 104 or externally. In a preferred embodiment the electronics are located within the headset such as within the ear cup containing the dual microphones, such as within ear cup 106b. As can be appreciated by those of skill in the relevant arts, the ear cups are typically connected by a mechanical connection such as shown in FIG. 21, which connection also sometimes houses electronic cables to communicate signals from one ear cup to the other. The headset 104 as configured is used to provide automatic and continuous calibration to the two microphones 108, 109.

(51) FIG. 22 illustrates a method for providing automatic and continuous calibration in accordance with one embodiment of the present invention. The method starts with the recognition of the known distance parameters in step 202. That is, the method relies on the assessment of the location of the talker with respect to the two microphones. Given known location of the talker relative to the microphone pair, an expected amplitude difference between the signals at the 2 microphones can be pre-determined, and compensated for. Next the relative amplitudes of the input microphone signals from the 2 microphones are monitored in step 204. The talker is differentiated from input signals in other locations by applying simple heuristic metrics to the input pair. Next in step 206 a compensating gain coefficient is derived from the relative amplitudes of the 2 microphone signals, and averaged over the long term. A long term average compensating gain is applied in step 208 to one of the microphone signals to provide balanced input from the talker. The method ends at step 210.
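The calibration steps 202 to 208 can be sketched as a running gain estimate. The description does not specify the heuristic that distinguishes the talker from other sources, so the test used here (both amplitudes above a minimum level and the required gain within a tolerance of unity) is purely a placeholder, and the class name, parameter names and constants are likewise hypothetical.

```python
class AutoCalibrator:
    """Long-term averaged gain that balances microphone 2 against microphone 1."""

    def __init__(self, expected_ratio=1.0, smoothing=0.9,
                 tolerance=0.5, min_level=1e-3):
        self.expected_ratio = expected_ratio  # step 202: known amp_1/amp_2 for the talker
        self.smoothing = smoothing            # closer to 1.0 means a longer-term average
        self.tolerance = tolerance            # placeholder talker heuristic
        self.min_level = min_level
        self.gain = 1.0

    def update(self, amp_1, amp_2):
        """Steps 204 and 206: monitor amplitudes and update the averaged gain."""
        if amp_1 > self.min_level and amp_2 > self.min_level:
            needed = amp_1 / (amp_2 * self.expected_ratio)
            if abs(needed - 1.0) <= self.tolerance:   # looks like the talker
                self.gain = (self.smoothing * self.gain
                             + (1.0 - self.smoothing) * needed)
        return self.gain

    def apply(self, sample_2):
        """Step 208: apply the averaged compensating gain to microphone 2."""
        return self.gain * sample_2


calibrator = AutoCalibrator()
for _ in range(200):
    calibrator.update(1.0, 0.8)       # microphone 2 is 20% less sensitive
gain_after_talker = calibrator.gain
gain_after_errant = calibrator.update(1.0, 0.1)   # implausible input is ignored
```

Because the gain is an average over many updates, a single misclassified input can only nudge it slightly, which is the robustness argument made in the following paragraph.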

(52) Even if the mechanism for distinguishing the talker from other input sources is fooled by some non-well-formed input signal, the long term averaging of the compensating gain coefficient will keep the system from following the errant input too quickly, and will keep the system tending towards nominal and correct operation, as the normal input conditions are likely to occur more frequently than the abnormal conditions.

(53) Several advantages are provided by the novel system:

(54) The continuous, long term compensation for mismatched microphones provides: the use of less expensive (unmatched for gain) microphone pairs; no need to perform a calibration diagnostic at the point of production (factory); no need to perform a calibration by the end user (customer); and no need for persistent storage of the gain compensation value.

(55) Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

CPC classification

H04R1/406 (ELECTRICITY)
G10L21/0216 (PHYSICS)
H04R29/005 (ELECTRICITY)
H03G5/165 (ELECTRICITY)
G10L2021/02168 (PHYSICS)
H04R3/005 (ELECTRICITY)
H04R2201/107 (ELECTRICITY)
H04R3/04 (ELECTRICITY)
H04R1/1083 (ELECTRICITY)
H04R29/006 (ELECTRICITY)
H04R2430/03 (ELECTRICITY)

International classification

H04R1/40 (ELECTRICITY)
H04R29/00 (ELECTRICITY)
G10L21/0216 (PHYSICS)
H04R3/00 (ELECTRICITY)
H04R1/10 (ELECTRICITY)
H03G5/16 (ELECTRICITY)
