Recursive noise power estimation with noise model adaptation

Abstract

A method of signal processing to generate hearing implant stimulation signals for a hearing implant system includes transforming an input sound signal into band pass signals each representing an associated frequency band of audio frequencies. The band pass signals are processed in a sequence of sampling time frames and iterative steps to produce a noise power estimate. This includes using a noise prediction model to determine if a currently observed signal sample includes a target signal, and if so, then updating a current noise power estimate without using the currently observed signal sample, and otherwise updating the current noise power estimate using the currently observed signal sample. The noise prediction model also is adapted based on the updated noise power estimate. The hearing implant stimulation signals are then developed from the band pass signals and the noise power estimate.

Claims

1. A method of signal processing to generate hearing implant stimulation signals Z for a hearing implant system, the method comprising: transforming an input sound signal y[n] characterized as an additive mixture of an information bearing target signal s[n] and a non-information bearing noise signal d[n] into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames n and iterative steps $i = 1, \ldots, I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame n and iteration i, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal s[n], wherein using the noise prediction model is based on a hard decision comparison of the currently observed signal sample to a variable threshold, the variable threshold representing a likelihood ratio test-statistic $\Lambda(P_y) = \frac{L(P_y \mid H_1)}{L(P_y \mid H_0)} = \frac{f_{P_y}(P_y \mid P_s \neq 0)}{f_{P_y}(P_y \mid P_s = 0)}$; ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}_d[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$; and developing the hearing implant stimulation signals Z from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.

2. The method according to claim 1, wherein updating the current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$ includes using the current signal power $P_y[n,k]$ and the estimated noise power from an immediately preceding time frame $n-1$ and a last iteration step I, $\hat{P}_d[n-1,k,I]$, so that the current noise power estimate is $\hat{P}_d[n,k,i] = \alpha\,\hat{P}_d[n-1,k,I] + (1-\alpha)\,P_y[n,k]$, where $\alpha$ is a smoothing parameter.

3. The method according to claim 1, wherein updating the current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$ includes keeping the current noise power estimate $\hat{P}_d[n,k,i]$ constant.

4. The method according to claim 1, wherein updating the current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$ includes additionally using a weighted sum of neighboring noise power estimates, $\hat{P}_d[n,k,i] = (1-\beta)\,\hat{P}_d[n-1,k,I] + \beta\,\bar{P}_d[n-1,k,I]$ with $\bar{P}_d[n-1,k,I] = \sum_{l \neq k}^{K} w_{l,k}\,\hat{P}_d[n-1,l,I]$, with suitably chosen weights $w_{l,k}$ and parameters a, b, m, $\beta$.

5. The method according to claim 1, wherein adapting the noise prediction model $\tilde{P}_d[n,k,i]$ is performed after all I iterative steps for a given time frame n have been performed.

6. The method according to claim 1, wherein adapting the noise prediction model $\tilde{P}_d[n,k,i]$ is performed after each iteration i for a given time frame n.

7. The method according to claim 1, wherein developing the hearing implant stimulation signals includes using the noise power estimate $\hat{P}_d[n,k,I]$ for noise reduction of the band pass signals $y_k[n]$.

8. The method according to claim 1, wherein developing the hearing implant stimulation signals includes using the noise power estimate $\tilde{P}_d[n,k,I]$ for channel selection of the band pass signals $y_k[n]$.

9. The method according to claim 1, wherein developing the hearing implant stimulation signals includes using the noise power estimate $\hat{P}_d[n,k,I]$ for a power saving functionality of the hearing implant system.

10. The method according to claim 1, wherein developing the hearing implant stimulation signals includes using the noise power estimate $\hat{P}_d[n,k,I]$ for channel selection of the band pass signals $y_k[n]$.

11. A method of signal processing to generate hearing implant stimulation signals Z for a hearing implant system, the method comprising: transforming an input sound signal y[n] characterized as an additive mixture of an information bearing target signal s[n] and a non-information bearing noise signal d[n] into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames n and iterative steps $i = 1, \ldots, I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame n and iteration i, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal s[n], wherein using the noise prediction model is based on a probability-based decision comparison of the currently observed signal sample $P_y[n,k]$ to a variable threshold $\Theta[n,k,i]$, using a speech absence probability $p[n,k,i]$ in an interval [0,1], where $p[n,k,i] = g(\Theta[n,k,i], P_y[n,k])$, so that the noise power estimate $\hat{P}_d[n,k,i]$ at iteration i, time frame n, and sub-band k is $\hat{P}_d[n,k,i] = p[n,k,i]\,\hat{P}_{d,sa}[n,k,i] + (1 - p[n,k,i])\,\hat{P}_{d,sp}[n,k,i]$; ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}_d[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$; and developing the hearing implant stimulation signals Z from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.

12. The method according to claim 11, wherein the speech absence probability $p[n,k,i]$ is a sigmoidal function $p[n,k,i] = \frac{1}{1 + e^{-\kappa_k t[n,k,i]}}$, where $t[n,k,i] = \Theta[n,k,i] - P_y[n,k]$ and $\kappa_k$ determines the steepness of the function.

13. A method of signal processing to generate hearing implant stimulation signals Z for a hearing implant system, the method comprising: transforming an input sound signal y[n] characterized as an additive mixture of an information bearing target signal s[n] and a non-information bearing noise signal d[n] into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames n and iterative steps $i = 1, \ldots, I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame n and iteration i, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal s[n], wherein the noise prediction model $\tilde{P}_d[n,k,i]$ is a time variant noise model that is a first order autoregressive model $\tilde{P}_d[n,k,i] = f(\theta, \hat{P}_d[n-1,k,I], \hat{P}_d[n,k,i-1])$ with model parameters $\theta = [\theta_1, \theta_2, \ldots, \theta_M]^T$; ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}_d[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$; and developing the hearing implant stimulation signals Z from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.

14. The method according to claim 13, wherein the noise prediction model $\tilde{P}_d[n,k,i]$ is based on estimates from neighboring sub-bands, $\tilde{P}_d[n,k,i] = f(\theta, \hat{P}_d[n-1,k,I], \hat{P}_d[n,k,i-1], \hat{P}_d[n-1,l \neq k,I], \hat{P}_d[n,l \neq k,i-1])$.

15. A method of signal processing to generate hearing implant stimulation signals Z for a hearing implant system, the method comprising: transforming an input sound signal y[n] characterized as an additive mixture of an information bearing target signal s[n] and a non-information bearing noise signal d[n] into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames n and iterative steps $i = 1, \ldots, I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame n and iteration i, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal s[n], wherein the noise prediction model $\tilde{P}_d[n,k,i]$ is a time variant noise model that is a linear autoregressive model of a linear combination of estimated noise power of a previous iteration and two directly neighboring sub-bands, $\tilde{P}_d[n,k,i] = \sum_{l=k-1}^{k+1} \theta_l[n,k]\,\hat{P}_d[n,l,i-1]$, where for $i = 1$, $\hat{P}_d[n,k,0] = \hat{P}_d[n-1,k,I]$, representing estimated noise power from the previous time frame $n-1$; ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}_d[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$; and developing the hearing implant stimulation signals Z from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.

16. A method of signal processing to generate hearing implant stimulation signals Z for a hearing implant system, the method comprising: transforming an input sound signal y[n] characterized as an additive mixture of an information bearing target signal s[n] and a non-information bearing noise signal d[n] into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames n and iterative steps $i = 1, \ldots, I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame n and iteration i, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal s[n], wherein the noise prediction model $\tilde{P}_d[n,k,i]$ is a time variant noise model that is a linear autoregressive model of a linear combination of M already estimated noise powers, the estimated noise power of a preceding iteration $i-1$, and 2L neighboring noise power estimates, $\tilde{P}_d[n,k,i] = \sum_{l=k-1}^{k+1} \theta_{0l}[n,k]\,\hat{P}_d[n,l,i-1] + \sum_{m=1}^{M} \sum_{l=k-L}^{k+L} \theta_{ml}[n,k]\,\hat{P}_d[n-m,l,I]$, where for $i = 1$, $\hat{P}_d[n,k,0] = 0$; ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}_d[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$; and developing the hearing implant stimulation signals Z from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.

17. A method of signal processing to generate hearing implant stimulation signals Z for a hearing implant system, the method comprising: transforming an input sound signal y[n] characterized as an additive mixture of an information bearing target signal s[n] and a non-information bearing noise signal d[n] into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames n and iterative steps $i = 1, \ldots, I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame n and iteration i, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal s[n]; ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal s[n], then updating a current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}_d[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$, wherein adapting the noise prediction model $\tilde{P}_d[n,k,i]$ is based on a continuous adaptation of one or more model optimization criteria, the one or more model optimization criteria including minimizing a mean squared error $J = E\{e[n,k,I]^2\}$ of a prediction error $e[n,k,I] = \hat{P}_d[n,k,I] - \tilde{P}_d[n,k,I]$; and developing the hearing implant stimulation signals Z from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.

18. The method according to claim 17, wherein adapting the noise prediction model $\tilde{P}_d[n,k,i]$ is based on adapting parameters of the noise prediction model $\tilde{P}_d[n,k,i]$ using a steepest descent method $\theta_{n,k} = \theta_{n-1,k} - \mu\,\nabla_{\theta} J$ with a fixed step size $\mu$.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows anatomical structures of a typical human ear with a cochlear implant system.

(2) FIG. 2 shows various functional blocks in a signal processing arrangement for a typical cochlear implant system.

(3) FIG. 3 shows various functional blocks in a signal processing arrangement for a cochlear implant system according to an embodiment of the present invention.

(4) FIG. 4 shows various functional blocks in a signal processing arrangement for a cochlear implant system according to another embodiment of the present invention.

(5) FIG. 5 shows functional blocks in an iterative noise power estimation process with prediction and estimation.

(6) FIG. 6 shows an example of iterative noise power estimation with two prediction steps and two estimation steps.

(7) FIGS. 7A and 7B show a speech waveform in white noise with signal power, estimated noise power and threshold traces.

(8) FIG. 8 shows functional blocks in an iterative noise power estimation process with prediction, estimation and adaptation.

(9) FIG. 9 shows a flow chart algorithm of prediction, estimation and adaptation where the adaptation is outside the iteration loop.

(10) FIG. 10 shows a flow chart algorithm of prediction, estimation and adaptation where the adaptation is inside the iteration loop.

DETAILED DESCRIPTION

(11) Embodiments of the present invention are directed to an improved approach to blind estimation of the noise power in an input sound signal y[n] characterized as an additive mixture of an information bearing target signal s[n] (e.g., speech) and a non-information bearing disturbing (noise) signal d[n], $y[n] = s[n] + d[n]$, where n is the time index, referred to as the time frame. In particular, the problem of detecting time frames in which the target signal s[n] is absent is addressed. In those time frames, the noise power estimate can be updated using the (observable) input sound signal y[n], since then $y[n] = d[n]$. The noise power estimate is recursively reused to update the prediction for the next estimation step. This approach differs from existing methods such as the one described in U.S. Pat. No. 8,385,572 in that no signal model is directly used in the noise power estimation algorithm.

(12) Estimating the noise power can be useful for a number of signal processing applications in a hearing implant system. These applications include:
1. Noise reduction: sub-band signals with a poor signal-to-noise ratio (SNR) in a given time frame can be attenuated to improve the SNR, so that users potentially enjoy better speech perception in noise.
2. Cochlear implant (CI) signal coding: selecting only electrode channels with a high SNR or low noise power for stimulation can offer an improved hearing experience.
3. Power saving strategies: during noise-only situations, the stimulation pattern can be changed to save power, e.g., by reducing the stimulation rate and/or amplitude.
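As a rough illustration of the channel-selection use, the sketch below ranks sub-bands by an SNR estimate formed from the observed power and the noise power estimate of one frame. The SNR formula $(P_y - \hat{P}_d)/\hat{P}_d$ (clipped at zero), the function name `select_channels` and the fixed number of selected channels are assumptions made for this example, not details taken from the description.

```python
import numpy as np


def select_channels(p_y, p_d_hat, n_select):
    """Return sorted indices of the n_select sub-bands with the highest estimated SNR.

    p_y     : observed sub-band powers P_y[n, k] for one frame, shape (K,)
    p_d_hat : noise power estimates for the same frame, shape (K,)
    """
    p_y = np.asarray(p_y, dtype=float)
    p_d_hat = np.asarray(p_d_hat, dtype=float)
    # HYPOTHETICAL SNR estimate based on P_y = P_s + P_d: (P_y - P_d_hat) / P_d_hat
    snr = np.maximum(p_y - p_d_hat, 0.0) / np.maximum(p_d_hat, 1e-12)
    best = np.argsort(snr)[::-1][:n_select]  # highest SNR first
    return np.sort(best)


print(select_channels([1.0, 5.0, 2.0, 10.0], [1.0, 1.0, 1.0, 1.0], 2))
```

Selecting by lowest $\hat{P}_d$ instead of highest SNR, the other criterion named above, only requires replacing the ranking key.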

(13) FIG. 3 shows various functional blocks in a signal processing arrangement for a cochlear implant system according to an embodiment of the present invention which is based on a conventional electrical stimulation-based cochlear implant, where a Preprocessor Filter Bank 201 processes the input sound signal y[n] to perform analog-to-digital conversion and apply an analysis filter bank to generate band pass signals $y_k[n]$, each representing an associated frequency band of audio frequencies that is also associated with a set of corresponding auditory neurons. In addition, the Envelope Detector 302, Fine Structure Detector 303, Pulse Generator 304, and Implant 305 operate basically as discussed above with respect to FIG. 2. The arrangement shown in FIG. 3 also has additional processing stages arranged in a noise reduction system: Noise Power Estimation 306, SNR Estimation 307, and Gain Calculation 308, which determines gain factors based on the noise power estimate $\hat{P}_d[n,k,I]$ that are applied to the frequency sub-bands by a Gain Application 309. The hearing implant stimulation signals Z are developed from the band pass envelope signals $Y_k[n]$ and the fine structure signals $X_k[n]$ for delivery to an implanted portion of the hearing implant system. FIG. 4 shows various functional blocks in a signal processing arrangement for a cochlear implant system according to another embodiment of the present invention, where noise power and/or SNR estimation is similarly performed and used for sound coding purposes. The Envelope Detector 402, Fine Structure Detector 403, Pulse Generator 404, Noise Power Estimation 406, SNR Estimation 407, Gain Calculation 408 and Implant 405 operate basically as discussed above with respect to FIG. 2 or 3. Unlike FIG. 3, where the Gain Application 309 stage precedes the Envelope Detector 302 and Fine Structure Detector 303, the output of Gain Calculation 408 is fed directly to the Envelope Detector 402 and Fine Structure Detector 403 stages. In this example, gain application is integrated into the respective stage, and the two stages might apply the gain factors derived from the noise power estimate $\hat{P}_d[n,k,I]$ independently of each other, i.e., Fine Structure Detector 403 may apply the gain factors differently than Envelope Detector 402. In one embodiment, Fine Structure Detector 403 and Envelope Detector 402 may apply the gain factors in dependence on each other, for example according to a certain functional relationship. The functional relationship may, for example, depend on a cross-correlation property.

(14) In such systems, the Noise Power Estimation Module 306 splits the estimation of the unknown noise power into three main steps:
1. Prediction: first, the noise power for the current point in time is predicted using a model of the underlying noise process. Based on the prediction, a decision is made as to the presence or absence of speech.
2. Estimation: using the speech presence decision, the current noise power estimate is updated.
3. Adaptation: the updated noise power estimate is used to update the noise prediction model that predicts the noise power for the next step.
It is assumed that the estimate will be closer to the true value of the noise power than the prediction is. The information gained about the unknown noise power in the estimation step is therefore used to improve the noise model. This improves the prediction for the next step and enables a more accurate decision regarding speech presence or absence.

(15) Prediction and estimation can be performed several times for the same time point n, so that the Noise Power Estimation Module 306 processes the band pass signals $y_k[n]$ in a sequence of sampling time frames n and iterative steps $i = 1, \ldots, I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$. For each time frame n and iteration i, the Noise Power Estimation Module 306 uses a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal s[n]. If the currently observed signal sample $P_y[n,k]$ includes the target signal s[n], then a current noise power estimate $\hat{P}_d[n,k,i]$ is updated without using $P_y[n,k]$. Otherwise, if $P_y[n,k]$ does not include the target signal s[n], then the current noise power estimate $\hat{P}_d[n,k,i]$ is updated using $P_y[n,k]$. The noise prediction model $\tilde{P}_d[n,k,i]$ is also adapted based on the updated noise power estimate $\hat{P}_d[n,k,i]$. Performing multiple iterative steps increases the probability of a correct decision regarding speech presence or absence, and thus leads to a more accurate noise power estimate $\hat{P}_d[n,k,I]$.
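The three steps and the iteration over i can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the patented implementation. It assumes: the prediction uses the three-neighbour linear model of claim 15 with weights $\theta_l[n,k]$; the threshold is $\Theta[n,k,i] = c\,\tilde{P}_d[n,k,i]$ with a hand-picked constant c; speech-absent samples use the recursive update with smoothing parameter α and speech-present samples hold the previous estimate; and the weights are adapted after every iteration (as in FIG. 10) with a normalised LMS step, which is a normalised variant of the fixed-step steepest-descent update of claim 18 on the error $\hat{P}_d - \tilde{P}_d$. The names `estimate_noise_power`, `c`, `mu` and the first-frame initialisation are illustrative.

```python
import numpy as np


def estimate_noise_power(p_y, n_iter=2, alpha=0.9, c=4.0, mu=0.05):
    """Iterative prediction / estimation / adaptation sketch.

    p_y    : array (N, K) of sub-band powers P_y[n, k]
    n_iter : number of iterations I per time frame
    alpha  : smoothing parameter used when speech is judged absent
    c      : HYPOTHETICAL threshold rule Theta[n, k, i] = c * P_d_tilde[n, k, i]
    mu     : step size of the normalised LMS adaptation
    Returns the final estimates P_d_hat[n, k, I] as an (N, K) array.
    """
    p_y = np.asarray(p_y, dtype=float)
    n_frames, n_bands = p_y.shape
    # Prediction weights for sub-bands k-1, k, k+1, initialised to "no change".
    theta = np.tile([0.0, 1.0, 0.0], (n_bands, 1))
    out = np.zeros((n_frames, n_bands))
    prev = p_y[0].copy()  # HYPOTHETICAL initialisation: first frame taken as noise only
    for n in range(n_frames):
        est = prev.copy()  # P_d_hat[n, k, 0] = P_d_hat[n-1, k, I]
        for _ in range(n_iter):
            # 1. Prediction from the previous iteration's estimates of bands k-1, k, k+1.
            padded = np.pad(est, 1, mode="edge")
            x = np.stack([padded[:-2], padded[1:-1], padded[2:]], axis=1)
            pred = np.sum(theta * x, axis=1)
            # 2. Estimation: hard decision of P_y against the predicted threshold.
            absent = p_y[n] <= c * pred
            new_est = np.where(absent, alpha * prev + (1.0 - alpha) * p_y[n], prev)
            # 3. Adaptation inside the loop: NLMS step on e = P_d_hat - P_d_tilde.
            err = new_est - pred
            norm = np.sum(x ** 2, axis=1, keepdims=True) + 1e-12
            theta += mu * err[:, None] * x / norm
            est = new_est
        out[n] = est
        prev = est
    return out


# Usage: constant noise, a loud "speech" burst, then a step in the noise level.
p_y = np.ones((200, 4))
p_y[50:60] = 20.0
p_y[100:] = 1.5
p_d_hat = estimate_noise_power(p_y)
```

Moving the `theta` update out of the inner loop, so that it runs once per frame on the final estimate, gives the variant of FIG. 9. If c is chosen too small, a large share of noise-only samples exceeds the threshold and is rejected, which biases the estimate downward.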

(16) The observed target signal s[n] and noise signal d[n] are assumed to be realizations of locally stationary stochastic processes whose statistics (e.g., statistical moments such as mean and variance) are allowed to change slowly over time. For example, the signal powers are time-variant but remain more or less constant within a short time window. The time window within which the noise process can be regarded as stationary (i.e., its moments do not change) is assumed to be longer than that of the target (speech) process. In addition, the noise and speech processes are assumed to be statistically independent with zero mean. Under this assumption the cross term $2E\{sd\} = 2E\{s\}E\{d\}$ vanishes, so the signal power is $P_y = E\{(s+d)^2\} = E\{s^2\} + E\{d^2\} = P_s + P_d$, i.e., simply the sum of the speech power and the noise power, where $E\{\cdot\}$ denotes statistical expectation.
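The additivity of the powers can be checked numerically. The sketch below assumes Gaussian stand-ins for the target and noise signals (an illustrative choice only, not a speech model) and compares the sample estimate of $E\{(s+d)^2\}$ with the sum of the individual powers; the cross term comes out close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000
s = rng.normal(0.0, 2.0, n_samples)  # stand-in target signal: zero mean, variance 4
d = rng.normal(0.0, 1.0, n_samples)  # stand-in noise: zero mean, variance 1, independent of s

p_y = np.mean((s + d) ** 2)   # sample estimate of E{(s+d)^2}
p_s = np.mean(s ** 2)
p_d = np.mean(d ** 2)
cross = 2.0 * np.mean(s * d)  # sample estimate of 2E{sd}, near zero by independence

print(f"P_y = {p_y:.3f}, P_s + P_d = {p_s + p_d:.3f}, cross term = {cross:.4f}")
```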

(17) Typically, the input sound signal y[n] is decomposed into a number of sub-bands using, e.g., a filter bank (time domain, DFT, other subspaces, ...): $y_k[n] = \mathrm{FB}(y[n]),\ k = 1, \ldots, K$. The processing is typically performed per time frame and sub-band; where not needed, the time and sub-band indices are omitted in the following. Since the expectation operation cannot be performed in a real implementation, it is typically approximated by an average over time, e.g., using a low pass filter. The estimated signal power is then $P_y = \langle (s+d)^2 \rangle \approx \langle s^2 \rangle + \langle d^2 \rangle = P_s + P_d$, where $\langle \cdot \rangle$ denotes averaging over time. Either the squared signal as stated above or, equivalently, the squared envelope is used. For speech processing applications, the low pass filter typically has a 6 dB cut-off frequency of approximately 5-50 Hz, which covers the speech modulations. After low pass filtering, the signal can be decimated to a significantly lower sampling rate (e.g., 80-100 Hz) to reduce the computational complexity of the following stages.
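A minimal sketch of the power computation for one band-pass signal: squaring, first-order recursive low pass smoothing, and decimation. The one-pole filter, its coefficient $a = e^{-2\pi f_c / f_s}$ (corner frequency roughly $f_c$ for $f_c \ll f_s$), the function name `subband_power` and the default values are assumptions chosen to fall within the ranges quoted above, not a specified design.

```python
import numpy as np


def subband_power(y_k, fs, fc=20.0, fs_out=100.0):
    """Smoothed power of one band-pass signal y_k[n], decimated to about fs_out Hz.

    HYPOTHETICAL design: one-pole low pass with coefficient a = exp(-2*pi*fc/fs).
    """
    y_k = np.asarray(y_k, dtype=float)
    a = np.exp(-2.0 * np.pi * fc / fs)
    x = y_k ** 2                      # squared signal
    p = np.empty_like(x)
    acc = 0.0
    for idx, v in enumerate(x):       # recursive smoothing: p[n] = a p[n-1] + (1-a) x[n]
        acc = a * acc + (1.0 - a) * v
        p[idx] = acc
    step = max(1, int(round(fs / fs_out)))
    return p[::step]                  # sampling rate decimation


fs = 16000
t = np.arange(fs) / fs
y = 2.0 * np.sin(2.0 * np.pi * 1000.0 * t)  # 1 s test tone, amplitude 2
p = subband_power(y, fs)
```

For this 1 kHz tone of amplitude 2, the smoothed power settles near $2^2/2 = 2$, with a small residual ripple at twice the tone frequency.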

(18) FIG. 5 shows functional blocks in an iterative noise power estimation process with prediction and estimation for time frame n and iterative step i, with iteration memory elements $q_i^{-1}$ 502 and 504 for the iteration index i, defined by $q_i^{-1} x[n,i] = x[n,i-1]$. To decide whether the currently observed signal sample $P_y[n,k]$ contains both the target signal s[n] and the noise signal d[n], or only the noise signal d[n], Estimation Module 501 performs an iterative hypothesis test. If the Estimation Module 501 decides that $P_y[n,k]$ contains only the noise signal d[n], it updates the current noise power estimate $\hat{P}_d[n,k,i]$ using the signal sample $P_y[n,k]$. If the Estimation Module 501 decides that $P_y[n,k]$ contains both the target signal s[n] and the noise signal d[n], it updates the current noise power estimate $\hat{P}_d[n,k,i]$ without using the current signal sample $P_y[n,k]$, either keeping the current noise power estimate constant or updating it using the noise power estimates of neighbouring sub-bands other than sub-band k.

(19) More specifically, the hypothesis test at iteration i is a simple comparison of the current sample $P_y$ against a variable threshold $\Theta$:

(20) $P_y[n,k] \le \Theta[n,k,i]$: $P_y[n,k]$ consists of noise only (null hypothesis $H_0$)

(21) $P_y[n,k] > \Theta[n,k,i]$: $P_y[n,k]$ consists of noise and speech (hypothesis $H_1$).

(22) The noise power estimate $\hat{P}_d[n,k,i]$ is then constructed based on the hypothesis-test decision. Recursive smoothing over time n and/or sub-band k may also be applied, which takes into account the correlation of the noise power over time and/or sub-bands. If the hypothesis test indicates that the speech signal s[n] is absent (null hypothesis $H_0$), then the noise power estimate $\hat{P}_d[n,k,i]$ is updated using the current signal sample $P_y[n,k]$ and the estimated noise power from time point $n-1$ and the last iteration step I, $\hat{P}_d[n-1,k,I]$:
$\hat{P}_{d,sa}[n,k,i] = \alpha\,\hat{P}_d[n-1,k,I] + (1-\alpha)\,P_y[n,k]$
Using a hard threshold decision, the noise power estimate is then:
$P_y[n,k] \le \Theta[n,k,i]: \quad \hat{P}_d[n,k,i] = \hat{P}_{d,sa}[n,k,i]$.
If the null hypothesis is rejected (speech is present), the noise power estimate $\hat{P}_d[n,k,i]$ is kept constant, i.e.,
$\hat{P}_{d,sp}[n,k,i] = \hat{P}_d[n-1,k,I]$.
The update of the noise power estimate is then
$P_y[n,k] > \Theta[n,k,i]: \quad \hat{P}_d[n,k,i] = \hat{P}_{d,sp}[n,k,i]$.
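The hard-decision estimation step can be written compactly for all sub-bands of one frame. This sketch assumes the threshold $\Theta[n,k,i]$ is already available from the prediction step; `hard_decision_update` and its argument names are illustrative, and `alpha` stands for the speech-absence smoothing parameter α.

```python
import numpy as np


def hard_decision_update(p_y, p_d_prev, theta, alpha):
    """One hard-decision estimation step for all sub-bands of frame n.

    p_y      : P_y[n, k], shape (K,)
    p_d_prev : P_d_hat[n-1, k, I], shape (K,)
    theta    : threshold Theta[n, k, i], shape (K,)
    alpha    : smoothing parameter for speech absence, 0 <= alpha < 1
    """
    p_y = np.asarray(p_y, dtype=float)
    p_d_prev = np.asarray(p_d_prev, dtype=float)
    p_sa = alpha * p_d_prev + (1.0 - alpha) * p_y  # H0 accepted: speech absent
    p_sp = p_d_prev                                # H0 rejected: hold the estimate
    return np.where(p_y <= np.asarray(theta, dtype=float), p_sa, p_sp)


print(hard_decision_update([1.0, 10.0], [2.0, 2.0], [4.0, 4.0], 0.5))
```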

(23) Alternatively, when speech is present, the noise power estimate $\hat{P}_d[n,k,i]$ can be updated by additionally using a weighted sum of neighbouring noise power estimates,

(24) $\hat{P}_{d,sp}[n,k,i] = (1-\beta)\,\hat{P}_d[n-1,k,I] + \beta\,\bar{P}_d[n-1,k,I]$ with $\bar{P}_d[n-1,k,I] = \sum_{l \neq k}^{K} w_{l,k}\,\hat{P}_d[n-1,l,I]$
with suitably chosen weights $w_{l,k}$, e.g.,
$w_{l,k} = a\,\exp(-b\,|l-k|^m)$
and suitably chosen parameters a, b, m. With this weighting, distant sub-bands contribute less than neighbouring sub-bands, reflecting, e.g., a decrease of the correlation as the distance in frequency increases. The weights $w_{l,k}$ and/or the parameters a, b, m can also be estimated and updated continuously using already existing noise power estimates from time frames before n, or from time frame n and previous iterations i. The smoothing parameters $\alpha$ (speech absent) and $\beta$ (speech present) determine the degree of influence of the noise power estimate from time frame $n-1$ and thus model, in a simple manner, the correlation of the noise power over time.
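The neighbour-weighted speech-present update can be sketched as follows. The column normalisation, which makes $\sum_{l \neq k} w_{l,k} = 1$ so that a spectrally flat noise floor is left unchanged, is an assumed way of choosing the parameter a (it cancels a out); the function names `neighbour_weights` and `speech_present_update` are hypothetical.

```python
import numpy as np


def neighbour_weights(n_bands, a=1.0, b=1.0, m=1.0, normalise=True):
    """Weights w[l, k] = a * exp(-b * |l - k|**m), with w[k, k] = 0 (only l != k)."""
    if n_bands < 2:
        raise ValueError("need at least two sub-bands")
    idx = np.arange(n_bands)
    dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
    w = a * np.exp(-b * dist ** m)
    np.fill_diagonal(w, 0.0)
    if normalise:
        # HYPOTHETICAL choice: scale each column so that sum_l w[l, k] = 1
        w = w / w.sum(axis=0, keepdims=True)
    return w


def speech_present_update(p_d_prev, w, beta):
    """P_d_sp[n,k,i] = (1-beta) P_d_hat[n-1,k,I] + beta * sum_{l != k} w[l,k] P_d_hat[n-1,l,I]."""
    p_d_prev = np.asarray(p_d_prev, dtype=float)
    p_bar = w.T @ p_d_prev  # p_bar[k] = sum_l w[l, k] * p_d_prev[l]
    return (1.0 - beta) * p_d_prev + beta * p_bar


w = neighbour_weights(4)
print(speech_present_update([2.0, 2.0, 2.0, 2.0], w, 0.3))
```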

(25) Instead of a hard threshold decision as described above, a soft threshold decision could be used; this might be advantageous because erroneous decisions regarding speech absence or presence would carry less weight. The output of the comparison with the threshold is then interpreted as a speech absence probability. A decision
$p[n,k,i] = g(\Theta[n,k,i], P_y[n,k])$,
with a suitable function $g(\cdot)$ providing soft values for the speech absence probability in the interval [0,1], can be used, e.g., a sigmoidal function

(26) $p[n,k,i] = \frac{1}{1 + e^{-\kappa_k t[n,k,i]}}$ with $t[n,k,i] = \Theta[n,k,i] - P_y[n,k]$
and $\kappa_k$ determining the steepness of the function. A hard decision is obtained in the limit $\kappa_k \to \infty$. Using the speech absence probability $p[n,k,i]$, the noise power estimate at iteration i, time frame n, and sub-band k is then
$\hat{P}_d[n,k,i] = p[n,k,i]\,\hat{P}_{d,sa}[n,k,i] + (1-p[n,k,i])\,\hat{P}_{d,sp}[n,k,i]$,
with the speech presence probability $1 - p[n,k,i]$. For the first simple case described above, the noise power estimate is then

(27) $\begin{aligned} \hat{P}_d[n,k,i] &= p[n,k,i]\,\hat{P}_{d,sa}[n,k,i] + (1-p[n,k,i])\,\hat{P}_{d,sp}[n,k,i] \\ &= p[n,k,i]\big(\alpha\,\hat{P}_d[n-1,k,I] + (1-\alpha)\,P_y[n,k]\big) + (1-p[n,k,i])\,\hat{P}_d[n-1,k,I] \\ &= (1-\tilde{p}[n,k,i])\,\hat{P}_d[n-1,k,I] + \tilde{p}[n,k,i]\,P_y[n,k] \end{aligned}$
with a scaled speech-absence probability $\tilde{p}[n,k,i] = p[n,k,i]\,(1-\alpha)$.
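The soft decision and the resulting recursive update can be sketched as below. This is an illustrative sketch, assuming the sigmoid of eq. (26) with the threshold written as `threshold`, the speech-absent smoothing parameter written as `alpha`, and the simple speech-present case in which the previous estimate is kept; the names are hypothetical.

```python
import math

def speech_absence_probability(threshold, p_y, sigma_k):
    """Sigmoidal soft decision p = 1 / (1 + exp(-sigma_k * (threshold - p_y)))."""
    t = threshold - p_y
    # Clamp the exponent so very steep sigmoids (near-hard decisions) do not overflow.
    z = max(min(-sigma_k * t, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(z))

def soft_update(prev_est, p_y, p, alpha):
    """Last line of eq. (27): blend P_hat[n-1, k, I] and P_y[n, k] with p_tilde = p * (1 - alpha)."""
    p_tilde = p * (1.0 - alpha)
    return (1.0 - p_tilde) * prev_est + p_tilde * p_y
```

The single-line form in `soft_update` is algebraically identical to weighting the speech-absent update $\alpha\hat{P}_d + (1-\alpha)P_y$ by $p$ and the unchanged estimate by $1-p$; the test below checks this numerically.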

(28) The threshold can be derived using a stochastic signal model that treats the involved signals $P_y$, $P_s$, $P_d$ as stochastic processes, using a likelihood ratio test statistic (Neyman, J., Pearson, E., On the problem of the most efficient test of statistical hypotheses, Philosophical Transactions of the Royal Society of London, Series A, Containing Papers of a Mathematical or Physical Character 231, pp. 289-337, 1933; incorporated herein by reference in its entirety):

(29) $\Lambda(P_y) = \frac{L(P_y \mid H_1)}{L(P_y \mid H_0)} = \frac{f_{P_y}(P_y \mid P_s \neq 0)}{f_{P_y}(P_y \mid P_s = 0)}$
where $f_{P_y}(P_y \mid P_s)$ is the conditional probability density function (amplitude distribution) of the process $P_y$ given $P_s$. The likelihood ratio is compared to a threshold, $\Lambda(P_y) > \gamma$, and the decision is made in favour of hypothesis $H_1$ (speech present) if the inequality holds. The aim is to maximise the probability of a correct detection (deciding for speech presence when speech is in fact present) for a given false-alarm probability $p_{FA}$ (deciding for speech presence when speech is in fact absent). The false-alarm probability is the probability that the test statistic $\Lambda(P_y)$ exceeds the threshold $\gamma$ when speech is in fact absent, i.e., when hypothesis $H_0$ holds:
$p_{FA} = P[\Lambda(P_y) > \gamma \mid H_0] = \int_{\{P_y:\,\Lambda(P_y) > \gamma\}} f_{P_y}(P_y \mid P_s = 0)\,dP_y$.
With this equation, the threshold for a given false-alarm probability can be determined.

(30) The threshold is a function of the unknown noise power $P_d$, since $P_y = P_s + P_d$. To be able to calculate the threshold, a prediction $\tilde{P}_d[n]$ of the unknown noise power at time $n$, as discussed below, is used. This yields the threshold $\gamma[n] = \Gamma(p_{FA}, \tilde{P}_d[n])$, where the function $\Gamma(\cdot)$ depends on the assumed probability density $f_{P_y}(P_y \mid P_s)$.
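As a worked example of the threshold function, consider one density that is often assumed for sub-band power samples, though the text does not prescribe it: $P_y$ exponentially distributed with mean $P_d$ under $H_0$ and mean $P_s + P_d$ under $H_1$. The likelihood ratio is then monotonically increasing in $P_y$, so the test reduces to comparing $P_y$ with a threshold $\gamma$, and $p_{FA} = e^{-\gamma/P_d}$ gives $\gamma = -P_d \ln p_{FA}$. The sketch below evaluates this with the predicted noise power and checks it by simulation; the function names are hypothetical.

```python
import math
import random

def noise_threshold(p_fa, predicted_noise_power):
    """Threshold on P_y for a target false-alarm probability, ASSUMING that under H0
    P_y is exponentially distributed with mean P_d, so P[P_y > gamma | H0] = exp(-gamma / P_d)."""
    if not 0.0 < p_fa < 1.0:
        raise ValueError("p_fa must lie in (0, 1)")
    return -predicted_noise_power * math.log(p_fa)

def empirical_false_alarm(p_fa, noise_power, trials=200_000, seed=0):
    """Monte Carlo check: fraction of noise-only (H0) samples that exceed the threshold."""
    rng = random.Random(seed)
    gamma = noise_threshold(p_fa, noise_power)
    # random.expovariate takes the rate, i.e. 1 / mean.
    hits = sum(rng.expovariate(1.0 / noise_power) > gamma for _ in range(trials))
    return hits / trials
```

Under this assumption the threshold scales linearly with the predicted noise power, which is why an inaccurate prediction $\tilde{P}_d[n]$ directly shifts the speech presence decision.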

(31) The key to accurate noise power estimation is a correct decision on whether the currently observed sample $P_y[n]$ results from speech and noise or from noise only. This decision is based on the threshold calculation and depends on the targeted false-alarm probability and on the noise power. Since the noise power is unknown (its estimation being the aim of the process), the threshold cannot be calculated directly. Instead, a predicted value of the unknown noise power can be used, obtained from a time-variant noise model based on previous noise power estimates $\hat{P}_d[n-1,k,I], \hat{P}_d[n-2,k,I], \ldots$ as well as on estimates produced at previous iteration steps, i.e., $\hat{P}_d[n,k,i-1], \hat{P}_d[n,k,i-2], \ldots$. A prediction of the noise power for the current iteration step can then be made using, e.g., a first-order autoregressive (AR-1) model:
$\tilde{P}_d[n,k,i] = f(\theta, \hat{P}_d[n-1,k,I], \hat{P}_d[n,k,i-1])$,
where $\theta = [\theta_1, \theta_2, \ldots, \theta_M]^T$ are the model parameters. In some specific embodiments, estimates from neighbouring sub-bands $l \neq k$ can be used in the prediction model too:
$\tilde{P}_d[n,k,i] = f(\theta, \hat{P}_d[n-1,k,I], \hat{P}_d[n,k,i-1], \hat{P}_d[n-1,l,I], \hat{P}_d[n,l,i-1])$, $l \neq k$.

(32) The prediction model parameters for the noise power are adapted to increase the accuracy of subsequent predictions. This is done using the final noise power estimate at time $n$ and iteration $I$, $\hat{P}_d[n,k,I]$, and the prediction $\tilde{P}_d[n,k,I]$. Specifically, the difference between the two gives information about the mismatch between the model and the actual noise process and is used to adapt the model parameters. Since the model is adapted, its parameters change over time, i.e., the (linear or nonlinear) model itself changes over time. The adaptation rule, described further below, defines how the parameters are adapted to the current situation.

(33) Various specific models can be used for predicting the noise power, for example a linear AR-11 model, in which the predicted noise power is a linear combination of the noise power estimates from the previous iteration in the same sub-band and in the two directly neighbouring sub-bands:

(34) $\tilde{P}_d[n,k,i] = \sum_{l=\max(k-1,1)}^{\min(k+1,K)} \theta_{l-k}[n,k]\,\hat{P}_d[n,l,i-1]$
where, for $i=1$, $\hat{P}_d[n,k,0] = \hat{P}_d[n-1,k,I]$, i.e., the estimate from the previous time frame $n-1$ is used. Alternatively, a linear AR-ML model can be employed, in which the predicted noise power is a linear combination of the estimated noise power of the last iteration, $M$ already estimated noise powers from previous time frames, and $2L$ neighbouring noise power estimates:

(35) $\tilde{P}_d[n,k,i] = \sum_{l=\max(k-1,1)}^{\min(k+1,K)} \theta_{0,l}[n,k]\,\hat{P}_d[n,l,i-1] + \sum_{m=1}^{M} \sum_{l=\max(k-L,1)}^{\min(k+L,K)} \theta_{m,l}[n,k]\,\hat{P}_d[n-m,l,I]$.
Or a nonlinear model can be used, in which the predicted noise power is a nonlinear function of the estimated noise powers; many alternatives can be implemented in this case, such as a recursive polynomial model.
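The linear AR-ML prediction of eq. (35) can be sketched as two nested sums. This is a hypothetical illustration: sub-bands are 0-based, neighbours outside the band range are skipped as the $\max$/$\min$ limits prescribe, and the coefficients are stored by offset relative to $k$ (a representation choice, not taken from the text).

```python
def ar_ml_predict(prev_iter_est, past_final_est, theta0, theta, k, L):
    """AR-ML prediction of P_tilde[n, k, i] (eq. (35)), 0-based sub-bands.

    prev_iter_est:  P_hat[n, l, i-1] for all sub-bands l
    past_final_est: past_final_est[m-1][l] = P_hat[n-m, l, I] for m = 1..M
    theta0:         3 coefficients for offsets -1, 0, +1 (previous-iteration term)
    theta:          theta[m-1] holds 2L+1 coefficients for offsets -L..+L
    """
    K = len(prev_iter_est)
    pred = 0.0
    # Term 1: previous iteration of the current frame, sub-bands k-1..k+1.
    for j, off in enumerate((-1, 0, 1)):
        l = k + off
        if 0 <= l < K:
            pred += theta0[j] * prev_iter_est[l]
    # Term 2: final estimates of the M previous frames, sub-bands k-L..k+L.
    for m, frame in enumerate(past_final_est):
        for j, off in enumerate(range(-L, L + 1)):
            l = k + off
            if 0 <= l < K:
                pred += theta[m][j] * frame[l]
    return pred
```

Since the model is linear in its coefficients, it can equally be written as an inner product of a stacked regressor and a stacked parameter vector, which is what the adaptation rules below exploit.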

(36) For a prediction model that is linear in the parameters, the model parameters can be collected in a vector $\theta_{n,k}$ and the prediction written as $\tilde{P}_d[n,k,i] = \varphi_{n,k,i}^T\,\theta_{n,k}$, with the regressor vector $\varphi_{n,k,i}$. For the linear AR-11 model:
$\varphi_{n,k,i}^T = [\hat{P}_d[n,k-1,i-1], \hat{P}_d[n,k,i-1], \hat{P}_d[n,k+1,i-1]]$
and
$\theta_{n,k}^T = [\theta_{-1}[n,k], \theta_0[n,k], \theta_{+1}[n,k]]$.
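A minimal sketch of the AR-11 prediction in both forms, the truncated sum of eq. (34) and the inner product $\varphi^T\theta$, follows. It assumes 0-based sub-bands and substitutes 0.0 for a missing edge neighbour in the regressor (an assumption that makes both forms agree); the function names are hypothetical.

```python
def ar11_predict(prev_iter_est, theta, k):
    """P_tilde[n,k,i] = sum_{l=k-1}^{k+1} theta_{l-k} * P_hat[n,l,i-1], skipping
    neighbours outside the band range (0-based indices).

    prev_iter_est: P_hat[n, l, i-1] for all sub-bands l
    theta:         (theta_-1, theta_0, theta_+1) for sub-band k
    """
    num_bands = len(prev_iter_est)
    prediction = 0.0
    for offset, coeff in zip((-1, 0, 1), theta):
        l = k + offset
        if 0 <= l < num_bands:
            prediction += coeff * prev_iter_est[l]
    return prediction

def regressor(prev_iter_est, k):
    """phi = [P_hat[n,k-1,i-1], P_hat[n,k,i-1], P_hat[n,k+1,i-1]], with 0.0 for a
    neighbour outside the band range (ASSUMED edge handling)."""
    num_bands = len(prev_iter_est)
    return [prev_iter_est[k + o] if 0 <= k + o < num_bands else 0.0
            for o in (-1, 0, 1)]
```

For example, with estimates `[1.0, 2.0, 3.0]` and coefficients `(0.25, 0.5, 0.25)`, the prediction for the middle sub-band is 2.0, and for the first sub-band it is 1.0 because the missing lower neighbour is dropped.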

(37) FIG. 6 shows an example for $I=2$ iteration steps, where the estimate of time frame $n-1$ is available. The first step in the first iteration is to predict the noise power for time $n$. In the example, the prediction is based on the estimated noise power at time $n-1$ in sub-bands $k-1$, $k$ and $k+1$. Based on the predicted noise power, the speech-absence probability (sap, $p[n,k,i=1]$) is calculated. Using the speech-absence probability, the noise power for time frame $n$ and iteration $i=1$ is calculated. These calculations are performed for all sub-bands before the next iteration is initiated. Performing more than one iteration per time frame opens up the possibility of correcting suboptimal estimates, e.g., those caused by a wrong decision regarding speech presence.
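One time frame of the iterative scheme can be sketched by combining the steps above: AR-11 prediction from the previous iteration, threshold, sigmoidal speech-absence probability, and the soft update of eq. (27), with all sub-bands processed before the next iteration starts. This is a hypothetical sketch, not the patented implementation; it assumes fixed model coefficients within the frame, 0-based sub-bands, and the exponential-noise threshold $\gamma = -\tilde{P}_d \ln p_{FA}$, which the text does not prescribe.

```python
import math

def estimate_frame(p_y, prev_final, theta, alpha, p_fa, sigma, num_iter=2):
    """One time frame n of the iterative estimator (cf. FIG. 6).

    p_y:        observed powers P_y[n, k] for all sub-bands k
    prev_final: P_hat[n-1, k, I], which also serves as P_hat[n, k, 0]
    theta:      per-band AR-11 coefficients (theta_-1, theta_0, theta_+1)
    Returns (P_hat[n, k, I], P_tilde[n, k, I], p[n, k, I]) as lists over k.
    """
    K = len(p_y)
    est = list(prev_final)                      # P_hat[n, k, 0]
    for _ in range(num_iter):
        pred, sap, new_est = [], [], []
        for k in range(K):
            # 1) AR-11 prediction from the previous iteration's estimates.
            p_tilde = sum(c * est[k + o] for o, c in zip((-1, 0, 1), theta[k])
                          if 0 <= k + o < K)
            # 2) Threshold (ASSUMED exponential noise model) and soft decision.
            gamma = -p_tilde * math.log(p_fa)
            z = max(min(-sigma * (gamma - p_y[k]), 700.0), -700.0)
            p = 1.0 / (1.0 + math.exp(z))
            # 3) Soft recursive update of eq. (27), based on the previous frame.
            p_sc = p * (1.0 - alpha)
            new_est.append((1.0 - p_sc) * prev_final[k] + p_sc * p_y[k])
            pred.append(p_tilde)
            sap.append(p)
        est = new_est                           # all sub-bands done: next iteration
    return est, pred, sap
```

Note that each iteration re-evaluates only the decision (through a refined prediction); the recursion itself always starts from the previous frame's final estimate, as in eq. (27).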

(38) Two situations prone to a false decision on speech presence or absence can briefly be considered. In the first, the noise power is rising while speech is absent; the increasing signal power then makes a decision for speech presence likely. If at time frame $n$, sub-band $k$, iteration $i=1$ speech presence is erroneously decided, the noise power estimate is not updated and will not follow the increasing noise power, i.e., it will be too small. If the decision is correct in the neighbouring sub-bands $k-1$ and $k+1$, the noise power estimates there are updated correctly and increase. In the next iteration step, the prediction of the noise power in sub-band $k$ is based on the updated estimates in the neighbouring sub-bands and will also increase, assuming the noise model is sufficiently accurate. The probability of a correct decision at this iteration step is now higher, since the noise power prediction is more accurate; a decision for speech absence therefore becomes more likely, resulting in a higher probability that the noise power estimate is updated.

(39) In the second situation, the noise power is falling while speech is present; the decreasing signal level then makes a decision for speech absence likely. That is, at time frame $n$, sub-band $k$, iteration $i=1$, speech absence might be decided, and the noise power is then updated erroneously. Assuming correct decisions and updates in the neighbouring sub-bands, i.e., decreasing noise power estimates there, speech presence might be decided at iteration $i=2$, leading to a correct update of the noise power.

(40) With this method, the speech-absence probability is calculated iteratively and, owing to the correlation across sub-bands, it is assumed that a false decision at one iteration step is corrected in one of the following steps. FIGS. 7A and 7B show a simple estimation example with speech in white noise. Two sub-bands are shown, along with the estimated noise power and the threshold. The threshold in this example is derived from a time-invariant prediction model that considers only one estimated noise power sample from the same sub-band.

(41) FIG. 8 shows functional blocks of an iterative noise power estimation process with a Noise Power Estimation Module 801, Noise Prediction Model 804, Noise Model Adaptation Module 803, and iteration memory elements $q_i^{-1}$ 802 and 805 for iteration index $i$. Within the Adaptation Module 803, the prediction model parameters are adapted using the information gained in the estimation step. Specifically, the difference between the prediction $\tilde{P}_d[n,k,I]$ and the estimate $\hat{P}_d[n,k,I]$ in the last iteration step is used. It is assumed that the estimation step increases the knowledge about the unknown noise power compared with the prediction. The prediction is used only for deciding whether speech is present. Even if the prediction is not very accurate, the decision regarding speech presence might be correct. If that decision is correct, the estimation step increases the knowledge about the unknown noise power, and this gain in information is exploited for adapting the prediction model. The model parameters are continuously adapted according to an optimisation criterion, for example minimising the mean squared prediction error $J = E\{e[n,k,I]^2\}$, with the prediction error
$e[n,k,I] = \hat{P}_d[n,k,I] - \tilde{P}_d[n,k,I]$.
The prediction model parameters can then be adapted, e.g., using a steepest descent method
$\theta_{n,k} = \theta_{n-1,k} - \mu\,\nabla_{\theta} J$
with a fixed (or time-variant) step size $\mu$ determining the adaptation accuracy and tracking speed. Since the expectation $E\{\cdot\}$ typically cannot be calculated for lack of knowledge of the statistics of the prediction error, a stochastic gradient descent method can be used instead, e.g., the least mean squares (LMS) method
$\theta_{n,k} = \theta_{n-1,k} - \frac{\mu}{2}\,\nabla_{\theta}\, e[n,k,I]^2 = \theta_{n-1,k} + \mu\,\varphi_{n,k,I}\, e[n,k,I]$.
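The LMS rule can be sketched in a few lines. The demo below is a toy check on hypothetical, noise-free data: the "final estimates" are generated exactly by an AR-11 model with known coefficients, and LMS should recover them. The coefficient values, the regressor distribution and the step size are assumptions chosen only so that $\mu\|\varphi\|^2 < 2$ holds and the iteration is stable.

```python
import random

def lms_update(theta, phi, p_hat_final, p_tilde_final, mu):
    """One LMS step: theta_n = theta_{n-1} + mu * phi * e, with e = P_hat - P_tilde."""
    e = p_hat_final - p_tilde_final            # prediction error e[n, k, I]
    return [t + mu * f * e for t, f in zip(theta, phi)]

def lms_demo(true_theta=(0.2, 0.6, 0.2), mu=0.1, steps=5000, seed=1):
    """Toy identification run on synthetic (hypothetical) data generated by true_theta."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        phi = [rng.uniform(0.5, 1.5) for _ in range(3)]
        p_hat = sum(a * b for a, b in zip(true_theta, phi))     # "final estimate"
        p_tilde = sum(a * b for a, b in zip(theta, phi))        # model prediction
        theta = lms_update(theta, phi, p_hat, p_tilde, mu)
    return theta
```

In the actual estimator, `phi` would be the regressor $\varphi_{n,k,I}$ built from noise power estimates and `p_hat_final` the estimate $\hat{P}_d[n,k,I]$, which is itself uncertain; the synthetic data here only illustrates convergence of the update rule.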

(42) Advantageously, the adaptation considers only cases where the probability of a good noise power estimate is high, i.e., cases where it is reasonably certain that speech is absent, since the noise power has then been estimated accurately with high probability. For an AR-11 prediction model with
$\varphi_{n,k,I}^T = [\hat{P}_d[n,k-1,I], \hat{P}_d[n,k,I], \hat{P}_d[n,k+1,I]]$
the fixed step size turns into a $3 \times 3$ diagonal, time-variant step-size matrix,

(43) $Q_{n, k, I} = (\begin{matrix} p [n, k - 1, I] & 0 & 0 \\ 0 & p [n, k, I] & 0 \\ 0 & 0 & p [n, k + 1, I] \end{matrix}),$
incorporating the speech-absence probabilities. With this matrix step size, the update equation reads
$\theta_{n,k} = \theta_{n-1,k} + \mu\,Q_{n,k,I}\,\varphi_{n,k,I}\,e[n,k,I]$,
thus largely restricting model adaptation to speech-absent periods.
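Because $Q_{n,k,I}$ is diagonal, the matrix-weighted update reduces to scaling each coefficient's step by the speech-absence probability of the corresponding sub-band. A minimal sketch, assuming 0-based sub-bands and a zero step for a neighbour outside the band range (an assumption not stated in the text); the function names are hypothetical.

```python
def neighbour_sap(sap_all, k):
    """Diagonal of Q: (p[n,k-1,I], p[n,k,I], p[n,k+1,I]), with 0.0 for a neighbour
    outside the band range (ASSUMED: no adaptation for a missing neighbour)."""
    K = len(sap_all)
    return [sap_all[k + o] if 0 <= k + o < K else 0.0 for o in (-1, 0, 1)]

def weighted_lms_update(theta, phi, sap, e, mu):
    """theta_n = theta_{n-1} + mu * Q * phi * e with Q = diag(sap), computed elementwise."""
    return [t + mu * q * f * e for t, q, f in zip(theta, sap, phi)]
```

With all speech-absence probabilities equal to one this reduces to the plain LMS step, and with all equal to zero the parameters are frozen, which is the intended behaviour during speech.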

(44) Adaptation and iteration can be interleaved in at least two ways. FIG. 9 shows a flow chart of the algorithm with prediction in step 901, estimation in step 902 and adaptation in step 905, where the adaptation is outside the iteration loop so that the model is adapted after all $I$ iteration steps have been performed. Thus, during the iterations within the loop formed by steps 903 and 904, the model parameters are kept constant, and the estimated noise power $\hat{P}_d[n,k,i-1]$ used in the model is updated from $i-1$ to $i$. Finally, at step 906 the time instant is incremented and the algorithm is restarted for the next time instant.

(45) FIG. 10 shows a flow chart of the algorithm with prediction in step 1001, estimation in step 1002 and adaptation in step 1003, where the adaptation is inside the iteration loop formed by steps 1004 and 1005 so that the model is adapted at each iteration step. Thus, after iteration $i$, the model parameters are updated before the next iteration step or, after the last iteration, before advancing to the next time instant in step 1006. In this case, the prediction error is calculated from the prediction and the estimate of the noise power at the current iteration.
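The two interleavings differ only in where the adaptation call sits relative to the iteration loop. The skeleton below sketches that control flow with caller-supplied `predict`, `estimate` and `adapt` callables; these names and the shared `state` dictionary are hypothetical placeholders for the operations described above, not an API from the text.

```python
def run(frames, num_iter, predict, estimate, adapt, adapt_inside_loop=False):
    """Control flow of FIG. 9 (adapt_inside_loop=False) and FIG. 10 (True).

    predict(state, frame, i) -> prediction
    estimate(state, frame, i, prediction) -> estimate
    adapt(state, prediction, estimate) updates the model parameters held in state.
    """
    if num_iter < 1:
        raise ValueError("at least one iteration per frame is required")
    state = {}
    for frame in frames:                        # time instants n
        for i in range(1, num_iter + 1):        # iterations i = 1..I
            pred = predict(state, frame, i)
            est = estimate(state, frame, i, pred)
            if adapt_inside_loop:
                adapt(state, pred, est)         # FIG. 10: adapt after every iteration
        if not adapt_inside_loop:
            adapt(state, pred, est)             # FIG. 9: adapt once, using iteration I
    return state
```

With adaptation outside the loop the model is updated once per frame from $\tilde{P}_d[n,k,I]$ and $\hat{P}_d[n,k,I]$; inside the loop it is updated $I$ times per frame, each time from the current iteration's prediction and estimate.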

(46) Owing to the recursive approach described above, noise power that changes over time can be tracked with only a short delay. And owing to the adaptation of the prediction model, the system is able to adapt to various acoustic situations, in particular to various noise types. In addition, this approach has relatively low arithmetic complexity compared with existing arrangements. However, because of the recursive approach, the system might become unstable for some unfavourable combinations of parameters and input signal.

(47) Embodiments of the invention may be implemented in part by any conventional computer programming language. For example, preferred embodiments may be implemented in a procedural programming language (e.g., C) or an object oriented programming language (e.g., C++, Python). Alternative embodiments of the invention may be implemented as pre-programmed hardware elements, other related components, or as a combination of hardware and software components.

(48) Embodiments can be implemented in part as a computer program product for use with a computer system. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention are implemented as entirely hardware, or entirely software (e.g., a computer program product).

(49) Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention.


CPC classification

H04R2225/67 (ELECTRICITY)
A61N1/36039 (HUMAN NECESSITIES)
A61N1/0541 (HUMAN NECESSITIES)
H04R25/50 (ELECTRICITY)
G10L21/0232 (PHYSICS)
H04R25/505 (ELECTRICITY)
G10L15/20 (PHYSICS)
A61N1/36036 (HUMAN NECESSITIES)
H04R2225/43 (ELECTRICITY)

International classification

H04R25/00 (ELECTRICITY)
G10L15/20 (PHYSICS)
G10L21/0232 (PHYSICS)
A61N1/05 (HUMAN NECESSITIES)
A61N1/36 (HUMAN NECESSITIES)
