Recursive noise power estimation with noise model adaptation
10785581 · 2020-09-22
Assignee
Inventors
Cpc classification
H04R2225/67
ELECTRICITY
H04R25/50
ELECTRICITY
G10L15/20
PHYSICS
International classification
G10L15/20
PHYSICS
A61N1/05
HUMAN NECESSITIES
Abstract
A method of signal processing to generate hearing implant stimulation signals for a hearing implant system includes transforming an input sound signal into band pass signals each representing an associated frequency band of audio frequencies. The band pass signals are processed in a sequence of sampling time frames and iterative steps to produce a noise power estimate. This includes using a noise prediction model to determine if a currently observed signal sample includes a target signal, and if so, then updating a current noise power estimate without using the currently observed signal sample, and otherwise updating the current noise power estimate using the currently observed signal sample. The noise prediction model also is adapted based on the updated noise power estimate. The hearing implant stimulation signals are then developed from the band pass signals and the noise power estimate.
Claims
1. A method of signal processing to generate hearing implant stimulation signals $Z$ for a hearing implant system, the method comprising: transforming an input sound signal $y[n]$ characterized as an additive mixture of an information bearing target signal $s[n]$ and a non-information bearing noise signal $d[n]$ into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames $n$ and iterative steps $i=1,\ldots,I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame $n$ and iteration $i$, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, wherein using the noise prediction model is based on a hard decision comparison of the currently observed signal sample to a variable threshold, the variable threshold representing a likelihood ratio test-statistic
2. The method according to claim 1, wherein updating the current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$ includes using the current signal power $P_y[n,k]$ and the estimated noise power from an immediately preceding time frame $n-1$ and a last iteration step $I$, $\hat{P}_d[n-1,k,I]$, so that the current noise power estimate $\hat{P}_d[n,k,i]=\alpha\hat{P}_d[n-1,k,I]+(1-\alpha)P_y[n,k]$, where $\alpha$ is a smoothing parameter.
3. The method according to claim 1, wherein updating the current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$ includes maintaining constant the current noise power estimate $\hat{P}_d[n,k,i]$.
4. The method according to claim 1, wherein updating the current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$ includes additionally using a weighted sum of neighboring noise power estimates, $\hat{P}_d[n,k,i]=(1-\beta)\hat{P}_d[n-1,k,I]+\beta\bar{P}_d[n-1,k,I]$ with $\bar{P}_d[n-1,k,I]=\sum_{l\neq k}^{K}w_{l,k}\hat{P}_d[n-1,l,I]$, with suitably chosen weights $w_{l,k}$ and parameters $a$, $b$, $m$, $\beta$.
5. The method according to claim 1, wherein adapting the noise prediction model $\tilde{P}_d[n,k,i]$ is performed after all $I$ iterative steps for a given time frame $n$ have been performed.
6. The method according to claim 1, wherein adapting the noise prediction model $\tilde{P}_d[n,k,i]$ is performed after each iteration $i$ for a given time frame $n$.
7. The method according to claim 1, wherein developing the hearing implant stimulation signals includes using the noise power estimate $\tilde{P}_d[n,k,I]$ for noise reduction of the band pass signals $y_k[n]$.
8. The method according to claim 1, wherein developing the hearing implant stimulation signals includes using the noise power estimate $\tilde{P}_d[n,k,I]$ for channel selection of the band pass signals $y_k[n]$.
9. The method according to claim 1, wherein developing the hearing implant stimulation signals includes using the noise power estimate $\tilde{P}_d[n,k,I]$ for a power saving functionality of the hearing implant system.
10. The method according to claim 1, wherein developing the hearing implant stimulation signals includes using the noise power estimate $\hat{P}_d[n,k,I]$ for channel selection of the band pass signals $y_k[n]$.
11. A method of signal processing to generate hearing implant stimulation signals $Z$ for a hearing implant system, the method comprising: transforming an input sound signal $y[n]$ characterized as an additive mixture of an information bearing target signal $s[n]$ and a non-information bearing noise signal $d[n]$ into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames $n$ and iterative steps $i=1,\ldots,I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame $n$ and iteration $i$, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, wherein using the noise prediction model is based on a probability-based decision comparison of the currently observed signal sample $P_y[n,k]$ to a variable threshold $\lambda[n,k,i]$, using a speech absence probability $p[n,k,i]$ in an interval $[0,1]$, where $p[n,k,i]=g(\lambda[n,k,i],P_y[n,k])$, so that the noise power estimate $\hat{P}_d[n,k,i]$ at iteration $i$, time frame $n$, and sub-band $k$ is $\tilde{P}_d[n,k,i]=p[n,k,i]\hat{P}_{d,sa}[n,k,i]+(1-p[n,k,i])\hat{P}_{d,sp}[n,k,i]$, ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, then updating a current noise power estimate $\tilde{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal $s[n]$, then updating a current noise power estimate $\tilde{P}[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}_d[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$; and developing the hearing implant stimulation signals $Z$ from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.
12. The method according to claim 11, wherein the speech absence probability p[n,k,i] is a sigmoidal function where
13. A method of signal processing to generate hearing implant stimulation signals $Z$ for a hearing implant system, the method comprising: transforming an input sound signal $y[n]$ characterized as an additive mixture of an information bearing target signal $s[n]$ and a non-information bearing noise signal $d[n]$ into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames $n$ and iterative steps $i=1,\ldots,I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame $n$ and iteration $i$, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, wherein using the noise prediction model $\tilde{P}_d[n,k,i]$ is a time variant noise model that is a first order autoregressive model $\tilde{P}[n,k,i]=f(\theta,\hat{P}_d[n-1,k,I],\hat{P}_d[n,k,i-1])$ with model parameters $\theta=[\theta_1,\theta_2,\ldots,\theta_M]^T$, ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, then updating a current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal $s[n]$, then updating a current noise power estimate $\hat{P}[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}_d[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$; and developing the hearing implant stimulation signals $Z$ from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.
14. The method according to claim 13, wherein the noise prediction model $\tilde{P}_d[n,k,i]$ is based on estimates from neighboring sub-bands $\tilde{P}_d[n,k,i]=f(\theta,\hat{P}_d[n-1,k,I],\hat{P}_d[n,k,i-1],\hat{P}_d[n-1,l\neq k,I],\hat{P}_d[n,l\neq k,i-1])$.
15. A method of signal processing to generate hearing implant stimulation signals $Z$ for a hearing implant system, the method comprising: transforming an input sound signal $y[n]$ characterized as an additive mixture of an information bearing target signal $s[n]$ and a non-information bearing noise signal $d[n]$ into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames $n$ and iterative steps $i=1,\ldots,I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame $n$ and iteration $i$, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, wherein the noise prediction model $\tilde{P}_d[n,k,i]$ is a time variant noise model that is a linear autoregressive model of a linear combination of estimated noise power of a previous iteration and two directly neighboring sub-bands, $\tilde{P}_d[n,k,i]=\sum_{l=k-1}^{k+1}\theta_l[n,k]\hat{P}_d[n,l,i-1]$, where for $i=1$, $\hat{P}_d[n,k,0]=\hat{P}_d[n-1,k,I]$, representing estimated noise power from previous time frame $n-1$, ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, then updating a current noise power estimate $\tilde{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal $s[n]$, then updating a current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$; and developing the hearing implant stimulation signals $Z$ from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.
16. A method of signal processing to generate hearing implant stimulation signals $Z$ for a hearing implant system, the method comprising: transforming an input sound signal $y[n]$ characterized as an additive mixture of an information bearing target signal $s[n]$ and a non-information bearing noise signal $d[n]$ into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames $n$ and iterative steps $i=1,\ldots,I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame $n$ and iteration $i$, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, wherein the noise prediction model $\tilde{P}_d[n,k,i]$ is a time variant noise model that is a linear autoregressive model of a linear combination of $M$ already estimated noise powers and estimated noise power of a preceding iteration $i-1$ and $2L$ neighboring noise power estimates, $\tilde{P}_d[n,k,i]=\sum_{l=k-1}^{k+1}\theta_{0l}[n,k]\tilde{P}_d[n,l,i-1]+\sum_{m=1}^{M}\sum_{l=k-L}^{k+L}\theta_{ml}[n,k]\hat{P}_d[n-m,l,I]$, where for $i=1$, $\hat{P}_d[n,k,0]=0$, ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, then updating a current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal $s[n]$, then updating a current noise power estimate $\hat{P}_d[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}[n,k,i]$ based on the updated noise power estimate $\tilde{P}_d[n,k,i]$; and developing the hearing implant stimulation signals $Z$ from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.
17. A method of signal processing to generate hearing implant stimulation signals $Z$ for a hearing implant system, the method comprising: transforming an input sound signal $y[n]$ characterized as an additive mixture of an information bearing target signal $s[n]$ and a non-information bearing noise signal $d[n]$ into a plurality of band pass signals $y_k[n]$ each representing an associated frequency band of audio frequencies; processing the band pass signals $y_k[n]$ in a sequence of sampling time frames $n$ and iterative steps $i=1,\ldots,I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$, wherein for each time frame $n$ and iteration $i$, the processing includes: i. using a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, ii. if the currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, then updating a current noise power estimate $\hat{P}_d[n,k,i]$ without using the currently observed signal sample $P_y[n,k]$, and otherwise iii. if the currently observed signal sample $P_y[n,k]$ does not include the target signal $s[n]$, then updating a current noise power estimate $\hat{P}[n,k,i]$ using the currently observed signal sample $P_y[n,k]$, wherein processing the band pass signals $y_k[n]$ further comprises adapting the noise prediction model $\tilde{P}_d[n,k,i]$ based on the updated noise power estimate $\hat{P}_d[n,k,i]$, wherein adapting the noise prediction model $\tilde{P}_d[n,k,i]$ is based on a continuous adaptation of one or more model optimization criteria, the one or more model optimization criteria including minimizing a mean squared error $J=E\{e[n,k,I]^2\}$ of a prediction error $e[n,k,I]=\hat{P}_d[n,k,I]-\tilde{P}_d[n,k,I]$; and developing the hearing implant stimulation signals $Z$ from the band pass signals $y_k[n]$ and the noise power estimate $\hat{P}_d[n,k,I]$.
18. The method according to claim 17, wherein adapting the noise prediction model $\tilde{P}_d[n,k,i]$ is based on adapting parameters of the noise prediction model $\tilde{P}_d[n,k,i]$ using a steepest descent method $\theta_{n,k}=\theta_{n-1,k}-\mu\nabla_\theta J$ with a fixed step size $\mu$.
Description
DETAILED DESCRIPTION
(11) Embodiments of the present invention are directed to an improved approach to blind estimation of the noise power in an input sound signal $y[n]$ characterized as an additive mixture of an information bearing target signal $s[n]$ (e.g., speech) and a non-information bearing disturbing (noise) signal $d[n]$: $y[n]=s[n]+d[n]$, where $n$ is the time-index, referred to as the time frame. In particular, the problem of detecting time frames when the target signal $s[n]$ is absent is addressed. In those time frames, an estimate for the noise power can be updated by using the (observable) input sound signal $y[n]$, since then $y[n]=d[n]$. The noise power estimate is recursively reused to update the prediction for the next estimation step. This approach differs from existing methods such as described in U.S. Pat. No. 8,385,572 in that no signal model is directly used in a noise power estimation algorithm.
(12) Estimating the noise power can be useful for a number of signal processing applications in a hearing implant system. These applications include:
Noise reduction: Sub-band signals with a poor signal-to-noise ratio (SNR) in a given time frame can be attenuated to improve the SNR, so that users potentially enjoy better speech perception in noise.
Cochlear implant (CI) signal coding: Selecting only electrode channels with a high SNR or low noise power for stimulation can offer an improved hearing experience.
Power saving strategies: During noise-only situations, the stimulation pattern can be changed to save power, e.g., by reducing the stimulation rate and/or amplitude.
(14) In such systems, the Noise Power Estimation Module 306 splits the estimation of the unknown noise power into three main steps:
1. Prediction: First, the noise power is predicted for the current point in time using a model of the underlying noise process. Based on the prediction, a decision is made as to the presence or absence of speech.
2. Estimation: Using the speech presence decision, the current noise power estimate is updated.
3. Adaptation: The updated noise power estimate is used to update the noise prediction model that predicts the noise power for the next step.
It is assumed that the estimate will be closer to the true value of the noise power than the predicted value is. The increase in information about the unknown noise power after the estimation step is used to improve the noise model. Thus the prediction for the next step is improved, enabling a more accurate decision regarding speech presence or absence.
(15) Prediction and estimation can be performed several times for the same time point $n$, so that the Noise Power Estimation Module 306 processes the band pass signals $y_k[n]$ in a sequence of sampling time frames $n$ and iterative steps $i=1,\ldots,I$ to produce a noise power estimate $\hat{P}_d[n,k,I]$. For each time frame $n$ and iteration $i$, the Noise Power Estimation Module 306 uses a noise prediction model $\tilde{P}_d[n,k,i]$ to determine if a currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$. If the currently observed signal sample $P_y[n,k]$ includes the target signal $s[n]$, then a current noise power estimate $\hat{P}_d[n,k,i]$ is updated without using the currently observed signal sample $P_y[n,k]$. Otherwise, if the currently observed signal sample $P_y[n,k]$ does not include the target signal $s[n]$, then the current noise power estimate $\hat{P}_d[n,k,i]$ is updated using the currently observed signal sample $P_y[n,k]$. The noise prediction model $\tilde{P}_d[n,k,i]$ is also adapted based on the updated noise power estimate $\hat{P}_d[n,k,i]$. Performing multiple iterative steps increases the probability of a correct decision regarding speech presence or absence, and thus leads to a more accurate noise power estimate $\hat{P}_d[n,k,I]$.
(16) The observed target signal $s[n]$ and noise signal $d[n]$ are assumed to be realizations of locally stationary stochastic processes in which the statistics of the processes (e.g., represented by statistical moments such as mean and variance) are allowed to change slowly over time. For example, the signal powers are time-variant, but remain more or less constant within a short time window. The time window within which the noise process can be regarded as being stationary (i.e., the moments don't change) is assumed to be longer than that of the target (speech) process. In addition, it is assumed that the noise and speech processes are statistically independent with zero mean. Under this second assumption the cross term vanishes, $E\{2sd\}=2E\{s\}E\{d\}=0$, so the signal power is $P_y=E\{(s+d)^2\}=E\{s^2\}+E\{d^2\}=P_s+P_d$, i.e., simply the sum of the speech power and the noise power, where $E\{\cdot\}$ denotes statistical expectation.
(17) Typically, the input sound signal $y[n]$ is decomposed into a number of sub-bands using, e.g., a filter bank (time domain, DFT, other subspaces, . . . ): $y_k[n]=\mathrm{FB}(y[n])$, $k=1,\ldots,K$. The processing is typically performed per time and sub-band. If not needed, time and sub-band indices are suppressed in the following. Since the expectation operation cannot be performed in a real implementation, it is typically approximated using an average over time, e.g., by using a low pass filter. The estimated signal power is then $P_y=\overline{(s+d)^2}=\overline{s^2}+\overline{d^2}=P_s+P_d$, where $\overline{(\cdot)}$ denotes averaging over time. Either the squared signal as stated above or, equivalently, the squared envelope is used. For speech processing applications the low pass filter typically has a 6 dB cut-off frequency of approximately 5-50 Hz, which comprises the speech modulations. After low pass filtering, a sampling rate decimation to a significantly lower sampling rate (e.g., 80-100 Hz) can be applied in order to reduce the computational complexity of the following stages.
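By way of illustration only, the time averaging described above can be sketched as a first-order recursive low pass filter applied to the squared sub-band signal, followed by decimation. The function name `smoothed_power` and the parameters `smoothing` and `decimation` are hypothetical and are not taken from the description; the description does not prescribe a particular filter structure.

```python
def smoothed_power(samples, smoothing=0.99, decimation=1):
    """Approximate P_y = E{y^2} by low pass filtering the squared signal.

    `smoothing` and `decimation` are illustrative parameters only.
    """
    power = 0.0
    out = []
    for index, y in enumerate(samples):
        # First-order recursive average of the squared signal.
        power = smoothing * power + (1.0 - smoothing) * y * y
        # Keep only every `decimation`-th value to lower the frame rate.
        if (index + 1) % decimation == 0:
            out.append(power)
    return out


# A constant-amplitude input of 2.0 approaches a power of 4.0.
frames = smoothed_power([2.0] * 60, smoothing=0.5, decimation=10)
```

A smoothing value closer to 1 corresponds to a lower cut-off frequency of the averaging filter.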
(19) More specifically, the hypothesis test at iteration $i$ is a simple comparison of the current sample $P_y$ against a variable threshold $\lambda$:
(20) $P_y[n,k]\le\lambda[n,k,i]$: $P_y[n,k]$ consists of noise only (null-hypothesis $H_0$)
(21) $P_y[n,k]>\lambda[n,k,i]$: $P_y[n,k]$ consists of noise and speech (hypothesis $H_1$).
(22) The noise power estimate $\hat{P}_d[n,k,i]$ is then constructed based on the hypothesis-test decision. Recursive smoothing over time $n$ and/or sub-band $k$ may also be applied, by which the correlation of the noise power over time and/or sub-bands is taken into account. If the hypothesis test indicates that the speech signal $s[n]$ is absent (null-hypothesis $H_0$), then the noise power estimate $\hat{P}_d[n,k,i]$ is updated using the current signal sample $P_y[n,k]$ and the estimated noise power from time point $n-1$ and the last iteration step $I$, $\hat{P}_d[n-1,k,I]$:
$$\hat{P}_{d,sa}[n,k,i]=\alpha\,\hat{P}_d[n-1,k,I]+(1-\alpha)\,P_y[n,k],$$
where $\alpha$ is a smoothing parameter. Using a hard threshold decision, the noise power estimate is then:
$$P_y[n,k]\le\lambda[n,k,i]:\quad \hat{P}_d[n,k,i]=\hat{P}_{d,sa}[n,k,i].$$
If the null-hypothesis is rejected (speech is present), the noise power estimate $\hat{P}_d[n,k,i]$ is kept constant, i.e.,
$$\hat{P}_{d,sp}[n,k,i]=\hat{P}_d[n-1,k,I].$$
The update of the noise power estimate is then
$$P_y[n,k]>\lambda[n,k,i]:\quad \hat{P}_d[n,k,i]=\hat{P}_{d,sp}[n,k,i].$$
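The hard-decision update for a single sub-band and iteration can be sketched as follows. This is a minimal sketch only; the function name `hard_update` and the default value of `alpha` are hypothetical choices made for illustration.

```python
def hard_update(p_y, threshold, prev_estimate, alpha=0.9):
    """Return (noise_estimate, speech_absent) for one sub-band and iteration.

    p_y           -- currently observed signal power P_y[n, k]
    threshold     -- variable threshold for this iteration
    prev_estimate -- noise power estimate of frame n-1, last iteration I
    alpha         -- smoothing parameter (illustrative default)
    """
    speech_absent = p_y <= threshold  # null hypothesis H0: noise only
    if speech_absent:
        # Recursive smoothing that uses the observed sample.
        return alpha * prev_estimate + (1.0 - alpha) * p_y, True
    # Speech present: the previous estimate is kept constant.
    return prev_estimate, False
```

For example, `hard_update(2.0, 3.0, 1.0, alpha=0.5)` takes the speech-absent branch and moves the estimate halfway from 1.0 towards the observed 2.0.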
(23) Alternatively, in the case of speech present, the noise power estimate $\hat{P}_d[n,k,i]$ can be updated using additionally a weighted sum of neighbouring noise power estimates,
(24) $$\hat{P}_{d,sp}[n,k,i]=(1-\beta)\,\hat{P}_d[n-1,k,I]+\beta\,\bar{P}_d[n-1,k,I],\qquad \bar{P}_d[n-1,k,I]=\sum_{l\neq k}^{K}w_{l,k}\,\hat{P}_d[n-1,l,I],$$
with suitably chosen weights $w_{l,k}$, e.g.,
$$w_{l,k}=a\exp(-b\,|l-k|^m)$$
and suitably chosen parameters $a$, $b$, $m$. With this weighting, distant sub-bands contribute less than neighbouring sub-bands, reflecting, e.g., a decrease of the correlation if the distance in frequency increases. The weights $w_{l,k}$ and/or the parameters $a$, $b$, $m$ can also be estimated and updated continuously using already existing noise power estimates from time frames before $n$ or from time frame $n$ and previous iterations $i$. The smoothing parameters $\alpha$ (in case speech absent) and $\beta$ (speech present) determine the degree of influence of the noise power estimate from time frame $n-1$ and model in a simple manner the correlation of the noise power over time.
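As a non-limiting sketch, the speech-present update with exponentially decaying cross-band weights might be computed as shown below. The function names are hypothetical. Normalising the weights so that they sum to one is an assumption made here to keep the weighted sum on the scale of a noise power; the description leaves the scaling to the choice of the parameter a.

```python
import math


def neighbour_weights(k, num_bands, a=1.0, b=1.0, m=1.0):
    """Weights a*exp(-b*|l-k|**m) for all l != k, normalised to sum to 1.

    The normalisation is an illustrative assumption, not part of the text.
    """
    raw = {
        l: a * math.exp(-b * abs(l - k) ** m)
        for l in range(num_bands)
        if l != k
    }
    total = sum(raw.values())
    return {l: w / total for l, w in raw.items()}


def speech_present_update(prev_estimates, k, beta=0.2, **weight_params):
    """Blend band k's previous estimate with a weighted sum of the other bands."""
    weights = neighbour_weights(k, len(prev_estimates), **weight_params)
    neighbour_sum = sum(w * prev_estimates[l] for l, w in weights.items())
    return (1.0 - beta) * prev_estimates[k] + beta * neighbour_sum
```

With positive b, bands further away from k receive smaller weights, which mirrors the decreasing correlation with distance in frequency.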
(25) Instead of a hard threshold decision as described above, a soft threshold decision could be used and might be advantageous, since errors regarding the decision of speech absence or presence would have less weight. The output of the comparison with the threshold is defined as the speech absence probability. A decision
$$p[n,k,i]=g(\lambda[n,k,i],P_y[n,k]),$$
with a suitable function $g(\cdot)$ providing (soft) values for the speech-absence probability in the interval $[0,1]$ can be used, e.g., a sigmoidal function
(26)
and $\sigma_k$ determining the steepness of the function. A hard decision is achieved for the limit case $\sigma_k\to\infty$. Using the speech absence probability $p[n,k,i]$, the noise power estimate at iteration $i$, time frame $n$, and sub-band $k$ is then
$$\hat{P}_d[n,k,i]=p[n,k,i]\,\hat{P}_{d,sa}[n,k,i]+(1-p[n,k,i])\,\hat{P}_{d,sp}[n,k,i],$$
with the speech-presence probability $1-p[n,k,i]$. For the first simple case described above, the noise power estimate is then
(27) $$\hat{P}_d[n,k,i]=(1-\tilde{p}[n,k,i])\,\hat{P}_d[n-1,k,I]+\tilde{p}[n,k,i]\,P_y[n,k],$$
with a scaled speech-absence probability $\tilde{p}[n,k,i]=p[n,k,i](1-\alpha)$.
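A soft-decision update can be sketched as follows. The logistic form used for the speech absence probability is an assumption chosen for illustration, since only a sigmoidal shape with a steepness parameter is specified; the function names and default values are likewise hypothetical.

```python
import math


def speech_absence_probability(p_y, threshold, steepness=4.0):
    """Assumed logistic form: near 1 well below the threshold, near 0 above."""
    return 1.0 / (1.0 + math.exp(steepness * (p_y - threshold)))


def soft_update(p_y, threshold, prev_estimate, alpha=0.9, steepness=4.0):
    """Mix the speech-absent and speech-present updates by probability."""
    p = speech_absence_probability(p_y, threshold, steepness)
    est_absent = alpha * prev_estimate + (1.0 - alpha) * p_y  # uses the sample
    est_present = prev_estimate                               # held constant
    return p * est_absent + (1.0 - p) * est_present


# The mixture equals the scaled-probability form with p_tilde = p * (1 - alpha).
p = speech_absence_probability(0.5, 1.0)
p_tilde = p * (1.0 - 0.9)
scaled_form = (1.0 - p_tilde) * 2.0 + p_tilde * 0.5
```

Increasing `steepness` makes the behaviour approach the hard decision, in which the sample is either fully used or not used at all.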
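(28) The threshold $\lambda$ can be derived using a stochastic signal model that treats the involved signals $P_y$, $P_s$, $P_d$ as stochastic processes, using a likelihood ratio test-statistic (Neyman, J., Pearson, E., On the problem of the most efficient tests of statistical hypotheses, Philosophical Transactions of the Royal Society of London, Series A, Containing Papers of a Mathematical or Physical Character 231, pp. 289-337, 1933; incorporated herein by reference in its entirety):
(29) $$\Lambda(P_y)=\frac{f_{P_y\mid H_1}(P_y\mid H_1)}{f_{P_y\mid H_0}(P_y\mid H_0)}\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \lambda,$$
where $f_{P_y\mid H_0}$ and $f_{P_y\mid H_1}$ denote the probability densities of $P_y$ under the hypotheses $H_0$ and $H_1$. The false-alarm probability is
$$p_{FA}=p[\Lambda(P_y)>\lambda\mid H_0]=\int_{\{P_y:\,\Lambda(P_y)>\lambda\}}f_{P_y\mid H_0}(P_y\mid H_0)\,dP_y.$$
With this equation, the threshold $\lambda$ for a given false-alarm probability can be determined.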
(30) The threshold $\lambda$ is a function of the unknown noise power $P_d$ since $P_y=P_s+P_d$. In order to be able to calculate a threshold, a prediction $\tilde{P}_d[n]$ of the unknown noise power for time $n$, as discussed below, is used. This yields for the threshold $\lambda[n]=h(p_{FA},\tilde{P}_d[n])$, where the function $h(\cdot)$ depends on the assumed probability density $f_{P_y\mid H_0}$.
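As a worked example under an explicitly assumed density, suppose that under the noise-only hypothesis the observed power is exponentially distributed with mean equal to the predicted noise power. This distribution is an assumption made only for illustration and is not prescribed by the description. The probability of exceeding a threshold is then exp(-threshold / predicted noise power), and setting this equal to the false-alarm probability gives a closed-form threshold. The function name is hypothetical.

```python
import math


def threshold_from_false_alarm(p_fa, predicted_noise_power):
    """Threshold such that P(P_y > threshold | H0) equals p_fa.

    Assumes P_y is exponentially distributed with mean
    `predicted_noise_power` when only noise is present.
    """
    if not 0.0 < p_fa < 1.0:
        raise ValueError("p_fa must lie strictly between 0 and 1")
    return -predicted_noise_power * math.log(p_fa)


# For a 5 % false-alarm probability the threshold is roughly three times
# the predicted noise power, since -ln(0.05) is about 3.
lam = threshold_from_false_alarm(0.05, 2.0)
```

Under this assumption the threshold scales linearly with the predicted noise power, so an underestimated prediction lowers the threshold and makes a false speech-presence decision more likely.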
(31) The key for an accurate estimation of the noise power is a correct decision whether the currently observed sample $P_y[n]$ results from speech and noise or noise only. This decision is based on the threshold calculation and depends on the targeted false-alarm probability and the noise power. Since the noise power is unknown and is itself the aim of the process, the threshold cannot be calculated directly. Instead, a predicted value for the unknown noise power from a time-variant noise model can be used, based on previous noise power estimates $\hat{P}_d[n-1,k,I]$, $\hat{P}_d[n-2,k,I]$, . . . as well as estimates produced at previous iteration steps, i.e., $\hat{P}_d[n,k,i-1]$, $\hat{P}_d[n,k,i-2]$, . . . . A prediction for the noise power for the current iteration step then can be made by using, e.g., an autoregressive model of first order (AR-1):
$$\tilde{P}_d[n,k,i]=f(\theta,\hat{P}_d[n-1,k,I],\hat{P}_d[n,k,i-1]),$$
where $\theta=[\theta_1,\theta_2,\ldots,\theta_M]^T$ are the model parameters. In some specific embodiments, estimates from neighbouring sub-bands can be used in the prediction model, too:
$$\tilde{P}_d[n,k,i]=f(\theta,\hat{P}_d[n-1,k,I],\hat{P}_d[n,k,i-1],\hat{P}_d[n-1,l\neq k,I],\hat{P}_d[n,l\neq k,i-1]).$$
(32) The prediction model parameters for the noise power are adapted to increase the accuracy of succeeding predictions. This is done by using the final estimate for the noise power at time $n$ and iteration-end $I$, $\hat{P}_d[n,k,I]$, and the prediction $\tilde{P}_d[n,k,I]$. Specifically, the difference between the two gives information about the mismatch between the model and the actual noise process, and is used for adapting the model parameters. Since the model is adapted, the parameters change over time, i.e., the (linear or nonlinear) model itself changes over time. The adaptation rule as described further below defines how the parameters are adapted to the current situation.
(33) For predicting the noise power, various different specific models can be used; for example, a linear AR-11 model in which the predicted noise power is a linear combination of the estimated noise power of the previous iteration and two directly neighbouring sub-bands:
(34) $$\tilde{P}_d[n,k,i]=\sum_{l=k-1}^{k+1}\theta_{l-k}[n,k]\,\hat{P}_d[n,l,i-1],$$
whereby for $i=1$, $\hat{P}_d[n,k,0]=\hat{P}_d[n-1,k,I]$, i.e., the estimate from the previous time frame $n-1$. Or a linear AR-ML model could be employed where the predicted noise power is a linear combination of $M$ already estimated noise powers and the estimated noise power of the last iteration, as well as $2L$ neighbouring noise power estimates:
(35) $$\tilde{P}_d[n,k,i]=\sum_{l=k-1}^{k+1}\theta_{0l}[n,k]\,\hat{P}_d[n,l,i-1]+\sum_{m=1}^{M}\sum_{l=k-L}^{k+L}\theta_{ml}[n,k]\,\hat{P}_d[n-m,l,I],$$
where for $i=1$, $\hat{P}_d[n,k,0]=0$. Or a nonlinear model could be used where the predicted noise power is a nonlinear function with respect to the estimated noise powers, in which case, many different alternatives can be implemented, such as a recursive polynomial model.
(36) For a linear-in-the-parameters prediction model, the model parameters can be condensed into a vector and the prediction is written as $\tilde{P}_d[n,k,i]=\varphi_{n,k,i}^T\theta_{n,k}$. For a linear AR-11 model:
$$\varphi_{n,k,i}^T=[\hat{P}_d[n,k-1,i-1],\ \hat{P}_d[n,k,i-1],\ \hat{P}_d[n,k+1,i-1]]$$
and:
$$\theta_{n,k}^T=[\theta_{-1}[n,k],\ \theta_0[n,k],\ \theta_{+1}[n,k]].$$
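The linear AR-11 prediction reduces to an inner product of a three-element regressor with a three-element parameter vector, which can be sketched as follows. The function name `predict_ar11` is hypothetical, and the handling of the lowest and highest sub-band (reusing the band's own value in place of the missing neighbour) is an assumption, since edge handling is not addressed in the description.

```python
def predict_ar11(estimates, k, theta):
    """Predict the noise power in band k from bands k-1, k and k+1.

    estimates -- noise power estimates of the previous iteration, one per band
    theta     -- three model parameters for the lower, centre and upper band
    Returns the prediction and the regressor that produced it.
    """
    last = len(estimates) - 1
    # Edge bands reuse their own value for the missing neighbour (assumption).
    phi = [estimates[max(k - 1, 0)], estimates[k], estimates[min(k + 1, last)]]
    prediction = sum(t * x for t, x in zip(theta, phi))
    return prediction, phi


prediction, phi = predict_ar11([1.0, 2.0, 4.0], 1, [0.25, 0.5, 0.25])
```

Returning the regressor together with the prediction is a convenience for a subsequent parameter update, which needs the same vector.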
(38) Two cases, reflecting two situations prone for a false decision for speech presence or absence can be briefly considered. In a case where there is a rising noise power and speech is absent, then it is likely that it might be decided for speech presence due to the increasing signal power. If at time frame n, sub-band k, iteration i=1 it was erroneously decided for speech presence, the noise power estimate is not updated and will not follow the increasing noise power, i.e., it will be too small. If in the neighbouring sub-bands k1, k+1 the decision is correct, the estimates for the noise power are updated correctly and increase. In the next iteration step, the prediction for the noise power in sub-band k is based on the updated noise power estimates in the neighbouring sub-bands and will increase also, assuming the noise model is sufficiently accurate. The probability for a correct speech presence or absence decision at this iteration step is increased now since the noise power prediction will be more accurate, and thus the decision for speech absence more likely, resulting in a larger probability for an update of the noise power estimate.
(39) In a different case where there is a falling noise power and speech is present, it is likely that it might be decided for speech absent due to the decreasing signal level. That is, it might happen that at time frame n, sub-band k, iteration i=1, it is decided for speech absent. The noise power then will be updated erroneously. Assuming correct decisions and updates in the neighbouring sub-bands, i.e., decreasing noise power estimates there, at iteration i=2 it might be decided for speech presence, leading to a correct update of the noise power.
(40) With this method, the speech absence probability is iteratively calculated, and, due to the correlation across sub-bands, it is assumed that a false decision at one iteration step is corrected in one of the following steps.
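The iterative correction described above can be illustrated with a minimal Python sketch of one time frame. Several details are assumptions made only for this illustration and are not taken from this description: the threshold is a fixed multiple (`threshold_factor`) of the predicted noise power, a speech-present decision keeps the model prediction as the estimate, a speech-absent decision smooths the observed power into the prediction with a constant `alpha`, the iteration starts from the previous frame's estimates, and edge sub-bands reuse their own value for the missing neighbour. All names are hypothetical.

```python
def iterate_frame(p_y, p_d_start, theta, n_iter=2,
                  threshold_factor=2.0, alpha=0.5):
    """Run n_iter decision/update iterations over all sub-bands of one frame.

    p_y:       observed signal powers, one per sub-band
    p_d_start: starting noise power estimates (assumed: previous frame)
    theta:     one parameter triple per sub-band
    Returns the final estimates and the per-iteration speech decisions.
    """
    n_bands = len(p_y)
    estimates = list(p_d_start)
    decisions = []
    for _ in range(n_iter):
        previous = list(estimates)  # estimates of iteration i-1
        speech_flags = []
        for k in range(n_bands):
            phi = (previous[max(k - 1, 0)],
                   previous[k],
                   previous[min(k + 1, n_bands - 1)])
            prediction = sum(p * t for p, t in zip(phi, theta[k]))
            # Hard decision: compare the observation to a threshold
            # derived from the predicted noise power (assumed form).
            speech_present = p_y[k] > threshold_factor * prediction
            if speech_present:
                # Update without the observed sample.
                estimates[k] = prediction
            else:
                # Update using the observed sample (assumed smoothing).
                estimates[k] = alpha * prediction + (1.0 - alpha) * p_y[k]
            speech_flags.append(speech_present)
        decisions.append(speech_flags)
    return estimates, decisions


if __name__ == "__main__":
    # Rising noise, no speech: the middle band is flagged as speech at
    # iteration 1 and re-classified as noise at iteration 2.
    est, dec = iterate_frame([1.8, 2.3, 1.8], [1.0, 1.0, 1.0],
                             [(0.25, 0.5, 0.25)] * 3)
    print(dec)
    print(est)
```

In the usage lines, the middle sub-band is wrongly classified as speech in the first iteration. Once its neighbours have raised their estimates, the prediction for the middle band rises as well and the second iteration decides for speech absence, so the estimate is updated with the observed power.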
(41)
$$e[n,k,I] = \hat{P}_d[n,k,I] - \tilde{P}_d[n,k,I].$$
The prediction model parameters can then be adapted, e.g., using a steepest descent method
$$\boldsymbol{\theta}_{n,k} = \boldsymbol{\theta}_{n-1,k} - \mu \nabla_{\boldsymbol{\theta}} J$$
with a fixed (or time-variant) step size $\mu$ determining the adaptation accuracy and tracking speed. Typically, since the expectation $E\{\cdot\}$ cannot be calculated due to lack of knowledge of the statistics of the prediction error, a stochastic gradient descent method can be used, e.g., the least mean square (LMS) method
$$\boldsymbol{\theta}_{n,k} = \boldsymbol{\theta}_{n-1,k} - \mu \nabla_{\boldsymbol{\theta}}\, e[n,k,I]^2 = \boldsymbol{\theta}_{n-1,k} + \mu\, \boldsymbol{\varphi}_{n,k,I}\, e[n,k,I],$$
where the constant factor 2 from the gradient is absorbed into $\mu$.
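A single LMS adaptation step of this kind may be sketched in Python as follows. The function name `lms_update`, the argument names, and the step-size value in the usage lines are hypothetical choices for illustration only.

```python
def lms_update(theta, phi, p_d_hat, mu):
    """One LMS step for the prediction model parameters of one sub-band.

    theta:   current parameter vector (three values for a model that
             uses the band and its two neighbours)
    phi:     vector of noise power estimates entering the prediction
    p_d_hat: noise power estimate after the final iteration
    mu:      step size (the factor 2 of the gradient is absorbed here)
    Returns the adapted parameters and the prediction error.
    """
    prediction = sum(t * p for t, p in zip(theta, phi))
    # Prediction error: estimate minus model prediction.
    error = p_d_hat - prediction
    # Move the parameters against the gradient of the squared error.
    new_theta = [t + mu * p * error for t, p in zip(theta, phi)]
    return new_theta, error


if __name__ == "__main__":
    theta, err = lms_update([0.0, 1.0, 0.0], [1.0, 1.0, 1.0], 2.0, 0.25)
    print(theta, err)
```

In the usage lines, the model initially predicts 1.0 for an estimate of 2.0. After one step the same inputs would be predicted as 1.75, so the error shrinks, and a larger `mu` would track faster at the cost of a noisier parameter estimate.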
(42) Advantageously, the adaptation considers only cases where the probability of a good noise power estimate is high, i.e., cases where it is relatively certain that speech is not present, since the noise power was then estimated accurately with high probability. For an AR-11 prediction model with
$$\boldsymbol{\varphi}_{n,k,I}^T = \left[\hat{P}_d[n,k-1,I],\ \hat{P}_d[n,k,I],\ \hat{P}_d[n,k+1,I]\right]$$
the fixed step size turns into a 3×3 diagonal time-variant step-size matrix,
(43)
incorporating the speech-absent probabilities. With this matrix step size, the update equation reads
$$\boldsymbol{\theta}_{n,k} = \boldsymbol{\theta}_{n-1,k} + Q_{n,k,I}\, \boldsymbol{\varphi}_{n,k,I}\, e[n,k,I],$$
thus restricting model adaptation more or less to speech-absent periods.
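The effect of a diagonal step-size matrix can be sketched in Python as below. The entries of the matrix are not reproduced in this text, so the sketch assumes, purely for illustration, that each diagonal entry is a base step size `mu` multiplied by the speech-absence probability of the sub-band that supplies the corresponding vector element. The function and argument names are hypothetical.

```python
def gated_lms_update(theta, phi, error, absence_probs, mu):
    """LMS step with a diagonal, probability-weighted step-size matrix.

    theta:         current parameter vector (three values)
    phi:           noise power estimates of bands k-1, k, k+1
    error:         prediction error after the final iteration
    absence_probs: speech-absence probabilities of bands k-1, k, k+1
                   (assumed to form the diagonal together with mu)
    mu:            base step size
    """
    # Diagonal of the assumed step-size matrix; a diagonal matrix times a
    # vector reduces to an element-wise product.
    q_diag = [mu * q for q in absence_probs]
    return [t + q * p * error for t, q, p in zip(theta, q_diag, phi)]


if __name__ == "__main__":
    print(gated_lms_update([0.25, 0.5, 0.25], [1.0, 2.0, 1.0],
                           1.0, [1.0, 0.0, 0.5], 0.5))
```

In the usage lines, the parameter tied to a sub-band with speech-absence probability 0 is left unchanged, while the other two parameters move in proportion to their probabilities. Under the stated assumption, this is how adaptation is restricted to speech-absent periods.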
(44) Adaptation and iteration can be interleaved in at least two possible ways.
(45)
(46) Due to the recursive approach described above, changing noise power over time can be tracked with only a short delay. Due to the adaptation of the prediction model, the system is also able to adapt to various acoustical situations, and in particular to various noise types. In addition, this approach is of relatively low arithmetic complexity compared to existing arrangements. Of course, due to the recursive approach, a system might become unstable for some unfavourable combinations of parameters and input signal.
(47) Embodiments of the invention may be implemented in part by any conventional computer programming language. For example, preferred embodiments may be implemented in a procedural programming language (e.g., C) or an object oriented programming language (e.g., C++, Python). Alternative embodiments of the invention may be implemented as pre-programmed hardware elements, other related components, or as a combination of hardware and software components.
(48) Embodiments can be implemented in part as a computer program product for use with a computer system. Such implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk) or transmittable to a computer system, via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as a removable medium with accompanying printed or electronic documentation (e.g., shrink wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the network (e.g., the Internet or World Wide Web). Of course, some embodiments of the invention may be implemented as a combination of both software (e.g., a computer program product) and hardware. Still other embodiments of the invention are implemented as entirely hardware, or entirely software (e.g., a computer program product).
(49) Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention.