KALMAN FILTERING BASED SPEECH ENHANCEMENT USING A CODEBOOK BASED APPROACH
20170265010 · 2017-09-14
Assignee
Inventors
- Mathew Shaji Kavalekalam (Ballerup, DK)
- Mads Graesboll Christensen (Ballerup, DK)
- Fredrik Gran (Ballerup, DK)
- Jesper B. Boldt (Ballerup, DK)
Cpc classification
H04R2201/107
ELECTRICITY
International classification
Abstract
A hearing device for enhancing speech intelligibility, the hearing device includes: an input transducer for providing an input signal comprising a speech signal and a noise signal; a processing unit; an acoustic output transducer coupled to the processing unit, the acoustic output transducer configured to provide an audio output signal based on an output signal from the processing unit; wherein the processing unit is configured to determine one or more parameters of the input signal based on a codebook based approach (CBA) processing; and wherein the processing unit is configured to perform a Kalman filtering of the input signal based on the determined one or more parameters so that the output signal has an enhanced speech intelligibility.
Claims
1. A hearing device for enhancing speech intelligibility, the hearing device comprising: an input transducer for providing an input signal comprising a speech signal and a noise signal; a processing unit; an acoustic output transducer coupled to the processing unit, the acoustic output transducer configured to provide an audio output signal based on an output signal from the processing unit; wherein the processing unit is configured to determine one or more parameters of the input signal based on a codebook based approach (CBA) processing; and wherein the processing unit is configured to perform a Kalman filtering of the input signal based on the determined one or more parameters so that the output signal has an enhanced speech intelligibility.
2. The hearing device according to claim 1, wherein the input signal is divided into one or more frames, the one or more frames comprising primary frames representing speech signals, secondary frames representing noise signals, tertiary frames representing silence, or any combination of the foregoing.
3. The hearing device according to claim 1, wherein the one or more parameters comprise short term predictor (STP) parameters.
4. The hearing device according to claim 1, wherein the one or more parameters comprise one or a combination of: a first parameter being a state evolution matrix C(n) comprising of speech Linear Prediction Coefficients (LPC) and noise Linear Prediction Coefficients (LPC), a second parameter being a variance of a speech excitation signal σ.sub.u.sup.2(n), and a third parameter being a variance of a noise excitation signal σ.sub.v.sup.2(n).
5. The hearing device according to claim 1, wherein the one or more parameters are assumed to be constant over frames of 25 milliseconds.
6. The hearing device according to claim 1, wherein the processing unit is configured to determine the one or more parameters based on a priori information about speech spectral shapes and/or noise spectral shapes stored in a codebook in a form of Linear Prediction Coefficients (LPC).
7. The hearing device according to claim 1, wherein the codebook based approach (CBA) processing involves a generic speech codebook or a speaker specific trained codebook.
8. The hearing device according to claim 1, wherein the codebook based approach (CBA) processing involves a speaker specific trained codebook, and wherein the speaker specific trained codebook comprises data based on recording speech of multiple persons.
9. The hearing device according to claim 1, wherein the processing unit is configured to automatically select a codebook for the codebook based approach (CBA) processing from a plurality of available codebooks, and wherein the processing unit is configured to automatically select the codebook based on a spectra of the input signal and/or based on a measurement of short term objective intelligibility (STOI) for each of the available codebooks.
10. The hearing device according to claim 1, wherein the processing unit is configured to perform the Kalman filtering using a fixed lag Kalman smoother that is configured to provide a minimum mean-square estimator (MMSE) of the speech signal.
11. The hearing device according to claim 1, wherein the processing unit is configured to perform the Kalman filtering of the input signal by computing an a priori estimate and an a posteriori estimate of a state vector, and an error covariance matrix of the input signal.
12. The hearing device according to claim 1, wherein the processing unit is configured to perform a weighted summation of short term predictor (STP) parameters of the speech signal in a line spectral frequency (LSF) domain.
13. The hearing device according to claim 1, wherein the hearing device is a first hearing device configured to communicate with a second hearing device in a binaural hearing device system configured to be worn by a user.
14. The hearing device according to claim 13, wherein the input transducer comprises a first input transducer, the input signal comprises a left ear input signal, and wherein the first hearing device comprises the first input transducer for providing the left ear input signal; wherein the second hearing device comprises a second input transducer for providing a right ear input signal comprising a right ear speech signal and a right ear noise signal; wherein the processing unit comprises a first processing unit, the one or more parameters of the input signal comprises one or more left parameters of the left ear input signal, and wherein the first hearing device comprises the first processing unit configured for determining the one or more left parameters of the left ear input signal based on the codebook based approach (CBA) processing; and wherein the second hearing device comprises a second processing unit configured for determining one or more right parameters of the right ear input signal.
15. A method for enhancing speech intelligibility in a hearing device, the method comprising: providing an input signal comprising a speech signal and a noise signal; determining, using a processing unit, one or more parameters of the input signal based on a codebook based approach (CBA) processing; performing, using the processing unit, a Kalman filtering of the input signal based on the determined one or more parameters to generate an output signal that has an enhanced speech intelligibility; and providing an audio output signal by an acoustic output transducer based on the output signal.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0052] The above and other features and advantages will become readily apparent to those skilled in the art by the following detailed description of exemplary embodiments thereof with reference to the attached drawings, in which:
[0053]
[0054]
[0055]
[0056]
[0057]
DETAILED DESCRIPTION
[0058] Various embodiments are described hereinafter with reference to the figures. Like reference numerals refer to like elements throughout. Like elements will, thus, not be described in detail with respect to the description of each figure. It should also be noted that the figures are only intended to facilitate the description of the embodiments. They are not intended as an exhaustive description of the claimed invention or as a limitation on the scope of the claimed invention. In addition, an illustrated embodiment need not have all the aspects or advantages shown. An aspect or an advantage described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced in any other embodiments even if not so illustrated, or if not so explicitly described.
[0059] Throughout, the same reference numerals are used for identical or corresponding parts.
[0060]
[0061] The hearing device 2 comprises an input transducer 4, such as a microphone, for providing an input signal z(n) or noisy signal z(n) comprising a speech signal s(n) and a noise signal w(n).
[0062] The hearing device 2 comprises a processing unit 6 configured for processing the input signal z(n).
[0063] The hearing device 2 comprises an acoustic output transducer 8, such as a receiver or loudspeaker, coupled to an output of the processing unit 6 for conversion of an output signal from the processing unit 6 into an audio output signal.
[0064] The processing unit 6 is configured for performing a codebook based approach processing on the input signal z(n).
[0065] The processing unit 6 is configured for determining one or more parameters of the input signal z(n) based on the codebook based approach processing.
[0066] The processing unit 6 is configured for performing a Kalman filtering of the input signal z(n) using the determined one or more parameters.
[0067] The processing unit 6 is configured such that, as a result of the Kalman filtering, the output signal has enhanced speech intelligibility.
[0068] The present hearing device and method relate to a speech enhancement framework based on a Kalman filter. The Kalman filtering for speech enhancement may be for white background noise, or for coloured noise where the speech and noise short term predictor (STP) parameters required for the functioning of the Kalman filter are estimated using an approximated estimate-maximize algorithm. The present hearing device and method use a codebook-based approach for estimating the speech and noise short term predictor (STP) parameters. Objective measures such as short term objective intelligibility (STOI) and Segmental SNR (SegSNR) have been used to evaluate the performance of the enhancement algorithm in the presence of babble noise. The effects of having a speaker specific trained codebook over a generic speech codebook on the performance of the algorithm have also been investigated. In the following, the signal model and the assumptions that are used will be explained. The speech enhancement framework will then be explained in detail, and experiments and results will be presented.
[0069] The signal model and assumptions that will be used are now presented. It is assumed that a speech signal s(n), also called the clean speech signal, is additively corrupted by a noise signal w(n) to form the input signal z(n), also called the noisy signal, according to the equation:
$$z(n) = s(n) + w(n) \quad \forall n = 1, 2, \ldots \tag{1}$$
[0070] It may also be assumed that the noise and speech are statistically independent or uncorrelated with each other. The clean speech signal s(n) may be modelled as a stochastic autoregressive (AR) process represented by the equation:
$$s(n) = a(n)^{T} s(n-1) + u(n), \tag{2}$$
where
$$a(n) = [a_1(n), a_2(n), \ldots, a_P(n)]^{T}$$
is a vector containing the speech Linear Prediction Coefficients (LPC), $s(n-1) = [s(n-1), \ldots, s(n-P)]^{T}$, P is the order of the autoregressive (AR) process corresponding to the speech signal and u(n) is a white Gaussian noise (WGN) with zero mean and excitation variance $\sigma_u^2(n)$.
[0071] The noise signal may also be modelled as an autoregressive (AR) process according to the equation
$$w(n) = b(n)^{T} w(n-1) + v(n), \tag{3}$$
where
$$b(n) = [b_1(n), b_2(n), \ldots, b_Q(n)]^{T}$$
is a vector containing the noise Linear Prediction Coefficients (LPC), $w(n-1) = [w(n-1), \ldots, w(n-Q)]^{T}$, Q is the order of the autoregressive (AR) process corresponding to the noise signal and v(n) is a white Gaussian noise (WGN) with zero mean and excitation variance $\sigma_v^2(n)$. The Linear Prediction Coefficients (LPC) together with the excitation variance generally constitute the short term predictor (STP) parameters.
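The signal model above can be illustrated with a short simulation. The following is a minimal sketch, not part of the described device: it assumes time-invariant coefficients within the simulated segment, zero initial conditions, and hypothetical low-order coefficient values chosen only so that both AR processes are stable. The function name `simulate_ar` is likewise an illustrative choice.

```python
import random

def simulate_ar(coeffs, excitation_var, num_samples, rng):
    """Generate x(n) = sum_k coeffs[k-1] * x(n-k) + e(n), e(n) ~ N(0, excitation_var)."""
    history = [0.0] * len(coeffs)      # x(n-1), ..., x(n-order); zero start is an assumption
    std = excitation_var ** 0.5
    out = []
    for _ in range(num_samples):
        x = sum(c * h for c, h in zip(coeffs, history)) + rng.gauss(0.0, std)
        out.append(x)
        history = [x] + history[:-1]   # newest sample first
    return out

rng = random.Random(0)
a = [1.2, -0.5]   # hypothetical speech LPC (P = 2), poles inside the unit circle
b = [0.6]         # hypothetical noise LPC (Q = 1)
s = simulate_ar(a, 1.0, 400, rng)    # clean speech s(n)
w = simulate_ar(b, 0.25, 400, rng)   # noise w(n)
z = [sn + wn for sn, wn in zip(s, w)]  # noisy input z(n) = s(n) + w(n)
```

In practice P and Q would be larger and the coefficients would change from frame to frame; the sketch only shows how the two independent AR processes combine additively into the noisy input.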
[0072] In the present hearing device and method a single channel speech enhancement technique based on Kalman filtering may be used. A basic block diagram of the speech enhancement framework is shown in
[0073]
[0074] In step 101 the method comprises providing an input signal z(n) comprising a speech signal and a noise signal.
[0075] In step 102 the method comprises performing a codebook based approach processing on the input signal z(n).
[0076] In step 103 the method comprises determining one or more parameters of the input signal z(n) based on the codebook based approach processing in step 102. The parameters may be short term predictor (STP) parameters.
[0077] In step 104 the method comprises performing a Kalman filtering of the input signal z(n) using the determined one or more parameters from step 103.
[0078] In step 105 the method comprises providing an output signal that, as a result of the Kalman filtering in step 104, has enhanced speech intelligibility.
Kalman Filter for Speech Enhancement:
[0079] The Kalman filter enables us to estimate the state of a process governed by a linear stochastic difference equation in a recursive manner. It may be an optimal linear estimator in the sense that it minimises the mean of the squared error. This section explains the principle of a fixed lag Kalman smoother with a smoother delay d ≥ P. The Kalman smoother may provide the minimum mean square error (MMSE) estimate of the speech signal s(n), which can be expressed as
$$\hat{s}(n) = E\big(s(n) \mid z(n+d), \ldots, z(1)\big) \quad \forall n = 1, 2, \ldots \tag{4}$$
[0080] The use of a Kalman filter from a speech enhancement perspective may require the autoregressive (AR) signal model in eq. (2) to be written in state-space form as shown below
$$s(n) = A(n) s(n-1) + \Gamma_1 u(n), \tag{5}$$
where the state vector $s(n) = [s(n)\; s(n-1)\; \ldots\; s(n-d)]^{T}$ is a (d+1)×1 vector containing the d+1 most recent speech samples, $\Gamma_1 = [1, 0, \ldots, 0]^{T}$ is a (d+1)×1 vector and A(n) is the (d+1)×(d+1) speech state evolution matrix as shown below
[0081] Analogously, the autoregressive (AR) model for the noise signal w(n) shown in eq. (3) can be written in state-space form as
$$w(n) = B(n) w(n-1) + \Gamma_2 v(n), \tag{7}$$
where the state vector $w(n) = [w(n)\; w(n-1)\; \ldots\; w(n-Q+1)]^{T}$ is a Q×1 vector containing the Q most recent noise samples, $\Gamma_2 = [1, 0, \ldots, 0]^{T}$ is a Q×1 vector and B(n) is the Q×Q noise state evolution matrix as shown below
[0082] The state space equations in eq. (5) and eq. (7) may be combined together to form a concatenated state space equation as shown in (9)
which may be rewritten as
$$x(n) = C(n) x(n-1) + \Gamma_3 y(n), \tag{10}$$
where x(n) is the concatenated state space vector, C(n) is the concatenated state evolution matrix,
[0083] Consequently, eq. (1) can be rewritten as
$$z(n) = \Gamma^{T} x(n), \tag{11}$$
where
[0084] $\Gamma = [\Gamma_1^{T}\; \Gamma_2^{T}]^{T}$.
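The state-space quantities described above can be assembled as in the following sketch. It is an illustration under stated assumptions: the state evolution matrices are taken to have the usual companion form (LPC in the first row, zero padded, with ones on the subdiagonal to shift older samples), and the concatenated matrix C(n) is taken to be block diagonal in A(n) and B(n). The helper names `companion`, `block_diag` and `build_model` are hypothetical.

```python
def companion(coeffs, size):
    """size x size matrix: LPC in the first row (zero padded), ones on the subdiagonal."""
    m = [[0.0] * size for _ in range(size)]
    for k, c in enumerate(coeffs):
        m[0][k] = c
    for r in range(1, size):
        m[r][r - 1] = 1.0
    return m

def block_diag(top, bottom):
    """Place two square matrices on the diagonal of a larger zero matrix."""
    nt, nb = len(top), len(bottom)
    out = [[0.0] * (nt + nb) for _ in range(nt + nb)]
    for r in range(nt):
        out[r][:nt] = top[r]
    for r in range(nb):
        out[nt + r][nt:] = bottom[r]
    return out

def build_model(a, b, d):
    """Return (C, gamma) for speech LPC a, noise LPC b and smoother delay d >= len(a)."""
    assert d >= len(a), "smoother delay must satisfy d >= P"
    A = companion(a, d + 1)       # (d+1) x (d+1) speech state evolution matrix
    B = companion(b, len(b))      # Q x Q noise state evolution matrix
    C = block_diag(A, B)          # concatenated state evolution matrix
    gamma = [0.0] * (d + 1 + len(b))
    gamma[0] = 1.0                # selects s(n) from the concatenated state
    gamma[d + 1] = 1.0            # selects w(n) from the concatenated state
    return C, gamma

C, gamma = build_model([1.2, -0.5], [0.6], d=3)  # hypothetical coefficients
```

With this layout, the inner product of `gamma` with the concatenated state returns s(n) + w(n), which is the measurement relation of eq. (11).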
[0085] The final state space equation and measurement equation, denoted by eq. (10) and eq. (11) respectively, may subsequently be used for the formulation of the Kalman filter equations (eq. (12) to eq. (17)), see below. The prediction stage of the Kalman smoother, denoted by eq. (12) and eq. (13), may compute the a priori estimates of the state vector $\hat{x}(n \mid n-1)$ and error covariance matrix $M(n \mid n-1)$ respectively
[0086] The Kalman gain may be computed as shown in eq. (14)
$$K(n) = M(n \mid n-1)\,\Gamma\,[\Gamma^{T} M(n \mid n-1)\,\Gamma]^{-1}. \tag{14}$$
[0087] The correction stage of the Kalman smoother which computes the a posteriori estimates of the state vector and error covariance matrix may be written as
$$\hat{x}(n \mid n) = \hat{x}(n \mid n-1) + K(n)\,[z(n) - \Gamma^{T} \hat{x}(n \mid n-1)] \tag{15}$$
$$M(n \mid n) = (I - K(n)\Gamma^{T})\,M(n \mid n-1). \tag{16}$$
[0088] Finally, the enhanced output signal ŝ using a Kalman smoother at time index n−d may be obtained by taking the (d+1)-th entry of the a posteriori estimate of the state vector as shown in eq. (17)
$$\hat{s}(n-d) = \hat{x}_{d+1}(n \mid n). \tag{17}$$
[0089] In the case of a Kalman filter, d+1=P and the enhanced signal ŝ at time index n may be obtained by taking the first entry of the a posteriori estimate of the state vector as shown below
$$\hat{s}(n) = \hat{x}_{1}(n \mid n).$$
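The recursion of the fixed lag Kalman smoother can be sketched as follows. This is an illustrative sketch rather than the implementation of the hearing device. It assumes the STP parameters are known and constant, uses the standard Kalman prediction step (state propagated by C, covariance propagated as C M Cᵀ plus the excitation covariance) for the prediction stage, and starts from an assumed zero state and identity covariance. The function name `fixed_lag_smoother` is hypothetical.

```python
def fixed_lag_smoother(z, a, b, var_u, var_v, d):
    """Return estimates of s(0), ..., s(len(z)-1-d) from the noisy samples z."""
    P, Q = len(a), len(b)
    assert d >= P and Q >= 1
    n = d + 1 + Q                      # length of the concatenated state
    w0 = d + 1                         # index of w(n) inside the state
    C = [[0.0] * n for _ in range(n)]  # block diagonal in A and B (assumed layout)
    for k, coef in enumerate(a):
        C[0][k] = coef                 # first row of A holds the speech LPC
    for r in range(1, d + 1):
        C[r][r - 1] = 1.0              # shift older speech samples down
    for k, coef in enumerate(b):
        C[w0][w0 + k] = coef           # first row of B holds the noise LPC
    for r in range(1, Q):
        C[w0 + r][w0 + r - 1] = 1.0    # shift older noise samples down
    x = [0.0] * n                      # assumed initial state estimate
    M = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]  # assumed
    out = []
    for zn in z:
        # Prediction stage: a priori state and error covariance.
        x = [sum(cr * xi for cr, xi in zip(row, x)) for row in C]
        CM = [[sum(C[r][k] * M[k][c] for k in range(n)) for c in range(n)]
              for r in range(n)]
        M = [[sum(CM[r][k] * C[c][k] for k in range(n)) for c in range(n)]
             for r in range(n)]
        M[0][0] += var_u               # speech excitation enters s(n)
        M[w0][w0] += var_v             # noise excitation enters w(n)
        # Gain, eq. (14): Gamma selects entries 0 and w0 of the state.
        m_gamma = [M[r][0] + M[r][w0] for r in range(n)]
        K = [v / (m_gamma[0] + m_gamma[w0]) for v in m_gamma]
        # Correction stage, eqs. (15) and (16).
        innovation = zn - (x[0] + x[w0])
        x = [xi + k * innovation for xi, k in zip(x, K)]
        gamma_m = [M[0][c] + M[w0][c] for c in range(n)]
        M = [[M[r][c] - K[r] * gamma_m[c] for c in range(n)] for r in range(n)]
        out.append(x[d])               # eq. (17): (d+1)-th entry estimates s(n-d)
    return out[d:]                     # drop estimates for times before the first sample
```

The first d outputs of the loop refer to time indices before the start of the signal, which is why they are discarded. The pure-Python matrix products keep the sketch self-contained; a practical implementation would use an optimised linear algebra routine.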
Codebook Based Estimation of Autoregressive STP Parameters:
[0090] The use of a Kalman filter from a speech enhancement perspective as explained above may require the state evolution matrix C(n), consisting of the speech Linear Prediction Coefficients (LPC) and noise Linear Prediction Coefficients (LPC), the variance of the speech excitation signal $\sigma_u^2(n)$ and the variance of the noise excitation signal $\sigma_v^2(n)$ to be known. These parameters may be assumed to be constant over frames of 20-25 milliseconds (ms) due to the quasi-stationary nature of speech. This section explains the minimum mean square error (MMSE) estimation of these parameters using a codebook based approach. This method may use the a priori information about speech and noise spectral shapes stored in trained codebooks in the form of Linear Prediction Coefficients (LPC). The parameters to be estimated may be concatenated to form a single vector
$$\theta = [a;\, b;\, \sigma_u^2;\, \sigma_v^2].$$
[0091] The minimum mean square error (MMSE) estimate of the parameter θ may be written as
$$\tilde{\theta} = E(\theta \mid z), \tag{18}$$
where z denotes a frame of noisy samples. Using the Bayes theorem, eq. (19) can be rewritten as
where Θ denotes the support space of the parameters to be estimated. Let us define
$$\theta_{ij} = [a_i;\, b_j;\, \sigma_{u,ij}^{2,ML};\, \sigma_{v,ij}^{2,ML}]$$
where $a_i$ is the i-th entry of the speech codebook (of size $N_s$), $b_j$ is the j-th entry of the noise codebook (of size $N_w$) and
$$\sigma_{u,ij}^{2,ML},\ \sigma_{v,ij}^{2,ML}$$
represent the maximum likelihood (ML) estimates of the speech and noise excitation variances, which depend on $a_i$, $b_j$ and z. The maximum likelihood (ML) estimates of the speech and noise excitation variances may be computed according to the following equation,
is the spectral envelope corresponding to the i-th entry of the speech codebook,
is the spectral envelope corresponding to the j-th entry of the noise codebook and $P_z(\omega)$ is the spectral envelope corresponding to the noisy signal z(n). Consequently, a discrete counterpart to eq. (20) can be written as
where the minimum mean square error (MMSE) estimate may be expressed as a weighted linear combination of $\theta_{ij}$ with weights proportional to
$$p(z \mid \theta_{ij})$$
which may be computed according to the following equations
where
$$d_{IS}\big(P_z(\omega), \hat{P}_z^{ij}(\omega)\big)$$
is the Itakura-Saito distortion between the noisy spectrum and the modelled noisy spectrum. It should be noted that the weighted summation of autoregressive (AR) parameters in eq. (23) is preferably performed in the line spectral frequency (LSF) domain rather than in the Linear Prediction Coefficients (LPC) domain. Weighted summation in the line spectral frequency (LSF) domain may be guaranteed to result in stable inverse filters, which is not always the case in the Linear Prediction Coefficients (LPC) domain.
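The weighting of codebook combinations can be illustrated with the sketch below. It rests on several assumptions that are not taken from the text: the excitation variances for each combination are supplied by the caller rather than estimated, the modelled noisy spectrum is the variance-weighted sum of the speech and noise AR envelopes, and each weight is taken proportional to exp(−d_IS) with any frame-length scaling omitted. The function names are hypothetical, and the subsequent weighted summation in the LSF domain is not shown.

```python
import cmath
import math

def ar_envelope(lpc, num_bins=64):
    """Envelope 1 / |1 - sum_k lpc[k] e^{-j*omega*(k+1)}|^2 on a grid over [0, pi)."""
    env = []
    for m in range(num_bins):
        omega = math.pi * m / num_bins
        denom = 1.0 - sum(c * cmath.exp(-1j * omega * (k + 1))
                          for k, c in enumerate(lpc))
        env.append(1.0 / abs(denom) ** 2)
    return env

def itakura_saito(p, p_hat):
    """Mean Itakura-Saito distortion between two sampled spectra."""
    return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(p, p_hat)) / len(p)

def codebook_weights(p_z, speech_cb, noise_cb, variances):
    """Normalised weights per (i, j); variances[(i, j)] = (var_u, var_v) is assumed given."""
    scores = {}
    for i, a_i in enumerate(speech_cb):
        env_s = ar_envelope(a_i, len(p_z))
        for j, b_j in enumerate(noise_cb):
            env_w = ar_envelope(b_j, len(p_z))
            var_u, var_v = variances[(i, j)]
            model = [var_u * ps + var_v * pw for ps, pw in zip(env_s, env_w)]
            scores[(i, j)] = -itakura_saito(p_z, model)   # log-weight up to a constant
    peak = max(scores.values())                            # subtract peak for stability
    unnorm = {key: math.exp(val - peak) for key, val in scores.items()}
    total = sum(unnorm.values())
    return {key: val / total for key, val in unnorm.items()}
```

A combination whose modelled spectrum matches the noisy spectrum exactly has zero distortion and therefore receives the largest weight.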
Experiments:
[0092] This section describes the experiments performed to evaluate the speech enhancement framework explained above. The objective measures that have been used for evaluation are short term objective intelligibility (STOI), Perceptual Evaluation of Speech Quality (PESQ) and Segmental signal-to-noise ratio (SegSNR). The test set for this experiment consisted of speech from four different speakers: two male and two female speakers from the CHiME database resampled to 8 kHz. The noise signal used for simulations is multi-talker babble from the NOIZEUS database. The speech and noise STP parameters required for the enhancement procedure are estimated every 25 ms as explained above. The speech codebook used for the estimation of STP parameters may be generated using the Generalised Lloyd algorithm (GLA) on a training sample of 10 minutes of speech from the TIMIT database. The noise codebook may be generated using two minutes of babble. The order of the speech and noise AR model may be chosen to be 14. The parameters that have been used for the experiments are summarised in Table 1 below.
TABLE 1: Experimental setup
fs       Frame Size     N_s    N_w    P     Q
8 kHz    160 (20 ms)    128    12     10    10
[0093] The estimated short term predictor (STP) parameters are subsequently used for enhancement by a fixed lag Kalman smoother (with d=40). The effects of having a speaker specific codebook instead of a generic speech codebook are also investigated here. The speaker specific codebook may be generated by the Generalised Lloyd algorithm (GLA) using a training sample of five minutes of speech from the specific speaker of interest. The speech samples used for testing were not included in the training set. A speaker codebook size of 64 entries was empirically noted to be sufficient. The Kalman smoother systems utilising a speech codebook and a speaker codebook for the estimation of short term predictor (STP) parameters are denoted as the KS-speech model and the KS-speaker model respectively. The results are compared with the Ephraim-Malah (EM) method and a state-of-the-art minimum mean square error (MMSE) estimator based on generalised gamma priors (MMSE-GGP).
[0094]
[0095] Thus it is an advantage to provide a hearing device and a method of speech enhancement based on a Kalman filter, where the parameters required for the functioning of the Kalman filter are estimated using a codebook based approach. Objective measures such as short term objective intelligibility (STOI), Segmental signal-to-noise ratio (SegSNR) and Perceptual Evaluation of Speech Quality (PESQ) were used to evaluate the performance of the method in the presence of babble noise. Experimental results indicate that the presented method was able to increase the speech quality and speech intelligibility according to the objective measures. Moreover, it was noted that having a speaker specific trained codebook instead of a generic speech codebook can yield up to a 6% increase in short term objective intelligibility (STOI) scores.
Binaural Hearing System
[0096] This section concerns the estimation of speech and noise short term predictor (STP) parameters using a codebook based approach when binaural noisy signals, i.e. input signals, are available. The estimated short term predictor (STP) parameters may be further used for enhancement of the binaural noisy signals. In the following, the signal model and the assumptions that will be used are first introduced. Then the estimation of short term predictor (STP) parameters in a binaural scenario is explained and the experimental results are discussed.
Signal Model:
[0097] The binaural noisy signals or input signals at the left and right ears are denoted by $z_l(n)$ and $z_r(n)$ respectively. The noisy signal at the left ear $z_l(n)$ is expressed as shown in eq. (27), where $s_l(n)$ is the clean speech component and $w_l(n)$ is the noise component at the left ear.
$$z_l(n) = s_l(n) + w_l(n) \quad \forall n = 1, 2, \ldots \tag{27}$$
[0098] The noisy signal at the right ear is expressed similarly as shown in eq. (28)
$$z_r(n) = s_r(n) + w_r(n) \quad \forall n = 1, 2, \ldots \tag{28}$$
[0099] It may be further assumed that the speech signal and noise signal can be represented as autoregressive (AR) processes. It may be assumed that the speech source is in front of the listener, i.e. the user of the hearing device, and it may thus be assumed that the clean speech component at the left and right ears is represented by the same autoregressive (AR) process. The noise component at the left and right ears may also be assumed to be represented by the same autoregressive (AR) process. The short term predictor (STP) parameters corresponding to an autoregressive (AR) process may consist of the linear prediction coefficients (LPC) and the variance of the excitation signal. The short term predictor (STP) parameters corresponding to speech may be represented as
$$\theta_s = [a\;\; \sigma_u^2],$$
where a is the vector of linear prediction coefficients (LPC) and
$$\sigma_u^2$$
is the excitation variance corresponding to the speech autoregressive (AR) process. Analogously, the short term predictor (STP) parameters corresponding to the noise autoregressive (AR) process may be represented as
$$\theta_w = [b\;\; \sigma_v^2].$$
Method:
[0100] An objective here is to estimate the short term predictor (STP) parameters corresponding to the speech and noise autoregressive (AR) processes given the binaural noisy signals or input signals. Let us denote the parameters to be estimated as
$$\theta = [\theta_s\;\; \theta_w].$$
[0101] The minimum mean-square error (MMSE) estimate of the parameter θ is written as eq. (29) and (30):
[0102] Let us define
$$\theta_{ij} = [a_i;\, \sigma_{u,ij}^{2,ML};\, b_j;\, \sigma_{v,ij}^{2,ML}]$$
where $a_i$ is the i-th entry of the speech codebook (of size $N_s$), $b_j$ is the j-th entry of the noise codebook (of size $N_w$) and
$$\sigma_{u,ij}^{2,ML},\ \sigma_{v,ij}^{2,ML}$$
represent the maximum likelihood (ML) estimates of the excitation variances. The discrete counterpart of eq. (30) is written as eq. (31):
[0103] The weight of the (i,j)-th codebook combination is determined by
$$p(z_l, z_r \mid \theta_{ij}).$$
[0104] Assuming that the modelling errors for the left and right noisy signals or input signals are conditionally independent,
$$p(z_l, z_r \mid \theta_{ij})$$
can be written as eq. (32):
$$p(z_l, z_r \mid \theta_{ij}) = p(z_l \mid \theta_{ij})\, p(z_r \mid \theta_{ij}) \tag{32}$$
[0105] The logarithm of the likelihood $p(z_l \mid \theta_{ij})$ can be written as the negative of the Itakura-Saito distortion between the noisy spectrum at the left ear, $P_{z_l}(\omega)$, and the modelled noisy spectrum $\hat{P}_z^{ij}(\omega)$.
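Because the joint likelihood factorises over the two ears and each log-likelihood is a negative Itakura-Saito distortion, the joint log-weight of a codebook combination is the negative sum of the left and right distortions. The sketch below illustrates this under assumptions: the modelled noisy spectrum for each combination is supplied precomputed as a list of samples, any scaling constants are omitted, and the names `binaural_weights` and `model_spectra` are hypothetical.

```python
import math

def itakura_saito(p, p_hat):
    """Mean Itakura-Saito distortion between two sampled spectra."""
    return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(p, p_hat)) / len(p)

def binaural_weights(p_zl, p_zr, model_spectra):
    """Normalised weights from left and right noisy spectra.

    model_spectra maps a codebook index pair (i, j) to its modelled noisy spectrum.
    """
    log_lik = {
        key: -(itakura_saito(p_zl, model) + itakura_saito(p_zr, model))
        for key, model in model_spectra.items()
    }
    peak = max(log_lik.values())                       # subtract peak for stability
    unnorm = {key: math.exp(val - peak) for key, val in log_lik.items()}
    total = sum(unnorm.values())
    return {key: val / total for key, val in unnorm.items()}

# Hypothetical three-bin spectra: both ears resemble the first model.
models = {(0, 0): [1.0, 2.0, 4.0], (0, 1): [4.0, 2.0, 1.0]}
weights = binaural_weights([1.1, 2.0, 3.9], [0.9, 2.1, 4.2], models)
```

Using both ears means that a combination must explain the left and the right observation at once, so a spurious match at only one ear is penalised.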
[0106] Using the same result for the right ear, $p(z_l, z_r \mid \theta_{ij})$ can be written as eq. (33) and (34):
[0110] Experimental Results:
[0111] This section explains the short term objective intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ) results obtained. Estimated short term predictor (STP) parameters may be used for enhancement on binaural noisy signals. Noisy signals are generated by first convolving the clean speech with impulse responses generated and subsequently summing up with binaural babble noise.
Kalman Filtering
[0112] Kalman filtering, also known as linear quadratic estimation (LQE), is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more precise than those based on a single measurement alone.
[0113] The Kalman filter may be applied in time series analysis used in fields such as signal processing.
[0114] The Kalman filter algorithm works in a two-step process. In the prediction step, the Kalman filter produces estimates of the current state variables, along with their uncertainties. Once the outcome of the next measurement (necessarily corrupted with some amount of error, including random noise) is observed, these estimates are updated using a weighted average, with more weight being given to estimates with higher certainty. The algorithm is recursive. It can run in real time, using only the present input measurements and the previously calculated state and its uncertainty matrix; no additional past information is required.
[0115] The Kalman filter may not require any assumption that the errors are Gaussian. However, the Kalman filter may yield the exact conditional probability estimate in the special case that all errors are Gaussian-distributed.
[0116] Extensions and generalizations to the Kalman filtering method may be provided, such as the extended Kalman filter and the unscented Kalman filter which work on nonlinear systems. The underlying model may be a Bayesian model similar to a hidden Markov model but where the state space of the latent variables is continuous and where all latent and observed variables may have Gaussian distributions.
[0117] The Kalman filter uses a system's dynamics model, known control inputs to that system, and multiple sequential measurements to form an estimate of the system's varying quantities (its state) that is better than the estimate obtained by using any one measurement alone.
[0118] In general, all measurements and calculations based on models are estimates to some degree. Noisy data, approximations in the equations that describe how a system changes, and external factors that are not accounted for all introduce some uncertainty about the inferred values for a system's state. The Kalman filter may average a prediction of a system's state with a new measurement using a weighted average. The purpose of the weights is that values with better (i.e., smaller) estimated uncertainty are “trusted” more. The weights may be calculated from the covariance, a measure of the estimated uncertainty of the prediction of the system's state. The result of the weighted average may be a new state estimate that may lie between the predicted and measured state, and may have a better estimated uncertainty than either alone. This process may be repeated every time step, with the new estimate and its covariance informing the prediction used in the following iteration. This means that the Kalman filter may work recursively and may require only the last “best guess”, rather than the entire history, of a system's state to calculate a new state.
[0119] Because the certainty of the measurements may be difficult to measure precisely, the filter's behavior may be determined in terms of gain. The Kalman gain may be a function of the relative certainty of the measurements and current state estimate, and can be “tuned” to achieve particular performance. With a high gain, the filter may place more weight on the measurements, and thus may follow them more closely. With a low gain, the filter may follow the model predictions more closely, smoothing out noise but may decrease the responsiveness. At the extremes, a gain of one may cause the filter to ignore the state estimate entirely, while a gain of zero may cause the measurements to be ignored.
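The role of the gain can be seen in a scalar example. The sketch below assumes a one-dimensional state that is observed directly, so the gain is the ratio of the prediction variance to the total variance; the function names and the numeric values are illustrative only.

```python
def kalman_gain(pred_var, meas_var):
    """Scalar gain for a directly observed state: prediction variance over total variance."""
    return pred_var / (pred_var + meas_var)

def blend(predicted, measured, gain):
    """Weighted average of prediction and measurement controlled by the gain."""
    return predicted + gain * (measured - predicted)

predicted, measured = 2.0, 10.0
for pred_var, meas_var in [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]:
    g = kalman_gain(pred_var, meas_var)
    print(g, blend(predicted, measured, g))
```

A perfectly certain measurement gives a gain of one and the estimate equals the measurement; a perfectly certain prediction gives a gain of zero and the measurement is ignored; equal uncertainties give the midpoint.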
[0120] When performing the actual calculations for the filter, the state estimate and covariances may be coded into matrices to handle the multiple dimensions involved in a single set of calculations. This allows for a representation of linear relationships between different state variables in any of the transition models or covariances.
[0121] The Kalman filters may be based on linear dynamic systems discretized in the time domain. They may be modelled on a Markov chain built on linear operators perturbed by errors that may include Gaussian noise. The state of the system may be represented as a vector of real numbers. At each discrete time increment, a linear operator may be applied to the state to generate the new state, with some noise mixed in, and optionally some information from the controls on the system if they are known. Then, another linear operator mixed with more noise may generate the observed outputs from the true (“hidden”) state.
[0122] In order to use the Kalman filter to estimate the internal state of a process given only a sequence of noisy observations, one may model the process in accordance with the framework of the Kalman filter. This means specifying the following matrices: $F_k$, the state-transition model; $H_k$, the observation model; $Q_k$, the covariance of the process noise; $R_k$, the covariance of the observation noise; and sometimes $B_k$, the control-input model, for each time-step, k, as described below.
[0123] The Kalman filter model may assume the true state at time k is evolved from the state at (k−1) according to
$$x_k = F_k x_{k-1} + B_k u_k + w_k$$
where
[0124] $F_k$ is the state transition model which is applied to the previous state $x_{k-1}$;
[0125] $B_k$ is the control-input model which is applied to the control vector $u_k$;
[0126] $w_k$ is the process noise which is assumed to be drawn from a zero mean multivariate normal distribution with covariance $Q_k$:
$$w_k \sim \mathcal{N}(0, Q_k)$$
[0127] At time k an observation (or measurement) $z_k$ of the true state $x_k$ is made according to
$$z_k = H_k x_k + v_k$$
where $H_k$ is the observation model which maps the true state space into the observed space and $v_k$ is the observation noise which is assumed to be zero mean Gaussian white noise with covariance $R_k$:
$$v_k \sim \mathcal{N}(0, R_k)$$
[0128] The initial state and the noise vectors at each step $\{x_0, w_1, \ldots, w_k, v_1, \ldots, v_k\}$ may all be assumed to be mutually independent.
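The generative model above can be simulated in a few lines. The sketch below is an illustration under assumptions: there is no control input, the model matrices are constant over time, the process noise components are independent (diagonal covariance given as standard deviations), and the observation is scalar. The constant-velocity example values are hypothetical.

```python
import random

def simulate(F, H, q_std, r_std, x0, steps, rng):
    """Simulate x_k = F x_{k-1} + w_k and z_k = H x_k + v_k with independent Gaussian noise."""
    x = list(x0)
    states, observations = [], []
    for _ in range(steps):
        # State transition plus one independent process-noise draw per component.
        x = [sum(f * xi for f, xi in zip(row, x)) + rng.gauss(0.0, q)
             for row, q in zip(F, q_std)]
        # Scalar observation of the new state plus observation noise.
        z = sum(h * xi for h, xi in zip(H, x)) + rng.gauss(0.0, r_std)
        states.append(x)
        observations.append(z)
    return states, observations

F = [[1.0, 1.0], [0.0, 1.0]]   # hypothetical constant-velocity model: position, velocity
H = [1.0, 0.0]                 # only the position is observed
states, observations = simulate(F, H, [0.05, 0.01], 0.5, [0.0, 1.0], 20, random.Random(0))
```

Only `observations` would be available to a filter; `states` plays the role of the true hidden state.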
[0129] The Kalman filter may be a recursive estimator. This means that only the estimated state from the previous time step and the current measurement may be needed to compute the estimate for the current state. In contrast to batch estimation techniques, no history of observations and/or estimates may be required. In what follows, the notation {circumflex over (x)}.sub.n|m represents the estimate of x at time n given observations up to and including time m≦n.
[0130] The state of the filter is represented by two variables:
[0131] {circumflex over (x)}.sub.k|k, the a posteriori state estimate at time k given observations up to and including time k;
[0132] P.sub.k|k, the a posteriori error covariance matrix (a measure of the estimated accuracy of the state estimate).
[0133] The Kalman filter can be written as a single equation; however, it may be conceptualized as two distinct phases: “Predict” and “Update”. The predict phase may use the state estimate from the previous timestep to produce an estimate of the state at the current timestep. This predicted state estimate is also known as the a priori state estimate because, although it is an estimate of the state at the current timestep, it may not include observation information from the current timestep. In the update phase, the current a priori prediction may be combined with current observation information to refine the state estimate. This improved estimate is termed the a posteriori state estimate.
[0134] Typically, the two phases alternate, with the prediction advancing the state until the next scheduled observation, and the update incorporating the observation. However, this may not be necessary; if an observation is unavailable for some reason, the update may be skipped and multiple prediction steps may be performed. Likewise, if multiple independent observations are available at the same time, multiple update steps may be performed (typically with different observation matrices H.sub.k).
Predict:
[0135] Predicted (a priori) state estimate {circumflex over (x)}.sub.k|k-1=F.sub.k{circumflex over (x)}.sub.k-1|k-1+B.sub.ku.sub.k
Predicted (a priori) estimate covariance P.sub.k|k-1=F.sub.kP.sub.k-1|k-1F.sub.k.sup.T+Q.sub.k
Update:
[0136] Innovation or measurement residual {tilde over (y)}.sub.k=z.sub.k−H.sub.k{circumflex over (x)}.sub.k|k-1
Innovation (or residual) covariance S.sub.k=H.sub.kP.sub.k|k-1H.sub.k.sup.T+R.sub.k
Optimal Kalman gain K.sub.k=P.sub.k|k-1H.sub.k.sup.TS.sub.k.sup.−1
Updated (a posteriori) state estimate {circumflex over (x)}.sub.k|k={circumflex over (x)}.sub.k|k-1+K.sub.k{tilde over (y)}.sub.k
Updated (a posteriori) estimate covariance P.sub.k|k=(I−K.sub.kH.sub.k)P.sub.k|k-1
[0137] The formula for the updated estimate covariance above may only be valid for the optimal Kalman gain. Usage of other gain values may require a more complex formula.
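By way of non-limiting illustration, the predict and update phases may be sketched as follows for a scalar state, so that every matrix in the equations above reduces to a number. The function names `predict`, `update` and `run_filter` are hypothetical, and the convention that an observation of `None` marks a missing measurement is an assumption made for this sketch only.

```python
def predict(x_est, P, F, Q, B=0.0, u=0.0):
    """Predict phase: a priori state estimate and variance (scalar case)."""
    x_pred = F * x_est + B * u
    P_pred = F * P * F + Q
    return x_pred, P_pred


def update(x_pred, P_pred, z, H, R):
    """Update phase: a posteriori state estimate and variance (scalar case)."""
    y = z - H * x_pred              # innovation (measurement residual)
    S = H * P_pred * H + R          # innovation variance
    K = P_pred * H / S              # optimal Kalman gain
    x_new = x_pred + K * y
    P_new = (1.0 - K * H) * P_pred  # valid for the optimal gain only
    return x_new, P_new


def run_filter(observations, F, H, Q, R, x0, P0):
    """Alternate predict and update; skip the update when z is None."""
    x, P = x0, P0
    estimates = []
    for z in observations:
        x, P = predict(x, P, F, Q)
        if z is not None:
            x, P = update(x, P, z, H, R)
        estimates.append((x, P))
    return estimates


if __name__ == "__main__":
    # The second observation is missing, so only a prediction is performed.
    for x, P in run_filter([2.0, None], F=1.0, H=1.0, Q=0.5, R=1.0,
                           x0=0.0, P0=0.5):
        print(x, P)
```

In this sketch the estimate variance grows during the step where the update is skipped, since the prediction adds the process noise variance without any measurement to reduce it.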
Invariants:
[0138] If the model is accurate, and the values for {circumflex over (x)}.sub.0|0 and P.sub.0|0 accurately reflect the distribution of the initial state values, then the following invariants may be preserved (all estimates have a mean error of zero):
E[x.sub.k−{circumflex over (x)}.sub.k|k]=E[x.sub.k−{circumflex over (x)}.sub.k|k-1]=0
E[{tilde over (y)}.sub.k]=0
where E[ξ] is the expected value of ξ, and covariance matrices may accurately reflect the covariance of estimates:
P.sub.k|k=cov(x.sub.k−{circumflex over (x)}.sub.k|k)
P.sub.k|k-1=cov(x.sub.k−{circumflex over (x)}.sub.k|k-1)
S.sub.k=cov({tilde over (y)}.sub.k)
Optimality and Performance:
[0139] It follows from theory that the Kalman filter is optimal in cases where a) the model perfectly matches the real system, b) the entering noise is white and c) the covariances of the noise are exactly known. After the covariances are estimated, it may be useful to evaluate the performance of the filter, i.e. whether it is possible to improve the state estimation quality. If the Kalman filter works optimally, the innovation sequence (the output prediction error) may be a white noise, therefore the whiteness property of the innovations may measure filter performance. Different methods can be used for this purpose.
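By way of non-limiting illustration, one simple whiteness check is the sample autocorrelation of the innovation sequence at a non-zero lag, which should be close to zero when the filter model matches the system. The sketch below assumes a scalar, time-invariant model; the function names and the choice of lag 1 are hypothetical and represent only one of the different methods that can be used.

```python
import random


def simulate_observations(F, H, Q, R, steps, seed=0):
    """Generate observations from a scalar linear Gaussian model."""
    rng = random.Random(seed)
    x, zs = 0.0, []
    for _ in range(steps):
        x = F * x + rng.gauss(0.0, Q ** 0.5)
        zs.append(H * x + rng.gauss(0.0, R ** 0.5))
    return zs


def innovations(observations, F, H, Q, R, x0=0.0, P0=1.0):
    """Run a scalar Kalman filter and return its innovation sequence."""
    x, P = x0, P0
    out = []
    for z in observations:
        x, P = F * x, F * P * F + Q           # predict
        y = z - H * x                         # innovation
        S = H * P * H + R
        K = P * H / S
        x, P = x + K * y, (1.0 - K * H) * P   # update
        out.append(y)
    return out


def lag_autocorrelation(seq, lag):
    """Normalized sample autocorrelation of seq at the given lag."""
    n = len(seq)
    mean = sum(seq) / n
    centered = [s - mean for s in seq]
    denom = sum(c * c for c in centered)
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom


if __name__ == "__main__":
    zs = simulate_observations(F=0.9, H=1.0, Q=1.0, R=1.0, steps=5000, seed=1)
    matched = innovations(zs, F=0.9, H=1.0, Q=1.0, R=1.0)
    mismatched = innovations(zs, F=0.9, H=1.0, Q=1e-4, R=1.0)
    print("matched model:   ", lag_autocorrelation(matched, 1))
    print("mismatched model:", lag_autocorrelation(mismatched, 1))
```

When the filter is given a process noise variance far smaller than the true one, the innovations remain strongly correlated from one step to the next, which indicates that the state estimation quality can be improved.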
Deriving the a Posteriori Estimate Covariance Matrix:
[0140] Starting with the invariant on the error covariance P.sub.k|k as above
P.sub.k|k=cov(x.sub.k−{circumflex over (x)}.sub.k|k)
substitute in the definition of {circumflex over (x)}.sub.k|k
P.sub.k|k=cov(x.sub.k−({circumflex over (x)}.sub.k|k-1+K.sub.k{tilde over (y)}.sub.k))
and substitute {tilde over (y)}.sub.k
P.sub.k|k=cov(x.sub.k−({circumflex over (x)}.sub.k|k-1+K.sub.k(z.sub.k−H.sub.k{circumflex over (x)}.sub.k|k-1)))
and z.sub.k
P.sub.k|k=cov(x.sub.k−({circumflex over (x)}.sub.k|k-1+K.sub.k(H.sub.kx.sub.k+v.sub.k−H.sub.k{circumflex over (x)}.sub.k|k-1)))
and collecting the error vectors:
P.sub.k|k=cov((I−K.sub.kH.sub.k)(x.sub.k−{circumflex over (x)}.sub.k|k-1)−K.sub.kv.sub.k)
[0141] Since the measurement error v.sub.k is uncorrelated with the other terms, this becomes
P.sub.k|k=cov((I−K.sub.kH.sub.k)(x.sub.k−{circumflex over (x)}.sub.k|k-1))+cov(K.sub.kv.sub.k)
by the properties of vector covariance this becomes
P.sub.k|k=(I−K.sub.kH.sub.k)cov(x.sub.k−{circumflex over (x)}.sub.k|k-1)(I−K.sub.kH.sub.k).sup.T+K.sub.kcov(v.sub.k)K.sub.k.sup.T
which, using the invariant on P.sub.k|k-1 and the definition of R.sub.k becomes
P.sub.k|k=(I−K.sub.kH.sub.k)P.sub.k|k-1(I−K.sub.kH.sub.k).sup.T+K.sub.kR.sub.kK.sub.k.sup.T
[0142] This formula may be valid for any value of K.sub.k. It turns out that if K.sub.k is the optimal Kalman gain, this can be simplified further as shown below.
Kalman Gain Derivation:
[0143] The Kalman filter may be a minimum mean-square error (MMSE) estimator. The error in the a posteriori state estimation may be
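x.sub.k−{circumflex over (x)}.sub.k|k
[0144] We seek to minimize the expected value of the square of the magnitude of this vector, E[∥x.sub.k−{circumflex over (x)}.sub.k|k∥.sup.2]. This is equivalent to minimizing the trace of the a posteriori estimate covariance matrix P.sub.k|k. By expanding out the terms in the equation above and collecting, we get:
P.sub.k|k=P.sub.k|k-1−K.sub.kH.sub.kP.sub.k|k-1−P.sub.k|k-1H.sub.k.sup.TK.sub.k.sup.T+K.sub.k(H.sub.kP.sub.k|k-1H.sub.k.sup.T+R.sub.k)K.sub.k.sup.T
=P.sub.k|k-1−K.sub.kH.sub.kP.sub.k|k-1−P.sub.k|k-1H.sub.k.sup.TK.sub.k.sup.T+K.sub.kS.sub.kK.sub.k.sup.T
[0145] The trace may be minimized when its matrix derivative with respect to the gain matrix is zero. Using the gradient matrix rules and the symmetry of the matrices involved we find that
∂tr(P.sub.k|k)/∂K.sub.k=−2(H.sub.kP.sub.k|k-1).sup.T+2K.sub.kS.sub.k=0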
[0146] Solving this for K.sub.k yields the Kalman gain:
K.sub.kS.sub.k=(H.sub.kP.sub.k|k-1).sup.T=P.sub.k|k-1H.sub.k.sup.T
K.sub.k=P.sub.k|k-1H.sub.k.sup.TS.sub.k.sup.−1
[0147] This gain, which is known as the optimal Kalman gain, is the one that may yield MMSE estimates when used.
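By way of non-limiting illustration, the minimizing property of the optimal gain may be checked numerically. The sketch assumes a scalar state, for which the trace of P.sub.k|k is simply the a posteriori variance; the function names and the numerical values are hypothetical.

```python
def posterior_variance(P_pred, K, H, R):
    """A posteriori variance for an arbitrary gain K (scalar case)."""
    return (1.0 - K * H) ** 2 * P_pred + K ** 2 * R


def optimal_gain(P_pred, H, R):
    """Scalar form of K = P H^T S^-1 with S = H P H^T + R."""
    return P_pred * H / (H * P_pred * H + R)


# Illustrative values only.
P_pred, H, R = 2.0, 1.0, 0.5
K_opt = optimal_gain(P_pred, H, R)

# Evaluate the variance at the optimal gain and at perturbed gains.
candidates = [K_opt + d for d in (-0.2, -0.05, 0.0, 0.05, 0.2)]
variances = [posterior_variance(P_pred, K, H, R) for K in candidates]
best = candidates[variances.index(min(variances))]

if __name__ == "__main__":
    for K, var in zip(candidates, variances):
        print(f"K={K:.3f}  variance={var:.5f}")
    print("gain with smallest variance:", best)
```

Among the candidate gains, the smallest a posteriori variance is obtained at the optimal gain, and any perturbation in either direction increases it.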
Simplification of the a posteriori error covariance formula:
[0148] The formula used to calculate the a posteriori error covariance can be simplified when the Kalman gain equals the optimal value derived above. Multiplying both sides of our Kalman gain formula on the right by S.sub.kK.sub.k.sup.T, it follows that
K.sub.kS.sub.kK.sub.k.sup.T=P.sub.k|k-1H.sub.k.sup.TK.sub.k.sup.T
[0149] Referring back to our expanded formula for the a posteriori error covariance,
P.sub.k|k=P.sub.k|k-1−K.sub.kH.sub.kP.sub.k|k-1−P.sub.k|k-1H.sub.k.sup.TK.sub.k.sup.T+K.sub.kS.sub.kK.sub.k.sup.T
we find the last two terms cancel out, giving
P.sub.k|k=P.sub.k|k-1−K.sub.kH.sub.kP.sub.k|k-1=(I−K.sub.kH.sub.k)P.sub.k|k-1.
[0150] This formula is computationally cheaper and thus nearly always used in practice, but may only be correct for the optimal gain. If arithmetic precision is unusually low, causing problems with numerical stability, or if a non-optimal Kalman gain is deliberately used, this simplification cannot be applied; instead, the a posteriori error covariance formula as derived above may be used.
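By way of non-limiting illustration, the difference between the general and the simplified covariance formulas may be shown for a scalar state. The general formula is commonly referred to as the Joseph form; the function names and the numerical values below are hypothetical.

```python
def joseph_form(P_pred, K, H, R):
    """General a posteriori variance, valid for any gain (scalar case)."""
    return (1.0 - K * H) ** 2 * P_pred + K ** 2 * R


def simplified_form(P_pred, K, H):
    """Simplified a posteriori variance, valid only for the optimal gain."""
    return (1.0 - K * H) * P_pred


# Illustrative values only.
P_pred, H, R = 2.0, 1.0, 0.5
K_opt = P_pred * H / (H * P_pred * H + R)   # optimal gain
K_sub = 0.5 * K_opt                         # deliberately non-optimal gain

if __name__ == "__main__":
    print("optimal gain:    ", joseph_form(P_pred, K_opt, H, R),
          simplified_form(P_pred, K_opt, H))
    print("non-optimal gain:", joseph_form(P_pred, K_sub, H, R),
          simplified_form(P_pred, K_sub, H))
```

The two formulas agree at the optimal gain. For the halved gain the simplified formula no longer reports the actual error variance, so a filter relying on it would carry an incorrect covariance into subsequent steps.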
Fixed-Lag Smoother:
[0151] The optimal fixed-lag smoother may provide the optimal estimate of {circumflex over (x)}.sub.k-N|k for a given fixed-lag N using the measurements from z.sub.1 to z.sub.k. It can be derived using the previous theory via an augmented state, and the main equation of the filter may be the following:
where:
[0152] {circumflex over (x)}.sub.t|t-1 is estimated via a standard Kalman filter;
[0153] y.sub.t|t-1=z.sub.t−H{circumflex over (x)}.sub.t|t-1 is the innovation produced considering the estimate of the standard Kalman filter;
[0154] the various {circumflex over (x)}.sub.t-i|t with i=1, . . . , N−1 are new variables, i.e. they do not appear in the standard Kalman filter;
[0155] the gains are computed via the following scheme:
K.sup.(i)=P.sup.(i)H.sup.T[HPH.sup.T+R].sup.−1
and
P.sup.(i)=P[[F−KH].sup.T].sup.i
where P and K are the prediction error covariance and the gains of the standard Kalman filter (i.e., P.sub.t|t-1).
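The main equation of the smoother is not reproduced in the present text. As an assumption based on the standard augmented-state formulation of the fixed-lag smoother, and not taken from this specification, each entry of the augmented estimate may be corrected by the same innovation y.sub.t|t-1 using its own gain K.sup.(i):

```latex
\hat{x}_{t-i \mid t} \;=\; \hat{x}_{t-i \mid t-1} \;+\; K^{(i)}\, y_{t \mid t-1},
\qquad i = 0, 1, \ldots, N-1
```

For i=0, with P.sup.(0)=P, the gain K.sup.(0) equals PH.sup.T[HPH.sup.T+R].sup.−1 and the expression reduces to the standard Kalman update of {circumflex over (x)}.sub.t|t from {circumflex over (x)}.sub.t|t-1.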
[0156] If the estimation error covariance is defined so that
P.sub.i:=E[(x.sub.t-i−{circumflex over (x)}.sub.t-i|t)*(x.sub.t-i−{circumflex over (x)}.sub.t-i|t)|z.sub.1 . . . z.sub.t],
then we have that the improvement on the estimation of x.sub.t-i is given by:
[0157] Although particular features have been shown and described, it will be understood that they are not intended to limit the claimed invention, and it will be obvious to those skilled in the art that various changes and modifications may be made without departing from the scope of the claimed invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense. The claimed invention is intended to cover all alternatives, modifications and equivalents.
LIST OF REFERENCES
[0158] 2 hearing device
[0159] 4 input transducer
[0160] 6 processing unit
[0161] 8 output transducer
[0162] 10 hearing device user
[0163] 12 left ear input signal zl(n) or noisy signal at the left ear
[0164] 14 right ear input signal zr(n) or noisy signal at the right ear
[0165] 16 noise codebook
[0166] 18 speech codebook
[0167] 20 distance vector for the left ear consisting of Itakura Saito distances between the noisy spectrum at the left ear and modeled noisy spectrum
[0168] 22 distance vector for the right ear consisting of Itakura Saito distances between the noisy spectrum at the right ear and modeled noisy spectrum
[0169] 24 combined weights of the left and right ear
[0170] 26 modeled noisy spectrum (sum of 16 and 18) left ear
[0171] 28 modeled noisy spectrum (sum of 16 and 18) right ear
[0172] 30 spectral envelope left ear
[0173] 32 spectral envelope right ear
[0174] 34 Itakura Saito distortion for left ear
[0175] 36 Itakura Saito distortion for right ear
[0176] 38 noisy spectrum left ear
[0177] 40 noisy spectrum right ear
[0178] 101 providing an input signal z(n) comprising a speech signal and a noise signal
[0179] 102 performing a codebook based approach processing on the input signal z(n)
[0180] 103 determining one or more parameters of the input signal z(n) based on the codebook based approach processing in step 102
[0181] 104 performing a Kalman filtering of the input signal z(n) using the determined one or more parameters from step 103
[0182] 105 providing that an output signal is speech intelligibility enhanced due to the Kalman filtering in step 104