Encoder and method for encoding an audio signal with reduced background noise using linear predictive coding
10692510 · 2020-06-23
Assignee
Inventors
Cpc classification
G10L21/0308
PHYSICS
G10L19/06
PHYSICS
G10L19/265
PHYSICS
G10L19/005
PHYSICS
G10L19/125
PHYSICS
G10L19/08
PHYSICS
International classification
G10L19/08
PHYSICS
G10L21/0308
PHYSICS
G10L21/02
PHYSICS
G10L19/005
PHYSICS
G10L19/125
PHYSICS
G10L19/06
PHYSICS
Abstract
An encoder for encoding an audio signal with reduced background noise using linear predictive coding is shown. The encoder includes a background noise estimator configured to estimate background noise of the audio signal, a background noise reducer configured to generate a background noise reduced audio signal by subtracting the estimated background noise of the audio signal from the audio signal, and a predictor configured to subject the audio signal to linear prediction analysis to obtain a first set of linear prediction filter (LPC) coefficients and to subject the background noise reduced audio signal to linear prediction analysis to obtain a second set of linear prediction filter (LPC) coefficients. Furthermore, the encoder includes an analysis filter composed of a cascade of time-domain filters controlled by the obtained first set of LPC coefficients and the obtained second set of LPC coefficients.
Claims
1. Encoder for encoding an audio signal with reduced background noise using linear predictive coding, the encoder comprising: a background noise estimator configured to estimate a representation of background noise of the audio signal; a background noise reducer configured to generate a representation of a background noise reduced audio signal by subtracting the estimated representation of the background noise of the audio signal from a representation of the audio signal; a predictor configured to subject the representation of the audio signal to linear prediction analysis to acquire a first set of linear prediction filter (LPC) coefficients and to subject the representation of the background noise reduced audio signal to linear prediction analysis to acquire a second set of linear prediction filter (LPC) coefficients; and an analysis filter composed of a cascade of time-domain filters controlled by the acquired first set of LPC coefficients and the acquired second set of LPC coefficients to acquire a residual signal from the audio signal; wherein the encoder is implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
2. Encoder according to claim 1, wherein the cascade of time domain filters comprises a first linear prediction filter and a second linear prediction filter which use the acquired first set of LPC coefficients followed by an inverse of a third linear prediction filter which uses the acquired second set of LPC coefficients.
3. Encoder according to claim 1, wherein the cascade of time-domain filters is a Wiener filter.
4. Encoder according to claim 1, wherein the background noise estimator is configured to estimate an autocorrelation of the background noise as the estimated representation of the background noise; wherein the background noise reducer is configured to generate the representation of the background noise reduced audio signal by subtracting the autocorrelation of the background noise from an autocorrelation of the audio signal, wherein the autocorrelation of the audio signal is the representation of the audio signal and wherein the representation of the background noise reduced audio signal is an autocorrelation of a background noise reduced audio signal.
5. Encoder according to claim 1, wherein the estimated representation of the background noise of the audio signal is an autocorrelation of the background noise of the audio signal and the representation of the audio signal is an autocorrelation of the audio signal.
6. Encoder according to claim 1, further comprising a transmitter configured to transmit the second set of LPC coefficients.
7. Encoder according to claim 1, further comprising a transmitter configured to transmit the residual signal.
8. Encoder according to claim 1, further comprising a quantizer configured to quantize and/or encode the residual signal before transmission.
9. Encoder according to claim 8, wherein the quantizer is configured to use code-excited linear prediction (CELP), entropy coding, or transform coded excitation (TCX).
10. Encoder according to claim 1, further comprising a quantizer configured to quantize and/or encode the second set of LPC coefficients before transmission.
11. System comprising: an encoder for encoding an audio signal with reduced background noise using linear predictive coding, said encoder comprising: a background noise estimator configured to estimate a representation of background noise of the audio signal; a background noise reducer configured to generate a representation of a background noise reduced audio signal by subtracting the estimated representation of the background noise of the audio signal from a representation of the audio signal; a predictor configured to subject the representation of the audio signal to linear prediction analysis to acquire a first set of linear prediction filter (LPC) coefficients and to subject the representation of the background noise reduced audio signal to linear prediction analysis to acquire a second set of linear prediction filter (LPC) coefficients; and an analysis filter composed of a cascade of time-domain filters controlled by the acquired first set of LPC coefficients and the acquired second set of LPC coefficients to acquire a residual signal from the audio signal; a decoder configured to decode the encoded audio signal, wherein each of the encoder and the decoder is implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
12. Method for encoding an audio signal with reduced background noise using linear predictive coding, the method comprising: estimating a representation of background noise of the audio signal; generating a representation of a background noise reduced audio signal by subtracting the estimated representation of the background noise of the audio signal from a representation of the audio signal; subjecting the representation of the audio signal to linear prediction analysis to acquire a first set of linear prediction filter (LPC) coefficients and subjecting the representation of the background noise reduced audio signal to linear prediction analysis to acquire a second set of linear prediction filter (LPC) coefficients; and controlling a cascade of time domain filters by the acquired first set of LPC coefficients and the acquired second set of LPC coefficients to acquire a residual signal from the audio signal.
13. Non-transitory digital storage medium having a computer program stored thereon to perform a method for encoding an audio signal with reduced background noise using linear predictive coding, said method comprising: estimating a representation of background noise of the audio signal; generating a representation of a background noise reduced audio signal by subtracting the estimated representation of the background noise of the audio signal from a representation of the audio signal; subjecting the representation of the audio signal to linear prediction analysis to acquire a first set of linear prediction filter (LPC) coefficients and subjecting the representation of the background noise reduced audio signal to linear prediction analysis to acquire a second set of linear prediction filter (LPC) coefficients; and controlling a cascade of time domain filters by the acquired first set of LPC coefficients and the acquired second set of LPC coefficients to acquire a residual signal from the audio signal, when said computer program is run by a computer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
DETAILED DESCRIPTION OF THE INVENTION
(15) In the following, embodiments of the invention will be described in further detail. Elements shown in the respective figures that have the same or a similar functionality are denoted by the same reference signs.
(16) The following describes a method for joint enhancement and coding, based on Wiener filtering [12] and CELP coding. The advantages of this fusion are that 1) inclusion of Wiener filtering in the processing chain does not increase the low algorithmic delay of the CELP codec, and that 2) the joint optimization simultaneously minimizes distortion due to quantization and background noise. Moreover, the computational complexity of the joint scheme is lower than that of the cascaded approach. The implementation relies on recent work on residual-windowing in CELP-style codecs [13, 14, 15], which allows the Wiener filtering to be incorporated into the filters of the CELP codec in a new way. With this approach it can be demonstrated that both the objective and the subjective quality are improved in comparison to a cascaded system.
(17) The proposed method for joint enhancement and coding of speech thereby avoids accumulation of errors due to cascaded processing and further improves perceptual output quality. In other words, a joint minimization of interference and quantization distortion is realized by an optimal Wiener filtering in a perceptual domain.
(18)
(19) Furthermore, the encoder 4 may comprise a predictor 18 configured to subject the representation of the audio signal 8 to linear prediction analysis to obtain a first set of linear prediction filter (LPC) coefficients 20a and to subject the representation of the background noise reduced audio signal 16 to linear prediction analysis to obtain a second set of linear prediction filter coefficients 20b. Similar to the background noise reducer 14, the predictor 18 may comprise a generator to internally generate the representation of the audio signal 8 from the audio signal 8. However, it may be advantageous to use a common or central generator 17 to calculate the representation 8 of the audio signal 8 once and to provide the representation of the audio signal, such as the autocorrelation of the audio signal 8, to the background noise reducer 14 and the predictor 18. Thus, the predictor may receive the representation of the audio signal 8 and the representation of the background noise reduced audio signal 16, for example the autocorrelation of the audio signal and the autocorrelation of the background noise reduced audio signal, respectively, and may determine, based on the inbound signals, the first set of LPC coefficients and the second set of LPC coefficients, respectively.
(20) In other words, the first set of LPC coefficients may be determined from the representation of the audio signal 8 and the second set of LPC coefficients may be determined from the representation of the background noise reduced audio signal 16. The predictor may perform the Levinson-Durbin algorithm to calculate the first and the second set of LPC coefficients from the respective autocorrelation.
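As an illustration of the Levinson-Durbin step mentioned above, the following is a minimal sketch in Python. The function name `levinson_durbin` and the sign convention (predictor returned as `[1, a1, ..., aM]`, so that the residual is the input filtered with these coefficients) are assumptions made for this example and are not taken from the specification.

```python
def levinson_durbin(r, order):
    """Compute predictor coefficients [1, a1, ..., aM] from autocorrelation r[0..M].

    Returns (a, err), where err is the final prediction error energy.
    Illustrative sketch only; no safeguards against a zero or negative error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

The same routine could be called twice, once with the autocorrelation of the audio signal and once with the autocorrelation of the background noise reduced audio signal, to obtain the first and the second set of coefficients.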
(21) Furthermore, the encoder comprises an analysis filter 22 composed of a cascade 24 of time domain filters 24a, 24b controlled by the obtained first set of LPC coefficients 20a and the obtained second set of LPC coefficients 20b. The analysis filter may apply the cascade of time domain filters, wherein filter coefficients of the first time domain filter 24a are the first set of LPC coefficients and filter coefficients of the second time domain filter 24b are the second set of LPC coefficients, to the audio signal 8 to determine a residual signal 26. The residual signal may comprise the signal components of the audio signal 8 which may not be represented by a linear filter having the first and/or the second set of LPC coefficients.
(22) According to embodiments, the residual signal may be provided to a quantizer 28 configured to quantize and/or encode the residual signal and/or the second set of LPC coefficients 20b before transmission. The quantizer may for example perform transform coded excitation (TCX), code excited linear prediction (CELP), or a lossless encoding such as entropy coding.
(23) According to a further embodiment, the encoding of the residual signal may be performed in a transmitter 30 as an alternative to the encoding in the quantizer 28. Thus, the transmitter for example performs transform coded excitation (TCX), code excited linear prediction (CELP), or a lossless encoding such as entropy coding to encode the residual signal. Furthermore, the transmitter may be configured to transmit the second set of LPC coefficients. An optional receiver is the decoder 6. Therefore, the transmitter 30 may receive the residual signal 26 or the quantized residual signal 26. According to an embodiment, the transmitter may encode the residual signal or the quantized residual signal, at least if the quantized residual signal is not already encoded in the quantizer. After optionally encoding the residual signal or alternatively the quantized residual signal, the respective signal provided to the transmitter is transmitted as an encoded residual signal 32 or as an encoded and quantized residual signal 32. Furthermore, the transmitter may receive the second set of LPC coefficients 20b, optionally encode the same, for example with the same encoding method as used to encode the residual signal, and further transmit the encoded second set of LPC coefficients 20b, for example to the decoder 6, without transmitting the first set of LPC coefficients. In other words, the first set of LPC coefficients 20a does not need to be transmitted.
(24) The decoder 6 may further receive the encoded residual signal 32 or alternatively the encoded quantized residual signal 32 and, in addition to the respective residual signal, the encoded second set of LPC coefficients 20b. The decoder may decode each of the received signals and provide the decoded residual signal 26 to a synthesis filter. The synthesis filter may be the inverse of a linear predictive FIR (finite impulse response) filter having the second set of LPC coefficients as filter coefficients. In other words, a filter having the second set of LPC coefficients is inverted to form the synthesis filter of the decoder 6. The output of the synthesis filter, and therefore the output of the decoder, is the decoded audio signal 8.
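The inverse relationship between a linear predictive FIR filter and the corresponding synthesis filter can be sketched as follows. This is a minimal illustration with hypothetical function names (`fir_analysis`, `iir_synthesis`) and zero initial filter state; it only shows that the all-pole synthesis filter undoes FIR filtering with the same coefficients, and it does not model the complete encoder cascade or any quantization.

```python
def fir_analysis(x, a):
    """FIR filtering e[n] = sum_j a[j] * x[n-j], with a[0] == 1 and zero initial state."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


def iir_synthesis(e, a):
    """Inverse of fir_analysis: x[n] = e[n] - sum_{j>=1} a[j] * x[n-j]."""
    x = []
    for n in range(len(e)):
        feedback = sum(a[j] * x[n - j] for j in range(1, len(a)) if n - j >= 0)
        x.append(e[n] - feedback)
    return x
```

In the described system the decoder would run only the synthesis part, with the transmitted second set of LPC coefficients as `a`.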
(25) According to embodiments, the background noise estimator may estimate an autocorrelation 12 of the background noise of the audio signal as a representation of the background noise of the audio signal. Furthermore, the background noise reducer may generate the representation of the background noise reduced audio signal 16 by subtracting the autocorrelation of the background noise 12 from an autocorrelation of the audio signal 8, wherein the estimated autocorrelation 8 of the audio signal is the representation of the audio signal and wherein the representation of the background noise reduced audio signal 16 is an autocorrelation of the background noise reduced audio signal.
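The subtraction in the autocorrelation domain can be sketched as below. The biased per-frame estimator and the function names are assumptions for illustration; the specification does not prescribe how the autocorrelations are estimated.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of one frame x."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) / n
            for lag in range(max_lag + 1)]


def reduce_background_noise(r_signal, r_noise):
    """Subtract the noise autocorrelation from the signal autocorrelation, lag by lag."""
    return [rs - rv for rs, rv in zip(r_signal, r_noise)]
```

The result of `reduce_background_noise` plays the role of the representation of the background noise reduced audio signal, from which the second set of LPC coefficients would be derived.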
(26)
(27) Both
(28)
(29)
(30) As already described with respect to
(31) In other words with respect to
(32) Let $s_k = [s_k, s_{k-1}, \dots, s_{k-M}]^T$ be a vector of the input signal, where the superscript $T$ denotes the transpose. The residual can then be expressed as
$r_k = a^T s_k.$ (1)
(33) Given the autocorrelation matrix $R_{ss}$ of the speech signal vector $s_k$
$R_{ss} = E\{s_k s_k^T\},$ (2)
an estimate of the prediction filter of order $M$ can be given as [20]
$a = \sigma_e^2 R_{ss}^{-1} u,$ (3)
where $u = [1, 0, 0, \dots, 0]^T$ and the scalar prediction error $\sigma_e^2$ is chosen such that $\alpha_0 = 1$. Observe that the linear predictive filter $\alpha_n$ is a whitening filter, whereby $r_k$ is uncorrelated white noise. Moreover, the original signal $s_n$ can be reconstructed from the residual $r_n$ through IIR filtering with the predictor $\alpha_n$. The next step is to quantize vectors of the residual $r_k = [r_{kN}, r_{kN-1}, \dots, r_{kN-N+1}]^T$ with a vector quantizer to $\tilde{r}_k$, such that perceptual distortion is minimized. Let a vector of the output signal be $s_k = [s_{kN}, s_{kN-1}, \dots, s_{kN-N+1}]^T$ and $\tilde{s}_k$ its quantized counterpart, and $W$ a convolution matrix which applies perceptual weighting on the output. The perceptual optimization problem can then be written as
(34)
where $H$ is a convolution matrix corresponding to the impulse response of the predictor $\alpha_n$.
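Equation 4 is not reproduced above. A plausible form, assuming that the output vector is obtained from the residual vector by the convolution matrix $H$ (that is, $s_k = H r_k$ and $\tilde{s}_k = H \tilde{r}_k$), would be the following. It is a sketch built from the surrounding definitions, not the verbatim equation of the specification.

```latex
\min_{\tilde{r}_k} \left\| W \left( s_k - \tilde{s}_k \right) \right\|^2
  = \min_{\tilde{r}_k} \left\| W H \left( r_k - \tilde{r}_k \right) \right\|^2
```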
(35) The process of CELP type speech coding is depicted in
(36) Wiener Filtering
(37) In single channel speech enhancement, it is assumed that the signal $y_n$ is acquired, which is an additive mixture of the desired clean speech signal $s_n$ and some undesired interference $v_n$, that is
$y_n = s_n + v_n.$ (5)
(38) The goal of the enhancement process is to estimate the clean speech signal $s_n$, while having access only to the noisy signal $y_n$ and estimates of the correlation matrices
$R_{ss} = E\{s_k s_k^T\}$ and $R_{yy} = E\{y_k y_k^T\},$ (6)
(39) where $y_k = [y_k, y_{k-1}, \dots, y_{k-M}]^T$. Using a filter matrix $H$, the estimate of the clean speech signal $\hat{s}_n$ is defined as
$\hat{s}_k = H y_k.$ (7)
(40) The optimal filter in the minimum mean square error (MMSE) sense, known as the Wiener filter, can be readily derived as [12]
$H = R_{ss} R_{yy}^{-1}.$ (8)
(41) Usually, Wiener filtering is applied onto overlapping windows of the input signal and reconstructed using the overlap-add method [21, 12]. This approach is illustrated in Enhancement-block of
(42) To obtain such a connection, the estimated speech signal $\hat{s}_k$ is substituted into Eq. 1, whereby
$r_k = a^T \hat{s}_k = a^T H y_k = \sigma_e^2 u^T R_{ss}^{-1} R_{ss} R_{yy}^{-1} y_k = \sigma_e^2 u^T R_{yy}^{-1} y_k = \gamma a'^T y_k$ (9)
where $\gamma$ is a scaling coefficient and
$a' = \hat{\sigma}_e^2 R_{yy}^{-1} u$ (10)
is the optimal predictor for the noisy signal $y_n$. In other words, by filtering the noisy signal with $a'$, the (scaled) residual of the estimated clean signal is obtained. The scaling $\gamma$ is the ratio between the expected residual errors of the clean and noisy signals, $\sigma_e^2$ and $\hat{\sigma}_e^2$, respectively, that is, $\gamma = \sigma_e^2 / \hat{\sigma}_e^2$. This derivation thus shows that Wiener filtering and linear prediction are intimately related methods and in the following section, this connection will be used to develop a joint enhancement and coding method.
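The identity derived above, namely that applying the clean-signal predictor to the Wiener-filtered signal equals applying the scaled noisy-signal predictor to the noisy signal, can be checked numerically. The sketch below uses small hypothetical autocorrelation sequences and a plain Gaussian elimination; all names (`toeplitz`, `solve`, `predictor`, `gamma`) are chosen for this example, with `gamma` standing for the scaling coefficient.

```python
def toeplitz(r):
    """Symmetric Toeplitz matrix built from an autocorrelation sequence."""
    n = len(r)
    return [[r[abs(i - j)] for j in range(n)] for i in range(n)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            f = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def predictor(R):
    """Return (a, sigma2) with a = sigma2 * R^{-1} u, scaled so that a[0] == 1."""
    u = [1.0] + [0.0] * (len(R) - 1)
    x = solve(R, u)
    sigma2 = 1.0 / x[0]
    return [sigma2 * xi for xi in x], sigma2


r_ss = [1.0, 0.8, 0.5]   # hypothetical clean-speech autocorrelation
r_vv = [0.3, 0.0, 0.0]   # hypothetical white-noise autocorrelation
R_ss = toeplitz(r_ss)
R_yy = toeplitz([s + v for s, v in zip(r_ss, r_vv)])

a, sigma2 = predictor(R_ss)
a_noisy, sigma2_noisy = predictor(R_yy)
gamma = sigma2 / sigma2_noisy

# a^T H with H = R_ss R_yy^{-1}; both matrices are symmetric, so the
# transposed row vector equals R_yy^{-1} (R_ss a).
lhs = solve(R_yy, matvec(R_ss, a))
rhs = [gamma * c for c in a_noisy]
```

Under these assumptions `lhs` and `rhs` agree up to rounding, which is the numerical counterpart of the chain of equalities in the derivation.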
Incorporating Wiener Filtering into a CELP Codec
(43) An objective is to merge Wiener filtering and CELP codecs (described in section 3 and section 2) into a joint algorithm. By merging these algorithms, the delay of overlap-add windowing, which may be used by usual implementations of Wiener filtering, can be avoided and the computational complexity can be reduced.
(44) Implementation of the joint structure is then straightforward. It has been shown that the residual of the enhanced speech signal can be obtained by Eq. 9. The enhanced speech signal can therefore be reconstructed by IIR filtering the residual with the linear predictive model $\alpha_n$ of the clean signal.
(45) For quantization of the residual, Eq. 4 can be modified by replacing the clean signal $s_k$ with the estimated signal $\tilde{s}_k$ to obtain
(46)
(47) In other words, the objective function with the enhanced target signal $\tilde{s}_k$ remains the same as if having access to the clean input signal $s_k$.
(48) In conclusion, the only modification to standard CELP is to replace the analysis filter $a$ of the clean signal with that of the noisy signal, $a'$. The remaining parts of the CELP algorithm remain unchanged. The proposed approach is illustrated in
(49) It is clear that the proposed method can be applied in any CELP codec with minimal changes whenever noise attenuation is desired and when having access to an estimate of the autocorrelation of the clean speech signal $R_{ss}$. If an estimate of the clean speech signal autocorrelation is not available, it can be estimated using an estimate of the autocorrelation of the noise signal $R_{vv}$, by $R_{ss} \approx R_{yy} - R_{vv}$ or other common estimates.
(50) The method can be readily extended to scenarios such as multi-channel algorithms with beamforming, as long as an estimate of the clean signal is obtainable using time-domain filters.
(51) The advantage in computational complexity of the proposed method can be characterized as follows. Note that in the conventional approach the matrix filter $H$, given by Eq. 8, needs to be determined. The matrix inversion which may be used is of complexity $\mathcal{O}(M^3)$. However, in the proposed approach only Eq. 3 is to be solved for the noisy signal, which can be implemented with the Levinson-Durbin algorithm (or similar) with complexity $\mathcal{O}(N^2)$.
(52) Code Excited Linear Prediction
(53) In other words with respect to
(54) The linear predictive filter $a_s$ for one frame of the input signal $s$ can be obtained, minimizing
(55)
where $u = [1\ 0\ 0\ \dots\ 0]^T$. The solution follows as:
$a_s = \sigma_e^2 R_{ss}^{-1} u.$ (13)
(56) With the definition of the convolution matrix $A_s$, consisting of the filter coefficients of $a_s$
(57)
the residual signal can be obtained by multiplying the input speech frame with the convolution matrix $A_s$
$e_s = A_s \cdot s.$ (15)
(58) Windowing is here performed as in CELP-codecs by subtracting the zero-input response from the input signal and reintroducing it in the resynthesis [15].
(59) The multiplication in Equation 15 is identical to the convolution of the input signal with the prediction filter, and therefore corresponds to FIR filtering. The original signal can be reconstructed from the residual by a multiplication with the reconstruction filter $H_s$
$s = H_s \cdot e_s.$ (16)
where H.sub.s, consists of the impulse response =[1, .sub.1 . . . .sub.N1] of the prediction filter
(60)
such that this operation corresponds to IIR filtering.
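The relation between the two convolution matrices can be illustrated with a small sketch. It assumes that both matrices are lower-triangular Toeplitz matrices of frame length `n`, built from the predictor coefficients and from the impulse response of the corresponding all-pole filter, respectively; this structure and all function names are assumptions for illustration, and the zero-input-response windowing mentioned above is not modelled.

```python
def analysis_matrix(a, n):
    """Lower-triangular Toeplitz matrix A so that A times a frame is FIR filtering with a."""
    return [[a[i - j] if 0 <= i - j < len(a) else 0.0 for j in range(n)]
            for i in range(n)]


def impulse_response(a, n):
    """First n samples of the impulse response of the all-pole filter 1/A(z)."""
    h = []
    for k in range(n):
        x = 1.0 if k == 0 else 0.0
        h.append(x - sum(a[j] * h[k - j] for j in range(1, len(a)) if k - j >= 0))
    return h


def synthesis_matrix(a, n):
    """Lower-triangular Toeplitz matrix H built from the impulse response of 1/A(z)."""
    h = impulse_response(a, n)
    return [[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]
```

With these definitions the product of the synthesis matrix and the analysis matrix is the identity, so multiplying the residual by the synthesis matrix recovers the frame.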
(61) The residual vector is quantized by applying vector quantization. Therefore, the quantized vector $\hat{e}_s$ is chosen, minimizing the perceptual distance, in the 2-norm sense, to the desired reconstructed clean signal:
(62)
where $e_s$ is the unquantized residual and $W(z) = A(0.92z)$ is the perceptual weighting filter, as used in the AMR-WB speech codec [6].
Application of Wiener Filtering in a CELP Codec
(63) For the application of single-channel speech enhancement, it is assumed that the acquired microphone signal $y_n$ is an additive mixture of the desired clean speech signal $s_n$ and some undesired interference $v_n$, such that $y_n = s_n + v_n$. In the Z-domain, equivalently, $Y(z) = S(z) + V(z)$.
(64) By applying a Wiener filter $B(z)$ it is possible to reconstruct the speech signal $S(z)$ from the noisy observation $Y(z)$ by filtering, such that the estimated speech signal is $\hat{S}(z) := B(z)Y(z) \approx S(z)$. The minimum mean square solution for the Wiener filter follows as [12]
(65)
given the assumption that the speech and noise signals $s_n$ and $v_n$, respectively, are uncorrelated.
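For uncorrelated speech and noise, the textbook minimum mean square solution takes the following form. It is given here as a sketch of what the referenced Wiener solution (Eq. 19) is assumed to look like, using the Z-domain symbols introduced above.

```latex
B(z) = \frac{|S(z)|^2}{|S(z)|^2 + |V(z)|^2}
```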
(66) In a speech codec, an estimate of the power spectrum of the noisy signal $y_n$ is available in the form of the impulse response of the linear predictive model $|A_y(z)|^{-2}$. In other words, $|S(z)|^2 + |V(z)|^2 \approx \gamma |A_y(z)|^{-2}$, where $\gamma$ is a scaling coefficient. The noisy linear predictor can be calculated from the autocorrelation matrix $R_{yy}$ of the noisy signal as usual.
(67) Furthermore, the power spectrum of the clean speech signal $|S(z)|^2$ or, equivalently, the autocorrelation matrix $R_{ss}$ of the clean speech signal may be estimated. Enhancement algorithms often assume that the noise signal is stationary, whereby the autocorrelation of the noise signal $R_{vv}$ can be estimated from a non-speech frame of the input signal. The autocorrelation matrix of the clean speech signal $R_{ss}$ can then be estimated as $\hat{R}_{ss} = R_{yy} - R_{vv}$. Here it is advantageous to take the usual precautions to ensure that $\hat{R}_{ss}$ remains positive definite.
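The specification does not say which precautions are meant. One possible safeguard, shown here purely as an assumption, is to test positive definiteness through the Levinson-Durbin recursion (a symmetric Toeplitz matrix is positive definite exactly when the zero-lag value is positive and every reflection coefficient has magnitude below one) and to shrink the noise estimate until the test passes. The function names, the shrink factor and the fallback are hypothetical.

```python
def is_positive_definite(r):
    """Check the Toeplitz matrix of autocorrelation r via reflection coefficients."""
    if r[0] <= 0.0:
        return False
    order = len(r) - 1
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        if abs(k) >= 1.0:
            return False
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return True


def safe_clean_autocorrelation(r_yy, r_vv, shrink=0.5, max_tries=20):
    """Subtract the noise autocorrelation, shrinking it until the result is positive definite."""
    scale = 1.0
    for _ in range(max_tries):
        r_ss = [ry - scale * rv for ry, rv in zip(r_yy, r_vv)]
        if is_positive_definite(r_ss):
            return r_ss
        scale *= shrink
    # Fallback (assumption): no noise reduction at all
    return list(r_yy)
```

Other safeguards, such as flooring the zero-lag value or adding a small diagonal loading, would serve the same purpose.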
(68) Using the estimated autocorrelation matrix for clean speech $\hat{R}_{ss}$, the corresponding linear predictor can be determined, whose impulse response in the Z-domain is $\hat{A}_s^{-1}(z)$. Thus, $|S(z)|^2 \approx |\hat{A}_s(z)|^{-2}$ and Eq. 19 can be written as
(69)
(70) In other words, by filtering twice with the predictors of the noisy and clean signals, in FIR and IIR mode respectively, a Wiener estimate of the clean signal can be obtained.
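One way to read "filtering twice" is to apply the noisy-signal predictor twice in FIR mode and the clean-signal predictor twice in IIR mode, which yields a time-domain filter whose magnitude response is the squared magnitude of the noisy predictor divided by the squared magnitude of the clean predictor. The sketch below follows that reading; it is an assumption, the scaling coefficient is omitted, and the function names are hypothetical.

```python
def fir(x, a):
    """FIR filtering with coefficients a (a[0] == 1), zero initial state."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


def iir(x, a):
    """All-pole filtering with 1/A(z), zero initial state."""
    y = []
    for n in range(len(x)):
        y.append(x[n] - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y


def wiener_cascade(y, a_noisy, a_clean):
    """Cascade of time-domain filters realizing A_y(z)^2 / A_s(z)^2 (scaling omitted)."""
    stage = fir(fir(y, a_noisy), a_noisy)        # noisy predictor, FIR mode, twice
    return iir(iir(stage, a_clean), a_clean)     # clean predictor, IIR mode, twice
```

When both predictors coincide, which corresponds to the noise-free case, the cascade leaves the signal unchanged.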
(71) The convolution matrices corresponding to FIR filtering with predictors $\hat{A}_s(z)$ and $A_y(z)$ may be denoted by $A_s$ and $A_y$, respectively. Similarly, let $H_s$ and $H_y$ be the respective convolution matrices corresponding to predictive filtering (IIR). Using these matrices, conventional CELP coding can be illustrated with a flow diagram as in
(72) The conventional approach to combining enhancement with coding is illustrated in
(73) Finally, in the proposed approach Wiener filtering is combined with CELP type speech codecs. Comparing the cascaded approach from
(74)
(75) Thus, this approach jointly minimizes the distance between the clean estimate and the quantized signal, whereby a joint minimization of the interference and the quantization noise in the perceptual domain is feasible.
(76) The performance of the joint speech coding and enhancement approach was evaluated using both objective and subjective measures. In order to isolate the performance of the new method, a simplified CELP codec was used, where only the residual signal was quantized, but the delay and gain of the long term prediction (LTP), the linear predictive coding (LPC) and the gain factors were not quantized. The residual was quantized using a pair-wise iterative method, where two pulses are added consecutively by trying them on every position, as described in [17]. Moreover, to avoid any influence of estimation algorithms, the correlation matrix of the clean speech signal $R_{ss}$ was assumed to be known in all simulated scenarios. With the assumption that the speech and the noise signal are uncorrelated, it holds that $R_{ss} = R_{yy} - R_{vv}$. In any practical application the noise correlation matrix $R_{vv}$ or alternatively the clean speech correlation matrix $R_{ss}$ has to be estimated from the acquired microphone signal. A common approach is to estimate the noise correlation matrix in speech breaks, assuming that the interference is stationary.
(77) The evaluated scenario consisted of a mixture of the desired clean speech signal and additive interference. Two types of interferences have been considered: stationary white noise and a segment of a recording of car noise from the Civilisation Soundscapes Library [18]. Vector quantization of the residual was performed with a bitrate of 2.8 kbit/s and 7.2 kbit/s, corresponding to an overall bitrate of 7.2 kbit/s and 13.2 kbit/s respectively for an AMR-WB codec [6]. A sampling-rate of 12.8 kHz was used for all simulations.
(78) The enhanced and coded signals were evaluated using both objective and subjective measures: a listening test was conducted, and a perceptual magnitude signal-to-noise ratio (SNR) was calculated, as defined in Equation 22 and Equation 23. This perceptual magnitude SNR was used because the joint enhancement process has no influence on the phase of the filters, as both the synthesis and the reconstruction filters are bound to the constraint of minimum phase filters, as per design of prediction filters.
(79) With the definition of the Fourier transform as operator $\mathcal{F}(\cdot)$, the absolute spectral values of the reconstructed clean reference and the estimated clean signal in the perceptual domain follow as:
$S = |\mathcal{F}(W H_s e_k)|$ and $\hat{S} = |\mathcal{F}(W H_s \hat{e}_k)|.$ (22)
(80) The definition of the modified perceptual signal to noise ratio (PSNR) follows as:
(81)
(82) For the subjective evaluation, speech items were used from the test set used for the standardization of USAC [8], corrupted by white noise and car noise, as described above. A Multiple Stimuli with Hidden Reference and Anchor (MUSHRA) [19] listening test was conducted with 14 participants, using STAX electrostatic headphones in a soundproof environment. The results of the listening test are illustrated in
(83) The absolute MUSHRA test results in
(84) To obtain a more detailed comparison of the joint and the pre-enhanced methods, the differential MUSHRA scores are presented in
(85) In other words, a method for joint speech enhancement and coding is shown, which allows minimization of overall interference and quantization noise. In contrast, conventional approaches apply enhancement and coding in cascaded processing steps. Joining both processing steps is also attractive in terms of computational complexity, since repeated windowing and filtering operations can be omitted.
(86) CELP type speech codecs are designed to offer a very low delay and therefore avoid an overlap of processing windows to future processing windows. In contrast, conventional enhancement methods applied in the frequency domain rely on overlap-add windowing, which introduces an additional delay corresponding to the overlap length. The joint approach does not require overlap-add windowing, but uses the windowing scheme as applied in speech codecs [15], thereby avoiding the increase in algorithmic delay.
(87) A known issue with the proposed method is that, in contrast to conventional spectral Wiener filtering where the signal phase is left intact, the proposed method applies time-domain filters, which do modify the phase. Such phase modifications can be readily treated by application of suitable all-pass filters. However, since no perceptual degradation attributable to phase modifications was noticed, such all-pass filters were omitted to keep computational complexity low. Note, however, that in the objective evaluation, perceptual magnitude SNR was measured, to allow a fair comparison of methods. This objective measure shows that the proposed method is on average three dB better than cascaded processing.
(88) The performance advantage of the proposed method was further confirmed by the results of a MUSHRA listening test, which show an average improvement of 6.4 points. These results demonstrate that application of joint enhancement and coding is beneficial for the overall system in terms of both quality and computational complexity, while maintaining the low algorithmic delay of CELP speech codecs.
(89)
(90) It is to be understood that in this specification, the signals on lines are sometimes named by the reference numerals for the lines or are sometimes indicated by the reference numerals themselves, which have been attributed to the lines. Therefore, the notation is such that a line having a certain signal is indicating the signal itself. A line can be a physical line in a hardwired implementation. In a computerized implementation, however, a physical line does not exist, but the signal represented by the line is transmitted from one calculation module to the other calculation module.
(91) Although the present invention has been described in the context of block diagrams where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps where these steps stand for the functionalities performed by corresponding logical or physical hardware blocks.
(92) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
(93) The inventive transmitted or encoded signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
(94) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
(95) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(96) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
(97) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
(98) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(99) A further embodiment of the inventive method is, therefore, a data carrier (or a non-transitory storage medium such as a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
(100) A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the Internet.
(101) A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
(102) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(103) A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
(104) In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
(105) While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.