Model Based Prediction in a Critically Sampled Filterbank

Abstract

The present document relates to audio source coding systems. In particular, the present document relates to audio source coding systems which make use of linear prediction in combination with a filterbank. A method for estimating a first sample (615) of a first subband signal in a first subband of an audio signal is described. The first subband signal of the audio signal is determined using an analysis filterbank (612) comprising a plurality of analysis filters which provide a plurality of subband signals in a plurality of subbands from the audio signal, respectively. The method comprises determining a model parameter (613) of a signal model; determining a prediction coefficient to be applied to a previous sample (614) of a first decoded subband signal derived from the first subband signal, based on the signal model, based on the model parameter (613) and based on the analysis filterbank (612); wherein a time slot of the previous sample (614) is prior to a time slot of the first sample (615); and determining an estimate of the first sample (615) by applying the prediction coefficient to the previous sample (614).

Claims

1. A method, performed by an audio signal processing device, for determining an estimate of a sample of a subband signal from two or more previous samples of the subband signal, wherein the subband signal corresponds to one of a plurality of subbands of a subband-domain representation of an audio signal, the method comprising determining signal model data comprising a model parameter; determining a first prediction coefficient to be applied to a first previous sample of the subband signal; wherein the first prediction coefficient is determined in response to the model parameter using a first lookup table and/or a first analytical function; determining a second prediction coefficient to be applied to a second previous sample of the subband signal; wherein a time slot of the second previous sample immediately precedes a time slot of the first previous sample; wherein the second prediction coefficient is determined in response to the model parameter using a second lookup table and/or a second analytical function; and determining the estimate of the sample by applying the first prediction coefficient to the first previous sample and by applying the second prediction coefficient to the second previous sample; wherein the method is implemented, at least in part, by one or more processors of the audio signal processing device.

2. An audio signal processing device configured to determine an estimate of a sample of a subband signal from two or more previous samples of the subband signal, wherein the subband signal corresponds to one of a plurality of subbands of a subband-domain representation of an audio signal; wherein the audio signal processing device comprises a predictor calculator configured to determine signal model data comprising a model parameter; determine a first prediction coefficient to be applied to a first previous sample of the subband signal; wherein the first prediction coefficient is determined in response to the model parameter using a first lookup table and/or a first analytical function; and determine a second prediction coefficient to be applied to a second previous sample of the subband signal; wherein a time slot of the second previous sample immediately precedes a time slot of the first previous sample; wherein the second prediction coefficient is determined in response to the model parameter using a second lookup table and/or a second analytical function; and a subband predictor configured to determine the estimate of the sample by applying the first prediction coefficient to the first previous sample and by applying the second prediction coefficient to the second previous sample; wherein the first analytical function and the second analytical function are different, and one or more of the predictor calculator and the subband predictor are implemented, at least in part, by one or more processors of the audio signal processing device.

3. A non-transitory computer-readable storage medium comprising a sequence of instructions which, when executed by a computer, cause the computer to perform the method of claim 1.

Description

SHORT DESCRIPTION OF THE FIGURES

[0044] The present invention is described below by way of illustrative examples, not limiting the scope or spirit of the invention, with reference to the accompanying drawings, in which:

[0045] FIG. 1 depicts the block diagram of an example audio decoder applying linear prediction in a filterbank domain (i.e. in a subband domain);

[0046] FIG. 2 shows example prediction masks in a time frequency grid;

[0047] FIG. 3 illustrates example tabulated data for a sinusoidal model based predictor calculator;

[0048] FIG. 4 illustrates example noise shaping resulting from in-band subband prediction;

[0049] FIG. 5 illustrates example noise shaping resulting from cross-band subband prediction;

[0050] FIG. 6a depicts an example two-dimensional quantization grid underlying the tabulated data for a periodic model based predictor calculation;

[0051] FIG. 6b illustrates the use of different prediction masks for different ranges of signal periodicities; and

[0052] FIGS. 7a and 7b show flow charts of example encoding and decoding methods using model based subband prediction.

DETAILED DESCRIPTION

[0053] The below-described embodiments are merely illustrative of the principles of the present invention for model based prediction in a critically sampled filterbank. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

[0054] FIG. 1 depicts the block diagram of an example audio decoder 100 applying linear prediction in a filterbank domain (also referred to as subband domain). The audio decoder 100 receives a bit stream comprising information regarding a prediction error signal (also referred to as the residual signal) and possibly information regarding a description of a predictor used by a corresponding encoder to determine the prediction error signal from an original input audio signal. The information regarding the prediction error signal may relate to subbands of the input audio signal and the information regarding a description of the predictor may relate to one or more subband predictors.

[0055] Given the received bit stream information, the inverse quantizer 101 may output samples 111 of the prediction error subband signals. These samples may be added to the output 112 of the subband predictor 103 and the sum 113 may be passed to a subband buffer 104 which keeps a record of previously decoded samples 113 of the subbands of the decoded audio signal. The output of the subband predictor 103 may be referred to as the estimated subband signals 112. The decoded samples 113 of the subbands of the decoded audio signal may be submitted to a synthesis filterbank 102 which converts the subband samples to the time domain, thereby yielding time domain samples 114 of the decoded audio signal.

[0056] In other words, the decoder 100 may operate in the subband domain. In particular, the decoder 100 may determine a plurality of estimated subband signals 112 using the subband predictor 103. Furthermore, the decoder 100 may determine a plurality of residual subband signals 111 using the inverse quantizer 101. Respective pairs of the plurality of estimated subband signals 112 and the plurality of residual subband signals 111 may be added to yield a corresponding plurality of decoded subband signals 113. The plurality of decoded subband signals 113 may be submitted to a synthesis filterbank 102 to yield the time domain decoded audio signal 114.

[0057] In an embodiment of the subband predictor 103, a given sample of a given estimated subband signal 112 may be obtained by a linear combination of subband samples in the buffer 104 which correspond to a different time and to a different frequency (i.e. different subband) than the given sample of the given estimated subband signal 112. In other words, a sample of an estimated subband signal 112 at a first time instant and in a first subband may be determined based on one or more samples of the decoded subband signals 113 which relate to a second time instant (different from the first time instant) and which relate to a second subband (different from the first subband). The collection of prediction coefficients and their attachment to a time and frequency mask may define the predictor 103, and this information may be furnished by the predictor calculator 105 of the decoder 100. The predictor calculator 105 outputs the information defining the predictor 103 by converting signal model data included in the received bit stream. An additional gain may be transmitted which modifies the scaling of the output of the predictor 103. In an embodiment of the predictor calculator 105, the signal model data is provided in the form of an efficiently parametrized line spectrum, wherein each line in the parametrized line spectrum, or a group of subsequent lines of the parametrized line spectrum, is used to point to tabulated values of predictor coefficients. As such, the signal model data provided within the received bit stream may be used to identify entries within a pre-determined look-up table, wherein the entries from the look-up table provide one or more values for the predictor coefficients (also referred to as the prediction coefficients) to be used by the predictor 103. The method applied for the table look-up may depend on the trade-off between complexity and memory requirements. For instance, a nearest neighbor type look-up may be used to achieve the lowest complexity, whereas an interpolating look-up method may provide similar performance with a smaller table size.
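The two look-up strategies mentioned in paragraph [0057] can be sketched as follows. This is a minimal illustration under stated assumptions, not the tabulation used by the described decoder: the uniform grid, the table contents and the helper names `lookup_nearest` and `lookup_linear` are hypothetical, and a real table entry would typically hold a set of prediction coefficients rather than a single scalar.

```python
import math

def lookup_nearest(table, grid_min, step, x):
    """Nearest-neighbor look-up (hypothetical helper): cheapest, but needs a dense table."""
    i = math.floor((x - grid_min) / step + 0.5)
    i = min(max(i, 0), len(table) - 1)  # clamp to the tabulated range
    return table[i]

def lookup_linear(table, grid_min, step, x):
    """Linear interpolation between the two neighboring entries (hypothetical helper)."""
    pos = (x - grid_min) / step
    pos = min(max(pos, 0.0), len(table) - 1.0)  # clamp to the tabulated range
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return (1.0 - frac) * table[i] + frac * table[i + 1]

# Hypothetical table sampled at the offsets -0.5, -0.25, 0.0, 0.25, 0.5.
TABLE = [0.0, 1.0, 2.0, 3.0, 4.0]
print(lookup_nearest(TABLE, -0.5, 0.25, 0.1))   # 2.0
print(lookup_linear(TABLE, -0.5, 0.25, 0.125))  # 2.5
```

The interpolating variant recovers intermediate values from a coarser grid, which is the memory versus complexity trade-off referred to above.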

[0058] As indicated above, the received bit stream may comprise one or more explicitly transmitted gains (or explicitly transmitted indications of gains). The gains may be applied as part of or after the predictor operation. The one or more explicitly transmitted gains may be different for different subbands. The explicitly transmitted (indications of) additional gains are provided in addition to one or more model parameters which are used to determine the prediction coefficients of the predictor 103. As such, the additional gains may be used to scale the prediction coefficients of the predictor 103.

[0059] FIG. 2 shows example prediction mask supports in a time frequency grid. The prediction mask supports may be used for predictors 103 operating in a filterbank with a uniform time frequency resolution, such as a cosine modulated filterbank (e.g. an MDCT filterbank). The notation is illustrated by diagram 201, in which a darkly shaded target subband sample 211 is the output of a prediction based on a lightly shaded source subband sample 212. In the diagrams 202-205, the collection of lightly shaded subband samples indicates the predictor mask support. The combination of source subband samples 212 and target subband samples 211 will be referred to as a prediction mask 201. A time-frequency grid may be used to arrange subband samples in the vicinity of the target subband sample. The time slot index increases from left to right and the subband frequency index increases from bottom to top. FIG. 2 shows example cases of prediction masks and predictor mask supports; various other prediction masks and predictor mask supports may be used. The example prediction masks are:
[0060] Prediction mask 202 defines in-band prediction of an estimated subband sample 221 at time instant k from two previous decoded subband samples 222 at time instants k−1 and k−2.
[0061] Prediction mask 203 defines cross-band prediction of an estimated subband sample 231 at time instant k and in subband n based on three previous decoded subband samples 232 at time instant k−1 and in subbands n−1, n, n+1.
[0062] Prediction mask 204 defines cross-band prediction of three estimated subband samples 241 at time instant k and in three different subbands n−1, n, n+1 based on three previous decoded subband samples 242 at time instant k−1 and in subbands n−1, n, n+1. The cross-band prediction may be performed such that each estimated subband sample 241 is determined based on all three previous decoded subband samples 242 in the subbands n−1, n, n+1.
[0063] Prediction mask 205 defines cross-band prediction of an estimated subband sample 251 at time instant k and in subband n based on twelve previous decoded subband samples 252 at time instants k−2, k−3, k−4, k−5 and in subbands n−1, n, n+1.
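As an illustration of the notation of FIG. 2, the masks 202-205 may be written as lists of (subband offset, time slot offset) pairs relative to the target position (n, k). The following sketch assumes a hypothetical buffer layout `buffer[subband, slot]` and a coefficient matrix with one row per target sample and one column per source sample; the names `MASKS` and `apply_mask` are illustrative only.

```python
import numpy as np

# (subband offset, time-slot offset) pairs relative to the target position (n, k):
# "targets" are estimated samples, "sources" are previously decoded samples.
MASKS = {
    202: {"targets": [(0, 0)], "sources": [(0, -1), (0, -2)]},
    203: {"targets": [(0, 0)], "sources": [(d, -1) for d in (-1, 0, 1)]},
    204: {"targets": [(d, 0) for d in (-1, 0, 1)],
          "sources": [(d, -1) for d in (-1, 0, 1)]},
    205: {"targets": [(0, 0)],
          "sources": [(d, -j) for j in (2, 3, 4, 5) for d in (-1, 0, 1)]},
}

def apply_mask(buffer, n, k, mask, C):
    """Estimate the target samples of `mask` around (n, k).

    buffer[subband, slot] holds decoded subband samples; C has one row per
    target and one column per source (hypothetical coefficient layout).
    """
    src = np.array([buffer[n + dn, k + dk] for dn, dk in mask["sources"]])
    return np.asarray(C) @ src
```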

[0064] FIG. 3 illustrates tabulated data for a sinusoidal model based predictor calculator 105 operating in a cosine modulated filterbank. The prediction mask support is that of diagram 204. For a given frequency parameter, the subband with the nearest subband center frequency may be selected as the central target subband. The difference between the frequency parameter and the center frequency of the central target subband may be computed in units of the frequency spacing of the filterbank (bins). This gives a value between −0.5 and 0.5, which may be rounded to the nearest available entry in the tabulated data, depicted by the abscissas of the nine graphs 301 of FIG. 3. This produces a 3×3 matrix of coefficients which is to be applied to the most recent values of the plurality of decoded subband signals 113 in the subband buffer 104 of the target subband and its two adjacent subbands. The resulting 3×1 vector constitutes the contribution of the subband predictor 103 to these three subbands for the given frequency parameter. The process may be repeated in an additive fashion for all the sinusoidal components in the signal model.
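A minimal sketch of the look-up procedure of paragraph [0064] for the mask 204 is given below. It assumes that subband n is centered at n + 0.5 in units of the subband spacing (consistent with the modulation $\cos[\pi(n+\tfrac{1}{2})(\cdot)]$ of a cosine modulated filterbank), that the tabulated data is available as a hypothetical array `table[g]` of 3×3 matrices on a grid of relative offsets, and that nearest-entry rounding is used; edge subbands are simply skipped. The function names are illustrative.

```python
import numpy as np

def central_subband_and_offset(freq_bins):
    """Return the nearest subband n and the offset in [-0.5, 0.5) from its center.

    Assumes subband n is centered at n + 0.5 (frequency in units of the subband spacing).
    """
    n = int(np.floor(freq_bins))
    return n, freq_bins - (n + 0.5)

def predict_sinusoids(freqs_bins, prev_slot, grid, table):
    """Estimate slot k of all subbands from slot k-1 for a sum of sinusoids (mask 204).

    prev_slot[n] is the decoded sample of subband n at slot k-1, grid holds the
    tabulated offsets and table[g] is the (hypothetical) 3x3 matrix for grid[g].
    """
    estimate = np.zeros(len(prev_slot))
    for f in freqs_bins:
        n, offset = central_subband_and_offset(f)
        if n < 1 or n + 1 >= len(prev_slot):
            continue  # edge subbands would need special handling; skipped here
        g = int(np.argmin(np.abs(np.asarray(grid) - offset)))  # nearest tabulated entry
        estimate[n - 1:n + 2] += table[g] @ prev_slot[n - 1:n + 2]  # additive over sinusoids
    return estimate
```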

[0065] In other words, FIG. 3 illustrates an example of a model-based description of a subband predictor. It is assumed that the input audio signal comprises one or more sinusoidal components at fundamental frequencies $\Omega_0, \Omega_1, \ldots, \Omega_{M-1}$. For each of the one or more sinusoidal components, a subband predictor using a pre-determined prediction mask (e.g. the prediction mask 204) may be determined. A fundamental frequency Ω of the input audio signal may lie within one of the subbands of the filterbank. This subband may be referred to as the central subband for this particular fundamental frequency Ω. The fundamental frequency Ω may be expressed as a value ranging from −0.5 to 0.5 relative to the center frequency of the central subband, in units of the subband spacing. An audio encoder may transmit information regarding the fundamental frequency Ω to the decoder 100. The predictor calculator 105 of the decoder 100 may use the three-by-three array of graphs of FIG. 3 to determine a three-by-three matrix of prediction coefficients by reading off the coefficient value 302 for the relative frequency value 303 of the fundamental frequency Ω. This means that the coefficients of a subband predictor 103 using the prediction mask 204 can be determined using only the received information regarding the particular fundamental frequency Ω. In other words, by modeling an input audio signal using e.g. a model of one or more sinusoidal components, a bit-rate efficient description of a subband predictor can be provided.

[0066] FIG. 4 illustrates example noise shaping resulting from in-band subband prediction in a cosine modulated filterbank. The signal model used for performing in-band subband prediction is a second order autoregressive stochastic process with a peaky resonance, as described by a second order differential equation driven by random Gaussian white noise. The curve 401 shows the measured magnitude spectrum for a realization of the process. For this example, the prediction mask 202 of FIG. 2 is applied. That is, the predictor calculator 105 furnishes the subband predictor 103 for a given target subband 221 based on previous subband samples 222 in the same subband only. Replacing the inverse quantizer 101 by a Gaussian white noise generator leads to a synthesized magnitude spectrum 402. As can be seen, strong alias artifacts occur in the synthesis, as the synthesized spectrum 402 comprises peaks which do not coincide with the original spectrum 401.

[0067] FIG. 5 illustrates the example noise shaping resulting from cross-band subband prediction. The setting is the same as that of FIG. 4, except that the prediction mask 203 is applied. Hence, the predictor calculator 105 furnishes the predictor 103 for a given target subband 231 based on previous subband samples 232 in the target subband and in its two adjacent subbands. As can be seen from FIG. 5, the spectrum 502 of the synthesized signal substantially coincides with the spectrum 501 of the original signal, i.e. the alias problems are substantially suppressed when using cross-band subband prediction.

[0068] As such, FIGS. 4 and 5 illustrate that when using cross-band subband prediction, i.e. when predicting a subband sample based on previous subband samples of one or more adjacent subbands, aliasing artifacts caused by subband prediction can be reduced. As a result, subband prediction may also be applied in the context of low bit rate audio encoders without the risk of causing audible aliasing artifacts. The use of cross-band subband prediction typically increases the number of prediction coefficients. However, as shown in the context of FIG. 3, the use of models for the input audio signal (e.g. the use of a sinusoidal model or a periodic model) allows for an efficient description of the subband predictor, thereby enabling the use of cross-band subband prediction for low bit rate audio coders.

[0069] In the following, a description of the principles of model based prediction in a critically sampled filterbank will be outlined with reference to FIGS. 1-6, and by adding appropriate mathematical terminology.

[0070] A possible signal model underlying linear prediction is that of a zero-mean weakly stationary stochastic process x(t) whose statistics are determined by its autocorrelation function r(τ)=E{x(t)x(t−τ)}. As a good model for the critically sampled filterbanks considered here, let $\{w_\alpha : \alpha \in A\}$ be a collection of real valued synthesis waveforms $w_\alpha(t)$ constituting an orthonormal basis. In other words, the filterbank may be represented by the waveforms $\{w_\alpha : \alpha \in A\}$. Subband samples of a time domain signal s(t) are obtained by inner products

[00001] $\langle s, w_\alpha \rangle = \int_{-\infty}^{\infty} s(t)\, w_\alpha(t)\, dt, \qquad (1)$

and the signal is recovered by

[00002] $s(t) = \sum_{\alpha \in A} \langle s, w_\alpha \rangle\, w_\alpha(t). \qquad (2)$

[0071] The subband samples $\langle x, w_\alpha \rangle$ of the process x(t) are random variables, whose covariance matrix $R_{\alpha\beta}$ is determined by the autocorrelation function r(τ) as follows

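$R_{\alpha\beta} = E\{\langle x, w_\alpha \rangle \langle x, w_\beta \rangle\} = \langle W_{\alpha\beta}, r \rangle = \int_{-\infty}^{\infty} W_{\alpha\beta}(\tau)\, r(\tau)\, d\tau, \qquad (3)$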

where $W_{\alpha\beta}(\tau)$ is the cross correlation of two synthesis waveforms

[00003] $W_{\alpha\beta}(\tau) = \int_{-\infty}^{\infty} w_\alpha(t)\, w_\beta(t-\tau)\, dt. \qquad (4)$
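The relation between equations (3) and (4) can be checked in a discrete-time analogue, in which the waveforms are finite vectors, the integrals become sums and r is evaluated at integer lags. The sketch below is an illustration under these assumptions rather than part of the described method: it computes the covariance matrix once directly as a quadratic form and once via the waveform cross correlations. The function names are hypothetical.

```python
import numpy as np

def covariance_direct(waveforms, r):
    """R_ab = sum_t sum_s w_a[t] w_b[s] r[t - s] (discrete analogue of equation (3))."""
    t = np.arange(waveforms.shape[1])
    return waveforms @ r(t[:, None] - t[None, :]) @ waveforms.T

def covariance_via_crosscorrelation(waveforms, r):
    """R_ab = sum_tau W_ab[tau] r[tau], with W_ab the discrete analogue of equation (4)."""
    num, length = waveforms.shape
    lags = np.arange(-(length - 1), length)  # lag order of np.correlate(..., mode="full")
    r_lags = r(lags)
    R = np.empty((num, num))
    for a in range(num):
        for b in range(num):
            # W_ab[tau] = sum_t w_a[t] w_b[t - tau]
            W_ab = np.correlate(waveforms[a], waveforms[b], mode="full")
            R[a, b] = W_ab @ r_lags
    return R
```

For white noise (r equal to one at lag zero and zero elsewhere) and orthonormal waveforms, both routes give the identity matrix, as expected for an orthonormal basis.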

[0072] A linear prediction of the subband sample $\langle x, w_\alpha \rangle$ from a collection of decoded subband samples $\{\langle x, w_\beta \rangle : \beta \in B\}$ is defined by

[00004] $\sum_{\beta \in B} c_\beta \langle x, w_\beta \rangle. \qquad (5)$

[0073] In equation (5), the set B defines the source subband samples, i.e. the set B defines the prediction mask support. The mean value of the squared prediction error is given by

[00005] $E\left\{\left(\sum_{\beta \in B} c_\beta \langle x, w_\beta \rangle - \langle x, w_\alpha \rangle\right)^{2}\right\} = \sum_{\beta,\gamma \in B} c_\gamma R_{\gamma\beta} c_\beta - 2 \sum_{\beta \in B} R_{\alpha\beta} c_\beta + R_{\alpha\alpha}, \qquad (6)$

and the least mean square error (MSE) solution is obtained by solving the normal equations for the prediction coefficients $c_\beta$,

[00006] $\sum_{\beta \in B} R_{\gamma\beta} c_\beta = R_{\gamma\alpha}, \quad \gamma \in B. \qquad (7)$

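[0074] When the prediction coefficients satisfy equation (7), the right hand side of equation (6) reduces to $R_{\alpha\alpha} - \sum_{\beta} R_{\alpha\beta} c_\beta$. The normal equations (7) may be solved in an efficient manner, e.g. using the Levinson-Durbin algorithm when the matrix $R_{\gamma\beta}$ is Toeplitz, as is the case for in-band prediction from consecutive time slots of a single subband.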
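For a given covariance matrix, the normal equations (7) and the resulting minimum error $R_{\alpha\alpha} - \sum_\beta R_{\alpha\beta} c_\beta$ can be computed as sketched below. A general linear solver from NumPy is used here instead of the Levinson-Durbin recursion, and the function names are hypothetical. The second function evaluates the right-hand side of equation (6) for arbitrary coefficients, which makes it possible to check that the solution of (7) indeed minimizes it.

```python
import numpy as np

def solve_normal_equations(R, alpha, B):
    """Solve sum_b R[g, b] c[b] = R[g, alpha] for g in B (equation (7)).

    Returns the coefficients and the minimum error R[alpha, alpha] - sum_b R[alpha, b] c[b].
    """
    B = list(B)
    c = np.linalg.solve(R[np.ix_(B, B)], R[B, alpha])
    return c, float(R[alpha, alpha] - R[alpha, B] @ c)

def prediction_mse(R, alpha, B, c):
    """Right-hand side of equation (6) for arbitrary coefficients c."""
    B = list(B)
    return float(c @ R[np.ix_(B, B)] @ c - 2 * R[alpha, B] @ c + R[alpha, alpha])
```

For the Toeplitz case, `scipy.linalg.solve_toeplitz`, which is based on the Levinson recursion, could replace `np.linalg.solve`.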

[0075] It is proposed in the present document to transmit a parametric representation of a signal model from which the prediction coefficients $\{c_\beta : \beta \in B\}$ can be derived in the predictor calculator 105. For example, the signal model may provide a parametric representation of the autocorrelation function r(τ) of the signal model. The decoder 100 may derive the autocorrelation function r(τ) using the received parametric representation and may combine the autocorrelation function r(τ) with the synthesis waveform cross correlation $W_{\alpha\beta}(\tau)$ in order to derive the covariance matrix entries required for the normal equations (7). These equations may then be solved to obtain the prediction coefficients.

[0076] In other words, a to-be-encoded input audio signal may be modeled by a process x(t) which can be described using a limited number of model parameters. In particular, the modeling process x(t) may be such that its autocorrelation function r(τ)=E{x(t)x(t−τ)} can be described using a limited number of parameters. The limited number of parameters for describing the autocorrelation function r(τ) may be transmitted to the decoder 100. The predictor calculator 105 of the decoder 100 may determine the autocorrelation function r(τ) from the received parameters and may use equation (3) to determine the covariance matrix $R_{\alpha\beta}$ of the subband signals, from which the normal equations (7) can be set up. The normal equations (7) can then be solved by the predictor calculator 105, thereby yielding the prediction coefficients $c_\beta$.
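The decoder-side chain of paragraphs [0075] and [0076] (model parameters, then r(τ), then the covariance matrix of equation (3), then the normal equations (7)) can be sketched end to end in discrete time. The example below assumes a hypothetical discrete stand-in for the relaxed periodic model of equation (14), with an integer period T and Kronecker deltas in place of Dirac impulses, and takes the sampled synthesis waveforms as rows of a matrix; all names are illustrative.

```python
import numpy as np

def relaxed_periodic_autocorr(T, rho):
    """Hypothetical discrete stand-in for equation (14): r[tau] = rho**|k| if tau = k T, else 0."""
    def r(tau):
        tau = np.asarray(tau)
        return np.where(tau % T == 0, rho ** (np.abs(tau) // T), 0.0)
    return r

def model_based_coefficients(waveforms, r, alpha, B):
    """r(tau) -> covariance matrix (equation (3)) -> normal equations (equation (7)).

    waveforms[a] is a sampled synthesis waveform; r is evaluated at integer lags.
    """
    t = np.arange(waveforms.shape[1])
    R = waveforms @ r(t[:, None] - t[None, :]) @ waveforms.T  # sum_t sum_s w_a[t] w_b[s] r[t - s]
    B = list(B)
    return np.linalg.solve(R[np.ix_(B, B)], R[B, alpha])
```

With unit impulses at t = 0, T and 2T as "waveforms", predicting the sample at 2T from the samples at 0 and T yields the coefficients (0, ρ), i.e. the single delay loop of equation (15).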

[0077] In the following, example signal models are described which may be used to apply the above described model based prediction scheme in an efficient manner. The signal models described in the following are typically highly relevant for coding audio signals, e.g. for coding speech signals.

[0078] An example of a signal model is given by the sinusoidal process

x(t)=a cos(ξt)+b sin(ξt), (8)

where the random variables a,b are uncorrelated, have zero mean, and variance one. The autocorrelation function of this sinusoidal process is given by

r(τ)=cos(ξτ). (9)

[0079] A generalization of such a sinusoidal process is a multi-sine model comprising a set of (angular) frequencies S, i.e. comprising a plurality of different (angular) frequencies ξ,

[00007] $x(t) = \sum_{\xi \in S} \left( a_\xi \cos(\xi t) + b_\xi \sin(\xi t) \right). \qquad (10)$

[0080] Assuming that all the random variables $a_\xi, b_\xi$ are pairwise uncorrelated, have zero mean, and variance one, the multi-sine process has the autocorrelation function

[00008] $r(\tau) = \sum_{\xi \in S} \cos(\xi \tau). \qquad (11)$

[0081] The power spectral density (PSD) of the multi-sine process (which corresponds to the Fourier transform of the autocorrelation function) is the line spectrum

[00009] $P(\omega) = \frac{1}{2} \sum_{\xi \in S} \left( \delta(\omega - \xi) + \delta(\omega + \xi) \right). \qquad (12)$

[0082] Numerical considerations can lead to replacing the pure multi-sine process, with the autocorrelation function of equation (11), by a relaxed multi-sine process having the autocorrelation function

[00010] $r(\tau) = \exp(-\varepsilon |\tau|) \sum_{\xi \in S} \cos(\xi \tau)$

where ε > 0 is a relatively small relaxation parameter. The latter model leads to a strictly positive PSD without impulse functions.
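The autocorrelation functions of equations (9) and (11), and of the relaxed variant, can be evaluated and compared with a Monte Carlo estimate, as sketched below. The sketch draws the amplitudes $a_\xi, b_\xi$ as independent standard Gaussian variables, which is one admissible choice (the text only requires them to be uncorrelated with zero mean and unit variance); the function names are hypothetical.

```python
import numpy as np

def multisine_autocorr(tau, freqs, eps=0.0):
    """Equation (11), or its relaxed version exp(-eps |tau|) * sum cos(xi tau) for eps > 0."""
    tau = np.asarray(tau, dtype=float)
    return np.exp(-eps * np.abs(tau)) * sum(np.cos(xi * tau) for xi in freqs)

def empirical_autocorr(t, tau, freqs, n_draws, rng):
    """Monte Carlo estimate of E{x(t) x(t - tau)} for the multi-sine process of equation (10).

    Amplitudes are drawn as independent standard Gaussians (an assumed distribution).
    """
    xi = np.asarray(freqs, dtype=float)
    a = rng.standard_normal((n_draws, xi.size))
    b = rng.standard_normal((n_draws, xi.size))
    x_t = (a * np.cos(xi * t) + b * np.sin(xi * t)).sum(axis=1)
    x_s = (a * np.cos(xi * (t - tau)) + b * np.sin(xi * (t - tau))).sum(axis=1)
    return float(np.mean(x_t * x_s))
```

The estimate does not depend on the absolute time t, which reflects the stationarity of the process.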

[0083] Examples of compact descriptions of the set S of frequencies of a multi-sine model are as follows:
[0084] 1. A single fundamental frequency Ω: $S = \{\Omega v : v = 1, 2, \ldots\}$.
[0085] 2. M fundamental frequencies $\Omega_0, \Omega_1, \ldots, \Omega_{M-1}$: $S = \{\Omega_k v : v = 1, 2, \ldots;\ k = 0, 1, \ldots, M-1\}$.
[0086] 3. A single sideband shifted fundamental frequency, described by Ω and θ: $S = \{\Omega (v + \theta) : v = 1, 2, \ldots\}$.
[0087] 4. A slightly inharmonic model, described by Ω and a: $S = \{\Omega v (1 + a v^{2})^{1/2} : v = 1, 2, \ldots\}$, with a describing the inharmonic component of the model.
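The four compact descriptions can be expanded into explicit frequency sets, truncated to frequencies below a hypothetical upper limit `w_max`, as sketched below. The sketch assumes Ω > 0 and a ≥ 0 so that the frequencies increase with v; the helper names are illustrative.

```python
def _frequencies(freq_of_v, w_max, v_max=100_000):
    """Collect freq_of_v(v) for v = 1, 2, ... while below w_max (assumes increasing values)."""
    out = []
    for v in range(1, v_max + 1):
        f = freq_of_v(v)
        if f >= w_max:
            break
        out.append(f)
    return out

def single_fundamental(omega, w_max):
    """Case 1: S = {omega v}."""
    return _frequencies(lambda v: omega * v, w_max)

def multiple_fundamentals(omegas, w_max):
    """Case 2: union of the harmonic series of several fundamentals (duplicates merged)."""
    return sorted({f for omega in omegas for f in single_fundamental(omega, w_max)})

def shifted_fundamental(omega, theta, w_max):
    """Case 3: S = {omega (v + theta)}."""
    return _frequencies(lambda v: omega * (v + theta), w_max)

def inharmonic(omega, a, w_max):
    """Case 4: S = {omega v (1 + a v^2)^(1/2)}."""
    return _frequencies(lambda v: omega * v * (1.0 + a * v * v) ** 0.5, w_max)
```

In each case the whole set is described by one or two transmitted numbers, however many lines fall below `w_max`.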

[0088] As such, a (possibly relaxed) multi-sine model exhibiting a PSD given by equation (12) may be described in an efficient manner using one of the example descriptions listed above. By way of example, a complete set S of frequencies of the line spectrum of equation (12) may be described using only a single fundamental frequency Ω. If the to-be-encoded input audio signal can be well described using a multi-sine model exhibiting a single fundamental frequency Ω, the model based predictor may be described by a single parameter (i.e. by the fundamental frequency Ω), regardless of the number of prediction coefficients (i.e. regardless of the prediction mask 202, 203, 204, 205) used by the subband predictor 103.

[0089] Case 1 for describing the set S of frequencies yields a process x(t) which models input audio signals with a period T=2π/Ω. Upon inclusion of the zero frequency (DC) contribution with variance ½ to equation (11) and subject to rescaling of the result by the factor 2/T, the autocorrelation function of the periodic model process x(t) may be written as

[00011] $r(\tau) = \sum_{k \in \mathbb{Z}} \delta(\tau - kT). \qquad (13)$

[0090] With the definition of a relaxation factor $\rho = \exp(-T\varepsilon)$, the autocorrelation function of the relaxed version of the periodic model is given by

[00012] $r(\tau) = \sum_{k \in \mathbb{Z}} \rho^{|k|}\, \delta(\tau - kT). \qquad (14)$

[0091] Equation (14) also corresponds to the autocorrelation function of a process defined by a single delay loop fed with white noise z(t), that is, of the model process

$x(t) = \rho\, x(t - T) + \sqrt{1 - \rho^{2}}\, z(t). \qquad (15)$
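The delay loop of equation (15) can be simulated in discrete time to confirm that its autocorrelation takes the values $\rho^{|k|}$ at lags kT and vanishes elsewhere, as in equation (14). The sketch assumes an integer delay T in samples and Gaussian white noise z; the function names are hypothetical.

```python
import numpy as np

def simulate_delay_loop(T, rho, n_periods, rng):
    """Discrete-time version of equation (15): x[t] = rho x[t - T] + sqrt(1 - rho^2) z[t].

    Samples are arranged as rows of length T so that row p depends only on row p - 1.
    """
    gain = np.sqrt(1.0 - rho ** 2)
    z = rng.standard_normal((n_periods, T))
    x = np.empty((n_periods, T))
    x[0] = rng.standard_normal(T)  # start in the stationary, unit-variance state
    for p in range(1, n_periods):
        x[p] = rho * x[p - 1] + gain * z[p]
    return x.ravel()

def sample_autocorr(x, lag):
    """Estimate of E{x[t] x[t - lag]} from a single realization."""
    return float(np.mean(x[lag:] * x[:len(x) - lag]))
```

The factor $\sqrt{1-\rho^2}$ keeps the variance at one, so the estimated autocorrelation at lag T directly approximates ρ.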

[0092] This means that the periodic process which exhibits a single fundamental frequency Ω corresponds to a delay in the time domain, with the delay being T=2π/Ω.

[0093] The above mentioned global signal models typically have a flat large scale power spectrum, due to the unit variance assumption of the sinusoidal amplitude parameters $a_\xi, b_\xi$. It should be noted, however, that the signal models are typically only considered locally for a subset of subbands of a critically sampled filterbank, wherein the filterbank is instrumental in the shaping of the overall spectrum. In other words, for a signal that has a spectral shape with slow variation compared to the subband widths, the flat power spectrum models will provide a good match to the signal, and consequently, the model-based predictors will offer adequate levels of prediction gain.

[0094] More generally, the PSD model could be described in terms of standard parameterizations of autoregressive (AR) or autoregressive moving average (ARMA) processes. This would increase the performance of model-based prediction at the possible expense of an increase in descriptive model parameters.

[0095] Another variation is obtained by abandoning the stationarity assumption for the stochastic signal model. The autocorrelation function then becomes a function of two variables, r(t,s)=E{x(t)x(s)}. For instance, relevant non-stationary sinusoidal models may include amplitude modulation (AM) and frequency modulation (FM).

[0096] Furthermore, a more deterministic signal model may be employed. As will be seen in some of the examples below, the prediction can have a vanishing error in some cases. In such cases, the probabilistic approach can be avoided. When the prediction is perfect for all signals in a model space, there is no need to average the prediction performance by means of a probability measure on the considered model space.

[0097] In the following, various aspects regarding modulated filterbanks are described. In particular, aspects are described which have an influence on the determination of the covariance matrix, thereby providing efficient means for determining the prediction coefficients of a subband predictor.

[0098] A modulated filterbank may be described as having a two-dimensional index set of synthesis waveforms α=(n, k) where n=0, 1, . . . is the subband index (frequency band) and where k∈Z is the subband sample index (time slot). For ease of exposition, it is assumed that the synthesis waveforms are given in continuous time and are normalized to a unit time stride,

[00013] $w_{n,k}(t) = u_n(t - k), \qquad (16)$
where
$u_n(t) = v(t) \cos\left[\pi \left(n + \tfrac{1}{2}\right)\left(t + \tfrac{1}{2}\right)\right], \qquad (17)$

[0099] in case of a cosine modulated filterbank. It is assumed that the window function v(t) is real valued and even. Up to minor variations of the modulation rule, this covers a range of highly relevant cases such as MDCT (Modified Discrete Cosine Transform), QMF (Quadrature Mirror Filter), and ELT (Extended Lapped Transforms) with L subbands upon sampling at a time step 1/L. The window is assumed to be of finite length, with support included in the interval [−K/2, K/2], where K is the overlap factor of the overlapped transform and hence also the length of the window function.
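The orthonormality assumed in paragraph [0070] can be checked numerically for the waveforms of equations (16) and (17). The sketch assumes a continuous-time sine window $v(t) = \sqrt{2}\cos(\pi t/2)$ on [−1, 1], i.e. K = 2, chosen here only as an example (the factor $\sqrt{2}$ normalizes the waveforms to unit energy); integrals are approximated by a midpoint rule, and all names are hypothetical.

```python
import numpy as np

def window(t):
    """Assumed example window: sqrt(2) cos(pi t / 2) on [-1, 1], zero elsewhere (K = 2)."""
    return np.where(np.abs(t) <= 1.0, np.sqrt(2.0) * np.cos(np.pi * t / 2), 0.0)

def w(n, k, t):
    """Synthesis waveform w_{n,k}(t) = u_n(t - k) of equations (16) and (17)."""
    s = t - k
    return window(s) * np.cos(np.pi * (n + 0.5) * (s + 0.5))

def inner(f, g, lo=-3.0, hi=3.0, steps=120_000):
    """Midpoint-rule approximation of the integral of f(t) g(t) over [lo, hi]."""
    h = (hi - lo) / steps
    t = lo + (np.arange(steps) + 0.5) * h
    return float(np.sum(f(t) * g(t)) * h)

def gram(indices):
    """Matrix of inner products <w_{n,k}, w_{m,l}> for (n, k), (m, l) in indices."""
    return np.array([[inner(lambda t: w(n, k, t), lambda t: w(m, l, t))
                      for (m, l) in indices] for (n, k) in indices])

indices = [(n, k) for n in range(4) for k in (-1, 0, 1)]
G = gram(indices)  # expected to be close to the identity matrix
```

Overlapping waveforms in adjacent time slots (k and k ± 1) are orthogonal as well, which is the critical sampling property the derivation relies on.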

[0100] Due to the shift invariant structure, one finds that the cross correlation function of the synthesis waveform (as defined in equation (4)) can be written as

[00014] $W_{n,k,m,l}(\tau) = \int_{-\infty}^{\infty} w_{n,k}(t)\, w_{m,l}(t - \tau)\, dt = \int_{-\infty}^{\infty} u_n(t)\, u_m(t - l + k - \tau)\, dt. \qquad (18)$

[0101] That is, $W_{n,k,m,l}(\tau) = U_{n,m}(\tau + l - k)$, with the definition $U_{n,m}(\tau) = W_{n,0,m,0}(\tau)$. The modulation structure (17) allows for further expansion into

[00015] $U_{n,m}(\tau) = \tfrac{1}{2} \kappa_{n-m}(\tau) \cos \tfrac{\pi}{2}\left[(n + m + 1)\tau + (n - m)\right] + \tfrac{1}{2} \kappa_{n+m+1}(\tau) \cos \tfrac{\pi}{2}\left[(n - m)\tau + (n + m + 1)\right], \qquad (19)$

where the kernel function $\kappa_\nu$ represents a sampling, with the filterbank subband step, of the frequency variable of the Wigner-Ville distribution of the filterbank window

[00016] $\kappa_\nu(\tau) = \int_{-\infty}^{\infty} v\left(t + \tfrac{\tau}{2}\right) v\left(t - \tfrac{\tau}{2}\right) \cos(\pi \nu t)\, dt. \qquad (20)$

[0102] The kernel is real and even in both ν and τ, due to the above mentioned assumptions on the window function v(t). Its Fourier transform is the product of shifted window responses,

[00017] $\hat{\kappa}_\nu(\omega) = \hat{v}\left(\omega + \tfrac{\pi}{2}\nu\right) \hat{v}\left(\omega - \tfrac{\pi}{2}\nu\right). \qquad (21)$

[0103] It can be seen from equations (20) and (21) that the kernel $\kappa_\nu(\tau)$ vanishes for |τ|>K and has a rapid decay as a function of |ν| for typical choices of filterbank windows v(t). As a consequence, the second term of equation (19), involving ν=n+m+1, can often be neglected except for the lowest subbands.
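Equation (19) and the support property of the kernel can be verified numerically by computing $U_{n,m}(\tau)$ once from its definition and once from the kernels of equation (20). The sketch assumes, for illustration only, the window $v(t) = \sqrt{2}\cos(\pi t/2)$ on [−1, 1] (K = 2) and a midpoint-rule quadrature; the function names are hypothetical.

```python
import numpy as np

def window(t):
    """Assumed example window: sqrt(2) cos(pi t / 2) on [-1, 1], zero elsewhere (K = 2)."""
    return np.where(np.abs(t) <= 1.0, np.sqrt(2.0) * np.cos(np.pi * t / 2), 0.0)

def integrate(f, lo=-3.0, hi=3.0, steps=120_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    t = lo + (np.arange(steps) + 0.5) * h
    return float(np.sum(f(t)) * h)

def u(n, t):
    """Equation (17)."""
    return window(t) * np.cos(np.pi * (n + 0.5) * (t + 0.5))

def U_direct(n, m, tau):
    """U_{n,m}(tau) from its definition: integral of u_n(t) u_m(t - tau) dt."""
    return integrate(lambda t: u(n, t) * u(m, t - tau))

def kappa(nu, tau):
    """Kernel of equation (20)."""
    return integrate(lambda t: window(t + tau / 2) * window(t - tau / 2) * np.cos(np.pi * nu * t))

def U_expansion(n, m, tau):
    """Right-hand side of equation (19)."""
    first = 0.5 * kappa(n - m, tau) * np.cos(np.pi / 2 * ((n + m + 1) * tau + (n - m)))
    second = 0.5 * kappa(n + m + 1, tau) * np.cos(np.pi / 2 * ((n - m) * tau + (n + m + 1)))
    return first + second
```

Because the window product in equation (20) is zero whenever |τ| > K, the kernel returns exactly zero there, e.g. for τ = 2.5 with K = 2.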

[0104] For the autocorrelation function r(τ) of a given signal model, the above mentioned formulas can be inserted into the definition of the subband sample covariance matrix given by equation (3). One gets $R_{n,k,m,l} = R_{n,m}[k - l]$ with the definition

[00018] $R_{n,m}[\lambda] = \int_{-\infty}^{\infty} U_{n,m}(\tau)\, r(\tau + \lambda)\, d\tau. \qquad (22)$

[0105] As a function of the power spectral density P(ω) of the given signal model (which corresponds to the Fourier transform of the autocorrelation function r(τ)), one finds that

[00019] $R_{n,m}[\lambda] = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{U}_{n,m}(\omega)\, P(\omega) \exp(-i \omega \lambda)\, d\omega, \qquad (23)$

[0106] Here, $\hat{U}_{n,m}(ω)$ is the Fourier transform of $U_{n,m}(τ)$, n and m are subband indices, and λ = k − l is a time slot lag. The expression of equation (23) may be rewritten as

$R_{n,m}[λ] = \frac{1}{4π} \int_{-\infty}^{\infty} \hat{κ}_{n-m}\left(ω - \frac{π}{2}(n+m+1)\right) P(ω) \cos\left(ωλ - \frac{π}{2}(n-m)\right) dω + \frac{1}{4π} \int_{-\infty}^{\infty} \hat{κ}_{n+m+1}\left(ω - \frac{π}{2}(n-m)\right) P(ω) \cos\left(ωλ - \frac{π}{2}(n+m+1)\right) dω. \qquad (24)$

[0107] An important observation is that the first term of equation (24) is essentially invariant with respect to frequency shifts. If the second term of equation (24) is neglected and P(ω) is shifted by an integer ν times the subband spacing π to P(ω − πν), one finds a corresponding shift in the covariances: the covariance $R_{n,m}[λ]$ obtained for the shifted PSD equals $\pm R_{n-ν,m-ν}[λ]$ obtained for the original PSD, where the sign depends on the (integer) value of the time lag λ. This reflects the advantage of using a filterbank with a modulation structure, as compared to the general filterbank case.

[0108] Equation (24) provides an efficient means for determining the coefficients of the subband sample covariance matrix when the PSD of the underlying signal model is known. By way of example, in the case of a sinusoidal model based prediction scheme which makes use of a signal model x(t) comprising a single sinusoid at the (angular) frequency ξ, the PSD is given by

$P(ω) = \frac{1}{2}\left(δ(ω - ξ) + δ(ω + ξ)\right).$

Inserting P(ω) into equation (24) gives four terms, of which three can be neglected under the assumption that n+m+1 is large. The remaining term becomes

$R_{n,m}[λ] \approx \frac{1}{8π} \hat{κ}_{n-m}\left(ξ - \frac{π}{2}(n+m+1)\right) \cos\left(ξλ - \frac{π}{2}(n-m)\right) = \frac{1}{8π} \hat{v}\left(ξ - π\left(n + \frac{1}{2}\right)\right) \hat{v}\left(ξ - π\left(m + \frac{1}{2}\right)\right) \cos\left(ξλ - \frac{π}{2}(n-m)\right). \qquad (25)$
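The equality of the two forms in equation (25) follows from equation (21). The sketch below checks it numerically for the sine window, assuming the closed-form v̂ derived for that window; the function names are hypothetical. It also makes visible that, under this approximation, the covariance between subbands n and m is the product of the two subband responses at ξ times a cosine in the lag λ.

```python
import math

# Illustrative sketch; function names are hypothetical.
def v_hat(w: float) -> float:
    """Transform of the sine window cos(pi t/2), |t| <= 1 (limit 1 at w = +-pi/2)."""
    d = math.pi ** 2 / 4 - w * w
    return 1.0 if abs(d) < 1e-12 else math.pi * math.cos(w) / d

def kappa_hat(nu: int, w: float) -> float:
    """Equation (21)."""
    return v_hat(w + math.pi * nu / 2) * v_hat(w - math.pi * nu / 2)

def cov_kernel_form(n: int, m: int, lam: int, xi: float) -> float:
    """First form of equation (25), using the kernel spectrum."""
    return (kappa_hat(n - m, xi - math.pi * (n + m + 1) / 2)
            * math.cos(xi * lam - math.pi * (n - m) / 2) / (8 * math.pi))

def cov_window_form(n: int, m: int, lam: int, xi: float) -> float:
    """Second form of equation (25): product of the two subband responses."""
    return (v_hat(xi - math.pi * (n + 0.5)) * v_hat(xi - math.pi * (m + 0.5))
            * math.cos(xi * lam - math.pi * (n - m) / 2) / (8 * math.pi))

xi = math.pi * (5.5 + 0.2)  # sinusoid 0.2 subband widths above the center of subband 5
print(cov_window_form(5, 6, -1, xi))
```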

[0109] Equation (25) provides an efficient means for determining the subband covariance matrix $R_{n,m}$. A subband sample $\langle x, w_{p,0} \rangle$ can be reliably predicted by a collection of surrounding subband samples $\{\langle x, w_{n,k} \rangle : (n,k) \in B\}$ which are assumed to be significantly influenced by the considered frequency. The absolute frequency ξ can be expressed relative to the center frequency $π\left(p + \frac{1}{2}\right)$ of a subband as $ξ = π\left(p + \frac{1}{2} + f\right)$,

where p is the index of the subband which comprises the frequency ξ, and f is a normalized frequency parameter which takes values between −0.5 and +0.5 and indicates the position of the frequency ξ relative to the center frequency of the subband p. Having determined the subband covariance matrix, the predictor coefficients $c_m[l]$, which are applied to the subband sample in subband m at time index l in order to estimate the target subband sample $\langle x, w_{p,0} \rangle$ in subband p at time index 0, are found by solving the normal equations (7), which for the case at hand can be written as

$\sum_{(m,l) \in B} R_{n,m}[k-l]\, c_m[l] = R_{n,p}[k], \quad (n,k) \in B. \qquad (26)$

[0110] In equation (26), the set B describes the prediction mask support as illustrated e.g. in FIG. 2. In other words, the set B identifies the subbands m and the sample indexes l which are used to predict a target sample.
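Once the mask B is fixed, the normal equations (26) are a small linear system. The following sketch assembles and solves them with the sinusoidal covariance of equation (25) for the sine window; plain Gaussian elimination and the helper names are illustrative choices, not a prescribed solver. Under approximation (25) the cross-band mask 203 yields only two independent equations for three unknowns, so a generic solver like this one is only meaningful for masks whose system is nonsingular, such as the in-band mask 202 used in the usage line.

```python
import math

# Illustrative sketch; function names are hypothetical.
def v_hat(w: float) -> float:
    """Transform of the sine window cos(pi t/2), |t| <= 1 (limit 1 at w = +-pi/2)."""
    d = math.pi ** 2 / 4 - w * w
    return 1.0 if abs(d) < 1e-12 else math.pi * math.cos(w) / d

def cov(n: int, m: int, lam: int, xi: float) -> float:
    """Sinusoidal-model covariance R_{n,m}[lam] from equation (25)."""
    return (v_hat(xi - math.pi * (n + 0.5)) * v_hat(xi - math.pi * (m + 0.5))
            * math.cos(xi * lam - math.pi * (n - m) / 2) / (8 * math.pi))

def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("normal equations are singular for this mask")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def predictor(mask: list[tuple[int, int]], p: int, xi: float) -> dict:
    """Solve equation (26) for the target sample in subband p at time index 0."""
    mat = [[cov(n, m, k - l, xi) for (m, l) in mask] for (n, k) in mask]
    rhs = [cov(n, p, k, xi) for (n, k) in mask]
    return dict(zip(mask, solve(mat, rhs)))

p, f = 4, 0.3
xi = math.pi * (p + 0.5 + f)
print(predictor([(p, -1), (p, -2)], p, xi))  # two-tap in-band mask (mask 202)
```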

[0111] In the following, solutions of the normal equations (26) for different prediction mask supports (as shown in FIG. 2) are provided by way of example. A causal second-order in-band predictor is obtained by selecting the prediction mask support B = {(p,−1), (p,−2)}. This prediction mask support corresponds to the prediction mask 202 of FIG. 2. Using the approximation of equation (25), the normal equations (26) for this two-tap prediction become

$\hat{v}\left(ξ - π\left(p + \frac{1}{2}\right)\right)^2 \sum_{l = -1, -2} \cos\left(ξ(k - l)\right) c_p[l] = \hat{v}\left(ξ - π\left(p + \frac{1}{2}\right)\right)^2 \cos(-ξk), \quad k = -1, -2. \qquad (27)$

[0112] A solution to equation (27) is given by $c_p[-1] = 2\cos(ξ)$, $c_p[-2] = -1$, and it is unique as long as the frequency $ξ = π\left(p + \frac{1}{2} + f\right)$ is not chosen such that $\hat{v}(πf) = 0$. One finds that the mean value of the squared prediction error according to equation (6) vanishes. Consequently, the sinusoidal prediction is perfect, up to the approximation of equation (25). The invariance property with respect to frequency shifts is illustrated here by the fact that, using the definition $ξ = π\left(p + \frac{1}{2} + f\right)$, the prediction coefficient $c_p[-1]$ can be rewritten in terms of the normalized frequency f as $c_p[-1] = -2(-1)^p \sin(πf)$. This means that, up to a sign determined by the parity of p, the prediction coefficients depend only on the normalized frequency f within a particular subband; their absolute values are independent of the subband index p.
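Because 2 cos ξ = −2(−1)^p sin(πf) whenever ξ = π(p + ½ + f), the in-band predictor can be tabulated over f alone, and only its sign depends on the subband. The short sketch below, with hypothetical helper names, splits an absolute frequency into (p, f) and checks the two forms of $c_p[-1]$ against each other.

```python
import math

# Illustrative sketch; function names are hypothetical.
def subband_and_offset(xi: float) -> tuple[int, float]:
    """Split xi = pi (p + 1/2 + f) into subband index p and normalized frequency f in [-1/2, 1/2)."""
    p = math.floor(xi / math.pi)
    return p, xi / math.pi - p - 0.5

def in_band_coeffs(xi: float) -> tuple[float, float]:
    """Two-tap sinusoidal predictor (c_p[-1], c_p[-2]) written via f and the parity of p."""
    p, f = subband_and_offset(xi)
    return -2 * (-1) ** p * math.sin(math.pi * f), -1.0

for xi in (0.7, 3.0, 10.4):
    p, f = subband_and_offset(xi)
    print(p, round(f, 3), in_band_coeffs(xi)[0], 2 * math.cos(xi))
```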

[0113] As discussed above for FIG. 4, in-band prediction has certain shortcomings with respect to alias artifacts in noise shaping. The next example relates to the improved behavior illustrated by FIG. 5. A causal cross-band prediction as taught in the present document is obtained by selecting the prediction mask support B = {(p−1,−1), (p,−1), (p+1,−1)}, which requires only one earlier time slot instead of two, and which performs a noise shaping with fewer alias frequency contributions than the classical prediction mask 202 of the first example. The prediction mask support B = {(p−1,−1), (p,−1), (p+1,−1)} corresponds to the prediction mask 203 of FIG. 2. Based on the approximation of equation (25), the normal equations (26) reduce in this case to two equations for the three unknown coefficients $c_m[-1]$, m = p−1, p, p+1,

$\begin{aligned} \hat{v}(πf)\, c_p[-1] &= (-1)^{p+1} \hat{v}(πf) \sin(πf), \\ \hat{v}(π(f+1))\, c_{p-1}[-1] - \hat{v}(π(f-1))\, c_{p+1}[-1] &= (-1)^p \hat{v}(πf) \cos(πf). \end{aligned} \qquad (28)$

[0114] One finds that any solution to equations (28) leads to a vanishing mean value of the squared prediction error according to equation (6). One possible strategy for selecting one solution among the infinitely many solutions to equations (28) is to minimize the sum of squares of the prediction coefficients. This leads to the coefficients

$\begin{aligned} c_{p-1}[-1] &= \frac{(-1)^p \hat{v}(πf)\, \hat{v}(π(f+1)) \cos(πf)}{\hat{v}(π(f-1))^2 + \hat{v}(π(f+1))^2}, \\ c_p[-1] &= (-1)^{p+1} \sin(πf), \\ c_{p+1}[-1] &= \frac{(-1)^{p+1} \hat{v}(πf)\, \hat{v}(π(f-1)) \cos(πf)}{\hat{v}(π(f-1))^2 + \hat{v}(π(f+1))^2}. \end{aligned} \qquad (29)$

[0115] It is clear from the formulas (29) that the prediction coefficients only depend on the normalized frequency f with respect to the midpoint of the target subband p, and further depend on the parity of the target subband p.
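Equations (29) are straightforward to evaluate once v̂ is known. Assuming again the sine window of FIG. 3 and its closed-form transform (a choice made here for illustration; the text does not restrict the window), the sketch computes the three cross-band coefficients, which can be checked against equations (28). It also makes the parity dependence visible: subbands p and p + 2 receive identical coefficients. Function names are hypothetical.

```python
import math

# Illustrative sketch; function names are hypothetical.
def v_hat(w: float) -> float:
    """Transform of the sine window cos(pi t/2), |t| <= 1 (limit 1 at w = +-pi/2)."""
    d = math.pi ** 2 / 4 - w * w
    return 1.0 if abs(d) < 1e-12 else math.pi * math.cos(w) / d

def cross_band_coeffs(p: int, f: float) -> dict[int, float]:
    """Minimum-norm solution (29) of equations (28), keyed by source subband."""
    a = v_hat(math.pi * (f + 1))  # response of subband p-1 at the partial
    b = v_hat(math.pi * (f - 1))  # response of subband p+1 at the partial
    g = v_hat(math.pi * f)        # response of the target subband p
    s = (-1) ** p
    den = a * a + b * b
    return {
        p - 1: s * g * a * math.cos(math.pi * f) / den,
        p: -s * math.sin(math.pi * f),
        p + 1: -s * g * b * math.cos(math.pi * f) / den,
    }

print(cross_band_coeffs(7, 0.25))
```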

[0116] By using the same prediction mask support B = {(p−1,−1), (p,−1), (p+1,−1)} to predict the three subband samples $\langle x, w_{m,0} \rangle$ for m = p−1, p, p+1, as illustrated by the prediction mask 204 of FIG. 2, a 3×3 prediction matrix is obtained. A more natural strategy for avoiding the ambiguity in the normal equations is to insert the relaxed sinusoidal model $r(τ) = \exp(-ε|τ|)\cos(ξτ)$, corresponding to $P(ω) = ε\left(\left(ε^2 + (ω-ξ)^2\right)^{-1} + \left(ε^2 + (ω+ξ)^2\right)^{-1}\right)$; numerical computations then lead to the 3×3 prediction matrix elements of FIG. 3. The prediction matrix elements are shown as a function of the normalized frequency $f \in \left[-\frac{1}{2}, \frac{1}{2}\right]$ for an overlap K = 2 with a sinusoidal window function v(t) = cos(πt/2) and for an odd subband p.

[0117] As such, it has been shown that signal models x(t) may be used to describe underlying characteristics of the input audio signal to be encoded. Parameters which describe the autocorrelation function r(τ) may be transmitted to a decoder 100, thereby enabling the decoder 100 to calculate the predictor from the transmitted parameters and from the knowledge of the signal model x(t). It has been shown that for modulated filterbanks, efficient means can be derived for determining the subband covariance matrix of the signal model and for solving the normal equations to determine the predictor coefficients. In particular, it has been shown that the resulting predictor coefficients are invariant to subband shifts and typically depend only on a normalized frequency relative to a particular subband. As a result, pre-determined look-up tables (as illustrated e.g. in FIG. 3) can be provided which allow the predictor coefficients to be determined from a normalized frequency f, which is independent (apart from a parity value) of the subband index p for which the predictor coefficients are determined.

[0118] In the following, periodic model based prediction, e.g. using a single fundamental frequency Ω, is described in further detail. The autocorrelation function r(τ) of such a periodic model is given by equation (13). The equivalent PSD or line spectrum is given by

$P(ω) = Ω \sum_{q \in \mathbb{Z}} δ(ω - qΩ). \qquad (30)$

[0119] When the period T of the periodic model is sufficiently small, e.g. T ≤ 1, the fundamental frequency Ω = 2π/T is sufficiently large to allow for the application of a sinusoidal model as derived above, using the partial frequency ξ = qΩ closest to the center frequency $π\left(p + \frac{1}{2}\right)$ of the subband p of the target subband sample which is to be predicted. This means that periodic signals having a small period T, i.e. a period which is small with respect to the time stride of the filterbank, can be well modeled and predicted using the sinusoidal model described above.

[0120] When the period T is sufficiently large compared to the duration K of the filterbank window v(t), the predictor reduces to an approximation of a delay by T. As will be shown, the coefficients of this predictor can be read directly from the waveform cross correlation function given by equation (19).

[0121] Insertion of the model according to equation (13) into equation (22) leads to

$R_{n,m}[λ] = \sum_{q \in \mathbb{Z}} U_{n,m}(qT - λ). \qquad (31)$

[0122] An important observation is that if T ≥ 2K, then at most one term of equation (31) is nonzero for each λ, since $U_{n,m}(τ) = 0$ for |τ| > K. By choosing a prediction mask support B = I×J with time slot diameter D = |J| ≤ T − K, one observes that (n,k), (m,l) ∈ B implies |k − l| ≤ T − K, and therefore the single term of equation (31) is that for q = 0. It follows that $R_{n,m}[k-l] = U_{n,m}(k-l)$, which is the inner product of orthogonal waveforms and which vanishes unless both n = m and k = l. All in all, the normal equations (7) become

$c_n[k] = R_{n,p}[k], \quad (n,k) \in B. \qquad (32)$
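The argument behind equation (32) only uses the support of $U_{n,m}$. The sketch below, with a hypothetical function name, lists which terms of the sum in equation (31) can be nonzero for a given lag; with T ≥ 2K there is never more than one, whereas a shorter period can produce two overlapping contributions.

```python
import math

# Illustrative sketch; function names are hypothetical.
def contributing_q(T: float, lam: int, K: float) -> list[int]:
    """Indices q of the terms U_{n,m}(qT - lam) in equation (31) that can be nonzero.

    Only the support is used: U_{n,m}(tau) = 0 for |tau| > K, and also at
    |tau| = K because the kernel integral (20) then runs over a single point.
    """
    q_min = math.ceil((lam - K) / T)
    q_max = math.floor((lam + K) / T)
    return [q for q in range(q_min, q_max + 1) if abs(q * T - lam) < K]

K = 2.0
print([len(contributing_q(4.5, lam, K)) for lam in range(-6, 7)])  # T >= 2K
print([len(contributing_q(2.5, lam, K)) for lam in range(-6, 7)])  # T < 2K
```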

[0123] The prediction mask support may be chosen to be centered around $k = k_0 \approx -T$, in which case the right-hand side of equation (32) has its single contribution from q = −1. Then the coefficients are given by

$c_n[k] = U_{n,p}(-k-T), \quad (n,k) \in B, \qquad (33)$

[0124] where the explicit expression from equation (19) can be inserted. The geometry of the prediction mask support for this case could resemble that of the prediction mask 205 of FIG. 2. The mean value of the squared prediction error given by equation (6) is equal to the squared norm of the projection of $u_p(t+T)$ onto the space spanned by the complement of the approximating waveforms $w_{m,l}(t)$, (m,l) ∉ B.

[0125] In view of the above, it is taught by the present document that the subband sample $\langle x, w_{p,0} \rangle$ (from subband p and at time index 0) can be predicted by using a suitable prediction mask support B centered around (p, −T) with time diameter approximately equal to T. The normal equations may be solved for each value of T and p. In other words, for each periodicity T of an input audio signal and for each subband p, the prediction coefficients for a given prediction mask support B may be determined using equation (33).

[0126] With a large number of subbands p and a wide range of periods T, a direct tabulation of all predictor coefficients is not practical. However, in a similar manner to the sinusoidal model, the modulation structure of the filterbank offers a significant reduction of the necessary table size, through the invariance property with respect to frequency shifts. It will typically be sufficient to study the shifted harmonic model with shift parameter −½ < θ ≤ ½ centered around the center $π\left(p + \frac{1}{2}\right)$ of a subband p, defined by the subset S(θ) of positive frequencies among the collection of frequencies $π\left(p + \frac{1}{2}\right) + (q + θ)Ω$, q ∈ ℤ, with the PSD

$P(ω) = Ω \sum_{ξ \in S(θ)} \left(δ(ω - ξ) + δ(ω + ξ)\right). \qquad (34)$

[0127] Indeed, given T and a sufficiently large subband index p, the periodic model according to equation (30) can be recovered with good approximation by the shifted model according to equation (34) through a suitable choice of the shift parameter θ. Insertion of equation (34) into equation (24) with n = p+ν and m = p+μ (where ν and μ define the subband indices of the prediction mask support relative to subband p) and manipulations based on Fourier analysis lead to the following expression for the covariance matrix,

$R_{p+ν,p+μ}[λ] \approx \frac{(-1)^{pλ}}{2} \sum_{l \in \mathbb{Z}} κ_{ν-μ}(Tl - λ) \cos\left(2πlθ + \frac{π}{2}\left((ν+μ)(λ - Tl) + λ - ν + μ\right)\right). \qquad (35)$

[0128] As can be seen, expression (35) depends on the target subband index p only through the factor $(-1)^{pλ}$. For the case of a large period T and a small temporal lag λ, only the term for l = 0 contributes to expression (35), and one finds again that the covariance matrix is the identity matrix. The right-hand side of the normal equations (26) for a suitable prediction mask support B centered around (p, −T) then gives the prediction coefficients directly as

$c_{p+ν}[k] = \frac{(-1)^{pk}}{2} κ_ν(-T-k) \cos\left(-2πθ + \frac{π}{2}\left(ν(k+T) + k - ν\right)\right), \quad (p+ν, k) \in B. \qquad (36)$

[0129] This recovers the result of inserting the first term of equation (19) into equation (33), with the canonical choice of shift θ = −π(p + ½)/Ω.

[0130] Equation (36) allows determining the prediction coefficients $c_{p+ν}[k]$ for a subband p+ν at a time index k, where the sample to be predicted is a sample from subband p at time index 0. As can be seen from equation (36), the prediction coefficients $c_{p+ν}[k]$ depend on the target subband index p only through the factor $(-1)^{pk}$, which affects the sign of the prediction coefficient. The absolute value of the prediction coefficient is, however, independent of the target subband index p. On the other hand, the prediction coefficient $c_{p+ν}[k]$ depends on the periodicity T and on the shift parameter θ. Furthermore, the prediction coefficient $c_{p+ν}[k]$ depends on ν and k, i.e. on the prediction mask support B used for predicting the target sample in the target subband p.
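As a numerical sanity check of equation (36), the sketch below evaluates it for the sine window (kernel from equation (20) by midpoint quadrature) and compares it with the first term of equation (19) inserted into equation (33). The two agree when θ is set to the canonical shift −π(p + ½)/Ω, i.e. −(p + ½)T/2 for Ω = 2π/T. The window, the quadrature and the function names are assumptions made for the illustration.

```python
import math

# Illustrative sketch; function names are hypothetical.
K = 2.0

def v(t: float) -> float:
    """Sine window cos(pi t/2) on |t| <= 1 (overlap K = 2)."""
    return math.cos(math.pi * t / 2) if abs(t) <= K / 2 else 0.0

def kappa(nu: int, tau: float, steps: int = 2000) -> float:
    """Midpoint-rule approximation of equation (20)."""
    half = (K - abs(tau)) / 2
    if half <= 0.0:
        return 0.0
    h = 2 * half / steps
    return h * sum(
        v(t + tau / 2) * v(t - tau / 2) * math.cos(math.pi * nu * t)
        for t in (-half + (i + 0.5) * h for i in range(steps))
    )

def coeff_eq36(p: int, nu: int, k: int, T: float, theta: float) -> float:
    """Prediction coefficient c_{p+nu}[k] of equation (36)."""
    return ((-1) ** (p * k) / 2 * kappa(nu, -T - k)
            * math.cos(-2 * math.pi * theta + math.pi / 2 * (nu * (k + T) + k - nu)))

def coeff_eq33_first_term(p: int, nu: int, k: int, T: float) -> float:
    """First term of equation (19) evaluated at tau = -k - T, as in equation (33)."""
    n, tau = p + nu, -k - T
    return 0.5 * kappa(n - p, tau) * math.cos(math.pi / 2 * ((n + p + 1) * tau + (n - p)))

p, T = 3, 2.6
theta = -(p + 0.5) * T / 2  # canonical shift -pi (p + 1/2) / Omega with Omega = 2 pi / T
for nu in (-1, 0, 1):
    for k in (-3, -2):
        print(nu, k, coeff_eq36(p, nu, k, T, theta), coeff_eq33_first_term(p, nu, k, T))
```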

[0131] In the present document, it is proposed to provide a look-up table from which a set of prediction coefficients $c_{p+ν}[k]$ for a pre-determined prediction mask support B can be looked up. For a given prediction mask support B, the look-up table provides a set of prediction coefficients $c_{p+ν}[k]$ for a pre-determined set of values of the periodicity T and of the shift parameter θ. In order to limit the number of look-up table entries, the number of pre-determined values of the periodicity T and of the shift parameter θ should be limited. As can be seen from expression (36), a suitable quantization step size for the pre-determined values of periodicity T and shift parameter θ should depend on the periodicity T. In particular, for relatively large periodicities T (relative to the duration K of the window function), relatively large quantization steps may be used for the periodicity T and for the shift parameter θ. At the other extreme, for relatively small periodicities T tending towards zero, only one sinusoidal contribution has to be taken into account, so the periodicity T loses its importance. On the other hand, the formulas for sinusoidal prediction according to equation (29) require the normalized absolute frequency shift

$f = Ωθ/π = 2θ/T$

to be slowly varying, so the quantization step size for the shift parameter θ should be scaled based on the periodicity T.

[0132] All in all, it is proposed in the present document to use a uniform quantization of the periodicity T with a fixed step size. The shift parameter θ may also be quantized in a uniform manner, however, with a step size which is proportional to min(T, A), where the value of A depends on the specifics of the filterbank window function. Moreover, for T<2, the range of shift parameters θ may be limited to |θ|≤min(CT,½) for some constant C, reflecting a limit on the absolute frequency shifts f.
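Paragraph [0132] can be turned into a simple grid generator. In the sketch below, the fixed T step of 1/16 and the range 0.25 ≤ T ≤ 2.5 follow the example of [0145], and A = 2 follows FIG. 6a; the proportionality constant of the θ step (`rel_step`) and the constant C are hypothetical values chosen only to make the example concrete. Since θ = −½ and θ = ½ describe the same harmonic model, only −½ < θ ≤ ½ is kept.

```python
import math

# Illustrative sketch; function names and rel_step, C values are hypothetical.
def theta_grid(T: float, A: float = 2.0, C: float = 1.0, rel_step: float = 1 / 8) -> list[float]:
    """Uniform theta grid for one period T.

    Step proportional to min(T, A); for T < 2 the range is |theta| <= min(C*T, 1/2),
    otherwise |theta| <= 1/2. Only -1/2 < theta <= 1/2 is returned.
    """
    step = rel_step * min(T, A)
    limit = min(C * T, 0.5) if T < 2 else 0.5
    n = math.floor(limit / step + 1e-9)
    return [i * step for i in range(-n, n + 1) if i * step > -0.5]

def quantization_grid(T_min: float = 0.25, T_max: float = 2.5,
                      T_step: float = 1 / 16) -> dict[float, list[float]]:
    """(T, theta) points with a uniform T grid of fixed step size."""
    count = round((T_max - T_min) / T_step)
    return {T_min + i * T_step: theta_grid(T_min + i * T_step) for i in range(count + 1)}

grid = quantization_grid()
print({T: len(thetas) for T, thetas in list(grid.items())[:5]})
```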

[0133] FIG. 6a illustrates an example of a resulting quantization grid in the (T, θ)-plane for A = 2. Only in the intermediate range 0.25 ≤ T ≤ 1.5 is the full two-dimensional dependence considered, whereas the essentially one-dimensional parameterizations given by equations (29) and (36) can be used for the remaining range of interest. In particular, for periodicities T which tend towards zero (e.g. T < 0.25), periodic model based prediction substantially corresponds to sinusoidal model based prediction, and the prediction coefficients may be determined using formulas (29). On the other hand, for periodicities T which substantially exceed the window duration K (e.g. T > 1.5), the set of prediction coefficients $c_{p+ν}[k]$ for periodic model based prediction may be determined using equation (36). This equation can be re-interpreted by means of the substitution

$θ = φ + \frac{1}{4}Tν.$

One finds that

$c_{p+ν}[k] = \frac{(-1)^{pk}}{2} κ_ν(-T-k) \cos\left(-2πφ + \frac{π}{2}\left((ν+1)k - ν\right)\right), \quad (p+ν, k) \in B. \qquad (37)$
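The substitution θ = φ + Tν/4 only changes the cosine factor of equation (36); the factor κ_ν(−T − k) is untouched. The sketch below, with hypothetical function names, checks numerically that the cosine factor of equation (36) becomes that of equation (37), which no longer depends on T and is 1-periodic in φ. This is the separable structure described in [0134].

```python
import math

# Illustrative sketch; function names are hypothetical.
def cos_factor_36(nu: int, k: int, T: float, theta: float) -> float:
    """Cosine factor of equation (36)."""
    return math.cos(-2 * math.pi * theta + math.pi / 2 * (nu * (k + T) + k - nu))

def cos_factor_37(nu: int, k: int, phi: float) -> float:
    """Cosine factor of equation (37); T no longer appears."""
    return math.cos(-2 * math.pi * phi + math.pi / 2 * ((nu + 1) * k - nu))

T, phi = 2.2, 0.13
for nu in (-1, 0, 1):
    theta = phi + T * nu / 4  # substitution theta = phi + T nu / 4
    print(nu, [round(cos_factor_36(nu, k, T, theta), 6) for k in (-3, -2)])
```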

[0134] By giving φ the role given to the parameter θ in the tabulation, an essentially separable structure is obtained in the equivalent (T, φ)-plane. Up to sign changes depending on subband and time slot indices, the dependence on T is contained in a first, slowly varying factor, and the dependence on φ is contained in a second, 1-periodic factor in equation (37). The modified offset parameter φ can be interpreted as the shift of the harmonic series in units of the fundamental frequency, as measured from the midpoint of the midpoints of the source and target bins. It is advantageous to maintain this modified parameterization (T, φ) for all values of periodicities T, since symmetries in equation (37) that are apparent with respect to simultaneous sign changes of φ and ν will hold in general and may be exploited in order to reduce table sizes.

[0135] As indicated above, FIG. 6a depicts a two-dimensional quantization grid underlying the tabulated data for a periodic model based predictor calculation in a cosine modulated filterbank. The signal model is that of a signal with period T 602, measured in units of the filterbank time step. Equivalently, the model comprises the frequency lines of the integer multiples, also known as partials, of the fundamental frequency corresponding to the period T. For each target subband, the shift parameter θ 601 indicates the distance of the closest partial to the center frequency, measured in units of the fundamental frequency Ω. The shift parameter θ 601 has a value between −0.5 and 0.5. The black crosses 603 of FIG. 6a illustrate an appropriate density of quantization points for the tabulation of predictors with a high prediction gain based on the periodic model. For large periods T (e.g. T > 2), the grid is uniform. An increased density in the shift parameter θ is typically required as the period T decreases. However, in the region outside of the lines 604, the frequency distance θΩ is greater than one frequency bin of the filterbank, so most grid points in this region can be neglected. The polygon 605 delimits a region which suffices for a full tabulation. In addition to the sloped lines slightly outside of the lines 604, borders at T = 0.25 and T = 1.5 are introduced. This is enabled by the fact that small periods 602 can be treated as separate sinusoids, and that predictors for large periods 602 can be approximated by essentially one-dimensional tables depending mainly on the shift parameter θ (or on the modified shift parameter φ). For the embodiment illustrated in FIG. 6a, the prediction mask support is typically similar to the prediction mask 205 of FIG. 2 for large periods T.

[0136] FIG. 6b illustrates periodic model based prediction in the case of relatively large periods T and in the case of relatively small periods T. It can be seen from the upper diagram that for large periods T, i.e. for relatively small fundamental frequencies Ω 613, the window function 612 of the filterbank captures a relatively large number of lines or Dirac pulses 616 of the PSD of the periodic signal. The Dirac pulses 616 are located at frequencies 610 ω = qΩ, with q ∈ ℤ. The center frequencies of the subbands of the filterbank are located at the frequencies $ω = π\left(p + \frac{1}{2}\right)$, with integer subband index p. For a given subband p, the frequency location of the pulse 616 with frequency ω = qΩ closest to the center frequency $ω = π\left(p + \frac{1}{2}\right)$ of the given subband may be described in relative terms as $qΩ = π\left(p + \frac{1}{2}\right) + ΘΩ$, with the shift parameter Θ ranging from −0.5 to +0.5. As such, the term ΘΩ reflects the distance (in frequency) from the center frequency $ω = π\left(p + \frac{1}{2}\right)$ to the nearest frequency component 616 of the harmonic model. This is illustrated in the upper diagram of FIG. 6b, where the center frequency 617 is $ω = π\left(p + \frac{1}{2}\right)$ and where the distance 618 ΘΩ is illustrated for the case of a relatively large period T. It can be seen that the shift parameter Θ allows describing the entire harmonic series viewed from the perspective of the center of the subband p.
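The relation qΩ = π(p + ½) + ΘΩ can be solved directly for the nearest partial and its shift. The sketch below, with a hypothetical function name, does this for a given period T (Ω = 2π/T): rounding the subband center, expressed in units of Ω, to the nearest integer gives q, and the remainder gives Θ with |Θ| ≤ ½.

```python
import math

# Illustrative sketch; function names are hypothetical.
def shift_parameter(p: int, T: float) -> tuple[int, float]:
    """Nearest partial q*Omega to the center of subband p and its shift Theta.

    Solves q*Omega = pi*(p + 1/2) + Theta*Omega with |Theta| <= 1/2, Omega = 2*pi/T.
    """
    omega = 2 * math.pi / T
    center = math.pi * (p + 0.5) / omega  # subband center in units of Omega
    q = round(center)
    return q, q - center

for p in range(4):
    print(p, shift_parameter(p, 5.0))
```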

[0137] The lower diagram of FIG. 6b illustrates the case of relatively small periods T, i.e. of relatively large fundamental frequencies Ω 623, notably fundamental frequencies 623 which are greater than the width of the window 612. It can be seen that in such cases, a window function 612 may comprise only a single pulse 626 of the periodic signal, such that the signal may be viewed as a sinusoidal signal within the window 612. This means that for relatively small periods T, the periodic model based prediction scheme converges towards a sinusoidal model based prediction scheme.

[0138] FIG. 6b also illustrates example prediction masks 611, 621 which may be used for the periodic model based prediction scheme and for the sinusoidal model based prediction scheme, respectively. The prediction mask 611 used for the periodic model based prediction scheme may correspond to the prediction mask 205 of FIG. 2 and may comprise the prediction mask support 614 for estimating the target subband sample 615. The prediction mask 621 used for the sinusoidal model based prediction scheme may correspond to the prediction mask 203 of FIG. 2 and may comprise the prediction mask support 624 for estimating the target subband sample 625.

[0139] FIG. 7a illustrates an example encoding method 700 which involves model based subband prediction using a periodic model (comprising e.g. a single fundamental frequency Ω). A frame of an input audio signal is considered. For this frame, a periodicity T or a fundamental frequency Ω may be determined (step 701). The audio encoder may comprise the elements of the decoder 100 illustrated in FIG. 1; in particular, the audio encoder may comprise a predictor calculator 105 and a subband predictor 103. The periodicity T or the fundamental frequency Ω may be determined such that the mean value of the squared prediction error subband signals 111 according to equation (6) is reduced (e.g. minimized). By way of example, the audio encoder may apply a brute force approach which determines the prediction error subband signals 111 using different fundamental frequencies Ω and which selects the fundamental frequency Ω for which the mean value of the squared prediction error subband signals 111 is reduced (e.g. minimized). The method proceeds by quantizing the resulting prediction error subband signals 111 (step 702). Furthermore, the method comprises generating a bitstream comprising information indicative of the determined fundamental frequency Ω and of the quantized prediction error subband signals 111 (step 703).
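The brute-force search of step 701 can be illustrated in miniature. The sketch below is a deliberately simplified, hypothetical stand-in: a sampled sinusoid plays the role of a subband signal, the candidate model parameter is a single frequency ξ rather than a fundamental frequency Ω, and the predictor is the two-tap in-band predictor $c_p[-1] = 2\cos ξ$, $c_p[-2] = -1$ from [0112]. The search keeps the candidate with the smallest mean squared prediction error; an actual encoder would also be constrained by the bits available for signalling the parameter, as noted in [0140].

```python
import math

# Illustrative sketch; function names are hypothetical.
def mean_squared_error(x: list[float], xi: float) -> float:
    """Mean squared error of the two-tap sinusoidal predictor c[-1] = 2 cos(xi), c[-2] = -1."""
    c1, c2 = 2 * math.cos(xi), -1.0
    errors = [x[k] - (c1 * x[k - 1] + c2 * x[k - 2]) for k in range(2, len(x))]
    return sum(e * e for e in errors) / len(errors)

def brute_force_search(x: list[float], candidates: list[float]) -> float:
    """Pick the candidate model parameter that minimizes the mean squared prediction error."""
    return min(candidates, key=lambda xi: mean_squared_error(x, xi))

# Toy stand-in for a subband signal: a sampled sinusoid at 0.7 rad per time slot.
x = [math.cos(0.7 * k + 0.3) for k in range(64)]
print(brute_force_search(x, [0.5, 0.6, 0.7, 0.8, 0.9]))
```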

[0140] When determining the fundamental frequency Ω in step 701, the audio encoder may make use of the equations (36) and/or (29), in order to determine the prediction coefficients for a particular fundamental frequency Ω. The set of possible fundamental frequencies Ω may be limited by the number of bits which are available for the transmission of the information indicative of the determined fundamental frequency Ω.

[0141] It should be noted that the audio coding system may use a pre-determined model (e.g. a periodic model comprising a single fundamental frequency Ω or any other of the models provided in the present document) and/or a pre-determined prediction mask 202, 203, 204, 205. On the other hand, the audio coding system may be provided with further degrees of freedom by enabling the audio encoder to determine an appropriate model and/or an appropriate prediction mask for a to-be-encoded audio signal. The information regarding the selected model and/or the selected prediction mask is then encoded into the bit stream and provided to the corresponding decoder 100.

[0142] FIG. 7b illustrates an example method 710 for decoding an audio signal which has been encoded using model based prediction. It is assumed that the decoder 100 is aware of the signal model and the prediction mask used by the encoder (either via the received bit stream or due to pre-determined settings). Furthermore, it is assumed for illustrative purposes that a periodic prediction model has been used. The decoder 100 extracts information regarding the fundamental frequency Ω from the received bit stream (step 711). Using the information regarding the fundamental frequency Ω, the decoder 100 may determine the periodicity T. The fundamental frequency Ω and/or the periodicity T may be used to determine a set of prediction coefficients for the different subband predictors (step 712). The subband predictors may be used to determine estimated subband signals (step 713) which are combined (step 714) with the dequantized prediction error subband signals 111 to yield the decoded subband signals 113. The decoded subband signals 113 may be filtered (step 715) using a synthesis filterbank 102, thereby yielding the decoded time domain audio signal 114.
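Steps 713 and 714 amount to running the predictor on previously decoded samples and adding the dequantized residual. The sketch below shows this for a single subband with a two-tap predictor; the matching encoder, the omission of quantization and the function names are simplifying assumptions. Without quantization the decoder reproduces the encoder input, and for a sinusoid matched to the predictor the residual vanishes after the first two samples.

```python
import math

# Illustrative sketch; function names are hypothetical, quantization is omitted.
def encode_subband(x: list[float], c1: float, c2: float) -> list[float]:
    """Prediction error e[k] = x[k] - (c1 x[k-1] + c2 x[k-2]), zero initial history."""
    def prev(k: int) -> float:
        return x[k] if k >= 0 else 0.0
    return [x[k] - (c1 * prev(k - 1) + c2 * prev(k - 2)) for k in range(len(x))]

def decode_subband(residual: list[float], c1: float, c2: float) -> list[float]:
    """Steps 713-714: estimate from previously decoded samples, then add the residual."""
    out: list[float] = []
    for k, e in enumerate(residual):
        y1 = out[k - 1] if k >= 1 else 0.0
        y2 = out[k - 2] if k >= 2 else 0.0
        out.append(c1 * y1 + c2 * y2 + e)
    return out

xi = 0.9
c1, c2 = 2 * math.cos(xi), -1.0
x = [math.cos(xi * k) for k in range(32)]
res = encode_subband(x, c1, c2)
y = decode_subband(res, c1, c2)
print(max(abs(e) for e in res[2:]), max(abs(a - b) for a, b in zip(x, y)))
```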

[0143] The predictor calculator 105 may make use of equations (36) and/or (29) for determining the prediction coefficients of the subband predictors 103 based on the received information regarding the fundamental frequency Ω (step 712). This may be performed in an efficient manner using a look-up table as illustrated in FIGS. 6a and 3. By way of example, the predictor calculator 105 may determine the periodicity T and determine whether the periodicity lies below a pre-determined lower threshold (e.g. T = 0.25). If this is the case, a sinusoidal model based prediction scheme is used. This means that, based on the received fundamental frequency Ω, the subband p which comprises a multiple ω = qΩ, with q ∈ ℤ, of the fundamental frequency is determined. Then the normalized frequency f is determined using the relation $ξ = π\left(p + \frac{1}{2} + f\right)$, where the frequency ξ corresponds to the multiple ω = qΩ which lies in subband p. The predictor calculator 105 may then use equation (29) or a pre-calculated look-up table to determine the set of prediction coefficients (using e.g. the prediction mask 203 of FIG. 2 or the prediction mask 621 of FIG. 6b). It should be noted that a different set of prediction coefficients may be determined for each subband. However, in the case of a sinusoidal model based prediction scheme, a set of prediction coefficients is typically only determined for the subbands p which are significantly affected by a multiple ω = qΩ, with q ∈ ℤ, of the fundamental frequency. For the other subbands, no prediction coefficients are determined, which means that the estimated subband signals 112 for such other subbands are zero.
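The sinusoidal branch of [0143] can be sketched as follows. Assuming, for illustration, a filterbank with `num_subbands` subbands whose centers lie at π(p + ½), the function enumerates the partials qΩ of the transmitted fundamental, assigns each to its subband p and computes the normalized frequency f. The coefficients for each such subband would then follow from equation (29) or a table over f, and all other subbands get no predictor. The function name and the error handling are hypothetical.

```python
import math

# Illustrative sketch; function names are hypothetical.
T_LOW = 0.25  # example lower threshold on the period from [0143]

def sinusoidal_targets(T: float, num_subbands: int) -> list[tuple[int, float]]:
    """(p, f) for every partial q*Omega below pi*num_subbands (top of the assumed range).

    Each partial xi = q*Omega = pi*(p + 1/2 + f) selects subband p and normalized frequency f.
    """
    if T >= T_LOW:
        raise ValueError("T is not below the lower threshold; use the periodic model tables")
    omega = 2 * math.pi / T
    targets = []
    q = 1
    while q * omega < math.pi * num_subbands:
        xi = q * omega
        p = math.floor(xi / math.pi)
        targets.append((p, xi / math.pi - p - 0.5))
        q += 1
    return targets

print(sinusoidal_targets(0.15, 32))
```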

[0144] In order to reduce the computational complexity of the decoder 100 (and of the encoder using the same predictor calculator 105), the predictor calculator 105 may make use of a pre-determined look-up table which provides the set of prediction coefficients as a function of T and Θ. In particular, the predictor calculator 105 may make use of a plurality of look-up tables for a plurality of different values of T. Each of the plurality of look-up tables provides a different set of prediction coefficients for a plurality of different values of the shift parameter Θ.

[0145] In a practical implementation, a plurality of look-up tables may be provided for different values of the period parameter T. By way of example, look-up tables may be provided for values of T in the range from 0.25 to 2.5 (as illustrated in FIG. 6a). The look-up tables may be provided for a pre-determined granularity or step size of the period parameter T. In an example implementation, the step size for the normalized period parameter T is 1/16, and different look-up tables for the quantized prediction coefficients are provided for T = 8/32 up to T = 80/32. Hence, a total of 37 different look-up tables may be provided. Each table may provide the quantized prediction coefficients as a function of the shift parameter Θ or as a function of the modified shift parameter φ. The look-up tables for T = 8/32 up to T = 80/32 may be used for a range which is augmented by half a step size, i.e. [9/32, 81/32]. For a given periodicity which differs from the available periodicities for which look-up tables have been defined, the look-up table for the nearest available periodicity may be used.
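Selecting the nearest available table, as described in [0145], reduces to rounding on a uniform grid. The sketch uses the example grid from the text (T = 8/32 to 80/32 in steps of 1/16, 37 tables); clamping periods outside the grid to the first or last table is an assumption of this sketch, since the text handles those ranges separately with equations (29) and (36) or the techniques of the following paragraphs. Function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
T_FIRST = 8 / 32   # smallest tabulated period
T_STEP = 2 / 32    # step size 1/16
N_TABLES = 37      # T = 8/32, 10/32, ..., 80/32

def table_index(T: float) -> int:
    """Index of the look-up table whose period is nearest to T (clamped to the tabulated range)."""
    i = round((T - T_FIRST) / T_STEP)
    return min(max(i, 0), N_TABLES - 1)

def table_period(i: int) -> float:
    """Period T associated with table index i."""
    return T_FIRST + i * T_STEP

for T in (0.3, 1.0, 1.71, 2.49):
    i = table_index(T)
    print(T, i, table_period(i))
```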

[0146] As outlined above, for long periods T (e.g. for periods T which exceed the largest period for which a look-up table is defined), equation (36) may be used. Alternatively, for periods T which exceed the periods for which look-up tables have been defined, e.g. for periods T > 81/32, the period T may be separated into an integer delay $T_i$ and a residual delay $T_r$, such that $T = T_i + T_r$. The separation may be such that the residual delay $T_r$ lies within the interval for which equation (36) is applicable and for which look-up tables are available, e.g. within the interval [1.5, 2.5] or [49/32, 81/32] for the example above. By doing this, the prediction coefficients can be determined using the look-up table for the residual delay $T_r$, and the subband predictor 103 may operate on a subband buffer 104 which has been delayed by the integer delay $T_i$. For example, if the period is T = 3.7, the integer delay may be $T_i = 2$, followed by a residual delay of $T_r = 1.7$. The predictor may be applied based on the coefficients for $T_r = 1.7$ on a signal buffer which is delayed by (an additional) $T_i = 2$. The separation approach relies on the reasonable assumption that the predictor approximates a delay by T in the range of [1.5, 2.5] or [49/32, 81/32]. The advantage of the separation procedure compared to the usage of equation (36) is that the prediction coefficients can be determined based on computationally efficient table look-up operations.
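One way to perform the separation of [0146] is sketched below. Choosing $T_i$ as the smallest integer that brings the residual down to at most 81/32 places $T_r$ in (49/32, 81/32] and reproduces the example T = 3.7, $T_i = 2$, $T_r = 1.7$. This particular rule and the function name are assumptions; the text only requires $T_r$ to fall in the tabulated range.

```python
import math

# Illustrative sketch; function names and the splitting rule are hypothetical.
T_R_MIN, T_R_MAX = 49 / 32, 81 / 32  # residual-delay interval from the example

def split_period(T: float) -> tuple[int, float]:
    """Split T = T_i + T_r with integer T_i >= 0 and T_r in (T_R_MIN, T_R_MAX] when T > T_R_MAX."""
    if T <= T_R_MAX:
        return 0, T
    t_i = math.ceil(T - T_R_MAX)
    return t_i, T - t_i

print(split_period(3.7))
```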

[0147] As outlined above, for short periods (T<0.25), equation (29) may be used to determine the prediction coefficients. Alternatively, it may be beneficial to make use of the (already available) look-up tables in order to reduce the computational complexity. It is observed that the modified shift parameter φ is limited to the range |φ|≤T with a sampling step size of

$\Delta\varphi = \frac{T}{32}$ (for T < 0.25, and for C = 1, A = 1/2).

[0148] It is proposed in the present document to reuse the look-up table for the lowest period T=0.25 by scaling the modified shift parameter φ by T_l/T, wherein T_l corresponds to the lowest period for which a look-up table is available (e.g. T_l=0.25). By way of example, with T=0.1 and φ=0.07, the table for T=0.25 may be queried with a rescaled shift parameter

$\varphi = \frac{0.25}{0.1} \cdot 0.07 = 0.175.$

By doing this, the prediction coefficients for short periods (e.g. T<0.25) can also be determined in a computationally efficient manner using table look-up operations. Furthermore, the memory requirements for the predictor can be reduced, as the number of look-up tables can be reduced.
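The rescaling for short periods can be checked numerically with a short Python sketch. It assumes that the shift parameter for a period T is sampled on the grid phi_k = k·T/32 with |k| ≤ 32, which follows from the range |φ|≤T and the step size Δφ=T/32 stated in [0147], and that the table for the lowest period T_l=0.25 uses the same relative grid; the helper names `shift_grid` and `rescale_shift` are hypothetical. Under these assumptions, scaling by T_l/T maps every grid point for the short period onto a grid point of the T_l table, so the existing table can be indexed directly.

```python
T_L = 0.25  # lowest period with a look-up table (T_l in [0148])
STEPS = 32  # assumed: |phi| <= T sampled with step T / 32


def shift_grid(T: float, steps: int = STEPS) -> list[float]:
    """Assumed shift grid phi_k = k * T / steps for -steps <= k <= steps."""
    return [k * T / steps for k in range(-steps, steps + 1)]


def rescale_shift(phi: float, T: float, T_l: float = T_L) -> float:
    """Scale a shift parameter for a short period T (0 < T < T_l) by T_l / T
    so that it can be used to query the table for period T_l."""
    if not 0 < T < T_l:
        raise ValueError("rescaling is intended for 0 < T < T_l")
    if abs(phi) > T:
        raise ValueError("phi must satisfy |phi| <= T")
    return phi * T_l / T


# Worked example from [0148]: T = 0.1, phi = 0.07.
print(round(rescale_shift(0.07, 0.1), 6))  # prints 0.175
```

Mapping the whole short-period grid through `rescale_shift` reproduces the 65-point grid of the T_l table, which is why no separate tables are required for periods below T_l.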

[0149] In the present document, a model based subband prediction scheme has been described. The model based subband prediction scheme enables an efficient description of subband predictors, i.e. a description requiring only a relatively low number of bits. As a result of an efficient description for subband predictors, cross-subband prediction schemes may be used which lead to reduced aliasing artifacts. Overall, this allows the provision of low bit rate audio coders using subband prediction.

CPC classification

G10L19/0208, G10L19/26, G10L19/06, G10L19/032, G10L19/093, G10L19/0212, G10L19/265, G06F30/30, G10L19/005, G06F30/327

International classification

G10L19/02, G10L19/093, G06F30/30, G06F30/327, G10L19/005, G10L19/032, G10L19/06, G10L19/26
