TIME-DOMAIN GAIN MODELING IN THE QMF DOMAIN

Abstract

A method of processing audio is provided. The method includes determining modulated filter bank, MFB, domain broad band gains for fading an audio signal in accordance with a time domain target gain, so that application of the broad band gains in the MFB domain emulates application of the target gain in the time domain. Determining the broad band gains includes computing the broad band gains using the target gain, an MFB analysis prototype filter, and an MFB synthesis prototype filter. Also provided are corresponding apparatus, programs, and computer-readable storage media.

Claims

1. A method of processing audio, the method comprising determining modulated filter bank, MFB, domain broad band gains for fading an audio signal in accordance with a time domain target gain, so that application of the broad band gains in the MFB domain emulates application of the target gain in the time domain, wherein determining the broad band gains includes computing the broad band gains using the target gain, an MFB analysis prototype filter, and an MFB synthesis prototype filter.

2. The method according to claim 1, wherein a respective broad band gain is computed for each of a plurality of MFB analysis time slots.

3. The method according to any one of the preceding claims, wherein computing the broad band gains includes optimizing the broad band gains by computing a least squares solution.

4. The method according to claim 1, wherein determining the broad band gains includes: determining, for each of a plurality of MFB analysis time slots and for each of a plurality of frequency bands, a respective MFB analysis signal, based on an input training signal and the MFB analysis prototype filter; determining, for each of the plurality of MFB analysis time slots, a respective MFB synthesis signal, based on the MFB analysis signals in the respective MFB analysis time slot and the MFB synthesis prototype filter; and computing the broad band gains across MFB analysis time slots based on the MFB synthesis signals and the target gain.

5. The method according to claim 4, wherein computing the broad band gains includes optimizing the broad band gains by computing a least squares solution.

6. The method according to claim 5, wherein the least squares solution minimizes an error between samples of a first audio signal and samples of a second audio signal, the first audio signal obtainable by MFB analysis of the training signal followed by MFB synthesis, overlap add, and application of the target gain, or by application of the target gain and delaying by a processing delay of MFB analysis and MFB synthesis, and the second audio signal obtainable by applying, in each MFB analysis time slot, a respective broad band gain to a respective MFB synthesis signal, and by summing contributions from all MFB analysis time slots.

7. The method according to claim 5, wherein the least squares solution is a solution to an objective function based on a transform matrix T.sub.1 that depends on the plurality of MFB synthesis signals and a target vector t.sub.1 that depends on the target gain.

8. The method according to claim 7, wherein the transform matrix T.sub.1 is given by T.sub.1=[w.sub.0(n), w.sub.1(n), . . . , w.sub.K−1(n)], where K is the number of MFB analysis time slots and n indicates a sample number, and the target vector t.sub.1 is given by t.sub.1=x.sub.2(n) g(n−D.sub.P), where x.sub.2(n) is a time-domain signal obtainable by MFB analysis of the training signal followed by MFB synthesis and overlap add, and D.sub.P is a delay; and wherein the least squares solution solves the equation T.sub.1G=t.sub.1, where G is a broad band gain vector given by G=[G.sub.0, G.sub.1, . . . , G.sub.K−1].sup.T, with .sup.T indicating the transpose.

9. The method according to claim 8, wherein the least squares solution for the broad band gain vector G is given by G=(T.sub.1.sup.TT.sub.1).sup.−1(T.sub.1.sup.Tt.sub.1), with .sup.−1 indicating the inverse.

10. The method according to claim 4, wherein the training signal is a random signal or a constant signal.

11. The method according to claim 4, wherein computing the broad band gains is performed iteratively, each iteration after the first iteration being computed with a respective modified training signal or a respective different training signal.

12. The method according to claim 1, wherein determining the broad band gains includes: determining an MFB interpolation prototype filter based on the MFB analysis prototype filter and the MFB synthesis prototype filter; and computing the broad band gains across MFB analysis time slots based on the MFB interpolation prototype filter and the target gain.

13. The method according to claim 12, wherein the MFB interpolation prototype filter is determined as a product of one of the MFB analysis prototype filter and the MFB synthesis prototype filter and a mirrored and shifted version of the other one of the MFB analysis prototype filter and the MFB synthesis prototype filter.

14. The method according to claim 12, wherein computing the broad band gains includes optimizing the broad band gains by computing a least squares solution.

15. The method according to claim 14, wherein the least squares solution is a solution to an objective function based on a transform matrix T.sub.2 and a target vector t.sub.2 that depends on the target gain.

16. The method according to claim 14, wherein the least squares solution is a solution to an objective function based on a transform matrix T.sub.2 that depends on the MFB interpolation prototype filter and a target vector t.sub.2 that depends on the target gain.

17. The method according to claim 15, wherein the transform matrix T.sub.2 is a matrix of shifted versions of the MFB interpolation prototype filter, each associated with a particular MFB analysis time slot.

18. The method according to claim 15, wherein the transform matrix T.sub.2 is given by T.sub.2=[p.sub.i(n), p.sub.i(n−S), . . . , p.sub.i(n−(K−1)S)], where p.sub.i is the MFB interpolation prototype filter, K is the number of MFB analysis time slots, n indicates a sample number, and S is a slot length of the MFB analysis time slots, and the target vector t.sub.2 is given by t.sub.2=g(n−D)Σ.sub.k=0.sup.K−1p.sub.i(n−kS), where g is the target gain; and wherein the least squares solution solves the equation T.sub.2G=t.sub.2, where G is a broad band gain vector given by G=[G.sub.0, G.sub.1, . . . , G.sub.K−1].sup.T, with .sup.T indicating the transpose.

19. The method according to claim 18, wherein the least squares solution for the broad band gain vector G is given by G=(T.sub.2.sup.TT.sub.2).sup.−1(T.sub.2.sup.Tt.sub.2), with .sup.−1 indicating the inverse.

20. The method according to claim 18, wherein the MFB interpolation prototype filter p.sub.i is given by p.sub.i(n)=p.sub.S(n)p.sub.A(D−n), where p.sub.A is the MFB analysis prototype filter, p.sub.S is the MFB synthesis prototype filter, and D+1 is an effective length of the MFB interpolation prototype filter p.sub.i.

21. The method according to claim 1, further comprising determining a set of MFB analysis time slots by identifying a non-constant gain function section of the target gain, encapsulated by time samples, and determining associated time slots based on the non-constant gain function section.

22. The method according to claim 1, further comprising: applying the determined broad band gains in the MFB domain; generating a time-domain broad band signal using the determined broad band gains; limiting the determined broad band gains to a range from 0 to 1 inclusive; decoding transformed signals in the MFB domain, including fading an audio signal relating to a current parameter set and/or fading an audio signal relating to a previous parameter set, using the broad band gains per MFB analysis time slot; and wherein the MFB domain is a quadrature mirror filter, QMF, domain.

23-26. (canceled)

27. An apparatus, comprising a processor and a memory coupled to the processor, and storing instructions for the processor, wherein the processor is adapted to carry out the method according to claim 1.

28. A program comprising instructions that, when executed by a processor, cause the processor to carry out the method according to claim 1.

29. A computer-readable storage medium storing the program according to claim 28.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0041] The invention is explained below in an exemplary manner with reference to the accompanying drawings, wherein

[0042] FIG. 1 is a block diagram schematically illustrating an example of a basic operation of gain application in the time domain;

[0043] FIG. 2 is a block diagram schematically illustrating an example of an alternative operation of MFB analysis followed by MFB synthesis, and application of a delayed time-domain gain;

[0044] FIG. 3 is a block diagram schematically illustrating an example of an operation of gain application in the MFB domain according to embodiments of the disclosure;

[0045] FIG. 4 is a flowchart schematically illustrating a method of processing audio according to embodiments of the disclosure;

[0046] FIG. 5 is a flowchart schematically illustrating an example of an implementation of a step in the method of FIG. 4 according to embodiments of the disclosure;

[0047] FIG. 6 is a diagram showing an example of a transform matrix based on a DC input training signal according to embodiments of the disclosure;

[0048] FIG. 7 is a flowchart schematically illustrating another example of an implementation of a step in the method of FIG. 4 according to embodiments of the disclosure;

[0049] FIG. 8 is a diagram showing an example of a transform matrix based on an interpolation prototype filter according to embodiments of the disclosure;

[0050] FIG. 9 is a diagram showing an example of a time-domain gain function to be modelled; and

[0051] FIG. 10 is a block diagram of an example of an apparatus for performing methods according to embodiments of the disclosure.

DETAILED DESCRIPTION

[0052] The present disclosure describes methods for optimally (e.g., in a least squares error sense) calculating MFB (e.g., QMF) domain broad band gains to model any desired time-domain gain.

[0053] Here, MFB refers to any real- or (preferably) complex-valued, sub-sampled, M-channel (or frequency bands) modulated filter bank. Using this filter bank as an analysis filter bank, a time-domain input signal is transformed to the MFB domain (e.g., QMF domain) via MFB analysis (e.g., QMF analysis) to yield an MFB domain signal (e.g., QMF domain signal). After optional signal processing in the MFB domain, the (potentially processed) MFB domain signal is transformed back to the time domain via MFB synthesis, using an appropriate synthesis filter bank (e.g., inverse filter bank).

[0054] One example of an MFB is a QMF filter bank. The prototype filters of the analysis and the synthesis filter bank may be different and may be symmetric or asymmetric. The sub-sampling factor (or stride) S typically is equal to M (critically sampled with respect to the number of complex data) or smaller than M (over-sampled). Typical values for M for perceptual audio coding and processing at 48 kHz sample rate are 60, 64, or 77, for example. The overall analysis-synthesis system delay D depends on the nature of the prototype filters. With sub-sampling (S>1), the processing delay D.sub.P reduces to D.sub.P=D−S+1. It is assumed that the MFB (e.g., QMF filter bank) has near perfect reconstruction property and that any aliasing errors due to filter bank domain processing are sufficiently well suppressed.

[0055] FIG. 1 schematically illustrates an example of a basic operation of applying a gain (e.g., cross-fading gain) in the time domain. This operation may be performed for example at the encoder-side to cross-fade between different sets of parameters (e.g., transform parameters). n is understood to denote a sample number or any other suitable time index. An input signal x(n), 10, to which a time-domain gain g(n), 20, is to be applied is input to multiplication block 160, at which the time-domain gain g(n) is applied, for example in a sample-by-sample manner. The multiplication block 160 outputs an output signal y(n), 170, which is a faded (and delayed) version of the input signal x(n), faded by the time-domain gain g(n).

[0056] FIG. 2 schematically illustrates an example of a corresponding operation that may be performed, for example, at the decoder-side. The input signal x(n), 10, is input to an MFB analysis block 40 (filter bank analysis, FB-A) for conversion to the MFB domain, yielding MFB domain signal u(c, k), 45. Here, c=0, . . . , M−1 indicates a frequency band or frequency channel, and k indicates an MFB time slot (e.g., QMF time slot). Optionally, parametric (de-) coding or other MFB-domain processing may be applied to the MFB-domain signal u(c, k). Subsequently, the MFB-domain signal u(c, k) is input to an MFB synthesis block 50 (filter bank synthesis, FB-S) for conversion back to the time domain, yielding time-domain signal x.sub.2(n), 55. For application of a delayed time domain gain g(n−D.sub.P), 25, which is appropriately delayed by a processing delay D.sub.P of MFB analysis and synthesis, the time-domain signal x.sub.2(n) is input to multiplication block 260, at which the (delayed) time-domain gain g(n−D.sub.P) is applied, for example in a sample-by-sample manner. The multiplication block 260 outputs an output signal y.sub.2(n), 270, which is a faded (and delayed) version of the input signal x(n), faded by the time-domain gain g(n).
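The relationship between the operations of FIG. 1 and FIG. 2 can be illustrated with a minimal sketch. It assumes an idealised filter bank whose only effect is a pure delay of D.sub.P=D−S+1 samples; the function names, the linear fade shape, and the values of D and S are hypothetical and chosen only for illustration.

```python
def linear_fade_in(n, n0, n1):
    """Hypothetical target gain g(n): 0 up to n0, linear ramp, 1 from n1 on."""
    if n <= n0:
        return 0.0
    if n >= n1:
        return 1.0
    return (n - n0) / (n1 - n0)

def apply_gain(x, g, delay=0):
    """Sample-by-sample gain application: y(n) = x(n) * g(n - delay)."""
    return [x[n] * g(n - delay) for n in range(len(x))]

D, S = 19, 4                       # assumed system delay and stride
D_P = D - S + 1                    # processing delay (16 samples here)
g = lambda n: linear_fade_in(n, 8, 24)

x = [1.0] * 64                     # constant test input
y = apply_gain(x, g)               # FIG. 1: y(n) = x(n) g(n)
x2 = [0.0] * D_P + x[:-D_P]        # idealised x2(n) = x(n - D_P)
y2 = apply_gain(x2, g, delay=D_P)  # FIG. 2: y2(n) = x2(n) g(n - D_P)
```

Under this idealisation, y2 is simply y delayed by D_P samples, which is why the gain in FIG. 2 has to be delayed by the same amount.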

[0057] As noted above, application of the (delayed) time domain gain after MFB analysis, potential MFB domain processing, and MFB synthesis typically leads to an additional delay in the audio processing chain (e.g., decoding chain). That is, if additional processing in the MFB domain on the cross-faded audio signals is desired, this would require another MFB analysis-synthesis stage following the time-domain cross-fade, adding a further processing delay D.sub.P of MFB analysis and synthesis to the overall delay. Moreover, when cross-fading two audio signals, the architecture of FIG. 2 would require running two instances of the MFB synthesis block 50, followed by cross-fading in the time domain, which significantly adds to computational overhead. To address these issues, the present disclosure proposes application of the gain in the MFB domain. This is illustrated in FIG. 3, which shows an example of the proposed operation with application of broad band gains G per time slot k in the MFB domain (filter bank domain in general; e.g., QMF domain).

[0058] The operation shown in FIG. 3 may be performed for example at the decoder-side. In the decoder, the transmitted/received audio signals may be decoded and undergo MFB analysis (e.g., QMF analysis). For signal reconstruction, transmitted parameters may be applied in the MFB domain and processed MFB domain signals are combined. MFB broad band slot gains may be applied to the parameters to cross-fade from previous to current frame parameters. For example, in a certain application there may be 16 time slots per frame and 16 broad band gains; however, also a smaller number, e.g., only 3 of those gains, may be optimally calculated as described below, to cover the desired cross-fade time range.

[0059] In FIG. 3, the input signal x(n), 10 is input to an MFB analysis block 40 (filter bank analysis, FB-A) for conversion to the MFB domain, yielding an MFB domain signal u(c, k), 45. Then, MFB-domain broad band gains G(k), 30, are applied in a per-time-slot manner. Optionally, parametric (de-) coding or other MFB-domain processing may be applied to the MFB-domain signal u(c, k), 45, before or after gain application. Subsequently, the gain-applied MFB-domain signal is input to an MFB synthesis block 50 (filter bank synthesis, FB-S) for conversion back to the time domain, yielding time-domain signal y.sub.3(n), 370, which is a faded (and delayed) version of the input signal x(n), faded by application of the MFB-domain broad band gains G(k) 30. Here, it is intended that application of the MFB-domain broad band gains G(k) 30 mimics (or emulates, implements) application of a predefined time-domain (target) gain g(n).
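As a minimal sketch of the per-time-slot gain application of FIG. 3, the MFB-domain signal can be held in an array of shape (M, K) and every band of slot k scaled by the same gain G(k). The array layout, the function name, and the gain values below are assumptions made for illustration; the gains are placeholders, not computed results.

```python
import numpy as np

def apply_slot_gains(u, G):
    """Scale all bands of time slot k by G[k]; u has shape (M, K)."""
    u = np.asarray(u)
    G = np.asarray(G, dtype=float)
    if u.shape[1] != G.shape[0]:
        raise ValueError("need exactly one broad band gain per time slot")
    return u * G[np.newaxis, :]          # broadcast the gain over the band axis

M, K = 4, 3
u = np.ones((M, K), dtype=complex)       # placeholder MFB-domain signal u(c, k)
G = np.array([0.1, 0.5, 0.9])            # placeholder broad band gains
u_faded = apply_slot_gains(u, G)
```

Because the gain is identical across bands within a slot, the operation costs one multiplication per MFB sample and needs no additional synthesis stage.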

[0060] The present disclosure proposes two methods for calculation of optimal broad band gains G(k) as used in the example processing chain of FIG. 3.

[0061] As general notation, K is taken to represent the number of relevant filter bank analysis time slots. Then, a column vector G holding the unknown (i.e., to-be-calculated) broad band gains G(k) can be defined as

[00001] $\begin{matrix} G = [G(0) \;\; G(1) \;\; \cdots \;\; G(K - 1)]^{T}, & (1) \end{matrix}$

where .sup.T indicates the transpose.

[0062] The general least squares solution to the problem of finding the broad band gains can be given as

[00002] $\begin{matrix} G = {(T^{T} T)}^{- 1} (T^{T} t), & (2) \end{matrix}$

where T is a transform matrix with K columns and t is a target column vector dependent on the target gain function (time-domain target gain), as will be explained below in more detail for each of the proposed methods.
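Eq. (2) can be evaluated numerically as sketched below. The sketch uses a random stand-in for T and a made-up gain vector, so the numbers carry no meaning for any particular filter bank; it only shows that solving the normal equations and calling a generic least-squares routine agree. The function names are hypothetical.

```python
import numpy as np

def solve_normal_equations(T, t):
    """Eq. (2), G = (T^T T)^{-1} (T^T t), without forming the inverse explicitly."""
    return np.linalg.solve(T.T @ T, T.T @ t)

def solve_lstsq(T, t):
    """Same least-squares problem via NumPy's SVD-based solver."""
    G, *_ = np.linalg.lstsq(T, t, rcond=None)
    return G

rng = np.random.default_rng(0)
T = rng.standard_normal((50, 3))         # stand-in transform matrix, K = 3 columns
G_true = np.array([0.2, 0.5, 0.8])       # made-up gains used to build the target
t = T @ G_true                           # target vector consistent with G_true

G_ne = solve_normal_equations(T, t)
G_ls = solve_lstsq(T, t)
```

For the small K considered here either route is adequate; the SVD-based solver is usually the more robust choice if the columns of T are nearly linearly dependent.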

[0063] Performance may be measured as an energy of an error e(n)=y.sub.3(n)−y.sub.2(n). This definition of the error deliberately excludes filter bank reconstruction errors and focuses on the modelling of the gain function. Referring to FIG. 2, assuming near perfect reconstruction property of the MFB under investigation, the following relationship holds

[00003] $\begin{matrix} x_{2}(n) \approx x(n - D_{P}). & (3) \end{matrix}$

[0064] In accordance with the above, an example of a method 400 of processing audio according to embodiments of the disclosure is shown in the flowchart of FIG. 4. Method 400 comprises steps S410 and S420, of which step S420 is optional. Further, method 400 makes reference to an MFB domain, for which one example is the QMF domain.

[0065] At step S410, modulated filter bank (MFB) domain broad band gains are determined for fading an audio signal in accordance with a time domain target gain. Here, application of the broad band gains in the MFB domain should emulate (i.e., mimic, or implement) application of the target gain in the time domain. In other words, the broad band gains may be determined to minimize the error e(n) defined above, for example in a least-squares sense. The (time-domain) target gain may relate to a target gain function that is a function of time (or sample number, or any other suitable time index).

[0066] Determination of the broad band gains may include computing the broad band gains using the target gain, an MFB analysis prototype filter, and an MFB synthesis prototype filter. A respective broad band gain G(k) may be computed for each of a plurality of MFB analysis time slots k. Detailed examples will be described below.

[0067] Further, determination (e.g., computation, calculation) of the broad band gains may include optimizing the broad band gains by computing a least squares solution. In some embodiments, the determined broad band gains may be limited (e.g., mapped or clipped) to a predetermined target range before they are applied to any audio signals. For example, the determined broad band gains may be limited to the range from 0 (0.0) to 1 (1.0) inclusive.
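Limiting by clipping, one of the options mentioned above, might look as follows. The function name and the input values are hypothetical.

```python
def limit_gains(gains, lo=0.0, hi=1.0):
    """Clip each broad band gain to the closed range [lo, hi]."""
    return [min(max(g, lo), hi) for g in gains]

# Made-up gains with slight under- and overshoot, as a least-squares fit may produce.
limited = limit_gains([-0.03, 0.21, 0.78, 1.04])
```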

[0068] At step S420, transformed signals in the MFB domain are decoded, including fading an audio signal relating to a current parameter set and/or fading an audio signal relating to a previous parameter set, using the broad band gains per MFB analysis time slot. The fading may correspond to, for example, cross-fading audio signals. Specifically, the fading may correspond to cross-fading previous and current parameters to achieve cross-fading of signals related to the previous and current parameters.

[0069] The decoding at this step may involve applying the determined broad band gains in the MFB domain. Additionally or alternatively, the decoding may involve generating a time-domain broad band signal using the determined broad band gains.
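One possible reading of the cross-fade, given here only as a sketch, is a weighted sum of two MFB-domain signals with separate per-slot fade-out and fade-in gains. The choice of complementary gains (fade-out = 1 − fade-in), the array layout (M, K), and all names and values are assumptions, not something the description prescribes.

```python
import numpy as np

def crossfade_mfb(u_prev, u_cur, G_out, G_in):
    """Per-slot cross-fade in the MFB domain; u_* have shape (M, K), G_* shape (K,)."""
    G_out = np.asarray(G_out, dtype=float)
    G_in = np.asarray(G_in, dtype=float)
    return u_prev * G_out[np.newaxis, :] + u_cur * G_in[np.newaxis, :]

M, K = 4, 3
u_prev = np.full((M, K), 2.0)            # placeholder signal for the previous parameter set
u_cur = np.full((M, K), 4.0)             # placeholder signal for the current parameter set
G_in = np.array([0.0, 0.5, 1.0])         # placeholder fade-in gains
G_out = 1.0 - G_in                       # assumed complementary fade-out gains
u_mix = crossfade_mfb(u_prev, u_cur, G_out, G_in)
```

Only a single MFB synthesis is then needed for the combined signal.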

[0070] Next, example methods for determining the broad band gains will be described in more detail.

Method 1: Training Signals

[0071] According to a first example method, optimal broad band gains may be computed for a training signal and applied afterwards in general.

[0072] The MFB analysis signal (e.g., QMF analysis signal) u at time slot k and frequency band c corresponding to an input training signal x(n) can be computed as

[00004] $\begin{matrix} u(c, k) = \sum_{v = 0}^{L - 1} x(kS - v) \, p_{A}(v) \, m_{c}(v), & (4) \end{matrix}$

with the real-valued analysis prototype filter p.sub.A of length L and the modulator m.sub.c for band c given by, for example

[00005] $\begin{matrix} m_{c}(v) = \exp\left(i \frac{\pi}{M} (c + 0.5)(v - D/2)\right). & (5) \end{matrix}$
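Eqs. (4) and (5) can be sketched directly in code. The sine-shaped prototype, the relation D=L−1, the zero extension of x(n) outside its support, and the function names are assumptions made only so that the example runs; they do not describe an actual QMF design.

```python
import numpy as np

def modulator(M, D, v):
    """Eq. (5) for all bands c = 0..M-1; returns an array of shape (M, len(v))."""
    c = np.arange(M)[:, np.newaxis]
    v = np.asarray(v)[np.newaxis, :]
    return np.exp(1j * np.pi / M * (c + 0.5) * (v - D / 2))

def mfb_analysis_slot(x, k, p_a, M, S, D):
    """Eq. (4): u(c, k) for all bands; x(n) is taken as 0 outside 0..len(x)-1."""
    L = len(p_a)
    v = np.arange(L)
    idx = k * S - v                              # sample indices kS - v
    inside = (idx >= 0) & (idx < len(x))
    seg = np.zeros(L)
    seg[inside] = x[idx[inside]]                 # x(kS - v)
    return modulator(M, D, v) @ (seg * p_a)      # sum over v for every band c

# Hypothetical toy configuration.
M, S, L = 4, 4, 16
D = L - 1
p_a = np.sin(np.pi * (np.arange(L) + 0.5) / L)   # stand-in analysis prototype
rng = np.random.default_rng(0)
x = rng.standard_normal(64)                      # white-noise training signal
u_slot = mfb_analysis_slot(x, 5, p_a, M, S, D)   # M complex values for slot k = 5
```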

[0073] The MFB synthesis signal (e.g., QMF synthesis signal) w.sub.k related to the analysis signal u at time slot k can be computed as

[00006] $\begin{matrix} w_{k}(n) = \mathrm{Re}\left\{\sum_{c = 0}^{M - 1} u(c, k) \, p_{S}(n - kS) \, m_{c}(n - kS)\right\}, & (6) \end{matrix}$

with the synthesis prototype filter p.sub.S of finite length.

[0074] The final synthesis signal x.sub.2 (as shown for example in FIG. 2) without filter bank processing can be computed as

[00007] $\begin{matrix} x_{2}(n) = \sum_{k = -\infty}^{\infty} w_{k}(n). & (7) \end{matrix}$

The process of Eq. (7) may be referred to as overlap add, for example.
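The synthesis of Eq. (6) and the overlap add of Eq. (7) can be sketched as follows. The sum over k is restricted to the slots actually present, the prototype is a hypothetical sine shape with D=L−1, the MFB-domain input is random placeholder data, and all function names are illustrative.

```python
import numpy as np

def modulator(M, D, v):
    """Eq. (5) for all bands c = 0..M-1; returns an array of shape (M, len(v))."""
    c = np.arange(M)[:, np.newaxis]
    v = np.asarray(v)[np.newaxis, :]
    return np.exp(1j * np.pi / M * (c + 0.5) * (v - D / 2))

def mfb_synthesis_slot(u_k, k, p_s, M, S, D, num_samples):
    """Eq. (6): w_k(n) for n = 0..num_samples-1 (k >= 0 assumed)."""
    w = np.zeros(num_samples)
    n = np.arange(k * S, min(k * S + len(p_s), num_samples))
    r = n - k * S                                # local index n - kS
    w[n] = np.real(u_k @ modulator(M, D, r)) * p_s[r]
    return w

def overlap_add(u, p_s, M, S, D, num_samples):
    """Eq. (7), restricted to the K slots stored in u (shape (M, K))."""
    x2 = np.zeros(num_samples)
    for k in range(u.shape[1]):
        x2 += mfb_synthesis_slot(u[:, k], k, p_s, M, S, D, num_samples)
    return x2

# Hypothetical toy configuration with placeholder MFB-domain data.
M, S, L, K = 4, 4, 16, 3
D = L - 1
p_s = np.sin(np.pi * (np.arange(L) + 0.5) / L)   # stand-in synthesis prototype
rng = np.random.default_rng(0)
u = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
num_samples = (K - 1) * S + L
x2 = overlap_add(u, p_s, M, S, D, num_samples)
```

Each w.sub.k occupies L samples starting at n=kS, so consecutive slot contributions overlap by L−S samples.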

[0075] The final output signal y.sub.3 (as shown for example in FIG. 3) with broad band gains G(k) applied can be computed as

[00008] $\begin{matrix} y_{3}(n) = \sum_{k = -\infty}^{\infty} G(k) \, w_{k}(n). & (8) \end{matrix}$

[0076] With the above definitions, determination of the broad band gains at step S410 of method 400 may proceed via example method 500 shown in the flowchart of FIG. 5. Method 500 comprises steps S510 through S530.

[0077] At step S510, for each of a plurality of time slots (MFB analysis time slots) k and for each of a plurality of frequency bands c, a respective MFB analysis signal u(c, k) is determined based on an input training signal x(n) and the MFB analysis prototype filter p.sub.A.

[0078] At step S520, for each of the plurality of MFB analysis time slots k, a respective MFB synthesis signal w.sub.k is determined based on the MFB analysis signals u(c, k) in the respective MFB analysis time slot k and the MFB synthesis prototype filter p.sub.S.

[0079] Then, at step S530, the broad band gains G(k) across MFB analysis time slots k are computed based on the MFB synthesis signals w.sub.k and the target gain g(n).

[0080] An example of details of the computation at step S530 that involves optimizing the broad band gains by computing a least squares solution is described next.

[0081] With K the number of relevant analysis time slots, a transform matrix T.sub.1 can be defined as

[00009] $\begin{matrix} T_{1} = [w_{0}(n) \;\; w_{1}(n) \;\; \cdots \;\; w_{K - 1}(n)], & (9) \end{matrix}$

where n=0, . . . , (K−1)S+L−1, and a column vector of broad band gains G can be defined as

[00010] $\begin{matrix} G = [G_{0} \;\; G_{1} \;\; \cdots \;\; G_{K - 1}]^{T}. & (10) \end{matrix}$

Finally, a target column vector t.sub.1 can be defined as

[00011] $\begin{matrix} t_{1} = y_{2} (n) = x_{2} (n) g (n - D_{P}), & (11) \end{matrix}$

where again n=0, . . . , (K−1)S+L−1.

[0082] Then the following equation can be solved in a least squares sense to determine the broad band gains,

[00012] $\begin{matrix} T_{1} G = t_{1} . & (12) \end{matrix}$

This equation can be solved for example via

[00013] $\begin{matrix} G = {(T_{1}^{T} T_{1})}^{- 1} (T_{1}^{T} t_{1}) . & (13) \end{matrix}$
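The computation of Eqs. (4) to (13) can be put together in a compact sketch. Several points are assumptions rather than part of the description: the sine-shaped prototype used for both analysis and synthesis, D=L−1, D.sub.P=D−S+1, the restriction of x.sub.2(n) in the target to the K modelled slots (by analogy with Eq. (20)), the storage convention for the training signal, and all function and variable names.

```python
import numpy as np

def modulator(M, D, v):
    """Eq. (5) for all bands c = 0..M-1; returns an array of shape (M, len(v))."""
    c = np.arange(M)[:, np.newaxis]
    return np.exp(1j * np.pi / M * (c + 0.5) * (v[np.newaxis, :] - D / 2))

def method1_gains(x_hist, g, p_a, p_s, M, S, D, K):
    """Least-squares broad band gains G(0..K-1) from a training signal.

    x_hist[j] holds x(j - (L - 1)), so slot 0 already sees L input samples.
    g is a callable target gain evaluated on an array of sample indices.
    """
    L = len(p_a)                                   # assumes len(p_s) == L
    D_P = D - S + 1                                # assumed processing delay
    N = (K - 1) * S + L                            # rows n = 0 .. (K-1)S + L - 1
    mod = modulator(M, D, np.arange(L))
    T1 = np.zeros((N, K))
    for k in range(K):
        seg = x_hist[k * S : k * S + L][::-1]      # x(kS - v), v = 0..L-1
        u = mod @ (seg * p_a)                      # Eq. (4)
        T1[k * S : k * S + L, k] = np.real(u @ mod) * p_s   # Eq. (6), column k of Eq. (9)
    x2 = T1.sum(axis=1)                            # Eq. (7) over the K slots only (assumption)
    t1 = x2 * g(np.arange(N) - D_P)                # Eq. (11)
    G, *_ = np.linalg.lstsq(T1, t1, rcond=None)    # least-squares solution of Eq. (12)
    return G

# Hypothetical toy configuration, not a real QMF design.
M, S, L, K = 8, 8, 32, 3
D = L - 1
proto = np.sin(np.pi * (np.arange(L) + 0.5) / L)
rng = np.random.default_rng(0)
x_hist = rng.standard_normal((K - 1) * S + L)      # white-noise training signal
fade_in = lambda n: np.clip(n / 16.0, 0.0, 1.0)    # hypothetical linear target gain
G = method1_gains(x_hist, fade_in, proto, proto, M, S, D, K)
```

A useful sanity check of such a sketch is a constant target gain of 1: the target then equals the sum of the columns of T.sub.1, so the solution is a vector of ones.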

[0083] In line with the above example, the least squares solution computed at step S530 of method 500 may minimize an error between samples of a first audio signal (e.g., audio signal y.sub.2(n) defined in Eq. (11) above) and samples of a second audio signal (e.g., audio signal y.sub.3(n) defined in Eq. (8) above). The first audio signal may be obtainable by MFB analysis of the training signal x(n) followed by MFB synthesis, overlap add, and application of the (delayed) target gain g(n). Alternatively, the first audio signal may be obtainable by application of the target gain and delaying by a processing delay of MFB analysis and MFB synthesis. The second audio signal may be obtainable by applying, in each MFB analysis time slot k, a respective broad band gain G(k) to a respective MFB synthesis signal w.sub.k, and by summing contributions from all MFB analysis time slots k. It is understood that the first and second audio signals may be time-domain audio signals.

[0084] Further in line with the above example, the least squares solution (e.g., the solution defined in Eq. (13) above) may be a solution to an objective function (e.g., the objective function defined in Eq. (12) above) based on a transform matrix T.sub.1 that depends on the plurality of MFB synthesis signals w.sub.k and a target vector t.sub.1 that depends on the target gain g(n).

[0085] In a specific example, the transform matrix T.sub.1 may be given by T.sub.1=[w.sub.0(n), w.sub.1(n), . . . , w.sub.K−1(n)] as described above, where K is the number of MFB analysis time slots and n indicates a sample number. Further, the target vector t.sub.1 may be given by t.sub.1=x.sub.2(n) g(n−D.sub.P), where x.sub.2(n) is a time-domain signal obtainable by MFB analysis of the training signal x(n) followed by MFB synthesis and overlap add, and D.sub.P is a delay (e.g., processing delay of MFB analysis followed by MFB synthesis and overlap add). The least squares solution may solve the equation T.sub.1G=t.sub.1, where G is a broad band gain vector given by G=[G.sub.0, G.sub.1, . . . , G.sub.K−1].sup.T, with .sup.T indicating the transpose. Specifically, the least squares solution for the broad band gain vector G may be given by G=(T.sub.1.sup.TT.sub.1).sup.−1(T.sub.1.sup.Tt.sub.1), with .sup.−1 indicating the inverse.

[0086] As a variant of the above, the first audio signal could be obtained by applying the target gain and then delaying by the MFB delay. Then, the optimized broad band gains would also aim to minimize any MFB reconstruction error in addition to the actual gain objective. In this case, the target vector t.sub.1 in Eq. (11) would be replaced by

[00014] $\begin{matrix} t_{1} = [x \cdot g](n - D_{P}). & (14) \end{matrix}$

[0087] It can be shown that a white noise random input signal x(n) yields very good modelling results. Still, other input training signals may be used as well, such as a constant signal (e.g., DC signal). The latter has the advantage that the synthesis signals w.sub.k are shifted versions of one another. This is shown in the diagram of FIG. 6, which shows an example transform matrix T.sub.1 for constant input, S=60, and 3 time slots, and in which the horizontal axis indicates time slot indices and the vertical axis indicates time samples. However, the error with constant input may be higher in general than with random input.

[0088] In line with these findings, the input training signal used above may be a random signal (e.g., white noise) or a DC signal, for example.

[0089] Performance of the proposed method can further be improved by repeating the method a few times (e.g., typically 2 or 3 times) and averaging the obtained optimal broad band gains.

[0090] Thus, computing the broad band gains via the steps of method 500 may be performed iteratively, with each iteration after the first iteration being computed with a respective modified training signal or a respective different training signal. Computing the broad band gains in each iteration after the first iteration may be further based on an average (e.g., weighted average, arithmetic mean, etc.) of results from at least one previous iteration. For example, a final result for the broad band gains may be determined by averaging over results of all iterations.
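The final averaging over iterations can be sketched as below. The per-run gain vectors are made-up numbers standing in for the results of separate runs with different training signals, and the function name and optional weights are illustrative.

```python
import numpy as np

def average_gain_runs(runs, weights=None):
    """Average per-slot gains over several runs; each run is a length-K vector."""
    return np.average(np.stack(runs), axis=0, weights=weights)

# Made-up results of two runs with different training signals (K = 3).
runs = [np.array([0.0, 0.5, 1.0]), np.array([0.2, 0.5, 0.8])]
G_avg = average_gain_runs(runs)                       # arithmetic mean per slot
G_wavg = average_gain_runs(runs, weights=[3.0, 1.0])  # weighted average per slot
```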

Method 2: Interpolation Prototypes

[0091] According to a second example method, optimal broad band gains may be computed using an MFB interpolation prototype filter (e.g., QMF interpolation prototype filter) that is based on the MFB analysis prototype filter (e.g., QMF analysis prototype filter) and the MFB synthesis prototype filter (e.g., QMF synthesis prototype filter).

[0092] Accordingly, determination of the broad band gains at step S410 of method 400 may proceed via example method 700 shown in the flowchart of FIG. 7. Method 700 comprises steps S710 and S720.

[0093] At step S710, an MFB interpolation prototype filter is determined based on the MFB analysis prototype filter and the MFB synthesis prototype filter.

[0094] Then, at step S720, the broad band gains across MFB analysis time slots are computed based on the MFB interpolation prototype filter and the target gain.

[0095] This method has the advantage that no training data (training signal) is needed, and the result only depends on the analysis and synthesis prototype filters. However, some aliasing products in the filter bank processing may be ignored by this method.

[0096] An example of details of the computations at steps S710 and S720 that involve optimizing the broad band gains by computing a least squares solution is described next. In particular, it can be shown that the output x.sub.2(n) of the analysis-synthesis filter bank operation (as shown for example in FIG. 2) can be approximated by the following equation,

[00015] $\begin{matrix} x_{2}(n) \approx x(n - D_{P}) \sum_{k = -\infty}^{\infty} p_{i}(n - kS), & (15) \end{matrix}$

and the output signal x.sub.3(n) (as shown for example in FIG. 3) with broad band gains G(k) applied can be approximated as

[00016] $\begin{matrix} x_{3}(n) \approx x(n - D_{P}) \sum_{k = -\infty}^{\infty} G(k) \, p_{i}(n - kS), & (16) \end{matrix}$

with the interpolation prototype (interpolation prototype filter) p.sub.i of effective length D+1 defined as

[00017] $\begin{matrix} p_{i} (n) = p_{S} (n) p_{A} (D - n), & (17) \end{matrix}$

where p.sub.A is the MFB analysis prototype filter and p.sub.S is the MFB synthesis prototype filter.

[0097] That is, the MFB interpolation prototype filter p.sub.i may be determined (e.g., calculated) (e.g., at step S710) as a product of one of the MFB analysis prototype filter and the MFB synthesis prototype filter and a mirrored (e.g., time-mirrored) and shifted version of the other one of the MFB analysis prototype filter and the MFB synthesis prototype filter. The mirrored and shifted version may be shifted in accordance with an effective length of the MFB interpolation prototype filter. For example, the MFB interpolation prototype filter p.sub.i(n) may be calculated as a product of the MFB synthesis filter p.sub.S(n) and a mirrored and shifted version, p.sub.A(D−n), of the MFB analysis filter p.sub.A(n), as p.sub.i(n)=p.sub.S(n)p.sub.A(D−n), where D+1 is the effective length of the MFB interpolation prototype filter p.sub.i.
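Eq. (17) amounts to an element-wise product of the synthesis prototype with the time-mirrored analysis prototype. A minimal sketch, using a hypothetical sine-shaped prototype of length D+1 for both filters and an illustrative function name:

```python
import numpy as np

def interpolation_prototype(p_a, p_s, D):
    """Eq. (17): p_i(n) = p_S(n) * p_A(D - n) for n = 0..D."""
    n = np.arange(D + 1)
    return p_s[n] * p_a[D - n]

L = 16
D = L - 1                                          # assumed: prototypes of length D + 1
proto = np.sin(np.pi * (np.arange(L) + 0.5) / L)   # stand-in, symmetric prototype
p_i = interpolation_prototype(proto, proto, D)
```

For a symmetric prototype used on both sides, as in this example, the mirroring has no effect and p.sub.i reduces to the squared prototype.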

[0098] With K being the number of relevant analysis time slots, a transform matrix T.sub.2 can be defined as

[00018] $\begin{matrix} T_{2} = [p_{i}(n) \;\; p_{i}(n - S) \;\; \cdots \;\; p_{i}(n - (K - 1)S)], & (18) \end{matrix}$

where n=0, . . . , (K−1)S+D, and a column vector of broad band gains G can be defined as

[00019] $\begin{matrix} G = [G_{0} \;\; G_{1} \;\; \cdots \;\; G_{K - 1}]^{T}. & (19) \end{matrix}$

Finally, a target column vector t.sub.2 can be defined as

[00020] $\begin{matrix} t_{2} = g(n - D_{P}) \sum_{k = 0}^{K - 1} p_{i}(n - kS), & (20) \end{matrix}$

for n=0, . . . , (K−1)S+L−1.

[0099] Then (e.g., at step S720) the following equation can be solved in a least squares sense,

[00021] $\begin{matrix} T_{2} G = t_{2} . & (21) \end{matrix}$

This equation can be solved for example via

[00022] $\begin{matrix} G = {(T_{2}^{T} T_{2})}^{- 1} (T_{2}^{T} t_{2}) . & (22) \end{matrix}$
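Eqs. (17) to (22) can be sketched without any training signal. The sine-shaped prototype, D=L−1, D.sub.P=D−S+1, the linear target gain, and all names are assumptions chosen for illustration; the row range follows the one given for T.sub.2.

```python
import numpy as np

def method2_gains(g, p_i, S, K, D_P):
    """Least-squares broad band gains from the interpolation prototype p_i."""
    D = len(p_i) - 1                               # effective length of p_i is D + 1
    N = (K - 1) * S + D + 1                        # rows n = 0 .. (K-1)S + D
    T2 = np.zeros((N, K))
    for k in range(K):
        T2[k * S : k * S + D + 1, k] = p_i         # column k holds p_i(n - kS), Eq. (18)
    t2 = g(np.arange(N) - D_P) * T2.sum(axis=1)    # Eq. (20)
    G, *_ = np.linalg.lstsq(T2, t2, rcond=None)    # least-squares solution of Eq. (21)
    return G

# Hypothetical toy configuration, not a real QMF design.
S, L, K = 8, 32, 3
D = L - 1
D_P = D - S + 1                                    # assumed processing delay
proto = np.sin(np.pi * (np.arange(L) + 0.5) / L)   # stand-in for p_A and p_S
p_i = proto * proto[::-1]                          # Eq. (17): p_S(n) * p_A(D - n)
fade_in = lambda n: np.clip(n / 16.0, 0.0, 1.0)    # hypothetical linear target gain
G = method2_gains(fade_in, p_i, S, K, D_P)
```

As in the training-signal variant, a constant target gain of 1 yields a gain vector of ones, which makes a convenient self-check.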

[0100] In line with the above example, the least squares solution computed at step S720 of method 700 may be a solution to an objective function based on a transform matrix T.sub.2 and a target vector t.sub.2 that depends on the target gain g(n). Further, the least squares solution may be a solution to an objective function based on a transform matrix T.sub.2 that depends on the MFB interpolation prototype filter p.sub.i(n) and a target vector t.sub.2 that depends on the target gain g(n). Specifically, the transform matrix T.sub.2 may be a matrix of shifted versions of the MFB interpolation prototype filter p.sub.i, each associated with a particular MFB analysis time slot k.

[0101] For example, in line with the above, the transform matrix T.sub.2 may be given by T.sub.2=[p.sub.i(n), p.sub.i(n-S), . . . , p.sub.i(n-(K-1)S)], where p.sub.i is the MFB interpolation prototype filter, K is the number of MFB analysis time slots, n indicates a sample number, and S is a slot length of the MFB analysis time slots. The target vector t.sub.2 may be given by t.sub.2=g(n-D)Σ.sub.k=0.sup.K-1p.sub.i(n-kS), where g is the target gain. Then, the least squares solution may solve the equation T.sub.2G=t.sub.2, where G is a broad band gain vector given by G=[G.sub.0, G.sub.1, . . . , G.sub.K-1].sup.T, with .sup.T indicating the transpose. The least squares solution for the broad band gain vector G may for example be given by G=(T.sub.2.sup.TT.sub.2).sup.-1(T.sub.2.sup.Tt.sub.2), with .sup.-1 indicating the inverse.

[0102] FIG. 8 is a diagram showing an example transform matrix T.sub.2 based on an interpolation prototype for S=60 and 3 time slots. The horizontal axis indicates time slot indices and the vertical axis indicates time samples.

[0103] Using any of the aforementioned methods, the MFB domain broad band gains may be calculated for example at initialization time at the decoder. Alternatively, they may be calculated separately from the decoder, and pre-calculated broad band gains may be stored at the decoder, for example in the form of one or more look-up tables. In yet another implementation, the broad band gains may be calculated at the encoder-side and transmitted to the decoder-side, for example together with the encoded audio signals.

Determination of Relevant MFB Time Slots

[0104] One way to determine the relevant time slots for gain modelling may be to identify the non-constant gain function section of the target gain, encapsulated by time samples n.sub.0 and n.sub.1. Then, associated time slots k.sub.0 and k.sub.1 (with the overall number K of time slots given by K=k.sub.1-k.sub.0+1) may be calculated as

[00023] $\begin{matrix} k_{0} = \operatorname{INT}\left( \left( n_{0} - \frac{D}{2} \right) / S \right), & (23) \end{matrix}$ $\begin{matrix} k_{1} = \operatorname{INT}\left( \left( n_{1} - \frac{D}{2} \right) / S + 0.5 \right) . & (24) \end{matrix}$

[0105] The target gain function g(n-D.sub.P) and the transform matrix involved in the optimization may be limited accordingly to n=k.sub.0S . . . (k.sub.1-1)S+L-1 for the first example method and to n=k.sub.0S . . . (k.sub.1)S+D for the second example method.
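A worked example of equations (23) and (24) and of the sample range of the second example method may be sketched as follows. The sketch assumes that INT denotes truncation to an integer, as performed by Python's `int()`; the function names and the numeric values are hypothetical and chosen only for illustration.

```python
def relevant_slots(n0, n1, D, S):
    """Return (k0, k1, K) per eqs. (23) and (24).

    Assumes INT(.) truncates toward zero, like Python's int().
    """
    k0 = int((n0 - D / 2) / S)
    k1 = int((n1 - D / 2) / S + 0.5)
    K = k1 - k0 + 1                       # overall number of time slots
    return k0, k1, K

def sample_range_second_method(k0, k1, D, S):
    """First and last sample n = k0*S .. k1*S + D for the second method."""
    return k0 * S, k1 * S + D

# Hypothetical values: fade between samples 1000 and 2000, S = 64, D = 576.
k0, k1, K = relevant_slots(n0=1000, n1=2000, D=576, S=64)
n_first, n_last = sample_range_second_method(k0, k1, D=576, S=64)
```

For these values, (1000-288)/64 = 11.125 gives k.sub.0 = 11 and (2000-288)/64 + 0.5 = 27.25 gives k.sub.1 = 27, so K = 17 and the sample range of the second example method runs from 704 to 2304.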

[0106] In line with the above, the methods described above (e.g., method 400, method 500, or method 700) may further comprise determining a set of (relevant) MFB analysis time slots by identifying a non-constant gain function section of the target gain, encapsulated (e.g., delimited, bounded) by time samples (e.g., n.sub.0 and n.sub.1), and determining associated time slots based on the non-constant gain function section.

[0107] An example of a gain function corresponding to a time-domain target gain for modelling MFB-domain broad band gains (per MFB time slot) is shown in the diagram of FIG. 9, with the horizontal axis indicating time samples and the vertical axis indicating gain values, normalized to values from 0 to 1. A non-constant gain function section may be identified in this diagram as the rising portion between constant gain function sections.
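Identifying the non-constant gain function section of a sampled target gain may be sketched as below. The function name, the tolerance parameter, and the exact convention for the delimiting samples (n.sub.0 as the last sample of the leading constant section, n.sub.1 as the first sample of the trailing constant section) are assumptions of this sketch; the text only states that the section is encapsulated by n.sub.0 and n.sub.1.

```python
import numpy as np

def non_constant_section(g, tol=0.0):
    """Return (n0, n1) delimiting the non-constant part of g, or None."""
    g = np.asarray(g, dtype=float)
    # Indices i where g changes between sample i and sample i + 1.
    changes = np.flatnonzero(np.abs(np.diff(g)) > tol)
    if changes.size == 0:
        return None                       # g is constant everywhere
    n0 = int(changes[0])                  # last sample before the first change
    n1 = int(changes[-1]) + 1             # first sample after the last change
    return n0, n1

# Toy target gain: 10 samples at 0, a 4-sample rise, 10 samples at 1.
g = np.concatenate([np.zeros(10), [0.2, 0.4, 0.6, 0.8], np.ones(10)])
section = non_constant_section(g)
```

In this toy gain the rising portion occupies samples 10 to 13, so the sketch returns n.sub.0 = 9 and n.sub.1 = 14.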

Apparatus for Implementing Methods According to the Disclosure

[0108] Finally, the present disclosure likewise relates to an apparatus (e.g., computer-implemented apparatus) for performing methods and techniques described throughout the present disclosure. FIG. 10 shows an example of such apparatus 1000. In particular, apparatus 1000 comprises a processor 1010 and a memory 1020 coupled to the processor 1010. The memory 1020 may store instructions for the processor 1010. The processor 1010 may also receive, among others, suitable input data 1030 (e.g., audio input, time domain target gains, etc.), depending on use cases and/or implementations. The processor 1010 may be adapted to carry out the methods/techniques described throughout the present disclosure (e.g., method 400 of FIG. 4, method 500 of FIG. 5, and/or method 700 of FIG. 7) and to generate corresponding output data 1040 (e.g., MFB-domain broad band gains or cross-faded audio signals), depending on use cases and/or implementations.

[0109] The present disclosure likewise relates to corresponding computer programs, computer program products, and computer-readable storage media storing such computer programs or computer program products.

Technical Advantages

[0110] Techniques described herein can be applied to efficiently decode transformed signals in the QMF domain, which may be required for subsequent processing. The input to the encoder may be two or more audio channels which are subject to an invertible transform, where the transform varies over time frames. When switching from the transform parameters of the previous time frame to those of the current time frame, the transformed signals are smoothly cross-faded in the time domain prior to encoding.

[0111] At the decoder, the encoder process is inverted in the QMF domain by cross-fading the current and previous frame parameter sets (transformed to the QMF domain, potentially combined with other processing) using time-varying broad band gains per QMF analysis time slot. This disclosure describes in particular how those QMF domain broad band gains can be calculated advantageously.
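Applying one broad band gain per QMF analysis time slot may be sketched as follows. This is an illustration under stated assumptions, not the decoder's actual processing: the function name and array layout are hypothetical, the fade-out gain is assumed to be the complement 1-G.sub.k of the fade-in gain, and the gains are limited to the range from 0 to 1 before use.

```python
import numpy as np

def crossfade_slots(X_prev, X_cur, G):
    """Cross-fade two QMF-domain signals with one gain per time slot.

    X_prev, X_cur: arrays of shape (K, num_bands), one row per time slot.
    G: K fade-in gains; the fade-out gain is assumed to be 1 - G.
    """
    X_prev = np.asarray(X_prev)
    X_cur = np.asarray(X_cur)
    # Limit the gains to [0, 1] and broadcast each gain over all bands.
    G = np.clip(np.asarray(G, dtype=float), 0.0, 1.0)[:, None]
    return (1.0 - G) * X_prev + G * X_cur

# Toy example: 3 time slots, 2 bands.
X_prev = np.ones((3, 2), dtype=complex)
X_cur = 3.0 * np.ones((3, 2), dtype=complex)
Y = crossfade_slots(X_prev, X_cur, [0.0, 0.5, 1.0])
```

Because each gain is broad band, the same scalar multiplies every frequency band of a time slot, so the cross-fade costs one multiply-add per QMF sample.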

Interpretation

[0112] Aspects of the systems described herein may be implemented in an appropriate computer-based sound processing network environment (e.g., server or cloud environment) for processing digital or digitized audio files. Portions of the adaptive audio system may include one or more networks that comprise any desired number of individual machines, including one or more routers (not shown) that serve to buffer and route the data transmitted among the computers. Such a network may be built on various different network protocols, and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof.

[0113] One or more of the components, blocks, processes or other functional components may be implemented through a computer program that controls execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms, such as optical, magnetic or semiconductor storage media.

[0114] Specifically, it should be understood that embodiments may include hardware, software, and electronic components or modules that, for purposes of discussion, may be illustrated and described as if the majority of the components were implemented solely in hardware. However, one of ordinary skill in the art, and based on a reading of this detailed description, would recognize that, in at least one embodiment, the electronic-based aspects may be implemented in software (e.g., stored on non-transitory computer-readable medium) executable by one or more electronic processors, such as a microprocessor and/or application specific integrated circuits (ASICs). As such, it should be noted that a plurality of hardware and software-based devices, as well as a plurality of different structural components, may be utilized to implement the embodiments. For example, the systems, encoders, decoders, or blocks described in the context of FIG. 1, FIG. 2, FIG. 3, and/or FIG. 19 above can include one or more electronic processors, one or more computer-readable medium modules, one or more input/output interfaces, and various connections (e.g., a system bus) connecting the various components.

[0115] While one or more implementations have been described by way of example and in terms of the specific embodiments, it is to be understood that one or more implementations are not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements. Also, it is to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," or "having" and variations thereof is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless specified or limited otherwise, the terms "mounted," "connected," "supported," and "coupled" and variations thereof are used broadly and encompass both direct and indirect mountings, connections, supports, and couplings.

Enumerated Example Embodiments

[0116] Various aspects and implementations of the present disclosure may also be appreciated from the following enumerated example embodiments (EEEs), which are not claims.

[0117] EEE1. A method of processing audio, the method comprising determining modulated filter bank, MFB, domain broad band gains for fading an audio signal in accordance with a time domain target gain, so that application of the broad band gains in the MFB domain emulates application of the target gain in the time domain, wherein determining the broad band gains includes computing the broad band gains using the target gain, an MFB analysis prototype filter, and an MFB synthesis prototype filter.

[0118] EEE2. The method according to EEE1, wherein a respective broad band gain is computed for each of a plurality of MFB analysis time slots.

[0119] EEE3. The method according to any one of the preceding EEEs, wherein computing the broad band gains includes optimizing the broad band gains by computing a least squares solution.

[0120] EEE4. The method according to EEE1, wherein determining the broad band gains includes: [0121] determining, for each of a plurality of MFB analysis time slots and for each of a plurality of frequency bands, a respective MFB analysis signal, based on an input training signal and the MFB analysis prototype filter; [0122] determining, for each of the plurality of MFB analysis time slots, a respective MFB synthesis signal, based on the MFB analysis signals in the respective MFB analysis time slot and the MFB synthesis prototype filter; and [0123] computing the broad band gains across MFB analysis time slots based on the MFB synthesis signals and the target gain.

[0124] EEE5. The method according to EEE4, wherein computing the broad band gains includes optimizing the broad band gains by computing a least squares solution.

[0125] EEE6. The method according to EEE5, wherein the least squares solution minimizes an error between samples of a first audio signal and samples of a second audio signal, the first audio signal obtainable by MFB analysis of the training signal followed by MFB synthesis, overlap add, and application of the target gain, or by application of the target gain and delaying by a processing delay of MFB analysis and MFB synthesis, and the second audio signal obtainable by applying, in each MFB analysis time slot, a respective broad band gain to a respective MFB synthesis signal, and by summing contributions from all MFB analysis time slots.

[0126] EEE7. The method according to EEE5 or EEE6, wherein the least squares solution is a solution to an objective function based on a transform matrix T.sub.1 that depends on the plurality of MFB synthesis signals and a target vector t.sub.1 that depends on the target gain.

[0127] EEE8. The method according to EEE7, wherein the transform matrix T.sub.1 is given by T.sub.1=[w.sub.0(n), w.sub.1(n), . . . , w.sub.K-1(n)], where K is the number of MFB analysis time slots and n indicates a sample number, and the target vector t.sub.1 is given by t.sub.1=x.sub.2(n) g(n-D.sub.P), where x.sub.2(n) is a time-domain signal obtainable by MFB analysis of the training signal followed by MFB synthesis and overlap add, and D.sub.P is a delay; and [0128] wherein the least squares solution solves the equation T.sub.1G=t.sub.1, where G is a broad band gain vector given by G=[G.sub.0, G.sub.1, . . . , G.sub.K-1].sup.T, with .sup.T indicating the transpose.

[0129] EEE9. The method according to EEE8, wherein the least squares solution for the broad band gain vector G is given by G=(T.sub.1.sup.TT.sub.1).sup.-1(T.sub.1.sup.Tt.sub.1), with .sup.-1 indicating the inverse.

[0130] EEE10. The method according to any one of EEE4 to EEE9, wherein the training signal is a random signal or a DC signal.

[0131] EEE11. The method according to any one of EEE4 to EEE10, wherein computing the broad band gains is performed iteratively, each iteration after the first iteration being computed with a respective modified training signal or a respective different training signal.

[0132] EEE12. The method according to EEE1, wherein determining the broad band gains includes: [0133] determining an MFB interpolation prototype filter based on the MFB analysis prototype filter and the MFB synthesis prototype filter; and [0134] computing the broad band gains across MFB analysis time slots based on the MFB interpolation prototype filter and the target gain.

[0135] EEE13. The method according to EEE12, wherein the MFB interpolation prototype filter is determined as a product of one of the MFB analysis prototype filter and the MFB synthesis prototype filter and a mirrored (e.g., time-mirrored) and shifted version of the other one of the MFB analysis prototype filter and the MFB synthesis prototype filter.

[0136] EEE14. The method according to EEE12 or EEE13, wherein computing the broad band gains includes optimizing the broad band gains by computing a least squares solution.

[0137] EEE15. The method according to EEE14, wherein the least squares solution is a solution to an objective function based on a transform matrix T.sub.2 and a target vector t.sub.2 that depends on the target gain.

[0138] EEE16. The method according to EEE14, wherein the least squares solution is a solution to an objective function based on a transform matrix T.sub.2 that depends on the MFB interpolation prototype filter and a target vector t.sub.2 that depends on the target gain.

[0139] EEE17. The method according to EEE15 or EEE16, wherein the transform matrix T.sub.2 is a matrix of shifted versions of the MFB interpolation prototype filter, each associated with a particular MFB analysis time slot.

[0140] EEE18. The method according to any one of EEE15 to EEE17, wherein the transform matrix T.sub.2 is given by T.sub.2=[p.sub.i(n), p.sub.i(n-S), . . . , p.sub.i(n-(K-1)S)], where p.sub.i is the MFB interpolation prototype filter, K is the number of MFB analysis time slots, n indicates a sample number, and S is a slot length of the MFB analysis time slots, and the target vector t.sub.2 is given by t.sub.2=g(n-D)Σ.sub.k=0.sup.K-1p.sub.i(n-kS), where g is the target gain; and wherein the least squares solution solves the equation T.sub.2G=t.sub.2, where G is a broad band gain vector given by G=[G.sub.0, G.sub.1, . . . , G.sub.K-1].sup.T, with .sup.T indicating the transpose.

[0141] EEE19. The method according to EEE18, wherein the least squares solution for the broad band gain vector G is given by G=(T.sub.2.sup.TT.sub.2).sup.-1(T.sub.2.sup.Tt.sub.2), with .sup.-1 indicating the inverse.

[0142] EEE20. The method according to EEE18 or EEE19, wherein the MFB interpolation prototype filter p.sub.i is given by p.sub.i(n)=p.sub.S(n)p.sub.A(D-n), where p.sub.A is the MFB analysis prototype filter, p.sub.S is the MFB synthesis prototype filter, and D+1 is an effective length of the MFB interpolation prototype filter p.sub.i.

[0143] EEE21. The method according to any one of the preceding EEEs, further comprising determining a set of MFB analysis time slots by identifying a non-constant gain function section of the target gain, encapsulated by time samples, and determining associated time slots based on the non-constant gain function section.

[0144] EEE22. The method according to any one of the preceding EEEs, wherein the MFB domain is a quadrature mirror filter, QMF, domain.

[0145] EEE23. The method according to any one of the preceding EEEs, comprising applying the determined broad band gains in the MFB domain.

[0146] EEE24. The method according to any one of the preceding EEEs, comprising generating a time-domain broad band signal using the determined broad band gains.

[0147] EEE25. The method according to any one of the preceding EEEs, comprising limiting the determined broad band gains to a range from 0 to 1 inclusive.

[0148] EEE26. The method according to any one of the preceding EEEs, further comprising: [0149] decoding transformed signals in the MFB domain, including fading an audio signal relating to a current parameter set and/or fading an audio signal relating to a previous parameter set, using the broad band gains per MFB analysis time slot.

[0150] EEE27. An apparatus, comprising a processor and a memory coupled to the processor, and storing instructions for the processor, wherein the processor is adapted to carry out the method according to any one of EEE1 to EEE26.

[0151] EEE28. A program comprising instructions that, when executed by a processor, cause the processor to carry out the method according to any one of EEE1 to EEE26.

[0152] EEE29. A computer-readable storage medium storing the program according to EEE28.

[0153] EEE30. A method of processing audio, comprising: [0154] computing broad band gains for a training signal, including computing a modulated filter bank (MFB) analysis signal and a MFB synthesis signal related to the MFB analysis signal; and [0155] decoding transformed signals in a MFB domain, including cross-fading a current and previous parameter sets using the broad band gains per MFB analysis time slot.

[0156] EEE31. The method of EEE30, wherein the training signal is a random signal or a DC signal.

[0157] EEE32. The method of EEE30, comprising determining the MFB analysis time slot by identifying a non-constant gain function section encapsulated by time samples, and determining associated time slots based on the non-constant gain function section.

[0158] EEE33. The method of EEE30, wherein computing the broad band gains is performed iteratively, each iteration after the first iteration being computed with a respective modified training signal and an average of results from at least one previous iteration.

[0159] EEE34. A method of processing audio, comprising: [0160] computing broad band gains, including: [0161] computing an interpolation prototype filter based on one or more filterbank prototype filters; and [0162] approximating the broad band gains of a modulated filter bank (MFB) analysis signal and a MFB synthesis signal across time slots based on the interpolation prototype filter; and decoding transformed signals in a MFB domain, including cross-fading a current and previous parameter sets using the broad band gains per MFB analysis time slot.

[0163] EEE35. The method of any one of EEE30 to EEE34, comprising applying the broad band gains to a filter bank domain.

[0164] EEE36. The method of any one of EEE30 to EEE34, comprising generating the target time-domain broad band signal using the broad band gains.

[0165] EEE37. The method of any one of EEE30 to EEE36, wherein computing the broad band gains includes optimizing gain functions by computing a least squared solution.

[0166] EEE38. The method of EEE37, wherein the least squared solution is a solution to an objective function based on a transform matrix T and a target vector t that depends on a target gain function.

[0167] EEE39. The method of EEE38, wherein the matrix T is a matrix of shifted versions of the interpolation prototype or filterbank synthesis data associated with a particular analysis time slot.

[0168] EEE40. A system including one or more processors configured to perform operations of any one of EEE30 to EEE39.

[0169] EEE41. A computer program product configured to cause one or more processors to perform operations of any one of EEE30 to EEE39.


CPC classification: G10L19/0204; G10L21/0324; G10L25/18; G10L21/043; G10L19/005; G06F17/17 (all in section G, PHYSICS)

International classification: G10L21/043; G10L19/005; G06F17/17 (all in section G, PHYSICS)
