SUBBAND BLOCK BASED HARMONIC TRANSPOSITION

Abstract

The present document relates to audio source coding systems which make use of a harmonic transposition method for high frequency reconstruction (HFR), as well as to digital effect processors, e.g. exciters, where generation of harmonic distortion adds brightness to the processed signal, and to time stretchers where a signal duration is prolonged with maintained spectral content. A system and method configured to generate a time stretched and/or frequency transposed signal from an input signal are described. The system comprises an analysis filterbank configured to provide an analysis subband signal from the input signal; wherein the analysis subband signal comprises a plurality of complex valued analysis samples, each having a phase and a magnitude. Furthermore, the system comprises a subband processing unit configured to determine a synthesis subband signal from the analysis subband signal using a subband transposition factor Q and a subband stretch factor S. The subband processing unit performs a block based nonlinear processing wherein the magnitudes of samples of the synthesis subband signal are determined from the magnitudes of corresponding samples of the analysis subband signal and a predetermined sample of the analysis subband signal. In addition, the system comprises a synthesis filterbank configured to generate the time stretched and/or frequency transposed signal from the synthesis subband signal.

Claims

1. An audio processing device including a subband processing unit configured to determine a synthesis subband signal from an analysis subband signal; wherein the analysis subband signal comprises a plurality of complex valued analysis samples at different times, each having a phase and a magnitude; wherein the analysis subband signal is associated with a frequency band of an input audio signal; wherein the subband processing unit comprises a block extractor configured to repeatedly derive a frame of L input samples from the plurality of complex valued analysis samples; the frame length L being greater than one; and apply a block hop size of p samples to the plurality of complex valued analysis samples, prior to deriving a next frame of L input samples; thereby generating a suite of frames of L input samples; a nonlinear frame processing unit configured to determine a frame of processed samples from a frame of input samples, by determining for each processed sample of the frame: the phase of the processed sample based on a sum of the phase of the corresponding input sample and the phase of a predetermined input sample scaled by an integer phase modification factor; and the magnitude of the processed sample based on the magnitude of the corresponding input sample and the magnitude of the predetermined input sample; and an overlap and add unit configured to determine the synthesis subband signal by overlapping and adding the samples of a suite of frames of processed samples; wherein the synthesis subband signal is associated with a frequency band of a signal which is time stretched and/or frequency transposed with respect to the input audio signal.

2. The audio processing device of claim 1, wherein the block extractor is configured to downsample the plurality of complex valued analysis samples by a subband transposition factor Q.

3. The audio processing device of claim 1, wherein the block extractor is configured to interpolate two or more complex valued analysis samples to derive an input sample.

4. The audio processing device of claim 1, wherein the nonlinear frame processing unit is configured to determine the magnitude of the processed sample as a mean value of the magnitude of the corresponding input sample and the magnitude of the predetermined input sample.

5. The audio processing device of claim 4, wherein the nonlinear frame processing unit is configured to determine the magnitude of the processed sample as the geometric mean value of the magnitude of the corresponding input sample and the magnitude of the predetermined input sample.

6. The audio processing device of claim 5, wherein the geometric mean value is determined as the magnitude of the corresponding input sample raised to the power of (1−ρ), multiplied by the magnitude of the predetermined input sample raised to the power of ρ, wherein the geometrical magnitude weighting parameter ρ∈(0,1].

7. The audio processing device of claim 6, wherein the geometrical magnitude weighting parameter ρ is a function of a subband transposition factor Q and a subband stretch factor S.

8. The audio processing device of claim 7, wherein the geometrical magnitude weighting parameter $ρ = 1 - \frac{1}{Q S} .$

9. The audio processing device of claim 1, wherein the nonlinear frame processing unit (202) is configured to determine the phase of the processed sample by offsetting the phase of the corresponding input sample by a phase offset value which is based on the predetermined input sample from the frame of input samples, a transposition factor Q and a subband stretch factor S.

10. The audio processing device of claim 9, wherein the phase offset value is based on the predetermined input sample multiplied by (QS−1).

11. The audio processing device of claim 10, wherein the phase offset value is given by the predetermined input sample multiplied by (QS−1) plus a phase correction parameter θ.

12. The audio processing device of claim 11, wherein the phase correction parameter θ is determined experimentally for a plurality of input signals having particular acoustic properties.

13. The audio processing device of claim 1, wherein the predetermined input sample is the same for each processed sample of the frame.

14. The audio processing device of claim 1, wherein the predetermined input sample is the center sample of the frame of input samples.

15. The audio processing device of claim 1, wherein the overlap and add unit applies a hop size to succeeding frames of processed samples, the hop size being equal to the block hop size p multiplied by a subband stretch factor S.

16. The audio processing device of claim 1, wherein the subband processing unit further comprises a windowing unit upstream of the overlap and add unit and configured to apply a window function to the frame of processed samples.

17. The audio processing device of claim 1, wherein the subband processing unit is configured to determine a plurality of synthesis subband signals from a plurality of analysis subband signals; the plurality of analysis subband signals is associated with a plurality of frequency bands of the input audio signal; and the plurality of synthesis subband signals is associated with a plurality of frequency bands of the signal which is time stretched and/or frequency transposed with respect to the input audio signal.

18. A method, performed by an audio processing device, for generating a synthesis subband signal that is associated with a frequency band of a signal which is time stretched and/or frequency transposed with respect to an input audio signal, the method comprising: providing an analysis subband signal which is associated with a frequency band of the input audio signal; wherein the analysis subband signal comprises a plurality of complex valued analysis samples at different times, each having a phase and a magnitude; deriving a frame of L input samples from the plurality of complex valued analysis samples; the frame length L being greater than one; applying a block hop size of p samples to the plurality of complex valued analysis samples, prior to deriving a next frame of L input samples; thereby generating a suite of frames of input samples; determining a frame of processed samples from a frame of input samples, by determining for each processed sample of the frame: the phase of the processed sample based on a sum of the phase of the corresponding input sample and the phase of a predetermined input sample scaled by an integer phase modification factor; and the magnitude of the processed sample based on the magnitude of the corresponding input sample and the magnitude of the predetermined input sample; and determining the synthesis subband signal by overlapping and adding the samples of a suite of frames of processed samples, wherein one or more of providing an analysis subband signal, deriving a frame, applying a block hop size, determining a frame of processed samples, and determining the synthesis subband signal is implemented, at least in part, by one or more hardware devices.

19. A non-transitory storage medium comprising a software program adapted for execution on a processor and for performing the method steps of claim 18 when carried out on an audio processing device.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0060] The present invention will now be described by way of illustrative examples, not limiting the scope or spirit of the invention, with reference to the accompanying drawings, in which:

[0061] FIG. 1 illustrates the principle of an example subband block based harmonic transposition;

[0062] FIG. 2 illustrates the operation of an example nonlinear subband block processing with one subband input;

[0063] FIG. 3 illustrates the operation of an example nonlinear subband block processing with two subband inputs;

[0064] FIG. 4 illustrates an example scenario for the application of subband block based transposition using several orders of transposition in a HFR enhanced audio codec;

[0065] FIG. 5 illustrates an example scenario for the operation of a multiple order subband block based transposition applying a separate analysis filter bank per transposition order;

[0066] FIG. 6 illustrates an example scenario for the efficient operation of a multiple order subband block based transposition applying a single 64 band QMF analysis filter bank; and

[0067] FIG. 7 illustrates the transient response for a subband block based time stretch of a factor two of an example audio signal.

DESCRIPTION OF PREFERRED EMBODIMENTS

[0068] The below-described embodiments are merely illustrative of the principles of the present invention for improved subband block based harmonic transposition. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

[0069] FIG. 1 illustrates the principle of an example subband block based transposition, time stretch, or a combination of transposition and time stretch. The input time domain signal is fed to an analysis filterbank 101 which provides a multitude or a plurality of complex valued subband signals. This plurality of subband signals is fed to the subband processing unit 102, whose operation can be influenced by the control data 104. Each output subband of the subband processing unit 102 can either be obtained from the processing of one or from two input subbands, or even from a superposition of the result of several such processed subbands. The multitude or plurality of complex valued output subbands is fed to the synthesis filterbank 103, which in turn outputs a modified time domain signal. The control data 104 is instrumental to improve the quality of the modified time domain signal for certain signal types. The control data 104 may be associated with the time domain signal. In particular, the control data 104 may be associated with or may depend on the type of time domain signal which is fed into the analysis filterbank 101. By way of example, the control data 104 may indicate if the time domain signal, or a momentary excerpt of the time domain signal, is a stationary signal or if the time domain signal is a transient signal.

[0070] FIG. 2 illustrates the operation of an example nonlinear subband block processing 102 with one subband input. Given the target values of physical time stretch and/or transposition, and the physical parameters of the analysis and synthesis filterbanks 101 and 103, one deduces subband time stretch and transposition parameters as well as a source subband index, which may also be referred to as an index of the analysis subband, for each target subband index, which may also be referred to as an index of a synthesis subband. The aim of the subband block processing is to implement the corresponding transposition, time stretch, or a combination of transposition and time stretch of the complex valued source subband signal in order to produce the target subband signal.

[0071] In the nonlinear subband block processing 102, the block extractor 201 samples a finite frame of samples from the complex valued input signal. The frame may be defined by an input pointer position and the subband transposition factor. This frame undergoes nonlinear processing in the nonlinear processing unit 202 and is subsequently windowed by a finite length window in 203. The window 203 may be e.g. a Gaussian window, a cosine window, a Hamming window, a Hann window, a rectangular window, a Bartlett window, a Blackman window, etc. The resulting samples are added to previously output samples in the overlap and add unit 204 where the output frame position may be defined by an output pointer position. The input pointer is incremented by a fixed amount, also referred to as a block hop size, and the output pointer is incremented by the subband stretch factor times the same amount, i.e. by the block hop size multiplied by the subband stretch factor. An iteration of this chain of operations will produce an output signal with a duration being the subband stretch factor times the input subband signal duration (up to the length of the synthesis window) and with complex frequencies being transposed by the subband transposition factor.

[0072] The control data 104 may have an impact on any of the processing blocks 201, 202, 203, 204 of the block based nonlinear processing 102. In particular, the control data 104 may control the length of the blocks extracted in the block extractor 201. In an embodiment, the block length is reduced when the control data 104 indicates that the time domain signal is a transient signal, whereas the block length is increased or maintained at the longer length when the control data 104 indicates that the time domain signal is a stationary signal. Alternatively or in addition, the control data 104 may impact the nonlinear processing unit 202, e.g. a parameter used within the nonlinear processing unit 202, and/or the windowing unit 203, e.g. the window used in the windowing unit 203.

[0073] FIG. 3 illustrates the operation of an example nonlinear subband block processing 102 with two subband inputs. Given the target values of physical time stretch and transposition, and the physical parameters of the analysis and synthesis filterbanks 101 and 103, one deduces subband time stretch and transposition parameters as well as two source subband indices for each target subband index. The aim of the subband block processing is to implement the corresponding transposition, time stretch, or a combination of transposition and time stretch of the combination of the two complex valued source subband signals in order to produce the target subband signal. The block extractor 301-1 samples a finite frame of samples from the first complex valued source subband and the block extractor 301-2 samples a finite frame of samples from the second complex valued source subband. In an embodiment, one of the block extractors 301-1 and 301-2 may produce a single subband sample, i.e. one of the block extractors 301-1, 301-2 may apply a block length of one sample. The frames may be defined by a common input pointer position and the subband transposition factor. The two frames extracted in block extractors 301-1, 301-2, respectively, undergo nonlinear processing in unit 302. The nonlinear processing unit 302 typically generates a single output frame from the two input frames. Subsequently, the output frame is windowed by a finite length window in unit 203. The above process is repeated for a suite of frames which are generated from a suite of frames extracted from two subband signals using a block hop size. The suite of output frames is overlapped and added in an overlap and add unit 204. An iteration of this chain of operations will produce an output signal with duration being the subband stretch factor times the longest of the two input subband signals (up to the length of the synthesis window). In case that the two input subband signals carry the same frequencies, the output signal will have complex frequencies transposed by the subband transposition factor.

[0074] As outlined in the context of FIG. 2, the control data 104 may be used to modify the operation of the different blocks of the nonlinear processing 102, e.g. the operation of the block extractors 301-1, 301-2. Furthermore, it should be noted that the above operations are typically performed for all of the analysis subband signals provided by the analysis filterbank 101 and for all of the synthesis subband signals which are input into the synthesis filterbank 103.

[0075] In the following text, a description of the principles of subband block based time stretch and transposition will be outlined with reference to FIGS. 1-3, and by adding appropriate mathematical terminology.

[0076] The two main configuration parameters of the overall harmonic transposer and/or time stretcher are

[0077] $S_{φ}$: the desired physical time stretch factor; and

[0078] $Q_{φ}$: the desired physical transposition factor.

[0079] The filterbanks 101 and 103 can be of any complex exponential modulated type such as QMF or a windowed DFT or a wavelet transform. The analysis filterbank 101 and the synthesis filterbank 103 can be evenly or oddly stacked in the modulation and can be defined from a wide range of prototype filters and/or windows. Whereas all these second order choices affect the details in the subsequent design such as phase corrections and subband mapping management, the main system design parameters for the subband processing can typically be derived from the knowledge of the two quotients $Δt_{S}/Δt_{A}$ and $Δf_{S}/Δf_{A}$ of the following four filter bank parameters, all measured in physical units. In the above quotients,

[0080] $Δt_{A}$ is the subband sample time step or time stride of the analysis filterbank 101 (e.g. measured in seconds [s]);

[0081] $Δf_{A}$ is the subband frequency spacing of the analysis filterbank 101 (e.g. measured in Hertz [1/s]);

[0082] $Δt_{S}$ is the subband sample time step or time stride of the synthesis filterbank 103 (e.g. measured in seconds [s]); and

[0083] $Δf_{S}$ is the subband frequency spacing of the synthesis filterbank 103 (e.g. measured in Hertz [1/s]).

[0084] For the configuration of the subband processing unit 102, the following parameters should be computed:

[0085] S: the subband stretch factor, i.e. the stretch factor which is applied within the subband processing unit 102 in order to achieve an overall physical time stretch of the time domain signal by $S_{φ}$;

[0086] Q: the subband transposition factor, i.e. the transposition factor which is applied within the subband processing unit 102 in order to achieve an overall physical frequency transposition of the time domain signal by the factor $Q_{φ}$; and

[0087] the correspondence between source and target subband indices, wherein n denotes an index of an analysis subband entering the subband processing unit 102, and m denotes an index of a corresponding synthesis subband at the output of the subband processing unit 102.

[0088] In order to determine the subband stretch factor S, it is observed that an input signal to the analysis filterbank 101 of physical duration D corresponds to a number $D/Δt_{A}$ of analysis subband samples at the input to the subband processing unit 102. These $D/Δt_{A}$ samples will be stretched to $S \cdot D/Δt_{A}$ samples by the subband processing unit 102 which applies the subband stretch factor S. At the output of the synthesis filterbank 103 these $S \cdot D/Δt_{A}$ samples result in an output signal having a physical duration of $Δt_{S} \cdot S \cdot D/Δt_{A}$. Since this latter duration should meet the specified value $S_{φ} \cdot D$, i.e. since the duration of the time domain output signal should be time stretched compared to the time domain input signal by the physical time stretch factor $S_{φ}$, the following design rule is obtained:

[00009] $\begin{matrix} S = \frac{Δ t_{A}}{Δ t_{S}} S_{φ} . & (1) \end{matrix}$

[0089] In order to determine the subband transposition factor Q which is applied within the subband processing unit 102 in order to achieve a physical transposition $Q_{φ}$, it is observed that an input sinusoid to the analysis filterbank 101 of physical frequency Ω will result in a complex analysis subband signal with discrete time frequency $ω = Ω \cdot Δt_{A}$ and the main contribution occurs within the analysis subband with index $n ≈ Ω/Δf_{A}$. An output sinusoid at the output of the synthesis filterbank 103 of the desired transposed physical frequency $Q_{φ} \cdot Ω$ will result from feeding the synthesis subband with index $m ≈ Q_{φ} \cdot Ω/Δf_{S}$ with a complex subband signal of discrete frequency $Q_{φ} \cdot Ω \cdot Δt_{S}$. In this context, care should be taken in order to avoid the synthesis of aliased output frequencies different from $Q_{φ} \cdot Ω$. Typically this can be avoided by making appropriate second order choices as discussed, e.g. by selecting appropriate analysis/synthesis filterbanks. The discrete frequency $Q_{φ} \cdot Ω \cdot Δt_{S}$ at the output of the subband processing unit 102 should correspond to the discrete time frequency $ω = Ω \cdot Δt_{A}$ at the input of the subband processing unit 102 multiplied by the subband transposition factor Q. I.e. by setting equal $Q \cdot Ω \cdot Δt_{A}$ and $Q_{φ} \cdot Ω \cdot Δt_{S}$, the following relation between the physical transposition factor $Q_{φ}$ and the subband transposition factor Q may be determined:

[00010] $\begin{matrix} Q = \frac{Δ t_{S}}{Δ t_{A}} Q_{φ} . & (2) \end{matrix}$

[0090] Likewise, the appropriate source or analysis subband index n of the subband processing unit 102 for a given target or synthesis subband index m should obey

[00011] $\begin{matrix} n \approx \frac{Δ f_{S}}{Δ f_{A}} \cdot \frac{1}{Q_{φ}} m . & (3) \end{matrix}$

[0091] In an embodiment, it holds that $Δf_{S}/Δf_{A} = Q_{φ}$, i.e. the frequency spacing of the synthesis filterbank 103 corresponds to the frequency spacing of the analysis filterbank 101 multiplied by the physical transposition factor, and the one-to-one mapping of analysis to synthesis subband index n=m can be applied. In other embodiments, the subband index mapping may depend on the details of the filterbank parameters. In particular, if the fraction of the frequency spacing of the synthesis filterbank 103 and the analysis filterbank 101 is different from the physical transposition factor $Q_{φ}$, one or two source subbands may be assigned to a given target subband. In the case of two source subbands, it may be preferable to use two adjacent source subbands with index n, n+1, respectively. That is, the first and second source subbands are given by either (n(m),n(m)+1) or (n(m)+1,n(m)).
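By way of illustration only, the design rules (1) to (3) can be sketched in a few lines of Python. The function name `subband_parameters`, the argument names and the choice of rounding to the nearest analysis subband index are assumptions made for this sketch and are not prescribed by the present document.

```python
from fractions import Fraction


def subband_parameters(S_phys, Q_phys, dt_A, dt_S, df_A, df_S):
    """Return (S, Q, index_map) following design rules (1), (2) and (3).

    S_phys, Q_phys: desired physical stretch and transposition factors.
    dt_A, dt_S:     time strides of the analysis/synthesis filterbanks.
    df_A, df_S:     frequency spacings of the analysis/synthesis filterbanks.
    index_map(m) returns the analysis subband index n for synthesis index m;
    rounding to the nearest integer is an illustrative assumption.
    """
    S = Fraction(dt_A) / Fraction(dt_S) * S_phys        # formula (1)
    Q = Fraction(dt_S) / Fraction(dt_A) * Q_phys        # formula (2)
    ratio = Fraction(df_S) / Fraction(df_A) / Q_phys    # factor in formula (3)

    def index_map(m):
        return round(ratio * m)

    return S, Q, index_map


# Hypothetical configuration: identical filterbanks, pure transposition by 2.
S, Q, index_map = subband_parameters(1, 2, dt_A=1, dt_S=1, df_A=1, df_S=1)
```

With a synthesis filterbank whose time stride is half and whose frequency spacing is twice that of the analysis filterbank, the same physical transposition by two would instead be obtained with S=2, Q=1 and the one-to-one mapping n=m, in line with the embodiment of paragraph [0091].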

[0092] The subband processing of FIG. 2 with a single source subband will now be described as a function of the subband processing parameters S and Q. Let x(k) be the input signal to the block extractor 201, and let p be the input block stride. I.e. x(k) is a complex valued analysis subband signal of an analysis subband with index n. The block extracted by the block extractor 201 can without loss of generality be considered to be defined by the L=2R+1 samples

$\begin{matrix} x_{l} (k) = x (Q k + p l), \quad | k | \leq R, & (4) \end{matrix}$

wherein the integer l is a block counting index, L is the block length and R is an integer with R≥0. Note that for Q=1, the block is extracted from consecutive samples but for Q>1 a downsampling is performed in such a manner that the input addresses are stretched out by the factor Q. If Q is an integer this operation is typically straightforward to perform, whereas an interpolation method may be required for non-integer values of Q. This statement is relevant also for non-integer values of the increment p, i.e. of the input block stride. In an embodiment, short interpolation filters, e.g. filters having two filter taps, can be applied to the complex valued subband signal. For instance, if a sample at the fractional time index k+0.5 is required, a two tap interpolation of the form x(k+0.5)≈ax(k)+bx(k+1) may lead to a sufficient quality.
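The block extraction of formula (4), including the two tap interpolation mentioned above, may be sketched as follows. The helper names `sample_at` and `extract_block`, the use of linear interpolation weights, and the treatment of samples outside the signal as zero are assumptions of this sketch.

```python
import math


def sample_at(x, t):
    """Value of the complex sequence x at a possibly fractional index t.

    Uses a two tap interpolation a*x[k] + b*x[k+1] with linear weights.
    Indices outside the sequence read as zero. Both are assumptions.
    """
    k = math.floor(t)
    b = t - k        # weight of x[k+1]
    a = 1.0 - b      # weight of x[k]

    def get(i):
        return x[i] if 0 <= i < len(x) else 0j

    if b == 0:
        return get(k)
    return a * get(k) + b * get(k + 1)


def extract_block(x, Q, p, l, R):
    """Frame x_l(k) = x(Q*k + p*l) for k = -R..R, as in formula (4)."""
    return [sample_at(x, Q * k + p * l) for k in range(-R, R + 1)]
```

For integer Q and p the interpolation branch is never taken and the frame consists of every Q-th input sample around position p·l.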

[0093] An interesting special case of formula (4) is R=0, where the extracted block consists of a single sample, i.e. the block length is L=1.

[0094] With the polar representation of a complex number $z = |z| \exp(i∠z)$, wherein |z| is the magnitude of the complex number and ∠z is the phase of the complex number, the nonlinear processing unit 202 producing the output frame $y_{l}$ from the input frame $x_{l}$ is advantageously defined by the phase modification factor T=SQ through

[00012] $\begin{matrix} {\begin{matrix} ∠ y_{l} (k) = (T - 1) ∠ x_{l} (0) + ∠ x_{l} (k) + θ \\ | y_{l} (k) | = {| x_{l} (0) |}^{ρ} {| x_{l} (k) |}^{1 - ρ} \end{matrix}}, \quad | k | \leq R & (5) \end{matrix}$

where ρ∈[0,1] is a geometrical magnitude weighting parameter. The case ρ=0 corresponds to a pure phase modification of the extracted block. The phase correction parameter θ depends on the filterbank details and the source and target subband indices. In an embodiment, the phase correction parameter θ may be determined experimentally by sweeping a set of input sinusoids. Furthermore, the phase correction parameter θ may be derived by studying the phase difference of adjacent target subband complex sinusoids or by optimizing the performance for a Dirac pulse type of input signal. The phase modification factor T should be an integer such that the coefficients T−1 and 1 are integers in the linear combination of phases in the first line of formula (5). With this assumption, i.e. with the assumption that the phase modification factor T is an integer, the result of the nonlinear modification is well defined even though phases are ambiguous by addition of arbitrary integer multiples of 2π.
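A minimal sketch of the nonlinear frame processing of formula (5) is given below. The function name `nonlinear_frame` and the convention that the middle list element represents the center sample x_l(0) are assumptions of this sketch.

```python
import cmath


def nonlinear_frame(frame, T, rho, theta=0.0):
    """Apply formula (5) to one frame of L = 2R+1 complex samples.

    frame[R] plays the role of the center sample x_l(0). T must be an
    integer so that the phase combination is well defined modulo 2*pi.
    """
    if T != int(T):
        raise ValueError("phase modification factor T must be an integer")
    R = len(frame) // 2
    center_mag = abs(frame[R])
    center_phase = cmath.phase(frame[R])
    out = []
    for sample in frame:
        # First line of (5): phase offset by (T-1) times the center phase.
        phase = (T - 1) * center_phase + cmath.phase(sample) + theta
        # Second line of (5): weighted geometric mean of the magnitudes.
        mag = center_mag ** rho * abs(sample) ** (1.0 - rho)
        out.append(cmath.rect(mag, phase))
    return out
```

Calling the function with rho=0 leaves all magnitudes unchanged, which corresponds to the pure phase modification case.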

[0095] In words, formula (5) specifies that the phase of an output frame sample is determined by offsetting the phase of a corresponding input frame sample by a constant offset value. This constant offset value may depend on the modification factor T, which itself depends on the subband stretch factor and/or the subband transposition factor. Furthermore, the constant offset value may depend on the phase of a particular input frame sample from the input frame. This particular input frame sample is kept fixed for the determination of the phase of all the output frame samples of a given block. In the case of formula (5), the phase of the center sample of the input frame is used as the phase of the particular input frame sample. In addition, the constant offset value may depend on a phase correction parameter θ which may e.g. be determined experimentally.

[0096] The second line of formula (5) specifies that the magnitude of a sample of the output frame may depend on the magnitude of the corresponding sample of the input frame. Furthermore, the magnitude of a sample of the output frame may depend on the magnitude of a particular input frame sample. This particular input frame sample may be used for the determination of the magnitude of all the output frame samples. In the case of formula (5), the center sample of the input frame is used as the particular input frame sample. In an embodiment, the magnitude of a sample of the output frame may correspond to the geometrical mean of the magnitude of the corresponding sample of the input frame and the particular input frame sample.

[0097] In the windowing unit 203, a window w of length L is applied on the output frame, resulting in the windowed output frame

$\begin{matrix} z_{l} (k) = w (k) y_{l} (k), \quad | k | \leq R . & (6) \end{matrix}$

[0098] Finally, it is assumed that all frames are extended by zeros, and the overlap and add operation 204 is defined by

[00013] $\begin{matrix} z (k) = \sum_{l} z_{l} (k - S p l), & (7) \end{matrix}$

wherein it should be noted that the overlap and add unit 204 applies a block stride of Sp, i.e. a time stride which is S times higher than the input block stride p. Due to this difference in time strides of formula (4) and (7) the duration of the output signal z(k) is S times the duration of the input signal x(k), i.e. the synthesis subband signal has been stretched by the subband stretch factor S compared to the analysis subband signal. It should be noted that this observation typically applies if the length L of the window is negligible in comparison to the signal duration.
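The windowing of formula (6) and the overlap and add operation of formula (7) may be sketched as follows for an integer output stride Sp. The function name `window_and_overlap_add` and the storage convention (the first list element corresponds to output index k=−R) are assumptions of this sketch.

```python
def window_and_overlap_add(frames, window, hop):
    """Window each frame y_l (formula (6)) and add it at offset hop*l (formula (7)).

    frames: list of frames y_l, l = 0, 1, ..., each of odd length L = 2R+1
    window: L window values w(-R), ..., w(R)
    hop:    integer output stride S*p
    Returns z(k) for k = -R .. hop*(len(frames)-1)+R as a list whose first
    element corresponds to k = -R.
    """
    if not frames:
        return []
    L = len(window)
    out = [0j] * (hop * (len(frames) - 1) + L)
    for l, frame in enumerate(frames):
        for j in range(L):
            out[hop * l + j] += window[j] * frame[j]
    return out
```

The returned list has hop·(n−1)+L elements for n frames, so that for long signals the output duration grows in proportion to the stride Sp.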

[0099] For the case where a complex sinusoid is used as input to the subband processing 102, i.e. an analysis subband signal corresponding to a complex sinusoid

$\begin{matrix} x (k) = C \exp (i ω k), & (8) \end{matrix}$

it may be determined by applying the formulas (4)-(7) that the output of the subband processing 102, i.e. the corresponding synthesis subband signal, is given by

[00014] $\begin{matrix} z (k) = | C | \exp [i (T ∠ C + θ + Q ω k)] \sum_{l} w (k - S p l) . & (9) \end{matrix}$

[0100] Hence a complex sinusoid of discrete time frequency ω will be transformed into a complex sinusoid with discrete time frequency Qω provided the window shifts with a stride of S p sum up to the same constant value K for all k,

[00015] $\begin{matrix} \sum_{l} w (k - S p l) = K . & (10) \end{matrix}$
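The statement of formulas (9) and (10) can be checked numerically with the following sketch, which runs formulas (4) to (7) on the complex sinusoid of formula (8). The function name `transpose_sinusoid` and the particular Hann type window w(k)=0.5(1+cos(πk/(R+1))) are assumptions of this sketch; for this window, condition (10) holds with K=(R+1)/(Sp) whenever Sp divides R+1.

```python
import cmath
import math


def transpose_sinusoid(omega, C, Q, S, p, R, num_blocks, theta=0.0, rho=0.0):
    """Process x(k) = C*exp(i*omega*k) block by block; return {k: z(k)}."""
    T = Q * S                      # integer phase modification factor
    hop = S * p                    # output stride of formula (7)
    N = 2 * (R + 1)
    window = [0.5 * (1.0 + math.cos(2.0 * math.pi * k / N))
              for k in range(-R, R + 1)]
    z = {}
    for l in range(num_blocks):
        # Formula (4) applied to the sinusoid of formula (8).
        frame = [C * cmath.exp(1j * omega * (Q * k + p * l))
                 for k in range(-R, R + 1)]
        center = frame[R]
        for k in range(-R, R + 1):
            xk = frame[k + R]
            # Formula (5).
            phase = (T - 1) * cmath.phase(center) + cmath.phase(xk) + theta
            mag = abs(center) ** rho * abs(xk) ** (1.0 - rho)
            y = cmath.rect(mag, phase)
            # Formulas (6) and (7).
            idx = k + hop * l
            z[idx] = z.get(idx, 0j) + window[k + R] * y
    return z


# Hypothetical parameters: Q=2, S=2, p=1, R=3, hence T=4 and K=(R+1)/(S*p)=2.
z = transpose_sinusoid(omega=0.3, C=2 * cmath.exp(0.5j), Q=2, S=2, p=1,
                       R=3, num_blocks=20)
```

Away from the first and last blocks, the samples z(k) of this example agree with K·|C|·exp[i(T∠C+θ+Qωk)], i.e. the output is a complex sinusoid of discrete time frequency Qω.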

[0101] It is illustrative to consider the special case of pure transposition where S=1 and T=Q. If the input block stride is p=1 and R=0, all the above, i.e. notably formula (5), reduces to the point-wise or sample based phase modification rule

[00016] $\begin{matrix} {\begin{matrix} ∠ z (k) = T ∠ x (k) + θ \\ | z (k) | = | x (k) | \end{matrix}} . & (11) \end{matrix}$

[0102] The advantage of using a block size R>0 becomes apparent when a sum of sinusoids is considered within an analysis subband signal x(k). The problem with the point-wise rule (11) for a sum of sinusoids with frequencies $ω_{1}, ω_{2}, \ldots, ω_{N}$ is that not only the desired frequencies $Qω_{1}, Qω_{2}, \ldots, Qω_{N}$ will be present in the output of the subband processing 102, i.e. within the synthesis subband signal z(k), but also intermodulation product frequencies of the form

[00017] $\sum_{n} a_{n} ω_{n} .$

Using a block R>0 and a window satisfying formula (10) typically leads to a suppression of these intermodulation products. On the other hand, a long block will lead to a larger degree of undesired time smearing for transient signals. Furthermore, for pulse train like signals, e.g. a human voice in case of vowels or a single pitched instrument, with sufficiently low pitch, the intermodulation products could be desirable as described in WO 2002/052545. This document is incorporated by reference.
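As a hedged illustration of how such intermodulation products arise, consider the point-wise rule (11) with T=2 and θ=0 (values chosen only for this example) applied to two unit amplitude sinusoids. Rule (11) then amounts to z(k)=x(k)²/|x(k)| wherever x(k)≠0, and

$\begin{matrix} x (k) = e^{i ω_{1} k} + e^{i ω_{2} k} \\ x (k)^{2} = e^{i 2 ω_{1} k} + 2 e^{i (ω_{1} + ω_{2}) k} + e^{i 2 ω_{2} k} \end{matrix}$

The middle term lies at the frequency $ω_{1} + ω_{2}$, i.e. at a frequency of the above form with $a_{1} = a_{2} = 1$, which is neither $2ω_{1}$ nor $2ω_{2}$. The division by $|x(k)| = 2 |\cos((ω_{1} - ω_{2}) k / 2)|$ modulates the result further and contributes additional terms of the same form.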

[0103] In order to address the issue of relatively poor performance of the block based subband processing 102 for transient signals, it is suggested to use a nonzero value of the geometrical magnitude weighting parameter ρ>0 in formula (5). It has been observed (see e.g. FIG. 7) that the selection of a geometrical magnitude weighting parameter ρ>0 improves the transient response of the block based subband processing 102 compared to the use of pure phase modification with ρ=0, while at the same time maintaining a sufficient power of intermodulation distortion suppression for stationary signals. A particularly attractive value of the magnitude weighting is ρ=1−1/T, for which the nonlinear processing formula (5) reduces to the calculation steps

[00018] $\begin{cases} g_{l}(k) = \dfrac{x_{l}(k)}{|x_{l}(k)|^{1 - 1/T}} \\ y_{l}(k) = g_{l}(0)^{T-1}\, g_{l}(k)\, e^{i\theta} \end{cases}. \qquad (12)$

[0104] These calculation steps represent an equivalent amount of computational complexity compared to the operation of a pure phase modification resulting from the case of ρ=0 in formula (5). In other words, the determination of the magnitude of the output frame samples based on the geometrical mean formula (5) using the magnitude weighting ρ=1−1/T can be implemented without any additional cost in computational complexity. At the same time, the performance of the harmonic transposer for transient signals improves, while maintaining the performance for stationary signals.
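The equivalence between the calculation steps of formula (12) and the geometrical magnitude weighting with ρ=1−1/T can be checked numerically. The sketch below is illustrative only. It assumes that formula (5) has the single input form of formula (14), i.e. the phase (T−1)∠x.sub.l(0)+∠x.sub.l(k)+θ and the magnitude |x.sub.l(0)|^ρ·|x.sub.l(k)|^(1−ρ), that a frame is stored as a Python list whose middle element is the sample k=0, and that T is an integer. The function names are hypothetical.

```python
import cmath

def process_frame_general(frame, T, rho, theta=0.0):
    """Assumed form of formula (5): phase and magnitude are set separately."""
    x0 = frame[len(frame) // 2]  # centre sample, k = 0
    out = []
    for xk in frame:
        phase = (T - 1) * cmath.phase(x0) + cmath.phase(xk) + theta
        magnitude = abs(x0) ** rho * abs(xk) ** (1.0 - rho)
        out.append(cmath.rect(magnitude, phase))
    return out

def process_frame_fast(frame, T, theta=0.0):
    """Formula (12): normalise magnitudes once, then use complex products."""
    exponent = 1.0 - 1.0 / T
    # Guard against division by zero for silent samples.
    g = [xk / abs(xk) ** exponent if xk != 0 else 0j for xk in frame]
    scale = g[len(g) // 2] ** (T - 1) * cmath.exp(1j * theta)
    return [scale * gk for gk in g]

frame = [1 + 2j, -0.5 + 0.3j, 2 - 1j, 0.1 + 0.4j, -1.5 - 0.7j]
T = 3
reference = process_frame_general(frame, T, rho=1.0 - 1.0 / T, theta=0.25)
fast = process_frame_fast(frame, T, theta=0.25)
max_error = max(abs(a - b) for a, b in zip(reference, fast))
```

In this sketch, `max_error` stays at the level of floating point rounding, which is consistent with the statement that the weighting ρ=1−1/T needs no phase extraction and hence no additional complexity.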

[0105] As has been outlined in the context of FIGS. 1, 2 and 3, the subband processing 102 may be further enhanced by applying control data 104. In an embodiment, two configurations of the subband processing 102 sharing the same value of K in formula (11) and employing different block lengths may be used to implement a signal adaptive subband processing. The conceptual starting point in designing a signal adaptive configuration switching subband processing unit may be to imagine the two configurations running in parallel with a selector switch at their outputs, wherein the position of the selector switch depends on the control data 104. The sharing of K-value ensures that the switch is seamless in the case of a single complex sinusoid input. For general signals the hard switch on a subband signal level is automatically windowed by the surrounding filterbank framework 101, 103 so as to not introduce any switching artifacts on the final output signals. It can be shown that as a result of the overlap and add process in formula (7) an output identical to that of the conceptual switched system described above can be reproduced at the computational cost of the configuration with the longest block, when the block sizes are sufficiently different, and the update rate of the control data is not too fast. Hence there is no penalty in computational complexity associated with a signal adaptive operation. According to the discussion above, the configuration with the shorter block length is more suitable for transient and low pitched periodic signals, whereas the configuration with longer block length is more suitable for stationary signals. As such, a signal classifier may be used to classify excerpts of an audio signal into a transient class and a non-transient class, and to pass this classification information as control data 104 to the signal adaptive configuration switching subband processing unit 102. The subband processing unit 102 may use the control data 104 to set certain processing parameters, e.g. the block length of the block extractors.
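The selector switch picture can be sketched as follows. This is a hypothetical illustration only: the energy ratio classifier, its threshold and the two block radii are assumptions that do not come from the present document, which leaves the signal classifier and the concrete block lengths open.

```python
def classify_transients(frame_energies, ratio_threshold=4.0):
    """Hypothetical classifier: flag a frame whose energy jumps sharply."""
    flags = [False]  # the first frame has no predecessor to compare against
    for previous, current in zip(frame_energies, frame_energies[1:]):
        flags.append(current > ratio_threshold * max(previous, 1e-12))
    return flags

def select_block_radius(is_transient, short_radius=2, long_radius=7):
    """Selector switch: short blocks for transients, long blocks otherwise."""
    return short_radius if is_transient else long_radius

energies = [1.0, 1.1, 9.0, 8.5, 1.0]          # illustrative frame energies
control_data = classify_transients(energies)  # plays the role of data 104
radii = [select_block_radius(flag) for flag in control_data]
```

Here `control_data` stands in for the control data 104 and `radii` for the block size parameter that the block extractors would use for each excerpt.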

[0106] In the following, the description of the subband processing will be extended to cover the case of FIG. 3 with two subband inputs. Only the modifications which are made to the single input case will be described. Otherwise, reference is made to the information provided above. Let x(k) be the input subband signal to the first block extractor 301-1 and let $\tilde{x}(k)$ be the input subband signal to the second block extractor 301-2. The block extracted by block extractor 301-1 is defined by formula (4) and the block extracted by block extractor 301-2 consists of the single subband sample

$\tilde{x}_{l}(0) = \tilde{x}(pl). \qquad (13)$

I.e. in the outlined embodiment, the first block extractor 301-1 uses a block length of L, whereas the second block extractor 301-2 uses a block length of 1. In such a case, the output frame y.sub.l produced by the nonlinear processing 302 may be defined by

[00019] $\begin{cases} \angle y_{l}(k) = (T-1)\,\angle \tilde{x}_{l}(0) + \angle x_{l}(k) + \theta \\ |y_{l}(k)| = |\tilde{x}_{l}(0)|^{\rho}\, |x_{l}(k)|^{1-\rho} \end{cases}, \qquad (14)$

and the rest of the processing in 203 and 204 is identical to the processing described in the context of the single input case. In other words, it is suggested to replace the particular frame sample of formula (5) by the single subband sample extracted from the respective other analysis subband signal.
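The two input rule of formula (14) may be sketched as below. This is a minimal illustration under stated assumptions: the function name `process_frame_two_inputs`, the list representation of a frame and the sample values are hypothetical, and the windowing and overlap-add stages are omitted.

```python
import cmath

def process_frame_two_inputs(frame, x_tilde0, T, rho, theta=0.0):
    """Formula (14): the single sample x_tilde0 from the second block
    extractor replaces the centre sample of the frame."""
    out = []
    for xk in frame:
        phase = (T - 1) * cmath.phase(x_tilde0) + cmath.phase(xk) + theta
        magnitude = abs(x_tilde0) ** rho * abs(xk) ** (1.0 - rho)
        out.append(cmath.rect(magnitude, phase))
    return out

# Illustrative frame of L = 5 samples from the first block extractor;
# index 2 holds the sample k = 0.
frame = [cmath.rect(2.0, 0.3 * k) for k in range(-2, 3)]
x_tilde0 = cmath.rect(0.5, 1.0)  # single sample from the second extractor
y = process_frame_two_inputs(frame, x_tilde0, T=3, rho=0.5)
```

For the centre sample of this example, the output magnitude is the geometric mean of 0.5 and 2.0 and the output phase is (T−1) times the phase of the second input, since the phase of x.sub.l(0) is zero here.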

[0107] In an embodiment, wherein the ratio of the frequency spacing Δf.sub.S of the synthesis filterbank 103 and the frequency spacing Δf.sub.A of the analysis filterbank 101 is different from the desired physical transposition factor Q.sub.φ, it may be beneficial to determine the samples of a synthesis subband with index m from two analysis subbands with index n, n+1, respectively. For a given index m, the corresponding index n may be given by the integer value obtained by truncating the analysis index value n given by formula (3). One of the analysis subband signals, e.g. the analysis subband signal corresponding to index n, is fed into the first block extractor 301-1 and the other analysis subband signal, e.g. the one corresponding to index n+1, is fed into the second block extractor 301-2. Based on these two analysis subband signals a synthesis subband signal corresponding to index m is determined in accordance with the processing outlined above. The assignment of the adjacent analysis subband signals to the two block extractors 301-1 and 301-2 may be based on the remainder that is obtained when truncating the index value of formula (3), i.e. the difference of the exact index value given by formula (3) and the truncated integer value n obtained from formula (3). If the remainder is greater than 0.5, then the analysis subband signal corresponding to index n may be assigned to the second block extractor 301-2, otherwise this analysis subband signal may be assigned to the first block extractor 301-1.
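The remainder based assignment can be written down compactly. The sketch below is illustrative: the function name is hypothetical, and the exact (possibly non-integer) index value of formula (3) is simply passed in as an argument, here as a `Fraction` to avoid rounding effects.

```python
from fractions import Fraction

def assign_block_extractors(n_exact):
    """Return (index for extractor 301-1, index for extractor 301-2).

    n_exact is the exact, non-negative index value given by formula (3).
    """
    n = int(n_exact)          # truncated integer index
    remainder = n_exact - n   # distance above the truncated index
    if remainder > Fraction(1, 2):
        # Subband n goes to the second extractor, n+1 to the first.
        return n + 1, n
    # Otherwise subband n goes to the first extractor, n+1 to the second.
    return n, n + 1

closer_to_upper = assign_block_extractors(Fraction(8, 3))   # about 2.67
closer_to_lower = assign_block_extractors(Fraction(10, 3))  # about 3.33
```

With this rule the first block extractor, which supplies the full block of L samples, receives the analysis subband closest to the exact index value, and the second block extractor supplies the single sample from the other neighbour.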

[0108] FIG. 4 illustrates an example scenario for the application of subband block based transposition using several orders of transposition in a HFR enhanced audio codec. A transmitted bit-stream is received at the core decoder 401, which provides a low bandwidth decoded core signal at a sampling frequency fs. This low bandwidth decoded core signal may also be referred to as the low frequency component of the audio signal. The signal at low sampling frequency fs may be re-sampled to the output sampling frequency 2fs by means of a complex modulated 32 band QMF analysis bank 402 followed by a 64 band QMF synthesis bank (Inverse QMF) 405. The two filterbanks 402 and 405 have the same physical parameters Δt.sub.S=Δt.sub.A and Δf.sub.S=Δf.sub.A and the HFR processing unit 404 typically lets through the unmodified lower subbands corresponding to the low bandwidth core signal. The high frequency content of the output signal is obtained by feeding the higher subbands of the 64 band QMF synthesis bank 405 with the output bands from the multiple transposer unit 403, subject to spectral shaping and modification performed by the HFR processing unit 404. The multiple transposer 403 takes as input the decoded core signal and outputs a multitude of subband signals which represent the 64 QMF band analysis of a superposition or combination of several transposed signal components. In other words, the signal at the output of the multiple transposer 403 should correspond to the transposed synthesis subband signals which may be fed into a synthesis filterbank 103, which in the case of FIG. 4 is represented by the inverse QMF filterbank 405.

[0109] Possible implementations of a multiple transposer 403 are outlined in the context of FIGS. 5 and 6. The objective of the multiple transposer 403 is that if the HFR processing 404 is bypassed, each component corresponds to an integer physical transposition without time stretch of the core signal (Q.sub.φ=2, 3, . . . , and S.sub.φ=1). For transient components of the core signal, the HFR processing can sometimes compensate for poor transient response of the multiple transposer 403 but a consistently high quality can typically only be reached if the transient response of the multiple transposer itself is satisfactory. As outlined in the present document, a transposer control signal 104 can affect the operation of the multiple transposer 403, and thereby ensure a satisfactory transient response of the multiple transposer 403. Alternatively or in addition, the above geometric weighting scheme (see e.g. formula (5) and/or formula (14)) may contribute to improving the transient response of the harmonic transposer 403.

[0110] FIG. 5 illustrates an example scenario for the operation of a multiple order subband block based transposition unit 403 applying a separate analysis filter bank 502-2, 502-3, 502-4 per transposition order. In the illustrated example, three transposition orders Q.sub.φ=2,3,4 are to be produced and delivered in the domain of a 64 band QMF bank operating at output sampling rate 2fs. The merging unit 504 selects and combines the relevant subbands from each transposition factor branch into a single multitude of QMF subbands to be fed into the HFR processing unit.

[0111] Consider first the case Q.sub.φ=2. The objective is specifically that the processing chain of a 64 band QMF analysis 502-2, a subband processing unit 503-2, and a 64 band QMF synthesis 405 results in a physical transposition of Q.sub.φ=2 with S.sub.φ=1 (i.e. no stretch). Identifying these three blocks with the units 101, 102 and 103 of FIG. 1, respectively, one finds that Δt.sub.S/Δt.sub.A=½ and Δf.sub.S/Δf.sub.A=2 such that formulas (1)-(3) result in the following specifications for the subband processing unit 503-2. The subband processing unit 503-2 has to perform a subband stretch of S=2, a subband transposition of Q=1 (i.e. none) and a correspondence between source subbands with index n and target subbands with index m given by n=m (see formula (3)).

[0112] For the case Q.sub.φ=3, the exemplary system includes a sampling rate converter 501-3 which converts the input sampling rate down by a factor 3/2 from fs to 2fs/3. The objective is specifically that the processing chain of the 64 band QMF analysis 502-3, the subband processing unit 503-3, and a 64 band QMF synthesis 405 results in a physical transposition of Q.sub.φ=3 with S.sub.φ=1 (i.e. no stretch). Identifying the above three blocks with units 101, 102 and 103 of FIG. 1, respectively, one finds due to the resampling that Δt.sub.S/Δt.sub.A=⅓ and Δf.sub.S/Δf.sub.A=3 such that formulas (1)-(3) provide the following specifications for the subband processing unit 503-3. The subband processing unit 503-3 has to perform a subband stretch of S=3, a subband transposition of Q=1 (i.e. none) and a correspondence between source subbands with index n and target subbands with index m given by n=m (see formula (3)).

[0113] For the case Q.sub.φ=4, the exemplary system includes a sampling rate converter 501-4 which converts the input sampling rate down by a factor two from fs to fs/2. The objective is specifically that the processing chain of the 64 band QMF analysis 502-4, the subband processing unit 503-4, and a 64 band QMF synthesis 405 results in a physical transposition of Q.sub.φ=4 with S.sub.φ=1 (i.e. no stretch). Identifying these three blocks of the processing chain with units 101, 102 and 103 of FIG. 1, respectively, one finds due to the resampling that Δt.sub.S/Δt.sub.A=¼ and Δf.sub.S/Δf.sub.A=4 such that formulas (1)-(3) provide the following specifications for subband processing unit 503-4. The subband processing unit 503-4 has to perform a subband stretch of S=4, a subband transposition of Q=1 (i.e. none) and a correspondence between source subbands with index n and target subbands with index m given by n=m.

[0114] As a conclusion for the exemplary scenario of FIG. 5, the subband processing units 503-2 to 503-4 all perform pure subband signal stretches and employ the single input nonlinear subband block processing described in the context of FIG. 2. When present, the control signal 104 may simultaneously affect the operation of all three subband processing units. In particular, the control signal 104 may be used to simultaneously switch between long block length processing and short block length processing depending on the type (transient or non-transient) of the excerpt of the input signal. Alternatively or in addition, when the three subband processing units 503-2 to 503-4 make use of a nonzero geometrical magnitude weighting parameter ρ>0, the transient response of the multiple transposer will be improved compared to the case where ρ=0.
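The specifications of the three branches of FIG. 5 can be reproduced with a few lines of arithmetic. The sketch below is hedged accordingly: formulas (1)-(3) are given earlier in the document and are not repeated here, and the relations used in the code, S=S.sub.φ·Δt.sub.A/Δt.sub.S, Q=Q.sub.φ·Δt.sub.S/Δt.sub.A and n=(Δf.sub.S/Δf.sub.A)·m/Q.sub.φ, are assumed forms chosen because they reproduce the values stated for the three branches. It is further assumed that an M band QMF bank operating at sampling rate f has a time stride of M/f and a frequency spacing of f/(2M). All function names are hypothetical.

```python
from fractions import Fraction

def qmf_stride_and_spacing(bands, rate):
    """Assumed time stride (bands/rate) and frequency spacing (rate/(2*bands))."""
    return Fraction(bands) / rate, Fraction(rate) / (2 * bands)

def subband_specs(q_phys, s_phys, analysis, synthesis):
    """Return (S, Q, n/m) from the assumed forms of formulas (1)-(3)."""
    dt_a, df_a = analysis
    dt_s, df_s = synthesis
    stretch = s_phys * dt_a / dt_s          # subband stretch factor S
    transposition = q_phys * dt_s / dt_a    # subband transposition factor Q
    n_per_m = (df_s / df_a) / q_phys        # source index n per target index m
    return stretch, transposition, n_per_m

fs = Fraction(1)  # core sampling rate, normalised to 1
synthesis = qmf_stride_and_spacing(64, 2 * fs)      # 64 band bank 405 at 2fs
analysis_rates = {2: fs, 3: 2 * fs / 3, 4: fs / 2}  # after 501-3 and 501-4

specs = {
    q_phys: subband_specs(q_phys, 1, qmf_stride_and_spacing(64, rate), synthesis)
    for q_phys, rate in analysis_rates.items()
}
```

Each entry of `specs` is a tuple (S, Q, n/m) for the branch with physical transposition Q.sub.φ. Under the stated assumptions every branch comes out as a pure subband stretch with S=Q.sub.φ, Q=1 and n=m.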

[0115] FIG. 6 illustrates an example scenario for the efficient operation of a multiple order subband block based transposition applying a single 64 band QMF analysis filter bank. Indeed, the use of three separate QMF analysis banks and two sampling rate converters in FIG. 5 results in a rather high computational complexity, as well as some implementation disadvantages for frame based processing due to the sampling rate conversion 501-3, i.e. a fractional sampling rate conversion. It is therefore suggested to replace the two transposition branches comprising units 501-3→502-3→503-3 and 501-4→502-4→503-4 by the subband processing units 603-3 and 603-4, respectively, whereas the branch 502-2→503-2 is kept unchanged compared to FIG. 5. All three orders of transposition are performed in a filterbank domain with reference to FIG. 1, where Δt.sub.S/Δt.sub.A=½ and Δf.sub.S/Δf.sub.A=2. In other words, only a single analysis filterbank 502-2 and a single synthesis filterbank 405 are used, thereby reducing the overall computational complexity of the multiple transposer.

[0116] For the case Q.sub.φ=3, S.sub.φ=1, the specifications for subband processing unit 603-3 given by formulas (1)-(3) are that the subband processing unit 603-3 has to perform a subband stretch of S=2 and a subband transposition of Q=3/2, and that the correspondence between source subbands with index n and target subbands with index m is given by n≈2m/3. For the case Q.sub.φ=4, S.sub.φ=1, the specifications for subband processing unit 603-4 given by formulas (1)-(3) are that the subband processing unit 603-4 has to perform a subband stretch of S=2 and a subband transposition of Q=2, and that the correspondence between source subbands with index n and target subbands with index m is given by n≈m/2, since a target subband at m·Δf.sub.S=2m·Δf.sub.A is transposed from the source frequency 2m·Δf.sub.A/4.

[0117] It can be seen that formula (3) does not necessarily provide an integer valued index n for a target subband with index m. As such, it may be beneficial to consider two adjacent source subbands for the determination of a target subband as outlined above (using formula (14)). In particular, this may be beneficial for target subbands with index m, for which formula (3) provides a non-integer value for index n. On the other hand, target subbands with index m, for which formula (3) provides an integer value for index n, may be determined from the single source subband with index n (using formula (5)). In other words, it is suggested that a sufficiently high quality of harmonic transposition may be achieved by using subband processing units 603-3 and 603-4 which both make use of nonlinear subband block processing with two subband inputs as outlined in the context of FIG. 3. Moreover, when present, the control signal 104 may simultaneously affect the operation of all three subband processing units. Alternatively or in addition, when the three units 503-2, 603-3, 603-4 make use of a nonzero geometrical magnitude weighting parameter ρ>0, the transient response of the multiple transposer may be improved compared to the case where ρ=0.
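The choice between single input and two input processing per target subband can be sketched as follows. The function name and the tuple layout are hypothetical, and the index ratio n/m is passed in as a parameter; the example uses the value 2/3 stated for the case Q.sub.φ=3.

```python
from fractions import Fraction

def source_plan(m, n_per_m):
    """Decide how the target subband m (m >= 0) is fed.

    Returns ("single", n) when the source index is an integer, otherwise
    ("dual", n, n + 1) with the two adjacent source subbands.
    """
    n_exact = Fraction(n_per_m) * m
    if n_exact.denominator == 1:
        return ("single", int(n_exact))  # formula (5) applies
    n = int(n_exact)                     # truncated index
    return ("dual", n, n + 1)            # formula (14) applies

# First six target subbands for a source-to-target index ratio of 2/3.
plan = [source_plan(m, Fraction(2, 3)) for m in range(6)]
```

In this example every third target subband has an integer source index and can be served by a single source subband, while the remaining target subbands draw on two adjacent source subbands.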

[0118] FIG. 7 illustrates an example transient response for a subband block based time stretch of a factor two. The top panel depicts the input signal, which is a castanet attack sampled at 16 kHz. A system based on the structure of FIG. 1 is designed with a 64 band QMF analysis filterbank 101 and a 64 band QMF synthesis filterbank 103. The subband processing unit 102 is configured to implement a subband stretch of a factor S=2, no subband transposition (Q=1) and a direct one-to-one mapping of source to target subbands. The analysis block stride is p=1 and the block size radius is R=7 so the block length is L=15 subband samples which corresponds to 15·64=960 signal domain (time domain) samples. The window w is a raised cosine, e.g. a cosine raised to the power of 2. The middle panel of FIG. 7 depicts the output signal of the time stretching when a pure phase modification is applied by the subband processing unit 102, i.e. the weighting parameter ρ=0 is used for the nonlinear block processing according to formula (5). The bottom panel depicts the output signal of the time stretching when the geometrical magnitude weighting parameter ρ=½ is used for the nonlinear block processing according to formula (5). As can be seen, the transient response is significantly better in the latter case. In particular, it can be seen that the subband processing using the weighting parameter ρ=0 results in artifacts 701 which are significantly reduced (see reference numeral 702) with the subband processing using the weighting parameter ρ=½.
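The block dimensions of this example can be reproduced with a short calculation. The sketch below rests on several assumptions: the block length is taken as L=2R+1, the 960 time domain samples are interpreted at the 16 kHz input rate, and the raised cosine window is assumed to have the form cos²(πk/(2(R+1))) for |k|≤R, which is only one possible reading of "a cosine raised to the power of 2". The variable and function names are hypothetical.

```python
import math

R = 7                  # block size radius
BANDS = 64             # QMF bands, i.e. time samples per subband sample
SAMPLE_RATE_HZ = 16000

block_length = 2 * R + 1                 # subband samples per block
span_samples = block_length * BANDS      # time domain samples per block
span_ms = 1000.0 * span_samples / SAMPLE_RATE_HZ

def window(k, radius=R):
    """Assumed raised cosine window, zero outside |k| <= radius."""
    if abs(k) > radius:
        return 0.0
    return math.cos(math.pi * k / (2 * (radius + 1))) ** 2

# Output hop S*p = 2*1: sum the shifted windows at a few output positions.
hop = 2
overlap_sums = [sum(window(n - hop * j) for j in range(-20, 21))
                for n in range(-4, 5)]
```

Under these assumptions a block spans 60 ms of the input signal, which gives a sense of the time smearing a transient can experience. The shifted copies of the assumed window also add up to the same value at every output position, a property that overlap and add schemes commonly rely on.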

[0119] In the present document, a method and system for harmonic transposition based HFR and/or for time stretching has been described. The method and system may be implemented at significantly reduced computational complexity compared to conventional harmonic transposition based HFR, while providing a high quality harmonic transposition for stationary as well as for transient signals. The described harmonic transposition based HFR makes use of block based nonlinear subband processing. The use of signal dependent control data is proposed to adapt the nonlinear subband processing to the type, e.g. transient or non-transient, of the signal. Furthermore, the use of a geometrical weighting parameter is suggested in order to improve the transient response of harmonic transposition using block based nonlinear subband processing. Finally, a low complexity method and system for harmonic transposition based HFR is described which makes use of a single analysis/synthesis filterbank pair for harmonic transposition and HFR processing. The outlined methods and systems may be employed in various decoding devices, e.g. in multimedia receivers, video/audio settop boxes, mobile devices, audio players, video players, etc.

[0120] The methods and systems for transposition and/or high frequency reconstruction and/or time stretching described in the present document may be implemented as software, firmware and/or hardware. Certain components may e.g. be implemented as software running on a digital signal processor or microprocessor. Other components may e.g. be implemented as hardware and/or as application specific integrated circuits. The signals encountered in the described methods and systems may be stored on media such as random access memory or optical storage media. They may be transferred via networks, such as radio networks, satellite networks, wireless networks or wireline networks, e.g. the internet. Typical devices making use of the methods and systems described in the present document are portable electronic devices or other consumer equipment which are used to store and/or render audio signals. The methods and systems may also be used on computer systems, e.g. internet web servers, which store and provide audio signals, e.g. music signals, for download.

CPC classification (section G, PHYSICS): G10L21/04; G10L19/0204; G10L19/032; G10L25/18; G10L21/038; G10L19/022

International classification (section G, PHYSICS): G10L21/038; G10L19/02; G10L19/022; G10L21/04; G10L25/18; G10L19/032
