Resampling of an audio signal by interpolation for low-delay encoding/decoding
10510357 · 2019-12-17
Assignee
Inventors
Cpc classification
G10L21/00
G10L19/24
G06F17/17
International classification
G10L19/24
G10L19/02
G06F17/11
G10L21/00
Abstract
A method is provided for resampling an audio-frequency signal in an audio-frequency signal encoding or decoding operation. The resampling is carried out by a method of interpolation of an order greater than one. The method is such that the interpolated samples are obtained by calculating a weighted average of possible interpolation values calculated over a plurality of intervals covering the time location of the sample to be interpolated. A resampling device is provided, which implements the method, and also an encoder and decoder including at least one resampling device.
Claims
1. A method comprising: receiving an audio frequency signal for resampling from a non-transitory computer-readable memory of a FIR type resampling filter; and resampling the audio frequency signal by an audio frequency signal coder or decoder device, the resampling comprising performing an interpolation method of order higher than one, and obtaining interpolated samples by a computation of a weighted average of possible interpolation values computed over a plurality of intervals covering a temporal location of the sample to be interpolated, the interpolated samples complementing a signal decoded according to a restricted predictive decoding mode in a transition frame between a predictive decoding and a transform decoding prior to an act of combination between samples decoded according to the restricted predictive decoding and the samples decoded according to a transform decoding in the transition frame.
2. The method as claimed in claim 1, wherein the interpolation is of 2nd order parabolic type.
3. The method as claimed in claim 1, wherein the interpolation is of 3rd order cubic type and the number of intervals covering the temporal location of the sample to be interpolated is 3.
4. The method as claimed in claim 1, wherein the weighted average is obtained with one and the same weighting value for each of the possible interpolation values.
5. The method as claimed in claim 3, wherein a different weighting value is applied for the interpolation value computed for the central interval of the three intervals and for the computation of the weighted average.
6. The method as claimed in claim 1, wherein the weighting values applied to the possible interpolation values are determined as a function of a frequency criterion of the sample to be interpolated.
7. A device for resampling an audio frequency signal in an audio frequency signal coder or decoder, wherein the device comprises: a non-transitory computer-readable medium comprising instructions stored thereon; a processor configured by the instructions to perform acts comprising: receiving the audio frequency signal for resampling from a non-transitory computer-readable memory of a FIR type resampling filter; and resampling the audio frequency signal by an interpolation method of order higher than one, comprising: computing possible interpolation values for a plurality of intervals covering a temporal location of a sample to be interpolated; and obtaining the sample to be interpolated by computing a weighted average of the possible interpolation values, the interpolated samples complementing a signal decoded according to a restricted predictive decoding mode in a transition frame between a predictive decoding and a transform decoding prior to an act of combination between samples decoded according to the restricted predictive decoding and the samples decoded according to a transform decoding in the transition frame.
8. The device of claim 7, wherein the device is implemented in an audio frequency signal coder.
9. The device of claim 7, wherein the device is implemented in an audio frequency signal decoder.
10. A non-transitory processor-readable storage medium, on which is stored a computer program comprising code instructions for executing a method when the instructions are executed by a processor of an audio frequency signal coder or decoder device, wherein the instructions configure the processor to perform acts comprising: receiving an audio frequency signal for resampling from a non-transitory computer-readable memory of a FIR type resampling filter; and resampling the audio frequency signal by an audio frequency signal coder or decoder device, the resampling comprising performing an interpolation method of order higher than one, and obtaining interpolated samples by a computation of a weighted average of possible interpolation values computed over a plurality of intervals covering a temporal location of the sample to be interpolated, the interpolated samples complementing a signal decoded according to a restricted predictive decoding mode in a transition frame between a predictive decoding and a transform decoding prior to an act of combination between samples decoded according to the restricted predictive decoding and the samples decoded according to a transform decoding in the transition frame.
11. The method as claimed in claim 1, further comprising: transmitting a resampled audio frequency signal, comprising the interpolated samples, by the coder or decoder device at a sampling frequency.
12. The device as claimed in claim 7, further comprising: an output module, which transmits a resampled audio frequency signal comprising the interpolated samples, at a sampling frequency.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
(18) The steps of this method are implemented with, as input (x_In), an audio frequency signal at the input sampling frequency f_In. This input signal can for example be the signal vectors of short length contained in a resampling filter memory as described later with reference to
(19) In the embodiment described here, an interpolation method of 3rd order cubic type is used. A different order of interpolation can of course be used, the order however being greater than one.
(20) In the step E701, a cubic interpolation is used not only on the central interval but over the 3 intervals: the interval on the right (interval [1, 2]) of the preceding cubic, the central interval (interval [0, 1]) of the central cubic, and the interval on the left (interval [-1, 0]) of the next cubic, to interpolate a value at a temporal instant x in [0, 1].
(21) The three possible interpolation values are obtained. This increases the computation complexity in a limited way because the coefficients of a cubic are in any case computed per interval. If use is made of the simplified notation (without mentioning the 3rd order) a_n, b_n, c_n, d_n for the coefficients of the cubic of which the central interval is used, a_(n-1), b_(n-1), c_(n-1), d_(n-1) for the coefficients of the cubic in the preceding interval and a_(n+1), b_(n+1), c_(n+1), d_(n+1) for the coefficients of the cubic in the next interval, the three possible interpolation values are obtained by:
vcp(x) = a_(n-1)*(x+1)^3 + b_(n-1)*(x+1)^2 + c_(n-1)*(x+1) + d_(n-1),
vcc(x) = a_n*x^3 + b_n*x^2 + c_n*x + d_n, and
vcs(x) = a_(n+1)*(x-1)^3 + b_(n+1)*(x-1)^2 + c_(n+1)*(x-1) + d_(n+1).
(22) Once again, the values (x+1)^3, (x+1)^2, x^3, x^2, (x-1)^3 and (x-1)^2 can be tabulated to reduce the complexity.
(23) Thus, the step E701 computes possible interpolation values over a plurality of intervals covering the temporal location of the sample to be interpolated (in the example given here, the interpolation order is 3).
(24) In the step E702, a weighted average of the three possible interpolated values is computed to obtain the sample to be interpolated. The output signal resampled at the output frequency f_Out, by the interpolation as described here, is then obtained (x_Out).
(25) Thus, the value of the sample interpolated at the instant x (relative to the central cubic therefore x in [0, 1]) is obtained by the weighted sum of these 3 values:
(26) Vc3 = pp*vcp(x) + pc*vcc(x) + ps*vcs(x) where, in an exemplary embodiment, the weighting coefficients pp, pc and ps are in the interval [0, 1], with pp+pc+ps=1 and, generally, pp=ps=(1-pc)/2.
(27) For example, pp=pc=ps=1/3 can be chosen. In this case, the division by 3 can be integrated in the coefficients of the cubics.
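As a rough sketch of the steps E701 and E702, the following Python code computes the three candidate values vcp, vcc, vcs and their weighted average. The 4-point Lagrange fit used here to derive the coefficients a_n, b_n, c_n, d_n is an assumption for illustration; the description names the coefficients without fixing how they are computed.

```python
def cubic_coeffs(y0, y1, y2, y3):
    # Coefficients of the cubic through (-1,y0), (0,y1), (1,y2), (2,y3);
    # its central interval is t in [0, 1] (assumed 4-point Lagrange fit).
    a = (-y0 + 3*y1 - 3*y2 + y3) / 6.0
    b = (y0 - 2*y1 + y2) / 2.0
    c = (-2*y0 - 3*y1 + 6*y2 - y3) / 6.0
    d = y1
    return a, b, c, d

def blended_interp(y, n, x, pp=1/3, pc=1/3, ps=1/3):
    # Sample at position n + x, x in [0, 1]: weighted average of the values
    # given by the preceding, central and next cubics (steps E701 and E702).
    ap, bp, cp, dp = cubic_coeffs(y[n-2], y[n-1], y[n],   y[n+1])  # preceding
    ac, bc, cc, dc = cubic_coeffs(y[n-1], y[n],   y[n+1], y[n+2])  # central
    an, bn, cn, dn = cubic_coeffs(y[n],   y[n+1], y[n+2], y[n+3])  # next
    t = x + 1.0  # right interval of the preceding cubic
    vcp = ap*t**3 + bp*t**2 + cp*t + dp
    vcc = ac*x**3 + bc*x**2 + cc*x + dc
    t = x - 1.0  # left interval of the next cubic
    vcs = an*t**3 + bn*t**2 + cn*t + dn
    return pp*vcp + pc*vcc + ps*vcs
```

On any signal that is locally a polynomial of degree 2 or less, the three cubics agree, so the blend is exact there; the candidate values differ only for higher-frequency content.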
(28) It will be noted that the invention illustrated in
(29) It is assumed that the samples at the start of the output buffer (between the two first samples x_In(n), n=0,1) can be interpolated by knowing the values of the past signal at the preceding instants n=-1,-2 which are necessary to determine the first coefficients a_(-1), b_(-1), c_(-1), d_(-1), a_0, b_0, c_0 and d_0; these past samples can be incorporated in the input buffer or used separately in the implementation of the block E701.
(30) The samples at the end of the output buffer (between and after the two last samples, x_In(n), n=L-2,L-1) cannot be directly interpolated according to the blocks E701 and E702 because there is in general no future signal available, corresponding to the instants n=L, L+1, which are necessary to determine the last coefficients a_(L-1), b_(L-1), c_(L-1), d_(L-1), a_L, b_L, c_L and d_L. Different variants for processing the samples at the edges are described later.
(31) The samples thus interpolated with pp=pc=ps=1/3 are illustrated in
(33) With the solution according to the invention, the SNR for the speech signal is 40 dB. To recap, the SNRs obtained were 38.2 dB with the cubic interpolation known from the prior art and 41.4 dB with the interpolation by cubic spline. It can be seen that the proposed interpolation gives a better SNR than the simple cubic (Lagrange) interpolation, while approaching that of the cubic spline.
(34) In a variant of the invention, the weights (pp, pc, ps) are set at other predetermined values. In another exemplary embodiment, pp=ps=0.5 and pc=0 are chosen, which amounts to using the average of the interpolated values from the 2 extreme intervals. This reduces the number of operations to 47 (i.e. 300 800 operations per second) while having a significantly higher performance level than the simple cubic (Lagrange) interpolation. The SNR obtained for the real test signal is 40.4 dB. This solution has performance levels which are less good for the low frequencies but better for the high frequencies than the solution with three identical weights, as
(35) In another variant of the invention, it will also be possible to use weights (pp, pc, ps) that are variable according to a criterion. For example, if the signal to be interpolated contains mostly low frequencies, the first solution proposed (pp=pc=ps=1/3) will be used, otherwise the second (pp=ps=0.5 and pc=0) will be used.
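The frequency criterion itself is not specified here. As a purely hypothetical illustration, a selector based on the ratio of first-difference energy to total energy (the function names and the threshold are invented for this sketch) could switch between the two weight sets:

```python
def low_freq_dominant(y, thresh=0.25):
    # Hypothetical criterion: first-difference energy grows with
    # high-frequency content; compare it to the total signal energy.
    e = sum(v * v for v in y) or 1.0
    d = sum((y[i + 1] - y[i]) ** 2 for i in range(len(y) - 1))
    return d / e < thresh

def choose_weights(y):
    # First weight set for mostly-low-frequency signals, second otherwise.
    return (1/3, 1/3, 1/3) if low_freq_dominant(y) else (0.5, 0.0, 0.5)
```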
(36) The principle of the invention can be generalized for the interpolations of order other than 3. For example, in the case of a parabolic interpolation, it is possible to take the average of the 2 values given by the 2 possible parabolas.
(37) In this case, the interpolated samples are obtained by a computation of a weighted average of possible interpolation values computed over two intervals of values covering the temporal location of the sample to be interpolated.
(38) This solution gives a result that is virtually equivalent to the simple cubic interpolation where only the central interval is used.
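A sketch of this 2nd order variant, assuming each parabola is fit through three consecutive samples (the fit is an assumption; the text only states that two parabolas cover the interval):

```python
def parab_coeffs(y0, y1, y2):
    # Parabola through (-1,y0), (0,y1), (1,y2), in the local variable t.
    a = (y0 - 2*y1 + y2) / 2.0
    b = (y2 - y0) / 2.0
    return a, b, y1

def parabolic_blend(y, n, x):
    # Sample at n + x, x in [0, 1]: average of the two possible parabolas,
    # one centered on sample n, the other on sample n+1.
    a1, b1, c1 = parab_coeffs(y[n-1], y[n], y[n+1])
    a2, b2, c2 = parab_coeffs(y[n], y[n+1], y[n+2])
    v1 = a1*x*x + b1*x + c1
    t = x - 1.0
    v2 = a2*t*t + b2*t + c2
    return 0.5 * (v1 + v2)
```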
(40) In this embodiment, interest is focused on the unified coding of the speech, music and mixed content signals, through multi-mode techniques alternating at least two coding modes, and of which the algorithmic delay is suited to the conversational applications (typically 32 ms).
(41) Among these unified coding techniques, it is possible to cite prior art coders/decoders (codecs), like the AMR-WB+ codec or, more recently, the MPEG USAC (Unified Speech Audio Coding) codec. The applications targeted by these codecs are not conversational, but correspond to broadcast and storage services, with no strong constraints on the algorithmic delay. The principle of the unified coding is to alternate between at least two coding modes: for the signals of speech type, a temporal mode, here denoted LPD (for linear predictive domain), generally of CELP (code-excited linear prediction) type; for the signals of music type, a frequency mode, here denoted FD (for frequency domain), with a transform generally of MDCT (modified discrete cosine transform) type.
The principles of the CELP and MDCT codings are summarized below.
(42) Firstly, the CELP coding, including its ACELP variant, is a predictive coding based on the source-filter model. The filter corresponds in general to an all-pole filter of transfer function 1/A(z) obtained by linear prediction (LPC, linear predictive coding). In practice, the synthesis uses the quantized version, 1/Â(z), of the filter 1/A(z). The source, that is to say the excitation of the linear predictive filter 1/Â(z), is, in general, the combination of an excitation obtained by long-term prediction modeling the vibration of the vocal cords, and of a stochastic (or innovation) excitation described in the form of algebraic codes (ACELP), of noise dictionaries, etc. The search for the optimal excitation is performed by minimizing a squared error criterion in the domain of the signal weighted by a filter of transfer function W(z), generally derived from the linear predictive filter A(z), of the form W(z)=A(z/γ1)/A(z/γ2) or A(z/γ1)/(1-βz^-1). Secondly, the coding by MDCT transform analyzes the input signal with a time/frequency transformation generally comprising the following steps: 1. weighting of the signal by a window, here called the MDCT window; 2. temporal aliasing (or time-domain aliasing) to form a reduced block (in its conventional formulation, of length divided by 2); 3. DCT (discrete cosine transform) transformation of the reduced block. The MDCT windowing can be adapted and the MDCT coefficients can be quantized by various methods according to an allocation of the bits (for example by frequency subbands).
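The MDCT chain above can be illustrated by a minimal direct-form sketch (the sine window, block length and normalization are illustrative choices, not those of any particular codec); the temporal folding of step 2 is implicit in the cosine kernel:

```python
import math

def mdct(block, w):
    # Direct-form MDCT of a windowed 2N-sample block; returns N coefficients.
    N = len(block) // 2
    return [sum(w[n] * block[n] *
                math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, w):
    # Inverse transform, re-windowed for overlap-add synthesis.
    N = len(coeffs)
    return [w[n] * (2.0 / N) * sum(coeffs[k] *
                math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N))
            for n in range(2 * N)]

N = 4
# Sine window: satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
w = [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]
```

Overlap-adding the second half of one inverse block with the first half of the next cancels the time-domain aliasing and reconstructs the input exactly; this is the property exploited by coding by transform with overlap.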
In the codecs using at least two coding modes, the transitions between LPD and FD modes are crucial to ensure a sufficient quality without switching defect, knowing that the FD and LPD modes are of different natures: one relies on a coding by transform with overlap, whereas the other uses a linear predictive coding with rectangular blocks and filter memories updated on each frame.
(43) For the coder illustrated in
(44) In this embodiment illustrated in
The bit stream for each 20 ms input frame is multiplexed by the multiplexing module 814.
(45) The case of a transition from an LPD coding to FD coding is described for example in the published European patent application EP 2656343 incorporated here for reference. In this case, as illustrated in
(46) Here, the same principle is again applied of prolonging the signal by performing a simplified restricted LPD coding as described in the application EP 2656343 to fill this missing signal (zone denoted TR) in the transition frame of FD type which follows an LPD frame; it will be noted that the MDCT window illustrated here will be able to be modified in variants of the invention without changing the principle of the invention; in particular, the MDCT window in the transition frame will be able to be different from the MDCT window(s) used normally in the FD coding mode when the current frame is not an LPD to FD transition frame.
(47) However, in the coder illustrated in
(48) It is assumed here that the resampling from 12.8 or 16 kHz to fs of the resampling block 830 is performed by polyphase FIR filtering with a filter memory (called mem). This memory stores the last samples of the preceding frame of the signal decoded by LPD or TR mode at the frequency 12.8 or 16 kHz. The length of this memory corresponds to the FIR filtering delay. Because of this resampling delay, the signal at the frequency fs, here 32 kHz (derived from the resampling), is delayed. This resampling is problematic because it enlarges the gap to be filled between the LPD and FD modes in the transition frame. Samples are therefore lacking to correctly implement the cross-fade between the LPD signal resampled at the frequency fs and the FD decoded signal. The last input samples at 12 800 or 16 000 Hz are, however, stored in the resampling step of the block 830. These stored samples correspond temporally to the missing samples at 32 kHz (dark gray zone in
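As an illustration of such a filter memory (the 1:2 ratio, the 3-tap filter and the function name are assumptions for the sketch; the codec uses a longer polyphase FIR), a zero-stuffing upsampler whose state carried between frames embodies the resampling delay:

```python
def upsample2(x, h, state):
    # 1:2 upsampling: zero-stuff then FIR-filter with h (DC gain 2 assumed).
    # 'state' holds the last len(h)-1 high-rate samples of the previous
    # frame -- a filter memory 'mem' whose length sets the resampling delay.
    buf = state + [v for s in x for v in (s, 0.0)]
    out = [sum(h[j] * buf[i - j] for j in range(len(h)))
           for i in range(len(state), len(buf))]
    return out, buf[len(buf) - len(state):]
```

With the 3-tap linear-interpolation filter used in the test below, the output is delayed by one high-rate sample relative to the zero-stuffed input, which is exactly the gap the memory has to bridge.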
(49) The interpolation according to the invention is used in this embodiment to resample the signal contained in the memory of the resampling filter (mem) in order to prolong the signal derived from the simplified LPD coding (block 816) at the start of the transition frame and thus obtain, at 32 kHz, the missing samples to be able to make the cross-fade between the LPD synthesis and the FD synthesis.
(50) The decoder illustrated in
(51) Depending on the frame received and demultiplexed (block 1001), the output is switched (1004) between the output of a temporal decoder (LPD DEC) of CELP type (1002) using a linear prediction and a frequency decoder (FD DEC, 1003). It will be noted that the output of the LPD decoder is resampled from the internal frequency 12.8 or 16 kHz to the output frequency fs by a resampling module 1005, for example of FIR type.
(52) Here, the same principle is applied again of prolonging the signal by performing a simplified restricted LPD decoding (block 1006) as described in the application EP 2656343 to fill this missing signal (zone denoted TR) in the transition frame of FD type which follows an LPD frame.
(53) In the decoder illustrated here in
(54) It is assumed here that the resampling from 12.8 or 16 kHz to fs of the resampling block 1007 is performed by polyphase FIR filtering with a filter memory (called mem). This memory stores the last samples of the preceding frame of the signal decoded by LPD or TR mode at the frequency 12.8 or 16 kHz. The length of this memory corresponds to the FIR filtering delay. Because of this resampling delay, the signal at the frequency fs, here 32 kHz (derived from the resampling), is delayed. This resampling is problematic because it enlarges the gap to be filled between the LPD and FD modes in the transition frame. Samples are therefore lacking to correctly implement the cross-fade between the LPD signal resampled at the frequency fs and the FD decoded signal. The last input samples at 12 800 or 16 000 Hz are, however, stored in the resampling step of the block 1007. These stored samples correspond temporally to the missing samples at 32 kHz (dark gray zone in
(55) The interpolation according to the invention is used in this embodiment to resample the signal contained in the memory of the resampling filter (mem) in order to prolong the signal derived from the simplified restricted LPD decoding (block 1006) at the start of the transition frame and thus obtain, at 32 kHz, the missing samples to be able to make the cross-fade between the LPD synthesis and the FD synthesis.
(56) To resample the signal (mem) contained in the memory of the resampling filter 1007, the resampling device 800 according to the invention performs an interpolation of order higher than one and comprises a module 801 for computing possible interpolation values for a plurality of intervals covering the temporal location of the sample to be interpolated. These possible interpolation values are computed, for example, as described with reference to
(57) The resampling device also comprises a module 802 for obtaining samples to be interpolated by computation of a weighted average of the possible interpolation values derived from the computation module 801.
(58) The duly resampled signal can be combined in 1008 with the signal derived from the FD coding of the module 1003 via a cross-fade as described in the patent application EP 2656343.
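The combination in 1008 can be pictured as a complementary ramp (a hypothetical linear ramp for illustration; the weighting actually used is that described in EP 2656343):

```python
def cross_fade(lpd_tail, fd_head):
    # Fade from the resampled LPD synthesis to the FD synthesis over
    # len(lpd_tail) samples (illustrative linear weights).
    L = len(lpd_tail)
    return [((L - i) * lpd_tail[i] + i * fd_head[i]) / L for i in range(L)]
```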
(59) It must also be noted that, with the interpolation proposed according to the invention, it is not possible to cover the entire time domain of the filter memory (mem), as is illustrated in
(60) In the preferred embodiment, the extreme right interval of the last cubic is used for the interpolation between the last 2 input samples (empty black triangles) and the last interpolated sample is repeated for the extrapolated samples (triangles 903).
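A sketch of this edge policy (reusing an assumed 4-point Lagrange fit for the cubic coefficients; the hold past the last sample follows the preferred embodiment above):

```python
def tail_value(y, x):
    # Value at fractional position x near the end of the buffer y.
    # In the last interval [L-2, L-1], use the right interval of the last
    # full cubic; at or beyond L-1, repeat the last sample.
    L = len(y)
    if x >= L - 1:
        return y[-1]
    y0, y1, y2, y3 = y[L-4], y[L-3], y[L-2], y[L-1]
    a = (-y0 + 3*y1 - 3*y2 + y3) / 6.0   # assumed 4-point Lagrange cubic
    b = (y0 - 2*y1 + y2) / 2.0
    c = (-2*y0 - 3*y1 + 6*y2 - y3) / 6.0
    d = y1
    t = x - (L - 3)   # t in [1, 2]: right interval of this cubic
    return a*t**3 + b*t**2 + c*t + d
```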
(62) This type of device comprises a processor PROC cooperating with a memory block BM comprising a storage and/or working memory MEM.
(63) Such a device comprises an input module E capable of receiving audio signal frames x_In at a sampling frequency f_In. These audio signal frames are, for example, a signal contained in a memory of a resampling filter.
(64) It comprises an output module S capable of transmitting the resampled audio frequency signal x_Out at the sampling frequency f_Out.
(65) The memory block can advantageously comprise a computer program comprising code instructions for implementing the steps of the resampling method within the meaning of the invention, when these instructions are executed by the processor PROC, and notably for obtaining samples interpolated by a computation of a weighted average of possible interpolation values, computed over a plurality of intervals covering the temporal location of the sample to be interpolated.
(66) Typically, the description of
(67) The memory MEM stores, generally, all the data necessary for the implementation of the method.
(68) Although the present disclosure has been described with reference to one or more examples, workers skilled in the art will recognize that changes may be made in form and detail without departing from the scope of the disclosure and/or the appended claims.