Frame loss correction by weighted noise injection
09761230 · 2017-09-12
Cpc classification
G10L19/12
PHYSICS
G10L19/005
PHYSICS
International classification
G10L21/00
PHYSICS
G10L19/12
PHYSICS
Abstract
A method for processing a digital signal, implemented during decoding of the signal, in order to replace a succession of samples lost during decoding, the method comprising steps of: generating a structure of a signal for replacing the lost succession, this structure comprising spectral components determined from valid samples received during decoding before the succession of lost samples; generating a residue between a digital signal available to the decoder, comprising received valid samples, and a signal generated from the spectral components; and extracting blocks from the residue; wherein window-weighted blocks are injected into the structure using an overlap-add approach, the injected blocks partially overlapping in time.
Claims
1. A method for processing a digital audio signal, implemented during decoding of said signal, in order to replace a succession of samples lost during decoding, the method comprising the steps, by a processor of a telecommunication terminal, of: generating a structure of a signal for replacing the lost succession, said structure comprising spectral components determined from valid samples received during decoding and prior to said succession of lost samples, generating a residue between a digital signal available to the decoder, comprising valid samples received, and a signal generated from said spectral components, extracting blocks from said residue, wherein said blocks are injected into said structure by using an overlap-add approach according to weighting windows, said injected blocks at least partially overlapping in time, wherein said blocks are injected with a parameter that is variable between at least two injected blocks, the variable parameter being one of: a write start time of the injected block, and an overlap rate between two successive injected blocks, wherein the variable parameter varies pseudo-randomly for at least one injected block.
2. The method according to claim 1, wherein, as said blocks are defined by an extracted block start time and a block duration, at least one parameter among said extracted block start time and said block duration is variable between at least two extracted blocks.
3. The method according to claim 1, wherein, said blocks being defined by an extracted block start time and a block duration, at least one parameter among said extracted block start time and said block duration is determined pseudo-randomly for at least one extracted block.
4. The method according to claim 1, wherein the sum of the weighting windows applied to two successive injected blocks is equal to one for the overlap segment between these two blocks.
5. The method according to claim 1, wherein the sum of the squares of the weighting windows, applied to two successive injected blocks, is equal to one for the overlap segment between these two blocks.
6. The method according to claim 1, wherein the sign of at least one injected block is changed.
7. The method according to claim 1, wherein at least one injected block is time-reversed.
8. The method according to claim 1, wherein said blocks are first injected into an intermediate noise signal, said intermediate noise signal being subsequently injected into said structure.
9. The method according to claim 1, wherein said blocks are injected into said structure in real time.
10. A non-transitory computer-readable storage medium with an executable program stored thereon, wherein the program instructs a microprocessor to perform the method according to claim 1.
11. A device for decoding a digital audio signal comprising a succession of samples divided into successive frames, the device comprising means for replacing at least one succession of lost samples, comprising at least a processor adapted to perform the following steps: generating a structure of a signal for replacing the lost succession, said structure comprising spectral components determined from valid samples received during decoding and prior to said succession of lost samples, generating a residue between a digital signal available to the decoder, comprising valid samples received, and a signal generated from said spectral components, extracting blocks from said residue, injecting said blocks into said structure, wherein the injection makes use of window-weighted blocks in an overlap-add approach, said injected blocks at least partially overlapping in time, wherein said blocks are injected with a parameter that is variable between at least two injected blocks, the variable parameter being one of: a write start time of the injected block, and an overlap rate between two successive injected blocks, wherein the variable parameter varies pseudo-randomly for at least one injected block.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Other features and advantages of the invention will become apparent upon reading the following detailed description of some embodiments of the invention and upon reviewing the drawings.
DETAILED DESCRIPTION
(17) We will now refer to
(18) The term “frame” is understood to mean a block of at least one sample. In most codecs, these frames consist of several samples. However, in some codecs, such as PCM (Pulse Code Modulation), for example according to Recommendation G.711, the signal simply consists of a succession of samples (a “frame” in the meaning of the invention then containing only one sample). The invention can then also be applied to this type of codec.
(19) For example, the valid signal can consist of the last valid frames received before the frame loss. It is also possible to use one or more subsequent valid frames received after the lost frame (although such an embodiment introduces a decoding delay). The samples used from the valid signal may be those of the frames themselves, and possibly those corresponding to the transform memory, which typically contain aliasing in the case of overlapped transform decoding such as MDCT or MLT.
(20) In a first step S1 of the processing of
(21) In the filtering step S2, the audio buffer b(n) is then separated into two frequency bands, a low-frequency band BB and a high-frequency band BH, with a separation frequency denoted below as Fc (for example, Fc = 4 kHz).
(22) Step S3, applied to the low-frequency band, consists of searching for a loopback point and a segment of length P corresponding to the fundamental period in the buffer b(n) resampled at frequency Fc. The fundamental period corresponds, for example, to the pitch period in the case of a voiced speech signal (the inverse of the fundamental frequency of the signal). However, the signal may also be a music signal, for example one having an overall tonality associated with a fundamental frequency and a fundamental period that can correspond to said repetition period.
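The text does not say how the loopback point and the period P are found. A minimal sketch, assuming a normalized-correlation search over a hypothetical lag range [p_min, p_max] (the function name `estimate_period` is illustrative, not from the source):

```python
import math

def estimate_period(buf, p_min, p_max):
    """Hypothetical period search: return the lag P in [p_min, p_max] that best
    matches the last P samples of buf with the P samples just before them."""
    n = len(buf)
    best_p, best_score = p_min, -math.inf
    for p in range(p_min, p_max + 1):
        if 2 * p > n:                      # need two full periods in the buffer
            break
        cur, prev = buf[n - p:], buf[n - 2 * p:n - p]
        num = sum(a * b for a, b in zip(cur, prev))
        den = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in prev))
        score = num / den if den > 0.0 else 0.0
        if score > best_score + 1e-9:      # keep the shortest lag on near-ties
            best_p, best_score = p, score
    return best_p
```

Preferring the shortest lag on near-ties avoids returning a multiple of the period (2P, 3P, ...), which correlates just as well for a strictly periodic signal.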
(23) In what follows, it is assumed that only one fundamental period of length P is used for synthesis of the signal, but the principle of the processing applies equally well to a segment extending over several fundamental periods. The results are even better with several fundamental periods, in terms of both FFT accuracy and the richness of the spectral components obtained.
(24) The next step S4 consists of breaking segment p(n) down into a sum of sines.
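Steps S4 and S5 can be illustrated with a discrete Fourier transform. This is only a sketch under assumptions not stated in the text: it uses a plain O(N²) DFT (an FFT would be used in practice), models each component as A·cos(2πfn + φ) with f in cycles per sample, and approximates the selection of step S5 by keeping the K bins of largest amplitude, without any peak interpolation.

```python
import cmath
import math

def decompose_sines(p, K):
    """Return the K strongest (amplitude, frequency, phase) triples of segment p,
    with p(n) ~ sum of A*cos(2*pi*f*n + phi), f in cycles per sample."""
    N = len(p)
    comps = []
    for m in range(1, N // 2):              # skip DC and Nyquist for simplicity
        X = sum(p[n] * cmath.exp(-2j * math.pi * m * n / N) for n in range(N))
        comps.append((2.0 * abs(X) / N, m / N, cmath.phase(X)))
    comps.sort(key=lambda c: c[0], reverse=True)
    return comps[:K]
```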
(25) In step S5 of
(26) The next step S6 is a sinusoidal synthesis. In one exemplary embodiment, it consists of generating a segment s(n) whose length is at least equal to the size T of a lost frame. In one particular embodiment, a length equal to 2 frames (for example 40 ms) is generated so that a crossfade (as a transition) can be performed between the synthesized signal (with frame loss correction) and the signal decoded in the next valid frame, once such a frame is again correctly received.
(27) To anticipate the resampling of the frame, the number of samples to be synthesized can be increased by half the length LF of the resampling filter. The synthesized signal s(n) is calculated as a sum of the selected sinusoidal components:
(28)
where k is the index of the K components selected in step S5. There are several possible conventional methods for performing this sinusoidal synthesis.
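Since the synthesis formula itself is not reproduced here, the following is only a sketch of a sum-of-sinusoids generator, assuming the (A_k, f_k, φ_k) cosine convention used for the decomposition sketch above. The frame length T and filter length LF in the usage lines are hypothetical values.

```python
import math

def synthesize(components, length):
    """s(n) = sum over k of A_k*cos(2*pi*f_k*n + phi_k), for 0 <= n < length."""
    return [sum(a * math.cos(2.0 * math.pi * f * n + phi) for a, f, phi in components)
            for n in range(length)]

# Hypothetical sizes: T = 160 samples per frame, LF = 32-tap resampling filter.
T, LF = 160, 32
s = synthesize([(1.0, 0.25, 0.0), (0.3, 0.05, 1.2)], 2 * T + LF // 2)
```

Here n = 0 is the start of the analysed segment; to continue the signal past the end of p(n), the index would start at the segment length instead.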
(29) Step S7 of
(30) One simple embodiment of the invention can already be described with reference to
(31) This residue is transformed in step P6 so that it reaches a size
(32)
to become signal b(n) in step P7.
(33) Signal b(n) is then injected, in step P8, into signal s(n) generated in step P2, for a duration N corresponding to the duration of the signal to be replaced.
(34) This replacement signal f(n) is then mixed with the valid signal in step P9. The mixing may for example include overlap-adding RECOV over an overlap interval RO.
(35) In one embodiment, this residual signal is replicated one or more times (depending on the portion of time to be filled), with overlap-add between replicas.
(36) In another embodiment, various transforms may be applied to the blocks of the residual signal in a pseudo-random manner at each replication: it is thus possible to reverse the sign of the signal, and/or perform a time reversal.
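A minimal sketch of this replication, assuming a fixed overlap between replicas and sine-squared crossfade ramps (both illustrative choices, not taken from the text). `rng` is any object with a `random()` method, such as `random.Random`; the `transforms` flag is a hypothetical switch added for testing.

```python
import math
import random

def replicate_residue(r, N, overlap, rng, transforms=True):
    """Fill N samples with overlap-added replicas of residue r.
    Each replica may be sign-inverted and/or time-reversed pseudo-randomly."""
    L = len(r)
    assert 0 < overlap <= L // 2, "rising and falling ramps must not overlap"
    hop = L - overlap
    # sin^2 ramp: ramp[m] + (1 - ramp[m]) = 1 across each overlap
    ramp = [math.sin(0.5 * math.pi * (m + 0.5) / overlap) ** 2 for m in range(overlap)]
    out = [0.0] * N
    start = 0
    while start < N:
        block = list(r)
        if transforms and rng.random() < 0.5:
            block = [-x for x in block]        # sign reversal
        if transforms and rng.random() < 0.5:
            block.reverse()                    # time reversal
        for n in range(L):
            if start + n >= N:
                break
            w = 1.0
            if start > 0 and n < overlap:
                w = ramp[n]                    # fade in over the previous replica
            elif n >= hop:
                w = 1.0 - ramp[n - hop]        # fade out under the next replica
            out[start + n] += w * block[n]
        start += hop
    return out
```

With `transforms=False` and a constant residue, the output stays constant, which confirms that the ramps sum to one in every overlap.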
(37) We will now describe, with reference to
(38) In step S601, a signal s(n) is generated from the sinusoidal synthesis of step S6 (also referenced in
(39) The residue r(n) is obtained by subtracting (operation SUB) signal s(n) from signal p(n). This yields, in step S603, r(n) = p(n) − s(n).
(40) In step S604, a counter variable k is initialized to 0 and signal b(n,k) is initialized such that b(n,0)=0.
(41) In step S605, a block r(n,k) is extracted from signal r(n). In one embodiment, the temporal characteristics of this extraction (block start time $i_k$ and block duration $L_k$) are determined pseudo-randomly. In another embodiment, conditions may be imposed on this extraction. For example, the sum of the block start time and the block duration must be less than the duration of block p(n) extracted in step S602.
(42) In step S606, the duration $L_k$ of the extracted block r(n,k) is transmitted to a window configuration step S608.
(43) In step S607, a set of weighting windows is made available so that a weighting window can be configured in step S608. For example, weighting windows stored in memory are extracted and transferred to a working memory.
(44) In step S608, a weighting window is selected and configured so that it can be multiplied by block r(n,k) in step MULT. The parameters of the window include the duration $L_k$ appropriate for block r(n,k).
(45) Block $w_k \cdot r(n,k)$ is then overlap-added to signal b(n,k−1), corresponding to the (k−1) blocks already added, such that $b(n,k) = w_k \cdot r(n,k) + b(n,k-1)$. In one embodiment, the overlap-adding is performed with a fixed overlap rate of 50%.
(46) Test T609 checks whether the length of the signal b(n,k) generated so far is greater than the value N corresponding to the duration of the signal to be replaced.
(47) If it is, signal b(n,k) is truncated in step S612 so that its length is equal to N, the truncated value being denoted TQ. In step S613, the noise signal Y to be injected into the replacement signal for the lost frames is set to TQ and is injected in step S7 (also referenced in
(48) If it is not, the value of b(n,k) is stored in a working memory MEM (with reference to
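The loop of steps S604 to S613 can be sketched as follows. Assumptions: pseudo-random start times $i_k$ with a fixed block duration L (the text also allows a pseudo-random $L_k$), a sine-squared window that sums to one at 50% overlap, and the hypothetical function name `noise_from_residue`. The first half of the first block has no predecessor, so the noise fades in there.

```python
import math
import random

def noise_from_residue(r, N, L, rng):
    """Sketch of steps S604-S613: overlap-add windowed residue blocks at a fixed
    50 % overlap until at least N samples exist, then truncate to N (step S612)."""
    assert L % 2 == 0 and L <= len(r)
    hop = L // 2
    # sin^2 window: w(n) + w(n + L/2) = 1, so 50 %-overlapped windows sum to one
    w = [math.sin(math.pi * (n + 0.5) / L) ** 2 for n in range(L)]
    b = []                                     # b(n, k), initially empty
    k = 0
    while len(b) < N:                          # test T609
        i_k = rng.randrange(len(r) - L + 1)    # pseudo-random start, i_k + L <= len(r)
        j_k = k * hop                          # write position of block k
        b.extend([0.0] * (j_k + L - len(b)))   # grow b to hold the new block
        for n in range(L):
            b[j_k + n] += w[n] * r[i_k + n]    # MULT, then overlap-add
        k += 1
    return b[:N]                               # truncated signal TQ
```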
(49) We will now describe, with reference to
(50) In this embodiment, the residual signal is injected in successive iterations (numbered k) by overlap-adding signal blocks $r'_k(n)$ obtained from the residue r(n).
(51) At iteration k, the block read is determined by a block start index $i_k$ and a block length $L_k$, and the manner of injecting this portion of the residue into the target time slot is defined by an optional transformation $T_k$, a write index $j_k$ (start of copying the block into the time slot to be filled), and an overlap-add window $w_k(n)$.
(52) We denote by b(n) the complementary signal, of N samples, to be generated from the residue. The procedure for generating the noise signal is as follows.
(53) Initialization:
(54) $b(n) = 0$ for $0 \le n < N$; $k = 0$; $j_0 = 0$
(55) Iterations, until $j_k + L_k = N$:
1) choice of $i_k$ and $L_k$ such that $i_k + L_k \le P$ and $j_k + L_k \le N$, and extraction of block P(k);
2) choice of a transformation $T_k$ to obtain S(k), corresponding to $r'_k(n) = T_k(r_k(i_k + n))$ (this transformation is described below);
3) if $j_k + L_k < N$, in order to prepare the overlap with the next iteration, choice of $j_{k+1} \le j_k + L_k$ (and preferably $j_{k+1} \ge j_{k-1} + L_{k-1}$ to limit the simultaneous overlap to at most two blocks, for example S(k) and S(k+1)), and extraction of block P(k+1);
4) determination of the weighting window $w_k(n)$ based on any overlaps with neighboring blocks;
5) pasting of $r'_k(n)$ weighted by window $w_k(n)$: $b(j_k + n) = b(j_k + n) + r'_k(n) \cdot w_k(n)$, $0 \le n < L_k$;
6) incrementation $k = k + 1$.
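A compact sketch of this procedure under stated assumptions: $T_k$ is limited to a pseudo-random sign, $j_{k+1}$ and $L_{k+1}$ are drawn pseudo-randomly within the constraints of steps 1) and 3), and the windows use a hypothetical sine-squared crossfade that satisfies the amplitude-preservation criterion introduced further on. Function names and the `random_sign` flag are illustrative.

```python
import math
import random

def fade_in(n, l):
    """Hypothetical crossfade function f_l(n), rising over 0 <= n < l."""
    return math.sin(0.5 * math.pi * (n + 0.5) / l) ** 2

def generate_noise(r, N, rng, L_min, L_max, random_sign=True):
    P = len(r)
    assert 1 <= L_min <= L_max <= P
    b = [0.0] * N                                   # b(n) = 0, 0 <= n < N
    j, l_prev = 0, 0                                # j_0 = 0, no overlap before block 0
    L = min(rng.randint(L_min, L_max), N)
    while True:
        i = rng.randint(0, P - L)                   # i_k + L_k <= P
        sigma = rng.choice((-1.0, 1.0)) if random_sign else 1.0   # T_k
        l = 0
        if j + L < N:                               # step 3: plan block k+1
            # j_{k+1} >= j_{k-1} + L_{k-1} = j_k + l_{k-1}, and j_{k+1} > j_k for progress
            j_next = rng.randint(j + max(l_prev, 1), j + L)
            l = j + L - j_next                      # overlap l_k between blocks k and k+1
            L_next = max(min(rng.randint(L_min, L_max), N - j_next), l)
        for n in range(L):                          # steps 4 and 5: window and paste
            w = 1.0
            if n < l_prev:
                w = fade_in(n, l_prev)              # complement of the previous fade-out
            elif n >= L - l:
                w = 1.0 - fade_in(n - (L - l), l)   # fade-out under block k+1
            b[j + n] += sigma * r[i + n] * w
        if j + L >= N:                              # stop when j_k + L_k = N
            return b
        j, L, l_prev = j_next, L_next, l            # step 6: k = k + 1
```

Because the write index strictly increases and $j_k + L_k$ never exceeds N, the loop always terminates with the last block ending exactly at N.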
(56) In this embodiment, the procedure described increases the write index $j_k$. Any other choice of progression (decreasing, non-monotonic, etc.) is also possible.
(57) In another embodiment, $L_k$ is chosen to be relatively large compared to the available reserve P, so as to progress significantly in the copying and to avoid distorting relatively low-frequency components. For example, referring to
(58) In another embodiment, the size $j_k + L_k - j_{k+1}$ of the overlap areas is reduced to limit the number of addition and multiplication operations required. The overlap rate (corresponding to the size $j_k + L_k - j_{k+1}$ of the overlap areas) can also be adjusted so that the trade-off between quality (erasure of artifacts) and processing cost suits the intended use of the decoder.
(59) In one preferred embodiment, with reference to
(60) In the overlapping area, meaning for $n \in [0; l_k[$ where $l_k = j_k + L_k - j_{k+1}$, the resulting signal is:
$$b(j_{k+1} + n) = r'_k(j_{k+1} - j_k + n) \cdot w_k(j_{k+1} - j_k + n) + r'_{k+1}(n) \cdot w_{k+1}(n)$$
(61) In one embodiment, the end of $w_k$ and the start of $w_{k+1}$ are combined according to a criterion called “preservation of amplitude”:
$$w_k(j_{k+1} - j_k + n) + w_{k+1}(n) = 1$$
(62) It is thus sufficient to choose a crossfade function $f_{l_k}(n)$, defined for $n \in [0; l_k[$, and to set:
$$w_k(j_{k+1} - j_k + n) = f_{out}(n) = 1 - f_{l_k}(n)$$
$$w_{k+1}(n) = f_{in}(n) = f_{l_k}(n)$$
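For illustration only, assuming a hypothetical linear crossfade $f_{l_k}(n) = (n + 1/2)/l_k$ (any function rising from 0 to 1 would do), the pair $f_{out}$, $f_{in}$ and the overlap mixing of paragraph (60) can be sketched as:

```python
def amplitude_crossfade(l):
    """Return (f_out, f_in) over 0 <= n < l with f_out(n) + f_in(n) = 1."""
    f = [(n + 0.5) / l for n in range(l)]      # hypothetical linear f_l(n)
    return [1.0 - v for v in f], f

def mix_overlap(tail_k, head_k1, l):
    """b(j_{k+1}+n) over the overlap: end of block k fades out, start of block k+1 fades in."""
    f_out, f_in = amplitude_crossfade(l)
    return [a * fo + c * fi for a, c, fo, fi in zip(tail_k, head_k1, f_out, f_in)]
```

Two identical (fully correlated) inputs pass through the overlap unchanged, which is what "preservation of amplitude" guarantees.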
(63) For example, the crossfade function can be refined and defined by:
(64)
(65) In another example, represented by function ƒ.sub.in(n) in
(66)
(67) In another embodiment, a criterion called “energy conservation” is selected, suited to the case where the pasted signals are combined without phase coherence, and defined by:
$$\left(w_k(j_{k+1} - j_k + n)\right)^2 + \left(w_{k+1}(n)\right)^2 = 1$$
(68) From a crossfade function $f_{l_k}(n)$ as proposed above, one can then deduce, for $n \in [0; l_k[$:
$$w_k(j_{k+1} - j_k + n) = f_{out}(n) = \sqrt{1 - f_{l_k}(n)}$$
$$w_{k+1}(n) = f_{in}(n) = \sqrt{f_{l_k}(n)}$$
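A sketch of the energy-conserving pair, again assuming a hypothetical linear $f_{l_k}(n) = (n + 1/2)/l_k$. For two uncorrelated blocks of equal power, the mixed power at sample n is $(f_{out}^2 + f_{in}^2)$ times that power, so it stays constant here, whereas the amplitude-preserving pair would give $0.5^2 + 0.5^2 = 0.5$ (about −3 dB) at the middle of a linear fade.

```python
import math

def energy_crossfade(l):
    """Return (f_out, f_in) over 0 <= n < l with f_out(n)**2 + f_in(n)**2 = 1."""
    f = [(n + 0.5) / l for n in range(l)]          # hypothetical linear f_l(n)
    return [math.sqrt(1.0 - v) for v in f], [math.sqrt(v) for v in f]
```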
(69) Each weighting window is typically composed of three parts, from left to right: an increasing part (complementary to the decreasing part of the previous window), a constant part with unit gain, and a decreasing part.
(70) In one embodiment, at least one of these parts is of zero length for at least one weighting window. For example, the weighting window applied to the first injected block consists only of a decreasing part if this first block is completely overlapped by the beginning of the next injected block.
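A sketch of such a three-part window, assuming a hypothetical sine-squared crossfade $f_l(n)$. The rising part of one window is the complement of the falling part of the previous one, so their sum is one over the overlap. The usage below builds a first window made only of a decreasing part, as in the example above.

```python
import math

def f_l(n, l):
    """Hypothetical crossfade function rising over 0 <= n < l."""
    return math.sin(0.5 * math.pi * (n + 0.5) / l) ** 2

def weighting_window(L, l_in, l_out):
    """Three-part window: rise over l_in samples, unit gain, fall over l_out samples.
    Any part may have zero length."""
    assert l_in >= 0 and l_out >= 0 and l_in + l_out <= L
    rising = [f_l(n, l_in) for n in range(l_in)]
    flat = [1.0] * (L - l_in - l_out)
    falling = [1.0 - f_l(n, l_out) for n in range(l_out)]
    return rising + flat + falling

w0 = weighting_window(8, 0, 8)     # first block: decreasing part only
w1 = weighting_window(16, 8, 4)    # next block: rises under w0, flat, then falls
```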
(71) In another embodiment, the crossfade between two blocks is handled in a single operation over their overlapping area. This simply involves splitting the steps described above and reassembling them differently.
(72) Each iteration then consists of: a phase of pasting without overlap and therefore without windowing (eliminating the multiplication by $w_k(n) = 1$), and/or a phase of crossfade pasting of the end of the old block and the beginning of the new block, using the crossfade functions $f_{out}(n)$ and $f_{in}(n)$ described above.
(73) This is described in more detail with the following procedure, referred to as “with simultaneous crossfade.”
(74) Initialization:
(75) $b(n) = 0$ for $0 \le n < N$; $k = 0$; $j_0 = 0$; $l_{-1} = 0$.
Choice of $i_0$ and $L_0$ such that $i_0 + L_0 \le P$ and $j_0 + L_0 \le N$.
Choice of $j_1 \ge j_0$ where $j_1 \le j_0 + L_0$, from which the size of the overlap $l_0 = j_0 + L_0 - j_1$ is deduced.
Choice of transformations $T_0$ and $T_1$.
Calculation of $r'_0(n) = T_0(r_0(i_0 + n))$.
(76) Iterations, until $j_k + L_k = N$:
1) if $j_{k+1} > j_k + l_{k-1}$, pasting without overlap or windowing: $b(j_k + n) = r'_k(n)$, $l_{k-1} \le n < L_k - l_k$;
2) crossfade pasting in the overlap area: $b(j_{k+1} + n) = r'_k(L_k - l_k + n) \cdot f_{out}(n) + r'_{k+1}(n) \cdot f_{in}(n)$, $0 \le n < l_k$;
3) if another iteration is required (particularly if $j_k + L_k < N$):
a) choice of $j_{k+1} \le j_k + L_k$ where $j_{k+1} \ge j_{k-1} + L_{k-1}$ (to limit simultaneous overlap to at most two blocks);
b) choice of $i_{k+1}$ and $L_{k+1}$ such that $i_{k+1} + L_{k+1} \le P$ and $j_{k+1} + L_{k+1} \le N$;
c) choice of transformation $T_{k+1}$ to obtain $r'_{k+1}(n) = T_{k+1}(r_{k+1}(i_{k+1} + n))$ (see details below);
4) incrementation $k = k + 1$.
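The following sketch renders a pre-planned list of blocks with this single-pass scheme. Assumptions: the block parameters $(j_k, i_k, L_k, \sigma_k)$ are chosen beforehand rather than inside the loop, $T_k$ is a sign change only, and $f_{in}(n)$ is a hypothetical sine-squared crossfade with $f_{out}(n) = 1 - f_{in}(n)$. Samples outside the overlap areas are copied without any multiplication.

```python
import math

def f_in(n, l):
    """Hypothetical rising crossfade; the matching fade-out is 1 - f_in(n, l)."""
    return math.sin(0.5 * math.pi * (n + 0.5) / l) ** 2

def render_simultaneous(r, blocks, N):
    """blocks: list of (j_k, i_k, L_k, sigma_k) sorted by j_k, with j_0 = 0,
    j_{k+1} <= j_k + L_k, j_{k+1} >= j_{k-1} + L_{k-1}, and a last block ending at N."""
    rp = [[s * r[i + n] for n in range(L)] for (_, i, L, s) in blocks]  # r'_k(n)
    b = [0.0] * N
    l_prev = 0                                            # l_{-1} = 0
    for k, (j, _, L, _) in enumerate(blocks):
        last = k + 1 == len(blocks)
        l = 0 if last else j + L - blocks[k + 1][0]       # overlap l_k
        for n in range(l_prev, L - l):                    # 1) paste, no windowing
            b[j + n] = rp[k][n]
        j_next = j + L - l                                # equals j_{k+1}
        for n in range(l):                                # 2) crossfade pasting
            g = f_in(n, l)
            b[j_next + n] = rp[k][L - l + n] * (1.0 - g) + rp[k + 1][n] * g
        l_prev = l
    return b
```

For example, blocks (0, 0, 10, +1), (6, 5, 12, +1), (14, 0, 6, −1) over N = 20 give overlaps $l_0 = l_1 = 4$; samples 0 to 5, 10 to 13 and 18 to 19 are plain copies.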
(77) In a variant, the principle of crossfading is applied between the new pasted block and the signal already generated in the overlapping portion: $b(j_{k+1} + n) = b(j_{k+1} + n) \cdot f_{out}(n) + r'_{k+1}(n) \cdot f_{in}(n)$. This embodiment has the advantage of handling simultaneous overlaps of more than two blocks without increasing the complexity of the calculations.
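A minimal sketch of this variant, assuming a hypothetical sine-squared $f_{in}(n)$ with $f_{out}(n) = 1 - f_{in}(n)$. Because the new block is mixed with whatever b(n) already contains, earlier overlaps need no separate bookkeeping.

```python
import math

def paste_with_crossfade(b, j, block, l):
    """Crossfade `block` into the already generated signal b over its first l
    samples, then copy the rest of the block unchanged."""
    for n, x in enumerate(block):
        if n < l:
            g = math.sin(0.5 * math.pi * (n + 0.5) / l) ** 2   # hypothetical f_in(n)
            b[j + n] = b[j + n] * (1.0 - g) + x * g            # f_out(n) = 1 - f_in(n)
        else:
            b[j + n] = x
    return b
```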
(78) Thus, at least one of the parameters $i_k$, $l_k$, $L_k$ and $T_k$ varies from one iteration to another, in order to avoid a periodicity effect and the associated auditory artifacts (metallic, artificial sound).
(79) From the indices $i_k$, $i_{k+1}$, $j_k$ and $j_{k+1}$, one can deduce the delay $d_{k,k+1}$ of one pasted block relative to the other in the filled time slot: $d_{k,k+1} = (j_{k+1} - i_{k+1}) - (j_k - i_k)$.
(80) Preferably, but without limitation, $d_{k,k+1}$ is chosen so that it differs from one iteration k to the next k+1.
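The delay is simple index arithmetic. Roughly speaking, $j_k - i_k$ is the shift at which block k's source samples land in b(n), so a constant $d_{k,k+1}$ would repeat the same relative shift block after block. The sketch below (hypothetical helper names and example indices) computes the delays and checks the preferred condition.

```python
def delays(i, j):
    """d_{k,k+1} = (j_{k+1} - i_{k+1}) - (j_k - i_k) for consecutive pasted blocks."""
    return [(j[k + 1] - i[k + 1]) - (j[k] - i[k]) for k in range(len(i) - 1)]

def delays_vary(i, j):
    """True if no two consecutive delays are equal (the preferred behaviour)."""
    d = delays(i, j)
    return all(a != b for a, b in zip(d, d[1:]))

# Hypothetical indices: (30 - 3) - (0 - 10) = 37, then (55 - 25) - (30 - 3) = 3.
d = delays([10, 3, 25], [0, 30, 55])
```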
(81) In one embodiment, to improve the erasing of artifacts, simple or complex transformations (denoted $T_k$ above) can be introduced in a variable manner during iterations, offering the advantage of introducing a form of decorrelation between injected signal portions.
(82) One possible and simple transformation $T_k$ consists of changing the sign of the signal: $r'_k(n) = T_k(r_k(i_k + n)) = \sigma_k r_k(i_k + n)$, where $\sigma_k = \pm 1$ depending on the iteration.
(83) One possible transformation, which can be combined with the previous one and is applicable pseudo-randomly, consists of a time reversal, meaning the reading or writing of the residue in a retrograde manner:
$$r'_k(n) = T_k(r_k(i_k + n)) = \sigma_k r_k(i_k + L_k - 1 - n), \quad 0 \le n < L_k$$
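Both transformations amount to a particular way of reading the residue. A sketch, assuming the block is read directly from r (the name `transform_block` and the seed are hypothetical):

```python
import random

def transform_block(r, i, L, sigma, reverse):
    """T_k: read L samples of r from index i, optionally backwards, scaled by sigma = +1 or -1."""
    if reverse:
        return [sigma * r[i + L - 1 - n] for n in range(L)]
    return [sigma * r[i + n] for n in range(L)]

rng = random.Random(7)                          # hypothetical pseudo-random source
sigma = rng.choice((-1, 1))
block = transform_block([0.1, 0.2, 0.3, 0.4], 1, 3, sigma, rng.random() < 0.5)
```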
(84) Other transformations with a higher computation cost are also possible, for example phase-shifting filters. A phase-shifting filter, also called an all-pass filter, has the same gain over the entire frequency range used, but the phase shift it applies to the frequency components of the signal varies with frequency.
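As one possible phase-shifting transformation (an assumption: the text specifies neither the filter order nor its coefficients), a first-order all-pass filter can be sketched as follows. Drawing the coefficient `a` pseudo-randomly per block would decorrelate successive blocks without changing their magnitude spectrum.

```python
import cmath
import math

def allpass(x, a):
    """First-order all-pass: y[n] = a*x[n] + x[n-1] - a*y[n-1], with |a| < 1.
    H(z) = (a + z^-1) / (1 + a z^-1) has unit gain at every frequency."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for v in x:
        out = a * v + x_prev - a * y_prev
        y.append(out)
        x_prev, y_prev = v, out
    return y

def gain_at(h, freq):
    """|H| at normalized frequency freq (cycles/sample), computed from impulse response h."""
    return abs(sum(v * cmath.exp(-2j * math.pi * freq * n) for n, v in enumerate(h)))

h = allpass([1.0] + [0.0] * 255, 0.5)   # impulse response: 0.5, 0.75, -0.375, ...
```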
(85) Although an intermediate variable $r'_k(n)$ is introduced here to facilitate the description, the transformation $T_k$ can be implemented as a particular way of reading the digital samples, without necessarily requiring intermediate storage in a buffer between reading from r(n) and writing to b(n).
(86) In another embodiment, the k-th injected signal portion can be obtained from the complementary signal already generated, b(n) for $0 \le n < j_{k-1} + L_{k-1}$, and no longer only from the residue r(n).
(87) One variant embodiment comprising the procedure “with simultaneous crossfade” described above, incorporated into a digital audio decoder, is now given as an example with reference to
(88) Initialization:
(89) $j_1 = j_0 = 0$ (the crossfade of two blocks is applied as soon as filling starts); $i_0 = P/2$; $L_0 = P/2$
(90) In each iteration: the read index $i_k$ (for $k > 0$) points to the start of the calculated residue segment r(n), i.e. $i_k = 0$. The crossfade functions are sinusoidal:
$$f_{out}(n) = 1 - f_{l_k}(n)$$
$$f_{in}(n) = f_{l_k}(n)$$
with
(91)
$$l_k = \lfloor \alpha(k') \cdot P/2 \rfloor$$
with $k' = (k + \mathrm{cnt\_bfi}) \bmod 4$, where cnt_bfi is the counter for the number of missing frames and $\alpha = [1\ 0.8\ 0.6\ 0.9]$. The transformation $T_k$ essentially consists of an occasional change of sign (no time reversal), indicated by the coefficient
(92)
(93) The first steps of the method described above are presented in the following table, with reference to
(94) TABLE-US-00001
INIT: $j_1 = j_0 = 0$; $i_0 = P/2$; $L_0 = P/2$; $l_0 = P/2$; calculate $r'_0(n)$ by applying $T_0$ ($\sigma_0 = 1$)
ST(0), for k = 0, choose: $i_1 = 0$; $l_1 = 0.8 \times P/2$; $L_1 = l_1 + l_0$; calculate $r'_1(n)$ by applying $T_1$ ($\sigma_1 = -1$); calculate $f_{out}(n)$ and $f_{in}(n)$; $b(j_1 + n) = r'_0(n) \cdot f_{out}(n) + r'_1(n) \cdot f_{in}(n)$; $j_2 = j_1 + l_0$
ST(1), for k = 1, choose: $i_2 = 0$; $l_2 = 0.6 \times P/2$; $L_2 = l_2 + l_1$; calculate $r'_2(n)$ by applying $T_2$ ($\sigma_2 = 1$); calculate $f_{out}(n)$ and $f_{in}(n)$; $b(j_2 + n) = r'_1(L_1 - l_1 + n) \cdot f_{out}(n) + r'_2(n) \cdot f_{in}(n)$; $j_3 = j_2 + l_1$
ST(2), for k = 2, choose: $i_3 = 0$; $l_3 = 0.9 \times P/2$; $L_3 = l_3 + l_2$; calculate $r'_3(n)$ by applying $T_3$ ($\sigma_3 = -1$); calculate $f_{out}(n)$ and $f_{in}(n)$; $b(j_3 + n) = r'_2(L_2 - l_2 + n) \cdot f_{out}(n) + r'_3(n) \cdot f_{in}(n)$; $j_4 = j_3 + l_2$
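The index bookkeeping of this example can be reproduced with a short sketch, assuming P is even, cnt_bfi = 0, a cyclic index $k' = (k + \mathrm{cnt\_bfi}) \bmod 4$ into α, and integer arithmetic for the floor (α is stored in tenths to avoid floating-point rounding). The sign coefficients and crossfade shapes are not computed, and `example_schedule` is a hypothetical name.

```python
def example_schedule(P, n_steps, cnt_bfi=0):
    """Return the lists i, j, l, L produced by INIT and steps ST(0) .. ST(n_steps - 1)."""
    alpha_tenths = [10, 8, 6, 9]                 # alpha = [1, 0.8, 0.6, 0.9]
    half = P // 2
    i, L, l = [half], [half], [half]             # i_0 = L_0 = l_0 = P/2
    j = [0, 0]                                   # j_0 = j_1 = 0
    for k in range(n_steps):
        kp = (k + 1 + cnt_bfi) % len(alpha_tenths)        # k' for block k+1
        l.append(alpha_tenths[kp] * half // 10)           # l_{k+1} = floor(alpha(k') * P/2)
        L.append(l[k + 1] + l[k])                         # L_{k+1} = l_{k+1} + l_k
        i.append(0)                                       # i_{k+1} = 0
        j.append(j[k + 1] + l[k])                         # j_{k+2} = j_{k+1} + l_k
    return i, j, l, L

i, j, l, L = example_schedule(80, 3)
```

With P = 80 this gives l = [40, 32, 24, 36], L = [40, 72, 56, 60] and write indices j = [0, 0, 40, 72, 96]; every block satisfies $i_k + L_k \le P$, and each overlap matches $l_k = j_k + L_k - j_{k+1}$ (for example 0 + 72 − 40 = 32).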
(95) Once the complementary signal b(n) is generated for the desired time portion, it is added to the signal generated by sinusoidal synthesis s(n), n>0.
(96) In a preferred embodiment, at least one of the parameters of the blocks is determined pseudo-randomly in order to introduce irregularities into the replacement signal and thus limit the periodicity phenomenon that causes auditory discomfort. The parameters of the weighting windows are, for example, the extracted block start time, the duration of a block (similar to parameter $L_k$ described above), and the overlap rate of two consecutive blocks.
(97) In one exemplary embodiment, with reference to
(98) For example, for 10 frames of lost data to be replaced, the noise signal is weighted by 20 weighting windows.
(99) As stated above, the term pseudo-random is used in mathematics and computer science to designate a sequence of numbers that approximates statistically perfect randomness. By virtue of the algorithmic processes used to generate it and the sources employed, the sequence cannot be considered as completely random. Of course, the parameters can be generated pseudo-randomly but still meet certain conditions, for example conditions relating to the length of the signal to be replaced.
(100) In another embodiment, with reference to
(101) In another embodiment, with reference to
(102) In another embodiment, with reference to
(103) Next, returning to step S8 of
(104) In step S9, the signal is synthesized by resampling the low frequency band at its original frequency Fc in step S70, and adding it to the signal coming from the repetition of step S8 in the high frequency band.
(105) In step S10, an overlap-add is performed that ensures continuity between the signal before the frame loss and the synthesized signal, and between the synthesized signal and the signal after the frame loss.
(106) Of course, the invention is not limited to the embodiment described above; it extends to other variants.
(107) For example, the separation into high and low frequency bands in step S2 is optional. In an alternative embodiment, the signal from the buffer (step S1) is not separated into two sub-bands, and steps S3 to S10 remain identical to those described above. However, restricting the processing of spectral components to the low frequencies advantageously limits the complexity.
(108) The invention may be implemented in a conversational decoder, in the case of frame loss. Physically, it can be implemented in a circuit for decoding, typically in a telephony terminal. To this end, such a circuit CIR may comprise or be connected to a processor PROC, as illustrated in
(109) More particularly, an embodiment has been described above that is based on a method for generating noise from a residue between a known signal and a synthesized signal. Of course, it is also possible to calculate the residue in the frequency domain (by removing the selected spectral components from the original spectrum) and to obtain the background noise by inverse transform.
(110) An embodiment has been described above that is based on a structure comprising spectral components determined from valid samples received during decoding and before the succession of lost samples. Of course, these spectral components may also be determined from samples received after this succession of lost samples. These spectral components may also be determined from samples received prior and subsequent to this succession of lost samples. These spectral components may also be constant.