Audio frame loss concealment
11482232 · 2022-10-25
Cpc classification
G10L19/02
PHYSICS
International classification
G10L19/005
PHYSICS
G10L19/02
PHYSICS
Abstract
Concealing a lost audio frame of a received audio signal is provided by performing a sinusoidal analysis (81) of a part of a previously received or reconstructed audio signal, wherein the sinusoidal analysis involves identifying frequencies of sinusoidal components of the audio signal, applying a sinusoidal model on a segment of the previously received or reconstructed audio signal, wherein said segment is used as a prototype frame in order to create a substitution frame for a lost audio frame, and creating the substitution frame (83) for the lost audio frame by time-evolving sinusoidal components of the prototype frame, up to the time instance of the lost audio frame, in response to the corresponding identified frequencies.
Claims
1. A frame loss concealment method, wherein a segment from a previously received or reconstructed audio signal is used as a prototype frame in creating a substitution frame for a lost audio frame, the method comprising: transforming the prototype frame into a frequency domain; applying a sinusoidal model to the prototype frame in the frequency domain to identify a frequency of at least one sinusoidal component of the prototype frame, wherein identifying of the frequency of the at least one sinusoidal component is performed with higher resolution than a frequency resolution used in transforming the prototype frame and further involves interpolation; determining a phase shift θ.sub.k for the at least one sinusoidal component; shifting a phase of all spectral coefficients in the prototype frame included in an interval M.sub.k around a sinusoid k by the phase shift θ.sub.k while retaining a magnitude of the spectral coefficients in the prototype frame included in the interval M.sub.k around the sinusoid k, wherein phases of spectral coefficients that are not phase shifted are randomized and the spectral coefficients that are not phase shifted include spectral coefficients in a gap between two M.sub.k intervals wherein intervals k=1 . . . K of M.sub.k are non-overlapping; and creating the substitution frame by performing an inverse frequency transform of a frequency spectrum of the prototype frame after phase shifting the spectral coefficients in the prototype frame.
2. The frame loss concealment method according to claim 1, wherein the phase shift θ.sub.k depends on a sinusoidal frequency f.sub.k and a time shift between the prototype frame and the lost audio frame.
3. The frame loss concealment method according to claim 1, wherein at least one of transforming, applying, calculating, phase shifting, and/or creating is performed by a processor, the method further comprising: providing by the processor an audio signal for speaker playback, wherein the audio signal is provided using the substitution frame.
4. The frame loss concealment method according to claim 1 further comprising: using the substitution frame in place of the lost audio frame to reduce audible impact of the lost audio frame.
5. The frame loss concealment method according to claim 1 further comprising: providing a decoded and reconstructed audio signal for speaker playback, wherein the decoded and reconstructed audio signal is provided using the substitution frame and the previously received or reconstructed audio signal; and transmitting the decoded and reconstructed audio signal through output circuitry towards a speaker for the speaker playback.
6. The frame loss concealment method according to claim 1 further comprising: playing the substitution frame that is created through a loudspeaker device.
7. The frame loss concealment method according to claim 1 further comprising: receiving the segment from the previously received or reconstructed audio signal through an input circuit, and outputting the substitution frame through an output circuit toward an electronic device having a loudspeaker for playback through the loudspeaker.
8. The frame loss concealment method according to claim 1 further comprising: replacing the lost audio frame with the substitution frame in the previously received or reconstructed audio signal; and outputting the substitution frame and the previously received or reconstructed audio signal towards storage.
9. The frame loss concealment method according to claim 1, wherein identifying of the frequency of the at least one sinusoidal component further involves identifying frequencies in a vicinity of peaks of a spectrum related to a frequency domain transform used to transform the prototype frame.
10. An apparatus for creating a substitution frame for a lost audio frame, the apparatus comprising: a processor; and memory communicatively coupled to the processor, said memory comprising instructions executable by the processor, which cause the processor to: generate a prototype frame from a segment of a previously received or reconstructed audio signal; transform the prototype frame into a frequency domain; apply a sinusoidal model to the prototype frame in the frequency domain to identify a frequency of at least one sinusoidal component of the previously received or reconstructed audio signal, wherein identifying of the frequency of the at least one sinusoidal component is performed with higher resolution than a frequency resolution used in transforming the prototype frame and further involves interpolation; determine a phase shift θ.sub.k for the at least one sinusoidal component; shift a phase of all spectral coefficients in the prototype frame included in an interval M.sub.k around a sinusoid k by the phase shift θ.sub.k while retaining a magnitude of the spectral coefficients in the prototype frame included in the interval M.sub.k around the sinusoid k, wherein phases of spectral coefficients that are not phase shifted are randomized and the spectral coefficients that are not phase shifted include spectral coefficients in a gap between two M.sub.k intervals wherein intervals k=1 . . . K of M.sub.k are non-overlapping; and create the substitution frame by performing an inverse frequency transform of a frequency spectrum of the prototype frame after phase shifting the spectral coefficients in the prototype frame.
11. The apparatus according to claim 10, wherein the phase shift θ.sub.k depends on a sinusoidal frequency f.sub.k and a time shift between the prototype frame and the lost audio frame.
12. The apparatus according to claim 10, further comprising: a loudspeaker, wherein the instructions comprise further instructions to play the substitution frame that is created through the loudspeaker.
13. The apparatus according to claim 10, further comprising: an input circuit; and an output circuit, wherein the processor is operated to receive the segment from the previously received or reconstructed audio signal through the input circuit, and to output the substitution frame through the output circuit toward a device having a loudspeaker for playback through the loudspeaker.
14. An audio decoder comprising the apparatus according to claim 10.
15. The frame loss concealment method according to claim 1 wherein applying the sinusoidal model to the prototype frame in the frequency domain to identify a frequency of at least one sinusoidal component of the previously received or reconstructed audio signal comprises applying the sinusoidal model to the prototype frame in the frequency domain to identify a frequency of at least one sinusoidal component of the previously received or reconstructed audio signal via parabolic interpolation.
16. The apparatus according to claim 10 wherein to apply the sinusoidal model to the prototype frame in the frequency domain to identify a frequency of at least one sinusoidal component of the previously received or reconstructed audio signal, the memory comprises instructions executable by the processor, which cause the processor to apply the sinusoidal model to the prototype frame in the frequency domain to identify a frequency of at least one sinusoidal component of the previously received or reconstructed audio signal via parabolic interpolation.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The embodiments will be described in more detail with reference to the accompanying drawings.
DETAILED DESCRIPTION
(12) In the following, embodiments of the invention will be described in more detail. For the purpose of explanation and not limitation, specific details are disclosed, such as particular scenarios and techniques, in order to provide a thorough understanding.
(13) Moreover, it is apparent that the exemplary method and devices described below may be implemented, at least partly, by the use of software functioning in conjunction with a programmed microprocessor or general purpose computer, and/or using an application specific integrated circuit (ASIC). Further, the embodiments may also, at least partly, be implemented as a computer program product or in a system comprising a computer processor and a memory coupled to the processor, wherein the memory is encoded with one or more programs that may perform the functions disclosed herein.
(14) A concept of the embodiments described hereinafter comprises concealment of a lost audio frame by: performing a sinusoidal analysis of at least part of a previously received or reconstructed audio signal, wherein the sinusoidal analysis involves identifying frequencies of sinusoidal components of the audio signal; applying a sinusoidal model on a segment of the previously received or reconstructed audio signal, wherein said segment is used as a prototype frame in order to create a substitution frame for a lost frame; and creating the substitution frame by time-evolving the sinusoidal components of the prototype frame, up to the time instance of the lost audio frame, in response to the corresponding identified frequencies.
(15) Sinusoidal Analysis
(16) The frame loss concealment according to embodiments involves a sinusoidal analysis of a part of a previously received or reconstructed audio signal. The purpose of this sinusoidal analysis is to find the frequencies of the main sinusoidal components, i.e. sinusoids, of that signal. Hereby, the underlying assumption is that the audio signal was generated by a sinusoidal model and that it is composed of a limited number of individual sinusoids, i.e. that it is a multi-sine signal of the following type:
(17) $$s(n) = \sum_{k=1}^{K} a_k \cos\!\left(2\pi \frac{f_k}{f_s}\, n + \varphi_k\right)$$
(18) In this equation K is the number of sinusoids that the signal is assumed to consist of. For each of the sinusoids with index k=1 . . . K, a.sub.k is the amplitude, f.sub.k is the frequency, and φ.sub.k is the phase. The sampling frequency is denoted by f.sub.s and the time index of the time discrete signal samples s(n) by n.
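As an illustration of the multi-sine model above, the following minimal Python sketch synthesizes such a signal. The function name `multi_sine` and the example amplitudes, frequencies and phases are assumptions chosen for illustration only; they do not appear in the embodiments.

```python
import math


def multi_sine(amplitudes, freqs_hz, phases, fs, num_samples):
    """Return s(n) = sum_k a_k * cos(2*pi*f_k/fs * n + phi_k) for n = 0..num_samples-1."""
    return [
        sum(a * math.cos(2.0 * math.pi * f / fs * n + p)
            for a, f, p in zip(amplitudes, freqs_hz, phases))
        for n in range(num_samples)
    ]


# Two sinusoids (K = 2), 20 ms of signal at an assumed 16 kHz sampling rate.
s = multi_sine([1.0, 0.5], [440.0, 1000.0], [0.0, math.pi / 2],
               fs=16000.0, num_samples=320)
```

The 320 samples correspond to 20 ms at 16 kHz, which is at the lower end of the analysis frame lengths discussed below.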
(19) It is important to find as exact frequencies of the sinusoids as possible. While an ideal sinusoidal signal would have a line spectrum with line frequencies f.sub.k, finding their true values would in principle require infinite measurement time. Hence, it is in practice difficult to find these frequencies, since they can only be estimated based on a short measurement period, which corresponds to the signal segment used for the sinusoidal analysis according to embodiments described herein; this signal segment is hereinafter referred to as an analysis frame. Another difficulty is that the signal may in practice be time-variant, meaning that the parameters of the above equation vary over time. Hence, on the one hand it is desirable to use a long analysis frame making the measurement more accurate; on the other hand a short measurement period would be needed in order to better cope with possible signal variations. A good trade-off is to use an analysis frame length in the order of e.g. 20-40 ms.
(20) According to a preferred embodiment, the frequencies of the sinusoids f.sub.k are identified by a frequency domain analysis of the analysis frame. To this end, the analysis frame is transformed into the frequency domain, e.g. by means of DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform), or a similar frequency domain transform. In case a DFT of the analysis frame is used, the spectrum is given by:
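(21) $$X(m) = \mathrm{DFT}\bigl(w(n)\,x(n)\bigr) = \sum_{n=0}^{L-1} w(n)\, x(n)\, e^{-j\frac{2\pi}{L} m n}$$
In this equation, x(n) denotes the previously received or reconstructed audio signal and w(n) denotes the window function with which the analysis frame of length L is extracted and weighted.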
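The windowed DFT can be sketched in a few lines of Python. This is a direct O(L²) evaluation for illustration, not an efficient FFT; the periodic Hann window and the names `hann_window` and `windowed_dft` are assumptions, since the text here does not fix a particular window.

```python
import cmath
import math


def hann_window(L):
    """Periodic Hann window of length L (an assumed choice for w(n))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / L) for n in range(L)]


def windowed_dft(x, w):
    """X(m) = sum_n w(n) * x(n) * exp(-j*2*pi*m*n/L) for m = 0..L-1."""
    L = len(x)
    return [sum(w[n] * x[n] * cmath.exp(-2j * math.pi * m * n / L) for n in range(L))
            for m in range(L)]


# A sinusoid lying exactly on DFT grid point m = 4 of a length-32 frame.
L = 32
frame = [math.cos(2.0 * math.pi * 4 * n / L) for n in range(L)]
magnitudes = [abs(c) for c in windowed_dft(frame, hann_window(L))]
peak_bin = max(range(L // 2), key=lambda m: magnitudes[m])
```

For this on-grid sinusoid the magnitude spectrum peaks at bin 4, with the Hann window spreading part of the energy to the two neighboring bins.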
(24) The peaks of the magnitude spectrum of the windowed analysis frame |X(m)| constitute an approximation of the required sinusoidal frequencies f.sub.k. The accuracy of this approximation is however limited by the frequency spacing of the DFT. With the DFT with block length L the accuracy is limited to
(25) $$\frac{f_s}{2L}$$
(26) However, this level of accuracy may be too low in the scope of the method according to the embodiments described herein, and an improved accuracy can be obtained based on the results of the following consideration:
(27) The spectrum of the windowed analysis frame is given by the convolution of the spectrum of the window function with the line spectrum of a sinusoidal model signal S(Ω), subsequently sampled at the grid points of the DFT:
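(28) $$X(m) = \int_{2\pi} \delta\!\left(\Omega - m\,\frac{2\pi}{L}\right) \bigl(W(\Omega) * S(\Omega)\bigr)\, d\Omega$$
(29) By using the spectrum expression of the sinusoidal model signal, this can be written as
(30) $$X(m) = \frac{1}{2}\int_{2\pi} \delta\!\left(\Omega - m\,\frac{2\pi}{L}\right) \sum_{k=1}^{K} a_k \left( W\!\left(\Omega + 2\pi\frac{f_k}{f_s}\right) e^{-j\varphi_k} + W\!\left(\Omega - 2\pi\frac{f_k}{f_s}\right) e^{j\varphi_k} \right) d\Omega$$
(31) Hence, the sampled spectrum is given by
(32) $$X(m) = \frac{1}{2}\sum_{k=1}^{K} a_k \left( W\!\left(2\pi\left(\frac{m}{L} + \frac{f_k}{f_s}\right)\right) e^{-j\varphi_k} + W\!\left(2\pi\left(\frac{m}{L} - \frac{f_k}{f_s}\right)\right) e^{j\varphi_k} \right)$$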
(33) with m=0 . . . L−1.
(34) Based on this, the observed peaks in the magnitude spectrum of the analysis frame stem from a windowed sinusoidal signal with K sinusoids, where the true sinusoid frequencies are found in the vicinity of the peaks. Thus, the identifying of frequencies of sinusoidal components may further involve identifying frequencies in the vicinity of the peaks of the spectrum related to the used frequency domain transform.
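(35) If m.sub.k is assumed to be a DFT index (grid point) of the observed k.sup.th peak, then the corresponding frequency is
(36) $$\hat{f}_k = \frac{m_k}{L}\, f_s$$
which can be regarded as an approximation of the true sinusoidal frequency f.sub.k. The true sinusoid frequency f.sub.k can be assumed to lie within the interval
(37) $$\left[\left(m_k - \tfrac{1}{2}\right)\frac{f_s}{L},\; \left(m_k + \tfrac{1}{2}\right)\frac{f_s}{L}\right]$$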
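A short worked example may make the grid-point estimate concrete. The sketch below assumes that the frequency of grid point m.sub.k is m.sub.k times the bin spacing f.sub.s/L and that the true frequency lies within half a bin spacing on either side of it; the function name and the numeric values (16 kHz sampling rate, L = 512, peak at index 14) are illustrative assumptions.

```python
def grid_frequency_estimate(m_k, L, fs):
    """Return the grid-point frequency and the half-bin interval around it, in Hz."""
    spacing = fs / L  # frequency spacing of the DFT grid
    return m_k * spacing, ((m_k - 0.5) * spacing, (m_k + 0.5) * spacing)


# A 440 Hz tone analysed with L = 512 at fs = 16 kHz peaks at grid point 14.
f_hat, (f_low, f_high) = grid_frequency_estimate(m_k=14, L=512, fs=16000.0)
# f_hat is 437.5 Hz; the interval is 421.875 Hz to 453.125 Hz.
```

With a bin spacing of 31.25 Hz the grid estimate can be off by up to 15.625 Hz, which motivates the finer search described next.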
(38) For clarity it is noted that the convolution of the spectrum of the window function with the line spectrum of the sinusoidal model signal can be understood as a superposition of frequency-shifted versions of the window function spectrum, whereby the shift frequencies are the frequencies of the sinusoids. This superposition is then sampled at the DFT grid points. The convolution of the spectrum of the window function with the line spectrum of the sinusoidal model signal is illustrated in the
(39) Based on the above discussion, and based on the illustration in
(40) Thus, the identifying of frequencies of sinusoidal components is preferably performed with higher resolution than the frequency resolution of the used frequency domain transform, and the identifying may further involve interpolation.
(41) One exemplary preferred way to find a better approximation of the frequencies f.sub.k of the sinusoids is to apply parabolic interpolation. One approach is to fit parabolas through the grid points of the DFT magnitude spectrum that surround the peaks and to calculate the respective frequencies belonging to the parabola maxima; an exemplary suitable choice for the order of the parabolas is 2. In more detail, the following procedure may be applied:
1) Identifying the peaks of the DFT of the windowed analysis frame. The peak search will deliver the number of peaks K and the corresponding DFT indexes of the peaks. The peak search can typically be made on the DFT magnitude spectrum or the logarithmic DFT magnitude spectrum.
2) For each peak k (with k=1 . . . K) with corresponding DFT index m.sub.k, fitting a parabola through the three points {P.sub.1; P.sub.2; P.sub.3}={(m.sub.k−1, log(|X(m.sub.k−1)|)); (m.sub.k, log(|X(m.sub.k)|)); (m.sub.k+1, log(|X(m.sub.k+1)|))}. This results in parabola coefficients b.sub.k(0), b.sub.k(1), b.sub.k(2) of the parabola defined by
(42) $$p_k(q) = b_k(0) + b_k(1)\, q + b_k(2)\, q^2$$
3) For each of the K parabolas, calculating the frequency index at which the parabola has its maximum, and using the frequency belonging to that index as the approximation of the sinusoid frequency f.sub.k.
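The peak search and parabolic interpolation can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: the Hann window, the relative peak threshold of 10 %, the function names, and the use of a local abscissa q measured from m.sub.k (rather than the absolute DFT index) are all choices made here for brevity.

```python
import cmath
import math


def half_spectrum_magnitudes(x):
    """Magnitudes |X(m)|, m = 0..L/2, of the Hann-windowed frame x (naive DFT)."""
    L = len(x)
    w = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / L) for n in range(L)]
    return [abs(sum(w[n] * x[n] * cmath.exp(-2j * math.pi * m * n / L)
                    for n in range(L)))
            for m in range(L // 2 + 1)]


def find_peaks(mag, relative_floor=0.1):
    """Step 1: indices of local maxima above a fraction of the largest magnitude."""
    floor_level = relative_floor * max(mag)
    return [m for m in range(1, len(mag) - 1)
            if mag[m] > floor_level and mag[m - 1] < mag[m] >= mag[m + 1]]


def interpolate_peak(mag, m_k, L, fs):
    """Step 2 and the maximum search: parabola through three log-magnitude points."""
    y1, y2, y3 = (math.log(mag[i] + 1e-12) for i in (m_k - 1, m_k, m_k + 1))
    b1 = 0.5 * (y3 - y1)        # linear coefficient, with q measured from m_k
    b2 = 0.5 * (y1 + y3) - y2   # quadratic coefficient (negative at a maximum)
    q_max = 0.0 if b2 == 0.0 else -b1 / (2.0 * b2)
    return (m_k + q_max) * fs / L  # interpolated frequency in Hz


fs, L = 8000.0, 160  # a 20 ms analysis frame; DFT bin spacing is 50 Hz
frame = [math.cos(2.0 * math.pi * 440.0 / fs * n) for n in range(L)]
mag = half_spectrum_magnitudes(frame)
peaks = find_peaks(mag)
estimates = [interpolate_peak(mag, m, L, fs) for m in peaks]
```

In this example the 440 Hz tone falls between grid points 8 and 9; the peak search returns index 9 (450 Hz), and the interpolated estimate lies considerably closer to 440 Hz than the grid-point value.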
(43) Applying a Sinusoidal Model
(44) The application of a sinusoidal model in order to perform a frame loss concealment operation according to embodiments may be described as follows:
(45) In case a given segment of the coded signal cannot be reconstructed by the decoder since the corresponding encoded information is not available, i.e. since a frame has been lost, an available part of the signal prior to this segment may be used as prototype frame. If y(n) with n=0 . . . N−1 is the unavailable segment for which a substitution frame z(n) has to be generated, and y(n) with n<0 is the available previously decoded signal, a prototype frame of the available signal of length L and start index n.sub.−1 is extracted with a window function w(n) and transformed into frequency domain, e.g. by means of DFT:
(46) $$Y_{-1}(m) = \sum_{n=0}^{L-1} y(n - n_{-1})\, w(n)\, e^{-j\frac{2\pi}{L} n m}$$
(47) The window function can be one of the window functions described above in the sinusoidal analysis. Preferably, in order to save numerical complexity, the frequency domain transformed frame should be identical with the one used during sinusoidal analysis.
(48) In a next step the sinusoidal model assumption is applied. According to the sinusoidal model assumption, the DFT of the prototype frame can be written as follows:
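(49) $$Y_{-1}(m) = \frac{1}{2}\sum_{k=1}^{K} a_k \left( W\!\left(2\pi\left(\frac{m}{L} + \frac{f_k}{f_s}\right)\right) e^{-j\varphi_k} + W\!\left(2\pi\left(\frac{m}{L} - \frac{f_k}{f_s}\right)\right) e^{j\varphi_k} \right)$$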
This expression was also used in the analysis part and is described in detail above.
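(50) Next, it is realized that the spectrum of the used window function has only a significant contribution in a frequency range close to zero. Under the approximation that the frequency-shifted window spectra in the above expression do not overlap, each DFT index receives a contribution from at most one of them, and the expression reduces to
(51) $$\hat{Y}_{-1}(m) = \frac{a_k}{2}\, W\!\left(2\pi\left(\frac{m}{L} - \frac{f_k}{f_s}\right)\right) e^{j\varphi_k}$$
for non-negative m∈M.sub.k and for each k. Herein, M.sub.k denotes the integer interval
(52) $$M_k = \left[\mathrm{round}\!\left(\frac{f_k}{f_s}\, L\right) - m_{\min,k},\; \mathrm{round}\!\left(\frac{f_k}{f_s}\, L\right) + m_{\max,k}\right]$$
where m.sub.min,k and m.sub.max,k fulfill the above explained constraint such that the intervals are not overlapping. A suitable choice for m.sub.min,k and m.sub.max,k is to set them to a small integer value δ, e.g. δ=3. If however the DFT indices related to two neighboring sinusoidal frequencies f.sub.k and f.sub.k+1 differ by less than 2δ, then δ is set to
(53) $$\mathrm{floor}\!\left(\frac{\mathrm{round}\!\left(\frac{f_{k+1}}{f_s}\, L\right) - \mathrm{round}\!\left(\frac{f_k}{f_s}\, L\right)}{2}\right)$$
such that it is ensured that the intervals are not overlapping. The function floor(⋅) returns the largest integer that is smaller than or equal to its argument.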
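The construction of the intervals M.sub.k can be sketched in Python as shown below. Note that this sketch deliberately deviates from the floor rule stated above: it shrinks neighboring intervals by floor((d − 1)/2), where d is the distance between the two peak indices, so that the integer intervals never share a DFT index even when d is even. The function name, the clamping to the non-negative half of the spectrum, and the example frequencies are assumptions made for illustration; frequencies are assumed to map to distinct DFT indices.

```python
def sinusoid_intervals(freqs_hz, fs, L, delta=3):
    """Return one inclusive index interval (lo, hi) per sinusoid, mutually disjoint."""
    centers = [round(f / fs * L) for f in sorted(freqs_hz)]
    lo = [c - delta for c in centers]
    hi = [c + delta for c in centers]
    for k in range(len(centers) - 1):
        gap = centers[k + 1] - centers[k]
        if gap <= 2 * delta:
            # Shrink both neighbours so that they cannot touch or overlap.
            half = (gap - 1) // 2
            hi[k] = min(hi[k], centers[k] + half)
            lo[k + 1] = max(lo[k + 1], centers[k + 1] - half)
    # Keep only non-negative indices up to the Nyquist bin.
    return [(max(l, 0), min(h, L // 2)) for l, h in zip(lo, hi)]


# Sinusoids at 440, 1000 and 1100 Hz, fs = 16 kHz, L = 512: peak bins 14, 32, 35.
intervals = sinusoid_intervals([440.0, 1000.0, 1100.0], fs=16000.0, L=512)
# intervals == [(11, 17), (29, 33), (34, 38)]
```

The first interval keeps the full width of 2δ + 1 bins, while the two closely spaced sinusoids at bins 32 and 35 receive narrower, asymmetric intervals.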
(54) The next step according to embodiments is to apply the sinusoidal model according to the above expression and to evolve its K sinusoids in time. The assumption that the time indices of the erased segment differ from the time indices of the prototype frame by n.sub.−1 samples means that the phases of the sinusoids advance by
(55) $$\theta_k = 2\pi\, \frac{f_k}{f_s}\, n_{-1}$$
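As a small worked example, the phase advance of a sinusoid can be computed as below, assuming it equals 2π times the frequency (normalized by the sampling frequency) times the time shift in samples, consistent with the sinusoidal model. The function name `phase_advance` and the numeric values are illustrative assumptions.

```python
import math


def phase_advance(f_k, fs, n_shift):
    """Phase advance in radians (not wrapped) of a sinusoid f_k over n_shift samples."""
    return 2.0 * math.pi * f_k / fs * n_shift


# A 1 kHz sinusoid sampled at 16 kHz has a period of 16 samples, so a shift
# of 16 samples advances its phase by one full turn.
theta_one_period = phase_advance(f_k=1000.0, fs=16000.0, n_shift=16)

# Over a 20 ms shift (320 samples) the same sinusoid advances by 20 full turns.
theta_20ms = phase_advance(f_k=1000.0, fs=16000.0, n_shift=320)
```

Because only the phase modulo 2π matters, small errors in the estimated frequency translate into phase errors that grow linearly with the time shift, which is why an accurate frequency estimate is important.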
(56) Hence, the DFT spectrum of the evolved sinusoidal model is given by:
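(57) $$Y_0(m) = \frac{1}{2}\sum_{k=1}^{K} a_k \left( W\!\left(2\pi\left(\frac{m}{L} + \frac{f_k}{f_s}\right)\right) e^{-j(\varphi_k + \theta_k)} + W\!\left(2\pi\left(\frac{m}{L} - \frac{f_k}{f_s}\right)\right) e^{j(\varphi_k + \theta_k)} \right)$$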
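(58) Applying again the approximation according to which the shifted window function spectra do not overlap gives:
(59) $$\hat{Y}_0(m) = \frac{a_k}{2}\, W\!\left(2\pi\left(\frac{m}{L} - \frac{f_k}{f_s}\right)\right) e^{j(\varphi_k + \theta_k)}$$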
for non-negative m∈M.sub.k and for each k.
(60) Comparing the DFT of the prototype frame Y.sub.−1(m) with the DFT of evolved sinusoidal model Y.sub.0(m) by using the approximation, it is found that the magnitude spectrum remains unchanged while the phase is shifted by
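(61) $$\theta_k = 2\pi\, \frac{f_k}{f_s}\, n_{-1}$$
for each m∈M.sub.k. Hence, the substitution frame can be calculated by the following expression:
$$z(n) = \mathrm{IDFT}\{Z(m)\} \quad \text{with} \quad Z(m) = Y(m)\, e^{j\theta_k}$$ for non-negative m∈M.sub.k and for each k.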
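The phase shift of the spectral coefficients and the inverse transform can be sketched as follows. This is a minimal Python illustration under assumptions: naive DFT and inverse DFT routines are used, no window is applied (so that the result can be checked exactly for an on-grid sinusoid), and the negative-frequency bins are set to the complex conjugates of the shifted positive-frequency bins so that the output is real. The DC and Nyquist bins are left untouched. All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    L = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * m * n / L) for n in range(L))
            for m in range(L)]


def idft_real(Z):
    L = len(Z)
    return [sum(Z[m] * cmath.exp(2j * math.pi * m * n / L) for m in range(L)).real / L
            for n in range(L)]


def evolve_spectrum(Y, intervals, thetas):
    """Rotate every bin in each interval M_k by theta_k; magnitudes are retained."""
    L = len(Y)
    last_positive = (L - 1) // 2  # highest bin strictly below the Nyquist bin
    Z = list(Y)
    for (lo, hi), theta in zip(intervals, thetas):
        rotation = cmath.exp(1j * theta)
        for m in range(max(lo, 1), min(hi, last_positive) + 1):
            Z[m] = Y[m] * rotation
            Z[L - m] = Z[m].conjugate()  # mirror bin keeps the output real
    return Z


# A sinusoid on grid point 4 of a length-32 frame, evolved by 5 samples.
L, bin_index, n_shift = 32, 4, 5
y_prev = [math.cos(2.0 * math.pi * bin_index * n / L + 0.3) for n in range(L)]
theta = 2.0 * math.pi * bin_index / L * n_shift
z = idft_real(evolve_spectrum(dft(y_prev), [(3, 5)], [theta]))
```

For this idealized input, `z` coincides with the original sinusoid continued 5 samples later, which is the behaviour the substitution frame is meant to approximate for real signals.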
(62) A specific embodiment addresses phase randomization for DFT indices not belonging to any interval M.sub.k. As described above, the intervals M.sub.k, k=1 . . . K have to be set such that they are strictly non-overlapping, which is done using some parameter δ which controls the size of the intervals. It may happen that δ is small in relation to the frequency distance of two neighboring sinusoids. In that case there is a gap between the two intervals. Consequently, for the corresponding DFT indices m no phase shift according to the above expression $Z(m)=Y(m)\,e^{j\theta_k}$ is defined. According to this embodiment, the phases of the spectral coefficients at these indices are randomized.
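Phase randomization for the indices outside all intervals can be sketched as below. The sketch assumes a uniformly distributed random phase rotation and, as an additional assumption, mirrors each randomized bin onto its negative-frequency counterpart so that the inverse transform stays real; the function name and the toy spectrum are hypothetical.

```python
import cmath
import math
import random


def randomize_uncovered_phases(Y, intervals, rng):
    """Rotate bins outside all intervals by a random phase; magnitudes are kept."""
    L = len(Y)
    covered = set()
    for lo, hi in intervals:
        covered.update(range(lo, hi + 1))
    Z = list(Y)
    for m in range(1, (L + 1) // 2):  # positive-frequency bins, no DC or Nyquist
        if m not in covered:
            Z[m] = Y[m] * cmath.exp(2j * math.pi * rng.random())
            Z[L - m] = Z[m].conjugate()
    return Z


# Toy conjugate-symmetric spectrum of length 8; only bin 2 belongs to an interval.
Y = [1 + 0j, 2 + 1j, 3 - 1j, 1 + 2j, 4 + 0j, 1 - 2j, 3 + 1j, 2 - 1j]
Z = randomize_uncovered_phases(Y, intervals=[(2, 2)], rng=random.Random(1))
```

Bins 1 and 3 receive new phases with unchanged magnitudes, while bin 2 and its mirror bin are left for the deterministic phase shift.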
(63) Based on the above,
(64) In step 81, a sinusoidal analysis of a part of a previously received or reconstructed audio signal is performed, wherein the sinusoidal analysis involves identifying frequencies of sinusoidal components, i.e. sinusoids, of the audio signal. Next, in step 82, a sinusoidal model is applied on a segment of the previously received or reconstructed audio signal, wherein said segment is used as a prototype frame in order to create a substitution frame for a lost audio frame, and in step 83 the substitution frame for the lost audio frame is created, involving time-evolution of sinusoidal components, i.e. sinusoids, of the prototype frame, up to the time instance of the lost audio frame, in response to the corresponding identified frequencies.
(65) According to a further embodiment, it is assumed that the audio signal is composed of a limited number of individual sinusoidal components, and that the sinusoidal analysis is performed in the frequency domain. Further, the identifying of frequencies of sinusoidal components may involve identifying frequencies in the vicinity of the peaks of a spectrum related to the used frequency domain transform.
(66) According to an exemplary embodiment, the identifying of frequencies of sinusoidal components is performed with higher resolution than the resolution of the used frequency domain transform, and the identifying may further involve interpolation, e.g. of parabolic type.
(67) According to an exemplary embodiment, the method comprises extracting a prototype frame from an available previously received or reconstructed signal using a window function, and wherein the extracted prototype frame may be transformed into a frequency domain.
(68) A further embodiment involves an approximation of a spectrum of the window function, such that the spectrum of the substitution frame is composed of strictly non-overlapping portions of the approximated window function spectrum.
(69) According to a further exemplary embodiment, the method comprises time-evolving sinusoidal components of a frequency spectrum of a prototype frame by advancing the phase of the sinusoidal components, in response to the frequency of each sinusoidal component and in response to the time difference between the lost audio frame and the prototype frame, and changing a spectral coefficient of the prototype frame included in an interval M.sub.k in the vicinity of a sinusoid k by a phase shift proportional to the sinusoidal frequency f.sub.k and to the time difference between the lost audio frame and the prototype frame.
(70) A further embodiment comprises changing the phase of a spectral coefficient of the prototype frame not belonging to an identified sinusoid by a random phase, or changing the phase of a spectral coefficient of the prototype frame not included in any of the intervals related to the vicinity of the identified sinusoid by a random value.
(71) An embodiment further involves an inverse frequency domain transform of the frequency spectrum of the prototype frame.
(72) More specifically, the audio frame loss concealment method according to a further embodiment may involve the following steps:
1) Analyzing a segment of the available, previously synthesized signal to obtain the constituent sinusoidal frequencies f.sub.k of a sinusoidal model.
2) Extracting a prototype frame y.sub.−1 from the available previously synthesized signal and calculating the DFT of that frame.
3) Calculating the phase shift θ.sub.k for each sinusoid k in response to the sinusoidal frequency f.sub.k and the time advance n.sub.−1 between the prototype frame and the substitution frame.
4) For each sinusoid k, advancing the phase of the prototype frame DFT by θ.sub.k selectively for the DFT indices related to a vicinity around the sinusoid frequency f.sub.k.
5) Calculating the inverse DFT of the spectrum obtained in step 4).
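The five steps above can be combined into one compact Python sketch. It is an illustration under several assumptions rather than a definitive implementation: the same Hann-windowed frame is used for analysis and as prototype frame, a naive DFT replaces an FFT, peaks below 10 % of the largest magnitude are ignored, neighboring intervals are shrunk by floor((d − 1)/2) to keep them disjoint, bins outside all intervals get a random phase, and the returned frame is still weighted by the analysis window (any subsequent overlap-add or window compensation is outside the scope of this sketch). The function name `conceal_frame` is hypothetical.

```python
import cmath
import math
import random


def conceal_frame(prototype, fs, n_shift, delta=3, rng=None):
    """Return a substitution frame (still weighted by the Hann analysis window)."""
    rng = rng or random.Random(0)
    L = len(prototype)
    half = (L + 1) // 2  # positive-frequency bins are 1 .. half-1

    # Steps 1-2: Hann-windowed DFT of the prototype frame, reused for the analysis.
    w = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / L) for n in range(L)]
    Y = [sum(w[n] * prototype[n] * cmath.exp(-2j * math.pi * m * n / L)
             for n in range(L)) for m in range(L)]
    mag = [abs(c) for c in Y]

    # Step 1 (continued): peak search and parabolic interpolation on log magnitudes.
    floor_level = 0.1 * max(mag[1:half])
    freqs = []
    for m in range(2, half - 1):
        if mag[m] > floor_level and mag[m - 1] < mag[m] >= mag[m + 1]:
            y1, y2, y3 = (math.log(mag[i] + 1e-12) for i in (m - 1, m, m + 1))
            curvature = 0.5 * (y1 + y3) - y2
            q = 0.0 if curvature == 0.0 else -0.25 * (y3 - y1) / curvature
            freqs.append((m + q) * fs / L)

    # Steps 3-4: one phase shift theta_k per sinusoid, applied within its interval.
    centers = [round(f / fs * L) for f in freqs]
    shift = [None] * half
    for k, f in enumerate(freqs):
        theta = 2.0 * math.pi * f / fs * n_shift
        lo, hi = centers[k] - delta, centers[k] + delta
        if k > 0:
            lo = max(lo, centers[k] - (centers[k] - centers[k - 1] - 1) // 2)
        if k + 1 < len(freqs):
            hi = min(hi, centers[k] + (centers[k + 1] - centers[k] - 1) // 2)
        for m in range(max(lo, 1), min(hi, half - 1) + 1):
            shift[m] = theta

    Z = list(Y)
    for m in range(1, half):
        # Bins outside every interval get a random phase instead.
        phase = shift[m] if shift[m] is not None else 2.0 * math.pi * rng.random()
        Z[m] = Y[m] * cmath.exp(1j * phase)
        Z[L - m] = Z[m].conjugate()

    # Step 5: inverse DFT of the modified spectrum.
    return [sum(Z[m] * cmath.exp(2j * math.pi * m * n / L) for m in range(L)).real / L
            for n in range(L)]
```

A possible usage, assuming a 440 Hz tone sampled at 8 kHz and a prototype frame of 160 samples that is evolved by one frame length:

```python
fs, L = 8000.0, 160
tone = lambda n: math.cos(2.0 * math.pi * 440.0 / fs * n + 0.3)
substitute = conceal_frame([tone(n) for n in range(L)], fs, n_shift=L)
```

The result approximates the windowed continuation of the tone; the remaining deviation stems mainly from the residual error of the interpolated frequency estimate.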
(73) The embodiments described above may be further explained by the following assumptions:
a) The assumption that the signal can be represented by a limited number of sinusoids.
b) The assumption that the substitution frame is sufficiently well represented by these sinusoids evolved in time, in comparison to some earlier time instant.
c) The assumption of an approximation of the spectrum of a window function such that the spectrum of the substitution frame can be built up by non-overlapping portions of frequency shifted window function spectra, the shift frequencies being the sinusoid frequencies.
(75) According to a further embodiment of the decoder, the applied sinusoidal model assumes that the audio signal is composed of a limited number of individual sinusoidal components, and the identifying of frequencies of sinusoidal components of the audio signal may further comprise a parabolic interpolation.
(76) According to a further embodiment, the decoder is configured to extract a prototype frame from an available previously received or reconstructed signal using a window function, and to transform the extracted prototype frame into a frequency domain.
(77) According to a still further embodiment, the decoder is configured to time-evolve sinusoidal components of a frequency spectrum of a prototype frame by advancing the phase of the sinusoidal components, in response to the frequency of each sinusoidal component and in response to the time difference between the lost audio frame and the prototype frame, and to create the substitution frame by performing an inverse frequency transform of the frequency spectrum.
(78) A decoder according to an alternative embodiment is illustrated in
(79) The units and means included in the decoder illustrated in the figures may be implemented at least partly in hardware, and there are numerous variants of circuitry elements that can be used and combined to achieve the functions of the units of the decoder. Such variants are encompassed by the embodiments. A particular example of hardware implementation of the decoder is implementation in digital signal processor (DSP) hardware and integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.
(80) A computer program according to embodiments of the present invention comprises instructions which, when run by a processor, cause the processor to perform a method described in connection with
(81) A decoder according to embodiments of this invention may be used e.g. in a receiver for a mobile device, e.g. a mobile phone or a laptop, or in a receiver for a stationary device, e.g. a personal computer.
(82) An advantage of the embodiments described herein is a frame loss concealment method that mitigates the audible impact of frame loss in the transmission of audio signals, e.g. of coded speech. A general advantage is a smooth and faithful evolution of the reconstructed signal for a lost frame, wherein the audible impact of frame losses is greatly reduced in comparison to conventional techniques.
(83) It is to be understood that the choice of interacting units or modules, as well as the naming of the units, is for exemplary purposes only, and that the units or modules may be configured in a plurality of alternative ways in order to be able to execute the disclosed process actions. It should also be noted that the units or modules described in this disclosure are to be regarded as logical entities and not necessarily as separate physical entities. It will be appreciated that the scope of the technology disclosed herein fully encompasses other embodiments which may become obvious to those skilled in the art, and that the scope of this disclosure is accordingly not to be limited.