Concept for coding mode switching compensation
11600283 · 2023-03-07
Inventors
- Martin Dietz (Nuremberg, DE)
- Eleni Fotopoulou (Nuremberg, DE)
- Jérémie Lecomte (Fuerth, DE)
- Markus Multrus (Nuremberg, DE)
- Benjamin Schubert (Nuremberg, DE)
Abstract
A codec allowing for switching between different coding modes is improved by, responsive to a switching instance, performing temporal smoothing and/or blending at a respective transition.
Claims
1. Decoder supporting, and being switchable between, at least two audio coding modes so as to decode an information signal, wherein the decoder is configured to, responsive to a switching instance, perform temporal smoothing and/or blending at a transition between a first temporal portion of the information signal, preceding the switching instance, and a second temporal portion of the information signal, succeeding the switching instance, in a manner confined to a high-frequency spectral band, wherein the decoder is configured to perform the temporal smoothing and/or blending at the transition by, within a temporary phase which directly follows the transition, crosses the transition or precedes the transition, decreasing an information signal's energy during the temporary portion where the information signal is coded using a first audio coding mode and/or increasing the information signal's energy during the temporary portion (80) where the information signal is coded using a second audio coding mode, wherein the first audio coding mode has a higher energy preserving property compared to the second audio coding mode.
2. Decoder according to claim 1, wherein the first audio coding mode is a full-bandwidth audio coding mode and the second audio coding mode is a bandwidth extension (BWE) or sub-bandwidth audio coding mode, or the first audio coding mode is a guided BWE audio coding mode and the second audio coding mode is a blind BWE audio coding mode, or the first and second audio coding modes are full-bandwidth audio coding modes with different signal-energy-preserving properties.
3. Decoder according to claim 1, wherein the high-frequency spectral band overlaps with the effective coded bandwidth of the first and second audio coding modes.
4. Decoder according to claim 1, wherein the high-frequency spectral band overlaps with a spectral BWE extension portion of the second audio coding mode, and a transform spectrum portion or linear-predictively coded spectral portion of the first audio coding mode.
5. Decoder according to claim 1, wherein the decoder is configured to perform the temporal smoothing and/or blending additionally depending on an analysis of the information signal in an analysis spectral band arranged spectrally below the high-frequency spectral band.
6. Decoder according to claim 5, wherein the decoder is configured to determine a measure for an information signal's energy fluctuation in the analysis spectral band and suppress, or set a degree of the temporal smoothing and/or blending dependent on the measure.
7. Decoder according to claim 6, wherein the decoder is configured to compute the measure as the maximum of a first absolute difference between the information signal's energies in the analysis spectral band between temporal portions lying at opposite temporal sides of the transition and a second absolute difference between the information signal's energies in the analysis spectral band between consecutive temporal portions, both succeeding the transition.
8. Decoder according to claim 5, wherein the analysis spectral band abuts the high-frequency spectral band at a lower spectral side of the high-frequency spectral band.
9. Decoder supporting, and being switchable between, at least two audio coding modes so as to decode an information signal, wherein the decoder is configured to, responsive to a switching instance, perform temporal smoothing and/or blending at a transition between a first temporal portion of the information signal, preceding the switching instance, and a second temporal portion of the information signal, succeeding the switching instance, in a manner confined to a high-frequency spectral band, wherein the decoder is configured to scale the information signal's energy in the high-frequency spectral band in the second temporal portion with a scaling factor which varies between 1 and
10. The decoder supporting, and being switchable between, at least two audio coding modes so as to decode an information signal, wherein the decoder is configured to, responsive to a switching instance, perform temporal smoothing and/or blending at a transition between a first temporal portion of the information signal, preceding the switching instance, and a second temporal portion of the information signal, succeeding the switching instance, in a manner confined to a high-frequency spectral band, wherein the decoder is configured to perform the switching and/or blending by applying blind BWE onto one of the first and second temporal portions, decoded using a second audio coding mode having an effective coded bandwidth smaller than an effective coded bandwidth of a first audio coding mode using which the other one of the first and second temporal portions is decoded, so as to spectrally extend the effective coded bandwidth of the one of the first and second temporal portions into the high-frequency spectral band and temporally shape the information signal's energy in the high-frequency spectral band in the one of the first and second temporal portions, as spectrally extended, according to a fade-in/out scaling function decreasing from the transition towards farther away from the transition.
11. Decoder supporting, and being switchable between, at least two audio coding modes so as to decode an information signal, wherein the decoder is configured to, responsive to a switching instance, perform temporal smoothing and/or blending at a transition between a first temporal portion of the information signal, preceding the switching instance, and a second temporal portion of the information signal, succeeding the switching instance, in a manner confined to a high-frequency spectral band, wherein the switching switches from a first audio coding mode to a second audio coding mode with the first audio coding mode having an effective coded bandwidth greater than an effective coded bandwidth of the second audio coding mode, wherein the decoder is configured to spectrally extend, using blind BWE, the effective coded bandwidth of the second temporal portion into the high-frequency spectral band and temporally shape the information signal's energy in the high-frequency spectral band in the second temporal portion, as spectrally extended using the blind BWE, according to a fade-out scaling function decreasing from the transition towards farther away from the transition.
12. Decoder supporting, and being switchable between, at least two audio coding modes so as to decode an information signal, wherein the decoder is configured to, responsive to a switching instance, perform temporal smoothing and/or blending at a transition between a first temporal portion of the information signal, preceding the switching instance, and a second temporal portion of the information signal, succeeding the switching instance, in a manner confined to a high-frequency spectral band, wherein the switching switches from a second audio coding mode to a first audio coding mode wherein an effective coded bandwidth of the second audio coding mode is smaller than an effective coded bandwidth of the first audio coding mode, wherein the decoder is configured to temporally shape an information signal's energy in the high-frequency spectral band in the second temporal portion according to a fade-in scaling function increasing from the transition towards farther away from the transition.
13. Decoder supporting, and being switchable between, at least two audio coding modes so as to decode an information signal, wherein the decoder is configured to, responsive to a switching instance, perform temporal smoothing and/or blending at a transition between a first temporal portion of the information signal, preceding the switching instance, and a second temporal portion of the information signal, succeeding the switching instance, in a manner confined to a high-frequency spectral band, wherein the decoder is configured to perform the temporal smoothing and/or blending at the switching instance by applying a fade-in or fade-out scaling function and to, if a subsequent switching instance occurs during the fade-in or fade-out scaling function, apply, again, a fade-in or fade-out scaling function to a high-frequency spectral band so as to perform temporal smoothing and/or blending at the subsequent switching instance, with setting a starting point of applying the fade-in or fade-out scaling function from the subsequent switching instance on such that the fade-in or fade-out scaling function applied at the subsequent switching instance is, at the starting point, a function value nearest to a function value assumed by the fade-in or fade-out scaling function when being applied at the switching instance, at the time of occurrence of the subsequent switching instance.
14. Method for decoding supporting, and being switchable between, at least two audio coding modes so as to decode an information signal, wherein the method comprises, responsive to a switching instance, performing temporal smoothing and/or blending at a transition between a first temporal portion of the information signal, preceding the switching instance, and a second temporal portion of the information signal, succeeding the switching instance, in a manner confined to a high-frequency spectral band, wherein the temporal smoothing and/or blending at the transition is performed by, within the temporary phase which directly follows the transition, crosses the transition or precedes the transition, decreasing an information signal's energy during the temporary portion where the information signal is coded using a first audio coding mode and/or increasing the information signal's energy during the temporary portion (80) where the information signal is coded using a second audio coding mode, wherein the first audio coding mode has a higher energy preserving property compared to the second audio coding mode.
15. An encoder supporting, and being switchable between, at least two modes of different signal-energy-conservation property in a high-frequency spectral band, so as to encode an information signal, wherein the encoder is configured to, responsive to a switching instance, process the information signal by temporally smoothing and/or blending the information signal at a transition between a first temporal portion of the information signal, preceding the switching instance, and a second temporal portion of the information signal, succeeding the switching instance, in a manner confined to a high-frequency spectral band to obtain a pre-processed version of the information signal, and encode the pre-processed version of the information signal, wherein the encoder is configured to, responsive to a switching instance from a first coding mode comprising a first signal-energy-conservation property in the high-frequency spectral band to a second coding mode comprising a second signal-energy-conservation property in the high-frequency spectral band, temporarily encode a modified version of the information signal which is modified compared to the information signal in that an information signal's energy in the high-frequency spectral band in a temporal portion succeeding the switching instance is temporally shaped according to a fade-in scaling function increasing from the transition towards farther away from the transition.
16. A non-transitory computer-readable storage medium storing a computer program comprising a program code for performing, when running on a computer, a method according to claim 15.
17. A method for encoder supporting, and being switchable between, at least two modes of different signal-energy-conservation property in a high-frequency spectral band, so as to encode an information signal, wherein the method comprises, responsive to a switching instance, processing by temporally smoothing the information signal and/or blending at a transition between a first temporal portion of the information signal, preceding the switching instance, and a second temporal portion of the information signal, succeeding the switching instance, in a manner confined to a high-frequency spectral band to obtain a pre-processed version of the information signal, and encoding the pre-processed version of the information signal, wherein the method comprises, responsive to a switching instance from a first coding mode comprising a first signal-energy-conservation property in the high-frequency spectral band to a second coding mode comprising a second signal-energy-conservation property in the high-frequency spectral band, temporarily encoding a modified version of the information signal which is modified compared to the information signal in that an information signal's energy in the high-frequency spectral band in a temporal portion succeeding the switching instance is temporally shaped according to a fade-in scaling function increasing from the transition towards farther away from the transition.
18. A non-transitory computer-readable storage medium storing a computer program comprising a program code for performing, when running on a computer, a method according to claim 17.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present application are described further below with respect to the figures, among which
DETAILED DESCRIPTION OF THE INVENTION
(18) Before describing embodiments of the present application further below, reference is briefly made again to
(19) In particular, as shown by use of the grey scale representation of
(20) The two BWE coding modes exemplarily illustrated in
(21) According to blind bandwidth extension, for example, a decoder estimates, in accordance with the blind BWE coding mode, the bandwidth extension portion from f.sub.stop,Core1 to f.sub.stop,BWE1 from the core coding portion extending from 0 to f.sub.stop,Core1, without any side information contained in the data stream in addition to the coding of the core coding portion of the audio signal spectrum. Owing to the non-guided way in which the spectrum above the core coding stop frequency f.sub.stop,Core1 is estimated from the spectrum coded up to that frequency, the width of the bandwidth extension portion of blind BWE is usually, but not necessarily, smaller than the width of the bandwidth extension portion of the guided BWE mode, which extends from f.sub.stop,Core1 to f.sub.stop,BWE2. In guided BWE, the audio signal is coded using the core coding mode as far as the spectral core coding portion extending from 0 to f.sub.stop,Core1 is concerned, but additional parametric side information is provided so as to enable the decoding side to estimate the audio signal spectrum beyond the crossover frequency f.sub.stop,Core1 within the bandwidth extension portion extending from f.sub.stop,Core1 to f.sub.stop,BWE2. For example, this parametric side information comprises envelope data describing the audio signal's envelope at a spectrotemporal resolution which is coarser than the spectrotemporal resolution at which, when using transform coding, the audio signal is coded in the core coding portion. For example, the decoder may replicate the spectrum within the core coding portion so as to preliminarily fill the empty portion of the audio signal's spectrum between f.sub.stop,Core1 and f.sub.stop,BWE2, and then shape this pre-filled spectrum using the transmitted envelope data.
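The guided BWE reconstruction just described (replicate the core spectrum into the empty extension portion, then shape it with the transmitted envelope) can be sketched as follows. This is a minimal illustration under stated assumptions, not the codec's actual algorithm: spectra are real-valued magnitude lists, replication is a cyclic copy of the core bins, and the envelope side information is one target energy per band of `band_width` bins. The function name `guided_bwe` and its parameters are hypothetical.

```python
from typing import List


def guided_bwe(core: List[float], n_ext: int,
               envelope: List[float], band_width: int) -> List[float]:
    """Hypothetical guided-BWE sketch.

    core:       decoded magnitudes of the core coding portion (0 .. f_stop,Core1)
    n_ext:      number of bins in the extension portion (f_stop,Core1 .. f_stop,BWE2)
    envelope:   transmitted target energy per extension band (side information)
    band_width: bins per envelope band (coarser than the core resolution)
    """
    if not core:
        raise ValueError("core spectrum must not be empty")
    # Pre-fill the empty extension portion by cyclically replicating core bins.
    ext = [core[i % len(core)] for i in range(n_ext)]
    # Shape each band so that its energy matches the transmitted envelope value.
    for b, target_energy in enumerate(envelope):
        lo, hi = b * band_width, min((b + 1) * band_width, n_ext)
        if lo >= n_ext:
            break
        energy = sum(x * x for x in ext[lo:hi])
        gain = (target_energy / energy) ** 0.5 if energy > 0 else 0.0
        for k in range(lo, hi):
            ext[k] *= gain
    return core + ext
```

For example, `guided_bwe([1.0, 2.0, 3.0, 4.0], 4, [8.0, 2.0], 2)` keeps the four core bins unchanged and produces two extension bands whose energies equal 8.0 and 2.0. In a blind BWE mode, the `envelope` argument would not be available and would have to be estimated from the core alone.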
(23) However, the spectral portions where annoying artifacts may result from switching between different coding modes are not restricted to those spectral portions where one of the coding modes between which a switching instance takes place codes nothing at all, i.e. they are not restricted to spectral portions outside the effective coded bandwidth of one of the coding modes. Rather, as is shown in
(24) The switching scenarios outlined above are merely representative. There are other pairs of coding modes the switching between which causes, or may cause, annoying artifacts. This is true, for example, for switching between blind BWE on the one hand and guided BWE on the other hand, for switching between any of blind BWE, guided BWE and FB coding on the one hand and the mere core coding underlying blind BWE and guided BWE on the other hand, or even for switching between different full-band core coders with unequal energy preserving properties.
(25) The embodiments outlined further below overcome the negative effects resulting from the above outlined circumstances when switching between different coding modes.
(26) Before describing these embodiments, however, it is briefly explained with respect to
(27) The encoder shown in
(28) Accordingly, at the switching instances, problems with respect to perceivable artifacts may occur as they were discussed above with respect to
(29) The embodiments described next concern a decoder configured to reduce the negative effects resulting from switching between coding modes at the encoder side.
(31) With respect to examples for coding modes supported by decoder 50, reference is made to the above description with respect to
(32) It is noted that the units at which the coding mode may change in time within the data stream may be “frames” of constant or even varying length. Wherever the term “frame” occurs in the following, it thus denotes such a unit at which the coding mode varies in the bit stream, i.e. units between which the coding mode may vary and within which it does not vary. For example, for each frame, the data stream 34 may comprise a syntax element revealing the coding mode using which the respective frame is coded. Switching instances may thus be arranged at frame borders separating frames of different coding modes. Sometimes the term “sub-frame” occurs. Sub-frames may represent a temporal partitioning of frames into temporal sub-units in which the audio signal is coded, in accordance with the coding mode associated with the respective frame, using sub-frame-specific coding parameters for that coding mode.
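Because switching instances lie at frame borders, a decoder can locate them from the per-frame mode syntax element alone. A minimal sketch, assuming the decoded mode of each frame is available as a string label; the labels and the function name are hypothetical.

```python
from typing import List, Tuple


def find_switching_instances(frame_modes: List[str]) -> List[Tuple[int, str, str]]:
    """Return (border, mode_before, mode_after) for each frame border at which
    the signaled coding mode changes; border i lies between frame i-1 and frame i."""
    return [
        (i, frame_modes[i - 1], frame_modes[i])
        for i in range(1, len(frame_modes))
        if frame_modes[i] != frame_modes[i - 1]
    ]


# Hypothetical per-frame mode labels as decoded from the data stream.
print(find_switching_instances(["FB", "FB", "BWE", "BWE", "FB"]))
# [(2, 'FB', 'BWE'), (4, 'BWE', 'FB')]
```

Returning the pair of modes, not just the border index, lets the decoder discriminate between different types of switching instances, e.g. by the direction of the switch.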
(35) For example, the first coding mode as well as the second coding mode may be core coding modes having different maximum frequencies f.sub.1 and f.sub.max. Alternatively, one or both of these coding modes may involve bandwidth extension with different effective coded bandwidths, one extending up to f.sub.1 and the other to f.sub.max.
(36) The case of 56 illustrates the possibility of both coding modes having an effective coded bandwidth extending up to f.sub.max, with the energy preserving property of the second coding mode, however, being decreased relative to that of the first coding mode, which is used for the temporal portion preceding the time instance t.sub.A.
(37) The switching instance A, i.e. the fact that the temporal portion 60 immediately preceding the switching instance A is coded using the first coding mode and the temporal portion 62 immediately succeeding the switching instance A is coded using the second coding mode, may be signaled within the data stream 34, or may otherwise be signaled to the decoder 50 such that the switching instances at which decoder 50 changes the coding modes for decoding the audio signal 52 from data stream 34 are synchronized with the switching of the respective coding modes at the encoding side. For example, the decoder 50 may use the frame-wise mode signaling briefly outlined above to recognize switching instances and to identify, or discriminate between, different types of switching instances.
(38) In any case, the decoder of
(39) Similar to 54 and 56, the non-exhaustive set of examples at 68, 70, 72 and 74 shows how decoder 50 achieves the temporal smoothing/blending, by depicting the resulting course of the energy preserving property over time t for an exemplary frequency, indicated with dashed lines in 64, within the high-frequency spectral band 66. Examples 68 and 72 represent possible functionalities of decoder 50 for dealing with the switching instance example shown in 54, while examples 70 and 74 show possible functionalities of decoder 50 for the switching scenario illustrated at 56.
(40) Again, in the switching scenario illustrated at 54, the second coding mode does not reconstruct the audio signal 52 above frequency f.sub.1 at all. In order to perform the temporal smoothing or blending at the transition between the decoded versions of the audio signal 52 before and after the switching instance A, in accordance with the example of 68, the decoder 50 temporarily, for a temporary time period 76 immediately succeeding the switching instance A, performs blind BWE so as to estimate and fill the audio signal's spectrum above frequency f.sub.1 up to f.sub.max. As shown in example 72, the decoder 50 may additionally subject the estimated spectrum within the high-frequency spectral band 66 to a temporal shaping using a fade-out function 78, so that the transition across switching instance A is smoothed even further as far as the energy preserving property within the high-frequency spectral band 66 is concerned.
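Example 72 (temporary blind BWE shaped by a fade-out) might look like this in a simplified per-frame form. All of the following are assumptions for illustration: the blind BWE estimate of band 66 is already available per frame as a list of magnitudes, fade-out function 78 is linear in the energy domain over `n_fade` frames, and frames after the temporary period 76 keep no high-frequency content, matching the narrower bandwidth of the second mode. The function names are hypothetical.

```python
import math
from typing import List


def fade_out_energy_gains(n_fade: int) -> List[float]:
    """Linear fade-out in the energy domain: close to 1 right after the
    switching instance and decreasing farther away from it."""
    return [1.0 - (k + 1) / (n_fade + 1) for k in range(n_fade)]


def blend_after_switch(blind_bwe_hf: List[List[float]], n_fade: int) -> List[List[float]]:
    """Scale blind-BWE magnitudes of the high-frequency band, frame by frame,
    starting with the first frame after the switching instance."""
    gains = fade_out_energy_gains(n_fade)
    out = []
    for k, frame in enumerate(blind_bwe_hf):
        g_energy = gains[k] if k < n_fade else 0.0  # no HF content after the period
        g_amp = math.sqrt(g_energy)                 # energy gain -> magnitude gain
        out.append([g_amp * x for x in frame])
    return out
```

With `n_fade = 3`, the energy gains are 0.75, 0.5 and 0.25; the square root is taken because the fade shapes the energy, while the gain is applied to magnitudes.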
(41) A specific example for the case of example 72 is described further below. It is emphasized that the data stream 34 does not need to signal anything concerning the temporary application of blind BWE. Rather, the decoder 50 itself is configured to respond to the switching instance A by temporarily applying the blind BWE, with or without fade-out.
(42) The extension of the effective coded bandwidth of one of the coding modes adjoining each other across the switching instance beyond its upper bound towards higher frequencies using blind BWE is called temporal blending in the following. As will become clear from the description of
(43) The situation of 56 differs from that of 54 in that, in case of 56, the energy preserving property within the high-frequency spectral band 66 is unequal to 0 for both coding modes adjoining each other across the switching instance A. In the case of 56, the energy preserving property suddenly falls at the switching instance A. In order to compensate for potential negative effects of this sudden reduction in energy preserving property in band 66, decoder 50 of
(44) An example for the alternative illustrated in 70 will be outlined further below. The temporary change of the audio signal's level, i.e. the increase in case of 70 and 74, so as to compensate for the different energy preserving properties with which the audio signal is encoded before and after the respective switching instance A, is called temporal smoothing in the following. In other words, temporal smoothing within the high-frequency spectral band during the temporary period 80 denotes an increase of the audio signal's 52 level/energy, within the temporal portion around the switching instance A where the audio signal is coded using the coding mode having the weaker energy preserving property within that high-frequency spectral band, relative to the level/energy resulting directly from decoding using that coding mode, and/or a decrease of the audio signal's 52 level/energy during the temporary period 80, within the temporal portion around the switching instance A where the audio signal is coded using the coding mode having the higher energy preserving property within the high-frequency spectral band, relative to the energy resulting directly from decoding the audio signal with that coding mode. Moreover, the way the decoder treats switching instances like 56 is not restricted to placing the temporary period 80 so as to directly follow the switching instance A. Rather, the temporary period 80 may cross the switching instance A or may even precede it. In that case, as far as the temporal portion preceding the switching instance A is concerned, the audio signal's 52 energy is decreased during the temporary period 80 in order to render the resulting energy preserving property more similar to the energy preserving property of the coding mode with which the audio signal is coded subsequent to the switching instance A, i.e. so that the resulting energy preserving property within the high-frequency spectral band 66 lies between the energy preserving property of the coding mode before switching instance A and that of the coding mode subsequent to switching instance A.
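The temporal smoothing of example 74, with the temporary period 80 directly following switching instance A, can be sketched as a set of per-frame energy gains. Purely as an assumption, the energy preserving property (EPP) of each mode in band 66 is modelled by a single positive number, and the effective property is ramped linearly from the value before the switch to the value after it; the function name and the linear ramp are illustrative choices, not taken from the text.

```python
from typing import List


def smoothing_energy_gains(epp_before: float, epp_after: float,
                           n_frames: int) -> List[float]:
    """Energy gains for the n_frames directly following a switching instance.

    The effective EPP in the high-frequency band is ramped linearly from
    epp_before towards epp_after; each gain is the ratio of that effective
    EPP to the EPP the new mode delivers on its own.
    """
    if epp_after <= 0.0:
        raise ValueError("epp_after must be positive (a zero-EPP mode calls for blending)")
    gains = []
    for k in range(n_frames):
        t = (k + 1) / (n_frames + 1)  # 0 < t < 1 across the temporary period
        effective = epp_before + (epp_after - epp_before) * t
        gains.append(effective / epp_after)
    return gains
```

When the mode after the switch has the weaker energy preserving property (e.g. 1.0 before, 0.5 after), the gains exceed 1 and decay towards 1 (1.75, 1.5, 1.25 for three frames), i.e. the energy is increased. In the reverse direction the gains fall below 1, which corresponds to the decrease described above for the mode with the higher energy preserving property.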
(45) Before proceeding with the description of the decoder of
(46) In
(47) The decoder of
(48) Among examples 98 to 104, examples 98 and 100 refer to the switching instance type 92, while the others refer to the switching instance type 94. Like graphs 92 and 94, the graphs shown at 98 to 104 show the temporal course of the energy preserving property for an exemplary frequency line inside the high-frequency spectral band 66. However, 92 and 94 show the original energy preserving property as defined by the respective coding modes preceding and succeeding the switching instance B, while the graphs shown at 98 to 104 show the effective energy preserving property, i.e. taking into account the measures performed by decoder 50 responsive to the switching instance as described below.
(49) 98 shows an example where the decoder 50 is configured to perform a temporal blending upon realizing switching instance B: as the energy preserving property of the coding mode valid up to the switching instance B is 0, the decoder 50 preliminarily, for a temporary period 106, decreases the energy/level of the decoded version of the audio signal 52 immediately subsequent to the switching instance B as resulting from decoding using the respective coding mode valid from switching instance B on, so that within that temporary period 106 the effective energy preserving property lies somewhere between the energy preserving property of the coding mode preceding the switching instance B, and the unmodified/original energy preserving property of the coding mode succeeding the switching instance B, as far as the high-frequency spectral band 66 is concerned. The example 68 uses an alternative according to which a fade-in function is used to gradually/continuously increase the factor by which the audio signal's 52 energy is scaled during the temporary time period 106 from the switching instance B to the end of period 106. As explained above, however, with respect to
(50) 100 shows an example for an alternative of decoder's 50 functionality upon realizing switching instance B, which was already discussed with respect to
(51) In case of switching between coding modes like in 94, the energy preserving property within band 66 is unequal to 0 both preceding as well as succeeding the switching instance B. The difference to the case shown at 56 in
(52) For completeness, 104 shows an alternative according to which decoder 50 shifts the temporary period 108 in a temporally upstream direction so as to immediately precede the switching instance B, accordingly increasing the audio signal's 52 energy during that period 108 using a scaling factor so that the resulting energy preserving property lies somewhere between the original/unmodified energy preserving properties of the coding modes between which the switching instance B takes place. Here, too, a fade-in scaling function may be used instead of a constant scaling factor.
(53) Thus, examples 102 and 104 show two examples for performing temporal smoothing responsive to a switching instance B and just as it has been discussed with respect to
(54) After having described
(57) The decoder, in accordance with a mode of
(60) The core coding modes illustrated with respect to
(61) A blind BWE mode would merely rely on the core coding data, and would estimate the audio signal's spectrum above the core coding bandwidth by extrapolating the audio signal's envelope into the higher frequency region above f.sub.core, for example, and by using artificial noise generation and/or spectral replication from the core coding portion to the higher frequency region (bandwidth extension portion) in order to determine the fine structure in that region.
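A blind BWE mode of this kind can be illustrated as follows. No side information is used: the envelope is extrapolated above the core bandwidth with a geometric decay anchored at the last core bin, and the fine structure is artificial noise. The decay law, the seeded noise generator and all names are assumptions made for this example only; spectral replication could be used instead of, or in addition to, the noise.

```python
import random
from typing import List


def blind_bwe(core: List[float], n_ext: int,
              decay_per_bin: float = 0.9, seed: int = 0) -> List[float]:
    """Hypothetical blind-BWE sketch that uses no side information.

    The envelope is extrapolated above f_core by a geometric decay anchored at
    the magnitude of the last core bin; the fine structure is seeded noise.
    """
    if not core:
        raise ValueError("core spectrum must not be empty")
    rng = random.Random(seed)
    level = abs(core[-1])
    ext = []
    for _ in range(n_ext):
        level *= decay_per_bin                      # extrapolated envelope
        ext.append(level * rng.uniform(-1.0, 1.0))  # artificial noise fine structure
    return core + ext
```

Seeding a private `random.Random` instance keeps the output reproducible, which is convenient when comparing decoder outputs frame by frame.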
(62) Back to f.sub.1 and f.sub.max of
(63) For the sake of completeness,
(65) A specific variant of
(66) That is,
(67) With respect to
(68) Scale factor determiner 170 could treat transitions caused by coding mode switchings differently depending on the direction of switching, i.e. from a coding mode with higher energy preserving property to a coding mode with lower energy preserving property, as far as the high-frequency spectral band is concerned, and vice versa, and/or depending on an analysis of the temporal course of the audio signal's energy in an analysis spectral band, as will be outlined in more detail below. By this measure, the scale factor determiner 170 could adapt, over time, the degree of “low-pass filtering” applied to the audio signal's energy within the high-frequency spectral band, so as to avoid unpleasant “smearing”. For example, the scale factor determiner 170 could reduce the degree of low-pass filtering in areas where an evaluation of the audio signal's energy course within the analysis spectral band suggests that the switching instance takes place where a tonal phase of the audio signal's content abuts an attack, or vice versa, so that the low-pass filtering would degrade rather than improve the quality of the audio signal output by the decoder. Likewise, a “cut-off” of energy components in the high-frequency spectral band at the end of an attack in the audio signal's content tends to degrade the audio signal's quality more than a cut-off in the high-frequency spectral band at the beginning of such an attack; accordingly, scale factor determiner 170 may advantageously reduce the degree of low-pass filtering at transitions from a coding mode having lower energy preserving property in the high-frequency spectral band to a coding mode having higher energy preserving property in that spectral band.
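One possible realization of such direction-dependent “low-pass filtering” is a first-order recursive smoother of the per-frame high-frequency-band energy, with a weaker coefficient at transitions towards the mode with higher energy preserving property. In the sketch below, the recursion, the coefficients `a_down`/`a_up`, the direction labels and the restriction of smoothing to the frame at the switching border are all hypothetical; the returned values are energy scale factors of the kind a scale factor determiner could output.

```python
from typing import List, Optional


def hf_scale_factors(hf_energy: List[float], directions: List[Optional[str]],
                     a_down: float = 0.7, a_up: float = 0.3) -> List[float]:
    """Energy scale factors for the high-frequency band, one per frame.

    directions[n] is "down" (switch to a lower-EPP mode at the start of frame n),
    "up" (switch to a higher-EPP mode) or None (no switch). At a switch, the
    energy is low-pass filtered with a first-order recursion; "up" uses the
    smaller coefficient, i.e. weaker smoothing.
    """
    scales = []
    prev_target = None
    for energy, direction in zip(hf_energy, directions):
        if prev_target is None or direction is None:
            target = energy  # no smoothing away from switching borders
        else:
            a = a_down if direction == "down" else a_up
            target = a * prev_target + (1.0 - a) * energy
        # A zero-energy frame cannot be rescaled, so it keeps a factor of 1.
        scales.append(target / energy if energy > 0.0 else 1.0)
        prev_target = target
    return scales
```

For energies 4.0 then 1.0 with a "down" switch, the second frame is scaled by 3.1; for 1.0 then 4.0 with an "up" switch, the factor is 0.775, i.e. the sudden rise is damped less strongly than the fall.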
(69) It is worthwhile to note that in case of
(70) The embodiment described below with respect to
(72) As is visible in
(73) In the embodiment outlined further below with respect to
(75) In the following, specific embodiments are described in a more detailed manner. As described above, the embodiments outlined further below in more detail seek to obtain seamless transitions between different BWEs and a full-band core, using two processing steps which are performed within the decoder.
(76) As outlined above, the processing is applied at the decoder side in the frequency domain, such as the FFT, MDCT or QMF domain, in the form of a post-processing stage. It is further described below that some steps could already be performed within the encoder, such as the application of fade-in blending into the wider effective bandwidth, e.g. that of a full-band core.
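An encoder-side variant, corresponding to the fade-in pre-processing mentioned above, could scale only the high-frequency bins of the frames following the switch before they are encoded. A minimal sketch, assuming magnitude spectra per frame, a high-frequency band given by the bin indices `[k_lo, k_hi)`, and a fade-in that is linear in energy; these choices and the function name are hypothetical.

```python
import math
from typing import List


def fade_in_hf(frames: List[List[float]], switch_frame: int,
               k_lo: int, k_hi: int, n_fade: int) -> List[List[float]]:
    """Encoder-side pre-processing: fade in the high-frequency bins [k_lo, k_hi)
    of the n_fade frames starting at switch_frame; other bins are untouched."""
    out = [list(frame) for frame in frames]  # do not modify the input
    for k in range(n_fade):
        n = switch_frame + k
        if n >= len(out):
            break
        g = math.sqrt((k + 1) / (n_fade + 1))  # linear-in-energy fade-in
        for b in range(k_lo, min(k_hi, len(out[n]))):
            out[n][b] *= g
    return out
```

Confining the gain to `[k_lo, k_hi)` mirrors the requirement that the smoothing be limited to the high-frequency spectral band; the pre-processed frames would then be passed to the regular encoding.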
(77) In particular, with respect to
(78) The purpose of the signal-adaptive smoothing is to obtain seamless transitions by preventing unintended energy jumps. At the same time, energy variations that are present in the original signal need to be preserved. The latter circumstance has also been discussed above with respect to
(79) Hence, in accordance with a signal-adaptive smoothing function at the decoder side described now, the following steps are performed wherein reference is made to
(80) As shown in the flow diagram of
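$$\delta_{\text{intra}} = E_{\text{analysis},2} - E_{\text{analysis},1}$$
$$\delta_{\text{inter}} = E_{\text{analysis},1} - E_{\text{analysis},\text{prev}}$$
$$\delta_{\max} = \max\left(|\delta_{\text{intra}}|,\ |\delta_{\text{inter}}|\right)$$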
(81) That is, the calculation could for example calculate the energy difference between energies of the audio signal as coded into the data stream in the analysis spectral band, once sampled from temporal portions, i.e. subframe 1 and subframe 2 in
(82) Thereafter, at 214, the energy parameters calculated in the evaluation of step 202 are used to determine the smoothing factor $\alpha_{\text{smooth}}$. In accordance with one embodiment, $\alpha_{\text{smooth}}$ is set dependent on the maximum energy difference $\delta_{\max}$, namely such that $\alpha_{\text{smooth}}$ is larger the smaller $\delta_{\max}$ is. $\alpha_{\text{smooth}}$ lies, for example, within the interval $[0, 1]$. While the evaluation in 202 is performed, for example, by evaluator 194 of
(83) The determination of the smoothing factor $\alpha_{\text{smooth}}$ in step 214 may, however, also take into account the sign of whichever of the difference values $\delta_{\text{intra}}$ and $\delta_{\text{inter}}$ has the larger absolute value, i.e. the sign of $\delta_{\text{intra}}$ if $|\delta_{\text{intra}}| > |\delta_{\text{inter}}|$, and the sign of $\delta_{\text{inter}}$ if $|\delta_{\text{inter}}| > |\delta_{\text{intra}}|$.
(84) In particular, for energy drops that are present in the original audio signal, less smoothing needs to be applied in order to prevent energy from being smeared into originally low-energy regions; accordingly, $\alpha_{\text{smooth}}$ could be determined in step 214 to be lower in value in case the sign of the maximum energy difference indicates an energy drop in the audio signal's spectrum within the analysis spectral band 190.
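The mapping from the analysis-band energies to $\alpha_{\text{smooth}}$ is left open above, so the following Python sketch shows only one possible choice under stated assumptions. It assumes the energies are sums of squared spectral values that are converted to dB before differencing, that $\alpha_{\text{smooth}}$ decreases linearly with $\delta_{\max}$, and that a genuine energy drop (negative sign of the dominant difference) scales $\alpha_{\text{smooth}}$ down by a fixed factor. The constants `ALPHA_MAX`, `DELTA_RANGE_DB` and `DROP_FACTOR` are hypothetical tuning values, not values from this document.

```python
import math

# Hypothetical tuning constants; the text does not specify any of these values.
ALPHA_MAX = 0.9        # smoothing applied when the analysis band is stationary
DELTA_RANGE_DB = 12.0  # energy change (dB) at or above which no smoothing is applied
DROP_FACTOR = 0.5      # extra reduction of smoothing for genuine energy drops


def to_db(energy: float, eps: float = 1e-12) -> float:
    """Convert a sum-of-squares energy to dB (the dB scale is an assumption)."""
    return 10.0 * math.log10(energy + eps)


def smoothing_factor(e_prev: float, e_sub1: float, e_sub2: float) -> float:
    """Map analysis-band energies to alpha_smooth in [0, 1].

    e_prev: analysis-band energy before the switching instance
    e_sub1, e_sub2: analysis-band energies of subframes 1 and 2 after it
    """
    d_intra = to_db(e_sub2) - to_db(e_sub1)
    d_inter = to_db(e_sub1) - to_db(e_prev)
    # Signed difference with the larger magnitude; its sign marks a rise or a drop.
    d_signed = d_intra if abs(d_intra) >= abs(d_inter) else d_inter
    d_max = abs(d_signed)

    # The larger the energy change in the original signal, the less smoothing.
    alpha = ALPHA_MAX * max(0.0, 1.0 - d_max / DELTA_RANGE_DB)
    if d_signed < 0.0:
        # Genuine energy drop: smooth less to avoid smearing energy into it.
        alpha *= DROP_FACTOR
    return alpha
```

With these assumptions, a stationary analysis band yields the maximum smoothing of 0.9, a 20 dB rise disables smoothing entirely, and a 3 dB drop receives half the smoothing of an equally large rise.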
(85) In step 216, the smoothing factor $\alpha_{\text{smooth}}$ determined in step 214 is then applied to the previous energy value determined from the spectrotemporal tile preceding the switching instance in the high-frequency spectral band 66, i.e. $E_{\text{actual},\text{prev}}$, and to the current, actual energy determined from a spectrotemporal tile in the high-frequency spectral band 66 following the switching instance 204, i.e. $E_{\text{actual},\text{curr}}$, in order to obtain the target energy $E_{\text{target},\text{curr}}$ of the current frame or temporal portion forming the temporary period within which the temporal smoothing is to be performed. According to the application 216, the target energy is calculated as
$$E_{\text{target},\text{curr}} = \alpha_{\text{smooth}} \cdot E_{\text{actual},\text{prev}} + (1 - \alpha_{\text{smooth}}) \cdot E_{\text{actual},\text{curr}}$$
(86) The application in 216 would be performed by scale factor determiner 170 as well.
(87) In order to scale the spectral samples $x$ within the defined target frequency range from $f_{\text{target},\text{start}}$ to $f_{\text{target},\text{stop}}$ towards the current target energy, the calculation of the scaling factor to be applied to the spectrotemporal tile 220, which extends over the temporary period 222 along the temporal axis $t$ and over the high-frequency spectral band 66 along the spectral axis $f$, may then involve
$$\alpha_{\text{scale}} = \sqrt{E_{\text{target},\text{curr}} / E_{\text{actual},\text{curr}}}$$
$$x_{\text{new}} = \alpha_{\text{scale}} \cdot x_{\text{old}}$$
(88) While the calculation of $\alpha_{\text{scale}}$ would, for example, be performed by the scale factor determiner 170, the multiplication using $\alpha_{\text{scale}}$ as a factor would be performed by the aforementioned scaler 156 within the spectrotemporal tile 220.
(89) For the sake of completeness, it is noted that the energies $E_{\text{actual},\text{prev}}$ and $E_{\text{actual},\text{curr}}$ may be determined in the same manner as described above with respect to the spectrotemporal tiles 206 to 210: a summation over the squares of the spectral values within the spectrotemporal tile 224, which temporally precedes the switching instance 204 and extends over the high-frequency spectral band 66, may be used to determine $E_{\text{actual},\text{prev}}$, and a summation over the squares of the spectral values within the spectrotemporal tile 220 may be used to determine $E_{\text{actual},\text{curr}}$.
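Steps 216 and the subsequent scaling can be sketched with NumPy as follows. This is a minimal sketch, assuming each spectrotemporal tile is given as an array of (real or complex) spectral values with frames along the first axis and the bins of the high-frequency spectral band along the second; `tile_energy` and `smooth_high_band` are hypothetical helper names, and the small `eps` guard against an empty band is an addition not found in the text.

```python
import numpy as np


def tile_energy(tile: np.ndarray) -> float:
    """Energy of a spectrotemporal tile: sum of squared spectral magnitudes."""
    return float(np.sum(np.abs(tile) ** 2))


def smooth_high_band(prev_tile: np.ndarray, curr_tile: np.ndarray,
                     alpha_smooth: float, eps: float = 1e-12) -> np.ndarray:
    """Rescale curr_tile (high band after the switch) toward the target energy.

    prev_tile: high-band tile preceding the switching instance (tile 224)
    curr_tile: high-band tile of the temporary period (tile 220)
    """
    e_prev = tile_energy(prev_tile)
    e_curr = tile_energy(curr_tile)
    # E_target,curr = alpha_smooth * E_actual,prev + (1 - alpha_smooth) * E_actual,curr
    e_target = alpha_smooth * e_prev + (1.0 - alpha_smooth) * e_curr
    # alpha_scale = sqrt(E_target,curr / E_actual,curr); eps avoids division by zero.
    alpha_scale = np.sqrt(e_target / (e_curr + eps))
    # x_new = alpha_scale * x_old for every spectral value in the tile
    return alpha_scale * curr_tile
```

Because the scaling is multiplicative, it can only reshape energy that is already present in tile 220: an all-zero tile stays zero, whatever the target energy.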
(90) It is noted that in the example of
(91) Next, a concrete, more detailed embodiment for performing the temporal blending is described. As described above, this bandwidth blending serves to suppress annoying bandwidth fluctuations on the one hand, and to enable each coding mode neighboring a respective switching instance to run at its intended effective coded bandwidth on the other hand. For example, smooth adaptation may be applied so that each BWE can run at its intended optimal bandwidth.
(92) The following steps are performed by the decoder: as shown in
(93) Then, in step 234, an enhancement of the coding mode after the switching instance 204 is performed, resulting in an auxiliary extension 234 of the bandwidth of that coding mode into the blending region, or high-frequency spectral band 66, so as to fill this blending region 66 gaplessly during $t_{\text{blend},\max}$, i.e. so as to fill the spectrotemporal tile 236 in
(94) Then, in 238, a blending factor $w_{\text{blend}}$ is calculated, where $t_{\text{blend},\text{act}}$ denotes the actual time elapsed since the switching, here exemplarily at $t_0$:
$$w_{\text{blend}} = \frac{t_{\text{blend},\max} - t_{\text{blend},\text{act}}}{t_{\text{blend},\max}}$$
(95) The temporal course of the blending factor thus determined is illustrated in
(96) Thereafter, in 240, the spectral samples $x$ within the spectrotemporal tile 236, i.e. within the blending region 66 during the temporary period defined by, or limited to, the maximum blending time, are weighted using the blending factor $w_{\text{blend}}$ according to
$$x_{\text{new}} = w_{\text{blend}} \cdot x_{\text{old}}$$
(97) That is, in the scaling step 240, the spectral values within spectrotemporal tile 236 are scaled according to $w_{\text{blend}}$; to be more precise, the spectral values temporally succeeding the switching instance 204 by $t_{\text{blend},\text{act}}$ are scaled according to $w_{\text{blend}}(t_{\text{blend},\text{act}})$.
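Steps 238 and 240 can be sketched as a per-frame fade-out weighting of tile 236. This is a minimal sketch, assuming the tile is an array with frames along the first axis, that one weight is held constant per frame, that `frame_times` gives $t_{\text{blend},\text{act}}$ for each frame in the same unit as $t_{\text{blend},\max}$, and that weights are clamped to $[0, 1]$ once $t_{\text{blend},\max}$ has elapsed; the function names are hypothetical.

```python
import numpy as np


def fade_out_weight(t_act: float, t_max: float) -> float:
    """w_blend = (t_max - t_act) / t_max, clamped to [0, 1] (clamping assumed)."""
    return min(1.0, max(0.0, (t_max - t_act) / t_max))


def apply_fade_out(tile: np.ndarray, frame_times: np.ndarray,
                   t_max: float) -> np.ndarray:
    """Weight each frame (row) of the blending-region tile by w_blend.

    tile: spectral values in the high-frequency band, shape (frames, bins)
    frame_times: elapsed time t_blend,act of each frame since the switch
    """
    weights = np.array([fade_out_weight(t, t_max) for t in frame_times])
    # x_new = w_blend * x_old, one weight broadcast over all bins of a frame
    return tile * weights[:, np.newaxis]
```

For example, with $t_{\text{blend},\max} = 20$ and frames at elapsed times 0, 5, 10 and 15 (in the same time unit), the frames are weighted by 1.0, 0.75, 0.5 and 0.25.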
(98) In case of a switching type 92, the maximum blending time and the blending region are set at 242 in a manner similar to 232. The maximum blending time $t_{\text{blend},\max}$ for switching type 92 may differ from the $t_{\text{blend},\max}$ set in 232 in the case of a switching type 54. Reference is also made to the subsequent description of switching during blending.
(99) Then, the blending factor $w_{\text{blend}}$ is calculated. The calculation 244 may calculate the blending factor dependent on the time elapsed since the switching at $t_0$, i.e. depending on $t_{\text{blend},\text{act}}$, according to
$$w_{\text{blend}} = \frac{t_{\text{blend},\text{act}}}{t_{\text{blend},\max}}$$
(100) Then the actual scaling in 246 takes place using the blending factor in a manner similar to 240.
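Assuming the same maximum blending time $T = t_{\text{blend},\max}$ is used for both switching types (the text above allows them to differ), and writing $w_{\text{out}}$ and $w_{\text{in}}$ (labels introduced here) for the weights of steps 238 and 244 at elapsed time $\tau = t_{\text{blend},\text{act}}$, the two weights are complementary and mirror each other in time:

$$w_{\text{out}}(\tau) + w_{\text{in}}(\tau) = \frac{T-\tau}{T} + \frac{\tau}{T} = 1, \qquad w_{\text{in}}(T-\tau) = \frac{T-\tau}{T} = w_{\text{out}}(\tau), \qquad 0 \le \tau \le T.$$

For example, with a hypothetical $T = 20$ ms, at $\tau = 5$ ms the fade-out weight is $0.75$ and the fade-in weight is $0.25$. The mirror relation is what makes a reversal of the form $\tau \leftarrow T - \tau$ continuous when a running blend is interrupted.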
(101) Switching During Blending
(102) Nevertheless, the above-mentioned approach only works if no further switching takes place during the blending process, as shown in
$$t_{\text{blend},\text{act}} \leftarrow t_{\text{blend},\max} - t_{\text{blend},\text{act}}$$
resulting in a reverted blending process completed at $t_2$ as shown in
(103) Thus, this modified update would be performed in steps 232 and 242 in order to account for the fade-in or fade-out process interrupted by the new, currently occurring switching instance, here exemplarily at $t_1$. In other words, the decoder would perform the temporal smoothing or blending at a first switching instance $t_0$ by applying a fade-out (or fade-in) scaling function 240 and, if a second switching instance $t_1$ occurs during the fade-out (or fade-in) scaling function 240, again apply a fade-in (or fade-out) scaling function 242 to a high-frequency spectral band 66 so as to perform temporal smoothing or blending at the second switching instance $t_1$. The starting point for applying the fade-in (or fade-out) scaling function 242 from the second switching instance $t_1$ on is set such that, at this starting point, the fade-in (or fade-out) scaling function 242 has a function value nearest or equal to the function value that the fade-out (or fade-in) scaling function 240, as applied at the first switching instance, assumes at the time $t_1$ of occurrence of the second switching instance.
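The reversal rule $t_{\text{blend},\text{act}} \leftarrow t_{\text{blend},\max} - t_{\text{blend},\text{act}}$ can be sketched as a small piece of decoder state. This is a minimal sketch, assuming a single $t_{\text{blend},\max}$ shared by fade-in and fade-out (the text allows them to differ) and assuming that a new switching instance requesting the same direction as a still-running blend simply lets that blend continue; the text does not cover that case. `BlendState`, `advance` and `on_switch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlendState:
    """Hypothetical decoder-side bookkeeping for high-band blending."""
    t_max: float                   # t_blend,max, assumed equal for both directions
    t_act: Optional[float] = None  # t_blend,act; None means "no blend in progress"
    fading_in: bool = True         # direction of the current (or last) blend

    def __post_init__(self) -> None:
        if self.t_act is None:
            self.t_act = self.t_max  # idle state: the last blend has completed

    def weight(self) -> float:
        """w_blend: t_act/t_max for a fade-in, (t_max - t_act)/t_max for a fade-out."""
        ratio = min(1.0, max(0.0, self.t_act / self.t_max))
        return ratio if self.fading_in else 1.0 - ratio

    def advance(self, dt: float) -> None:
        """Let dt of blending time elapse; the blend saturates at t_max."""
        self.t_act = min(self.t_max, self.t_act + dt)

    def on_switch(self, fading_in: bool) -> None:
        """Handle a new switching instance requesting a fade-in or a fade-out."""
        if self.t_act < self.t_max:
            if fading_in != self.fading_in:
                # Interrupted blend: t_act <- t_max - t_act keeps w_blend continuous.
                self.t_act = self.t_max - self.t_act
            # Same direction while blending: continue the running blend (assumption).
        else:
            self.t_act = 0.0  # no blend in progress: start a fresh one
        self.fading_in = fading_in
```

For example, with $t_{\text{blend},\max} = 20$, a fade-out starting at $t_0$ reaches a weight of 0.75 after 5 time units; a reversing switch at that moment restarts as a fade-in from the same weight 0.75, which then reaches 1.0 after a further 15 time units.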
(104) The embodiments described above relate to audio and speech coding, and particularly to coding techniques using different bandwidth extension methods (BWE), or non-energy preserving BWE(s), and a full-band core coder without a BWE in a switched application. It has been proposed to enhance the perceptual quality by smoothing the transitions between different effective output bandwidths. In particular, a signal-adaptive smoothing technique is used to obtain seamless transitions, and a possibly, but not necessarily, uniform blending technique between different bandwidths is used to achieve the optimal output bandwidth for each BWE while avoiding disturbing bandwidth fluctuations.
(105) The above embodiments avoid unintended energy jumps when switching between different BWEs or the full-band core, whereas energy increases and decreases that are present in the original signal (e.g. due to onsets or offsets of sibilants) may be preserved. Furthermore, smooth adaptations of the different bandwidths are exemplarily performed to enable each BWE to run at its intended, optimal bandwidth if it needs to be active for a longer period.
(106) Except for the decoder's functionalities at switching instances necessitating blind BWE, the same functionalities may also be taken over by the encoder. The encoder such as 30 of
(107) For example, if the encoder 30 of
(108) Upon encountering a switching instance of type 56, the encoder 30 could act as follows. The encoder 30 could, preliminarily, for a temporary time period directly starting at the switching instance, amplify, i.e. scale up, the audio signal within the high-frequency spectral band 66, with or without a fade-out scaling function, and then encode the thus modified audio signal. Alternatively, the encoder 30 could first encode the original audio signal using the coding mode valid directly after the switching instance up to some syntax element level, and then amend the latter so as to amplify the audio signal within the high-frequency spectral band during the temporary time period. For example, if the coding mode to which the switching takes place involves a guided bandwidth extension into the high-frequency spectral band 66, the encoder 30 could appropriately scale up the information on the spectral envelope of this high-frequency spectral band during the temporary time period.
(109) However, if the encoder 30 encounters a switching instance of type 92, the encoder 30 could either encode the temporal portion of the audio signal following the switching instance unmodified up to some syntax element level and then amend it, for example, in order to subject the high-frequency spectral band of the audio signal during that temporary time period to a fade-in function, such as by appropriately scaling scale factors and/or spectral line values within the respective spectrotemporal tile, or first modify the audio signal within the high-frequency spectral band 66 during the temporary time period immediately starting at the switching instance and then encode the thus modified audio signal.
(110) When encountering a switching instance of type 94, the encoder 30 could, for example, act as follows: the encoder could, for a temporary time period immediately starting at the switching instance, scale down the audio signal's spectrum within the high-frequency spectral band 66, with or without applying a fade-in function. Alternatively, the encoder could encode the audio signal in the time portion following the switching instance using the coding mode to which the switching takes place, without any modification up to some syntax element level, and then change the appropriate syntax elements so as to provoke the respective scaling-down of the audio signal's spectrum within the high-frequency spectral band during the temporary time period. The encoder may appropriately scale down the respective scale factors and/or spectral line values.
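The three encoder-side options above all amount to scaling the high-band content, or the syntax elements describing it, by a per-frame gain over the temporary time period. This is a minimal sketch, assuming linear amplitude gains applied per frame and linear ramps for the optional fade functions; the `boost` and `cut` values, the ramp shapes, the function names and the use of the reference numerals 56, 92 and 94 as integer keys are illustrative assumptions, not values from the text.

```python
import numpy as np


def encoder_high_band_gains(switch_type: int, n_frames: int,
                            boost: float = 2.0, cut: float = 0.5,
                            ramp: bool = True) -> np.ndarray:
    """Per-frame amplitude gains for the high band over the temporary period."""
    k = np.arange(n_frames) / n_frames  # ramp position 0, 1/n, ..., (n-1)/n
    if switch_type == 56:
        # Scale up, optionally fading the boost out towards unity gain.
        return 1.0 + (boost - 1.0) * (1.0 - k) if ramp else np.full(n_frames, boost)
    if switch_type == 92:
        # Fade the high band in.
        return k
    if switch_type == 94:
        # Scale down, optionally fading in from the reduced level towards unity.
        return cut + (1.0 - cut) * k if ramp else np.full(n_frames, cut)
    raise ValueError(f"unsupported switching type: {switch_type}")


def apply_gains(tile: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """Scale a (frames, bins) high-band tile frame by frame."""
    return tile * gains[:, np.newaxis]
```

The same gains could instead be folded into envelope parameters or scale factors when the encoder modifies syntax elements rather than the signal itself; how that mapping is done depends on the codec's bitstream and is not shown here.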
(111) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
(112) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
(113) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(114) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
(115) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
(116) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(117) A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
(118) A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
(119) A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
(120) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(121) A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
(122) In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.
(123) The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
(124) The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
(125) While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.