Noise filling concept
09792920 · 2017-10-17
Assignee
Inventors
- Sascha Disch (Fuerth, DE)
- Marc Gayer (Erlangen, DE)
- Christian Helmrich (Erlangen, DE)
- Goran Markovic (Nuremberg, DE)
- Maria Luis Valero (Nuremberg, DE)
Cpc classification
International classification
Abstract
Noise filling of a spectrum of an audio signal is improved in quality with respect to the noise filled spectrum so that the reproduction of the noise filled audio signal is less annoying, by performing the noise filling in a manner dependent on a tonality of the audio signal.
Claims
1. Apparatus configured to perform noise filling on a spectrum of an audio signal in a manner dependent on a tonality of the audio signal, wherein the apparatus is configured to: dequantize the spectrum, as derived after the noise-filling, using a spectrally varying and signal-adaptive quantization step size controlled via a linear prediction spectral envelope signaled via linear prediction coefficients in a data stream into which the spectrum is coded, or scale factors relating to scale factor bands, signaled in the data stream into which the spectrum is coded, identify contiguous spectral zero-portions of the audio signal's spectrum and to apply the noise filling onto the contiguous spectral zero-portions identified, and respectively fill the contiguous spectral zero-portions of the audio signal's spectrum with noise spectrally shaped with a function having a local maximum surrounded by two outwardly falling flanks wherein the function is set dependent on a respective contiguous spectral zero-portion's width so that the function is confined to the respective contiguous spectral zero-portion, and wherein a fill width at half maximum of the function is adjusted dependent on the tonality of the audio signal so that, if the tonality of the audio signal increases, the fill width at half maximum of the function gets more compact in an inner of the respective contiguous spectral zero-portion and distanced from the respective contiguous spectral zero-portion's outer edges.
2. Apparatus according to claim 1, wherein the apparatus is configured to scale the noise with which the contiguous spectral zero-portions are filled using a scalar global noise level signaled in the data stream into which the spectrum is coded in a spectrally global manner.
3. Apparatus according to claim 1, wherein the apparatus is configured to generate the noise with which the contiguous spectral zero-portions are filled, using a random or pseudo-random process or using patching.
4. Apparatus according to claim 1, wherein the apparatus is configured to derive the tonality from a coding parameter coded within the data stream so that the dependency on the tonality involves a dependency on the coding parameter.
5. Apparatus according to claim 4, wherein the apparatus is configured such that the coding parameter is one of an LTP (long-term prediction) flag or gain, and a TNS (temporal noise shaping) enablement flag or gain, and a spectrum rearrangement enablement flag signalling a coding option according to which quantized spectral values are spectrally re-arranged with additionally transmitting within the data stream the rearrangement prescription.
6. Apparatus according to claim 1, wherein the apparatus is configured to confine the performance of the noise filling onto a high-frequency spectral portion of the audio signal's spectrum.
7. Apparatus according to claim 1, wherein the apparatus is configured to set a low-frequency starting position of the high-frequency spectral portion corresponding to an explicit signaling in the data stream.
8. Apparatus according to claim 1, wherein the apparatus is configured to, in performing the noise filling, fill contiguous spectral zero-portions of the spectrum with noise a level of which exhibits a decrease from low to high frequencies, approximating a spectral low-pass filter's transfer function so as to counteract a spectral tilt caused by a pre-emphasis used to code the audio signal's spectrum.
9. Apparatus according to claim 8, wherein the apparatus is configured to adapt a steepness of the decrease to a pre-emphasis factor of the pre-emphasis.
10. Audio decoder supporting noise filling comprising an apparatus according to claim 1.
11. Perceptual transform audio decoder comprising an apparatus configured to perform noise filling on a spectrum of an audio signal according to claim 1; and a frequency domain noise shaper configured to subject the noise filled spectrum to spectral shaping using a spectral perceptual weighting function.
12. Audio encoder supporting noise filling comprising an apparatus according to claim 1, the encoder being configured to use a spectrum filled with noise by the apparatus, for analysis-by-synthesis.
13. Apparatus comprising a microprocessor configured to, an electronic circuit configured to, or a programmable computer programmed to: perform noise filling on a spectrum of an audio signal in a manner dependent on a tonality of the audio signal by filling a contiguous spectral zero-portion of the audio signal's spectrum with noise spectrally shaped by a function having a local maximum surrounded by two outwardly falling flanks wherein the function is set dependent on a respective contiguous spectral zero-portion's width so that the function is confined to the respective contiguous spectral zero-portion, and wherein a fill width at half maximum of the function is adjusted dependent on the tonality of the audio signal so that, if the tonality of the audio signal increases, the fill width at half maximum of the function gets more compact in an inner of the respective contiguous spectral zero-portion and distanced from the respective contiguous spectral zero-portion's outer edges, and dequantize the spectrum, as derived by the noise-filling, using a spectrally varying and signal-adaptive quantization step size controlled via a linear prediction spectral envelope signaled via linear prediction coefficients in a data stream into which the spectrum is coded, or scale factors relating to scale factor bands, signaled in the data stream into which the spectrum is coded.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)
(19)
(20)
(21)
DETAILED DESCRIPTION OF THE INVENTION
(22) Wherever equal reference signs are used in the following description of the figures for elements shown in these figures, the description given for an element in one figure shall be interpreted as transferable to the element bearing the same reference sign in another figure. This avoids extensive and repetitive description and lets the description of the various embodiments concentrate on their differences rather than describing each embodiment anew from the outset.
(23) The following description starts with embodiments for an apparatus for performing noise filling on a spectrum of an audio signal, first. Second, different embodiments are presented for various audio codecs, where such a noise filling may be built-in, along with specifics which could apply in connection with a respective audio codec presented. It is noted that the noise filling described next may, in any case, be performed at the decoding side. Depending on the encoder, however, the noise filling as described next may also be performed at the encoding side such as, for example, for analysis-by-synthesis reasons. An intermediate case according to which the modified way of noise filling in accordance with the embodiments outlined below merely partially changes the way the encoder works such as, for example, in order to determine a spectrally global noise filling level, is also described below.
(24)
(25) Beyond that, in a time-aligned manner
(26)
(27) The apparatus of
(28) The actual noise filling is performed by noise filler 32. The noise filler 32 receives the spectrum to which the noise filling shall be applied. This spectrum is illustrated in
(29) Accordingly, it is the task of tonality determiner 34 to provide the noise filler 32 with an estimation of the tonality on the basis of another tonality hint 38 as will be described in more detail below. In accordance with the embodiments described later, the tonality hint 38 may be available at encoding and decoding sides anyway, by way of a respective coding parameter conveyed within the data stream of the audio codec within which apparatus 30 is, for example, used.
(30)
(31) The tonality dependency of the noise filling generally described above with respect to
(32) As can be seen, the absolute value of the slope of edges 58 and 60 is higher for function 50 than for function 48. The noise filler 32 selects to fill the zero-portion 40 with function 50 for tonalities lower than tonalities for which noise filler 32 selects to use function 48 for filling zero-portion 40. By this measure, the noise filler 32 avoids cluttering the immediate periphery of potentially tonal spectral peaks of spectrum 34, such as, for example, peak 62. The smaller the absolute slope of edges 58 and 60 is, the further away the noise filled into zero-portion 40 is from the non-zero portions of spectrum 34 surrounding zero-portion 40.
(33) Noise filler 32 may, for example, choose to select function 48 in case of the audio signal's tonality being $\tau_2$, and function 50 in case of the audio signal's tonality being $\tau_1$, but the description brought forward further below will reveal that noise filler 32 may discriminate more than two different states of the audio signal's tonality, i.e. may support more than two different functions 48, 50 for filling a certain contiguous spectral zero-portion and choose between those depending on the tonality via a surjective mapping from tonalities to functions.
(34) As a minor note, the construction of functions 48 and 50, according to which they have a plateau in the inner interval 52 flanked by edges 58 and 60 so as to result in unimodal functions, is merely an example. Alternatively, bell-shaped functions may be used. The interval 52 may alternatively be defined as the interval within which the function is higher than 95% of its maximum value.
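The plateau-plus-flanks construction can be illustrated with a short sketch. It is only an illustration under stated assumptions: the name `plateau_shape` and the parameter `edge_len` (number of spectral lines per flank) are hypothetical and do not appear in the description, and linear flanks are just one possible choice.

```python
def plateau_shape(width, edge_len):
    """Return `width` weights: rising flank, plateau at 1.0, falling flank.

    `edge_len` (hypothetical parameter) is the number of lines in each flank;
    it is clamped so that both flanks fit inside the zero-portion.
    """
    edge_len = max(0, min(edge_len, width // 2))
    shape = []
    for j in range(width):
        dist = min(j, width - 1 - j)  # distance to the nearer outer edge
        if dist < edge_len:
            # Linear flank rising towards the plateau.
            shape.append((dist + 1) / (edge_len + 1))
        else:
            shape.append(1.0)
    return shape


# Short flanks (steep edges, wide plateau), in the spirit of function 50.
steep = plateau_shape(12, 1)
# Long flanks (gentle edges, narrow plateau), in the spirit of function 48.
gentle = plateau_shape(12, 4)
```

With the same zero-portion width, the gentler variant concentrates the noise in the interior and keeps it further away from the neighbouring non-zero lines.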
(35)
(36)
(37) In order to explain this, see
(38) In
(39) In this situation, the integral of function 50 over quarters a, d is greater than the integral of function 48 over quarters a, d and, accordingly, noise filler 32 uses function 50 for lower tonalities and function 48 for higher tonalities, i.e. the integral over the outer quarters of the normalized functions 50 and 48 negatively depends on the tonality.
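The outer-quarter criterion can be checked numerically. The following is a minimal sketch, assuming the functions are sampled on the spectral lines of the zero-portion; `outer_quarter_share` and the two sample shapes are hypothetical illustrations, not values from the description.

```python
def outer_quarter_share(shape):
    """Fraction of the function's total area lying in the first and last quarter.

    Dividing by the total area corresponds to normalizing the function
    to an integral of 1 before integrating over the outer quarters.
    """
    n = len(shape)
    q = n // 4
    total = sum(shape)
    outer = sum(shape[:q]) + sum(shape[n - q:])
    return outer / total


# Illustrative shapes over a 12-line zero-portion (assumed values).
wide = [0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5]                      # steep edges
narrow = [0.2, 0.4, 0.6, 0.8, 1, 1, 1, 1, 0.8, 0.6, 0.4, 0.2]        # gentle edges

share_wide = outer_quarter_share(wide)
share_narrow = outer_quarter_share(narrow)
```

Since the outer-quarter integral depends negatively on the tonality, the shape with the larger share is the one suited to the less tonal case.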
(40) For illustration purposes, in case of
(41) Although the type of variation of functions 48 and 50 depending on the tonality varies, all examples of
(42) Until now, the description of
(43) The zero-portion filler 72 is configured to fill the identified contiguous spectral zero-portions identified by identifier 70 with noise spectrally shaped in accordance with a function as described above with respect to
(44) In particular, the individual filling of each contiguous spectral zero-portion identified by identifier 70 may be performed by filler 72 as follows: the function is set dependent on the contiguous spectral zero-portion's width so that the function is confined to the respective contiguous spectral zero-portion, i.e. the domain of the function coincides with the contiguous spectral zero-portion's width. The setting of the function is further dependent on the tonality of the audio signal, namely in the manner outlined above with respect to
(45) It has already been outlined above that the noise filling's dependency on the tonality may discriminate between more than only two different tonalities, such as 3, 4 or even more than 4.
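The two steps described above, identifying the contiguous spectral zero-portions and choosing one of several functions per tonality, might be sketched as follows. The names `find_zero_portions` and `edge_length_for`, the tonality range [0, 1] and the default of four levels are assumptions made for illustration only.

```python
def find_zero_portions(spectrum):
    """Return (start, width) for every maximal run of zero-quantized lines."""
    portions, start = [], None
    for i, value in enumerate(spectrum):
        if value == 0 and start is None:
            start = i                      # a run of zeros begins
        elif value != 0 and start is not None:
            portions.append((start, i - start))
            start = None                   # the run has ended
    if start is not None:                  # run reaching the end of the spectrum
        portions.append((start, len(spectrum) - start))
    return portions


def edge_length_for(tonality, width, levels=4):
    """Map a tonality in [0, 1] to one of `levels` flank lengths (levels >= 2).

    Many tonality values share one flank length, i.e. the mapping from
    tonalities to functions is surjective. Higher tonality gives longer
    flanks, so the noise moves away from the zero-portion's edges.
    """
    t = min(max(tonality, 0.0), 1.0)
    step = min(int(t * levels), levels - 1)    # 0 .. levels-1
    max_edge = width // 2                      # flanks must fit in the portion
    return (step * max_edge) // (levels - 1)


portions = find_zero_portions([3, 0, 0, 0, -1, 0, 2, 0, 0])
```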
(46) Until now, the description of certain embodiments of the present application focused on the function's shape used to spectrally shape the noise with which certain contiguous spectral zero-portions are filled. It is advantageous, however, to control the overall level of noise added to a certain spectrum to be noise filled so as to result in a pleasant reconstruction, or to even control the level of noise introduction spectrally.
(47)
(48) In accordance with one embodiment, the available set of functions 48, 50 for spectrally shaping the noise to be filled into the portions 90-94 all have a predefined scale which is known to encoder and decoder. A spectrally global scaling factor is signaled explicitly within the data stream into which the audio signal, i.e. the non-quantized part of the spectrum, is coded. This factor indicates, for example, the RMS or another measure for a level of noise, i.e. random or pseudorandom spectral line values, with which portions 90-94 are pre-set at the decoding side before being spectrally shaped using the tonality dependently selected functions 48, 50 as they are. How the global noise scaling factor could be determined at the encoder side is described further below. Let, for example, $A$ be the set of indices $i$ of spectral lines where the spectrum is quantized to zero and which belong to any of the portions 90-94, and let $N$ denote the global noise scaling factor. The values of the spectrum shall be denoted $x_i$. Further, $\mathrm{random}(N)$ shall denote a function giving a random value of a level corresponding to level $N$, and $\mathrm{left}(i)$ shall be a function indicating, for any zero-quantized spectral value at index $i$, the index of the zero-quantized value at the low-frequency end of the zero-portion to which $i$ belongs. $F_i(j)$ with $j = 0, \ldots, J_i - 1$ shall denote the function 48 or 50 assigned, depending on the tonality, to the zero-portion 90-94 starting at index $i$, with $J_i$ indicating the width of that zero-portion. Then, portions 90-94 are filled according to $x_i = F_{\mathrm{left}(i)}(i - \mathrm{left}(i)) \cdot \mathrm{random}(N)$ for all $i \in A$.
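The fill rule, in which each zero-quantized line receives the shaping function evaluated at its offset from the low-frequency end of its zero-portion, multiplied by a random value at the global level, can be sketched as below. This is a sketch under assumptions: `random(N)` is modelled as N times a uniform value in [-1, 1], and the helper `triangle` stands in for whichever function 48 or 50 the tonality selects; both choices are illustrative, not taken from the description.

```python
import random


def triangle(width):
    """Hypothetical stand-in for a shaping function F with maximum 1.0."""
    half = (width + 1) // 2
    return [(min(j, width - 1 - j) + 1) / half for j in range(width)]


def fill_zero_portions(spectrum, shape_for, noise_level, rng=None):
    """Fill each zero-portion: x_i = F_left(i)(i - left(i)) * random(N)."""
    rng = rng or random.Random(0)          # fixed seed keeps the sketch repeatable
    out = list(spectrum)
    i, n = 0, len(spectrum)
    while i < n:
        if spectrum[i] != 0:
            i += 1
            continue
        left = i                           # low-frequency end of this zero-portion
        while i < n and spectrum[i] == 0:
            i += 1
        shape = shape_for(i - left)        # function confined to the portion's width
        for j, weight in enumerate(shape):
            # Assumed model of random(N): uniform noise scaled by the level N.
            out[left + j] = weight * noise_level * rng.uniform(-1.0, 1.0)
    return out


filled = fill_zero_portions([5, 0, 0, 0, -2, 0], triangle, noise_level=2.0)
```

Non-zero lines pass through unchanged; only lines belonging to a zero-portion are overwritten.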
(49) Additionally, the filling of noise into portions 90-94 may be controlled such that the noise level decreases from low to high frequencies. This may be done by spectrally shaping the noise with which portions are pre-set, or by spectrally shaping the arrangement of functions 48, 50 in accordance with a low-pass filter's transfer function. This may compensate for a spectral tilt caused when re-scaling/dequantizing the filled spectrum due to, for example, a pre-emphasis used in determining the spectral course of the quantization step size. Accordingly, the steepness of the decrease or of the low-pass filter's transfer function may be controlled according to a degree of pre-emphasis applied. Applying the nomenclature used above, portions 90-94 may be filled according to $x_i = F_{\mathrm{left}(i)}(i - \mathrm{left}(i)) \cdot \mathrm{random}(N) \cdot \mathrm{LPF}(i)$, with $\mathrm{LPF}(i)$ denoting the low-pass filter's transfer function, which may be linear. Depending on the circumstances, the function LPF which corresponds to function 15 may have a positive slope and LPF changed to read HPF accordingly.
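A linear tilt of this kind might look as follows. The sketch assumes a hypothetical `steepness` parameter (the relative drop between the lowest and the highest line); how that value would be derived from the pre-emphasis factor is not specified here and is left open.

```python
def linear_tilt(num_lines, steepness):
    """LPF(i): 1.0 at the lowest line, falling linearly to 1 - steepness."""
    if num_lines < 2:
        return [1.0] * num_lines
    return [1.0 - steepness * i / (num_lines - 1) for i in range(num_lines)]


def apply_tilt(filled, zero_mask, tilt):
    """Scale only the noise-filled lines; coded non-zero lines stay as they are."""
    return [v * t if is_zero else v
            for v, is_zero, t in zip(filled, zero_mask, tilt)]


tilt = linear_tilt(5, steepness=0.4)
tilted = apply_tilt([2.0, 1.0, 1.0, 1.0, 1.0],
                    [False, True, True, False, True],
                    tilt)
```

A negative `steepness` would give a rising weighting instead, matching the remark that the function may also have a positive slope.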
(50) Instead of using a fixed scaling of the functions selected depending on tonality and zero-portion's width, the just outlined spectral tilt correction may directly be accounted for by using the spectral position of the respective contiguous zero-portion also as an index in looking-up or otherwise determining 80 the function to be used for spectrally shaping the noise with which the respective contiguous spectral zero-portion has to be filled. For example, a mean value of the function or its pre-scaling used for spectrally shaping the noise to be filled into a certain zero-portion 90-94 may depend on the zero-portion's 90-94 spectral position so that, over the whole bandwidth of the spectrum, the functions used for the contiguous spectral zero-portions 90-94 are pre-scaled so as to emulate a low-pass filter transfer function so as to compensate for any high pass pre-emphasis transfer function used to derive the non-zero quantized portions of the spectrum.
(51) Having described embodiments for performing the noise filling, embodiments of audio codecs into which the noise filling outlined above may advantageously be built are presented in the following.
(52) The spectral line-wise representation of the audio signal, i.e. the spectrogram 12, and the masking threshold enter quantizer 108 which is responsible for quantizing the spectral samples of the spectrogram 12 using a spectrally varying quantization step size which depends on the masking threshold: the larger the masking threshold, the larger the quantization step size may be, since more quantization noise is masked. In particular, the quantizer 108 informs the decoding side of the variation of the quantization step size in the form of so-called scale factors which, by way of the just-described relationship between quantization step size on the one hand and perceptual masking threshold on the other hand, form a representation of the perceptual masking threshold itself. In order to find a good compromise between the amount of side information to be spent for transmitting the scale factors to the decoding side, and the granularity of adapting the quantization noise to the perceptual masking threshold, quantizer 108 sets/varies the scale factors at a spectrotemporal resolution which is lower, or coarser, than the spectrotemporal resolution at which the quantized spectral levels describe the spectral line-wise representation of the audio signal's spectrogram 12. For example, the quantizer 108 subdivides each spectrum into scale factor bands 110 such as Bark bands, and transmits one scale factor per scale factor band 110. The temporal resolution at which the scale factors are transmitted may likewise be lower than that of the spectral levels of the spectral values of spectrogram 12.
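Band-wise quantization with one step size per scale factor band can be sketched as below. The band edges, step sizes and function names are hypothetical; a real codec derives the step sizes from the signaled scale factors in a codec-specific way, which this sketch does not attempt to reproduce.

```python
def quantize_bands(spectrum, band_edges, step_sizes):
    """Quantize each line with the step size of the band it falls into."""
    levels = []
    for b in range(len(band_edges) - 1):
        step = step_sizes[b]
        for x in spectrum[band_edges[b]:band_edges[b + 1]]:
            levels.append(int(round(x / step)))
    return levels


def dequantize_bands(levels, band_edges, step_sizes):
    """Rescale the quantized levels band by band."""
    out = []
    for b in range(len(band_edges) - 1):
        step = step_sizes[b]
        out.extend(level * step for level in levels[band_edges[b]:band_edges[b + 1]])
    return out


band_edges = [0, 3, 6]          # two bands of three lines each (assumed)
step_sizes = [1.0, 2.0]         # coarser step in the second band (assumed)
levels = quantize_bands([4.0, 0.3, -0.2, 2.6, 0.9, 0.4], band_edges, step_sizes)
recon = dequantize_bands(levels, band_edges, step_sizes)
```

Small values collapse to zero, which is how the contiguous zero-portions targeted by the noise filling arise.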
(53) Both the spectral levels of the spectral values of the spectrogram 12, as well as the scale factors 112 are transmitted to the decoding side. However, in order to improve the audio quality, the encoder 100 transmits within the data stream also a global noise level which signals to the decoding side the noise level up to which zero-quantized portions of representation 12 have to be filled with noise before rescaling, or dequantizing, the spectrum by applying the scale factors 112. This is shown in
(54) As already denoted above, the noise filling to which the global noise level 114 refers, may be subject to a restriction in that this kind of noise filling merely refers to frequencies above some starting frequency which is indicated in
(55)
(56) The encoder 100 of
(57) As far as the dependency on the tonality is concerned, the encoder 100 may determine the global noise level 114, and insert same into the data stream, by associating to the zero-portions 40a to 40d the function for spectrally shaping the noise for filling the respective zero-portion. In particular, the encoder may use these functions in order to weight the original, i.e. weighted but not yet quantized, audio signal's spectral values in these portions 40a to 40d in order to determine the global noise level 114. Thereby, the global noise level 114 determined and transmitted within the data stream, leads to a noise filling at the decoding side which more closely recovers the original audio signal's spectrum.
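One way an encoder could weight the original spectral values with the shaping functions when measuring the noise level is a weighted RMS over the zero-portions. This is only an assumed formulation: the description does not fix the exact measure, and `global_noise_level` and `triangle` are hypothetical names.

```python
import math


def triangle(width):
    """Hypothetical stand-in for the tonality-selected shaping function."""
    half = (width + 1) // 2
    return [(min(j, width - 1 - j) + 1) / half for j in range(width)]


def global_noise_level(original, quantized, shape_for):
    """Weighted RMS of the original values over all zero-quantized portions.

    Each original value is weighted with the shaping function that the
    decoder would apply at that line (assumed measure).
    """
    num = den = 0.0
    i, n = 0, len(quantized)
    while i < n:
        if quantized[i] != 0:
            i += 1
            continue
        left = i
        while i < n and quantized[i] == 0:
            i += 1
        for j, w in enumerate(shape_for(i - left)):
            num += (w * original[left + j]) ** 2
            den += w * w
    return math.sqrt(num / den) if den > 0 else 0.0


flat_level = global_noise_level([4.0, 0.3, 0.3, 0.3, -3.0], [4, 0, 0, 0, -3], triangle)
peaky_level = global_noise_level([4.0, 0.1, 0.5, 0.1, -3.0], [4, 0, 0, 0, -3], triangle)
```

With this measure, energy in the middle of a zero-portion counts more than energy next to its edges, mirroring where the decoder will actually place the noise.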
(58) The encoder 100 may, depending on the audio signal's content, decide on using some coding options which, in turn, may be used as tonality hints such as the tonality hint 38 shown in
(59) Additionally or alternatively, encoder 100 may support temporal noise shaping. That is, on a per spectrum 18 basis, for example, encoder 100 may choose to subject spectrum 18 to temporal noise shaping with indicating this decision by way of a temporal noise shaping enablement flag to the decoder. The TNS enablement flag indicates whether the spectral levels of spectrum 18 form the prediction residual of a spectral, i.e. along frequency direction determined, linear prediction of the spectrum or whether the spectrum is not LP predicted. If TNS is signaled to be enabled, the data stream additionally comprises the linear prediction coefficients for spectrally linear predicting the spectrum so that the decoder may recover the spectrum using these linear prediction coefficients by applying same onto the spectrum before or after the rescaling or dequantizing. The TNS enablement flag is also a tonality hint: if the TNS enablement flag signals TNS to be switched on, e.g. on a transient, then the audio signal is very unlikely to be tonal, as the spectrum seems to be well predictable by linear prediction along frequency axis and, hence, non-stationary. Accordingly, the tonality may be determined on the basis of the TNS enablement flag such that the tonality is higher if the TNS enablement flag disables TNS, and is lower if the TNS enablement flag signals the enablement of TNS. Instead of, or in addition to, a TNS enablement flag, it may be possible to derive from the TNS filter coefficients a TNS gain indicating a degree up to which TNS is usable for predicting the spectrum, thereby also revealing a more-than-two-valued hint concerning the tonality.
(60) Other coding parameters may also be coded within the data stream by encoder 100. For example, a spectrum rearrangement enablement flag may signal one coding option according to which the spectrum 18 is coded by rearranging the spectral levels, i.e. the quantized spectral values, spectrally, with the rearrangement prescription additionally being transmitted within the data stream so that the decoder may rearrange, or rescramble, the spectral levels so as to recover spectrum 18. If the spectrum rearrangement enablement flag is enabled, i.e. spectrum rearrangement is applied, this indicates that the audio signal is likely to be tonal, as rearrangement tends to be more rate/distortion effective in compressing the data stream if there are many tonal peaks within the spectrum. Accordingly, additionally or alternatively, the spectrum rearrangement enablement flag may be used as a tonality hint, and the tonality used for noise filling may be set to be higher in case of the spectrum rearrangement enablement flag being enabled, and lower if the spectrum rearrangement enablement flag is disabled.
(61) For the sake of completeness, and also with reference to
(62) As far as the concept of imposing a spectrally global tilt on the noise, and of taking this tilt into account when computing the noise level parameter at the encoding side, is concerned, the encoder 100 may determine the global noise level 114, and insert same into the data stream, as follows: the portions of the audio signal's spectral values which are not yet quantized but already weighted with the inverse of the perceptual weighting function, and which are spectrally co-located to zero-portions 40a to 40d, are weighted with a function which spectrally extends at least over the whole noise filling portion of the spectrum bandwidth and which has a slope of opposite sign relative to the function 15 used at the decoding side for noise filling, for example, and the level is measured on the basis of the thus weighted non-quantized values.
(63)
(64) As already described with respect to
(65) It is noted that the noise which noise filler 30 spectrally shapes in the tonality dependent manner described above and/or subjects to a spectrally global tilt in a manner described above may stem from a pseudorandom noise source, or may be derived by noise filler 30 on the basis of spectral copying or patching from other areas of the same spectrum or from related spectra, such as a time-aligned spectrum of another channel, or a temporally preceding spectrum. Even patching from the same spectrum may be feasible, such as copying from lower frequency areas of spectrum 18 (spectral copy-up). Irrespective of the way the noise filler 30 derives the noise, filler 30 spectrally shapes the noise for filling into contiguous spectral zero-portions 40a to 40d in the tonality dependent manner described above and/or subjects same to a spectrally global tilt in a manner described above.
(66) For the sake of completeness only, it is shown in
(67) Even here, the noise filler 30 may apply the tonality dependent filling of the contiguous spectral zero-portions 40a to 40d exemplarily as shown in
(68) In accordance with the audio codec examples outlined above with respect to
(69)
(70)
(71) By way of dotted lines in
(72) Up to now, several embodiments have been described, and hereinafter specific implementation examples are presented. The details brought forward with respect to these examples shall be understood as being individually transferable onto the above embodiments to further specify same. Before that, however, it should be noted that all of the embodiments described above may be used in audio as well as speech coding. They generally refer to transform coding and use a signal adaptive concept for replacing the zeros introduced in the quantization process with spectrally shaped noise using a very small amount of side information. In the embodiments described above, the observation has been exploited that spectral holes sometimes also appear just below a noise filling starting frequency if any such starting frequency is used, and that such spectral holes are sometimes perceptually annoying. The above embodiments using an explicit signaling of the starting frequency allow for removing the holes that cause degradation while avoiding the insertion of noise at low frequencies wherever such insertion would introduce distortions.
(73) Moreover, some of the embodiments outlined above use a pre-emphasis controlled noise filling in order to compensate for the spectral tilt caused by the pre-emphasis. These embodiments take into account the observation that, if the LPC filter is calculated on a pre-emphasized signal, merely applying a global or average magnitude or average energy of the noise to be inserted would cause the noise shaping to introduce a spectral tilt in the inserted noise, as the FDNS at the decoding side would subject the spectrally flat inserted noise to a spectral shaping still showing the spectral tilt of the pre-emphasis. Accordingly, the latter embodiments perform the noise filling in such a manner that the spectral tilt from the pre-emphasis is taken into account and compensated.
(74) Thus, in other words,
(75) Further, the perceptual transform audio decoder comprises a frequency domain noise shaper 6 in form of dequantizer 132, 174, configured to subject the noise-filled spectrum to spectral shaping using a spectral perceptual weighting function. In case of
(76) Further, the perceptual transform audio decoder comprises an inverse transformer 134, 176 configured to inversely transform the noise-filled spectrum, spectrally shaped by the frequency domain noise shaper, to obtain an inverse transform, and subject the inverse transform to an overlap-add process.
(77) Correspondingly,
(78) The just-applied alternative and generalizing wording used to describe
(79)
(80) As shown in
(81) In order to control the noise filling to be performed at the decoding side so as to improve the spectrum 34, with regard to setting the level of the noise, a noise level computer 3 of the perceptual transform audio encoder may optionally be present which computes a noise level parameter by measuring a level of the perceptually weighted spectrum 4 at portions 5 co-located to zero-portions 40 of the quantized spectrum 34. The noise level parameter thus computed may also be coded in the aforementioned data stream so as to arrive at the decoder.
(82) The perceptual transform audio decoder is shown in
(83) The significance of filling spectrum 34 with noise 9 which exhibits a spectrally global tilt is the following: later, when the noise filled spectrum 36 is subject to the spectral shaping by frequency domain noise shaper 6, spectrum 36 will be subject to a tilted weighting function. For example, the spectrum will be amplified at the high frequencies when compared to a weighting of the low frequencies. That is, the level of spectrum 36 will be raised at higher frequencies relative to lower frequencies. This causes a spectrally global tilt with positive slope in originally spectrally flat portions of spectrum 36. Accordingly, if noise 9 were filled into spectrum 36 so as to fill the zero-portions 40 thereof in a spectrally flat manner, then the spectrum output by FDNS 6 would show within these portions 40 a noise floor which tends to increase from, for example, low to high frequencies. That is, when examining the whole spectrum or at least the portion of the spectrum bandwidth where noise filling is performed, one would see that the noise within portions 40 has a tendency or linear regression function with positive slope or negative slope. As noise filling apparatus 30, however, fills spectrum 34 with noise exhibiting a spectrally global tilt of positive or negative slope, indicated α in
(84) “Spectrally global tilt” shall denote that the noise 9 filled into spectrum 34 has a level which tends to decrease (or increase) from low to high frequencies. For example, when placing a linear regression line through local maxima of noise 9 as filled into, for example, mutually spectrally distanced, contiguous spectral zero portions 40, the resulting linear regression line has the negative (or positive) slope α.
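The regression-line reading of the spectrally global tilt can be made concrete with a small sketch. It assumes one peak per zero-portion is taken as the local maximum and uses an ordinary least-squares line; the names `portion_peaks` and `regression_slope` and the sample magnitudes are illustrative only.

```python
def portion_peaks(noise_mag, portions):
    """Return (index, magnitude) of the largest noise value in each zero-portion."""
    peaks = []
    for start, width in portions:
        segment = noise_mag[start:start + width]
        k = max(range(width), key=lambda j: segment[j])
        peaks.append((start + k, segment[k]))
    return peaks


def regression_slope(points):
    """Slope of the least-squares line through (x, y) points (needs >= 2 distinct x)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Noise magnitudes in three mutually distanced zero-portions (assumed values).
noise_mag = [0, 0.2, 1.0, 0.2, 0, 0, 0.15, 0.8, 0.15, 0, 0.1, 0.6, 0.1]
portions = [(1, 3), (6, 3), (10, 3)]
peaks = portion_peaks(noise_mag, portions)
alpha = regression_slope(peaks)
```

A negative result corresponds to a noise level that tends to decrease from low to high frequencies.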
(85) Although not mandatory, the perceptual transform audio encoder's noise level computer may account for the tilted way of filling noise into spectrum 34 by measuring the level of the perceptually weighted spectrum 4 at portions 5 in a manner weighted with a spectrally global tilt having, for example, a positive slope in case of α being negative and a negative slope if α is positive. The slope applied by the noise level computer, which is indicated as β in
(86) Later on it will be described that it may be feasible to control a variation of a slope of the spectrally global tilt α via explicit signaling in the data stream or via implicit signaling in that, for example, the noise filling apparatus 30 deduces the steepness from, for example, the spectral perceptual weighting function itself or from a transform window length switching. By the latter deduction, for example, the slope may be adapted to the window length.
(87) There are different manners feasible by way of which noise filling apparatus 30 causes the noise 9 to exhibit the spectrally global tilt.
(88) As will be described in more detail below, it would be feasible to adaptively set the portion of the whole spectrum within which noise filling is performed by noise filling apparatus 30.
(89) In connection with the embodiments outlined further below, according to which contiguous spectral zero-portions in spectrum 34, i.e. spectrum holes, are filled in a specific non-flat and tonality dependent manner, it will be explained that there are also alternatives for the multiplication 11 illustrated in
(90) All of the embodiments described above have in common that spectrum holes are avoided and that the concealing of tonal non-zero quantized lines is also avoided. In the manner described above, the energy in noisy parts of a signal may be preserved, and the adding of noise that masks tonal components is avoided.
(91) In the specific implementations described below, the part of the side information for performing the tonality dependent noise filling does not add anything to the existing side information of the codec where the noise filling is used. All information from the data stream that is used for the reconstruction of the spectrum, regardless of the noise filling, may also be used for the shaping of the noise filling.
(92) In accordance with an implementation example, the noise filling in noise filler 30 is performed as follows. All spectral lines above a noise filling start index that are quantized to zero are replaced with a non-zero value. This is done, for example, in a random or pseudorandom manner with spectrally constant probability density function or using patching from other spectral spectrogram locations (sources). See, for example,
(93) The inserted noise is shaped in the following steps: 1. In the residual domain or weighted domain. The shaping in the residual domain or weighted domain has been extensively described above with respect to
(94) The only additional side info needed for the noise filling is the level, which is transmitted using 3 bits, for example.
(95) When FDNS is used, there is no need to adapt it to a specific noise filling, and it shapes the noise over the complete spectrum using a smaller number of bits than the scale factors.
(96) A spectral tilt may be introduced in the inserted noise to counteract the spectral tilt from the pre-emphasis in the LPC-based perceptual noise shaping. Since the pre-emphasis represents a gentle high-pass filter applied to the input signal, the tilt compensation may counteract this by multiplying the equivalent of the transfer function of a subtle low-pass filter onto the inserted noise spectrum. The spectral tilt of this low-pass operation is dependent on the pre-emphasis factor and, advantageously, bit-rate and bandwidth. This was discussed referring to
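A minimal Python sketch of such a tilt compensation is given below. It assumes a first-order pre-emphasis filter H(z) = 1 − μz⁻¹, whose magnitude response is sqrt(1 + μ² − 2μ·cos ω), and derives a gentle low-pass weighting as a fractional inverse power of that response. The names `preemphasis_gain` and `apply_tilt`, the value μ = 0.68 and the `strength` exponent are hypothetical and are chosen only for illustration; the text does not specify the exact tilt function.

```python
import math

def preemphasis_gain(i, num_lines, mu):
    """Magnitude of H(z) = 1 - mu*z^-1 at the centre of spectral line i."""
    omega = math.pi * (i + 0.5) / num_lines  # omega runs over (0, pi)
    return math.sqrt(1.0 + mu * mu - 2.0 * mu * math.cos(omega))

def apply_tilt(noise, mu=0.68, strength=0.25):
    """Multiply the inserted noise by a subtle low-pass weighting.

    Assumption: the weighting is the pre-emphasis response raised to
    -strength; strength could depend on bit-rate and bandwidth.
    """
    n = len(noise)
    return [v * preemphasis_gain(i, n, mu) ** (-strength)
            for i, v in enumerate(noise)]

flat_noise = [1.0] * 8
print([round(v, 3) for v in apply_tilt(flat_noise)])
```

With a flat input, the output magnitudes fall monotonically toward higher frequencies, which is the behaviour expected of a low-pass tilt.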
(97) For each spectral hole, constituted from 1 or more consecutive zero-quantized spectral lines, the inserted noise may be shaped as depicted in
(98) The transition width is dependent on the tonality of the input signal. The tonality is obtained for each time frame. In
(99) The tonality measure of the spectrum may be based on the information available in the bitstream: the LTP gain, the spectrum rearrangement enabled flag (see [6]), and the TNS enabled flag.
(100) The transition width is proportional to the tonality: small for noise-like signals, large for very tonal signals.
(101) In an embodiment, the transition width is proportional to the LTP gain if the LTP gain>0. If the LTP gain is equal to 0 and the spectrum rearrangement is enabled then the transition width for the average LTP gain is used. If the TNS is enabled then there is no transition area, but the full noise filling should be applied to all zero-quantized spectral lines. If the LTP gain is equal to 0 and the TNS and the spectrum rearrangement are disabled, a minimum transition width is used.
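The decision rule of this embodiment may be sketched in Python as follows. The function name, the proportionality constant `scale`, the `min_width` value and the `average_ltp_gain` value are hypothetical. The text does not state which rule prevails when TNS is enabled and the LTP gain is positive at the same time; the sketch assumes that the TNS rule takes precedence.

```python
def transition_width(ltp_gain, rearrangement_enabled, tns_enabled,
                     scale=8.0, min_width=1.0, average_ltp_gain=0.25):
    """Pick a transition width (in spectral lines) from bitstream information.

    Assumptions: scale, min_width and average_ltp_gain are illustrative
    constants, and the TNS flag is checked first.
    """
    if tns_enabled:
        return 0.0                        # no transition area, full noise filling
    if ltp_gain > 0:
        return scale * ltp_gain           # proportional to the LTP gain
    if rearrangement_enabled:
        return scale * average_ltp_gain   # width for the average LTP gain
    return min_width                      # no tonality indication at all

print(transition_width(0.5, False, False))
print(transition_width(0.0, True, False))
print(transition_width(0.0, False, False))
```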
(102) If there is no tonality information in the bitstream a tonality measure may be calculated on the decoded signal without the noise filling. If there is no TNS information, a temporal flatness measure may be calculated on the decoded signal. If, however, TNS information is available, such a flatness measure may be derived from the TNS filter coefficients directly, e.g. by computing the filter's prediction gain.
(103) In the encoder, the noise filling level may be calculated by taking the transition width into account. Several ways to determine the noise filling level from the quantized spectrum are possible. The simplest is to sum up the energy (square) of all lines of the normalized input spectrum in the noise filling region (i.e. above iStart) which were quantized to zero, then to divide this sum by the number of such lines to obtain the average energy per line, and to finally compute a quantized noise level from the square root of the average line energy. In this way, the noise level is effectively derived from the RMS of the spectral components quantized to zero. Let, for example, $A$ be the set of indices $i$ of spectral lines where the spectrum has been quantized to zero and which belong to any of the zero-portions, e.g. are above the start frequency, and let $N$ denote the global noise scaling factor. The values of the spectrum as not yet quantized shall be denoted $y_i$. Further, $\mathrm{left}(i)$ shall be a function indicating for any zero-quantized spectral value at index $i$ the index of the zero-quantized value at the low-frequency end of the zero-portion to which $i$ belongs, and $F_i(j)$ with $j = 0$ to $J_i - 1$ shall denote the function assigned to, depending on the tonality, the zero-portion starting at index $i$, with $J_i$ indicating the width of that zero-portion. Then, $N$ may be determined by $N = \sqrt{\sum_{i \in A} y_i^2 \,/\, \operatorname{cardinality}(A)}$.
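The simple RMS-based computation may be illustrated by the following Python sketch. The function and parameter names are assumptions made for this sketch, as is the convention that the start index itself belongs to the noise filling region; the subsequent quantization of the level to a few bits is omitted.

```python
import math

def noise_level_rms(unquantized, quantized, start_index):
    """RMS of the unquantized lines that were quantized to zero.

    unquantized: normalized input spectrum before quantization (y_i)
    quantized:   the same spectrum after quantization
    """
    zero_lines = [unquantized[i]
                  for i in range(start_index, len(quantized))
                  if quantized[i] == 0]      # the index set A
    if not zero_lines:
        return 0.0                           # nothing to fill
    energy = sum(y * y for y in zero_lines)
    return math.sqrt(energy / len(zero_lines))

y = [9.0, 0.3, 0.4, 7.0, 0.0, 0.0]
q = [9, 0, 0, 7, 0, 0]
print(noise_level_rms(y, q, start_index=1))
```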
(104) In the embodiment, the individual hole sizes as well as the transition width are considered. To this end, runs of consecutive zero-quantized lines are grouped into hole regions. Each normalized input spectral line in a hole region, i.e. each spectral value of the original signal at a spectral position within any contiguous spectral zero-portion, is then scaled by the transition function, as described in the previous section, and subsequently the sum of the energies of the scaled lines is calculated. As in the previous simple embodiment, the noise filling level can then be computed from the RMS of the zero-quantized lines. Applying the above nomenclature, $N$ may be computed as $N = \sqrt{\sum_{i \in A} \left(F_{\mathrm{left}(i)}(i - \mathrm{left}(i)) \cdot y_i\right)^2 \,/\, \operatorname{cardinality}(A)}$.
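The hole grouping and the weighted level computation may be sketched in Python as follows. The trapezoid-like `transition_weight` function, which ramps up linearly from both hole edges and is flat in the middle, is a hypothetical stand-in for the transition function of the embodiment; all names are assumptions made for this sketch.

```python
import math

def find_holes(quantized, start_index):
    """Return (left_index, width) for each run of consecutive zero lines."""
    holes, i, n = [], start_index, len(quantized)
    while i < n:
        if quantized[i] == 0:
            j = i
            while j < n and quantized[j] == 0:
                j += 1
            holes.append((i, j - i))
            i = j
        else:
            i += 1
    return holes

def transition_weight(j, width, transition_width):
    """Hypothetical F(j): attenuated near the hole edges, 1.0 in the middle."""
    if transition_width <= 0:
        return 1.0
    distance_to_edge = min(j, width - 1 - j) + 1
    return min(1.0, distance_to_edge / (transition_width + 1))

def shaped_noise_level(unquantized, quantized, start_index, transition_width):
    """RMS of the zero-quantized lines after scaling by the transition function."""
    energy, count = 0.0, 0
    for left, width in find_holes(quantized, start_index):
        for j in range(width):
            w = transition_weight(j, width, transition_width)
            energy += (w * unquantized[left + j]) ** 2
            count += 1                      # cardinality(A), counted as-is
    return math.sqrt(energy / count) if count else 0.0

y = [9.0, 0.3, 0.4, 0.0, 0.0, 7.0]
q = [9, 0, 0, 0, 0, 7]
print(shaped_noise_level(y, q, 1, transition_width=0))
print(shaped_noise_level(y, q, 1, transition_width=1))
```

With a transition width of zero the result equals the plain RMS; with a non-zero width the lines near the hole edges contribute less, so the level drops.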
(105) A problem with this approach, however, is that the spectral energy in small hole regions (i.e. regions with a width of much less than twice the transition width) is underestimated since, in the RMS calculation, the number of spectral lines by which the energy sum is divided is unchanged. In other words, when the quantized spectrum exhibits mostly small hole regions, the resulting noise filling level will be lower than when the spectrum is sparse and has only a few long hole regions. To ensure that in both of these cases a similar noise level is found, it is therefore advantageous to adapt the line-count used in the denominator of the RMS computation to the transition width. Most importantly, if a hole region size is smaller than twice the transition width, the number of spectral lines in that hole region is not counted as-is, i.e. as an integer number of lines, but as a fractional line-number which is less than the integer line-number. In the above formula concerning N, for example, the “cardinality(A)” would be replaced by a smaller number depending on the number of “small” zero-portions.
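One conceivable way of obtaining such a fractional line count is sketched below in Python. The text does not specify how the fraction is derived; the sketch assumes, purely for illustration, that a small hole contributes the sum of its squared transition weights, and it reuses a hypothetical trapezoid-like weight function. All names are assumptions.

```python
def transition_weight(j, width, transition_width):
    """Hypothetical F(j): attenuated near the hole edges, 1.0 in the middle."""
    if transition_width <= 0:
        return 1.0
    distance_to_edge = min(j, width - 1 - j) + 1
    return min(1.0, distance_to_edge / (transition_width + 1))

def effective_line_count(hole_widths, transition_width):
    """Denominator for the RMS: small holes count as fractional line numbers.

    Assumption: a hole narrower than twice the transition width counts as
    the sum of its squared weights; wider holes are counted as-is.
    """
    total = 0.0
    for width in hole_widths:
        if width < 2 * transition_width:
            total += sum(transition_weight(j, width, transition_width) ** 2
                         for j in range(width))
        else:
            total += width
    return total

print(effective_line_count([10], transition_width=2))   # long hole, counted fully
print(effective_line_count([2], transition_width=2))    # small hole, fractional
```

Dividing the weighted energy sum by this reduced count instead of cardinality(A) raises the level for spectra dominated by small holes, which is the effect the paragraph above asks for.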
(106) Furthermore, the compensation of the spectral tilt in the noise filling due to the LPC-based perceptual coding should also be taken into account during the noise level calculation. More specifically, the inverse of the decoder-side noise filling tilt compensation is applied to the original unquantized spectral lines which were quantized to zero, before the noise level is computed. In the context of LPC-based coding employing pre-emphasis, this implies that higher-frequency lines are amplified slightly with respect to lower-frequency lines prior to the noise level estimation. Applying the above nomenclature, $N$ may be computed as $N = \sqrt{\sum_{i \in A} \left(F_{\mathrm{left}(i)}(i - \mathrm{left}(i)) \cdot \mathrm{LPF}(i)^{-1} \cdot y_i\right)^2 \,/\, \operatorname{cardinality}(A)}$. As mentioned above, depending on the circumstances, the function LPF, which corresponds to function 15, may have a positive slope, in which case LPF would read HPF accordingly. It is briefly noted that, in all of the above formulae using “LPF”, setting $F_{\mathrm{left}}$ to a constant function, such as all ones, yields a way of applying the concept of subjecting the noise to be filled into the spectrum 34 to a spectrally global tilt without the tonality-dependent hole filling.
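The tilt-compensated level computation may be sketched in Python as follows for the special case noted above, in which the transition function is set to all ones. The exponential form of `lpf_tilt` and the `tilt` parameter are hypothetical; the text only requires a gentle low-pass characteristic whose inverse is applied before the level is estimated.

```python
import math

def lpf_tilt(i, num_lines, tilt=0.5):
    """Hypothetical LPF(i): 1.0 at line 0, falling to `tilt` at the top line."""
    return tilt ** (i / max(1, num_lines - 1))

def noise_level_tilt_compensated(unquantized, quantized, start_index, tilt=0.5):
    """RMS of zero-quantized lines after applying the inverse decoder tilt."""
    n = len(quantized)
    energy, count = 0.0, 0
    for i in range(start_index, n):
        if quantized[i] == 0:
            # Dividing by LPF(i) amplifies higher-frequency lines slightly.
            energy += (unquantized[i] / lpf_tilt(i, n, tilt)) ** 2
            count += 1
    return math.sqrt(energy / count) if count else 0.0

y = [9.0, 0.3, 0.4, 0.0, 0.0]
q = [9, 0, 0, 0, 0]
print(noise_level_tilt_compensated(y, q, 1, tilt=1.0))  # no tilt: plain RMS
print(noise_level_tilt_compensated(y, q, 1, tilt=0.5))  # with tilt compensation
```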
(107) The possible computations of N may be performed in the encoder such as, for example, in 108 or 154.
(108) Furthermore, it was found that when harmonics of a very tonal, stationary signal were quantized to zero, the lines representing these harmonics led to a relatively high or unstable (i.e. time-fluctuating) noise level. This artifact can be reduced by using, in the noise level calculation, the average magnitude of the zero-quantized lines instead of their RMS. While this alternative approach does not guarantee that the energy of the noise filled lines in the decoder reproduces the energy of the original lines in the noise filling regions, it does ensure that spectral peaks in the noise filling regions have only a limited contribution to the overall noise level, thereby reducing the risk of overestimation of the noise level.
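The difference between the two level estimates may be illustrated by the following Python sketch, in which one zero-quantized line represents a strong harmonic. The function names and the example values are assumptions made for this sketch.

```python
import math

def level_rms(zero_lines):
    """Noise level from the RMS of the zero-quantized lines."""
    return math.sqrt(sum(y * y for y in zero_lines) / len(zero_lines))

def level_mean_magnitude(zero_lines):
    """Noise level from the average magnitude of the zero-quantized lines."""
    return sum(abs(y) for y in zero_lines) / len(zero_lines)

# Three noise-like lines and one harmonic peak, all quantized to zero.
lines = [0.1, -0.1, 0.1, 0.9]
print(level_rms(lines))             # dominated by the peak
print(level_mean_magnitude(lines))  # peak contributes only linearly
```

Because the peak enters the average magnitude linearly rather than quadratically, the second estimate is lower and less sensitive to isolated harmonics.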
(109) Finally, it is noted that an encoder may even be configured to perform the noise filling completely in order to keep itself in line with the decoder such as, for example, for analysis by synthesis purposes.
(110) Thus, the above embodiment, inter alia, describes a signal adaptive method for replacing the zeros introduced in the quantization process with spectrally shaped noise. A noise filling extension for an encoder and a decoder is described that fulfills the abovementioned requirements by implementing the following:
The noise filling start index may be adapted to the result of the spectrum quantization, but is limited to a certain range
A spectral tilt may be introduced in the inserted noise to counteract the spectral tilt from the perceptual noise shaping
All zero-quantized lines above the noise filling start index are replaced with noise
By means of a transition function, the inserted noise is attenuated close to the spectral lines not quantized to zero
The transition function is dependent on the instantaneous characteristics of the input signal
The adaptation of the noise filling start index, the spectral tilt and the transition function may be based on the information available in the decoder
There is no need for additional side information, except for a noise filling level
(111) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
(112) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
(113) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(114) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
(115) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
(116) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(117) A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium is typically tangible and/or non-transitory.
(118) A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
(119) A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
(120) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(121) A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
(122) In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
(123) The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
(124) The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
(125) While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
REFERENCES
(126) [1] B. G. G. F. S. G. M. M. H. P. J. H. S. W. G. S. J. H. Nikolaus Rettelbach, “Noise Filler, Noise Filling Parameter Calculator Encoded Audio Signal Representation, Methods and Computer Program”. Patent US 2011/0173012 A1.
[2] Extended Adaptive Multi-Rate-Wideband (AMR-WB+) codec, 3GPP TS 26.290 V6.3.0, 2005-2006.
[3] B. G. G. F. S. G. M. M. H. P. J. H. S. W. G. S. J. H. Nikolaus Rettelbach, “Audio encoder, audio decoder, methods for encoding and decoding an audio signal, audio stream and computer program”. Patent WO 2010/003556 A1.
[4] M. M. N. R. G. F. J. R. J. L. S. W. S. B. S. D. C. H. R. L. P. G. B. B. J. L. K. K. H. Max Neuendorf, “MPEG Unified Speech and Audio Coding - The ISO/MPEG Standard for High-Efficiency Audio Coding of all Content Types,” in 132nd Convention AES, Budapest, 2012. Also appears in the Journal of the AES, vol. 61, 2013.
[5] M. M. M. N. a. R. G. Guillaume Fuchs, “MDCT-Based Coder for Highly Adaptive Speech and Audio Coding,” in 17th European Signal Processing Conference (EUSIPCO 2009), Glasgow, 2009.
[6] H. Y. K. Y. M. T. Harada Noboru, “Coding Method, Decoding Method, Coding Device, Decoding Device, Program, and Recording Medium”. Patent WO 2012/046685 A1.