AUDIO ENCODER AND DECODER USING A FREQUENCY DOMAIN PROCESSOR WITH FULL-BAND GAP FILLING AND A TIME DOMAIN PROCESSOR
20170256267 · 2017-09-07
Inventors
- Sascha Disch (Furth, DE)
- Martin Dietz (Nurnberg, DE)
- Markus MULTRUS (Nurnberg, DE)
- Guillaume Fuchs (Bubenreuth, DE)
- Emmanuel Ravelli (Erlangen, DE)
- Matthias Neusinger (Rohr, DE)
- Markus SCHNELL (Nurnberg, DE)
- Benjamin SCHUBERT (Nurnberg, DE)
- Bernhard GRILL (Ruckersdorf, DE)
Cpc classification
G10L19/20
PHYSICS
G10L19/06
PHYSICS
G10L19/02
PHYSICS
G10L19/265
PHYSICS
G10L19/24
PHYSICS
International classification
G10L19/06
PHYSICS
Abstract
An audio encoder for encoding an audio signal has: a first encoding processor for encoding a first audio signal portion in a frequency domain, having: a time frequency converter for converting the first audio signal portion into a frequency domain representation; an analyzer for analyzing the frequency domain representation to determine first spectral portions to be encoded with a first spectral resolution and second regions to be encoded with a second resolution; and a spectral encoder for encoding the first spectral portions with the first spectral resolution and encoding the second portions with the second resolution; a second encoding processor for encoding a second different audio signal portion in the time domain; a controller for analyzing and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion is the second audio signal portion encoded in the time domain; and an encoded signal former for forming an encoded audio signal having a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second portion.
Claims
1. An audio encoder for encoding an audio signal, comprising: a first encoding processor for encoding a first audio signal portion in a frequency domain, wherein the first encoding processor comprises: a time frequency converter for converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion; an analyzer for analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzer is configured to determine a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions; a spectral encoder for encoding the first spectral portions with the first spectral resolution and for encoding the second spectral portions with the second spectral resolution, wherein the spectral encoder comprises a parametric coder for calculating spectral envelope information comprising the second spectral resolution from the second spectral portions; a second encoding processor for encoding a second different audio signal portion in the time domain, wherein the second encoding processor comprises: a sampling rate converter for converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise the high band of the input signal; a time domain low band encoder for time domain encoding the lower sampling rate representation; and a time domain bandwidth extension encoder for parametrically encoding the high band; a controller configured for 
analyzing the audio signal and for determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and an encoded signal former for forming an encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion.
2. The audio encoder of claim 1, further comprising: a preprocessor configured for preprocessing the first audio signal portion and the second audio signal portion, wherein the preprocessor comprises: a prediction analyzer for determining prediction coefficients; and wherein the second encoding processor comprises: a prediction coefficient quantizer for generating a quantized version of the prediction coefficients; and an entropy coder for generating an encoded version of the quantized prediction coefficients, wherein the encoded signal former is configured for introducing the encoded version into the encoded audio signal.
3. The audio encoder of claim 1, wherein a preprocessor comprises a resampler for resampling the audio signal to a sampling rate of the second encoding processor; and wherein a prediction analyzer is configured to determine the prediction coefficients using a resampled audio signal, or wherein the preprocessor further comprises a long term prediction analysis stage for determining one or more long term prediction parameters for the first audio signal portion.
4. The audio encoder of claim 1, further comprising a cross-processor for calculating, from the encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, so that the second encoding processing is initialized to encode the second audio signal portion immediately following the first audio signal portion in time in the audio signal.
5. The audio encoder of claim 4, wherein the cross-processor comprises: a spectral decoder for calculating a decoded version of the first encoded signal portion; a delay stage for feeding a delayed version of the decoded version into a de-emphasis stage of the second encoding processor for initialization; a weighted prediction coefficient analysis filtering block for feeding a filter output into a codebook determinator of the second encoding processor for initialization; an analysis filtering stage for filtering the decoded version or a pre-emphasized version and for feeding a filter residual into an adaptive codebook determinator of the second encoding processor for initialization; or a pre-emphasis filter for filtering the decoded version and for feeding a delayed or pre-emphasized version to a synthesis filtering stage of the second encoding processor for initialization.
6. The audio encoder of claim 1, wherein the analyzer is configured to perform a temporal tile shaping or temporal noise shaping analysis or an operation of setting to zero spectral values in the second spectral portions, wherein the first encoding processor is configured to perform a shaping of spectral values of the first spectral portions using prediction coefficients derived from the first audio signal portion, and wherein the first encoding processor is furthermore configured to perform a quantization and entropy coding operation of shaped spectral values of the first spectral portions, and wherein spectral values of the second spectral portions are set to zero.
7. The audio encoder of claim 6, further comprising a cross-processor, wherein the cross-processor comprises: a noise shaper for shaping quantized spectral values of the first spectral portions using LPC coefficients derived from the first audio signal portion; a spectral decoder for decoding the spectrally shaped spectral portions of the first spectral portion with a high spectral resolution and for synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation; a frequency-time converter for converting the spectral representation into a time domain to acquire a decoded first audio signal portion, wherein a sampling rate associated with the decoded first audio signal portion is different than a sampling rate of the audio signal, and a sampling rate associated with an output signal of the frequency-time converter is different from a sampling rate of the audio signal input into the frequency-time converter.
8. The audio encoder of claim 1, wherein the second encoding processor comprises at least one block of the following group of blocks: a prediction analysis filter; an adaptive codebook stage; an innovative codebook stage; an estimator for estimating an innovative codebook entry; an ACELP/gain coding stage; a prediction synthesis filtering stage; a de-emphasis stage; and a bass post-filter analysis stage.
9. The audio encoder of claim 1, wherein the time domain encoding processor comprises an associated second sampling rate, wherein the frequency domain encoding processor has associated therewith a first sampling rate being higher than the second sampling rate, wherein the audio encoder further comprises a cross-processor for calculating, from the encoded spectral representation of the first audio signal portion, initialization data of the second encoding processor, wherein the cross-processor comprises a frequency-time converter for generating a time domain signal at the second sampling rate, wherein the frequency time converter comprises: a selector for selecting a low portion of a spectrum input into the frequency time converter in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1, a transform processor comprising a transform length being smaller than a transform length of the time-frequency converter; and a synthesis windower for windowing using a window comprising a smaller number of window coefficients compared to a window used by the time frequency converter.
10. An audio decoder for decoding an encoded audio signal, comprising: a first decoding processor for decoding a first encoded audio signal portion in a frequency domain, the first decoding processor comprising: a spectral decoder for decoding first spectral portions with a high spectral resolution and for synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein the spectral decoder is configured to generate the first decoded representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and a frequency-time converter for converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion; a second decoding processor for decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding processor comprises: a time domain low band decoder for decoding a low band time domain signal; an upsampler for upsampling the low band time domain signal; a time domain bandwidth extension decoder for synthesizing a high band of a time domain output signal; and a mixer for mixing a synthesized high band of the time domain signal and an upsampled low band time domain signal; and a combiner for combining the decoded first spectral portion and the decoded second spectral portion to acquire a decoded audio signal.
11. The audio decoder of claim 10, wherein the upsampler comprises an analysis filterbank operating at a first time domain low band decoder sampling rate and a synthesis filterbank operating at a second output sampling rate being higher than the first time domain low band sampling rate.
12. The audio decoder of claim 10, wherein the time domain low band decoder comprises a residual signal, a decoder and a synthesis filter for filtering a residual signal using synthesis filter coefficients, wherein the time domain bandwidth extension decoder is configured to upsample the residual signal and to process an upsampled residual signal using a non-linear operation to acquire a high band residual signal, and to spectrally shape the high band residual signal to acquire the synthesized high band.
13. The audio decoder of claim 10, wherein the first decoding processor comprises an adaptive long term prediction post-filter for post-filtering the first decoded first signal portion, wherein the filter is controlled by one or more long term prediction parameters comprised in the encoded audio signal.
14. The audio decoder of claim 10, further comprising: a cross-processor for calculating, from the decoded spectral representation of the first encoded audio signal portion, initialization data of the second decoding processor, so that the second decoding processor is initialized to decode the encoded second audio signal portion following in time the first audio signal portion in the encoded audio signal.
15. The audio decoder of claim 14, wherein the cross-processor further comprises: a frequency-time converter operating at a lower sampling rate than the frequency-time converter of the first decoding processor to acquire a further decoded first signal portion in the time domain, wherein the signal output by the frequency-time converter comprises a second sampling rate being lower than the first sampling rate associated with an output of the frequency-time converter of the second decoding processor, wherein the additional frequency-time converter comprises a selector for selecting a low portion of a spectrum input into the additionally frequency-time converter in accordance with a ratio of the first sampling rate and the second sampling rate, the ratio being smaller than 1; a transform processor comprising a transform length being smaller than a transform length of the time-frequency converter; and a synthesis windower using a window comprising a smaller number of coefficients compared to a window used by the frequency-time converter.
16. The audio decoder of claim 14, wherein the cross-processor comprises: a delay stage for delaying the further decoded first signal portion and for feeding a delayed version of the decoded first signal portion into a de-emphasis stage of the second decoding processor for initialization; a pre-emphasis filter and a delay stage for filtering and delaying the further decoded first signal portion and for feeding a delay stage output into a prediction synthesis filter of the second decoding processor for initialization; a prediction analysis filter for generating a prediction residual signal from the further decoded first spectral portion or a pre-emphasized further decoded first signal portion and for feeding a prediction residual signal into a codebook synthesizer of the second decoding processor; or a switch for feeding the further decoded first signal portion into an analysis stage of a resampler of the second decoding processor for initialization.
17. The audio decoder of claim 10, wherein the second decoding processor comprises at least one block of the group of blocks comprising: an ACELP for decoding gains and an innovative codebook; an adaptive codebook synthesis stage; an ACELP post-processor; a prediction synthesis filter; and a de-emphasis stage.
18. A method of encoding an audio signal, comprising: first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises: converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion; analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions; encoding the first spectral portions with the first spectral resolution and for encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution; second encoding a second different audio signal portion in the time domain wherein the second encoding comprises: converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise the high band of the input signal; time domain encoding the lower sampling rate representation; and parametrically encoding the high band; analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal portion encoded in the time domain; and forming an encoded audio signal comprising a first encoded 
signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion.
19. A method of decoding an encoded audio signal, comprising: first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising: decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the first decoded representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion; second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises: decoding a low band time domain signal; upsampling the low band time domain signal; synthesizing a high band of a time domain output signal; and mixing a synthesized high band of the time domain signal and an upsampled low band time domain signal; and combining the decoded first spectral portion and the decoded second spectral portion to acquire a decoded audio signal.
20. A non-transitory digital storage medium having stored thereon a computer program for performing a method of encoding an audio signal, comprising: first encoding a first audio signal portion in a frequency domain, wherein the first encoding comprises: converting the first audio signal portion into a frequency domain representation comprising spectral lines up to a maximum frequency of the first audio signal portion; analyzing the frequency domain representation up to the maximum frequency to determine first spectral portions to be encoded with a first spectral resolution and second spectral portions to be encoded with a second spectral resolution, the second spectral resolution being lower than the first spectral resolution, wherein the analyzing determines a first spectral portion from the first spectral portions, the first spectral portion being placed, with respect to frequency, between two second spectral portions from the second spectral portions; encoding the first spectral portions with the first spectral resolution and for encoding the second spectral portions with the second spectral resolution, wherein the encoding the second spectral portion comprises calculating, from the second spectral portions, spectral envelope information comprising the second spectral resolution; second encoding a second different audio signal portion in the time domain wherein the second encoding comprises: converting the second audio signal portion to a lower sampling rate representation, the lower sampling rate being lower than a sampling rate of the audio signal, wherein the lower sampling rate representation does not comprise the high band of the input signal; time domain encoding the lower sampling rate representation; and parametrically encoding the high band; analyzing the audio signal and determining, which portion of the audio signal is the first audio signal portion encoded in the frequency domain and which portion of the audio signal is the second audio signal 
portion encoded in the time domain; and forming an encoded audio signal comprising a first encoded signal portion for the first audio signal portion and a second encoded signal portion for the second audio signal portion, when said computer program is run by a computer.
21. A non-transitory digital storage medium having stored thereon a computer program for performing a method of decoding an encoded audio signal, comprising: first decoding a first encoded audio signal portion in a frequency domain, the first decoding comprising: decoding first spectral portions with a high spectral resolution and synthesizing second spectral portions using a parametric representation of the second spectral portions and at least a decoded first spectral portion to acquire a decoded spectral representation, wherein decoding comprises generating the first decoded representation so that a first spectral portion is placed with respect to frequency between two second spectral portions; and converting the decoded spectral representation into a time domain to acquire a decoded first audio signal portion; second decoding a second encoded audio signal portion in the time domain to acquire a decoded second audio signal portion, wherein the second decoding comprises: decoding a low band time domain signal; upsampling the low band time domain signal; synthesizing a high band of a time domain output signal; and mixing a synthesized high band of the time domain signal and an upsampled low band time domain signal; and combining the decoded first spectral portion and the decoded second spectral portion to acquire a decoded audio signal, when said computer program is run by a computer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0054] Embodiments of the present invention will subsequently be discussed with respect to the accompanying drawings in which:
DETAILED DESCRIPTION OF THE INVENTION
[0079]
[0080] The audio encoder of
[0081] Hence, the controller 620 makes sure that for a single audio signal portion only a time domain representation or a frequency domain representation is in the encoded signal. This can be accomplished by the controller 620 in several ways. One way would be that, for one and the same audio signal portion, both representations arrive at block 630 and the controller 620 controls the encoded signal former 630 to only introduce one of both representations into the encoded signal. Alternatively, however, the controller 620 can control an input into the first encoding processor and an input into the second encoding processor so that, based on the analysis of the corresponding signal portion, only one of both blocks 600 or 610 is activated to actually perform the full encoding operation and the other block is deactivated.
[0082] This deactivation can be a deactivation or, as illustrated with respect to, for example,
[0083] In the further specific implementation of the second encoding processor operating in the time domain, the second encoding processor comprises a downsampler 900 or sampling rate converter for converting the audio signal portion into a representation with a lower sampling rate, wherein the lower sampling rate is lower than a sampling rate at the input into the first encoding processor. This is illustrated in
[0084] In a further embodiment of the present invention the audio encoder additionally comprises, although not illustrated in
[0085] Furthermore, the preprocessor additionally comprises an entropy coder for generating an encoded version of the quantized prediction coefficients. It is important to note that the encoded signal former 630 or its specific implementation, i.e., the bit stream multiplexor 613, makes sure that the encoded version of the quantized prediction coefficients is included in the encoded audio signal 632. Advantageously, the LPC coefficients are not quantized directly but are first converted into, for example, an ISF representation or any other representation better suited for quantization. This conversion may be performed either by the block 1002 for determining the LPC coefficients or within the block 1010 for quantizing the LPC coefficients.
[0086] Furthermore, the preprocessor may comprise a resampler 1004 for resampling an audio input signal from an input sampling rate to a lower sampling rate for the time domain encoder. When the time domain encoder is an ACELP encoder having a certain ACELP sampling rate, the downsampling is advantageously performed to either 12.8 kHz or 16 kHz. The input sampling rate can be any of a particular number of sampling rates such as 32 kHz or an even higher sampling rate. On the other hand, the sampling rate of the time domain encoder will be predetermined by certain restrictions and the resampler 1004 performs this resampling and outputs the lower sampling rate representation of the input signal. Hence, the resampler 1004 can perform a similar functionality and can even be one and the same element as the downsampler 900 illustrated in the context of
[0087] Furthermore, it is of advantage to apply a pre-emphasis in the pre-emphasis block 1005 in
[0088] Furthermore, the preprocessor may additionally comprise a TCX-LTP parameter extraction for controlling an LTP post filter illustrated at 1420 in
[0089] As illustrated, the result of block 1006 is input into the encoded signal, i.e., is in the embodiment of
[0090] Hence, to summarize, common to both paths is a preprocessing operation 1000 in which commonly used signal processing operations are performed. These comprise a resampling to an ACELP sampling rate (12.8 or 16 kHz), which is performed for one parallel path. Furthermore, a TCX LTP parameter extraction illustrated at block 1006 is performed and, additionally, a pre-emphasis and a determination of LPC coefficients are performed. As outlined, the pre-emphasis compensates for the spectral tilt and, therefore, makes the calculation of LPC parameters at a given LPC order more efficient.
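The pre-emphasis and LPC determination summarized above can be sketched as follows. This is a minimal illustration and not the implementation of blocks 1005 and 1002: the first-order pre-emphasis coefficient of 0.68, the LPC order of 16, the use of the autocorrelation method with a Levinson-Durbin recursion, and all function names are assumptions made for the example. Analysis windowing before the autocorrelation is omitted for brevity.

```python
import numpy as np

def pre_emphasis(x, beta=0.68):
    """First-order high-pass y[n] = x[n] - beta * x[n-1] (beta is an assumed value)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= beta * x[:-1]
    return y

def autocorrelation(x, order):
    """Autocorrelation lags 0..order of one frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]  # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k                   # remaining prediction error energy
    return a, err

def lpc_from_frame(frame, order=16, beta=0.68):
    """Pre-emphasize a frame, then derive LPC coefficients from it."""
    return levinson_durbin(autocorrelation(pre_emphasis(frame, beta), order), order)
```

Because the pre-emphasis flattens the spectral tilt before the autocorrelation is computed, fewer of the available LPC coefficients are spent on modelling the overall slope of the spectrum.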
[0091] Subsequently, reference is made to
[0092] Based on this audio signal portion, the controller 620 addresses a frequency domain encoder simulator 621 and a time domain encoder simulator 622 in order to calculate an estimated signal to noise ratio for each encoder possibility. Subsequently, the selector 623 selects the encoder which has provided the better signal to noise ratio, naturally under the consideration of a predefined bit rate. The selector then identifies the corresponding encoder via the control output. When it is determined that the audio signal portion under consideration is to be encoded using the frequency domain encoder, the time domain encoder is set into an initialization state or, in other embodiments not requiring instant switching, into a completely deactivated state. However, when it is determined that the audio signal portion under consideration is to be encoded by the time domain encoder, the frequency domain encoder is then deactivated.
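The selection by estimated signal to noise ratio can be sketched as follows. This is a minimal illustration assuming that each encoder simulator is available as a callable returning a reconstructed version of the audio signal portion at the predefined bit rate. The names `snr_db`, `select_encoder`, `simulate_fd` and `simulate_td` are hypothetical and do not come from the description.

```python
import numpy as np

def snr_db(reference, estimate, eps=1e-12):
    """Signal to noise ratio in dB between a portion and its reconstruction."""
    reference = np.asarray(reference, dtype=float)
    noise = reference - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))

def select_encoder(portion, simulate_fd, simulate_td):
    """Return ("FD" or "TD", snr) for the simulator with the better estimated SNR.

    simulate_fd and simulate_td are hypothetical stand-ins for the simulators
    621 and 622; both are assumed to operate at the same predefined bit rate.
    """
    snr_fd = snr_db(portion, simulate_fd(portion))
    snr_td = snr_db(portion, simulate_td(portion))
    return ("FD", snr_fd) if snr_fd >= snr_td else ("TD", snr_td)
```

The returned label plays the role of the control output: the non-selected encoder would then be put into its initialization state or deactivated, as described above.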
[0093] Subsequently, an implementation of the controller illustrated in
[0094] In case the TCX branch is chosen, a TCX decoder is run in each frame which outputs a signal at the ACELP sampling rate. This is used to update the memories used for the ACELP encoding path (LPC residual, Mem w0, Memory deemphasis), to enable instant switching from TCX to ACELP. The memory update is performed in each TCX path.
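A rough sketch of such a memory update, assuming the TCX decoder output at the ACELP sampling rate and the LPC coefficients of the current frame are available as arrays. The function names, the dictionary layout, the pre-emphasis coefficient and the zero initial filter state are assumptions made for illustration. The weighted-domain memory (Mem w0) is left out.

```python
import numpy as np

def pre_emphasis(x, beta):
    """First-order pre-emphasis y[n] = x[n] - beta * x[n-1]."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= beta * x[:-1]
    return y

def update_acelp_memories(decoded_low_rate, lpc, beta=0.68):
    """Derive ACELP start-up memories from one decoded TCX frame.

    decoded_low_rate: TCX decoder output at the ACELP sampling rate.
    lpc: coefficients [1, a1, ..., ap] of the analysis filter A(z), with p >= 1.
    beta: assumed pre-emphasis coefficient (not taken from the description).
    """
    x = np.asarray(decoded_low_rate, dtype=float)
    lpc = np.asarray(lpc, dtype=float)
    order = len(lpc) - 1
    emphasized = pre_emphasis(x, beta)
    # LPC analysis filtering (FIR) yields the residual; zero initial state assumed.
    residual = np.convolve(emphasized, lpc)[:len(x)]
    return {
        "lpc_residual": residual,                     # past excitation for the adaptive codebook
        "synthesis_filter": emphasized[-order:],      # last p samples seen by 1/A(z)
        "deemphasis": x[-1],                          # last output sample of the de-emphasis
    }
```

Running this update after every TCX frame keeps the time domain path ready, so that a switch to ACELP in the next frame does not start from empty filter states.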
[0095] Alternatively, a full analysis by synthesis process can be performed, i.e., both encoder simulators 621, 622 implement the actual encoding operations and the results are compared by the selector 623. Alternatively, again, a complete feed forward calculation can be done by performing a signal analysis. For example, when a signal classifier determines that the signal is a speech signal, the time domain encoder is selected, and when it determines that the signal is a music signal, the frequency domain encoder is selected. Other procedures in order to distinguish between both encoders based on a signal analysis of the audio signal portion under consideration can also be applied.
[0096] Advantageously, the audio encoder additionally comprises a cross-processor 700 illustrated in
[0097] Hence, the time domain encoder 610 is configured to be initialized by the initialization data in order to encode an audio signal portion following an earlier audio signal portion encoded by the frequency domain encoder 600 in an efficient manner.
[0098] In particular, the cross-processor comprises a time converter for converting a frequency domain representation into a time domain representation which can be forwarded to the time domain encoder directly or after some further processing. This converter is illustrated in
[0099] The ratio of the time domain coder sampling rate or ACELP sampling rate and the frequency domain coder sampling rate or input sampling rate can be calculated and is a downsampling factor DS illustrated in
[0100] This low frequency portion of the full-band spectrum is input into a small size transform and foldout block 720, as illustrated in
[0101] Thus, a very efficient downsampling operation can be applied since the downsampling is included in the IMDCT implementation. In this context, it is emphasized that the block 702 can be implemented by an IMDCT but can also be implemented by any other transform or filterbank implementation which can be suitably sized in the actual transform kernel and other transform related operations.
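The following sketch illustrates how the downsampling can be folded into the inverse transform. It assumes an MDCT with a sine window and a downsampling factor of 0.5, neither of which is fixed by the text, and all function names are illustrative. The small-size transform is written as a direct matrix product rather than as a fast transform with a foldout stage, so it only demonstrates three elements: selecting the low portion of the spectrum, using a shorter transform, and using a shorter synthesis window.

```python
import numpy as np

def sine_window(length):
    """Sine window satisfying the Princen-Bradley condition."""
    return np.sin(np.pi * (np.arange(length) + 0.5) / length)

def mdct(frame, window):
    """Windowed MDCT of a frame of 2N samples, giving N spectral lines."""
    n = len(frame) // 2
    t = np.arange(2 * n) + 0.5 + n / 2
    k = np.arange(n) + 0.5
    return np.cos(np.pi / n * np.outer(k, t)) @ (window * frame)

def imdct_low(spec, m, n_full):
    """Inverse transform using only the lowest m of n_full spectral lines.

    With m == n_full this is the ordinary windowed IMDCT. With m < n_full the
    output has 2*m samples per frame, i.e. the signal at the rate m / n_full.
    """
    t = np.arange(2 * m) + 0.5 + m / 2
    k = np.arange(m) + 0.5
    basis = np.cos(np.pi / m * np.outer(t, k))
    return sine_window(2 * m) * (basis @ spec[:m]) * (2.0 / n_full)

def analyse_and_synthesise(x, n_full, m):
    """MDCT analysis at the full rate, overlap-add synthesis with m lines."""
    n_frames = len(x) // n_full - 1
    out = np.zeros((n_frames + 1) * m)
    window = sine_window(2 * n_full)
    for f in range(n_frames):
        spec = mdct(x[f * n_full:(f + 2) * n_full], window)
        out[f * m:(f + 2) * m] += imdct_low(spec, m, n_full)
    return out

# Example: 64 lines at the input rate, 32 lines kept (factor 0.5).
n_full, m = 64, 32
x = np.cos(2 * np.pi * 0.03 * np.arange(8 * n_full))
low = analyse_and_synthesise(x, n_full, m)
# Output sample i corresponds to input time 2*i + 0.5 (half-sample offset).
```

No separate low-pass filter or decimator is needed in this sketch: discarding the upper spectral lines limits the bandwidth, and the shorter transform directly produces fewer output samples per frame.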
[0102] In a further embodiment illustrated in
[0103] Furthermore, the frequency domain encoder may comprise a noise shaping block 606a. The noise shaping block 606a is controlled by quantized LPC coefficients as generated by block 1010. The quantized LPC coefficients used for noise shaping 606a perform a spectral shaping of the high resolution spectral values or spectral lines directly encoded (rather than parametrically encoded) and the result of block 606a is similar to the spectrum of a signal subsequent to an LPC filtering stage operating in the time domain such as an LPC analysis filtering block 704 to be described later on. Furthermore, the result of the noise shaping block 606a is then quantized and entropy coded as indicated by block 606b. The result of block 606b corresponds to the encoded first audio signal portion or a frequency domain coded audio signal portion (together with other side information).
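As a rough illustration of LPC-controlled spectral shaping, the sketch below scales each spectral line by the magnitude response of the LPC analysis filter A(z) at the center frequency of that line. This flattens the spectrum in a way comparable to time domain LPC analysis filtering. The per-line gain computation and the function names are assumptions, and a practical noise shaping block may instead apply gains per band.

```python
import numpy as np

def lpc_gains(lpc, n_lines):
    """|A(e^{jw})| evaluated at the center frequencies of n_lines spectral lines."""
    lpc = np.asarray(lpc, dtype=float)
    omega = np.pi * (np.arange(n_lines) + 0.5) / n_lines
    response = np.exp(-1j * np.outer(omega, np.arange(len(lpc)))) @ lpc
    return np.abs(response)

def shape_spectrum(spectrum, lpc):
    """Encoder side: flatten the spectrum with the analysis filter response."""
    spectrum = np.asarray(spectrum, dtype=float)
    return spectrum * lpc_gains(lpc, len(spectrum))

def unshape_spectrum(shaped, lpc):
    """Decoder side: undo the shaping with the same (quantized) coefficients."""
    shaped = np.asarray(shaped, dtype=float)
    return shaped / lpc_gains(lpc, len(shaped))
```

Because both sides derive the gains from the same quantized LPC coefficients, no additional side information is needed to invert the shaping.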
[0104] The cross-processor 700 comprises a spectral decoder for calculating a decoded version of the first encoded signal portion. In the embodiment of
[0105] Furthermore, the cross-processor 700 may comprise in addition or alternatively a weighted prediction coefficient analysis filtering stage 708 for filtering the decoded version and for feeding a filtered decoded version to a codebook determinator 613 indicated as “MMSE” in
[0106] The time domain encoder processor 610 comprises, as illustrated in
[0107] Furthermore, an ACELP gains/coding stage 612 is provided in series to the innovative codebook stage 614 and the result of this block is input into a codebook determinator 613 indicated as MMSE in
[0108] As illustrated, several blocks of the time domain encoder depend on previous signals and these blocks are the adaptive codebook block, the codebook determinator 613, the LPC synthesis filtering block 616 and the de-emphasis block 617. These blocks are provided with data from the cross-processor derived from the frequency domain encoding processor data in order to initialize these blocks for the purpose of being ready for an instant switch from the frequency domain encoder to the time domain encoder. As can also be seen from
[0109] An embodiment of an audio encoder therefore comprises the following parts:
[0110] The audio decoder is described in the following: The waveform decoder part consists of a full-band TCX decoder path with IGF both operating at the input sampling rate of the codec. In parallel, an alternative ACELP decoder path at lower sampling rate exists that is reinforced further downstream by a TD-BWE.
[0111] For ACELP initialization when switching from TCX to ACELP, a cross path (consisting of a shared TCX decoder frontend but additionally providing output at the lower sampling rate and some post-processing) exists that performs the inventive ACELP initialization. Sharing the same sampling rate and filter order between TCX and ACELP in the LPCs allows for an easier and more efficient ACELP initialization.
[0112] For visualizing the switching, two switches are sketched in FIG. 14B. While the second switch downstream chooses between TCX/IGF or ACELP/TD-BWE output, the first switch either pre-updates the buffers in the resampling QMF stage downstream of the ACELP path by the output of the cross path or simply passes on the ACELP output.
[0113] Subsequently, audio decoder implementations in accordance with aspects of the present invention are discussed in the context of
[0114] An audio decoder for decoding an encoded audio signal 1101 comprises a first decoding processor 1120 for decoding a first encoded audio signal portion in a frequency domain. The first decoding processor 1120 comprises a spectral decoder 1122 for decoding first spectral regions with a high spectral resolution and for synthesizing second spectral regions using a parametric representation of the second spectral regions and at least a decoded first spectral region to obtain a decoded spectral representation. The decoded spectral representation is a full-band decoded spectral representation as discussed in the context of
[0115] Furthermore, the audio decoder comprises a second decoding processor 1140 for decoding the second encoded audio signal portion in the time domain to obtain a decoded second signal portion. Furthermore, the audio decoder comprises a combiner 1160 for combining the decoded first signal portion and the decoded second signal portion to obtain a decoded audio signal. The decoded signal portions are combined in sequence which is also illustrated in
[0116] Advantageously, the second decoding processor 1140 is a time domain bandwidth extension processor and comprises, as illustrated in
[0117]
[0118] Subsequently, an implementation of the upsampler 1210 of
[0119] Further processing operations can be performed within the QMF domain in addition to or instead of the bandpass filtering 1472. If no processing is performed at all, then the QMF analysis and the QMF synthesis constitute an efficient upsampler 1210.
[0120] Subsequently, the construction of the individual elements in
[0121] The full-band frequency domain decoder 1120 comprises a first decoding block 1122a for decoding the high resolution spectral coefficients and for additionally performing noise filling in the low band portion as known, for example, from the USAC technology. Furthermore, the full-band decoder comprises an IGF processor 1122b for filling the spectral holes using synthesized spectral values which have been encoded only parametrically and, therefore, with a low resolution on the encoder-side. Then, in block 1122c, an inverse noise shaping is performed and the result is input into a TNS/TTS synthesis block 705 which provides, as a final output, an input to a frequency-time converter 1124, which may be implemented as an inverse modified discrete cosine transform operating at the output, i.e., high sampling rate.
[0122] Furthermore, a harmonic or LTP post-filter is used which is controlled by data obtained by the TCX LTP parameter extraction block 1006 in
[0123] Several elements in
[0124] The time domain decoding processor 1140 may comprise the ACELP or time domain low band decoder 1200 comprising an ACELP decoder stage 1149 for obtaining decoded gains and the innovative codebook information. Additionally, an ACELP adaptive codebook stage 1141 is provided and a subsequent ACELP post-processing stage 1142 and a final synthesis filter such as LPC synthesis filter 1143, which is again controlled by the quantized LPC coefficients 1145 obtained from the bitstream demultiplexer 1100 corresponding to the encoded signal parser 1100 in
[0125] In accordance with embodiments of the present invention, the audio decoder additionally comprises the cross-processor 1170 illustrated in
[0126] Advantageously, the cross-processor 1170 comprises an additional frequency-time converter 1171 operating at a lower sampling rate than the frequency-time converter of the first decoding processor in order to obtain a further decoded first signal portion in the time domain to be used as the initialization signal or from which any initialization data can be derived. Advantageously, this IMDCT or low sampling rate frequency-time converter is implemented as illustrated in
[0127] As illustrated in
[0128] Furthermore, the cross-processor may comprise alternatively or in addition to the other mentioned elements an LPC analysis filter 1174 for generating a prediction residual signal from the further decoded first signal portion or a pre-emphasized further decoded first signal portion and for feeding the data into a codebook synthesizer of the second decoding processor and advantageously, into the adaptive codebook stage 1141. Furthermore, the output of the frequency-time converter 1171 with the low sampling rate is also input into the QMF analysis stage 1471 of the upsampler 1210 for the purpose of initialization, i.e., when the currently decoded audio signal portion is delivered by the frequency domain full-band decoder 1120.
[0132] To summarize, advantageous aspects of the invention which can be used alone or in combination relate to a combination of an ACELP and TD-BWE coder with a full-band capable TCX/IGF technology advantageously associated with using a cross signal.
[0133] A further specific feature is a cross signal path for the ACELP initialization to enable seamless switching.
[0134] A further aspect is that a short IMDCT is fed with a lower part of high-rate long MDCT coefficients to efficiently implement a sample rate conversion in the cross-path.
[0135] A further feature is an efficient realization of the cross-path partly shared with a full-band TCX/IGF in the decoder.
[0136] A further feature is the cross signal path for the QMF initialization to enable seamless switching from TCX to ACELP.
[0137] An additional feature is a cross-signal path to the QMF allowing compensating the delay gap between ACELP resampled output and a filterbank-TCX/IGF output when switching from ACELP to TCX.
[0138] A further aspect is that an LPC is provided for both the TCX and the ACELP coder at the same sampling rate and filter order, although the TCX/IGF encoder/decoder is full-band capable.
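The short IMDCT fed with the lower part of the long MDCT coefficients, mentioned among the aspects above, can be illustrated with a minimal sketch. It assumes a plain, unwindowed IMDCT without overlap-add or gain compensation; the function names `imdct` and `cross_path_imdct` and the example sampling rates are hypothetical and serve only to show how truncating the coefficient vector yields a block at the lower sampling rate.

```python
import math

def imdct(coeffs):
    """Plain inverse MDCT: N coefficients give 2N time samples.
    Windowing, overlap-add and scaling are intentionally omitted."""
    n = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(coeffs))
        for t in range(2 * n)
    ]

def cross_path_imdct(long_coeffs, fs_high, fs_low):
    """Keep only the lower part of the high-rate MDCT spectrum and run a
    correspondingly shorter IMDCT (assumed sketch of the cross path)."""
    keep = len(long_coeffs) * fs_low // fs_high  # number of low-band lines
    return imdct(long_coeffs[:keep])

# Hypothetical example: 40 MDCT lines at 32 kHz, cross path at 16 kHz.
low_rate_block = cross_path_imdct([1.0] + [0.0] * 39, 32000, 16000)
```

Because the shorter transform spans the same time interval with fewer samples, no separate resampling filter is needed in this sketch; a practical implementation would additionally apply the synthesis window and the overlap-add stage.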
[0139] Subsequently,
[0140] Generally, the time domain decoder comprises an ACELP decoder, a subsequently connected resampler or upsampler and a time domain bandwidth extension functionality. Particularly, the ACELP decoder comprises an ACELP decoding stage for restoring gains and the innovative codebook 1149, an ACELP-adaptive codebook stage 1141, an ACELP post-processor 1142, an LPC synthesis filter 1143 controlled by quantized LPC coefficients from a bitstream demultiplexer or encoded signal parser and the subsequently connected de-emphasis stage 1144. Advantageously, the time domain residual signal being at an ACELP sampling rate is input into a time domain bandwidth extension decoder 1220 which provides a high band at the outputs.
[0141] In order to upsample the de-emphasis 1144 output, an upsampler comprising the QMF analysis block 1471, and the QMF synthesis block 1473 are provided. Within the filterbank domain defined by blocks 1471 and 1473, a bandpass filter may be applied. Particularly, as has been discussed before, the same functionalities can also be used which have been discussed with respect to the same reference numbers. Furthermore, the time domain bandwidth extension decoder 1220 can be implemented as illustrated in
[0142] Subsequently, further details with respect to the frequency domain encoder and decoder being full-band capable are discussed with respect to
[0143]
[0144] Typically, a first spectral portion such as 306 of
[0145]
[0146] The decoder further comprises a frequency regenerator 116 for regenerating a reconstructed second spectral portion having the first spectral resolution using a first spectral portion. The frequency regenerator 116 performs a tile filling operation, i.e., uses a tile or portion of the first set of first spectral portions and copies this first set of first spectral portions into the reconstruction range or reconstruction band having the second spectral portion and typically performs spectral envelope shaping or another operation as indicated by the decoded second representation output by the parametric decoder 114, i.e., by using the information on the second set of second spectral portions. The decoded first set of first spectral portions and the reconstructed second set of spectral portions as indicated at the output of the frequency regenerator 116 on line 117 is input into a spectrum-time converter 118 configured for converting the first decoded representation and the reconstructed second spectral portion into a time representation 119, the time representation having a certain high sampling rate.
[0147]
[0148] The spectral analyzer/tonal mask 226 separates the output of TNS block 222 into the core band and the tonal components corresponding to the first set of first spectral portions 103 and the residual components corresponding to the second set of second spectral portions 105 of
[0149] Advantageously, the analysis filterbank 222 is implemented as an MDCT (modified discrete cosine transform filterbank) and the MDCT is used to transform the signal 99 into a time-frequency domain with the modified discrete cosine transform acting as the frequency analysis tool.
[0150] The spectral analyzer 226 may apply a tonality mask. This tonality mask estimation stage is used to separate tonal components from the noise-like components in the signal. This allows the core coder 228 to code all tonal components with a psycho-acoustic module. The tonality mask estimation stage can be implemented in numerous different ways and may be implemented similar in its functionality to the sinusoidal track estimation stage used in sine and noise-modeling for speech/audio coding [8, 9] or an HILN model based audio coder described in [10]. Advantageously, an implementation is used which is easy to implement without the need to maintain birth-death trajectories, but any other tonality or noise detector can be used as well.
[0151] The IGF module calculates the similarity that exists between a source region and a target region. The target region will be represented by the spectrum from the source region. The similarity between the source and target regions is measured using a cross-correlation approach. The target region is split into nTar non-overlapping frequency tiles. For every tile in the target region, nSrc source tiles are created from a fixed start frequency. These source tiles overlap by a factor between 0 and 1, where 0 means 0% overlap and 1 means 100% overlap. Each of these source tiles is correlated with the target tile at various lags to find the source tile that best matches the target tile. The best matching tile number is stored in tileNum[idx_tar], the lag at which it best correlates with the target is stored in xcorr_lag[idx_tar][idx_src] and the sign of the correlation is stored in xcorr_sign[idx_tar][idx_src]. In case the correlation is highly negative, the source tile needs to be multiplied by −1 before the tile filling process at the decoder. The IGF module also takes care of not overwriting the tonal components in the spectrum since the tonal components are preserved using the tonality mask. A band-wise energy parameter is used to store the energy of the target region, enabling the spectrum to be reconstructed accurately.
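The tile matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder's actual search: the function name `best_source_tile`, the use of a normalized cross-correlation, the conversion of the overlap factor into a hop size and the restriction of source tiles to the range below the target tile are all assumptions made for the example.

```python
import math

def best_source_tile(spectrum, tar_start, tar_len, src_start, n_src, overlap, max_lag):
    """Return (tile number, lag, sign) of the source tile that best matches
    the target tile starting at tar_start (assumed sketch)."""
    target = spectrum[tar_start:tar_start + tar_len]
    tar_energy = sum(t * t for t in target)
    # Overlap factor 0 means adjacent source tiles, values near 1 mean dense tiles.
    hop = max(1, round(tar_len * (1.0 - overlap)))
    best_tile, best_lag, best_sign, best_abs = 0, 0, 1, -1.0
    for idx_src in range(n_src):
        base = src_start + idx_src * hop
        for lag in range(-max_lag, max_lag + 1):
            lo = base + lag
            if lo < 0 or lo + tar_len > tar_start:
                continue  # keep the source tile below the target tile
            source = spectrum[lo:lo + tar_len]
            src_energy = sum(s * s for s in source)
            if src_energy == 0.0 or tar_energy == 0.0:
                continue
            corr = sum(s * t for s, t in zip(source, target))
            corr /= math.sqrt(src_energy * tar_energy)
            if abs(corr) > best_abs:
                best_tile, best_lag, best_abs = idx_src, lag, abs(corr)
                best_sign = 1 if corr >= 0 else -1
    return best_tile, best_lag, best_sign

# Target tile (bins 8..11) is a sign-inverted copy of bins 3..6.
spec = [1, 0, 0, 3, -2, 5, 1, 0, -3, 2, -5, -1]
tile_num, xcorr_lag, xcorr_sign = best_source_tile(spec, 8, 4, 0, 3, 0.5, 1)
```

In this example the negative sign indicates that the decoder would multiply the chosen source tile by −1 before tile filling, as described in the text.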
[0152] This method has certain advantages over the classical SBR [1] in that the harmonic grid of a multi-tone signal is preserved by the core coder while only the gaps between the sinusoids are filled with the best matching “shaped noise” from the source region. Another advantage of this system compared to ASR (Accurate Spectral Replacement) [2-4] is the absence of a signal synthesis stage which creates the important portions of the signal at the decoder. Instead, this task is taken over by the core coder, enabling the preservation of important components of the spectrum. Another advantage of the proposed system is the continuous scalability that the features offer. Just using tileNum[idx_tar] and xcorr_lag=0 for every tile is called gross granularity matching and can be used for low bitrates, while using a variable xcorr_lag for every tile enables a better match between the target and source spectra.
[0153] In addition, a tile choice stabilization technique is proposed which removes frequency domain artifacts such as trilling and musical noise.
[0154] In case of stereo channel pairs an additional joint stereo processing is applied. This is done because, for a certain destination range, the signal can be a highly correlated panned sound source. In case the source regions chosen for this particular region are not well correlated, although the energies are matched for the destination regions, the spatial image can suffer due to the uncorrelated source regions. The encoder analyses each destination region energy band, typically performing a cross-correlation of the spectral values and, if a certain threshold is exceeded, sets a joint flag for this energy band. In the decoder the left and right channel energy bands are treated individually if this joint stereo flag is not set. In case the joint stereo flag is set, both the energies and the patching are performed in the joint stereo domain. The joint stereo information for the IGF regions is signaled similarly to the joint stereo information for the core coding, including a flag indicating, in case of prediction, whether the direction of the prediction is from downmix to residual or vice versa.
[0155] The energies can be calculated from the transmitted energies in the L/R-domain.
midNrg[k]=leftNrg[k]+rightNrg[k];
sideNrg[k]=leftNrg[k]−rightNrg[k];
with k being the frequency index in the transform domain.
[0156] Another solution is to calculate and transmit the energies directly in the joint stereo domain for bands where joint stereo is active, so no additional energy transformation is needed at the decoder side.
[0157] The source tiles are created according to the Mid/Side-Matrix:
midTile[k]=0.5·(leftTile[k]+rightTile[k])
sideTile[k]=0.5·(leftTile[k]−rightTile[k])
[0158] Energy Adjustment:
midTile[k]=midTile[k]×midNrg[k];
sideTile[k]=sideTile[k]×sideNrg[k];
[0159] Joint Stereo->LR Transformation:
If no additional prediction parameter is coded:
leftTile[k]=midTile[k]+sideTile[k]
rightTile[k]=midTile[k]−sideTile[k]
[0160] If an additional prediction parameter is coded and if the signalled direction is from mid to side:
sideTile[k]=sideTile[k]−predictionCoeff·midTile[k]
leftTile[k]=midTile[k]+sideTile[k]
rightTile[k]=midTile[k]−sideTile[k]
[0161] If the signalled direction is from side to mid:
midTile1[k]=midTile[k]−predictionCoeff·sideTile[k]
leftTile[k]=midTile1[k]−sideTile[k]
rightTile[k]=midTile1[k]+sideTile[k]
[0162] This processing ensures that from the tiles used for regenerating highly correlated destination regions and panned destination regions, the resulting left and right channels still represent a correlated and panned sound source even if the source regions are not correlated, preserving the stereo image for such regions.
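The decoder-side steps listed above (energy conversion, mid/side tile creation, energy adjustment and transformation back to left/right) can be combined into one short sketch. The function name `joint_stereo_tiles` and the string values used for the prediction direction are hypothetical; the arithmetic follows the formulas as given in the text, with the energies applied as per-index factors.

```python
def joint_stereo_tiles(left_tile, right_tile, left_nrg, right_nrg,
                       prediction_coeff=None, direction="mid_to_side"):
    """Apply the mid/side tile processing per frequency index k and return
    the resulting (left, right) tiles (assumed sketch)."""
    left_out, right_out = [], []
    for k in range(len(left_tile)):
        # Energies converted from the transmitted L/R energies.
        mid_nrg = left_nrg[k] + right_nrg[k]
        side_nrg = left_nrg[k] - right_nrg[k]
        # Source tiles according to the mid/side matrix, then energy adjustment.
        mid = 0.5 * (left_tile[k] + right_tile[k]) * mid_nrg
        side = 0.5 * (left_tile[k] - right_tile[k]) * side_nrg
        if prediction_coeff is None:
            left, right = mid + side, mid - side
        elif direction == "mid_to_side":
            side = side - prediction_coeff * mid
            left, right = mid + side, mid - side
        else:  # signalled direction from side to mid
            mid1 = mid - prediction_coeff * side
            left, right = mid1 - side, mid1 + side
        left_out.append(left)
        right_out.append(right)
    return left_out, right_out

# Hypothetical single-bin example without a prediction parameter.
left, right = joint_stereo_tiles([2.0], [0.0], [1.0], [0.0])
```

Keeping the three transformation cases in one function makes it easy to compare how the signalled prediction direction changes the final left/right mapping.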
[0163] In other words, in the bitstream, joint stereo flags are transmitted that indicate whether L/R or M/S as an example for the general joint stereo coding shall be used. In the decoder, first, the core signal is decoded as indicated by the joint stereo flags for the core bands. Second, the core signal is stored in both L/R and M/S representation. For the IGF tile filling, the source tile representation is chosen to fit the target tile representation as indicated by the joint stereo information for the IGF bands.
[0164] Temporal Noise Shaping (TNS) is a standard technique and part of AAC [11-13]. TNS can be considered as an extension of the basic scheme of a perceptual coder, inserting an optional processing step between the filterbank and the quantization stage. The main task of the TNS module is to hide the produced quantization noise in the temporal masking region of transient-like signals and thus it leads to a more efficient coding scheme. First, TNS calculates a set of prediction coefficients using “forward prediction” in the transform domain, e.g. MDCT. These coefficients are then used for flattening the temporal envelope of the signal. As the quantization affects the TNS filtered spectrum, the quantization noise is also temporally flat. By applying the inverse TNS filtering on the decoder side, the quantization noise is shaped according to the temporal envelope of the TNS filter and therefore the quantization noise gets masked by the transient.
[0165] IGF is based on an MDCT representation. For efficient coding, advantageously long blocks of approx. 20 ms have to be used. If the signal within such a long block contains transients, audible pre- and post-echoes occur in the IGF spectral bands due to the tile filling.
[0166] This pre-echo effect is reduced by using TNS in the IGF context. Here, TNS is used as a temporal tile shaping (TTS) tool as the spectral regeneration in the decoder is performed on the TNS residual signal. The necessary TTS prediction coefficients are calculated and applied using the full spectrum on the encoder side as usual. The TNS/TTS start and stop frequencies are not affected by the IGF start frequency f_IGFstart of the IGF tool. In comparison to the legacy TNS, the TTS stop frequency is increased to the stop frequency of the IGF tool, which is higher than f_IGFstart. On the decoder side the TNS/TTS coefficients are applied on the full spectrum again, i.e. the core spectrum plus the regenerated spectrum plus the tonal components from the tonality map (see
[0167] In legacy decoders, spectral patching on an audio signal corrupts spectral correlation at the patch borders and thereby impairs the temporal envelope of the audio signal by introducing dispersion. Hence, another benefit of performing the IGF tile filling on the residual signal is that, after application of the shaping filter, tile borders are seamlessly correlated, resulting in a more faithful temporal reproduction of the signal.
[0168] In an inventive encoder, the spectrum having undergone TNS/TTS filtering, tonality mask processing and IGF parameter estimation is devoid of any signal above the IGF start frequency except for tonal components. This sparse spectrum is now coded by the core coder using principles of arithmetic coding and predictive coding. These coded components along with the signaling bits form the bitstream of the audio.
[0169]
[0170]
[0171] Advantageously, the high resolution is defined by a line-wise coding of spectral lines such as MDCT lines, while the second resolution or low resolution is defined by, for example, calculating only a single spectral value per scale factor band, where a scale factor band covers several frequency lines. Thus, the second low resolution is, with respect to its spectral resolution, much lower than the first or high resolution defined by the line-wise coding typically applied by the core encoder such as an AAC or USAC core encoder.
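The difference between the two resolutions can be made concrete with a small sketch. It assumes that the single value per scale factor band is the mean energy of the lines in that band; the function name `band_energies` and the band layout in the example are hypothetical.

```python
def band_energies(lines, band_offsets):
    """Return one energy value per scale factor band (low resolution),
    where band_offsets lists the first line of each band plus the end."""
    energies = []
    for start, stop in zip(band_offsets, band_offsets[1:]):
        band = lines[start:stop]
        energies.append(sum(x * x for x in band) / len(band))
    return energies

# Six spectral lines (high resolution) summarized by two band values.
mdct_lines = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
per_band = band_energies(mdct_lines, [0, 2, 6])
```

Line-wise coding would transmit all six values, whereas the parametric representation in this sketch transmits only the two band energies.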
[0172] Regarding scale factor or energy calculation, the situation is illustrated in
[0173] Particularly, when the core encoder is under a low bitrate condition, an additional noise-filling operation can be applied in the core band, i.e., lower in frequency than the IGF start frequency, i.e., in scale factor bands SCB1 to SCB3. In noise-filling, there exist several adjacent spectral lines which have been quantized to zero. On the decoder-side, these spectral values quantized to zero are re-synthesized and the re-synthesized spectral values are adjusted in their magnitude using a noise-filling energy such as NF_2 illustrated at 308 in
[0174] Advantageously, the bands for which energy information is calculated coincide with the scale factor bands. In other embodiments, an energy information value grouping is applied so that, for example, for scale factor bands 4 and 5, only a single energy information value is transmitted, but even in this embodiment, the borders of the grouped reconstruction bands coincide with borders of the scale factor bands. If different band separations are applied, then certain re-calculations or synchronization calculations may be applied, and this can make sense depending on the specific implementation.
[0175] Advantageously, the spectral domain encoder 106 of
[0176] In the audio encoder of
[0177]
[0178] Then, at the output of block 422, a quantized spectrum is obtained corresponding to what is illustrated in
[0179] The set to zero blocks 410, 418, 422, which are provided alternatively to each other or in parallel are controlled by the spectral analyzer 424. The spectral analyzer may comprise any implementation of a well-known tonality detector or comprises any different kind of detector operative for separating a spectrum into components to be encoded with a high resolution and components to be encoded with a low resolution. Other such algorithms implemented in the spectral analyzer can be a voice activity detector, a noise detector, a speech detector or any other detector deciding, depending on spectral information or associated metadata on the resolution requirements for different spectral portions.
[0180]
[0181] Subsequently, reference is made to
[0182] As illustrated at 301 in
[0183] Advantageously, an IGF operation, i.e., a frequency tile filling operation using spectral values from other portions can be applied in the complete spectrum. Thus, a spectral tile filling operation can not only be applied in the high band above an IGF start frequency but can also be applied in the low band. Furthermore, the noise-filling without frequency tile filling can also be applied not only below the IGF start frequency but also above the IGF start frequency. It has, however, been found that high quality and highly efficient audio encoding can be obtained when the noise-filling operation is limited to the frequency range below the IGF start frequency and when the frequency tile filling operation is restricted to the frequency range above the IGF start frequency as illustrated in
[0184] Advantageously, the target tiles (TT) (having frequencies greater than the IGF start frequency) are bound to scale factor band borders of the full rate coder. Source tiles (ST), from which information is taken, i.e., for frequencies lower than the IGF start frequency, are not bound by scale factor band borders. The size of the ST should correspond to the size of the associated TT. This is illustrated using the following example. TT[0] has a length of 10 MDCT bins. This exactly corresponds to the length of two subsequent SCBs (such as 4+6). Then, all possible STs that are to be correlated with TT[0] have a length of 10 bins, too. A second target tile TT[1] being adjacent to TT[0] has a length of 15 bins (SCBs having lengths of 7+8). Then, the STs for TT[1] have a length of 15 bins rather than 10 bins as for TT[0].
[0185] Should the case arise that one cannot find an ST for a TT with the length of the target tile (when e.g. the length of the TT is greater than the available source range), then a correlation is not calculated and the source range is copied a number of times into this TT (the copying is done one after the other so that a frequency line for the lowest frequency of the second copy immediately follows, in frequency, the frequency line for the highest frequency of the first copy), until the target tile TT is completely filled up.
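The fallback described above, in which the source range is copied repeatedly until the target tile is full, can be sketched as follows. The function name `fill_target_by_repetition` and the index conventions (half-open bin ranges) are assumptions made for the example.

```python
def fill_target_by_repetition(spectrum, src_start, src_stop, tar_start, tar_stop):
    """Fill the target tile by writing copies of the source range one after
    the other; the last copy is truncated at the end of the target tile."""
    source = spectrum[src_start:src_stop]
    filled = list(spectrum)
    for i in range(tar_stop - tar_start):
        # The first line of each new copy directly follows the last line
        # of the previous copy in frequency.
        filled[tar_start + i] = source[i % len(source)]
    return filled

# Source range of 3 bins, target tile of 5 bins.
result = fill_target_by_repetition([1, 2, 3, 0, 0, 0, 0, 0], 0, 3, 3, 8)
```

In this example the target tile receives one full copy of the source range followed by a second, truncated copy.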
[0186] Subsequently, reference is made to
[0187] Then, the first spectral portion of the reconstruction band such as 307 of
[0188] In this context, it is very important to evaluate the high frequency reconstruction accuracy of the present invention compared to HE-AAC. This is explained with respect to scale factor band 7 in
[0189] In an implementation, the spectral analyzer is also implemented to calculate similarities between first spectral portions and second spectral portions and to determine, based on the calculated similarities, for a second spectral portion in a reconstruction range a first spectral portion matching with the second spectral portion as far as possible. Then, in this variable source range/destination range implementation, the parametric coder will additionally introduce into the second encoded representation a matching information indicating for each destination range a matching source range. On the decoder-side, this information would then be used by a frequency tile generator 522 of
[0190] Furthermore, as illustrated in
[0191] As illustrated, the encoder operates without downsampling and the decoder operates without upsampling. In other words, the spectral domain audio coder is configured to generate a spectral representation having a Nyquist frequency defined by the sampling rate of the originally input audio signal.
[0192] Furthermore, as illustrated in
[0193] As outlined, the spectral domain audio decoder 112 is configured so that a maximum frequency represented by a spectral value in the first decoded representation is equal to a maximum frequency included in the time representation having the sampling rate wherein the spectral value for the maximum frequency in the first set of first spectral portions is zero or different from zero. Anyway, for this maximum frequency in the first set of spectral components a scale factor for the scale factor band exists, which is generated and transmitted irrespective of whether all spectral values in this scale factor band are set to zero or not as discussed in the context of
[0194] The invention is, therefore, advantageous in that, compared with other parametric techniques for increasing compression efficiency, e.g. noise substitution and noise filling (these techniques are exclusively for efficient representation of noise-like local signal content), the invention allows an accurate frequency reproduction of tonal components. To date, no state-of-the-art technique addresses the efficient parametric representation of arbitrary signal content by spectral gap filling without the restriction of a fixed a priori division into low band (LF) and high band (HF).
[0195] Embodiments of the inventive system improve the state-of-the-art approaches and thereby provide high compression efficiency, no or only a small perceptual annoyance and full audio bandwidth even for low bitrates.
[0196] The general system consists of
full-band core coding
intelligent gap filling (tile filling or noise filling)
sparse tonal parts in core selected by tonal mask
joint stereo pair coding for full-band, including tile filling
TNS on tile
spectral whitening in IGF range
[0197] A first step towards a more efficient system is to remove the need for transforming spectral data into a second transform domain different from the one of the core coder. As the majority of audio codecs, such as AAC for instance, use the MDCT as basic transform, it is useful to perform the BWE in the MDCT domain also. A second requirement for the BWE system would be the need to preserve the tonal grid whereby even HF tonal components are preserved and the quality of the coded audio is thus superior to the existing systems. To take care of both the above mentioned requirements for a BWE scheme, a new system is proposed called Intelligent Gap Filling (IGF).
[0198] Subsequently, further optional features of the full band frequency domain first encoding processor and the full band frequency domain decoding processor incorporating the gap-filling operation, which can be implemented separately or together are discussed and defined.
[0199] Particularly, the spectral domain decoder 112 corresponding to block 1122a is configured to output a sequence of decoded frames of spectral values, a decoded frame being the first decoded representation, wherein the frame comprises spectral values for the first set of spectral portions and zero indications for the second spectral portions. The apparatus for decoding furthermore comprises a combiner 208. The spectral values are generated by a frequency regenerator for the second set of second spectral portions, where both the combiner and the frequency regenerator are included within block 1122b. Thus, by combining the second spectral portions and the first spectral portions a reconstructed spectral frame comprising spectral values for the first set of the first spectral portions and the second set of spectral portions is obtained and the spectrum-time converter 118 corresponding to the IMDCT block 1124 in
[0200] As outlined, the spectrum-time converter 118 or 1124 is configured to perform an inverse modified discrete cosine transform 512, 514 and further comprises an overlap-add stage 516 for overlapping and adding subsequent time domain frames.
[0201] Particularly, the spectral domain audio decoder 1122a is configured to generate the first decoded representation so that the first decoded representation has a Nyquist frequency defining a sampling rate being equal to a sampling rate of the time representation generated by the spectrum-time converter 1124.
[0202] Furthermore, the decoder 112 or 1122a is configured to generate the first decoded representation so that a first spectral portion 306 is placed with respect to frequency between two second spectral portions 307a, 307b.
[0203] In a further embodiment, a maximum frequency represented by a spectral value for the maximum frequency in the first decoded representation is equal to a maximum frequency included in the time representation generated by the spectrum-time converter, wherein the spectral value for the maximum frequency in the first representation is zero or different from zero.
[0204] Furthermore, as illustrated in
[0205] Furthermore, the spectral domain audio decoder 112 is configured to generate the first decoded representation having the first spectral portions with the frequency values being greater than the frequency being equal to a frequency in the middle of the frequency range covered by the time representation output by the spectrum-time converter 118 or 1124.
[0206] Furthermore, the spectral analyzer or full-band analyzer 604 is configured to analyze the representation generated by the time-frequency converter 602 for determining a first set of first spectral portions to be encoded with the first high spectral resolution and the different second set of second spectral portions to be encoded with a second spectral resolution which is lower than the first spectral resolution and, by means of the spectral analyzer, a first spectral portion 306 is determined, with respect to frequency, between two second spectral portions in
[0207] Particularly, the spectral analyzer is configured for analyzing the spectral representation up to a maximum analysis frequency being at least one quarter of a sampling frequency of the audio signal.
[0208] Particularly, the spectral domain audio encoder is configured to process a sequence of frames of spectral values for a quantization and entropy coding, wherein, in a frame, spectral values of the second set of second portions are set to zero, or wherein, in the frame, spectral values of the first set of first spectral portions and the second set of the second spectral portions are present and wherein, during subsequent processing, spectral values in the second set of spectral portions are set to zero as exemplarily illustrated at 410, 418, 422.
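The first alternative, in which the spectral values of the second set are set to zero before quantization, can be sketched as follows. The function names, the boolean mask, and the uniform quantizer with a fixed step size are hypothetical simplifications; the quantization and entropy coding actually used are not specified in this paragraph.

```python
def zero_second_portions(spectrum, is_first_portion):
    """Keep the lines of the first set, set the lines of the second set to zero."""
    return [value if keep else 0.0
            for value, keep in zip(spectrum, is_first_portion)]

def quantize(values, step):
    """Uniform scalar quantizer (assumed here for illustration only)."""
    return [round(value / step) for value in values]

spectrum = [4.2, -0.3, 0.1, 7.9, 0.2]
is_first_portion = [True, False, False, True, False]
indices = quantize(zero_second_portions(spectrum, is_first_portion), step=1.0)
```

In this sketch the zeroed lines quantize to index zero, so only the lines of the first set contribute non-zero indices to the subsequent entropy coding.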
[0209] The spectral domain audio encoder is configured to generate a spectral representation having a Nyquist frequency defined by the sampling rate of the audio input signal or the first portion of the audio signal processed by the first encoding processor operating in the frequency domain.
[0210] The spectral domain audio encoder 606 is furthermore configured to provide the first encoded representation so that, for a frame of a sampled audio signal, the encoded representation comprises the first set of first spectral portions and the second set of second spectral portions, wherein the spectral values in the second set of spectral portions are encoded as zero or noise values.
[0211] The full band analyzer 604 or 102 is configured to analyze the spectral representation starting with the gap-filling start frequency 309 and ending with a maximum frequency f.sub.max represented by a maximum frequency included in the spectral representation, wherein a spectral portion extending from a minimum frequency up to the gap-filling start frequency 309 belongs to the first set of first spectral portions.
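The division of the spectrum at the gap-filling start frequency can be sketched as follows. The function name, the uniform spacing of the spectral lines between zero and half the sampling rate, and the numeric values are assumptions for illustration.

```python
import math

def split_lines(num_lines, sample_rate_hz, start_hz):
    """Return (lines always in the first set, lines examined by the analyzer).

    Assumes num_lines uniformly spaced lines covering 0 Hz up to
    sample_rate_hz / 2, so that the last line corresponds to f_max.
    """
    line_width_hz = (sample_rate_hz / 2.0) / num_lines
    first_analyzed = min(num_lines, math.ceil(start_hz / line_width_hz))
    return range(0, first_analyzed), range(first_analyzed, num_lines)

# Hypothetical numbers: 512 lines at 32 kHz, gap-filling start at 4 kHz.
core_lines, analyzed_lines = split_lines(512, 32000.0, 4000.0)
```

With these example numbers, lines 0 to 127 always belong to the first set, and the analyzer examines lines 128 to 511, that is, up to the maximum frequency of the spectral representation.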
[0212] Particularly, the analyzer is configured to apply a tonal mask processing at least of a portion of the spectral representation so that tonal components and non-tonal components are separated from each other, wherein the first set of the first spectral portions comprises the tonal components and wherein the second set of the second spectral portions comprises the non-tonal components.
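A tonal mask of this kind can be sketched as follows. The criterion used here, a line counting as tonal when its magnitude exceeds a multiple of the average magnitude of its neighbours, is an assumption chosen for illustration; the function name and the parameters `ratio` and `radius` are likewise hypothetical, and the description does not specify the actual criterion.

```python
def tonal_mask(magnitudes, start_line, ratio=4.0, radius=2):
    """Return True for lines of the first set, False for lines of the second set.

    Lines below start_line always belong to the first set. Above it, a line
    is treated as tonal when it exceeds `ratio` times the mean magnitude of
    its neighbours within `radius` lines (assumed criterion).
    """
    count = len(magnitudes)
    mask = [k < start_line for k in range(count)]
    for k in range(start_line, count):
        low = max(start_line, k - radius)
        high = min(count, k + radius + 1)
        neighbours = [magnitudes[j] for j in range(low, high) if j != k]
        if neighbours and magnitudes[k] > ratio * sum(neighbours) / len(neighbours):
            mask[k] = True
    return mask

magnitudes = [5.0, 5.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0]
mask = tonal_mask(magnitudes, start_line=2)
```

In this example the tonal line at index 4 is assigned to the first set, while the non-tonal lines on both sides of it (indices 2 to 3 and 5 to 7) are assigned to the second set, so that a first spectral portion lies, with respect to frequency, between two second spectral portions.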
[0213] Although the present invention has been described in the context of block diagrams where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps where these steps stand for the functionalities performed by corresponding logical or physical hardware blocks.
[0214] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
[0215] The inventive transmitted or encoded signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
[0216] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
[0217] Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
[0218] Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
[0219] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
[0220] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
[0221] A further embodiment of the inventive method is, therefore, a data carrier (or a non-transitory storage medium such as a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
[0222] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the Internet.
[0223] A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
[0224] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
[0225] A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
[0226] In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.
[0227] While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which will be apparent to others skilled in the art and which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.