Processing of audio signals during high frequency reconstruction
11568880 · 2023-01-31
Assignee
Inventors
Cpc classification
G10L19/0017
PHYSICS
International classification
G10L19/00
PHYSICS
Abstract
The application relates to HFR (High Frequency Reconstruction/Regeneration) of audio signals. In particular, the application relates to a method and system for performing HFR of audio signals having large variations in energy level across the low frequency range which is used to reconstruct the high frequencies of the audio signal. A system configured to generate a plurality of high frequency subband signals covering a high frequency interval from a plurality of low frequency subband signals is described. The system comprises means for receiving the plurality of low frequency subband signals; means for receiving a set of target energies, each target energy covering a different target interval within the high frequency interval and being indicative of the desired energy of one or more high frequency subband signals lying within the target interval; means for generating the plurality of high frequency subband signals from the plurality of low frequency subband signals and from a plurality of spectral gain coefficients associated with the plurality of low frequency subband signals, respectively; and means for adjusting the energy of the plurality of high frequency subband signals using the set of target energies.
Claims
1. A system configured to generate a wideband output signal from a narrow band input signal, the system comprising one or more processors adapted to: receive the narrow band input signal; generate, by a quadrature mirror filter (QMF) analysis filterbank, a plurality of low frequency subband signals from the narrow band input signal; receive a set of target energies, each target energy covering a different target interval within a high frequency interval and being indicative of the desired energy of one or more high frequency subband signals lying within the target interval; generate a plurality of high frequency subband signals from the plurality of low frequency subband signals and from a plurality of spectral gain coefficients associated with the plurality of low frequency subband signals, respectively, by applying the plurality of spectral gain coefficients to the plurality of low frequency subband signals; adjust the energy of the plurality of high frequency subband signals using the set of target energies; combine the low frequency subband signals and the energy-adjusted high frequency subband signals; and generate, by a QMF synthesis filterbank, the wideband output signal from the combined subband signals.
2. A method for generating a wideband output signal from a narrow band input signal, the method comprising: receiving the narrow band input signal; generating, by a quadrature mirror filter (QMF) analysis filterbank, a plurality of low frequency subband signals from the narrow band input signal; receiving a set of target energies, each target energy covering a different target interval within a frequency interval and being indicative of the desired energy of one or more high frequency subband signals lying within the target interval; generating a plurality of high frequency subband signals from the plurality of low frequency subband signals and from a plurality of spectral gain coefficients associated with the plurality of low frequency subband signals, respectively, by applying the plurality of spectral gain coefficients to the plurality of low frequency subband signals; adjusting the energy of the plurality of high frequency subband signals using the set of target energies; combining the low frequency subband signals and the energy-adjusted high frequency subband signals; and generating, by a QMF synthesis filterbank, the wideband output signal from the combined subband signals.
3. A non-transitory storage medium recording a program of instructions that is executable by a device for performing the method of claim 2.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The invention is explained below by way of illustrative examples with reference to the accompanying drawings, wherein
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
DESCRIPTION OF PREFERRED EMBODIMENTS
(16) The below-described embodiments are merely illustrative of the principles of the present invention, PROCESSING OF AUDIO SIGNALS DURING HIGH FREQUENCY RECONSTRUCTION. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
(17) As outlined above, audio decoders using HFR techniques typically comprise an HFR unit for generating a high frequency audio signal and a subsequent spectral envelope adjustment unit for adjusting the spectral envelope of the high frequency audio signal. When adjusting the spectral envelope of the audio signal, this is typically done by means of a filterbank implementation, or by means of time-domain filtering. The adjustment can either strive to do a correction of the absolute spectral envelope, or it can be performed by means of filtering which also corrects phase characteristics. Either way, the adjustment is typically a combination of two steps, the removal of the current spectral envelope, and the application of the target spectral envelope.
(18) It is important to note that the methods and systems outlined in the present document are not merely directed at the removal of the spectral envelope of the audio signal. The methods and systems strive to do a suitable spectral correction of the spectral envelope of the lowband signal as part of the high frequency regeneration step, in order not to introduce spectral envelope discontinuities into the high frequency spectrum created by combining different segments of the lowband, i.e. of the low frequency signal, shifted or transposed to different frequency ranges of the highband, i.e. of the high frequency signal.
(19) In
(20) In the subsequent envelope adjustment stage, a target spectral envelope is applied onto the high frequency components 105, 115. As can be seen from the spectrum 105, 115 going into the envelope adjuster, discontinuities (notably at the patch borders) can be observed in the spectral shape of the highband excitation signal 105, 115, i.e. of the highband signal entering the envelope adjuster. These discontinuities originate from the fact that several contributions of the low frequencies 101, 111 are used in order to generate the highband 105, 115. As can be seen, the spectral shape of the highband signal 105, 115 is related to the spectral shape of the lowband signal 101, 111. Consequently, particular spectral shapes of the lowband signal 101, 111, e.g. a gradient shape illustrated in
(21) In addition to the spectrum 100, 110,
(22) In
(23)
(24) Furthermore, the envelope adjuster may comprise additional steps and variations, in particular:
a limiter functionality, which limits the maximum allowed envelope adjustment value to be applied over a certain frequency band, i.e. over a limiter band 135. The maximum allowed envelope adjustment value is a function of the envelope adjustment values determined for the different scalefactor bands 130 which fall within a limiter band 135. In particular, the maximum allowed envelope adjustment value is a function of the mean of the envelope adjustment values determined for the different scalefactor bands 130 which fall within a limiter band 135. By way of example, the maximum allowed envelope adjustment value may be the mean value of the relevant envelope adjustment values multiplied by a limiter factor (such as 1.5). The limiter functionality is typically applied in order to limit the introduction of noise into the regenerated highband signal 121. This is particularly relevant for audio signals comprising prominent sinusoids, i.e. audio signals having a spectrum with distinct peaks at certain frequencies. Without the use of the limiter functionality, significant envelope adjustment values would be determined for the scalefactor bands 130 for which the original audio signal comprises such distinct peaks. As a result, the spectrum of the complete scalefactor band 130 (and not only the distinct peak) would be adjusted, thereby introducing noise.
an interpolation functionality, which allows the envelope adjustment values to be calculated for each individual QMF subband within a scalefactor band, instead of calculating a single envelope adjustment value for the entire scalefactor band. Since the scalefactor bands typically comprise more than one QMF subband, an envelope adjustment value can be calculated as the ratio of the energy of a particular QMF subband within the scalefactor band and the target energy received from the encoder, instead of calculating the ratio of the mean energy of all QMF subbands within the scalefactor band and the target energy received from the encoder. As such, a different envelope adjustment value may be determined for each QMF subband within a scalefactor band. It should be noted that the received target energy value for a scalefactor band typically corresponds to the average energy of that frequency range within the original signal. It is up to the decoder operation how to apply the received average target energy to the corresponding frequency band of the regenerated highband signal. This can be done by applying an overall envelope adjustment value to the QMF subbands within a scalefactor band of the regenerated highband signal or by applying an individual envelope adjustment value to each QMF subband. The latter approach can be thought of as if the received envelope information (i.e. one target energy per scalefactor band) was “interpolated” across the QMF subbands within a scalefactor band in order to provide a higher frequency resolution. Hence, this approach is referred to as “interpolation” in MPEG-4 SBR.
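The limiter and interpolation functionality described above can be illustrated with a minimal Python sketch. It is not the MPEG-4 SBR reference procedure: the function names, the band-edge representation (lists of QMF subband indices) and the choice to express the envelope adjustment value as an amplitude gain, i.e. the square root of target energy over measured energy, are assumptions made for illustration only.

```python
import math

EPS = 1e-12  # guards against division by zero for silent subbands


def envelope_gains(subband_energy, sfb_edges, target_energy, interpolate):
    """Return one amplitude gain per QMF subband of the regenerated highband.

    subband_energy: measured energy of each highband QMF subband
    sfb_edges:      scalefactor band borders as subband indices, e.g. [0, 2, 4]
    target_energy:  one received target energy per scalefactor band
    interpolate:    True  -> individual gain per QMF subband ("interpolation")
                    False -> one gain per scalefactor band (from its mean energy)
    """
    gains = []
    for b in range(len(sfb_edges) - 1):
        band = subband_energy[sfb_edges[b]:sfb_edges[b + 1]]
        mean_energy = sum(band) / len(band)
        for energy in band:
            current = energy if interpolate else mean_energy
            gains.append(math.sqrt(target_energy[b] / max(current, EPS)))
    return gains


def limit_gains(gains, limiter_edges, limiter_factor=1.5):
    """Cap each gain at limiter_factor times the mean gain of its limiter band."""
    limited = []
    for b in range(len(limiter_edges) - 1):
        band = gains[limiter_edges[b]:limiter_edges[b + 1]]
        ceiling = limiter_factor * sum(band) / len(band)
        limited.extend(min(g, ceiling) for g in band)
    return limited
```

With measured energies `[1.0, 4.0, 1.0, 1.0]`, scalefactor band borders `[0, 2, 4]` and target energies `[4.0, 1.0]`, the interpolated gains differ within the first scalefactor band, and a single limiter band spanning all four subbands then caps the largest gain at 1.5 times the mean gain.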
(25) Returning to
(26) Hence, a problem for the re-generation of a highband signal occurs for any signal that has large variations in level over the lowband range. This problem is due to the discontinuities introduced during the high frequency re-generation of the highband. When the envelope adjuster is subsequently exposed to this re-generated signal, it cannot reasonably and consistently separate the newly introduced discontinuity from any “real-world” spectral characteristic of the lowband signal. The effects of this problem are two-fold. First, spectral shapes are introduced in the highband signal that the envelope adjuster cannot compensate for. Consequently, the output has the wrong spectral shape. Second, an instability effect is perceived, due to the fact that this effect comes and goes as a function of the lowband spectral characteristics.
(27) The present document addresses the above mentioned problem by describing a method and system which provide an HFR highband signal at the input of the envelope adjuster which does not exhibit spectral discontinuities. For this purpose, it is proposed to remove or reduce the spectral envelope of the lowband signal when performing high frequency regeneration. Doing so avoids introducing spectral discontinuities into the highband signal prior to performing envelope adjustment. As a result, the envelope adjuster will not have to handle such spectral discontinuities. In particular, a conventional envelope adjuster may be used, wherein the limiter functionality of the envelope adjuster is used to avoid the introduction of noise into the regenerated highband signal. In other words, the described method and system may be used to re-generate an HFR highband signal having little or no spectral discontinuities and a low level of noise.
(28) It should be noted that the time-resolution of the envelope adjuster may be different from the time resolution of the proposed processing of the spectral envelope during the highband signal generation. As indicated above, the processing of the spectral envelope during the highband signal re-generation is intended to modify the spectral envelope of the lowband signal, in order to alleviate the processing within the subsequent envelope adjuster. This processing, i.e. the modification of the spectral envelope of the lowband signal, may be performed e.g. once per audio frame, wherein the envelope adjuster may adjust the spectral envelope over several time intervals, i.e. using several received spectral envelopes. This is outlined in
(29) In
(30) As outlined in
(31) The modification of the spectral envelope of the lowband signal can be achieved by applying a gain curve to the spectral envelope of the lowband signal. Such a gain curve can be determined by a gain curve determination unit 400 illustrated in
(32) The optional control data 404 may comprise information on the resolution of the coarse spectral envelope which is to be estimated in the module 400, and/or information on the suitability of applying the gain-adjustment process. As such, the control data 404 may control the amount of additional processing involved during the gain-adjustment process. The control data 404 may also trigger a by-pass of the additional gain adjustment processing, if signals occur that do not lend themselves well to coarse spectral envelope estimation, e.g. signals comprising single sinusoids.
(33) In
(34) The method used for determining the coarse spectral envelope from the high resolution spectral envelope and in particular the order of the polynomial which is fitted to the high resolution spectral envelope can be controlled by the optional control data 404. The order of the polynomial may be a function of the size of the frequency range 302 of the lowband signal for which a coarse spectral envelope 301 is to be determined, and/or it may be a function of other parameters relevant for the overall coarse spectral shape of the relevant frequency range 302 of the lowband signal. The polynomial fitting calculates a polynomial that approximates the data in a least square error sense. In the following, a preferred embodiment is outlined, by means of Matlab code:
(35) TABLE-US-00001
function GainVec = calculateGainVec(LowEnv)
%% function GainVec = calculateGainVec(LowEnv)
% Input: Lowband envelope energy in dB
% Output: gain vector to be applied to the lowband prior to HF-
% generation
%
% The function does a low order polynomial fitting of the low band
% spectral envelope, as a representation of the lowband overall
% spectral slope. The overall slope according to this is subsequently
% translated into a gain vector that can be applied prior to HF-
% generation to remove the overall slope (or coarse spectral shape).
%
% This prevents that the HF generation introduces discontinuities in
% the spectral shape, that will be “confusing” for the subsequent
% envelope adjustment and limiter-process. The “confusion” occurs when
% the envelope adjuster and limiter needs to take care of a large
% discontinuity, and thus a large gain value. It is very difficult to
% tune and have a proper operation of these modules if they are to
% take care of both “natural” variations in the highband as well as
% the “artificial” variations introduced by the HF generation process.
polyOrderWhite = 3;
x_lowBand = 1:length(LowEnv);
p = polyfit(x_lowBand, LowEnv, polyOrderWhite);
lowBandEnvSlope = zeros(size(x_lowBand));
for k = polyOrderWhite:-1:0
    tmp = (x_lowBand.^k) .* p(polyOrderWhite - k + 1);
    lowBandEnvSlope = lowBandEnvSlope + tmp;
end
GainVec = 10.^((mean(LowEnv) - lowBandEnvSlope) ./ 20);
(36) In the above code, the input is the spectral envelope (LowEnv) of the lowband signal obtained by averaging QMF subband samples on a per subband basis over a time-interval corresponding to the current time frame of data operated on by the subsequent envelope adjuster. As indicated above, the gain-adjustment processing of the lowband signal may be performed on various other time-grids. In the above example, the estimated absolute spectral envelope is expressed in a logarithmic domain. A polynomial of low order, in the above example a polynomial of order 3, is fitted to the data. Given the polynomial, a gain curve (GainVec) is calculated from the difference in mean energy of the lowband signal and the curve (lowBandEnvSlope) obtained from the polynomial fitted to the data. In the above example, the operation of determining the gain curve is done in the logarithmic domain.
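As an illustration of how such an input envelope might be obtained, the following Python sketch averages the squared magnitudes of the complex QMF samples of each subband over one frame and converts the result to dB. The function name, the nested-list layout `qmf_frame[p][l]` and the small `eps` floor are assumptions for illustration; the source only states that the envelope is obtained by per-subband averaging over a time interval and is expressed in a logarithmic domain.

```python
import math


def lowband_envelope_db(qmf_frame, eps=1e-12):
    """Per-subband mean energy in dB over one frame.

    qmf_frame[p][l] is the complex QMF sample of lowband subband p
    at time slot l. eps avoids log10(0) for silent subbands.
    """
    envelope = []
    for samples in qmf_frame:
        mean_energy = sum(abs(s) ** 2 for s in samples) / len(samples)
        envelope.append(10.0 * math.log10(mean_energy + eps))
    return envelope
```

Because the envelope is an energy in dB (10·log10), dividing a dB difference by 20 in the gain calculation yields an amplitude gain, which is consistent with the `./20` in the Matlab listing.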
(37) The gain curve calculation is performed by the gain curve calculation unit 503. As indicated above, the gain curve may be determined from the mean energy of the part of the lowband signal used to re-generate the highband signal, and from the spectral envelope of the part of the lowband signal used to re-generate the highband signal. In particular, the gain curve may be determined from the difference of the mean energy and the coarse spectral envelope, represented e.g. by a polynomial. I.e. the calculated polynomial may be used to determine a gain curve which comprises a separate gain value, also referred to as a spectral gain coefficient, for every relevant QMF subband of the lowband signal. This gain curve comprising the gain values is subsequently used in the HFR process.
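The gain curve calculation can also be sketched without any numerical library. The Python code below fits a low-order polynomial to the dB envelope in the least-squares sense by solving the normal equations, then converts the difference between the mean level and the fitted curve into one amplitude gain per subband. The function names are hypothetical, and the rescaling of the subband index to [-1, 1] is an implementation choice made here for numerical conditioning; it does not change the fitted curve.

```python
def fit_coarse_envelope(low_env_db, order=3):
    """Least-squares polynomial fit of the dB envelope, evaluated per subband."""
    n = len(low_env_db)
    if n <= order:
        raise ValueError("need more subbands than the polynomial order")
    # Map subband indices to [-1, 1]; an affine change of variable leaves the
    # fitted values unchanged but keeps the normal equations well conditioned.
    xs = [2.0 * i / (n - 1) - 1.0 for i in range(n)]
    m = order + 1
    # Normal equations a * coef = b for the monomial basis 1, x, x^2, ...
    a = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, low_env_db)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for r in reversed(range(m)):
        s = sum(a[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (b[r] - s) / a[r][r]
    return [sum(cf * x ** k for k, cf in enumerate(coef)) for x in xs]


def gain_curve(low_env_db, order=3):
    """One amplitude gain per lowband subband that flattens the coarse envelope."""
    coarse = fit_coarse_envelope(low_env_db, order)
    mean_db = sum(low_env_db) / len(low_env_db)
    return [10.0 ** ((mean_db - c) / 20.0) for c in coarse]
```

For an envelope that falls by 6 dB per subband, the fitted curve coincides with the envelope, so applying the gains brings every subband to the mean level; an already flat envelope yields gains of one.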
(38) As an example, an HFR generation process in accordance with MPEG-4 SBR is described next. The HF generated signal may be derived by the following formula (see document MPEG-4 Part 3 (ISO/IEC 14496-3), sub-part 4, section 4.6.18.6.2, which is incorporated by reference):
$$X_{High}(k, l + t_{HFAdj}) = X_{Low}(p, l + t_{HFAdj}) + \mathrm{bwArray}(g(k)) \cdot \alpha_0(p) \cdot X_{Low}(p, l - 1 + t_{HFAdj}) + [\mathrm{bwArray}(g(k))]^2 \cdot \alpha_1(p) \cdot X_{Low}(p, l - 2 + t_{HFAdj}),$$
wherein p is the subband index of the lowband signal, i.e. p identifies one of the plurality of low frequency subband signals. The above HF generation formula may be replaced by the following formula which performs a combined gain adjustment and HF generation:
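(39) $$X_{High}(k, l + t_{HFAdj}) = \mathrm{preGain}(p) \cdot \left( X_{Low}(p, l + t_{HFAdj}) + \mathrm{bwArray}(g(k)) \cdot \alpha_0(p) \cdot X_{Low}(p, l - 1 + t_{HFAdj}) + [\mathrm{bwArray}(g(k))]^2 \cdot \alpha_1(p) \cdot X_{Low}(p, l - 2 + t_{HFAdj}) \right),$$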
wherein the gain curve is referred to as preGain(p).
(40) Further details of the copy-up process, e.g. with regards to the relation between p and k, are specified in the above mentioned MPEG-4, Part 3 document. In the above formula, X.sub.Low(p,l) indicates a sample at time instance l of the low frequency subband signal having a subband index p. This sample in combination with preceding samples is used to generate a sample of the high frequency subband signal X.sub.High(k,l) having a subband index k.
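A minimal Python sketch of the combined gain adjustment and HF generation for a single sample is given below. It assumes that preGain(p) scales all lowband terms of subband p equally, that the time offset t_HFAdj is already folded into the slot index `l`, and that `bw` stands for the value bwArray(g(k)) of the target subband; the function name and argument layout are hypothetical. The mapping between p and k is left to the MPEG-4 Part 3 document and is not modelled here.

```python
def hf_generate_sample(x_low, p, l, alpha0, alpha1, bw, pre_gain=1.0):
    """One highband sample generated from lowband subband p at time slot l.

    x_low[p][l]: complex lowband QMF samples (l must be at least 2)
    alpha0[p], alpha1[p]: prediction coefficients of subband p
    bw: chirp factor bwArray(g(k)) for the target highband subband
    pre_gain: spectral gain coefficient preGain(p); 1.0 gives the
              unmodified HF generation formula
    """
    generated = (x_low[p][l]
                 + bw * alpha0[p] * x_low[p][l - 1]
                 + bw ** 2 * alpha1[p] * x_low[p][l - 2])
    return pre_gain * generated
```

Setting `pre_gain` to the gain curve value of subband p applies the spectral flattening during the copy-up, so no separate pass over the lowband samples is needed.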
(41) It should be noted that the aspect of gain adjustment can be used in any filterbank based high frequency reconstruction system. This is illustrated in
(42) As already indicated above, it may be beneficial to signal the activation of the gain adjustment processing in the bitstream from an encoder to a decoder. For certain signal types, e.g. a single sinusoid, the gain adjustment processing may not be relevant and it may therefore be beneficial to enable the encoder/decoder system to turn the additional processing off in order to not introduce an unwanted behaviour for such corner case signals. For this purpose, the encoder may be configured to analyze the audio signals and to generate control data which turns on and off the gain adjustment processing at the decoder.
(43) In
(44) In
(45) In
(46) In
(47) The complexity of the proposed gain adjustment algorithm was calculated as weighted MOPS, where functions like POW/DIV/TRIG are weighted as 25 operations, and all other operations are weighted as one operation. Given these assumptions, the calculated complexity amounts to approximately 0.1 WMOPS and insignificant RAM/ROM usage. In other words, the proposed gain adjustment processing requires low processing and memory capacity.
(48) In the present document, a method and system for generating a highband signal from a lowband signal have been described. The method and system are adapted to generate a highband signal with little or no spectral discontinuities, thereby improving the perceptual performance of high frequency reconstruction methods and systems. The method and system can be easily incorporated into existing audio encoding/decoding systems. In particular, the method and system can be incorporated without the need to modify the envelope adjustment processing of existing audio encoding/decoding systems. Notably this applies to the limiter and interpolation functionality of the envelope adjustment processing which can perform their intended tasks. As such, the described method and system may be used to re-generate highband signals having little or no spectral discontinuities and a low level of noise. Furthermore, the use of control data has been described, wherein the control data may be used to adapt the parameters of the described method and system (and the computational complexity) to the type of audio signal.
(49) The methods and systems described in the present document may be implemented as software, firmware and/or hardware. Certain components may e.g. be implemented as software running on a digital signal processor or microprocessor. Other components may e.g. be implemented as hardware and/or as application specific integrated circuits. The signals encountered in the described methods and systems may be stored on media such as random access memory or optical storage media. They may be transferred via networks, such as radio networks, satellite networks, wireless networks or wireline networks, e.g. the internet. Typical devices making use of the methods and systems described in the present document are portable electronic devices or other consumer equipment which are used to store and/or render audio signals. The methods and systems may also be used on computer systems, e.g. internet web servers, which store and provide audio signals, e.g. music signals, for download.