NOISE FILLING WITHOUT SIDE INFORMATION FOR CELP-LIKE CODERS
20210074307 · 2021-03-11
Inventors
- Guillaume Fuchs (Erlangen, DE)
- Christian Helmrich (Erlangen, DE)
- Manuel Jander (Erlangen, DE)
- Benjamin Schubert (Nuernberg, DE)
- Yoshikazu Yokotani (Erlangen, DE)
CPC classification
G10L19/12
PHYSICS
G10L19/087
PHYSICS
International classification
G10L19/087
PHYSICS
G10L19/028
PHYSICS
Abstract
An audio decoder provides a decoded audio information on the basis of an encoded audio information including linear prediction coefficients (LPC) and includes a tilt adjuster to adjust a tilt of a noise using linear prediction coefficients of a current frame to acquire a tilt information and a noise inserter configured to add the noise to the current frame in dependence on the tilt information. Another audio decoder includes a noise level estimator to estimate a noise level for a current frame using a linear prediction coefficient of at least one previous frame to acquire a noise level information; and a noise inserter to add a noise to the current frame in dependence on the noise level information provided by the noise level estimator. Thus, side information about a background noise in the bit-stream may be omitted. Methods and computer programs serve a similar purpose.
Claims
1. An audio decoder for providing a decoded audio information on the basis of an encoded audio information comprising linear prediction coefficients (LPC), the audio decoder comprising: a tilt adjuster configured to adjust a tilt of a background noise, wherein the tilt adjuster is configured to use linear prediction coefficients of a current frame to acquire a tilt information; and a decoder core configured to decode an audio information of the current frame using the linear prediction coefficients of the current frame to acquire a decoded core coder output signal; and a noise inserter configured to add the adjusted background noise to the current frame, to perform a noise filling; wherein the tilt adjuster is configured to obtain the tilt information using a calculation of a gain g of the linear prediction coefficients of the current frame,
wherein g = (Σ_k a_k · a_(k+1)) / (Σ_k a_k · a_k), wherein a_k is a linear prediction coefficient of the current frame, located at LPC index k.
2. The audio decoder according to claim 1, wherein the audio decoder comprises a frame type determinator for determining a frame type of the current frame, the frame type determinator being configured to activate the tilt adjuster to adjust the tilt of the background noise when the frame type of the current frame is detected to be of a speech type.
3. The audio decoder according to claim 1, wherein the tilt adjuster is configured to use a result of a first-order analysis of the linear prediction coefficients of the current frame to acquire the tilt information.
4. The audio decoder according to claim 3, wherein the tilt adjuster is configured to acquire the tilt information using a calculation of a gain g of the linear prediction coefficients of the current frame as the first-order analysis.
5. The audio decoder according to claim 1, wherein the audio decoder furthermore comprises: a noise level estimator configured to estimate a noise level for a current frame using a plurality of linear prediction coefficients of at least one previous frame to acquire a noise level information; wherein the noise inserter is configured to add the background noise to the current frame in dependence on the noise level information provided by the noise level estimator; wherein the audio decoder is adapted to decode an excitation signal of the current frame and to compute its root mean square e_rms; wherein the audio decoder is adapted to compute a peak level p of a transfer function of an LPC filter of the current frame; wherein the audio decoder is adapted to compute a spectral minimum m_f of the current audio frame by computing the quotient of the root mean square e_rms and the peak level p to acquire the noise level information; wherein the noise level estimator is adapted to estimate the noise level on the basis of two or more quotients of different audio frames.
6. An audio decoder for providing a decoded audio information on the basis of an encoded audio information comprising linear prediction coefficients (LPC), the audio decoder comprising: a noise inserter configured to add a noise to the current frame in dependence on a noise level information; wherein the audio decoder is adapted to decode an excitation signal of the current frame and to compute its root mean square e_rms; wherein the audio decoder is adapted to compute a peak level p of a transfer function of an LPC filter of the current frame; wherein the audio decoder is adapted to compute a spectral minimum m_f of the current audio frame by computing the quotient of the root mean square e_rms and the peak level p to acquire the noise level information; wherein the noise level estimator is adapted to estimate the noise level on the basis of two or more quotients of different audio frames; wherein the audio decoder comprises a decoder core configured to decode an audio information of the current frame using linear prediction coefficients of the current frame to acquire a decoded core coder output signal and wherein the noise inserter adds the noise depending on linear prediction coefficients used in decoding the audio information of the current frame and used in decoding the audio information of one or more previous frames.
7. The audio decoder according to claim 6, wherein the audio decoder comprises a frame type determinator for determining a frame type of the current frame, the frame type determinator being configured to identify whether the frame type of the current frame is speech or general audio, so that the noise level estimation can be performed depending on the frame type of the current frame.
8. The audio decoder according to claim 6, wherein the audio decoder is adapted to compute the root mean square e_rms of the current frame from the time domain representation of the current frame to acquire the noise level information under the condition that the current frame is of a speech type.
9. The audio decoder according to claim 6, wherein the audio decoder is adapted to decode an unshaped MDCT-excitation of the current frame and to compute its root mean square e_rms from the spectral domain representation of the current frame to acquire the noise level information if the current frame is of a general audio type.
10. The audio decoder according to claim 6, wherein the audio decoder is adapted to enqueue the quotient acquired from the current audio frame in the noise level estimator regardless of the frame type, the noise level estimator comprising a noise level storage for two or more quotients acquired from different audio frames.
11. The audio decoder according to claim 6, wherein the noise level estimator is adapted to estimate the noise level on the basis of statistical analysis of two or more quotients of different audio frames.
12. The audio decoder according to claim 1, wherein the audio decoder comprises a de-emphasis filter to de-emphasize the current frame, the audio decoder being adapted to apply the de-emphasis filter on the current frame after the noise inserter has added the noise to the current frame.
13. The audio decoder according to claim 1, wherein the audio decoder comprises a noise generator, the noise generator being adapted to generate the noise to be added to the current frame by the noise inserter.
14. The audio decoder according to claim 1, wherein the audio decoder comprises a noise generator configured to generate random white noise.
15. The audio decoder according to claim 1, wherein the audio decoder is configured to use a decoder based on one or more of the decoders AMR-WB, G.718 or LD-USAC (EVS) in order to decode the encoded audio information.
16. A method for providing a decoded audio information on the basis of an encoded audio information comprising linear prediction coefficients (LPC), the method comprising: adjusting a tilt of a background noise, wherein linear prediction coefficients of a current frame are used to acquire a tilt information; and decoding an audio information of the current frame using the linear prediction coefficients of the current frame to acquire a decoded core coder output signal; and adding the adjusted background noise to the current frame, to perform a noise filling; wherein the tilt information is obtained using a calculation of a gain g of the linear prediction coefficients of the current frame,
wherein g = (Σ_k a_k · a_(k+1)) / (Σ_k a_k · a_k), wherein a_k is a linear prediction coefficient of the current frame, located at LPC index k.
17. A method for providing a decoded audio information on the basis of an encoded audio information comprising linear prediction coefficients (LPC), the method comprising: adding a noise to the current frame in dependence on a noise level information; wherein an excitation signal of the current frame is decoded and wherein its root mean square e.sub.rms is computed; wherein a peak level p of a transfer function of an LPC filter of the current frame is computed; wherein a spectral minimum m.sub.f of the current audio frame is computed by computing the quotient of the root mean square e.sub.rms and the peak level p to acquire the noise level information; wherein the noise level is estimated on the basis of two or more quotients of different audio frames; wherein the method comprises decoding an audio information of the current frame using linear prediction coefficients of the current frame to acquire a decoded core coder output signal and wherein the method comprises adding the noise depending on linear prediction coefficients used in decoding the audio information of the current frame and used in decoding the audio information of one or more previous frames.
18. An audio decoder for providing a decoded audio information on the basis of an encoded audio information comprising linear prediction coefficients (LPC), the audio decoder comprising: a tilt adjuster configured to adjust a tilt of a background noise, wherein the tilt adjuster is configured to use linear prediction coefficients of a current frame to acquire a tilt information; a decoder core configured to decode an audio information of the current frame using the linear prediction coefficients of the current frame to acquire a decoded core coder output signal; and a noise inserter configured to add the adjusted background noise to the current frame, to perform a noise filling, wherein the noise filling is used to fill spectral gaps or valleys; wherein the tilt adjuster is configured to obtain the tilt information using a calculation of a gain g of the linear prediction coefficients of the current frame,
wherein g = (Σ_k a_k · a_(k+1)) / (Σ_k a_k · a_k), wherein a_k is a linear prediction coefficient of the current frame, located at LPC index k.
19. An audio decoder for providing a decoded audio information on the basis of an encoded audio information comprising linear prediction coefficients (LPC), the audio decoder comprising: a tilt adjuster configured to adjust a tilt of a background noise, wherein the tilt adjuster is configured to use linear prediction coefficients of a current frame to acquire a tilt information; a decoder core configured to decode an audio information of the current frame using the linear prediction coefficients of the current frame to acquire a decoded core coder output signal; and a noise inserter configured to add the adjusted background noise to the current frame, to perform a noise filling, wherein noise is added in a frequency region of the decoded core coder output signal provided by the decoder core; wherein the tilt adjuster is configured to obtain the tilt information using a calculation of a gain g of the linear prediction coefficients of the current frame,
wherein g = (Σ_k a_k · a_(k+1)) / (Σ_k a_k · a_k), wherein a_k is a linear prediction coefficient of the current frame, located at LPC index k.
20. The audio decoder of claim 19, wherein the audio decoder comprises a noise filling configured to fill spectral gaps or valleys in a decoded spectrum, wherein the audio decoder comprises a tilt determination configured to determine a tilt of a noise filling noise for the noise filling, and wherein the tilt determination is configured to use linear prediction coefficients of a current frame to acquire a tilt information.
21. A method for providing a decoded audio information on the basis of an encoded audio information comprising linear prediction coefficients (LPC), the method comprising: adjusting a tilt of a background noise, wherein linear prediction coefficients of a current frame are used to acquire a tilt information; and adding the adjusted background noise to the current frame, to perform a noise filling; wherein the tilt information is obtained using a calculation of a gain g of the linear prediction coefficients of the current frame,
wherein g = (Σ_k a_k · a_(k+1)) / (Σ_k a_k · a_k), wherein a_k is a linear prediction coefficient of the current frame, located at LPC index k.
22. An audio decoder for providing a decoded audio information on the basis of an encoded audio information comprising linear prediction coefficients (LPC), the audio decoder comprising: a tilt adjuster configured to adjust a tilt of a background noise, wherein the tilt adjuster is configured to use linear prediction coefficients of a current frame to acquire a tilt information; a noise inserter configured to add the adjusted background noise to the current frame, to perform a noise filling, wherein the tilt information is obtained using a calculation of a gain g of the linear prediction coefficients of the current frame,
wherein g = (Σ_k a_k · a_(k+1)) / (Σ_k a_k · a_k), wherein a_k is a linear prediction coefficient of the current frame, located at LPC index k.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0046] Embodiments of the present invention will be detailed subsequently referring to the appended drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0056] The invention is described in detail with regards to the
[0061] In addition, the audio decoder according to
[0063] In other words, according to
[0065] MDCT or DTX frame. Regardless of the frame type, the spectrally flattened excitation signal (in perceptual domain) is decoded and used to update the noise level estimate as described below in detail. Then the signal is fully reconstructed up to the de-emphasis, which is the last step.
[0066] 2. If the frame is ACELP-coded, the tilt (overall spectral shape) for the noise insertion is computed by a first-order LPC analysis of the LPC filter coefficients. The tilt is derived from the gain g of the 16 LPC coefficients a_k, which is given by g = (Σ_k a_k · a_(k+1)) / (Σ_k a_k · a_k).

[0067] 3. If the frame is ACELP-coded, the noise shaping level and tilt are employed to perform the noise addition onto the decoded frame: a random noise generator generates a spectrally white noise signal, which is then scaled and shaped using the tilt derived from g.

[0068] 4. The shaped and leveled noise signal for the ACELP frame is added onto the decoded signal just before the final de-emphasis filtering step. Since the de-emphasis is a first-order IIR filter boosting low frequencies, this allows for low-complexity, steep IIR high-pass filtering of the added noise, as in
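The tilt computation and shaping in steps 2 and 3 can be sketched as follows. This is a minimal, non-authoritative Python sketch: the function names `lpc_tilt_gain` and `shape_noise` are hypothetical, and the first-order tilt filter y[n] = x[n] + g·x[n-1] is an assumed shaping, since the text only states that the white noise is scaled and shaped using the tilt derived from g.

```python
import numpy as np

def lpc_tilt_gain(a):
    """First-order analysis of the LPC coefficients: normalized correlation
    of adjacent coefficients, g = sum_k(a_k * a_{k+1}) / sum_k(a_k * a_k)."""
    a = np.asarray(a, dtype=float)
    return float(np.dot(a[:-1], a[1:]) / np.dot(a, a))

def shape_noise(num_samples, g, rng=None):
    """Generate spectrally white noise, then apply an assumed first-order
    tilt filter y[n] = x[n] + g * x[n-1] to impose the spectral tilt."""
    rng = np.random.default_rng(0) if rng is None else rng
    white = rng.standard_normal(num_samples)
    # Truncate the convolution tail so the output matches the frame length.
    return np.convolve(white, [1.0, g])[:num_samples]
```

A gain g near 1 indicates slowly varying (strongly correlated) coefficients and hence a pronounced spectral tilt, whereas g near 0 leaves the inserted noise close to white.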
[0069] The noise level estimation in step 1 is performed by computing the root mean square e_rms of the excitation signal for the current frame (or, in case of an MDCT-domain excitation, its time domain equivalent, meaning the e_rms which would be computed for that frame if it were an ACELP frame) and by then dividing it by the peak level p of the transfer function of the LPC analysis filter. This yields the level m_f of the spectral minimum of frame f, as in
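The per-frame quotient m_f = e_rms / p and the tracking over multiple frames can be sketched as follows. This is a hedged Python sketch: `frame_noise_floor`, `NoiseLevelEstimator`, the 512-point frequency grid, the window length, and the plain sliding-window minimum (a stand-in for a full minimum-statistics estimator) are all assumptions, not the patented implementation.

```python
import numpy as np

def frame_noise_floor(excitation, lpc):
    """Quotient m_f = e_rms / p for one frame.
    e_rms: root mean square of the decoded excitation signal.
    p: peak magnitude of the LPC analysis filter A(z) = 1 + sum_k a_k z^-k,
    evaluated on a dense frequency grid (512 points assumed here)."""
    e_rms = np.sqrt(np.mean(np.square(excitation)))
    spectrum = np.fft.rfft(np.concatenate(([1.0], lpc)), n=512)
    p = np.abs(spectrum).max()
    return float(e_rms / p)

class NoiseLevelEstimator:
    """Collects quotients m_f from successive frames and estimates the
    noise level as the minimum over a sliding window of recent frames."""
    def __init__(self, window=50):
        self.window = window
        self.quotients = []

    def update(self, m_f):
        """Enqueue the quotient of the current frame, drop entries older
        than the window, and return the current noise level estimate."""
        self.quotients.append(m_f)
        self.quotients = self.quotients[-self.window:]
        return min(self.quotients)
```

Taking a minimum over several frames makes the estimate robust against frames that contain active speech or music, where the spectral floor lies well above the background noise.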
[0070] Although some aspects have been described in the context of an audio decoder, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding audio decoder. Some or all of the method steps may be executed by (or using) a hardware apparatus, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
[0071] The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
[0072] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
[0073] Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
[0074] Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
[0075] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
[0076] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
[0077] A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
[0078] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
[0079] A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
[0080] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
[0081] A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
[0082] In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
[0083] The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
[0084] The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
[0085] While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.
LIST OF CITED NON-PATENT LITERATURE
[0086] [1] B. Bessette et al., The Adaptive Multi-rate Wideband Speech Codec (AMR-WB), IEEE Trans. on Speech and Audio Processing, Vol. 10, No. 8, November 2002.
[0087] [2] R. C. Hendriks, R. Heusdens and J. Jensen, MMSE based noise PSD tracking with low complexity, in IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 4266-4269, March 2010.
[0088] [3] R. Martin, Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics, IEEE Trans. on Speech and Audio Processing, Vol. 9, No. 5, July 2001.
[0089] [4] M. Jelinek and R. Salami, Wideband Speech Coding Advances in VMR-WB Standard, IEEE Trans. on Audio, Speech, and Language Processing, Vol. 15, No. 4, May 2007.
[0090] [5] J. Mäkinen et al., AMR-WB+: A New Audio Coding Standard for 3rd Generation Mobile Audio Services, in Proc. ICASSP 2005, Philadelphia, USA, March 2005.
[0091] [6] M. Neuendorf et al., MPEG Unified Speech and Audio Coding: The ISO/MPEG Standard for High-Efficiency Audio Coding of All Content Types, in Proc. 132nd AES Convention, Budapest, Hungary, April 2012. Also appears in the Journal of the AES, 2013.
[0092] [7] T. Vaillancourt et al., ITU-T EV-VBR: A Robust 8-32 kbit/s Scalable Coder for Error Prone Telecommunications Channels, in Proc. EUSIPCO 2008, Lausanne, Switzerland, August 2008.