DECODER FOR GENERATING A FREQUENCY ENHANCED AUDIO SIGNAL, METHOD OF DECODING, ENCODER FOR GENERATING AN ENCODED SIGNAL AND METHOD OF ENCODING USING COMPACT SELECTION SIDE INFORMATION
20170358312 · 2017-12-14
Abstract
A decoder for generating a frequency enhanced audio signal, includes: a feature extractor for extracting a feature from a core signal; a side information extractor for extracting a selection side information associated with the core signal; a parameter generator for generating a parametric representation for estimating a spectral range of the frequency enhanced audio signal not defined by the core signal, wherein the parameter generator is configured to provide a number of parametric representation alternatives in response to the feature, and wherein the parameter generator is configured to select one of the parametric representation alternatives as the parametric representation in response to the selection side information; and a signal estimator for estimating the frequency enhanced audio signal using the parametric representation selected.
Claims
1. A decoder for generating a frequency enhanced audio signal, comprising: a feature extractor configured for extracting a feature from a core signal; a side information extractor configured for extracting a selection side information associated with the core signal; a parameter generator configured for generating a parametric representation for estimating a spectral range of the frequency enhanced audio signal not defined by the core signal, wherein the parameter generator is configured to provide a number of parametric representation alternatives in response to the feature, and wherein the parameter generator is configured to select one of the parametric representation alternatives as the parametric representation in response to the selection side information; a signal estimator configured for estimating the frequency enhanced audio signal using the parametric representation selected; wherein the parameter generator is configured to receive parametric frequency enhancement information associated with the core signal, the parametric frequency enhancement information comprising a group of individual parameters, wherein the parameter generator is configured to provide the selected parametric representation in addition to the parametric frequency enhancement information, wherein the selected parametric representation comprises a parameter not included in the group of individual parameters or a parameter change value for changing a parameter in the group of individual parameters, and wherein the signal estimator is configured for estimating the frequency enhanced audio signal using the selected parametric representation and the parametric frequency enhancement information.
2. The decoder of claim 1, further comprising: an input interface configured for receiving an encoded input signal comprising an encoded core signal and the selection side information; and a core decoder for decoding the encoded core signal to acquire the core signal.
3. The decoder of claim 1, wherein the parameter generator is configured to use, when selecting one of the parametric representation alternatives, a predefined order of the parametric representation alternatives or an encoder-signaled order of the parametric representation alternatives.
4. The decoder of claim 1, wherein the parameter generator is configured to provide an envelope representation as the parametric representation, wherein the selection side information indicates one of a plurality of different sibilants or fricatives, and wherein the parameter generator is configured for providing the envelope representation identified by the selection side information.
5. The decoder of claim 1, in which the signal estimator comprises an interpolator configured for interpolating the core signal, and wherein the feature extractor is configured to extract the feature from the core signal not being interpolated.
6. The decoder of claim 1, wherein the signal estimator comprises: an analysis filter configured for analyzing the core signal or an interpolated core signal to acquire an excitation signal; an excitation extension block configured for generating an enhanced excitation signal comprising the spectral range not comprised by the core signal; and a synthesis filter configured for filtering the extended excitation signal; wherein the analysis filter or the synthesis filter are determined by the parametric representation selected.
7. The decoder of claim 1, wherein the signal estimator comprises a spectral bandwidth extension processor configured for generating an extended spectral band corresponding to the spectral range not comprised by the core signal using at least a spectral band of the core signal and the parametric representation, wherein the parametric representation comprises parameters for at least one of a spectral envelope adjustment, a noise floor addition, an inverse filter and an addition of missing tones, wherein the parameter generator is configured to provide, for a feature, a plurality of parametric representation alternatives, each parametric representation alternative comprising parameters for at least one of a spectral envelope adjustment, a noise floor addition, an inverse filtering, and addition of missing tones.
8. The decoder of claim 1, further comprising: a voice activity detector or a speech/non-speech discriminator, wherein the signal estimator is configured to estimate the frequency enhanced signal using the parametric representation only when the voice activity detector or the speech/non-speech detector indicates a voice activity or a speech signal.
9. The decoder of claim 8, wherein the signal estimator is configured to switch from one frequency enhancement procedure to a different frequency enhancement procedure or to use different parameters extracted from an encoded signal, when the voice activity detector or speech/non-speech detector indicates a non-speech signal or a signal not comprising a voice activity.
10. The decoder of claim 1, wherein the statistical model is configured to provide, in response to a feature, a plurality of alternative parametric representations, wherein each alternative parametric representation comprises a probability being identical to a probability of a different alternative parametric representation or being different from the probability of the alternative parametric representation by less than 10% of the highest probability.
11. The decoder of claim 1, wherein the selection side information is only comprised by a frame of the encoded signal, when the parameter generator provides a plurality of parametric representation alternatives, and wherein the selection side information is not comprised by a different frame of the encoded audio signal in which the parameter generator provides only a single parametric representation alternative in response to the feature.
12. An encoder for generating an encoded signal, comprising: a core encoder configured for encoding an original signal to acquire an encoded audio signal comprising information on a smaller number of frequency bands compared to an original signal; a selection side information generator configured for generating selection side information indicating a defined parametric representation alternative provided by a statistical model in response to a feature extracted from the original signal or from the encoded audio signal or from a decoded version of the encoded audio signal; and an output interface configured for outputting the encoded signal, the encoded signal comprising the encoded audio signal and the selection side information, wherein the original signal comprises associated meta information describing a sequence of acoustical information for a sequence of samples of the original audio signal, wherein the selection side information generator comprises: a metadata extractor for extracting the sequence of meta information; and a metadata translator for translating the sequence of meta information into a sequence of the selection side information.
13. The encoder of claim 12, wherein the output interface is configured to only comprise the selection side information into the encoded signal, when a plurality of parametric representation alternatives are provided by the statistical model and to not comprise any selection side information into a frame for the encoded audio signal, in which the statistical model is operative to only provide a single parametric representation in response to the feature.
14. A method for generating a frequency enhanced audio signal, comprising: extracting a feature from a core signal; extracting a selection side information associated with the core signal; generating a parametric representation for estimating a spectral range of the frequency enhanced audio signal not defined by the core signal, wherein a number of parametric representation alternatives is provided in response to the feature, and wherein one of the parametric representation alternatives is selected as the parametric representation in response to the selection side information; and estimating the frequency enhanced audio signal using the parametric representation selected, wherein the generating the parametric representation receives parametric frequency enhancement information associated with the core signal, the parametric frequency enhancement information comprising a group of individual parameters, wherein the generating the parametric representation provides the selected parametric representation in addition to the parametric frequency enhancement information, wherein the selected parametric representation comprises a parameter not included in the group of individual parameters or a parameter change value for changing a parameter in the group of individual parameters, and wherein the estimating estimates the frequency enhanced audio signal using the selected parametric representation and the parametric frequency enhancement information.
15. A method of generating an encoded signal, comprising: encoding an original signal to acquire an encoded audio signal comprising information on a smaller number of frequency bands compared to an original signal; generating selection side information indicating a defined parametric representation alternative provided by a statistical model in response to a feature extracted from the original signal or from the encoded audio signal or from a decoded version of the encoded audio signal; and outputting the encoded signal, the encoded signal comprising the encoded audio signal and the selection side information, wherein the original signal comprises associated meta information describing a sequence of acoustical information for a sequence of samples of the original audio signal, wherein the generating the selection side information comprises: extracting the sequence of meta information; and translating the sequence of meta information into a sequence of the selection side information.
16. A non-transitory storage medium having stored thereon a computer program for performing, when running on a computer or a processor, the method of claim 14.
17. A non-transitory storage medium having stored thereon a computer program for performing, when running on a computer or a processor, the method of claim 15.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0027] Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
DETAILED DESCRIPTION OF THE INVENTION
[0044]
[0045] Furthermore, a side information extractor 110 for extracting a selection side information 114 associated with the core signal 100 is provided. In addition, a parameter generator 108 is connected to the feature extractor 104 via feature transmission line 112 and to the side information extractor 110 via selection side information 114. The parameter generator 108 is configured for generating a parametric representation for estimating a spectral range of the frequency enhanced audio signal not defined by the core signal. The parameter generator 108 is configured to provide a number of parametric representation alternatives in response to the features 112 and to select one of the parametric representation alternatives as the parametric representation in response to the selection side information 114. The decoder furthermore comprises a signal estimator 118 for estimating a frequency enhanced audio signal using the parametric representation selected by the selector, i.e., parametric representation 116.
[0046] Particularly, the feature extractor 104 can be implemented to either extract from the decoded core signal as illustrated in
[0047] Alternatively, however, the feature extractor can also operate on, or extract a feature from, the encoded core signal. Typically, the encoded core signal comprises a representation of scale factors for frequency bands or any other representation of audio information. Depending on the kind of feature extraction, the encoded representation of the audio signal is representative of the decoded core signal and, therefore, features can be extracted from it. Alternatively or additionally, a feature can be extracted not only from a fully decoded core signal but also from a partly decoded core signal. In frequency domain coding, the encoded signal represents a frequency domain representation comprising a sequence of spectral frames. The encoded core signal can, therefore, be only partly decoded to obtain a decoded representation of a sequence of spectral frames, before actually performing a spectrum-time conversion. Thus, the feature extractor 104 can extract features either from the encoded core signal or a partly decoded core signal or a fully decoded core signal. With respect to its extracted features, the feature extractor 104 can be implemented as known in the art and may, for example, be implemented as in audio fingerprinting or audio ID technologies.
[0048] Advantageously, the selection side information 114 comprises a number N of bits per frame of the core signal.
[0049] Furthermore, the parameter generator is configured to provide at most 2^N parametric representation alternatives. On the other hand, when the parameter generator 108 provides, for example, only five parametric representation alternatives, then three bits of selection side information may nevertheless be used.
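The relation between the number N of bits per frame and the number of alternatives can be checked with a short sketch. The function names `selection_bits` and `max_alternatives` are hypothetical and chosen only for illustration; the sketch assumes a plain fixed-length binary index with no entropy coding.

```python
def max_alternatives(n_bits: int) -> int:
    """Largest number of alternatives addressable with n_bits per frame (2^N)."""
    return 2 ** n_bits


def selection_bits(num_alternatives: int) -> int:
    """Smallest N such that 2^N >= num_alternatives (0 when there is one alternative)."""
    if num_alternatives < 1:
        raise ValueError("at least one alternative is required")
    # (k - 1).bit_length() equals ceil(log2(k)) for k >= 1, without float rounding.
    return (num_alternatives - 1).bit_length()


bits_for_five = selection_bits(5)     # five alternatives do not fit into 2 bits
capacity = max_alternatives(bits_for_five)
```

With five alternatives the index needs three bits, and three of the eight code words remain unused, which matches the example given above.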
[0050]
[0051] Furthermore, the parameter generator 108 is configured for retrieving the selection side information 114 from the side information extractor as outlined in step 404. Then, in step 406, a specific parametric representation alternative is selected using the selection side information 114. Finally, in step 408, the selected parametric representation alternative is output to the signal estimator 118.
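The selection in steps 404 to 408 might be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the parameter generator 108: the statistical model is replaced by a hypothetical `toy_model` that returns a fixed list of envelopes, the selection side information is assumed to be a plain integer index, and all names are invented for the example.

```python
from typing import Callable, List, Sequence

Envelope = List[float]  # assumed form of a parametric representation


def toy_model(feature: Sequence[float]) -> List[Envelope]:
    """Hypothetical stand-in for the statistical model: returns fixed alternatives."""
    return [[0.9, 0.1], [0.5, 0.4], [0.2, 0.7]]


def generate_parameters(
    feature: Sequence[float],
    selection_side_info: int,
    model: Callable[[Sequence[float]], List[Envelope]],
) -> Envelope:
    # The alternatives are provided in response to the feature.
    alternatives = model(feature)
    # Step 404: the selection side information has been retrieved (passed in here).
    if not 0 <= selection_side_info < len(alternatives):
        raise ValueError("selection side information does not address an alternative")
    # Step 406: select the alternative addressed by the side information.
    selected = alternatives[selection_side_info]
    # Step 408: output the selected alternative to the signal estimator.
    return selected


selected_envelope = generate_parameters([0.1, 0.2], 1, toy_model)
```

The range check reflects the case in which fewer than 2^N alternatives are provided, so that some index values do not address any alternative.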
[0052] Advantageously, the parameter generator 108 is configured to use, when selecting one of the parametric representation alternatives, a predefined order of the parametric representation alternatives or, alternatively, an encoder-signaled order of the parametric representation alternatives. To this end, reference is made to
[0053] The predefined order of the parametric representation alternatives can, therefore, be the order in which the statistical model actually delivers the alternatives in response to an extracted feature. Alternatively, if the individual alternatives have different associated probabilities which are, however, quite close to each other, then the predefined order could be that the highest-probability parametric representation comes first, and so on. Alternatively, the order could be signaled, for example, by a single bit, but in order to save even this bit, a predefined order is advantageous.
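A probability-based predefined order could look like the following sketch. It assumes that encoder and decoder run the same model on the same feature, so that both derive the same ordered list and an index computed on one side is valid on the other. The squared-error criterion used by the hypothetical `encoder_select_index` is an assumption made for this example; the text does not prescribe how the encoder picks the alternative.

```python
from typing import List, Sequence, Tuple

Envelope = List[float]


def order_alternatives(scored: Sequence[Tuple[Envelope, float]]) -> List[Envelope]:
    """Highest probability first; ties keep the order delivered by the model."""
    ranked = sorted(scored, key=lambda pair: -pair[1])  # sorted() is stable
    return [envelope for envelope, _ in ranked]


def encoder_select_index(ordered: Sequence[Envelope], original: Envelope) -> int:
    """Assumed encoder rule: index of the alternative closest to the original envelope."""
    def squared_error(envelope: Envelope) -> float:
        return sum((a - b) ** 2 for a, b in zip(envelope, original))
    return min(range(len(ordered)), key=lambda i: squared_error(ordered[i]))


# Three alternatives with probabilities that are close to each other.
scored = [([0.9, 0.1], 0.34), ([0.5, 0.4], 0.36), ([0.2, 0.7], 0.30)]
ordered = order_alternatives(scored)
index = encoder_select_index(ordered, original=[0.25, 0.65])
```

Because the order is derived identically on both sides, only `index` has to be transmitted, and no additional bit is spent on signaling the order itself.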
[0054] Subsequently, reference is made to
[0055] In an embodiment according to
[0056] Particularly, the selection side information 114 is also termed to be a “fricative information”, since this selection side information distinguishes between problematic sibilants or fricatives such as “f”, “s” or “sh”. Thus, the selection side information provides a clear definition of one of three problematic alternatives which are, for example, provided by the statistical model 904 in the process of the envelope estimation 902 which are both performed in the parameter generator 108. The envelope estimation results in a parametric representation of the spectral envelope of the spectral portions not included in the core signal.
[0057] Block 104 can, therefore, correspond to block 1510 of
[0058] Furthermore, it is advantageous that the signal estimator 118 comprises an analysis filter 910, an excitation extension block 912 and a synthesis filter 914. Thus, blocks 910, 912, 914 may correspond to blocks 1600, 1700 and 1800 of
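The chain of analysis filter, excitation extension block and synthesis filter can be illustrated with a small source-filter sketch. It rests on several assumptions that are not taken from the text: the envelope is represented by linear prediction coefficients, the excitation is extended by spectral folding (zero insertion), which is only one of several known techniques, and all function names are hypothetical.

```python
from typing import List, Sequence


def analysis_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """FIR filter A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...; returns the excitation."""
    return [
        x[n] + sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        for n in range(len(x))
    ]


def extend_excitation(e: Sequence[float]) -> List[float]:
    """Spectral folding: inserting zeros doubles the rate and mirrors the spectrum upward."""
    out: List[float] = []
    for sample in e:
        out.extend([sample, 0.0])
    return out


def synthesis_filter(e: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole filter 1/A(z), the inverse of analysis_filter for the same coefficients."""
    y: List[float] = []
    for n in range(len(e)):
        feedback = sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e[n] - feedback)
    return y


core = [1.0, 0.5, -0.25, 0.0]
a_core = [-0.9]        # assumed envelope coefficients of the core signal
a_selected = [-0.5]    # assumed coefficients from the selected parametric representation

excitation = analysis_filter(core, a_core)
enhanced = synthesis_filter(extend_excitation(excitation), a_selected)
```

In this sketch only the synthesis filter coefficients come from the selected parametric representation; filtering the unextended excitation with `a_core` again returns the core signal, which is a quick sanity check of the filter pair.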
[0059]
[0060] Thus, other signals different from speech can also be coded as illustrated in
[0061] A further embodiment is illustrated in
[0062]
[0063] Subsequently,
[0064]
[0065] As discussed before,
[0066]
[0067] While
[0068] The selection side information 1210 generated by the selection side information generator 1202 can have any of the characteristics as discussed in the context of the earlier Figures.
[0069] Although the present invention has been described in the context of block diagrams where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps where these steps stand for the functionalities performed by corresponding logical or physical hardware blocks.
[0070] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
[0071] The inventive transmitted or encoded signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
[0072] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
[0073] Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
[0074] Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
[0075] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
[0076] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
[0077] A further embodiment of the inventive method is, therefore, a data carrier (or a non-transitory storage medium such as a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
[0078] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the Internet.
[0079] A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
[0080] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
[0081] A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
[0082] In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.
[0083] While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.