ENCODER FOR ENCODING AN AUDIO SIGNAL, AUDIO TRANSMISSION SYSTEM AND METHOD FOR DETERMINING CORRECTION VALUES
20170309284 · 2017-10-26
Inventors
- Konstantin Schmidt (Nuernberg, DE)
- Guillaume Fuchs (Bubenreuth, DE)
- Matthias Neusinger (Rohr, DE)
- Martin DIETZ (Nuernberg, DE)
Cpc classification
G10L19/06
PHYSICS
International classification
Abstract
An encoder for encoding an audio signal includes an analyzer for analyzing the audio signal and for determining analysis prediction coefficients from the audio signal. The encoder includes a converter for deriving converted prediction coefficients from the analysis prediction coefficients, a memory for storing a multitude of correction values and a calculator. The calculator includes a processor for processing the converted prediction coefficients to obtain spectral weighting factors. The calculator includes a combiner for combining the spectral weighting factors and the multitude of correction values to obtain corrected weighting factors. A quantizer of the calculator is configured for quantizing the converted prediction coefficients using the corrected weighting factors to obtain a quantized representation of the converted prediction coefficients. The encoder includes a bitstream former for forming an output signal based on the quantized representation of the converted prediction coefficients and based on the audio signal.
Claims
1. Method for determining correction values for a first multitude of first weighting factors each weighting factor adapted for weighting a portion of an audio signal, the method comprising: calculating the first multitude of first weighting factors for each audio signal of a set of audio signals and based on a first determination rule; calculating a second multitude of second weighting factors for each audio signal of the set of audio signals based on a second determination rule, each of the second multitude of weighting factors being related to a first weighting factor; calculating a third multitude of distance values each distance value having a value related to a distance between a first weighting factor and a second weighting factor related to a portion of the audio signal; and calculating a fourth multitude of correction values adapted to reduce the distance values when combined with the first weighting factors; wherein the fourth multitude of correction values is determined based on a polynomial fitting comprising multiplying the values of the first weighting factors with a polynomial (y = a + bx + cx²) comprising at least one variable for adapting a term of the polynomial.
2. Method according to claim 1, wherein the fourth multitude of correction values is determined based on a polynomial fitting comprising: multiplying the values of the first weighting factors with a polynomial (y = a + bx + cx²) comprising at least one variable for adapting a term of the polynomial; calculating a value for the variable such that the third multitude of distance values comprises a value below a threshold value based on:
3. Method according to claim 1, wherein the third multitude of distance values is calculated based on a further information comprising reflection coefficients or an information related to a power spectrum of the at least one of the set of audio signals based on:
4. A non-transitory digital storage medium having a computer program stored thereon to perform the method according to claim 1 when said computer program is run by a computer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0032] Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
DETAILED DESCRIPTION OF THE INVENTION
[0044] Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals even if occurring in different figures.
[0045] In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present invention. However, it will be apparent to those skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present invention. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.
[0046]
[0047] The encoder 100 comprises a converter 120 configured for deriving converted prediction coefficients 122 from the prediction coefficients 112. The converter 120 may be configured for determining the converted prediction coefficients 122 to obtain, for example, Line Spectral Frequencies (LSF) and/or Immittance Spectral Frequencies (ISF). The converted prediction coefficients 122 may comprise a higher robustness with respect to quantization errors in a later quantization when compared to the prediction coefficients 112. As quantization is usually performed non-linearly, quantizing linear prediction coefficients may lead to distortions of a decoded audio signal.
[0048] The encoder 100 comprises a calculator 130. The calculator 130 comprises a processor 140 which is configured to process the converted prediction coefficients 122 to obtain spectral weighting factors 142. The processor may be configured to calculate and/or to determine the weighting factors 142 based on one or more of a plurality of known determination rules such as an inverse harmonic mean (IHM) as it is known from [1] or according to a more complex approach as it is described in [2]. The International Telecommunication Union (ITU) Standard G.718 describes a further approach of determining weighting factors by expanding the approach of [2] as it is described in [3]. The processor 140 is configured to determine the weighting factors 142 based on a determination rule comprising a low computational complexity. This may allow for a high throughput of encoded audio signals and/or a simple realization of the encoder 100 due to hardware that may consume less energy based on less computational efforts.
[0049] The calculator 130 comprises a combiner 150 configured for combining the spectral weighting factors 142 and a multitude of correction values 162 to obtain corrected weighting factors 152. The multitude of correction values is provided from a memory 160 in which the correction values 162 are stored. The correction values 162 may be static or dynamic, i.e. the correction values 162 may be updated during operation of the encoder 100 or may remain unchanged during operation and/or may be only updated during a calibration procedure for calibrating the encoder 100. The memory 160 comprises static correction values 162. The correction values 162 may be obtained, for example, by a precalculation procedure as it is described later on. Alternatively, the memory 160 may be comprised by the calculator 130, as indicated by the dotted lines.
[0050] The calculator 130 comprises a quantizer 170 configured for quantizing the converted prediction coefficients 122 using the corrected weighting factors 152. The quantizer 170 is configured to output a quantized representation 172 of the converted prediction coefficients 122. The quantizer 170 may be a linear quantizer, a non-linear quantizer such as a logarithmic quantizer, or a vector-like quantizer, i.e. a vector quantizer. A vector-like quantizer may be configured to quantize a plurality of portions of the corrected weighting factors 152 to a plurality of quantized values (portions). The quantizer 170 may be configured for weighting the converted prediction coefficients 122 with the corrected weighting factors 152. The quantizer may further be configured for determining a distance of the weighted converted prediction coefficients 122 to entries of a database of the quantizer 170 and for selecting a code word (representation) that is related to an entry in the database, wherein the entry may comprise a lowest distance to the weighted converted prediction coefficients 122. Such a procedure is exemplarily described later on. The quantizer 170 may be a stochastic Vector Quantizer (VQ).
[0051] Alternatively, the quantizer 170 may also be configured for applying other Vector Quantizers like Lattice VQ or any scalar quantizer. Alternatively, the quantizer 170 may also be configured to apply a linear or logarithmic quantization.
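The codebook search outlined in [0050] may be sketched in C as follows. This is only a minimal illustration under stated assumptions: the distance is taken to be a weighted squared error, the database is a flat array stored entry by entry, and the function name weighted_vq_search and all parameter names are hypothetical rather than taken from the description.

```c
#include <stddef.h>
#include <float.h>

/* Returns the index (code word) of the database entry with the lowest
 * weighted squared distance to the converted prediction coefficients.
 * Assumption: entry k occupies codebook[k * dim] .. codebook[k * dim + dim - 1]. */
size_t weighted_vq_search(const float *lsf, const float *weights,
                          const float *codebook, size_t num_entries,
                          size_t dim)
{
    size_t best = 0;
    float best_dist = FLT_MAX;

    for (size_t k = 0; k < num_entries; k++) {
        float dist = 0.0f;
        for (size_t i = 0; i < dim; i++) {
            float diff = lsf[i] - codebook[k * dim + i];
            dist += weights[i] * diff * diff;  /* corrected weight scales the error */
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = k;
        }
    }
    return best;
}
```

A larger corrected weighting factor makes an error in the corresponding coefficient more costly, so the selected entry tends to match that coefficient more closely.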
[0052] The quantized representation 172 of the converted prediction coefficients 122, i.e. the code word, is provided to a bitstream former 180 of the encoder 100. The encoder 100 may comprise an audio processing unit 190 configured for processing some or all of the audio information of the audio signal 102 and/or further information. Audio processing unit 190 is configured for providing audio data 192 such as a voiced signal information or an unvoiced signal information to the bitstream former 180. The bitstream former 180 is configured for forming an output signal (bitstream) 182 based on the quantized representation 172 of the converted prediction coefficients 122 and based on the audio information 192, which is based on the audio signal 102.
[0053] An advantage of the encoder 100 is that the processor 140 may be configured to obtain, i.e. to calculate, the weighting factors 142 by using a determination rule that comprises a low computational complexity. The correction values 162 may be obtained by, when expressed in a simplified manner, comparing a set of weighting factors obtained by a (reference) determination rule with a high computational complexity but therefore comprising a high precision and/or a good audio quality and/or a low LSD with weighting factors obtained by the determination rule executed by the processor 140. This may be done for a multitude of audio signals, wherein for each of the audio signals a number of weighting factors is obtained based on both determination rules. For each audio signal, the obtained results may be compared to obtain an information related to a mismatch or an error. The information related to the mismatch or the error may be summed up and/or averaged with respect to the multitude of audio signals to obtain an information related to an average error that is made by the processor 140 with respect to the reference determination rule when executing the determination rule with the lower computational complexity. The obtained information related to the average error and/or mismatch may be represented in the correction values 162 such that the weighting factors 142 may be combined with the correction values 162 by the combiner to reduce or compensate the average error. This allows for reducing or almost compensating the error of the weighting factors 142 when compared to the reference determination rule used offline while still allowing for a less complex determination of the weighting factors 142.
[0054]
[0055] The calculator 130′ further comprises a smoother 155 configured for receiving corrected weighting factors 152′ from the combiner 150′ and an optional information 157 (control flag) allowing for controlling operation (ON-/OFF-state) of the smoother 155. The control flag 157 may be obtained, for example, from the analyzer indicating that smoothing is to be performed in order to reduce harsh transitions. The smoother 155 is configured for combining corrected weighting factors 152′ and corrected weighting factors 152′″ which are a delayed representation of corrected weighting factors determined for a previous frame or sub-frame of the audio signal, i.e. corrected weighting factors determined in a previous cycle in the ON-state. The smoother 155 may be implemented as an infinite impulse response (IIR) filter. Therefore, the calculator 130′ comprises a delay block 159 configured for receiving and delaying corrected weighting factors 152″ provided by the smoother 155 in a first cycle and to provide those weights as the corrected weighting factors 152′″ in a following cycle.
[0056] The delay block 159 may be implemented, for example, as a delay filter or as a memory configured for storing the received corrected weighting factors 152″. The smoother 155 is configured for weightedly combining the received corrected weighting factors 152′ and the received corrected weighting factors 152′″ from the past. For example, the (present) corrected weighting factors 152′ may comprise a share of 25%, 50%, 75% or any other value in the smoothed corrected weighting factors 152″, wherein the (past) weighting factors 152′″ may comprise a share of (1-share of corrected weighting factors 152′). This allows for avoiding harsh transitions between subsequent audio frames when the audio signal, i.e. two subsequent frames thereof, result in different corrected weighting factors which would lead to distortions in a decoded audio signal. In the OFF-state, the smoother 155 is configured for forwarding the corrected weighting factors 152′. Alternatively or in addition, smoothing may allow for an increased audio quality for audio signals comprising a high level of periodicity.
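The weighted combination performed by the smoother 155 together with the delay block 159 may be sketched in C as follows. The sketch assumes that the delayed values are the smoothed weights of the previous cycle, as described for the delay block, and that the delayed values are left untouched in the OFF-state; the latter is an assumption of this example, and the names smooth_weights, share and smooth_on are hypothetical.

```c
#include <stddef.h>

/* One smoothing cycle. `share` is the share of the present weights
 * (e.g. 0.25f, 0.5f or 0.75f); the past weights get (1 - share).
 * `past` plays the role of the delay block: it holds the smoothed
 * weights of the previous cycle and is updated when smoothing is on. */
void smooth_weights(float *present, float *past, size_t n,
                    float share, int smooth_on)
{
    if (!smooth_on) {
        return;  /* OFF-state: present weights are forwarded unchanged */
    }
    for (size_t i = 0; i < n; i++) {
        float smoothed = share * present[i] + (1.0f - share) * past[i];
        past[i] = smoothed;     /* delayed copy for the following cycle */
        present[i] = smoothed;  /* smoothed corrected weighting factor */
    }
}
```

Because the stored value is itself a smoothed value, each output depends on all earlier cycles, which corresponds to a first-order recursive (IIR) behavior.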
[0057] Alternatively, the smoother 155 may be configured to additionally combine corrected weighting factors of more previous cycles. Alternatively or in addition, the converted prediction coefficients 122′ may also be the Immittance Spectral Frequencies.
[0058] A weighting factor w_i may be obtained, for example, based on the inverse harmonic mean (IHM). A determination rule may be based on a form:
$$w_i = \frac{1}{\mathrm{LSF}_i - \mathrm{LSF}_{i-1}} + \frac{1}{\mathrm{LSF}_{i+1} - \mathrm{LSF}_i}$$
wherein w_i denotes a determined weight 142′ with index i and LSF_i denotes a line spectral frequency with index i. The index i corresponds to a number of spectral weighting factors obtained and may be equal to a number of prediction coefficients determined by the analyzer.
[0059] The number of prediction coefficients and therefore the number of converted coefficients may be, for example, 16. Alternatively, the number may also be 8 or 32. Alternatively, the number of converted coefficients may also be lower than the number of prediction coefficients, for example, if the converted coefficients 122 are determined as Immittance Spectral Frequencies which may comprise a lower number when compared to the number of prediction coefficients.
[0060] In other words,
[0061] In the following, reference will be made to details of correcting the derived weighting factors. For example, the analyzer is configured to determine linear prediction coefficients (LPC) of order 10 or 16, i.e. a number of 10 or 16 LPC. Although the analyzer may also be configured to determine any other number of linear prediction coefficients or a different type of coefficient, the following description is made with reference to 16 coefficients, as this number of coefficients is used in mobile communication.
[0062]
[0063] The spectral processor 145 comprises an energy calculator 145a which is configured to compute an amount or a measure 146 for an energy of frequency bins of the spectrum of the audio signal 102 based on the spectral parameters 116. The spectral processor further comprises a normalizer 145b for normalizing the converted prediction coefficients 122′ (LSF) to obtain normalized prediction coefficients 147. The converted prediction coefficients may be normalized, for example, relatively, with respect to a maximum value of a plurality of the LSF and/or absolutely, i.e. with respect to a predetermined value such as a maximum value being expected or being representable by used computation variables.
[0064] The spectral processor 145 further comprises a first determiner 145c configured for determining a bin energy for each normalized prediction parameter, i.e., to relate each normalized prediction parameter 147 obtained from the normalizer 145b to a computed measure 146 to obtain a vector W1 containing the bin energy for each LSF. The spectral processor 145 further comprises a second determiner 145d configured for finding (determining) a frequency weighting for each normalized LSF to obtain a vector W2 comprising the frequency weightings. The further information 114 comprises the vectors W1 and W2, i.e., the vectors W1 and W2 are the feature representing the further information 114.
[0065] The processor 140′ is configured for determining the IHM based on the converted prediction parameters 122′ and a power of the IHM, for example the second power, wherein alternatively or in addition also a higher power may be computed, wherein the IHM and the power(s) thereof form the weighting factors 142′.
[0066] A combiner 150″ is configured for determining the corrected weighting factors (corrected LSF weights) 152′ based on the further information 114 and the weighting factors 142′.
[0067] Alternatively, the processor 140′, the spectral processor 145 and/or the combiner may be implemented as a single processing unit such as a central processing unit, a (micro-)controller, a programmable gate array or the like.
[0068] In other words, a first and a second entry to the combiner are IHM and IHM², i.e. the weighting factors 142′. A third entry is, for each LSF-vector element i:
$$\left(\sqrt{\mathrm{wfft}_i - \min} + 2\right) \cdot \mathrm{FreqWTable}[\mathrm{normLsf}_i]$$
wherein wfft is the combination of W1 and W2 and wherein min is the minimum of wfft.
[0069] i = 0 … M, where M may be 16 when 16 prediction coefficients are derived from the audio signal, and
$$\mathrm{wfft}_i = 10 \log_{10}\Bigl(\max\bigl(\mathrm{binEner}[\lfloor \mathrm{lsf}_i/50 + 0.5 \rfloor - 1],\ \mathrm{binEner}[\lfloor \mathrm{lsf}_i/50 + 0.5 \rfloor],\ \mathrm{binEner}[\lfloor \mathrm{lsf}_i/50 + 0.5 \rfloor + 1]\bigr)\Bigr)$$
wherein binEner contains the energy of each bin of the spectrum, i.e., binEner corresponds to the measure 146.
[0070] The mapping binEner[⌊lsf_i/50 + 0.5⌋] is a rough approximation of the energy of a formant in the spectral envelope. FreqWTable is a vector containing additional weights which are selected depending on the input signal being voiced or unvoiced.
[0071] The wfft is an approximation of the spectral energy close to a prediction coefficient such as an LSF coefficient. In simple terms, if a prediction (LSF) coefficient comprises a value X, this means that the spectrum of the audio signal (frame) comprises an energy maximum (formant) at or adjacent to the frequency X. The wfft is a logarithmic expression of the energy at frequency X, i.e., it corresponds to the logarithmic energy at this location. When compared to embodiments described before as utilizing reflection coefficients as further information, alternatively or in addition a combination of wfft (W1) and FreqWTable (W2) may be used to obtain the further information 114. FreqWTable describes one of a plurality of possible tables to be used. Based on a “coding mode” of the encoder 300, e.g., voiced, fricative or the like, at least one of the plurality of tables may be selected. One or more of the plurality of tables may be trained (programmed and adapted) during operation of the encoder 300.
[0072] A finding of using the wfft is to enhance coding of converted prediction coefficients that represent a formant. In contrast to classical noise shaping, in which the noise is located at frequencies comprising large amounts of (signal) energy, the described approach relates to quantizing the spectral envelope curve. When the power spectrum comprises a large amount of energy (a large measure) at frequencies comprising or arranged adjacent to a frequency of a converted prediction coefficient, this converted prediction coefficient (LSF) may be quantized better, i.e., with lower errors achieved by higher weightings, than other coefficients comprising a lower measure of energy.
[0073]
[0074] Alternatively or in addition, the combiner may also be configured to add further correction values (d, e, f, . . . ) and further powers of the weighting factors or of the further information. For example, the polynomial depicted in
[0075]
[0076]
[0077] To obtain the correction values during a training phase, a reference determination rule according to which reference weights are determined is selected. As the encoder is configured to correct determined weighting factors with respect to the reference weights and determination of the reference weights may be done offline, i.e. during a calibration step or the like, a determination rule comprising a high precision (e.g., low LSD) may be selected while neglecting resulting computational effort. A method comprising a high precision and possibly a high computational complexity may be selected to obtain precise reference weighting factors. For example, a method to determine weighting factors according to the G.718 Standard [3] may be used.
[0078] A determination rule according to which the encoder will determine the weighting factors is also executed. This may be a method comprising a low computational complexity while accepting a lower precision of the determined results. Weights are computed according to both determination rules while using a set of audio material comprising, for example, speech and/or music. The audio material may be represented in a number of M training vectors, wherein M may comprise a value of more than 100, more than 1000 or more than 5000. Both sets of obtained weighting factors are stored in a matrix, each matrix comprising vectors that are each related to one of the M training vectors.
[0079] For each of the M training vectors, a distance is determined between a vector comprising the weighting factors determined based on the first (reference) determination rule and a vector comprising the weighting factors determined based on the encoder determination rule. The distances are summed up to obtain a total distance (error), wherein the total error may be averaged to obtain an average error value.
[0080] During determination of the correction values, an objective may be to reduce the total error and/or the average error. Therefore, a polynomial fitting may be executed based on the determination rule shown in
[0081] The polynomial is fit to the weighting factors determined based on the determination rule which will be executed at the encoder. The polynomial may be fit such that the total error or the average error is below a threshold value, for example, 0.01, 0.1 or 0.2, wherein 1 indicates a total mismatch. Alternatively or in addition, the polynomial may be fit such that the total error is minimized by utilizing an error-minimizing algorithm. A value of 0.01 may indicate a relative error that may be expressed as a difference (distance) and/or as a quotient of distances. Alternatively, the polynomial fitting may be done by determining the correction values such that the resulting total error or average error comprises a value that is close to a mathematical minimum. This may be done, for example, by differentiating the used functions and setting the obtained derivative to zero.
[0082] A further reduction of the distance (error), for example the Euclidean distance, may be achieved when adding the additional information, as it is shown for 114 at encoder side. This additional information may also be used during calculating the correction parameters. The information may be used by combining the same with the polynomial for determining the correction value.
[0083] In other words, first the IHM weights and the G.718 weights may be extracted from a database containing more than 5000 seconds (or M training vectors) of speech and music material. The IHM weights may be stored in the matrix I and the G.718 weights may be stored in the matrix G. Let I_i and G_i be vectors containing all IHM and G.718 weights w_i of the i-th ISF or LSF coefficient of the whole training database. The average (squared) Euclidean distance between these two vectors may be determined based on:
$$d_i = \frac{1}{M}\,(I_i - G_i)^H (I_i - G_i)$$
[0084] In order to minimize the distance between these two vectors a second order polynomial may be fit:
$$d_i = \frac{1}{M}\,\bigl(p_{0,i} + p_{1,i} I_i + p_{2,i} I_i^2 - G_i\bigr)^H \bigl(p_{0,i} + p_{1,i} I_i + p_{2,i} I_i^2 - G_i\bigr)$$
wherein the square I_i^2 is taken element by element.
[0085] A matrix
$$EI_i = \begin{bmatrix} 1 & I_{1,i} & I_{1,i}^2 \\ 1 & I_{2,i} & I_{2,i}^2 \\ \vdots & \vdots & \vdots \\ 1 & I_{M,i} & I_{M,i}^2 \end{bmatrix}$$
may be introduced and a vector P_i = [p_{0,i} p_{1,i} p_{2,i}]^T in order to rewrite:
$$d_i = \frac{1}{M}\,(EI_i P_i - G_i)^H (EI_i P_i - G_i)$$
[0086] In order to get the vector P_i having the lowest average Euclidean distance, the derivative
$$\frac{\partial d_i}{\partial P_i}$$
may be set to zero:
$$\frac{\partial d_i}{\partial P_i} = \frac{2}{M}\,EI_i^H (EI_i P_i - G_i) = 0$$
to obtain:
$$P_i = (EI_i^H EI_i)^{-1} EI_i^H G_i$$
To further reduce the difference (Euclidean distance) between the proposed weights and the G.718 weights, reflection coefficients or other information may be added to the matrix EI_i.
[0087] Because, for example, the reflection coefficients carry some information about the LPC model which is not directly observable in the LSF or ISF domain, they help to reduce the Euclidean distance d_i. In practice, probably not all reflection coefficients will lead to a significant reduction in Euclidean distance. The inventors found that it may be sufficient to use the first and the 14th reflection coefficient. When the reflection coefficients are added, the matrix EI_i will look like:
$$EI_i = \begin{bmatrix} 1 & I_{1,i} & I_{1,i}^2 & r_{1,1} & r_{1,14} \\ 1 & I_{2,i} & I_{2,i}^2 & r_{2,1} & r_{2,14} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & I_{M,i} & I_{M,i}^2 & r_{M,1} & r_{M,14} \end{bmatrix}$$
where r_{x,y} is the y-th reflection coefficient (or the other information) of the x-th instance in the training dataset. Accordingly, the dimension of vector P_i will change according to the number of columns in matrix EI_i. The calculation of the optimal vector P_i stays the same as above.
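The closed-form solution P_i = (EI_i^H EI_i)^−1 EI_i^H G_i may be illustrated for real-valued training data by the following C sketch. It covers only the three polynomial columns without reflection coefficients, accumulates the normal equations directly and solves them by Gaussian elimination with partial pivoting. The function name fit_second_order, the singularity threshold and the use of double precision are assumptions of this example and are not taken from the description.

```c
#include <math.h>
#include <stddef.h>

#define NCOEF 3

/* Fits g[x] ~ p[0] + p[1]*ihm[x] + p[2]*ihm[x]^2 over m training samples
 * by solving (EI^T EI) p = EI^T g.
 * Returns 0 on success, -1 if the system is numerically singular. */
int fit_second_order(const double *ihm, const double *g, size_t m,
                     double p[NCOEF])
{
    double a[NCOEF][NCOEF + 1] = {{0.0}};

    /* Accumulate EI^T EI (left 3x3 block) and EI^T g (last column). */
    for (size_t x = 0; x < m; x++) {
        const double row[NCOEF] = {1.0, ihm[x], ihm[x] * ihm[x]};
        for (int r = 0; r < NCOEF; r++) {
            for (int c = 0; c < NCOEF; c++) {
                a[r][c] += row[r] * row[c];
            }
            a[r][NCOEF] += row[r] * g[x];
        }
    }

    /* Forward elimination with partial pivoting. */
    for (int col = 0; col < NCOEF; col++) {
        int piv = col;
        for (int r = col + 1; r < NCOEF; r++) {
            if (fabs(a[r][col]) > fabs(a[piv][col])) {
                piv = r;
            }
        }
        if (fabs(a[piv][col]) < 1e-12) {
            return -1;  /* too few distinct training values */
        }
        for (int c = 0; c <= NCOEF; c++) {
            double t = a[col][c];
            a[col][c] = a[piv][c];
            a[piv][c] = t;
        }
        for (int r = col + 1; r < NCOEF; r++) {
            double f = a[r][col] / a[col][col];
            for (int c = col; c <= NCOEF; c++) {
                a[r][c] -= f * a[col][c];
            }
        }
    }

    /* Back substitution. */
    for (int r = NCOEF - 1; r >= 0; r--) {
        double s = a[r][NCOEF];
        for (int c = r + 1; c < NCOEF; c++) {
            s -= a[r][c] * p[c];
        }
        p[r] = s / a[r][r];
    }
    return 0;
}
```

Adding reflection coefficients as further columns would enlarge the system from 3 to 5 unknowns per coefficient index i, while the solution steps stay the same.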
[0088] By adding further information, the determination rule depicted in
[0089]
[0090] In other words,
[0091]
[0092] In other words, the fitting model of block C is the vector P which is described above. In the following, a pseudo-code exemplarily summarizes the weight derivation processing:
    Input:
        lsf = original LSF vector
        order = order of LPC, length of lsf
        parcorr[0] = - 1st reflection coefficient
        parcorr[1] = - 14th reflection coefficient
        smooth_flag = flag for smoothing weights
        w_past = past weights
    Output:
        weights = computed weights

    /* Compute IHM weights */
    weights[0] = 1.f/( lsf[0] - 0 ) + 1.f/( lsf[1] - lsf[0] );
    for(i=1; i<order-1; i++)
        weights[i] = 1.f/( lsf[i] - lsf[i-1] ) + 1.f/( lsf[i+1] - lsf[i] );
    weights[order-1] = 1.f/( lsf[order-1] - lsf[order-2] ) + 1.f/( 8000 - lsf[order-1] );

    /* Fitting model */
    for(i=0; i<order; i++) {
        weights[i] *= (8000/PI);
        weights[i] = ((float)(lsf_fit_model[0][i])/(1<<12))
                   + weights[i]*((float)(lsf_fit_model[1][i])/(1<<14))
                   + weights[i]*weights[i]*((float)(lsf_fit_model[2][i])/(1<<19))
                   + parcorr[0]*((float)(lsf_fit_model[3][i])/(1<<13))
                   + parcorr[1]*((float)(lsf_fit_model[4][i])/(1<<10));
        /* avoid too low weights and negative weights */
        if(weights[i] < 1.f/(i+1))
            weights[i] = 1.f/(i+1);
    }

wherein “parcorr” indicates the extension of the matrix EI, and

    if(smooth_flag) {
        for(i=0; i<order; i++) {
            /* 0.75 share for present weights, 0.25 share for past weights */
            tmp = 0.75f*weights[i] + 0.25f*w_past[i];
            w_past[i] = weights[i];
            weights[i] = tmp;
        }
    }

which indicates the smoothing described above in which present weights are weighted with a factor of 0.75 and past weights are weighted with a factor of 0.25.
[0093] The obtained coefficients for the vector P may comprise scalar values as indicated exemplarily below for a signal sampled at 16 kHz and with a LPC order of 16:
    lsf_fit_model[5][16] = {
        {679, 10921, 10643, 4998, 11223, 6847, 6637, 5200, 3347, 3423, 3208, 3329, 2785, 2295, 2287, 1743},
        {23735, 14092, 9659, 7977, 4125, 3600, 3099, 2572, 2695, 2208, 1759, 1474, 1262, 1219, 931, 1139},
        {-6548, -2496, -2002, -1675, -565, -529, -469, -395, -477, -423, -297, -248, -209, -160, -125, -217},
        {-10830, 10563, 17248, 19032, 11645, 9608, 7454, 5045, 5270, 3712, 3567, 2433, 2380, 1895, 1962, 1801},
        {-17553, 12265, -758, -1524, 3435, -2644, 2013, -616, -25, 651, -826, 973, -379, 301, 281, -165}
    };
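The integer table entries are converted to fractional coefficients by the divisions 1<<12, 1<<14, 1<<19, 1<<13 and 1<<10 used in the pseudo-code. The following C sketch isolates that step for a single coefficient index. The function name apply_fit_model and the convention of passing the table as a parameter are assumptions of this example; scaled_ihm denotes the IHM weight after the multiplication by 8000/PI.

```c
#define FIT_ROWS 5
#define FIT_COLS 16

/* Evaluates the fitting model for coefficient index i and applies the
 * lower bound 1/(i+1) of the pseudo-code. parcorr holds the two
 * reflection-coefficient inputs. */
float apply_fit_model(const int model[FIT_ROWS][FIT_COLS], int i,
                      float scaled_ihm, const float parcorr[2])
{
    float w = (float)model[0][i] / (float)(1 << 12)
            + scaled_ihm * ((float)model[1][i] / (float)(1 << 14))
            + scaled_ihm * scaled_ihm * ((float)model[2][i] / (float)(1 << 19))
            + parcorr[0] * ((float)model[3][i] / (float)(1 << 13))
            + parcorr[1] * ((float)model[4][i] / (float)(1 << 10));

    /* avoid too low weights and negative weights */
    float lower_bound = 1.0f / (float)(i + 1);
    return w < lower_bound ? lower_bound : w;
}
```

For example, with the first column of the table above, a zero IHM weight and zero reflection-coefficient inputs, the constant term alone is 679/4096, i.e. about 0.166, which lies below the lower bound 1/(0+1) and is therefore raised to 1.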
[0094] As stated above, instead of the LSF also the ISF may be provided by the converter as converted coefficients 122. ISFs of order N are equivalent to LSFs of order N−1 for the first N−1 coefficients, to which the N-th reflection coefficient is appended. Therefore, the weight derivation is very close to the LSF weight derivation. It is given by the following pseudo-code:
    Input:
        isf = original ISF vector
        order = order of LPC, length of isf
        parcorr[0] = - 1st reflection coefficient
        parcorr[1] = - 14th reflection coefficient
        smooth_flag = flag for smoothing weights
        w_past = past weights
    Output:
        weights = computed weights

    /* Compute IHM weights */
    weights[0] = 1.f/( isf[0] - 0 ) + 1.f/( isf[1] - isf[0] );
    for(i=1; i<order-2; i++)
        weights[i] = 1.f/( isf[i] - isf[i-1] ) + 1.f/( isf[i+1] - isf[i] );
    weights[order-2] = 1.f/( isf[order-2] - isf[order-3] ) + 1.f/( 6400 - isf[order-2] );

    /* Fitting model */
    for(i=0; i<order-1; i++) {
        weights[i] *= (6400/PI);
        weights[i] = ((float)(isf_fit_model[0][i])/(1<<12))
                   + weights[i]*((float)(isf_fit_model[1][i])/(1<<14))
                   + weights[i]*weights[i]*((float)(isf_fit_model[2][i])/(1<<19))
                   + parcorr[0]*((float)(isf_fit_model[3][i])/(1<<13))
                   + parcorr[1]*((float)(isf_fit_model[4][i])/(1<<10));
        /* avoid too low weights and negative weights */
        if(weights[i] < 1.f/(i+1))
            weights[i] = 1.f/(i+1);
    }

    if(smooth_flag) {
        for(i=0; i<order-1; i++) {
            /* 0.75 share for present weights, 0.25 share for past weights */
            tmp = 0.75f*weights[i] + 0.25f*w_past[i];
            w_past[i] = weights[i];
            weights[i] = tmp;
        }
    }
    weights[order-1] = 1;
where the fitting model coefficients for an input signal with frequency components going up to 6.4 kHz are:

    isf_fit_model[5][15] = {
        {8112, 7326, 12119, 6264, 6398, 7690, 5676, 4712, 4776, 3789, 3059, 2908, 2862, 3266, 2740},
        {16517, 13269, 7121, 7291, 4981, 3107, 3031, 2493, 2000, 1815, 1747, 1477, 1152, 761, 728},
        {-4481, -2819, -1509, -1578, -1065, -378, -519, -416, -300, -288, -323, -242, -187, -7, -45},
        {-7787, 5365, 12879, 14908, 12116, 8166, 7215, 6354, 4981, 5116, 4734, 4435, 4901, 4433, 5088},
        {-11794, 9971, -3548, 1408, 1108, -2119, 2616, -1814, 1607, -714, 855, 279, 52, 972, -416}
    };
and where the fitting model coefficients for an input signal with frequency components going up to 4 kHz and with zero energy for frequency components going from 4 to 6.4 kHz are:

    isf_fit_model[5][15] = {
        {21229, -746, 11940, 205, 3352, 5645, 3765, 3275, 3513, 2982, 4812, 4410, 1036, -6623, 6103},
        {15704, 12323, 7411, 7416, 5391, 3658, 3578, 3027, 2624, 2086, 1686, 1501, 2294, 9648, -6401},
        {-4198, -2228, -1598, -1481, -917, -538, -659, -529, -486, -295, -221, -174, -84, -11874, 27397},
        {-29198, 25427, 13679, 26389, 16548, 9738, 8116, 6058, 3812, 4181, 2296, 2357, 4220, 2977, -71},
        {-16320, 15452, -5600, 3390, 589, -2398, 2453, -1999, 1351, -1853, 1628, -1404, 113, -765, -359}
    };
[0095] Basically, the orders of the ISF are modified, which may be seen when comparing the block /* Compute IHM weights */ of both pseudo-codes.
[0096]
[0097] In other words, the present invention proposes a new efficient way of deriving the optimal weights w by using a low-complexity heuristic algorithm. An optimization over the IHM weighting is presented that results in less distortion in lower frequencies while giving more distortion to higher frequencies and yielding a less audible overall distortion. Such an optimization is achieved by first computing the weights as proposed in [1] and then modifying them so as to make them very close to the weights which would have been obtained by using the G.718 approach [3]. The second stage consists of a simple second order polynomial model that is fitted during a training phase by minimizing the average Euclidean distance between the modified IHM weights and the G.718 weights. Simplified, the relationship between IHM and G.718 weights is modeled by a (probably simple) polynomial function.
[0098] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
[0099] The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
[0100] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
[0101] Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
[0102] Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
[0103] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
[0104] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
[0105] A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
[0106] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
[0107] A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
[0108] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
[0109] In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.
[0110] While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
LITERATURE
[0111] [1] Laroia, R.; Phamdo, N.; Farvardin, N., “Robust and efficient quantization of speech LSP parameters using structured vector quantizers,” 1991 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-91), vol. 1, pp. 641-644, 14-17 Apr. 1991
[0112] [2] Gardner, William R.; Rao, B. D., “Theoretical analysis of the high-rate vector quantization of LPC parameters,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 5, pp. 367-381, September 1995
[0113] [3] ITU-T G.718, “Frame error robust narrow-band and wideband embedded variable bit-rate coding of speech and audio from 8-32 kbit/s,” June 2008, section 6.8.2.4 “ISF weighting function for frame-end ISF quantization”