Pyramid vector quantizer shape search
11942102 · 2024-03-26
Assignee
Inventors
Cpc classification
G10L19/0017
PHYSICS
G10L19/00
PHYSICS
International classification
G10L19/00
PHYSICS
Abstract
An encoder and a method therein for Pyramid Vector Quantizer, PVQ, shape search, the PVQ taking a target vector x as input and deriving a vector y by iteratively adding unit pulses in an inner dimension search loop. The method comprises, before entering a next inner dimension search loop for unit pulse addition, determining, based on the maximum pulse amplitude, maxamp_y, of a current vector y, whether more than a current bit word length is needed to represent enloop_y in a lossless manner in the upcoming inner dimension loop. The variable enloop_y is related to an accumulated energy of the vector y. Performing this method enables the encoder to keep the complexity of the search at a reasonable level.
Claims
1. A method performed by an audio encoder, the method comprising: receiving an input vector (S) representing an input audio signal; and enabling a decoder to produce a reconstructed vector (Ŝ) for use in obtaining an output audio signal corresponding to the input audio signal, wherein enabling the decoder to produce the reconstructed vector (Ŝ) comprises: using the input vector (S), obtaining a target vector (x), wherein x represents a shape of the input audio signal; using the target vector (x), generating an integer shape code vector (y) of length N, where N>0; using the integer shape code vector (y), generating a Pyramid Vector Quantizer (PVQ) index; and producing a bitstream for the decoder, wherein the bitstream includes the PVQ index and the PVQ index can be used by the decoder to produce said integer shape code vector (y), which can be used by the decoder to produce a reconstructed target vector (xq), and the reconstructed target vector (xq) can be used to produce the reconstructed vector (Ŝ), thereby enabling the decoder to produce Ŝ, wherein generating the integer shape code vector (y) comprises: initializing the vector y; determining a first accumulated correlation value based on vectors x and y; determining, based on a maximum absolute value of the target vector (xabs_max) and the first accumulated correlation value, a first upshift value; using the first upshift value, determining a first correlation value (corr_xy_1); using corr_xy_1, determining a first best position (n_best1) within the vector y; adding a first unit pulse to the vector y at position n_best1; after adding the first unit pulse to the vector y at position n_best1, determining a second accumulated correlation value based on vectors x and y; determining, based on the maximum absolute value of the target vector (xabs_max) and the second accumulated correlation value, a second upshift value; using the second upshift value, determining a second correlation value (corr_xy_2); using corr_xy_2, determining a second best position (n_best2) in the vector y for addition of a second unit pulse; and adding the second unit pulse to the vector y at position n_best2.
2. The method of claim 1, wherein prior to determining the first correlation value (corr_xy_1), determining whether to represent the first correlation value using 16 bits or 32 bits.
3. The method of claim 2, wherein the method comprises determining to represent corr_xy_1 using 16 bits, and determining corr_xy_1 comprises extracting the top 16 bits from a value determined based on the first upshift value and the first accumulated correlation value.
4. The method of claim 2, wherein determining whether to represent the first correlation value using 16 bits or 32 bits comprises comparing an energy margin value to a threshold value.
5. The method of claim 1, wherein the method further comprises: after adding the first unit pulse to the vector y and before adding the second unit pulse to the vector y, determining whether the L1 norm of y satisfies a condition, wherein the step of adding the second unit pulse to the vector y is performed as a result of determining that the condition is not satisfied.
6. The method of claim 5, further comprising: wherein the condition is satisfied when the L1 norm of y is equal to K, where K is a predetermined value.
7. The method of claim 1, wherein determining n_best1 comprises: determining whether corr_xy_1² * bestEn > BestCorrSq * enloop_y, where bestEn is a best so far energy value, BestCorrSq is a best so far correlation, and enloop_y is a variable related to an accumulated energy of y.
8. A computer program product comprising a non-transitory computer readable medium storing a computer program comprising instructions which, when executed by processing circuitry of an audio encoder causes the audio encoder to carry out the method of claim 1.
9. An audio encoder, the audio encoder comprising: a first determining unit coupled to a memory, the first determining unit configured to perform a process comprising: the method comprising: receiving an input vector (S) representing an input audio signal; and enabling a decoder to produce a reconstructed vector (Ŝ) for use in obtaining an output audio signal corresponding to the input audio signal, wherein enabling the decoder to produce the reconstructed vector (Ŝ) comprises: using the input vector (S), obtaining a target vector (x), wherein x represents a shape of the input audio signal; using the target vector (x), generating an integer shape code vector (y) of length N, where N>0; using the integer shape code vector (y), generating a Pyramid Vector Quantizer (PVQ) index; and producing a bitstream for the decoder, wherein the bitstream includes the PVQ index and the PVQ index can be used by the decoder to produce said integer shape code vector (y), which can be used by the decoder to produce a reconstructed target vector (xq), and the reconstructed target vector (xq) can be used to produce the reconstructed vector (Ŝ), thereby enabling the decoder to produce Ŝ, wherein generating the integer shape code vector (y) comprises: initializing the vector y; determining a first accumulated correlation value based on vectors x and y; determining, based on a maximum absolute value of the target vector (xabs_max) and the first accumulated correlation value, a first upshift value; using the first upshift value, determining a first correlation value (corr_xy_1); using corr_xy_1, determining a first best position (n_best1) within the vector y; adding a first unit pulse to the vector y at position n_best1; after adding the first unit pulse to the vector y at position n_best1, determining a second accumulated correlation value based on vectors x and y; determining, based on the maximum absolute value of the target vector (xabs_max) and the second accumulated correlation value, a second upshift value; using the second upshift value, determining a second correlation value (corr_xy_2); using corr_xy_2, determining a second best position (n_best2) in the vector y for addition of a second unit pulse; and adding the second unit pulse to the vector y at position n_best2.
10. The audio encoder of claim 9, wherein prior to determining the first correlation value (corr_xy_1), determining whether to represent the first correlation value using 16 bits or 32 bits.
11. The audio encoder of claim 10, wherein the method comprises determining to represent corr_xy_1 using 16 bits, and determining corr_xy_1 comprises extracting the top 16 bits from a value determined based on the first upshift value and the first accumulated correlation value.
12. The audio encoder of claim 10, wherein determining whether to represent the first correlation value using 16 bits or 32 bits comprises comparing an energy margin value to a threshold value.
13. The audio encoder of claim 9, wherein the method further comprises: after adding the first unit pulse to the vector y and before adding the second unit pulse to the vector y, determining whether the L1 norm of y satisfies a condition, wherein the step of adding the second unit pulse to the vector y is performed as a result of determining that the condition is not satisfied.
14. The audio encoder of claim 13, wherein the condition is satisfied when the L1 norm of y is equal to K, where K is a predetermined value.
15. The audio encoder of claim 9, wherein determining n_best1 comprises: determining whether corr_xy_1² * bestEn > BestCorrSq * enloop_y, where bestEn is a best so far energy value, BestCorrSq is a best so far correlation, and enloop_y is a variable related to an accumulated energy of y.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) The foregoing and other objects, features, and advantages of the technology disclosed herein will be apparent from the following more particular description of embodiments as illustrated in the accompanying drawings. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the technology disclosed herein.
DETAILED DESCRIPTION
(11) In floating-point arithmetic, establishing the dynamics of the inner loop PVQ shape search iteration parameters poses no major issue. In fixed-precision DSPs with, e.g., 16/32 bit limited accumulators (an accumulator being a register in which intermediate arithmetic and/or logic results are stored) and variables, however, it is very important to employ efficient search methods that make maximal use of the limited dynamic range of the DSP variables and maximize the precision, while using as many of the available fast limited-dynamic-range DSP operations as possible.
(12) The term precision above refers to being able to represent as small numbers as possible, i.e. the number of bits after the decimal point for a specific word length. Put differently, the precision corresponds to the resolution of the representation, which in turn is defined by the number of decimal or binary digits. The precision in the embodiments described below correlates with the number of bits after the decimal point, and not necessarily with the word length itself, because in fixed-point arithmetic the same word length may offer different precisions. For example, the data formats 1Q15 and 2Q14 both have word length 16, but the first has 15 bits after the decimal point and the second has 14. The smallest representable numbers are then 2^-15 and 2^-14, respectively.
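The difference between the two formats can be made concrete with a small sketch. The helper names `q_resolution` and `to_q16` are hypothetical and not part of the disclosure; the code only illustrates that a 16 bit word offers a step of 2^-15 in 1Q15 and 2^-14 in 2Q14.

```c
#include <stdint.h>

/* Smallest positive step of a fixed-point format with frac_bits bits after
 * the binary point: 15 for 1Q15, 14 for 2Q14 (hypothetical helper). */
double q_resolution(int frac_bits)
{
    return 1.0 / (double)((int32_t)1 << frac_bits);
}

/* Quantize a real value to a signed 16 bit word in the given format,
 * rounding to nearest and saturating at the word limits.
 * Assumes |value| is small enough that the scaled value fits in 32 bits. */
int16_t to_q16(double value, int frac_bits)
{
    double scaled = value * (double)((int32_t)1 << frac_bits);
    int32_t rounded = (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);

    if (rounded > 32767)  rounded = 32767;
    if (rounded < -32768) rounded = -32768;
    return (int16_t)rounded;
}
```

With these helpers, 0.5 maps to 16384 in 1Q15 and to 8192 in 2Q14, while +1.0 saturates to 32767 in 1Q15 because that format cannot represent +1 exactly.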
(13) A way of performing pyramid vector quantization of the shape is disclosed in section 3.2 of Valin et al., "A full-bandwidth audio codec with low complexity and very low delay", EUSIPCO, 2009. That document presents an MDCT codec where the details, i.e. the shape, in each band are quantized algebraically using a spherical codebook and where the bit allocation is inferred from information shared between the encoder and the decoder. Aspects and embodiments of the disclosure of this application at least loosely relate to how to perform a search according to Equations 4-7 in Valin et al. efficiently in fixed point limited to e.g. 16/32 bit arithmetic, instead of the float values used in Valin et al.
(14) In some aspects and embodiments disclosed hereinafter, given a target vector x(n) (t in Equation 0) of a certain dimension N, and given a certain number of unit pulses K, the shape is analyzed and a suitable reconstruction vector x_q(n)=func(y(n)) is determined, which minimizes the shape quantization error and thus maximizes a perceived quality, e.g. in the case of audio coding. At least some of the aspects and embodiments aim at finding the optimal constellation of K unit pulses in a vector y(n), which needs to adhere to the L1 norm, while keeping the complexity under control, i.e. as low as practically possible.
(15) Instead of using prior art open loop methods to determine approximate values for the inner loop dynamic range and accumulator precision, some of the aspects and embodiments are designed to use low cost, in terms of DSP cycles needed and in terms of additional Program Read-Only Memory (ROM) needed, near optimal pre-analysis of the worst case numerator and/or worst case denominator before starting the costly evaluations of the PVQ-shape distortion quotient in the innermost search loop. The near-optimal pre-analysis is not targeting to scale the values to the exact optimal maximum dynamic range, but instead the pre-analysis determines the near-optimal power of 2 scale factor, as power of 2 scaling may be implemented as shifts of a binary number and such shifts have a low cost in DSP cycles and in DSP ROM.
(16) The denominator precision selection is perceptually motivated as spectrally peaky regions will be allocated more precision than flatter regions.
(17) While some of the main concepts described in this disclosure cover various modifications and alternative constructions, embodiments of the aspects are shown in drawings and exemplary code and will hereinafter be described in detail.
(18) PVQ-Search General Optimization Introduction
(19) An L1-norm structured PVQ-quantizer allows for several search optimizations, where a primary optimization is to move the target to the all positive quadrant (could also be denoted orthant or hyper octant) in N-dimensional space and a second optimization is to use an L1-norm projection as a starting approximation for y(n). An L1-norm of K for a PVQ(N,K) means that the absolute sum of all elements in the PVQ-vector y(n) has to be K, just as the absolute sum of all elements in the target shape vector x(n).
(20) A third optimization is to iteratively update the Q_PVQ quotient terms corr_xy² and energy_y, instead of re-computing Eq. 4 (below) over the whole vector space N, for every candidate change to the vector y(n) in pursuit of reaching the L1-norm K, which is required for a subsequent indexing step.
(21) The above three major optimization steps are optimizations which generally may exist in past PVQ-implementations such as CELT and IETF-Opus, and partly in G.718, however for the completeness of the description of aspects and embodiments, these steps are also briefly outlined below.
(22) Efficient PVQ Vector Shape Search
(23) An overview of an audio encoding and decoding system applying an embodiment of the herein proposed PVQ shape search can be seen in
(24) PVQ-Search Introduction
(25) The goal of the PVQ(N,K) search procedure is to find the best scaled and normalized output vector x_q(n). x_q(n) is defined as:
(26) $$x_q = \frac{y}{\sqrt{y^T y}} \quad \text{(Eq. 1)}$$
(27) where y = y_{N,K} is a point on the surface of an N-dimensional hyper-pyramid and the L1 norm of y_{N,K} is K. In other words, y_{N,K} is the selected integer shape code vector of size N, also denoted dimension N, according to:
(28) $$y_{N,K} = \left\{ e : \sum_{i=0}^{N-1} |e_i| = K \right\} \quad \text{(Eq. 2)}$$
(29) That is, the vector x_q is the unit energy normalized integer sub vector y_{N,K}. The best y vector is the one minimizing the mean squared shape error between the target vector x(n) and the scaled normalized quantized output vector x_q. This is achieved by minimizing the following search distortion:
(30) $$d_{PVQ} = -x^T x_q = -\frac{x^T y}{\sqrt{y^T y}} \quad \text{(Eq. 3)}$$
(31) Or equivalently, by squaring numerator and denominator, maximizing the quotient Q_PVQ:
(32) $$Q_{PVQ} = \frac{(x^T y)^2}{y^T y} = \frac{(corr_{xy})^2}{energy_y} \quad \text{(Eq. 4)}$$
(33) where corr_xy is the correlation between x and y. In the search for the optimal PVQ vector shape y(n) with L1-norm K, iterative updates of the Q_PVQ variables are made in the all positive "quadrant" in N-dimensional space according to:
$$corr_{xy}(k,n) = corr_{xy}(k-1) + 1 \cdot x(n) \quad \text{(Eq. 5)}$$
$$energy_y(k,n) = energy_y(k-1) + 2 \cdot 1^2 \cdot y(k-1,n) + 1^2 \quad \text{(Eq. 6)}$$
(34) where corr_xy(k-1) signifies the correlation achieved so far by placing the previous k-1 unit pulses, energy_y(k-1) signifies the accumulated energy achieved so far by placing the previous k-1 unit pulses, and y(k-1,n) signifies the amplitude of y at position n from the previous placement of k-1 unit pulses. To further speed up the in-loop iterative processing, the energy term energy_y(k) is scaled down by 2, thus removing one multiplication in the inner loop.
$$enloop_y(k,n) = \frac{energy_y(k,n)}{2} \;\Rightarrow\; enloop_y(k,n) = enloop_y(k-1) + y(k-1,n) + 0.5 \quad \text{(Eq. 7)}$$
(35) where enloop_y(k,n) is the preferred energy variable used and accumulated inside the innermost unit pulse search loop, as its iterative update requires one multiplication less than energy_y(k,n).
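(36) $$Q_{PVQ}(k,n) = \frac{corr_{xy}(k,n)^2}{enloop_y(k,n)} \quad \text{(Eq. 8)}$$
(37) The best position n_best for the k'th unit pulse is iteratively updated by increasing n from 0 to N-1:
$$n_{best} = n, \text{ if } Q_{PVQ}(k,n) > Q_{PVQ}(k,n_{best}) \quad \text{(Eq. 9)}$$
(38) To avoid costly divisions, which is especially important in fixed-point arithmetic, the Q_PVQ maximization update decision is performed using a cross-multiplication of the saved best squared correlation numerator bestCorrSq and the saved best energy denominator bestEn so far, which could be expressed as:
(39) $$\left. \begin{array}{l} n_{best} = n \\ bestCorrSq = corr_{xy}(k,n)^2 \\ bestEn = enloop_y(k,n) \end{array} \right\} \text{ if } corr_{xy}(k,n)^2 \cdot bestEn > bestCorrSq \cdot enloop_y(k,n) \quad \text{(Eq. 10)}$$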
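The iterative updates and the division-free comparison can be sketched in plain C as follows. This is a minimal illustration, not the fixed-point implementation of the disclosure: the function name `pvq_add_unit_pulse` is hypothetical, 64 bit integers are used so that no scaling analysis is needed, and the loop tracks the energy (which equals 2 times enloop_y) because the common factor of 2 cancels in the cross-multiplication.

```c
#include <stdint.h>

/* Adds one unit pulse to y (all-positive orthant) at the position that
 * maximizes corr^2 / energy, compared by cross-multiplication.
 * xabs holds |x(n)| as non-negative integers; *corr_xy and *energy_y hold
 * the accumulated correlation and energy of the pulses placed so far.
 * Assumes the inputs are small enough that corr^2 * energy fits in 64 bits.
 * Returns the chosen position. */
int pvq_add_unit_pulse(const int32_t *xabs, int32_t *y, int n_dim,
                       int64_t *corr_xy, int64_t *energy_y)
{
    int n, n_best = 0;
    int64_t best_corr_sq = 0;
    int64_t best_en = 0;

    for (n = 0; n < n_dim; n++) {
        int64_t corr = *corr_xy + xabs[n];               /* Eq. 5 */
        int64_t en = *energy_y + 2 * (int64_t)y[n] + 1;  /* Eq. 6 */
        int64_t corr_sq = corr * corr;

        /* Eq. 10: corr_sq / en > best_corr_sq / best_en, without dividing.
         * The first position is always accepted as the initial best. */
        if (n == 0 || corr_sq * best_en > best_corr_sq * en) {
            n_best = n;
            best_corr_sq = corr_sq;
            best_en = en;
        }
    }

    /* Commit the winning pulse and update the accumulated terms. */
    *corr_xy += xabs[n_best];
    *energy_y += 2 * (int64_t)y[n_best] + 1;
    y[n_best] += 1;
    return n_best;
}
```

Calling the function K times on an all-zero y places K unit pulses one at a time; the strict inequality means that ties keep the earliest position.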
(40) The iterative maximization of Q.sub.PVQ(k, n) may start from a zero number of placed unit pulses or from an adaptive lower cost pre-placement number of unit pulses, based on an integer projection to a point below the K'th-pyramid's surface, with a guaranteed undershoot of unit pulses in the target L1 norm K.
(41) PVQ Search Preparation Analysis
(42) Due to the structured nature of the y_{N,K} PVQ integer vector, where all possible sign combinations are allowed and it is possible to encode all sign combinations, as long as the resulting vector adheres to the L1 norm of K unit pulses, the search is performed in the all positive first "quadrant" (the reason for the quotation marks on "quadrant" is that a true quadrant only exists when N=2, and N may here be more than 2). Further, as realized by the inventor, to achieve as high an accuracy as possible for a limited precision implementation, the maximum absolute value xabs_max of the input signal x(n) may be pre-analyzed for future use in the setup of the inner loop correlation accumulation procedure.
$$xabs(n) = |x(n)|, \text{ for } n = 0, \ldots, N-1 \quad \text{(Eq. 11)}$$
$$xabs_{max} = \max(xabs_0, \ldots, xabs_{N-1}) \quad \text{(Eq. 12)}$$
(43) Handling of Very Low Energy Targets and Very Low Energy Sub-Vectors
(44) In case the input target vector (x in Eq. 3 or t in Eq. 0) is an all zero vector and/or the vector gain (e.g. G in Eq. 0) is very low, the PVQ-search may be bypassed, and a valid PVQ-vector y may be deterministically created by assigning half of the K unit pulses to the first position
(45) $$y[0] = \left\lfloor \frac{K}{2} \right\rfloor$$
and the remaining unit pulses to the last position (y[N-1] = y[N-1] + (K - y[0])).
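The deterministic fill for the bypass case can be sketched as below. The function name `pvq_zero_input_vector` is hypothetical, and rounding "half of K" down is an assumption made for this sketch. The accumulating assignment to the last position keeps the L1 norm at K even when N = 1, where the first and last positions coincide.

```c
#include <stdint.h>

/* Builds a valid PVQ vector with L1 norm k without running the search.
 * Assumes n_dim >= 1 and 0 <= k <= 32767. */
void pvq_zero_input_vector(int16_t *y, int n_dim, int k)
{
    int n;

    for (n = 0; n < n_dim; n++) {
        y[n] = 0;
    }
    y[0] = (int16_t)(k / 2);                               /* half, rounded down */
    y[n_dim - 1] = (int16_t)(y[n_dim - 1] + (k - y[0]));   /* remaining pulses */
}
```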
(46) The term "very low" for energy targets and vector gain is in one embodiment as low as zero, as illustrated in the exemplary ANSI C-code disclosed below, where the corresponding code is: IF( L_xsum == 0 || neg_gain == 0 ) { /* zero input or zero gain case */
(47) However, it may also be less than or equal to epsilon, or EPS, where EPS is the lowest value which is higher than zero and which is regarded as being worth representing in a selected precision. For example, with precision Q15 in a signed 16 bit word, the sub-vector gain threshold becomes EPS = 1/2^15 = 1/32768 (i.e. vector gain less than or equal to 0.000030517578125), and with precision Q12 in a signed 16 bit word for the target vector x(n), the very low value becomes EPS = 1/2^12 (i.e. sum(abs(x(n))) less than or equal to 0.000244140625). In one embodiment of fixed-point DSP arithmetic with a 16 bit word, an unsigned integer format may take any integer value from 0 to 65535, whereas a signed integer may take values from -32768 to +32767. Using an unsigned fractional format, the 65536 levels are spread uniformly between 0 and +1, whereas in a signed fractional format embodiment the levels would be equally spaced between -1 and +1.
(48) By applying this optional step related to zero-vectors and low gain values, the PVQ-search complexity is reduced and the indexing complexity is spread/shared between encoder indexing and decoder de-indexing, i.e. no processing is wasted for searching a zero target vector or a very low target vector which would in any way be scaled down to zero.
(49) Optional PVQ Pre-Search Projection
(50) If the pulse density ratio K/N is larger than 0.5 unit pulses per coefficient, e.g. modified discrete cosine transform coefficient, a low cost projection to the K-1 sub pyramid is made and used as a starting point for y. On the other hand, if the pulse density ratio is less than 0.5 unit pulses per coefficient, the iterative PVQ-search will start off from 0 pre-placed unit pulses. The low cost projection to K-1 is typically less computationally expensive in DSP cycles than repeating the unit pulse inner loop search K-1 times. However, a drawback of the low cost projection is that it will produce an inexact result due to the N-dimensional floor function application. The resulting L1-norm of the low cost projection using the floor function may typically be anything between K-1 and roughly K-5, i.e. the result after the projection needs to be fine searched to reach the target norm of K.
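(51) The low cost projection is performed as:
(52) $$proj_{fac} = \frac{K-1}{\sum_{n=0}^{N-1} xabs(n)} \quad \text{(Eq. 13)}$$
$$y(n) = \lfloor xabs(n) \cdot proj_{fac} \rfloor, \text{ for } n = 0, \ldots, N-1 \quad \text{(Eq. 14)}$$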
(53) If no projection is made, the starting point is an all zeroed y(n) vector. The DSP cost of the projection in DSP cycles is in the neighborhood of N (absolute sum)+25 (the division)+2N (multiplication and floor) cycles.
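(54) In preparation for the fine search to reach the K'th-pyramid's surface, the accumulated number of unit pulses pulse_tot, the accumulated correlation corr_xy(pulse_tot) and the accumulated energy energy_y(pulse_tot) for the starting point are computed as:
(55) $$pulse_{tot} = \sum_{n=0}^{N-1} y(n) \quad \text{(Eq. 15)}$$
$$corr_{xy}(pulse_{tot}) = \sum_{n=0}^{N-1} y(n) \cdot xabs(n) \quad \text{(Eq. 16)}$$
$$energy_y(pulse_{tot}) = \sum_{n=0}^{N-1} y(n) \cdot y(n) \quad \text{(Eq. 17)}$$
$$enloop_y(pulse_{tot}) = \frac{energy_y(pulse_{tot})}{2} \quad \text{(Eq. 18)}$$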
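The projection and the starting values for the fine search can be sketched in floating point as follows. This is an illustration under assumptions: the function name `pvq_project` is hypothetical, and the scale factor (K-1) divided by the absolute sum is the assumed form of the projection to the K-1 sub pyramid described above.

```c
/* Projects xabs (non-negative values) towards the K-1 sub pyramid using a
 * floor operation and returns the number of pre-placed unit pulses.
 * Also outputs the accumulated correlation and energy of the start vector.
 * Assumes n_dim >= 1 and k >= 1. */
int pvq_project(const double *xabs, int *y, int n_dim, int k,
                double *corr_xy, int *energy_y)
{
    double sum = 0.0, proj_fac;
    int n, pulse_tot = 0;

    *corr_xy = 0.0;
    *energy_y = 0;
    for (n = 0; n < n_dim; n++) {
        sum += xabs[n];
        y[n] = 0;
    }
    if (sum <= 0.0) {
        return 0;               /* all-zero target: nothing to project */
    }

    proj_fac = (double)(k - 1) / sum;
    for (n = 0; n < n_dim; n++) {
        /* Truncation equals floor because the product is non-negative. */
        y[n] = (int)(xabs[n] * proj_fac);
        pulse_tot += y[n];
        *corr_xy += y[n] * xabs[n];
        *energy_y += y[n] * y[n];
    }
    return pulse_tot;
}
```

For the target {0.5, 0.25, 0.125, 0.125} and K = 6, the projection pre-places only 3 pulses, which illustrates the undershoot that the fine search then has to fill.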
(56) PVQ Fine Search
(57) The solution disclosed herein is related to the PVQ fine search (which constitutes or is part of the PVQ-shape search, as previously described). What has been described in the preceding sections is mainly prior art PVQ, except for the upfront determining of xabs.sub.max, which will be further described below. The final integer shape vector y(n) of dimension N must adhere to the L1 norm of K pulses. The fine search is assumed to be configured to start from a lower point in the pyramid, i.e. below the K'th pyramid, and iteratively find its way to the surface of the N-dimensional K'th-hyperpyramid. The K-value in the fine search can typically range from 1 to 512 unit pulses.
(58) The inventor has realized, that in order to keep the complexity of the search and PVQ indexing at a reasonable level, the search may be split into two main branches, where one branch is used when it is known that the in-loop energy representation of y(n) will stay within a signed, or unsigned, 16 bit word during a next inner search loop iteration, and another branch is used when the in-loop energy may exceed the dynamic range of a 16 bit word during a next inner search loop iteration.
(59) Fixed Precision Fine Search for a Low Number of Unit Pulses
(60) When the final K is lower than or equal to a threshold of t_p=127 unit pulses, the dynamics of energy_y(K) will always stay within 14 bits, and the dynamics of the 1 bit upshifted enloop_y(K) will always stay within 15 bits. This allows use of a signed 16 bit word for representing every enloop_y(k) within all the fine pulse search inner loop iterations up to k=K. In other words, there will be no need for a word bit length exceeding 16 bits for representing energy_y(K) or enloop_y(K) in any fine pulse search inner loop iteration when K≤127.
(61) If efficient DSP Multiply, MultiplyAdd (multiply-add) and MultiplySubtract (multiply-subtract) operators for unsigned 16 bit variables are available, the threshold can be increased to t_p=255, as enloop_y(K) will then always stay within an unsigned 16 bit word. MultiplyAdd here refers, in one embodiment, to multiply-add instructions or equivalent operations that multiply data values representing audio and video signals by filter or transform values and accumulate the products to produce a result. MultiplySubtract operations are the same as the MultiplyAdd operations, except that the adds are replaced by subtracts.
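The two thresholds can be checked with a short sketch. The energy of y is largest when all K unit pulses land on a single position, giving K squared; the hypothetical helpers below count how many bits that worst case needs.

```c
#include <stdint.h>

/* Number of bits needed to represent a non-negative value (0 needs 0 bits). */
int bits_needed(uint32_t value)
{
    int bits = 0;

    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* Worst-case bit count of energy_y for k unit pulses, i.e. of k * k.
 * Assumes 0 <= k <= 65535 so that the product fits in 32 bits. */
int worst_case_energy_bits(int k)
{
    return bits_needed((uint32_t)k * (uint32_t)k);
}
```

K = 127 needs 14 bits and K = 128 needs 15, while K = 255 still fits in 16 bits and K = 256 does not.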
(62) In preparation for the next unit pulse addition, the near optimal maximum possible upshift of the next loop's accumulated in-loop correlation value, corr_xy, in a signed 32 bit word is pre-analyzed using the previously calculated maximum absolute input value xabs_max as:
$$corr_{upshift} = 31 - \lceil \log_2( corr_{xy}(pulse_{tot}) + 2 \cdot (1 \cdot xabs_{max}) ) \rceil \quad \text{(Eq. 19)}$$
(63) The upshift calculated in Eq. 19 represents the worst case, and covers the maximum possible upshift that can be done in the next inner loop. It thus ensures that the most significant information related to correlation will not be lost, or outshifted, during the inner loop iteration, even in the worst case scenario.
(64) This worst case pre-inner loop dynamic analysis can be performed in 2-3 cycles in most DSP architectures using MultiplyAdd and Norm instructions (normalization), and the analysis is always the same independent of the dimension N. In an ITU-T G.191 virtual 16/32-bit DSP the operations in Eq. 19 become: corr_upshift=norm_l(L_mac(*L_corrxy,1, xabs_max)); with a cost of 2 cycles. It should be noted that norm_l(x) here corresponds to 31-⌈log2(x)⌉, which could alternatively be denoted 31-ceil(log2(x)), where ceil(x) is the so-called ceiling function, mapping a real number to the smallest following integer. More precisely, ceiling(x)=⌈x⌉ is the smallest integer not less than x. For corr_upshift, the term within the brackets with upper horizontal bar is always a positive number. The corr_upshift could alternatively be calculated using a floor function as:
$$corr_{upshift} = 30 - \lfloor \log_2( corr_{xy}(pulse_{tot}) + 2 \cdot (1 \cdot xabs_{max}) ) \rfloor$$
where floor(x)=⌊x⌋ is the largest integer not greater than x.
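The upshift analysis can be sketched in portable C without the STL basic operators. The helper `norm_l_pos` is a hypothetical stand-in for the normalization instruction on strictly positive inputs; it implements the floor form given above, which agrees with the ceiling form whenever the argument is not an exact power of two. Saturation, which the STL operators provide, is not modelled.

```c
#include <stdint.h>

/* Number of left shifts needed to bring a positive 32 bit value into the
 * range [0x40000000, 0x7fffffff], i.e. 30 - floor(log2(x)).
 * Returns 0 for non-positive input (simplification for this sketch). */
int norm_l_pos(int32_t x)
{
    int shifts = 0;

    if (x <= 0) {
        return 0;
    }
    while (x < 0x40000000) {
        x <<= 1;
        shifts++;
    }
    return shifts;
}

/* Worst-case upshift for the next inner loop: the accumulated correlation
 * plus the largest possible single-pulse contribution 2 * (1 * xabs_max).
 * Assumes the sum does not overflow a signed 32 bit word. */
int corr_upshift(int32_t corr_xy, int16_t xabs_max)
{
    int32_t worst = corr_xy + 2 * (int32_t)xabs_max;

    return norm_l_pos(worst);
}
```

Because the shift is derived from the worst-case sum, every candidate correlation in the following inner loop can be shifted by the same amount without overflowing the 32 bit word.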
(65) Another benefit of the herein suggested approach to near optimal shape search correlation scaling is that the proposed method does not require a pre-normalized target vector x, which will save some additional complexity before starting the shape search.
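(66) To make the iterative Eq. 10 update as efficient as possible, the corr_xy(k,n)² numerator may be represented by a 16 bit signed word, even when comprising more information than fits in a 16 bit word, by the following approach:
(67) $$corr_{xy16}(k,n)^2 = \left( Round_{16}\left( ( corr_{xy}(pulse_{tot}) + 2 \cdot (1 \cdot xabs(n)) ) \cdot 2^{corr_{upshift}} \right) \right)^2 \quad \text{(Eq. 20)}$$
$$\left. \begin{array}{l} n_{best} = n \\ bestCorrSq_{16} = corr_{xy16}(k,n)^2 \\ bestEn_{16} = enloop_y(k,n) \end{array} \right\} \text{ if } corr_{xy16}(k,n)^2 \cdot bestEn_{16} > bestCorrSq_{16} \cdot enloop_y(k,n) \quad \text{(Eq. 21)}$$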
(68) where the function Round_16 extracts the top 16 bits of a signed 32 bit variable with rounding. This near optimal upshift (Eq. 19) and the use of a 16 bit representation of the squared correlation bestCorrSq_16 enable a very fast inner-loop search using only about 9 cycles for performing the Eq. 21 test and the three variable updates, when using a DSP's optimized Multiply, MultiplyAdd and MultiplySubtract functions.
(69) The location of the next unit pulse in the vector y is now determined by iterating over the n=0, . . . , N-1 possible positions in vector y, while employing equations Eq. 20, Eq. 6 and Eq. 21.
(70) When the best position n_best for the unit pulse (in the vector y achieved so far) has been determined, the accumulated correlation corr_xy(k), the accumulated in-loop energy enloop_y(k) and the number of accumulated unit pulses pulse_tot are updated. If there are further unit pulses to add, i.e. when pulse_tot<K, a new inner loop is started with a new near optimal corr_upshift analysis (Eq. 19) for the addition of a next unit pulse.
(71) In total, this suggested approach has a worst case complexity for each unit pulse added to y(n) of roughly 5/N+15 cycles per quantized coefficient. In other words, a loop over a vector of size N for adding a unit pulse has a worst case complexity of about N*(5/N+15) cycles, i.e. 5+15*N cycles.
(72) Fixed Precision Fine Search for a High Number of Unit Pulses
(73) When K is higher than a threshold t_p, which in this exemplifying embodiment, assuming a 16/32 bit restricted DSP, is t_p=127 unit pulses, the dynamics of the parameter energy_y(K) may exceed 14 bits, and the dynamics of the 1 bit upshifted enloop_y(K) may exceed 15 bits. Thus, in order not to use unnecessarily high precision, the fine search is configured to adaptively choose between 16 bit representation and 32 bit representation of the pair {corr_xy(k,n)², enloop_y(k,n)} when K is higher than t_p. When K for the vector y(n) is known in advance to end up at a final value higher than 127, the fine search keeps track of the maximum pulse amplitude maxamp_y in y achieved so far; in other words, maxamp_y is determined. This maximum pulse amplitude information is used in a pre-analysis step before entering the optimized inner dimension loop. The pre-analysis determines what precision should be used for the upcoming unit pulse addition inner loop. As shown in
(74) Bits required for PVQ(N, K):
          N = 8      N = 16     N = 32
  K = 4   11.4594    15.4263    19.4179
  K = 5   13.2021    18.1210    23.1001
  K = 6   14.7211    20.5637    26.5222
  K = 7   16.0631    22.7972    29.7253
(75) For example, a stored table as the one shown above may be used to determine or select a value of K. If the dimension N is 8 and the available bits for the band bits(band) is 14.0, then K will be selected to be 5, as PVQ(N=8,K=6) requires 14.7211 bits which is higher than the number of available bits 14.0.
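The table lookup in this example can be sketched as follows. The array holds the N = 8 column of the table above; the function name `select_k` and the return value of -1 when no entry fits are assumptions made for this illustration.

```c
/* Bits required for PVQ(N = 8, K) for K = 4..7, copied from the table. */
static const double pvq_bits_n8[] = {11.4594, 13.2021, 14.7211, 16.0631};

/* Returns the largest K whose bit cost does not exceed bits_available.
 * bits_required[i] is the cost for K = k_first + i and is assumed to
 * increase with K. Returns -1 if even the first entry does not fit. */
int select_k(const double *bits_required, int k_first, int n_entries,
             double bits_available)
{
    int i, k = -1;

    for (i = 0; i < n_entries; i++) {
        if (bits_required[i] <= bits_available) {
            k = k_first + i;
        }
    }
    return k;
}
```

With 14.0 bits available for a band of dimension 8, `select_k(pvq_bits_n8, 4, 4, 14.0)` yields K = 5, matching the example in the text.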
(76) If the pre-analysis indicates that more than a signed 16 bit word is needed to represent the in-loop energy without losing any energy information, a computationally more intensive high precision unit pulse addition loop is employed, where both the saved best squared correlation term and the saved best accumulated energy term are represented by 32 bit words.
$$en_{margin} = 31 - \lceil \log_2( 1 + energy_y(pulse_{tot}) + 2 \cdot (1 \cdot maxamp_y) ) \rceil \quad \text{(Eq. 22)}$$
$$highprecision_{active} = \begin{cases} \text{FALSE}, & \text{if } en_{margin} \geq 16 \\ \text{TRUE}, & \text{if } en_{margin} < 16 \end{cases} \quad \text{(Eq. 23)}$$
(77) The worst case pre-inner loop dynamic analysis can be performed in 5-6 additional cycles in most DSPs, and the analysis cost is the same for all dimensions. In an ITU-T G.191 STL 2009 virtual 16/32 bit DSP the operations in Eq. 22 and Eq. 23 become:
(78)
    L_energy_y = L_add(L_energy_y, 1);   /* 0.5 added */
    en_margin = norm_l(L_mac(L_energy_y, 1, maxamp_y));
    highprecision_active = 1; move16();
    if (sub(16, en_margin) <= 0) {
        highprecision_active = 0; move16();
    }
with a cost of maximum 6 cycles.
(79) The corresponding code in an ANSI-C code example below is:
(80)
    L_yy = L_add(L_yy, 1);                               /* .5 added */
    en_margin = norm_l(L_mac(L_yy, 1, max_amp_y));       /* find max addition, margin, ~2 ops */
    en_dn_shift = sub(16, en_margin);                    /* calc. shift to lower word */
    high_prec_active = 1; move16();
    if (en_dn_shift <= 0) {                              /* only use 32 bit energy if actually needed */
        high_prec_active = 0; move16();
    }
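For readers without the STL basic operators at hand, the same precision decision can be sketched in portable C. The names `left_shift_margin`, `energy_margin` and `high_precision_active` are hypothetical, saturation is not modelled, and, unlike the listings above, the sketch does not modify the stored energy value.

```c
#include <stdint.h>

/* Stand-in for a normalization instruction on positive input: the number
 * of left shifts that bring x into [0x40000000, 0x7fffffff]. */
static int left_shift_margin(int32_t x)
{
    int shifts = 0;

    if (x <= 0) {
        return 0;
    }
    while (x < 0x40000000) {
        x <<= 1;
        shifts++;
    }
    return shifts;
}

/* Eq. 22: margin for the worst-case energy after the next unit pulse,
 * i.e. the current energy plus 1 plus 2 * (1 * maxamp_y). */
int energy_margin(int32_t energy_y, int16_t maxamp_y)
{
    int32_t worst = energy_y + 1 + 2 * (int32_t)maxamp_y;

    return left_shift_margin(worst);
}

/* Eq. 23: the 32 bit loop is needed only when fewer than 16 upshifts are
 * possible, i.e. when the worst-case energy no longer fits the low word. */
int high_precision_active(int32_t energy_y, int16_t maxamp_y)
{
    return energy_margin(energy_y, maxamp_y) < 16;
}
```

With this stand-in, a margin of at least 16 is equivalent to the worst-case energy being below 2^15, so the 16 bit loop remains lossless.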
(81) Alternatively, the energy margin en_margin in Eq. 22 could, in line with the operation of the G.191 STL function norm_l( ), be calculated using the floor function as:
$$en_{margin} = 30 - \lfloor \log_2( 1 + energy_y(pulse_{tot}) + 2 \cdot (1 \cdot maxamp_y) ) \rfloor$$
(82) If highprecision_active is FALSE, i.e. =0, the lower precision inner search loop in Eq. 20, Eq. 6 and Eq. 21 is employed. When highprecision_active is TRUE, i.e. =1, on the other hand, the location of the next unit pulse is determined employing a higher precision inner loop, representing enloop_y and corr_xy² with 32 bit words in this example. That is, when highprecision_active is TRUE, the location of the next unit pulse in y(n) is determined by iterating over the n=0, . . . , N-1 possible positions, using equations Eq. 24, Eq. 6 and Eq. 25.
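(83) $$corr_{xy32}(k,n)^2 = \left( ( corr_{xy}(pulse_{tot}) + 2 \cdot (1 \cdot xabs(n)) ) \cdot 2^{corr_{upshift}} \right)^2 \quad \text{(Eq. 24)}$$
$$\left. \begin{array}{l} n_{best} = n \\ bestCorrSq_{32} = corr_{xy32}(k,n)^2 \\ bestEn_{32} = enloop_y(k,n) \end{array} \right\} \text{ if } corr_{xy32}(k,n)^2 \cdot bestEn_{32} > bestCorrSq_{32} \cdot enloop_y(k,n) \quad \text{(Eq. 25)}$$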
(84) In other words, en_margin is indicative of how many upshifts that can be used to normalize the energy in the next loop. If 16 or more upshifts can be used, then the energy stays in the lower word length, assuming 16/32 bit word lengths, and there is no need for the high precision (32 bit representation) loop, so highprecision.sub.active is set to FALSE. One implementation reason for doing it in this way (allowing the energy information to stay in the low part of the L_energy 32 bit word) is that it is computationally cheaper: it costs only 1 cycle to compute extract_I(L_energy) whereas an alternative round_fx(L_shl(L_energy,en_margin)) takes two cycles.
(85) When the best position n.sub.best of the unit pulse has been determined, the accumulated correlation corr.sub.xy(k), the accumulated inloop energy enloop.sub.y(k) and the number of accumulated unit pulses pulse.sub.tot are updated. Further, the maximum amplitude maxamp.sub.y in the best integer vector y so far, is kept up to date, i.e. determined, for the next unit pulse addition loop.
maxamp.sub.y=max(maxamp.sub.y,y[n.sub.best])(Eq. 26)
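The bookkeeping after a unit pulse has been placed may, purely as an example, be sketched as below. The structure SearchState and the function add_unit_pulse are hypothetical names introduced for this illustration. It is assumed that x_abs holds the absolute values of the target vector, that the accumulated correlation grows by x_abs[n_best], and that the accumulated energy is the sum of squares of y, so that it grows by 2*y[n_best]+1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the accumulated search variables. */
typedef struct {
    int32_t corr_xy;   /* accumulated correlation between x_abs and y */
    int32_t energy_y;  /* accumulated energy, sum of y[n]^2 */
    int16_t pulse_tot; /* number of unit pulses placed so far */
    int16_t maxamp_y;  /* maximum pulse amplitude in y so far */
} SearchState;

/* Adds one unit pulse at n_best and brings all accumulators up to date. */
static void add_unit_pulse(SearchState *s, const int16_t *x_abs, int16_t *y,
                           size_t n_dim, size_t n_best) {
    assert(n_best < n_dim);
    s->corr_xy += x_abs[n_best];
    s->energy_y += 2 * (int32_t)y[n_best] + 1; /* (a+1)^2 - a^2 = 2a + 1 */
    y[n_best] = (int16_t)(y[n_best] + 1);
    s->pulse_tot = (int16_t)(s->pulse_tot + 1);
    if (y[n_best] > s->maxamp_y) {             /* cf. Eq. 26 */
        s->maxamp_y = y[n_best];
    }
}
```

Keeping maxamp.sub.y up to date in this way costs one comparison per added pulse, which is what makes the subsequent energy pre-analysis inexpensive.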
(86) If there are further unit pulses to add, i.e. when pulse.sub.tot<K, a new inner-loop is started with a new near optimal corr.sub.upshift analysis Eq. 19 and a new energy precision analysis Eq 22 and Eq 23, and then commencing the next unit pulse loop with equations Eq. 24, Eq. 6 and Eq. 26.
(87) The high precision approach (in this example 32 bit words) worst case complexity for each unit pulse added to y(n) is roughly 7/N+31 cycles per quantized coefficient.
(88) The effect of the in-loop accumulated energy based inner loop precision selection is that target sub vectors that have a high peakiness, or have very fine granularity, i.e. the final K is high, will be using the higher precision loop and more cycles more often, while non-peaky or low pulse granularity sub vectors will more often use the lower precision loop and fewer cycles.
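The effect may be illustrated by a small simulation, given here only as a sketch under simplifying assumptions. The function first_high_precision_pulse is a hypothetical helper: it places pulses either all at one position (an extremely peaky vector) or round-robin over all positions (a flat vector), without performing any actual search, and reports how many pulses had been placed when the worst case energy first exceeded a signed 16 bit word.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DIM 64 /* assumed upper bound on N for this sketch */

/* Returns the number of pulses already placed when the pre-analysis first
 * asks for the high precision loop, or -1 if it never does. */
static int first_high_precision_pulse(int n_dim, int k_final, int peaky) {
    assert(n_dim > 0 && n_dim <= MAX_DIM);
    assert(k_final >= 0 && k_final <= 32767);
    int32_t y[MAX_DIM] = {0};
    int32_t energy_y = 0;
    int32_t maxamp_y = 0;
    for (int k = 0; k < k_final; k++) {
        /* Worst case energy if the next pulse lands on the largest element. */
        int32_t worst = energy_y + 1 + 2 * maxamp_y;
        if (worst > INT16_MAX) {
            return k;
        }
        int n = peaky ? 0 : k % n_dim; /* placement pattern, not a search */
        energy_y += 2 * y[n] + 1;
        y[n] += 1;
        if (y[n] > maxamp_y) {
            maxamp_y = y[n];
        }
    }
    return -1;
}
```

Under these assumptions, a fully peaky vector with N=16 and K=512 switches to the high precision loop after 181 pulses, while the flat vector with the same N and K never leaves the low precision loop.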
(89) It should be noted that the analysis described in the above section could be performed also when K<t.sub.p. However, an embodiment may be made more efficient by the introduction of a threshold t.sub.p for applying the above analysis.
(90) PVQ Vector Finalization and Normalization
(91) After shape search, each non-zero PVQ-vector element is assigned its proper sign and the vector is L2-normalized (a.k.a. Euclidean normalization) to unit energy. Additionally, if the band was split, it is further scaled with a sub vector gain.
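The finalization may, as an example only, be sketched in floating point C as follows. The function finalize_pvq and its parameters are hypothetical; x is assumed to be the signed target vector, y the unsigned integer vector from the shape search, and g_sub the sub vector gain (1.0 when the band was not split). A Newton iteration is used in place of sqrt( ) from <math.h> solely to keep the sketch free of library dependencies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Square root by Newton iteration; a stand-in for sqrt() from <math.h>. */
static double sqrt_newton(double v) {
    assert(v >= 0.0);
    if (v == 0.0) {
        return 0.0;
    }
    double r = v > 1.0 ? v : 1.0; /* start at or above the root */
    for (int i = 0; i < 60; i++) {
        r = 0.5 * (r + v / r);
    }
    return r;
}

/* Assigns signs from x to y, then writes the L2-normalized and gain-scaled
 * vector to xq. Requires at least one non-zero element in y. */
static void finalize_pvq(const double *x, int16_t *y, double *xq,
                         size_t n_dim, double g_sub) {
    int32_t energy = 0;
    for (size_t n = 0; n < n_dim; n++) {
        if (x[n] < 0.0) {
            y[n] = (int16_t)(-y[n]);
        }
        energy += (int32_t)y[n] * y[n];
    }
    assert(energy > 0);
    double scale = g_sub / sqrt_newton((double)energy);
    for (size_t n = 0; n < n_dim; n++) {
        xq[n] = scale * (double)y[n];
    }
}
```

A fixed-point implementation would typically replace the division and square root with an inverse square root operation; that choice is outside the scope of this sketch.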
(92)
(93) Above, two precision methodologies were presented and specified:
(94) En16×CorrSq16, as defined in the section above (equations 19 through 21), and En32×CorrSq32 (equations 22 through 26). Two further medium complexity methods, in which the precision of the numerator Correlation Squared term and of the Energy term are varied, are described below.
(95) En16×CorrSq32 and En32×CorrSq16 Methods
(96) The En16×CorrSq32 method is similar to the En32×CorrSq32, but with the difference that the inner loop best found unit pulse update and comparison uses a 16 bit representation of the best Energy bestEn.sub.16 so far, according to:
(97)
(98) The approximate cost of the En16×CorrSq32 method per unit pulse is 5/N+21 cycles.
(99) The En32×CorrSq16 method is similar to the En32×CorrSq32, but with the difference that the inner loop best found unit pulse update and comparison uses a 16 bit representation of the best squared correlation bestCorrSq.sub.16 so far, according to:
(100)
(101) The approximate cost of the En32×CorrSq16 method per unit pulse addition is 6/N+20 cycles per coefficient.
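To give a feeling for the relative costs, the three cycle estimates stated above may be evaluated for an example dimension; N=16 is chosen here only for illustration and is not a value prescribed by the text.

```latex
\begin{aligned}
\text{En32}\times\text{CorrSq32:}\quad & \tfrac{7}{N} + 31 = \tfrac{7}{16} + 31 = 31.4375 \\
\text{En16}\times\text{CorrSq32:}\quad & \tfrac{5}{N} + 21 = \tfrac{5}{16} + 21 = 21.3125 \\
\text{En32}\times\text{CorrSq16:}\quad & \tfrac{6}{N} + 20 = \tfrac{6}{16} + 20 = 20.375
\end{aligned}
```

For this N, the two medium complexity methods thus come out at roughly two thirds of the cycle estimate of the full 32 bit loop.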
ASPECTS AND EXEMPLIFYING EMBODIMENTS
(102) Below, some exemplifying embodiments of the solution disclosed herein will be described with reference to
(103)
(104) In the method illustrated in
(105) The pre-analysis described above is performed before each entry 102 to the inner loop, i.e. before each addition of a unit pulse to the vector y. In an exemplifying embodiment where only two different bit representations, i.e. bit word lengths such as 16 and 32 bits, are available, the inner loop will be performed using a 16 bit representation of enloop.sub.y until it is determined that a longer bit word is needed to represent enloop.sub.y, after which the higher bit word length, i.e. the 32 bit representation will be applied for the inner loop calculations. The loop using a 16 bit representation may be referred to as a low precision loop, and the loop using a 32 bit representation may be referred to as a high precision loop.
(106) The determining 102 of whether more than an initial or current bit word length is needed could alternatively be expressed as determining which bit word length, out of at least two different, alternative bit word lengths, will be required for representing the worst case (largest possible increase) enloop.sub.y during the next inner loop. The at least two different bit word lengths could comprise at least e.g. 16 and 32 bit word lengths.
(107) In other words, when more than a current bit word length is determined 102 to be needed to represent enloop.sub.y in the next inner loop, the inner loop calculations are performed 103 with a longer bit word length, than an initial or current bit word length, for representing enloop.sub.y in the inner loop. On the other hand, when more than a current bit word length is determined not to be needed to represent enloop.sub.y, the inner loop calculations may be performed by employing a first unit pulse addition loop using a first or current bit word length to represent enloop.sub.y, i.e. the current bit word length may continue to be used. This is also illustrated e.g. in
(108) The method builds on the realization that the maximum possible increase of an energy variable, such as enloop.sub.y, in a next inner loop will occur when the unit pulse is added to the position in y associated with the current maxamp.sub.y. Having realized this, it is possible to determine, before entering the inner loop, whether there is a risk for exceeding the representation capacity of the currently used bit word length, e.g. 16 bits, during the next inner loop, or not. In other words, the determining of whether more than a current bit word length is needed to represent enloop.sub.y comprises determining characteristics of the case when, in the upcoming inner search loop, the unit pulse is added to the position in y being associated with maxamp.sub.y. For example, the number of bits needed to represent enloop.sub.y in the upcoming inner loop may be determined, or, alternatively, a remaining margin in a bit word representing enloop.sub.y in the upcoming inner loop.
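The realization can be written out as a short derivation, assuming that the energy variable is the sum of the squared pulse amplitudes in y:

```latex
\begin{aligned}
energy_y &= \sum_{n=0}^{N-1} y[n]^2 \\
\Delta energy_y(n) &= (y[n]+1)^2 - y[n]^2 = 2\,y[n] + 1 \\
\max_{n}\ \Delta energy_y(n) &= 2\,maxamp_y + 1
\end{aligned}
```

The largest value the energy can take after the next unit pulse is therefore energy.sub.y+2.Math.maxamp.sub.y+1, which depends only on quantities that are known before the inner loop is entered.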
(109) For target shape vectors being associated with a low K, it is possible to say in advance that there will be no need for a longer bit word length than the one offered by the initial and currently used bit word length. Therefore, it would be possible to apply a threshold value Tk, such that certain operations are performed only for target shape vectors being associated with a K which exceeds the threshold value Tk. For such target vectors, the encoder will keep track of maxamp.sub.y, by updating this value after each pulse addition. For target vectors associated with a K which is lower than the threshold value, it is not necessary to keep track of maxamp.sub.y. For the example with 16 and 32 bit words, a possible Tk would be 127, as previously described. In other words, the keeping track of maxamp.sub.y and the determining of whether more than a current bit word length is needed is performed, e.g., only when a final value of K associated with the input target shape vector exceeds a threshold value Tk.
(110) An embodiment illustrated in
(111) When more than a current bit word length is determined to be needed to represent enloop.sub.y, the inner loop calculations may be performed using a longer bit word length (than the current bit word length, e.g. 32 instead of 16 bits) to represent enloop.sub.y.
(112) In one embodiment, when more than a current bit word length is determined to be needed to represent enloop.sub.y, the inner loop calculations are performed with a longer bit word length (than the current bit word length), representing also an accumulated in-loop correlation value, corr.sub.xy.sup.2, in the inner loop. This is illustrated e.g. in
(113) As previously mentioned, it is preferred to avoid performing the division of Eq 8 in the inner dimension search loop for unit pulse addition. Therefore, a cross-multiplication may be performed, as illustrated in Eq 10. That is, a position, n.sub.best, in y for addition of a unit pulse, could be determined by evaluating a cross-multiplication, for each position n in y, of a correlation and energy value for the current n; and a best so far correlation, BestCorrSq, and a best so far energy value bestEn, saved from previous values of n, as:
(114)
(115) The position n.sub.best could be referred to as a best position in y for addition of a unit pulse. It should be noted that ≥ could be used in the expressions above instead of >. However, > (i.e. larger than) may be preferred when trying to keep the computational cost as low as possible, e.g. in regard of number of cycles. The performing of the method according to any of the embodiments described above enables this cross-multiplication to be performed in an efficient manner (e.g. by not using a higher precision than actually needed).
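A division-free search of this kind may be sketched as follows, as an example only. The function find_best_position is a hypothetical name; 64 bit intermediates are used here merely to keep the sketch free of overflow concerns and do not reflect the 16/32 bit word lengths discussed above. The candidate energy is taken as energy_y+2*y[n]+1, which is an assumption of this sketch; scaling the energy by a constant factor does not change the outcome of the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the position n maximizing corr^2 / en without dividing, by
 * comparing corr_sq * best_en against best_corr_sq * en. */
static size_t find_best_position(const int16_t *x_abs, const int16_t *y,
                                 size_t n_dim, int32_t corr_xy,
                                 int32_t energy_y) {
    assert(n_dim > 0);
    size_t n_best = 0;
    int64_t best_corr_sq = 0;
    int64_t best_en = 0;
    for (size_t n = 0; n < n_dim; n++) {
        int64_t corr = (int64_t)corr_xy + x_abs[n];
        int64_t en = (int64_t)energy_y + 2 * (int64_t)y[n] + 1;
        /* Bounds assumed by this sketch so the products fit in 64 bits. */
        assert(corr >= 0 && corr < (1 << 23));
        assert(en > 0 && en < (1 << 16));
        int64_t corr_sq = corr * corr;
        /* Strict '>' keeps the earliest position when candidates tie. */
        if (n == 0 || corr_sq * best_en > best_corr_sq * en) {
            n_best = n;
            best_corr_sq = corr_sq;
            best_en = en;
        }
    }
    return n_best;
}
```

In the second test case below, position 1 has the largest target value but already holds three pulses, so the higher energy cost makes position 0 the better choice.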
IMPLEMENTATIONS
(116) The methods and techniques described above may be implemented in an encoder or codec, which may be comprised e.g. in a communication device.
(117) Encoder,
(118) An exemplifying embodiment of an encoder is illustrated in a general manner in
(119) The encoder may be implemented and/or described as follows:
(120) The encoder 1100 is configured for Pyramid Vector Quantization, including so-called fine search or fine shape search, where a Pyramid Vector Quantizer, PVQ, is configured to take a target vector x as input and derive a vector y by iteratively adding unit pulses in an inner dimension search loop. The input vector x has a dimension N and an L1-norm K. The encoder 1100 comprises processing circuitry, or processing means 1101 and a communication interface 1102. The processing circuitry 1101 is configured to cause the encoder 1100 to, before entering a next inner dimension search loop for unit pulse addition: determine, based on a maximum pulse amplitude, maxamp.sub.y, of a current vector y, whether more than a current bit word length is needed to represent, in a lossless manner, a variable, enloop.sub.y, related to an accumulated energy of y, in the upcoming inner dimension loop. The communication interface 1102, which may also be denoted e.g. Input/Output (I/O) interface, includes an interface for sending data to and receiving data from other entities or modules.
(121) The processing circuitry 1101 could, as illustrated in
(122) An alternative implementation of the processing circuitry 1101 is shown in
(123) The encoders described above could be configured for the different method embodiments described herein, such as e.g. to perform the inner loop calculations using a longer bit word representing enloop.sub.y and possibly corr.sub.xy.sup.2, when more than a current bit word length is determined to be needed to represent enloop.sub.y. Longer, here refers to longer than a current or initial bit word length.
(124) The encoder 1100 may be assumed to comprise further functionality, for carrying out regular encoder functions.
(125) The encoder described above may be comprised in a device, such as a communication device. The communication device may be a user equipment (UE) in the form of a mobile phone, video camera, sound recorder, tablet, desktop, laptop, TV set-top box or home server/home gateway/home access point/home router. The communication device may in some embodiments be a communications network device adapted for coding and/or transcoding. Examples of such communications network devices are servers, such as media servers, application servers, routers, gateways and radio base stations. The communication device may also be adapted to be positioned in, i.e. being embedded in, a vessel, such as a ship, flying drone, airplane and a road vehicle, such as a car, bus or lorry. Such an embedded device would typically belong to a vehicle telematics unit or vehicle infotainment system.
(126) The steps, functions, procedures, modules, units and/or blocks described herein may be implemented in hardware using any conventional technology, such as discrete circuit or integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.
(127) Particular examples include one or more suitably configured digital signal processors and other known electronic circuits, e.g. discrete logic gates interconnected to perform a specialized function, or Application Specific Integrated Circuits (ASICs).
(128) Alternatively, at least some of the steps, functions, procedures, modules, units and/or blocks described above may be implemented in software such as a computer program for execution by suitable processing circuitry including one or more processing units. The software could be carried by a carrier, such as an electronic signal, an optical signal, a radio signal, or a computer readable storage medium before and/or during the use of the computer program in the communication device.
(129) The flow diagram or diagrams presented herein may be regarded as a computer flow diagram or diagrams, when performed by one or more processors. A corresponding apparatus may be defined by a group of function modules, where each step performed by the processor corresponds to a function module. In this case, the function modules are implemented as a computer program running on the processor. It is to be understood that the function modules do not have to correspond to actual software modules.
(130) Examples of processing circuitry include, but are not limited to, one or more microprocessors, one or more Digital Signal Processors, DSPs, one or more Central Processing Units, CPUs, and/or any suitable programmable logic circuitry such as one or more Field Programmable Gate Arrays, FPGAs, or one or more Programmable Logic Controllers, PLCs. That is, the units or modules in the arrangements in the different devices described above could be implemented by a combination of analog and digital circuits, and/or one or more processors configured with software and/or firmware, e.g. stored in a memory. One or more of these processors, as well as the other digital hardware, may be included in a single application-specific integrated circuit, ASIC, or several processors and various digital hardware may be distributed among several separate components, whether individually packaged or assembled into a system-on-a-chip, SoC.
(131) It should also be understood that it may be possible to re-use the general processing capabilities of any conventional device or unit in which the proposed technology is implemented. It may also be possible to re-use existing software, e.g. by reprogramming of the existing software or by adding new software components.
Further Exemplifying Embodiments
(132) Expressed in a slightly different manner, the disclosure herein relates to, for example, the following aspects and embodiments.
(133) One of the aspects is an encoder/codec, wherein the encoder/codec is configured to perform one, more than one or even all of the following steps, illustrated e.g. in
(134) The second loop may be a higher precision, computationally more intensive unit pulse loop than the first loop, which has a lower precision in relation to the second loop. The in-loop accumulated energy based selection of the inner loop precision has the effect that target sub vectors that have a high peakiness, or have very fine granularity (final K is high) will or could be using the higher precision loop and more cycles more often, while non-peaky or low pulse granularity sub vectors will or could more often use the lower precision loop and fewer cycles.
(135) One aspect relates to a communication device 1, illustrated in
(136) The encoder or codec may be fully or partially implemented as a DSP positioned in the communication device. In one first embodiment the encoder/codec is configured to make a PVQ-shape search based on a target sub vector (x(n)), the number of finite unit pulses (K), a sub vector dimension value (N) of the target sub vector and optionally also one or more gain values (g.sub.sub). The encoder or codec may also be configured to make a PVQ band split, and in such a case the PVQ-shape search would also be based on a number/value of sub vectors of a band (N.sub.S) and a largest gain of a gain vector G, (g.sub.max=max (G)=max (g.sub.o . . . g.sub.(N.sub.
(137) The encoder/codec/communication device is configured to perform the PVQ-shape search, wherein the encoder/codec/communication device is configured to: determine, calculate or obtain (S1, S23) a maximum absolute value (xabs.sub.max) of the input (target) vector (x(n)), e.g. according to equations 11 and 12 above, determine, calculate or obtain (S2,S28) a possible upshift of a correlation value based at least on the maximum absolute value (xabs.sub.max), e.g. by calculating the possible upshift of a next loop's accumulated in-loop correlation value in a signed 32-bit word through the equation 19 above, if the number of final unit pulses (K) will end up higher than a threshold (t.sub.p), which for example may be 127 unit pulses, keep track of/store (S30) a maximum pulse amplitude (maxamp.sub.y) value/information calculated e.g. according to equation 26 above of a vector (y(n)), which may be defined according to equations 13 and 14 above, and determine/calculate/decide/select (S3, S32) based on the stored maximum pulse amplitude, e.g. through a calculation in accordance with equations 22 and 23 above, if more than a certain word length is needed or should be used, e.g. more than a signed 16 bit word or more than a signed 32 bit word, to represent in-loop energy, represent (S34) a best squared correlation term/parameter/value and a best accumulated energy term/parameter/value by more than the certain word length, e.g. 32 bit words or 64 bit words, if more than the certain word length is needed, and if less than the certain word length is determined, run (S33) a first loop, if more than the certain word length is determined, run (S35) a second, alternative loop with the best accumulated energy term and best squared correlation term represented by the more than the certain word length words.
(138) The above PVQ-shape search, which may be a limited precision PVQ-shape search, is in one embodiment performed by a vector quantizer, which is a part of the encoder/codec and may be implemented at least partly, but also fully, as a DSP unit, which may be positioned in or adapted to be positioned in a communication device. Thus the encoder/codec may be fully or partly implemented as a hardware unit, e.g. a DSP or a field-programmable gate array (FPGA). It may however in alternative embodiments be implemented with the help of a general purpose processor and a codec computer program which, when run on the general purpose processor, causes the communication device to perform one or more of the steps mentioned in the paragraph above. The processor may also be a Reduced Instruction Set Computing (RISC) processor.
(139) Another aspect of the disclosure herein is, as indicated in the paragraph above, a computer program 6 illustrated in
(140) Yet another aspect is a PVQ-shape search method performed by a communication device/codec/encoder, wherein the method comprises one or more of the following steps: determining, calculating or obtaining (S1) a maximum absolute value (xabs.sub.max) of the input (target) vector (x(n)), e.g. according to equations 11 and 12 above, determining, calculating or obtaining (S2, S28) a possible upshift of a correlation value based at least on the maximum absolute value (xabs.sub.max), e.g. by calculating the possible upshift of a next loop's accumulated in-loop correlation value in a signed 32-bit word through the equation 19 above, if the number of final unit pulses (K) will end up higher than a threshold (t.sub.p), which for example may be 127 unit pulses, keep track of/store a maximum pulse amplitude (maxamp.sub.y) value/information calculated e.g. according to equation 26 above of a vector (y(n)), which may be defined according to equations 13 and 14 above, and determining/calculating/deciding/selecting (S3) based on the stored maximum pulse amplitude, e.g. through a calculation in accordance with equations 22 and 23 above, if more than a certain word length is needed or should be used, e.g. more than a signed 16 bit word or more than a signed 32 bit word, to represent in-loop energy, representing a best squared correlation term/parameter/value and a best accumulated energy term/parameter/value by more than the certain word length, e.g. 32 bit words or 64 bit words, if more than the certain word length is needed, and if less than the certain word length is determined, running a first loop, if more than the certain word length is determined, running a second, alternative loop with the best accumulated energy term and best squared correlation term represented by the more than the certain word length words.
(141) The communication device may be a user equipment (UE) in the form of a mobile phone, video camera, sound recorder, tablet, desktop, laptop, TV set-top box or home server/home gateway/home access point/home router, etc. as defined above.
(142) Still another aspect is a computer readable storage medium 5 (see
(143) An embodiment of the communication device 1 is illustrated in
(144) The units mentioned in the paragraph above may be comprised in a codec/encoder 2 in the form of a DSP in the communication unit and may furthermore be comprised in a hardware vector quantizer of the DSP. In an alternative embodiment, all the units in the paragraph above are implemented in the communication device as software.
(145) As further illustrated in
(146) In the case of a software implementation in a communication device, an embodiment of the communication device 1 may be defined as a communication device comprising a processor 4 and a computer program storage product 5 in the form of a memory, said memory containing instructions executable by said processor, whereby said communication device is operative to perform one, more than one or all of the following: determine, calculate or obtain a maximum absolute value (xabs.sub.max) of an input (target) vector (x(n)), e.g. according to equations 11 and 12 above, determine, calculate or obtain a possible upshift of a correlation value based at least on the maximum absolute value (xabs.sub.max), e.g. by calculating the possible upshift of a next loop's accumulated in-loop correlation value in a signed 32-bit word through the equation 19 above, if the number of final unit pulses (K) will end up higher than a threshold (t.sub.p), which for example may be 127 unit pulses, keep track of/store a maximum pulse amplitude (maxamp.sub.y) value/information calculated e.g. according to equation 26 above of a vector (y(n)), which may be defined according to equations 13 and 14 above, and determine/calculate/decide/select based on the stored maximum pulse amplitude, e.g. through a calculation in accordance with equations 22 and 23 above, if more than a certain word length is needed or should be used, e.g. more than a signed 16 bit word or more than a signed 32 bit word, to represent in-loop energy, represent a best squared correlation term/parameter/value and a best accumulated energy term/parameter/value by more than the certain word length, e.g. 32 bit words or 64 bit words, if more than the certain word length is needed, and if less than the certain word length is determined, run a first loop, if more than the certain word length is determined, run a second, additional loop with the best accumulated energy term and best squared correlation term represented by the more than the certain word length words.
(147) To further illustrate aspects and embodiments, some of them are in the following going to be described in conjunction with
(148)
(149)
(150) Shape target sub vectors, optionally from step S21, are received in a second step S22, wherein, in dependence of embodiment, also g.sub.sub, g.sub.max and N.sub.S may be received.
(151) In a third step S23, which corresponds to step S1 in
(152) In an optional fourth step S24, it is determined whether the value of the target vector is equal to or below a first threshold. The threshold is set to filter out target vectors which are considered to have very low energy values. As explained above, the threshold could be set to be equal to zero in one embodiment. It could in this fourth step also be decided if a sub vector gain is equal to or below a second threshold. In one embodiment the second threshold is set to zero, but may in other embodiments be set to be the Machine Epsilon in dependence of the precision used for processed words.
(153) If it is determined in the fourth step S24 that the target vector is equal to or below the first threshold and/or the sub vector gain is equal to or below the second threshold, then a PVQ-vector is created in an optional fifth step S25. In one embodiment, the vector is created deterministically by assigning half of the K unit pulses to a first position
(154)
and the remaining unit pulses to a last position (y[N−1]=y[N−1]+(K−y[0])). This step could in conjunction with the fourth step S24 be seen as bypassing the whole actual PVQ-shape search, but can also be seen as a sub-routine within the context of a general PVQ-shape search procedure.
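The deterministic creation may be sketched as below, as an example only. The function make_default_pvq_vector is a hypothetical name, and rounding "half of the K unit pulses" down to the nearest integer is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Builds a valid PVQ vector with K unit pulses without any search:
 * about half of the pulses at the first position, the rest at the last. */
static void make_default_pvq_vector(int16_t *y, size_t n_dim, int16_t k) {
    assert(n_dim > 0 && k >= 0);
    for (size_t n = 0; n < n_dim; n++) {
        y[n] = 0;
    }
    y[0] = (int16_t)(k / 2); /* assumed: half of K, rounded down */
    /* Accumulating (rather than assigning) keeps the total at K for N = 1. */
    y[n_dim - 1] = (int16_t)(y[n_dim - 1] + (k - y[0]));
}
```

Because the remaining pulses are added to y[N−1] instead of overwriting it, the sum of the elements equals K also in the special case where the first and last positions coincide.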
(155) In an optional sixth step S26, an initial value (starting point) for y, y_start, is set for the PVQ-shape search to follow, wherein the initial value is dependent on the ratio between K and N. If the ratio is larger than a third threshold value, which may be 0.5 unit pulses per coefficient, a first projection to a K−1 sub pyramid is used as the initial vector y_start in a following step. The first projection may be calculated as in equations 13 and 14 above. If the ratio is lower than the third threshold, then the initial vector y_start is decided to start off from 0 pre-placed unit pulses.
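The choice of starting point may be sketched as follows, as an example only. The function init_start_vector is a hypothetical name. Since equations 13 and 14 are not reproduced in this passage, the projection used here, y_start[n]=floor(x_abs[n]*(K−1)/sum(x_abs)), is an assumption of the sketch and not necessarily the exact form of those equations. The ratio test K/N>0.5 is evaluated as 2K>N to avoid a division.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills y_start and returns the number of pre-placed unit pulses. */
static int16_t init_start_vector(const int16_t *x_abs, int16_t *y_start,
                                 size_t n_dim, int16_t k) {
    assert(n_dim > 0 && n_dim <= 1024 && k >= 0);
    int32_t sum = 0;
    for (size_t n = 0; n < n_dim; n++) {
        assert(x_abs[n] >= 0);
        y_start[n] = 0;
        sum += x_abs[n];
    }
    /* Low pulse density (K/N <= 0.5) or an all-zero target:
     * start the search from zero pre-placed pulses. */
    if (2 * (int32_t)k <= (int32_t)n_dim || sum == 0) {
        return 0;
    }
    /* Assumed projection onto the K-1 sub pyramid; flooring guarantees
     * that at most K-1 pulses are pre-placed. */
    int16_t pulse_tot = 0;
    for (size_t n = 0; n < n_dim; n++) {
        y_start[n] = (int16_t)(((int32_t)x_abs[n] * (k - 1)) / sum);
        pulse_tot = (int16_t)(pulse_tot + y_start[n]);
    }
    return pulse_tot;
}
```

The pulses not placed by the projection are then left for the fine search, which adds them one at a time.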
(156) In preparation for subsequent PVQ-shape search steps, all the initial vector values in y_start are set to zero in a seventh step S27. In this step a first parameter, here called the accumulated number of unit pulses, pulse.sub.tot, and a second parameter, here the accumulated correlation, corr.sub.xy(pulse.sub.tot), and a third parameter, here called the accumulated energy energy.sub.y(pulse.sub.tot) for the starting point are computed, e.g. according to equations 15-17 respectively. A fourth parameter, here called enloop.sub.y (pulse.sub.tot) may also be calculated in this step according to equation 18 above.
(157) In an eighth step S28, a PVQ-shape search is started, or in an alternative way of looking at it, the second, fine search part of the PVQ-shape search is started for remaining unit pulses up to K with the help of previously obtained, determined or calculated K, N, X_abs, max_xabs, and y, and in some embodiments also g.sub.sub, g.sub.max and N.sub.S. Detailed steps of some embodiments of this fine search are thoroughly illustrated by e.g.
(158) In a ninth step S29, which may be said to be a part of the fine PVQ-shape search, it is determined whether the number of final unit pulses K will end up higher than a third threshold, t.sub.p, for the number of final unit pulses. If this is the case, then in a tenth step S30, the maximum pulse amplitude maxamp.sub.y is stored.
(159) In an eleventh step S31, a sixth parameter, en.sub.margin, is calculated according to e.g. equation 22.
(160) In a twelfth step S32, the sixth parameter is compared with a fourth threshold value, which corresponds to a certain word length.
(161) If the answer is YES (in S32
(162) If the answer is No (in S32
(163) In a sixteenth step S36, at least each non-zero PVQ-sub vector element is assigned its proper sign and the vector is L2-normalized to unit energy. If, in some embodiments, a band has been split, then it is scaled with a sub-vector gain g.sub.sub. A normalized x.sub.q may also be determined based on equation 28. An exemplary procedure for this step is more thoroughly described above.
(164) In a seventeenth step S37, the normalized x.sub.q and y are output from the PVQ-shape search process and forwarded to a PVQ-indexing process included in e.g. the codec.
Some Advantages of Embodiments and Aspects
(165) Below are some advantages over prior art enabled by at least some of the aspects and embodiments disclosed above.
(166) The proposed correlation scaling method/algorithm, using a pre-analysis of the current accumulated maximum correlation, improves the worst case (minimum) SNR performance of a limited precision PVQ-shape quantization search implementation. The adaptive criterion for up-front correlation margin analysis requires very marginal additional complexity. Further, no costly pre-normalization of the target vector x to e.g. unit energy is required.
(167) The adaptive criterion using tracking of the maximum pulse amplitude in the preliminary result, followed by a pre-analysis of the worst case accumulated energy, for e.g. the soft 16/32 bit precision inner-loop decision requires very little additional computational complexity and provides a good trade-off where the complexity may be kept low while high precision correlation and high precision energy metrics are still used for relevant input signals, and further subjectively important peaky signals will be assigned more precision. In other words, at least some of the embodiments and aspects improve the functioning of a computer/processor itself.
(168) In Tables 2/3 in appendix 2 below, one can find that the cost of an example PVQ-based system using the adaptive precision logic is 6.843 WMOPS, whereas if 32 bit energy and squared correlation precision were used in all (any K) inner search loops, the cost would rise to 10.474 WMOPS.
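Expressed as a relative figure, using only the two WMOPS values stated above:

```latex
\frac{10.474 - 6.843}{10.474} = \frac{3.631}{10.474} \approx 0.347
```

That is, in this example the adaptive precision logic costs roughly 35% less than always running the 32 bit inner search loop.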
(169) Concluding Remarks
(170) The embodiments described above are merely given as examples, and it should be understood that the proposed technology is not limited thereto. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the present scope. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible.
(171) When using the word "comprise" or "comprising" it shall be interpreted as non-limiting, i.e. meaning "consist at least of".
(172) It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated, and/or blocks/operations may be omitted without departing from the scope of inventive concepts.
(173) It is to be understood that the choice of interacting units, as well as the naming of the units within this disclosure are only for exemplifying purpose, and nodes suitable to execute any of the methods described above may be configured in a plurality of alternative ways in order to be able to execute the suggested procedure actions.
(174) It should also be noted that the units described in this disclosure are to be regarded as logical entities and not with necessity as separate physical entities.
(175) Reference to an element in the singular is not intended to mean one and only one unless explicitly so stated, but rather one or more. All structural and functional equivalents to the elements of the above-described embodiments that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed hereby. Moreover, it is not necessary for a device or method to address each and every problem sought to be solved by the technology disclosed herein, for it to be encompassed hereby.
(176) In some instances herein, detailed descriptions of well-known devices, circuits, and methods are omitted so as not to obscure the description of the disclosed technology with unnecessary detail. All statements herein reciting principles, aspects, and embodiments of the disclosed technology, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, e.g. any elements developed that perform the same function, regardless of structure.
(177) Exemplary implementation of embodiment in ANSI-C code (appendix 1)
(178) Below is an example of an implementation of an exemplifying embodiment in ANSI C-code using STL 2009 G.191 virtual 16/32 bit (a simulation of a DSP).
(179) The above code should be easy to read for all persons skilled in the art and should not have to be explained in more detail. However, for the non-skilled person it is mentioned that the relational operator == is an operator which, in an example A==B, returns a logical value set to logical 1 (true) when the values A and B are equal, and otherwise returns logical 0 (false). L_mac is a multiply-accumulate in the sense that L_mac(L_v3, v1, v2) = L_v3 + v1*v2.
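The two operators mentioned above can be illustrated with a minimal sketch in C. The function names mac32_sketch and is_equal are hypothetical and are not taken from the appendix code or from the STL. The sketch follows the simplified form stated in the text (L_v3 + v1*v2) and, as an assumption, saturates the 32-bit result in the manner of typical fixed-point DSP arithmetic. The ITU-T STL operator of the same name is, to our understanding, defined with a fractional (doubled) product and saturation, so this is an illustration of the stated relation and not a replacement for the library operator.

```c
#include <stdint.h>

/* Hypothetical helper (not the STL L_mac): multiply-accumulate in the
   simplified form given in the text, L_v3 + v1 * v2.
   Assumption: the 32-bit result saturates instead of wrapping. */
int32_t mac32_sketch(int32_t l_v3, int16_t v1, int16_t v2)
{
    /* Use a 64-bit intermediate so the sum cannot overflow. */
    int64_t acc = (int64_t)l_v3 + (int64_t)v1 * (int64_t)v2;

    if (acc > INT32_MAX) {
        return INT32_MAX;
    }
    if (acc < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)acc;
}

/* Hypothetical helper: the relational operator == yields logical 1
   when both values are equal and logical 0 otherwise. */
int is_equal(int a, int b)
{
    return a == b;
}
```

For example, mac32_sketch(10, 3, 4) accumulates the product 12 onto 10, and is_equal(5, 6) yields logical 0.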
(180) Tabled Simulation Results (Appendix 2)
(181) Simulation Background
(182) Embodiments of the disclosure herein have been simulated. For all PVQ-shape-search simulations made, the bit rate used was 64000 bps, and the codec was operated in MDCT mode, with initial MDCT coefficient sub-band sizes of [8, 12, 16, 24, 32] coefficients. A PVQ band-splitting algorithm may further divide these bands into smaller band sections, each represented by a sub-vector. For example, a band of size 8 may be split into smaller sub-sections, e.g. 4, 4 or 3, 3, 2, if it is allocated enough bits. Typically, each band is split in such a way that a maximum of 32 bits may be used for shape coding of every final sub-vector.
(183) In this PVQ-indexing implementation a band of size 8 may have up to 36 unit pulses, a sub-section of size 7 may have up to 53 unit pulses, a section of size 6 may have up to 95 unit pulses, a section of size 5 may have up to 238 unit pulses, and sections of size 4, 3, 2 or 1 may have up to 512 unit pulses. As the shorter sections with a high number of pulses are created dynamically by band-splitting, they occur less frequently than the longer sub-vector sizes. The WMOPS figures in the Result Tables below include: PVQ-pre-search, PVQ-fine search, PVQ-normalization, and PVQ-indexing. The % identical figures in the Result Tables below give the percentage of vectors found by the evaluated limited-precision shape search algorithm that are identical to those found by an unconstrained floating point PVQ shape search.
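The pulse limits listed above can be collected in a small lookup, sketched here in C. The function name pvq_max_pulses_sketch and the return value -1 are hypothetical and chosen for illustration only. The table holds only the values stated in the paragraph above for section sizes 1 to 8; limits for larger sections are not stated there, so the sketch reports them as unknown.

```c
/* Hypothetical lookup of the maximum number of unit pulses per
   section size, using only the values stated in the text:
   sizes 1..4 -> 512, 5 -> 238, 6 -> 95, 7 -> 53, 8 -> 36.
   Returns -1 for sizes the text does not cover (assumption). */
int pvq_max_pulses_sketch(int section_size)
{
    /* Index 0 is unused so that the index equals the section size. */
    static const int max_pulses[9] = {
        0, 512, 512, 512, 512, 238, 95, 53, 36
    };

    if (section_size < 1 || section_size > 8) {
        return -1;
    }
    return max_pulses[section_size];
}
```

With this layout, a band of size 8 that is split into sub-sections of sizes 3, 3 and 2 would be looked up three times, each time giving the limit of 512 unit pulses.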
(184) Result Tables
(185) TABLE 1 Results for final K <= 127

| Algorithm (Pulses <= 127), En{energy-bits} × CorrSq{corrSq-bits} | Min SNR (dB) | Seg-SNR (dB) | % identical vectors | Worst Case WMOPS | Average WMOPS | Remark |
|---|---|---|---|---|---|---|
| Mixed En16 × CorrSq16 / En32 × CorrSq32, pre_analyze max(x_abs) | 4.771 | 188.803 | 99.3 | 6.843 | 5.496 | 16 × 16 always used, WC (worst case) in 16 × 16 |
| Locked En16 × CorrSq16, pre_analyze max(x_abs) | 4.771 | 188.803 | 99.3 | 6.843 | | No change, as energy never exceeds 16 bits |
| En16 × CorrSq16 using a known art correlation scaling method (OPUS), using accumulated number of unit pulses | -6.021 | 180.556 | 94.6 | 6.826 | 5.476 | Algorithm is a bit worse (lower minSNR, fewer identical hits) at very similar complexity |
| Locked En32 × CorrSq16, pre-analyze max(x_abs) | 4.771 | 188.803 | 99.3 | 8.970 | 6.961 | Unnecessary to use En32 for pulses <= 127, as energy never exceeds 16 bits dynamics |
| Locked En16 × CorrSq32, pre-analyze input max(x_abs) | 190.0 | 190.0 | 100 | 9.386 | 7.248 | 2.5 WMOPS extra required for the last 0.7% hits |
| Locked En32 × CorrSq32, pre-analyze max(x_abs) | 190.0 | 190.0 | 100 | 10.474 | 7.999 | Unnecessary 0.9 WMOPS increase compared to Locked En16 × CorrSq32 |
(186) TABLE 2 Results for K > 127

| Algorithm (Pulses > 127), En{energy-bits} × CorrSq{corrSq-bits} | minSNR (dB) | segSNR (dB) | % identical vectors | Worst Case WMOPS | Average WMOPS | Remark |
|---|---|---|---|---|---|---|
| Mixed AccEn controlled En16 × CorrSq16 / En32 × CorrSq32, pre_analyze input, acc. energy controlled precision | 32.686 | 160.316 | 80.4% | 6.843 (WC still from 16 × 16 sections) | 5.496 | A good enough solution; WC is still for 16 × 16, WC is not increased |
| Mixed AccEn controlled En16 × CorrSq16 / En16 × CorrSq32, pre_analyze input, acc. energy controlled precision | 32.686 | 130.258 | 59.3% | n/a | n/a | Energy information is occasionally truncated, causing low SNR |
| Mixed AccEn controlled En16 × CorrSq16 / En32 × CorrSq16, pre_analyze input, acc. energy controlled precision | 32.686 | 117.634 | 50.6% | n/a | n/a | Correlation information has low precision, causing low SNR |
| Locked En16 × CorrSq16, pre_analyze input | 32.686 | 113.629 | 47.8% | n/a | n/a | Energy information occasionally truncated and correlation information has low precision, causing low SNR |
| Locked En32 × CorrSq16, pre_analyze input | 32.686 | 117.634 | 50.6% | n/a | n/a | Correlation information has low precision, causing low SNR |
| Locked En16 × CorrSq32, pre_analyze input | 40.994 | 159.714 | 78.8% | n/a | n/a | Energy information is occasionally truncated, causing low SNR |
| Locked En32 × CorrSq32, pre_analyze input | 49.697 | 189.773 | 99.8% | 7.1 | 5.7 | WC now in 32 × 32 section, higher complexity WC |
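The remarks in the Result Tables that the energy never exceeds 16 bits for K <= 127, while 32-bit energy is needed for some cases with K > 127, can be illustrated with a small worked check in C. The sketch rests on two assumptions that are not taken from the appendix code: the energy is modelled as the plain sum of squares of the integer vector y, without any scaling that the encoder's enloop.sub.y variable may apply, and "16 bits" is read as a signed 16-bit word. For an integer vector with K unit pulses, the sum of squares is largest when all pulses land on one position, giving K*K. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: largest possible sum of squares of an integer
   vector holding k unit pulses, reached when all pulses are stacked
   on a single position. Valid here for 0 <= k <= 512. */
int32_t worst_case_energy_sketch(int32_t k)
{
    return k * k;
}

/* Hypothetical helper: 1 if the value fits in a signed 16-bit word,
   0 otherwise (assumption: "16 bits" means int16_t range). */
int fits_in_16_bits_sketch(int32_t value)
{
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

Under these assumptions, K = 127 gives a worst-case energy of 127*127 = 16129, which fits in a signed 16-bit word, whereas K = 512 gives 512*512 = 262144, which does not. The first pulse count for which the worst case no longer fits is K = 182, since 181*181 = 32761 and 182*182 = 33124.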
Abbreviations
(187)
N: vector dimension
N.sub.S: sub-vector dimension
x: target vector
X.sub.q: Quantized shape vector
y.sub.final: integer vector adhering to the L1-norm K
K: Number of final unit pulses
k: number of accumulated unit pulses index
n: coefficient or sample index
i: sub vector index
MDCT: Modified Discrete Cosine Transform
PVQ: Pyramid Vector Quantizer (Quantization)
WC: Worst Case
WMOPS: Weighted Million Operations Per Second
AccEn: Accumulated Energy
ROM: Read Only Memory
PROM: Program ROM
SNR: Signal-to-Noise Ratio
EVS: Enhanced Voice Service
3GPP: 3rd Generation Partnership Project
DSP: Digital Signal Processor
CELT: Constrained Energy Lapped Transform
IETF: Internet Engineering Task Force
MAC: Multiply-Accumulate
ACELP: Algebraic code-excited linear prediction
EPS: Machine epsilon