Linear prediction coefficient conversion device and linear prediction coefficient conversion method
11222644 · 2022-01-11
Assignee
Inventors
Cpc classification
G10L19/06
PHYSICS
G10L19/12
PHYSICS
International classification
G10L19/06
PHYSICS
G10L19/12
PHYSICS
Abstract
The purpose of the present invention is to estimate, with a small amount of computation, a linear prediction synthesis filter after conversion of an internal sampling frequency. A linear prediction coefficient conversion device is a device that converts first linear prediction coefficients calculated at a first sampling frequency to second linear prediction coefficients at a second sampling frequency different from the first sampling frequency, which includes a means for calculating, on the real axis of the unit circle, a power spectrum corresponding to the second linear prediction coefficients at the second sampling frequency based on the first linear prediction coefficients or an equivalent parameter, a means for calculating, on the real axis of the unit circle, autocorrelation coefficients from the power spectrum, and a means for converting the autocorrelation coefficients to the second linear prediction coefficients at the second sampling frequency.
Claims
1. A linear prediction coefficient conversion device comprising: circuitry configured to convert first linear prediction coefficients of a linear prediction filter calculated at a first sampling frequency F1 to second linear prediction coefficients at a second sampling frequency F2 (where F1<F2) different from the first sampling frequency; calculate, on a real axis of a unit circle, a power spectrum corresponding to the second linear prediction coefficients at the second sampling frequency based on coefficient information being the first linear prediction coefficients or an equivalent parameter different from Line Spectral Pairs (LSP) coefficients, wherein the power spectrum is obtained, using LSP coefficients calculated based on the coefficient information, at points on the real axis corresponding to N1 number of different frequencies, where frequencies are 0 or more and F1 or less, and (N1−1)(F2−F1)/F1 number of power spectrum components corresponding to more than F1 and F2 or less are obtained by extrapolating the power spectrum calculated using the calculated LSP coefficients; calculate, on the real axis of the unit circle, autocorrelation coefficients from the power spectrum; convert the autocorrelation coefficients to the second linear prediction coefficients at the second sampling frequency; and encode or decode an audio signal using the linear prediction synthesis filter with the second linear prediction coefficients.
2. A linear prediction coefficient conversion method comprising: converting, by a device circuitry, first linear prediction coefficients of a linear prediction synthesis filter calculated at a first sampling frequency F1 to second linear prediction coefficients at a second sampling frequency F2 (where F1<F2) different from the first sampling frequency, comprising: calculating, on a real axis of a unit circle, a power spectrum corresponding to the second linear prediction coefficients at the second sampling frequency based on coefficient information being the first linear prediction coefficients or an equivalent parameter different from Line Spectral Pairs (LSP) coefficients, wherein the power spectrum is obtained, using LSP coefficients calculated based on the coefficient information, at points on the real axis corresponding to N1 number of different frequencies, where frequencies are 0 or more and F1 or less, and (N1−1)(F2−F1)/F1 number of power spectrum components corresponding to more than F1 and F2 or less are obtained by extrapolating the power spectrum calculated using the calculated LSP coefficients; calculating, on the real axis of the unit circle, autocorrelation coefficients from the power spectrum; converting the autocorrelation coefficients to the second linear prediction coefficients at the second sampling frequency; and encoding or decoding an audio signal using the linear prediction synthesis filter using the second linear prediction coefficients.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(7) Embodiments of a device, a method and a program are described hereinafter with reference to the drawings. Note that, in the description of the drawings, the same elements are denoted by the same reference symbols and redundant description thereof is omitted.
(8) First, definitions required to describe embodiments are described hereinafter.
(9) A response of an Nth order autoregressive linear prediction filter (which is referred to hereinafter as a linear prediction synthesis filter)
(10)
$$\frac{1}{A(z)} = \frac{1}{1 + a_1 z^{-1} + \cdots + a_n z^{-n}} \quad (1)$$
can be adapted to the power spectrum Y(ω) by calculating the autocorrelation
(11)
$$R_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} Y(\omega)\cos k\omega \, d\omega, \quad k = 0, 1, \ldots, n \quad (2)$$
for a known power spectrum Y(ω) at an angular frequency ω∈[−π, π] and, using the autocorrelation coefficients up to the Nth order, solving for the linear prediction coefficients a.sub.1, a.sub.2, . . . , a.sub.n by, for example, the Levinson-Durbin method.
(12) Such generation of an autoregressive model using a known power spectrum can be used also for modification of a linear prediction synthesis filter 1/A(z) in the frequency domain. This is achieved by calculating the power spectrum of a known filter
Y(ω)=1/|A(ω)|.sup.2 (3)
and modifying the obtained power spectrum Y(ω) by a method suited to the purpose to obtain the modified power spectrum Y′(ω), then calculating the autocorrelation coefficients of Y′(ω) by the above equation (2), and obtaining the linear prediction coefficients of the modified filter 1/A′(z) by the Levinson-Durbin algorithm or a similar method.
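The round trip described above (power spectrum of a known filter, autocorrelation, then Levinson-Durbin) can be sketched as follows. This is a minimal illustration, not the patented procedure: the function name `levinson_durbin`, the example pole positions, and the dense 4096-point grid are assumptions chosen for the demonstration, and the integral of equation (2) is approximated by a plain average over the grid.

```python
import numpy as np

def levinson_durbin(r, order):
    """Return a = [1, a_1, ..., a_order] for A(z) = 1 + sum_k a_k z^-k."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                   # prediction error energy
    for i in range(1, order + 1):
        # Correlation between the current predictor and lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                           # i-th reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

# Hypothetical minimum-phase test filter built from two conjugate pole pairs.
poles = [0.9 * np.exp(1j * 0.6), 0.7 * np.exp(1j * 2.0)]
a_true = np.poly(poles + [np.conj(z) for z in poles]).real
n = len(a_true) - 1

# Power spectrum Y(w) = 1 / |A(w)|^2 on a dense grid over [-pi, pi).
M = 4096
omega = -np.pi + 2.0 * np.pi * np.arange(M) / M
A = np.polyval(a_true[::-1], np.exp(-1j * omega))
Y = 1.0 / np.abs(A) ** 2

# Autocorrelation as the average of Y(w) cos(k w), then back to coefficients.
r = np.array([np.mean(Y * np.cos(k * omega)) for k in range(n + 1)])
a_est = levinson_durbin(r, n)
```

With an unmodified spectrum the recovered coefficients should match the original ones; modifying `Y` before the autocorrelation step is where a frequency-domain change to the filter would enter.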
(13) While the equation (2) cannot be calculated analytically except for simple cases, the rectangle approximation can be used as follows, for example.
(14)
$$R_k \approx \frac{1}{M}\sum_{\omega\in\Omega} Y(\omega)\cos k\omega \quad (4)$$
where Ω indicates the M frequencies placed at regular intervals over the angular frequency range [−π, π]. When the symmetry Y(−ω)=Y(ω) is used, the above summation only needs to be evaluated for the angular frequencies ω∈[0, π], which correspond to the upper half of the unit circle. Thus, in terms of the amount of computation, it is preferred that the rectangle approximation represented by the above equation (4) be altered as follows
(15)
$$R_k \approx \frac{1}{N}\left( Y(0) + (-1)^k Y(\pi) + 2\sum_{\omega\in\Omega_+} Y(\omega)\cos k\omega \right) \quad (5)$$
where Ω.sub.+ indicates the (N−2) frequencies placed at regular intervals in (0, π), excluding 0 and π.
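The saving from the symmetry of the power spectrum can be checked numerically. The sketch below assumes that the half-circle form of equation (5) weights the end points 0 and π once and the (N−2) interior points twice; the second-order filter and the helper `power_spectrum` are hypothetical, and the sums are left unnormalized because only a constant factor distinguishes the two forms.

```python
import numpy as np

def power_spectrum(a, w):
    """Y(w) = 1 / |A(e^{jw})|^2 for A(z) = sum_k a[k] z^-k (direct evaluation)."""
    k = np.arange(len(a))
    A = np.exp(-1j * np.outer(w, k)) @ a
    return 1.0 / np.abs(A) ** 2

a = np.array([1.0, -0.9, 0.4])      # hypothetical stable second-order filter
n = len(a) - 1
N = 31                              # points on [0, pi], both ends included
delta = np.pi / (N - 1)

# Full-circle sum: 2(N-1) equally spaced frequencies covering (-pi, pi].
w_full = np.arange(-(N - 2), N) * delta
Y_full = power_spectrum(a, w_full)
full = np.array([np.sum(Y_full * np.cos(k * w_full)) for k in range(n + 1)])

# Half-circle sum: end points once, the N-2 interior points twice.
w_in = np.arange(1, N - 1) * delta
Y_in = power_spectrum(a, w_in)
Y0, Ypi = power_spectrum(a, np.array([0.0, np.pi]))
half = np.array([
    Y0 + (-1) ** k * Ypi + 2.0 * np.sum(Y_in * np.cos(k * w_in))
    for k in range(n + 1)
])
```

Both arrays should agree to rounding error, while the half-circle version evaluates the power spectrum at N points instead of 2(N−1).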
(16) Line spectral frequencies (which are referred to hereinafter as LSF), an equivalent representation of linear prediction coefficients, are described hereinafter.
(17) The LSF representation is used in various speech and audio coding techniques as a feature quantity of a linear prediction synthesis filter, and for the manipulation and coding of a linear prediction synthesis filter. The LSF uniquely characterizes the Nth order polynomial A(z) by n parameters which are different from linear prediction coefficients. The LSF has the following characteristics: it easily guarantees the stability of a linear prediction synthesis filter, it can be interpreted intuitively in the frequency domain, it is less likely to be affected by quantization errors than other parameters such as linear prediction coefficients and reflection coefficients, and it is suitable for interpolation.
(18) For the purpose of one embodiment of the present invention, LSF is defined as follows.
(19) The LSF decomposition of the Nth order polynomial A(z) can be represented as follows by using an integer displacement κ≥0
A(z)={P(z)+Q(z)}/2 (6)
where P(z)=A(z)+z.sup.−n−κA(z.sup.−1) and
Q(z)=A(z)−z.sup.−n−κA(z.sup.−1)
The equation (6) indicates that P(z) is symmetric and Q(z) is antisymmetric as follows
P(z)=z.sup.−n−κP(z.sup.−1)
Q(z)=−z.sup.−n−κQ(z.sup.−1)
Such symmetric property is an important characteristic in LSF decomposition.
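The decomposition of equation (6) amounts to adding and subtracting a zero-padded coefficient vector and its reverse. The sketch below illustrates this under the assumption that A(z) is stored as `[1, a_1, ..., a_n]` in ascending powers of z⁻¹; the function name `lsf_decompose` and the example coefficients are hypothetical.

```python
import numpy as np

def lsf_decompose(a, kappa=1):
    """Coefficient arrays of P(z) and Q(z) in ascending powers of z^-1."""
    a = np.asarray(a, dtype=float)
    ext = np.concatenate([a, np.zeros(kappa)])   # length n + kappa + 1
    flipped = ext[::-1]                          # z^-(n+kappa) A(1/z)
    return ext + flipped, ext - flipped

a = np.array([1.0, -1.2, 0.9, -0.3, 0.1])        # hypothetical A(z), n = 4
P, Q = lsf_decompose(a, kappa=1)                 # LSF in the narrower sense
P0, Q0 = lsf_decompose(a, kappa=0)               # ISF-style displacement
```

P is a palindromic and Q an antipalindromic coefficient sequence, which is the symmetric and antisymmetric property stated above, and their average returns the (padded) coefficients of A(z).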
(20) P(z) and Q(z) have obvious (trivial) roots at z=±1, which depend on n and κ as shown in Table 1. Thus, polynomials representing the obvious roots of P(z) and Q(z) are defined as P.sub.T(z) and Q.sub.T(z), respectively. When P(z) does not have an obvious root, P.sub.T(z) is 1. The same applies to Q(z).
(21) The LSFs of A(z) are the positive phase angles of the non-trivial roots of P(z) and Q(z). When the polynomial A(z) is minimum phase, that is, when all roots of A(z) are inside the unit circle, the non-trivial roots of P(z) and Q(z) are arranged alternately on the unit circle. The numbers of complex roots of P(z) and Q(z) with positive phase angle are m.sub.P and m.sub.Q, respectively. Table 1 shows the relationship of m.sub.P and m.sub.Q with the order n and displacement κ.
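The alternating arrangement of the non-trivial roots can be observed with a general-purpose root finder. The sketch below is for illustration only (a root finder is exactly what the real-axis method of this document avoids); it assumes κ=1 and an even order, and the names `lsf_decompose`, `positive_angles` and the example poles are hypothetical.

```python
import numpy as np

def lsf_decompose(a, kappa=1):
    """Coefficient arrays of P(z) and Q(z) in ascending powers of z^-1."""
    ext = np.concatenate([np.asarray(a, dtype=float), np.zeros(kappa)])
    return ext + ext[::-1], ext - ext[::-1]

def positive_angles(coeffs, tol=1e-6):
    """Phase angles in (0, pi) of the roots, skipping the trivial roots at +/-1."""
    ang = np.angle(np.roots(coeffs))
    return np.sort(ang[(ang > tol) & (ang < np.pi - tol)])

# Hypothetical minimum-phase A(z) of order 4 (all roots inside the unit circle).
poles = [0.9 * np.exp(1j * 0.6), 0.7 * np.exp(1j * 2.0)]
a = np.poly(poles + [np.conj(z) for z in poles]).real

P, Q = lsf_decompose(a, kappa=1)
tagged = sorted([(w, "P") for w in positive_angles(P)] +
                [(w, "Q") for w in positive_angles(Q)])
lsf = [w for w, _ in tagged]          # line spectral frequencies, ascending
labels = [t for _, t in tagged]       # which polynomial each one belongs to
```

For a minimum-phase input the labels alternate, starting with a root of P(z), and the number of frequencies equals the order n.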
(22) When the positive phase angles of the complex roots of P(z) are represented as
$$\omega_0, \omega_2, \ldots, \omega_{2m_P-2}$$
and those of the roots of Q(z) are represented as
$$\omega_1, \omega_3, \ldots, \omega_{2m_Q-1}$$
the positions of the roots for a minimum-phase polynomial A(z) can be represented as follows.
$$0 < \omega_0 < \omega_1 < \cdots < \omega_{m_P+m_Q-1} < \pi \quad (7)$$
(23) In speech and audio coding, displacement κ=0 or κ=1 is used. The case κ=0 is generally called immittance spectral frequency (ISF), and the case κ=1 is generally called LSF in a narrower sense than that used in the description of one embodiment of the present invention. Note, however, that the representation using displacement can handle both ISF and LSF in a unified way. In many cases, a result obtained for LSF can be applied as it is to any given κ≥0 or can be generalized.
(24) When κ=0, the LSF representation has only m.sub.P+m.sub.Q=n−1 frequency parameters, as shown in Table 1. Thus, one more parameter is required to uniquely represent A(z), and the n-th reflection coefficient (which is referred to hereinafter as γ.sub.n) of A(z) is typically used. This parameter is introduced into the LSF decomposition as the following factor.
ν=−(γ.sub.n+1)/(γ.sub.n−1) (8)
where γ.sub.n is the n-th reflection coefficient of A(z) which begins with Q(z), and it is typically γ.sub.n=a.sub.n.
(25) When κ=1, the (m.sub.P+m.sub.Q=n) number of parameters are obtained by LSF decomposition, and it is possible to uniquely represent A(z). In this case, ν=1.
(26) TABLE 1
Case | n | κ | m.sub.P | m.sub.Q | P.sub.T(z) | Q.sub.T(z) | ν
(1) | even | 0 | n/2 | n/2 − 1 | 1 | z.sup.2 − 1 | −(γ.sub.n + 1)/(γ.sub.n − 1)
(2) | odd | 0 | (n − 1)/2 | (n − 1)/2 | z + 1 | z − 1 | −(γ.sub.n + 1)/(γ.sub.n − 1)
(3) | even | 1 | n/2 | n/2 | z + 1 | z − 1 | 1
(4) | odd | 1 | (n + 1)/2 | (n − 1)/2 | 1 | z.sup.2 − 1 | 1
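(27) Because the non-obvious roots occur as complex-conjugate pairs on the unit circle, the polynomials that remain after the obvious roots are removed are symmetric, and the following equation is obtained.
(28)
$$\frac{P(z)}{P_T(z)} = z^{-m_P}\left[ \left(z^{m_P} + z^{-m_P}\right) + p_1\left(z^{m_P-1} + z^{-m_P+1}\right) + \cdots + p_{m_P} \right] \quad (9)$$
(29) Likewise,
$$\frac{Q(z)}{\nu Q_T(z)} = z^{-m_Q}\left[ \left(z^{m_Q} + z^{-m_Q}\right) + q_1\left(z^{m_Q-1} + z^{-m_Q+1}\right) + \cdots + q_{m_Q} \right] \quad (10)$$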
(30) In those polynomials,
$$p_1, p_2, \ldots, p_{m_P}$$
and
$$q_1, q_2, \ldots, q_{m_Q}$$
completely represent P(z) and Q(z) for a given displacement κ and the factor ν, which is determined by the order n of A(z). Those coefficients can be directly obtained from the expressions (6) and (8).
(31) When z=e.sup.jω and using the following relationship
z.sup.k+z.sup.−k=e.sup.jωk+e.sup.−jωk=2 cos ωk
the expressions (9) and (10) can be represented as follows
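$$P(\omega) = 2e^{-j\omega m_P} R(\omega) P_T(\omega) \quad (11)$$
$$Q(\omega) = 2e^{-j\omega m_Q} \nu S(\omega) Q_T(\omega) \quad (12)$$
where
$$R(\omega) = \cos m_P\omega + p_1 \cos(m_P-1)\omega + \cdots + \frac{p_{m_P}}{2} \quad (13)$$
and
$$S(\omega) = \cos m_Q\omega + q_1 \cos(m_Q-1)\omega + \cdots + \frac{q_{m_Q}}{2} \quad (14)$$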
(32) Specifically, LSF of the polynomial A(z) is the roots of R(ω) and S(ω) at the angular frequency ω∈(0, π).
(33) The Chebyshev polynomials of the first kind, which are used in one embodiment of the present invention, are described hereinafter.
(34) The Chebyshev polynomials of the first kind are defined as follows using a recurrence relation
T.sub.k+1(x)=2xT.sub.k(x)−T.sub.k−1(x), k=1,2, . . . (15)
(35) Note that the initial values are T.sub.0(x)=1 and T.sub.1(x)=x, respectively. For x∈[−1, 1], the Chebyshev polynomials can be represented as follows
T.sub.k(x)=cos {k cos.sup.−1x}, k=0,1, . . . (16)
(36) In one embodiment of the present invention, the equation (15) provides a simple method for calculating cos kω (where k=2, 3, . . . ) starting from cos ω and cos 0=1. Specifically, with use of the equation (16), the equation (15) is rewritten in the following form
cos kω=2 cos ω cos(k−1)ω−cos(k−2)ω, k=2,3, . . . (17)
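The recurrence of equation (17) needs a single cosine evaluation, after which every further value costs one multiplication and one subtraction. A minimal sketch, in which the function name `cos_multiples` and the test angle are assumptions for illustration:

```python
import math

def cos_multiples(omega, kmax):
    """Return [cos(0*w), cos(1*w), ..., cos(kmax*w)] using one call to cos()."""
    x = math.cos(omega)
    values = [1.0, x]                       # cos 0 = 1 and cos w
    for k in range(2, kmax + 1):
        # cos kw = 2 cos w cos (k-1)w - cos (k-2)w
        values.append(2.0 * x * values[k - 1] - values[k - 2])
    return values[:kmax + 1]

vals = cos_multiples(0.3, 16)
```

When the frequencies are fixed in advance, the value of x itself can be tabulated, so no trigonometric function needs to be called at run time.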
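(37) When the substitution ω=arccos x is used, the first polynomials obtained from the equation (15) are as follows
(38)
$$\begin{aligned}
T_2(x) &= 2x^2 - 1 \\
T_3(x) &= 4x^3 - 3x \\
T_4(x) &= 8x^4 - 8x^2 + 1 \\
T_5(x) &= 16x^5 - 20x^3 + 5x \\
T_6(x) &= 32x^6 - 48x^4 + 18x^2 - 1 \\
T_7(x) &= 64x^7 - 112x^5 + 56x^3 - 7x \\
T_8(x) &= 128x^8 - 256x^6 + 160x^4 - 32x^2 + 1
\end{aligned}$$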
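(39) When the equations (13) and (14) are rewritten for x∈[−1,1] using those Chebyshev polynomials, the following equations are obtained
$$R(x) = T_{m_P}(x) + p_1 T_{m_P-1}(x) + \cdots + \frac{p_{m_P}}{2} \quad (18)$$
$$S(x) = T_{m_Q}(x) + q_1 T_{m_Q-1}(x) + \cdots + \frac{q_{m_Q}}{2} \quad (19)$$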
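(40) When the LSFs ω.sub.i are known for i=0, 1, . . . , m.sub.P+m.sub.Q−1, the following equations are obtained using the cosines of the LSFs, x.sub.i=cos ω.sub.i (LSP)
(41)
$$R(x) = r_0 (x - x_0)(x - x_2)\cdots(x - x_{2m_P-2}) \quad (20)$$
$$S(x) = s_0 (x - x_1)(x - x_3)\cdots(x - x_{2m_Q-1}) \quad (21)$$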
(42) The coefficients r.sub.0 and s.sub.0 can be obtained by comparison of the equations (18) and (19) with (20) and (21) on the basis of m.sub.P and m.sub.Q.
(43) The equations (20) and (21) are written as
$$R(x) = r_0 x^{m_P} + r_1 x^{m_P-1} + \cdots + r_{m_P} \quad (22)$$
$$S(x) = s_0 x^{m_Q} + s_1 x^{m_Q-1} + \cdots + s_{m_Q} \quad (23)$$
(44) Those polynomials can be efficiently calculated for a given x by a method known as Horner's method. With r.sub.0 denoting the coefficient of the highest power as in the equation (22), Horner's method obtains R(x)=b.sub.0(x) by use of the following recursive relation
$$b_k(x) = x\,b_{k+1}(x) + r_{m_P-k}$$
where the initial value is
$$b_{m_P}(x) = r_0$$
The same applies to S(x).
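Horner's method evaluates a polynomial of degree m with m multiplications and m additions. The sketch below assumes that the coefficients are passed highest power first, matching the form of equation (22); the function name `horner` is hypothetical, and the eighth Chebyshev polynomial is used only as a convenient test polynomial.

```python
def horner(coeffs, x):
    """Evaluate coeffs[0]*x^m + coeffs[1]*x^(m-1) + ... + coeffs[m]."""
    acc = 0.0
    for c in coeffs:              # highest power first
        acc = acc * x + c
    return acc

# T_8(x) = 128x^8 - 256x^6 + 160x^4 - 32x^2 + 1, and T_8(0.5) = cos(8*pi/3).
t8 = [128.0, 0.0, -256.0, 0.0, 160.0, 0.0, -32.0, 0.0, 1.0]
value = horner(t8, 0.5)           # -0.5
```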
(45) A method of calculating the coefficients of the polynomials of the equations (22) and (23) is described hereinafter using an example. It is assumed in this example that the order of A(z) is 16 (n=16). Accordingly, m.sub.P=m.sub.Q=8 in this case. Series expansion of the equation (18) can be represented in the form of the equation (22) by substitution and simplification by the Chebyshev polynomials. As a result, the coefficients of the polynomial of the equation (22) are represented as follows using the coefficient p.sub.i of the polynomial P(z).
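(46)
$$\begin{aligned}
r_0 &= 128 \\
r_1 &= 64p_1 \\
r_2 &= -256 + 32p_2 \\
r_3 &= -112p_1 + 16p_3 \\
r_4 &= 160 - 48p_2 + 8p_4 \\
r_5 &= 56p_1 - 20p_3 + 4p_5 \\
r_6 &= -32 + 18p_2 - 8p_4 + 2p_6 \\
r_7 &= -7p_1 + 5p_3 - 3p_5 + p_7 \\
r_8 &= 1 - p_2 + p_4 - p_6 + p_8/2
\end{aligned} \quad (24)$$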
(47) The coefficients of P(z) can be obtained from the equation (6). This example can be applied also to the polynomial of the equation (23) by using the same equations with the coefficients of Q(z). Further, the corresponding equations for calculating the coefficients of R(x) and S(x) for another order n and displacement κ can easily be derived in the same manner.
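The expansion from the Chebyshev form into powers of x can also be done mechanically for any order. The sketch below uses NumPy's `cheb2poly` and assumes that R(x) has the form T_m(x) + p_1 T_{m−1}(x) + ... + p_m/2 with the power-form coefficients ordered highest power first; the function name `r_coefficients` and the sample values of p are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def r_coefficients(p):
    """p = [p_1, ..., p_m]; return [r_0, ..., r_m], highest power first."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    cheb = np.empty(m + 1)        # cheb[k] multiplies T_k(x)
    cheb[0] = p[-1] / 2.0         # constant term p_m / 2
    cheb[1:m] = p[-2::-1]         # p_{m-1}, ..., p_1 multiply T_1, ..., T_{m-1}
    cheb[m] = 1.0                 # leading term T_m(x)
    return C.cheb2poly(cheb)[::-1]

# Hypothetical coefficients for m_P = 8 (order n = 16).
p = [0.3, -0.2, 0.5, 0.1, -0.4, 0.25, 0.05, 0.6]
r = r_coefficients(p)
```

For m=8 the leading coefficient is 128 regardless of p, because it comes only from T_8(x), and the remaining coefficients are linear in the p values.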
(48) Further, when the roots of the equations (20) and (21) are known, coefficients can be obtained from the equations (20) and (21).
(49) The outline of processing according to one embodiment of the present invention is described hereinafter.
(50) One embodiment of the present invention provides an effective calculation method and device for, when converting a linear prediction synthesis filter calculated in advance by an encoder or a decoder at a first sampling frequency to the one at a second sampling frequency, calculating the power spectrum of the linear prediction synthesis filter and modifying it to the second sampling frequency, and then obtaining autocorrelation coefficients from the modified power spectrum.
(51) A calculation method for the power spectrum of a linear prediction synthesis filter according to one embodiment of the present invention is described hereinafter. The calculation of the power spectrum uses the LSF decomposition of the equation (6) and the properties of the polynomials P(z) and Q(z). By using the LSF decomposition and the above-described Chebyshev polynomials, the power spectrum can be converted to the real axis of the unit circle.
(52) With the conversion to the real axis, it is possible to achieve an effective method for calculating a power spectrum at an arbitrary frequency in ω∈[0, π]. This is because it is possible to eliminate transcendental functions since the power spectrum is represented by polynomials. Particularly, it is possible to simplify the calculation of the power spectrum at ω=0, ω=π/2 and ω=π. The same simplification is applicable also to LSF where either one of P(z) or Q(z) is zero. Such properties are advantageous compared with FFT, which is generally used for the calculation of the power spectrum.
(53) It is known that the power spectrum of A(z) can be represented as follows using LSF decomposition.
|A(ω)|.sup.2={|P(ω)|.sup.2+|Q(ω)|.sup.2}/4 (26)
(54) One embodiment of the present invention uses the Chebyshev polynomials as a way to calculate the power spectrum |A(ω)|.sup.2 of A(z) more efficiently than by directly applying the equation (26). Specifically, the power spectrum |A(ω)|.sup.2 is calculated on the real axis of the unit circle as represented by the following equation, by the change of variable x=cos ω and by using LSF decomposition with the Chebyshev polynomials.
(55)
$$|A(x)|^2 = \begin{cases}
R^2(x) + 4\nu^2(1 - x^2)S^2(x) & (1) \\
2(1 + x)R^2(x) + 2\nu^2(1 - x)S^2(x) & (2) \\
2(1 + x)R^2(x) + 2(1 - x)S^2(x) & (3) \\
R^2(x) + 4(1 - x^2)S^2(x) & (4)
\end{cases} \quad (27)$$
where (1) to (4) correspond to (1) to (4) in Table 1, respectively.
(56) The equation (27) is proven as follows.
(57) The following equations are obtained from the equations (11) and (12).
|P(ω)|.sup.2=4|R(ω)|.sup.2|P.sub.T(ω)|.sup.2
|Q(ω)|.sup.2=4ν.sup.2|S(ω)|.sup.2|Q.sub.T(ω)|.sup.2
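(58) The factors that represent the obvious roots of P(ω) and Q(ω) are respectively as follows.
(59)
$$|P_T(\omega)|^2 = \begin{cases} 1 & (1), (4) \\ 2 + 2\cos\omega & (2), (3) \end{cases} \qquad |Q_T(\omega)|^2 = \begin{cases} 2 - 2\cos 2\omega & (1), (4) \\ 2 - 2\cos\omega & (2), (3) \end{cases}$$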
(60) Application of the substitution cosω=x and cos2ω=2x.sup.2−1 to |P.sub.T(ω)| and |Q.sub.T(ω)|, respectively, gives the equation (27).
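The real-axis evaluation can be compared against a direct evaluation on the unit circle. The sketch below covers only case (3) of Table 1 (even n, κ=1) and assumes the form |A|² = 2(1+x)R²(x) + 2(1−x)S²(x), which follows from the equation (26) together with the trivial factors z+1 and z−1. All function names and the example filter are hypothetical, and NumPy's `chebval` stands in for the Horner evaluation of R(x) and S(x).

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def chebyshev_series(a):
    """Chebyshev-series coefficients of R(x) and S(x) for even n, kappa = 1."""
    ext = np.append(np.asarray(a, dtype=float), 0.0)
    p_full = ext + ext[::-1]                   # P(z), palindromic
    q_full = ext - ext[::-1]                   # Q(z), antipalindromic
    p, _ = np.polydiv(p_full, [1.0, 1.0])      # remove trivial root z = -1
    q, _ = np.polydiv(q_full, [1.0, -1.0])     # remove trivial root z = +1
    m = (len(a) - 1) // 2

    def to_series(sym):
        c = sym[m::-1].copy()                  # [p_m, p_{m-1}, ..., p_1, 1]
        c[0] *= 0.5                            # constant term is p_m / 2
        return c

    return to_series(p), to_series(q)

def power_real_axis(a, x):
    """|A|^2 as a polynomial expression in x = cos(w); no complex arithmetic."""
    cr, cs = chebyshev_series(a)
    R = C.chebval(x, cr)
    S = C.chebval(x, cs)
    return 2.0 * (1.0 + x) * R ** 2 + 2.0 * (1.0 - x) * S ** 2

# Hypothetical minimum-phase A(z) of order 4.
poles = [0.9 * np.exp(1j * 0.6), 0.7 * np.exp(1j * 2.0)]
a = np.poly(poles + [np.conj(z) for z in poles]).real

omega = np.linspace(0.0, np.pi, 31)
direct = np.abs(np.polyval(a[::-1], np.exp(-1j * omega))) ** 2
on_axis = power_real_axis(a, np.cos(omega))
```

The two results should coincide to rounding error. The power spectrum of the synthesis filter 1/A(z) is the reciprocal of this quantity.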
(61) The polynomials R(x) and S(x) may be calculated by the above-described Horner's method. Further, when the values of x at which R(x) and S(x) are calculated are known in advance, the calculation of trigonometric functions can be omitted by storing those values of x in a memory.
(62) The calculation of the power spectrum of A(z) can be further simplified. First, in the case of calculating with LSF, one of R(x) and S(x) in the corresponding equation (27) is zero. When the displacement is κ=1 and the order n is an even number, the equation (27) is simplified as follows.
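(63)
$$|A(x_i)|^2 = \begin{cases} 2(1 - x_i)S^2(x_i) & i \text{ even} \\ 2(1 + x_i)R^2(x_i) & i \text{ odd} \end{cases}$$
where x.sub.i=cos ω.sub.i is the i-th LSP. Further, in the case of ω={0,π/2,π}, the equation (27) is simplified because x={1,0,−1}. The equations are as follows when the displacement is κ=1 and the order n is an even number, as in the above example.
|A(ω=0)|.sup.2=4R.sup.2(1)
|A(ω=π/2)|.sup.2=2(R.sup.2(0)+S.sup.2(0))
|A(ω=π)|.sup.2=4S.sup.2(−1)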
(64) Similar results can be easily obtained also when the displacement is κ=0 and the order n is an odd number.
(65) The calculation of autocorrelation coefficients according to one embodiment of the present invention is described below.
(66) In the equation (5), when the frequencies Ω.sub.+=Δ,2Δ, . . . , (N−2)Δ are defined, where N is an odd number and the interval of frequencies is Δ=π/(N−1), the calculation of autocorrelation includes the above-described simplified power spectrum at ω=0,π/2,π. Because the normalization of autocorrelation coefficients by 1/N does not affect the linear prediction coefficients obtained as a result, any positive value can be used.
(67) Even so, the calculation of the equation (5) requires cos kω, where k=1,2, . . . ,n, for each of the (N−2) frequencies. Thus, the following symmetric property of cos kω is used.
cos k(π−ω)=(−1).sup.k cos kω, ω∈(0, π/2) (28)
(68) The following characteristic is also used.
cos(kπ/2)=(1/2)(1+(−1).sup.k)(−1).sup.└k/2┘ (29)
where └x┘ indicates the largest integer that does not exceed x. Note that, by the equation (29), the factor 2 cos(kπ/2) that appears in the autocorrelation takes the values 2,0,−2,0,2,0, . . . for k=0,1,2, . . . .
(69) Further, by conversion to x=cosω, the autocorrelation coefficients are moved onto the real axis of the unit circle. For this purpose, the variable X(x)=Y(arccos x) is introduced. This enables the calculation of coskω by use of the equation (15).
(70) Given the above, the autocorrelation approximation of the equation (5) can be replaced by the following equation.
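(71)
$$R_k \approx \frac{1}{N}\left( X(1) + (-1)^k X(-1) + \left(1 + (-1)^k\right)(-1)^{\lfloor k/2 \rfloor} X(0) + 2\sum_{x\in\Lambda}\left( X(x) + (-1)^k X(-x) \right) T_k(x) \right) \quad (30)$$
where T.sub.k(x)=2xT.sub.k−1(x)−T.sub.k−2(x) for
k=2, 3, . . . ,n, and T.sub.0(x)=1, T.sub.1(x)=x as described above. When the symmetric property of the equation (28) is taken into consideration, the last term of the equation (30) needs to be calculated only for x∈Λ={cos Δ, cos 2Δ, . . . , cos((N−3)Δ/2)}, and the (N−3)/2 cosine values can be stored in a memory.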
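The real-axis autocorrelation can be checked against the half-circle sum of equation (5). The sketch below assumes the form of equation (30) in which the points x=1, x=−1 and x=0 are handled separately and each stored cosine in Λ serves both x and −x; the function names, the test filter, and the direct evaluation used for the power spectrum are illustrative assumptions.

```python
import numpy as np

def autocorr_half_circle(Y, n, N):
    """Equation (5): sum over 0, pi and the N-2 interior frequencies."""
    delta = np.pi / (N - 1)
    w = delta * np.arange(1, N - 1)
    return np.array([
        (Y(0.0) + (-1) ** k * Y(np.pi) + 2.0 * np.sum(Y(w) * np.cos(k * w))) / N
        for k in range(n + 1)
    ])

def autocorr_real_axis(X, n, N):
    """Equation (30): only (N-3)/2 stored cosines and the Chebyshev recurrence."""
    delta = np.pi / (N - 1)
    lam = np.cos(delta * np.arange(1, (N - 3) // 2 + 1))   # the set Lambda
    even = X(lam) + X(-lam)                                # used for even k
    odd = X(lam) - X(-lam)                                 # used for odd k
    x_p1, x_m1, x_0 = X(1.0), X(-1.0), X(0.0)
    t_prev, t_cur = np.ones_like(lam), lam.copy()          # T_0 and T_1
    r = np.empty(n + 1)
    for k in range(n + 1):
        if k == 0:
            t_k = t_prev
        elif k == 1:
            t_k = t_cur
        else:
            t_prev, t_cur = t_cur, 2.0 * lam * t_cur - t_prev
            t_k = t_cur
        sign = 1.0 if k % 2 == 0 else -1.0
        middle = (1.0 + sign) * (-1.0) ** (k // 2) * x_0   # 2 cos(k pi / 2) X(0)
        pairs = even if k % 2 == 0 else odd
        r[k] = (x_p1 + sign * x_m1 + middle + 2.0 * np.sum(pairs * t_k)) / N
    return r

a = np.array([1.0, -0.9, 0.4])                 # hypothetical stable filter
Y = lambda w: 1.0 / np.abs(np.polyval(a[::-1], np.exp(-1j * w))) ** 2
X = lambda x: Y(np.arccos(x))                  # X(x) = Y(arccos x)

N, n = 31, 2
r5 = autocorr_half_circle(Y, n, N)
r30 = autocorr_real_axis(X, n, N)
```

In a real implementation X(x) would be obtained from the polynomial form of the power spectrum, so the `arccos` used here to define X would not appear; it only ties the two formulations together for the comparison.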
(72) An example of the present invention is described hereinafter. In this example, a case of converting a linear prediction synthesis filter calculated at a first sampling frequency of 16,000 Hz to that at a second sampling frequency of 12,800 Hz (which is referred to hereinafter as conversion 1) and a case of converting a linear prediction synthesis filter calculated at a first sampling frequency of 12,800 Hz to that at a second sampling frequency of 16,000 Hz (hereinafter as conversion 2) are used. Those two sampling frequencies have a ratio of 4:5 and are generally used in speech and audio coding. Each of the conversion 1 and the conversion 2 of this example is performed on the linear prediction synthesis filter in the previous frame when the internal sampling frequency has changed, and it can be performed in any of an encoder and a decoder. Such conversion is required for setting the correct internal state to the linear prediction synthesis filter in the current frame and for performing interpolation of the linear prediction synthesis filter in accordance with time.
(73) Processing in this example is described hereinafter with reference to the flowcharts of
(74) To calculate a power spectrum and autocorrelation coefficients by using a common frequency point in both cases of the conversions 1 and 2, the number of frequencies when a sampling frequency is 12,800 Hz is determined as N.sub.L=1+(12,800 Hz/16,000 Hz)(N−1). Note that N is the number of frequencies at a sampling frequency of 16,000 Hz. As described earlier, it is preferred that N and N.sub.L are both odd numbers in order to contain frequencies at which the calculation of a power spectrum and autocorrelation coefficients is simplified. For example, when N is 31, 41, 51, 61, the corresponding N.sub.L is 25, 33, 41, 49. The case where N=31 and N.sub.L=25 is described as an example below (Step S000).
(75) When the number of frequencies to be used for the calculation of a power spectrum and autocorrelation coefficients in the domain where the sampling frequency is 16,000 Hz is N=31, the interval of frequencies is Δ=π/30, and the number of elements required for the calculation of autocorrelation contained in Λ is (N−3)/2=14.
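The grid sizes quoted above follow from simple integer arithmetic. A small sketch, in which the function name `low_rate_points` and its default arguments are assumptions for illustration:

```python
def low_rate_points(n_high, f_low=12_800, f_high=16_000):
    """N_L = 1 + (f_low / f_high) * (N - 1), required to be an integer."""
    scaled = f_low * (n_high - 1)
    if scaled % f_high != 0:
        raise ValueError("grid does not map onto the lower sampling frequency")
    return 1 + scaled // f_high

table = {n: low_rate_points(n) for n in (31, 41, 51, 61)}
stored_cosines = (31 - 3) // 2      # size of the cosine set for N = 31
```

The check for an exact division reflects the requirement that both grids share common frequency points.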
(76) The conversion 1 that is performed in an encoder and a decoder under the above conditions is carried out in the following procedure.
(77) Determine the coefficients of polynomials R(x) and S(x) by using the equations (20) and (21) from roots obtained by displacement κ=0 or κ=1 and LSF which correspond to a linear prediction synthesis filter obtained at a sampling frequency of 16,000 Hz, which is the first sampling frequency (Step S001).
(78) Calculate the power spectrum of the linear prediction synthesis filter up to 6,400 Hz, which is the Nyquist frequency of the second sampling frequency. Because this cutoff frequency corresponds to ω=(4/5)π at the first sampling frequency, the power spectrum is calculated using the equation (27) at the N.sub.L=25 frequencies on the low side. For the calculation of R(x) and S(x), Horner's method may be used to reduce the amount of calculation. There is no need to calculate a power spectrum for the remaining 6 (=N−N.sub.L) frequencies on the high side (Step S002).
(79) Calculate autocorrelation coefficients corresponding to the power spectrum obtained in Step S002 by using the equation (30). In this step, N in the equation (30) is set to N.sub.L=25, which is the number of frequencies at the second sampling frequency (Step S003).
(80) Derive linear prediction coefficients by the Levinson-Durbin method or a similar method with use of the autocorrelation coefficients obtained in Step S003, and obtain a linear prediction synthesis filter at the second sampling frequency (Step S004).
(81) Convert the linear prediction coefficients obtained in Step S004 to LSF (Step S005).
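The data flow of the conversion 1 (Steps S002 to S004) can be sketched as follows. This is a simplified illustration under stated assumptions: the power spectrum is evaluated directly on the unit circle as a stand-in for the real-axis equation (27), the autocorrelation uses the half-circle sum of equation (5), the input is given as linear prediction coefficients rather than LSF, the final conversion to LSF (Step S005) is omitted, and all function names and the example filter are hypothetical.

```python
import numpy as np

def levinson_durbin(r, order):
    """Return a = [1, a_1, ..., a_order] from autocorrelation r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def power_spectrum(a, omega):
    """1 / |A(e^{jw})|^2 by direct evaluation (stand-in for equation (27))."""
    return 1.0 / np.abs(np.polyval(np.asarray(a)[::-1], np.exp(-1j * omega))) ** 2

def autocorrelation(Y, order):
    """Half-circle sum: Y holds samples on [0, pi] including both end points."""
    N = len(Y)
    omega = np.pi * np.arange(N) / (N - 1)
    weight = np.full(N, 2.0)
    weight[0] = weight[-1] = 1.0
    return np.array([np.sum(weight * Y * np.cos(k * omega))
                     for k in range(order + 1)]) / N

def convert_16000_to_12800(a, N=31, N_L=25):
    order = len(a) - 1
    delta = np.pi / (N - 1)                        # grid spacing at 16,000 Hz
    Y = power_spectrum(a, delta * np.arange(N_L))  # S002: 0 ... 6,400 Hz only
    r = autocorrelation(Y, order)                  # S003: N is set to N_L
    return levinson_durbin(r, order)               # S004

# Hypothetical filter obtained at 16,000 Hz.
poles = [0.8 * np.exp(1j * 0.6), 0.6 * np.exp(1j * 2.0)]
a_16k = np.poly(poles + [np.conj(z) for z in poles]).real
a_12k8 = convert_16000_to_12800(a_16k)
same_rate = convert_16000_to_12800(a_16k, N=61, N_L=61)   # no truncation
```

Keeping only the first N_L samples and then treating them as a full grid on [0, π] is what re-maps 6,400 Hz to the new Nyquist frequency. When nothing is truncated, the procedure should return the input filter up to the error of the rectangle approximation.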
(82) The conversion 2 that is performed in an encoder or a decoder can be achieved in the following procedure, in the same manner as the conversion 1.
(83) Determine the coefficients of polynomials R(x) and S(x) by using the equations (20) and (21) from roots obtained by displacement κ=0 or κ=1 and LSF which correspond to a linear prediction synthesis filter obtained at a sampling frequency of 12,800 Hz, which is the first sampling frequency (Step S011).
(84) First, calculate the power spectrum of the linear prediction synthesis filter up to 6,400 Hz, which is the Nyquist frequency of the first sampling frequency. This cutoff frequency corresponds to ω=π at the first sampling frequency, and the power spectrum is calculated using the equation (27) at N.sub.L=25 frequencies. For the calculation of R(x) and S(x), Horner's method may be used to reduce the amount of calculation. For the 6 frequencies exceeding 6,400 Hz at the second sampling frequency, the power spectrum is extrapolated. As an example of extrapolation, the power spectrum obtained at the N.sub.L-th frequency may be used (Step S012).
(85) Calculate autocorrelation coefficients corresponding to the power spectrum obtained in Step S012 by using the equation (30). In this step, N in the equation (30) is set to N=31, which is the number of frequencies at the second sampling frequency (Step S013).
(86) Derive linear prediction coefficients by the Levinson-Durbin method or a similar method with use of the autocorrelation coefficients obtained in Step S013, and obtain a linear prediction synthesis filter at the second sampling frequency (Step S014).
(87) Convert the linear prediction coefficients obtained in Step S014 to LSF (Step S015).
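The conversion 2 (Steps S012 to S014) differs from the conversion 1 only in that the missing high-frequency samples are extrapolated instead of discarded. The sketch below holds the last computed value, which is the example of extrapolation mentioned in Step S012. As before, the direct evaluation of the power spectrum stands in for the equation (27), the input is given as linear prediction coefficients, the conversion to LSF (Step S015) is omitted, and all names and the example filter are hypothetical.

```python
import numpy as np

def levinson_durbin(r, order):
    """Return a = [1, a_1, ..., a_order] from autocorrelation r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def power_spectrum(a, omega):
    """1 / |A(e^{jw})|^2 by direct evaluation (stand-in for equation (27))."""
    return 1.0 / np.abs(np.polyval(np.asarray(a)[::-1], np.exp(-1j * omega))) ** 2

def autocorrelation(Y, order):
    """Half-circle sum: Y holds samples on [0, pi] including both end points."""
    N = len(Y)
    omega = np.pi * np.arange(N) / (N - 1)
    weight = np.full(N, 2.0)
    weight[0] = weight[-1] = 1.0
    return np.array([np.sum(weight * Y * np.cos(k * omega))
                     for k in range(order + 1)]) / N

def convert_12800_to_16000(a, N=31, N_L=25):
    order = len(a) - 1
    delta = np.pi / (N_L - 1)                          # grid spacing at 12,800 Hz
    Y_low = power_spectrum(a, delta * np.arange(N_L))  # S012: 0 ... 6,400 Hz
    Y_high = np.full(N - N_L, Y_low[-1])               # hold the N_L-th value
    r = autocorrelation(np.concatenate([Y_low, Y_high]), order)   # S013: N points
    return levinson_durbin(r, order)                   # S014

# Hypothetical filter obtained at 12,800 Hz.
poles = [0.8 * np.exp(1j * 0.6), 0.6 * np.exp(1j * 2.0)]
a_12k8 = np.poly(poles + [np.conj(z) for z in poles]).real
a_16k = convert_12800_to_16000(a_12k8)
extrapolated = 31 - 25                                 # samples above 6,400 Hz
same_rate = convert_12800_to_16000(a_12k8, N=61, N_L=61)
```

The number of extrapolated samples, 6, equals (N.sub.L−1)(F2−F1)/F1 for F1=12,800 Hz and F2=16,000 Hz. Other extrapolation rules could be substituted at the line that builds `Y_high`.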
(88)
Alternative Example
(89) Although the coefficients of the polynomials R(x) and S(x) are calculated using the equations (20) and (21) in Steps S001 and S011 of the above-described example, the calculation may be performed using the coefficients of the polynomials of the equations (9) and (10), which can be obtained from the linear prediction coefficients. Further, the linear prediction coefficients may be converted from LSP coefficients or ISP coefficients.
(90) Furthermore, in the case where a power spectrum at the first sampling frequency or the second sampling frequency is known by some method, the power spectrum may be converted to that at the second sampling frequency, and Steps S001, S002, S011 and S012 may be omitted.
(91) In addition, in order to assign weights in the frequency domain, a power spectrum may be deformed, and linear prediction coefficients at the second sampling frequency may be obtained.