Software digital front end (SoftDFE) signal processing
09778902 · 2017-10-03
Assignee
Inventors
- Kameran Azadet (Pasadena, CA)
- Chengzhou Li (Whitehall, PA, US)
- Albert MOLINA (Novelda, ES)
- Joseph H. Othmer (Ocean, NJ, US)
- Steven C. Pinault (Allentown, PA, US)
- Meng-Lin Yu (Morganville, NJ)
- Joseph Williams (Holmdel, NJ)
- Ramon Sanchez Perez (Galapagar, ES)
- Jian-Guo Chen (Basking Ridge, NJ)
CPC classification
H03F2201/3224
ELECTRICITY
H03F2201/3233
ELECTRICITY
H04L25/03
ELECTRICITY
H03F3/189
ELECTRICITY
H04B1/0003
ELECTRICITY
G06F9/3895
PHYSICS
H04L25/03178
ELECTRICITY
H03F2200/336
ELECTRICITY
H03F1/0288
ELECTRICITY
H04B1/62
ELECTRICITY
H04L25/02
ELECTRICITY
G06F17/15
PHYSICS
G06F9/30036
PHYSICS
H04L1/0054
ELECTRICITY
H03F2201/3212
ELECTRICITY
H03F2201/3209
ELECTRICITY
International classification
H04L1/00
ELECTRICITY
H04B1/62
ELECTRICITY
G06F9/30
PHYSICS
G06F17/15
PHYSICS
H04L25/02
ELECTRICITY
H04L25/49
ELECTRICITY
H04L25/03
ELECTRICITY
H03M3/00
ELECTRICITY
H03F1/02
ELECTRICITY
H03F1/32
ELECTRICITY
Abstract
Software Digital Front End (SoftDFE) signal processing techniques are provided. One or more digital front end (DFE) functions are performed on a signal in software by executing one or more specialized instructions on a processor to perform the one or more digital front end (DFE) functions on the signal, wherein the processor has an instruction set comprised of one or more of linear and non-linear instructions. A block of samples comprised of a plurality of data samples is optionally formed and the digital front end (DFE) functions are performed on the block of samples. The specialized instructions can include a vector convolution function, a complex exponential function, an x.sup.k function, a vector compare instruction, a vector max( ) instruction, a vector multiplication instruction, a vector addition instruction, a vector sqrt( ) instruction, a vector 1/x instruction, and a user-defined non-linear instruction.
Claims
1. A method for performing a vector convolution function on a signal in software, comprising: receiving, by a processor, the signal, wherein the signal comprises a plurality of data samples, and performing, by the processor, the vector convolution function on the signal and a plurality of coefficients by executing, in response to a single software instruction, a single instruction of a hardware instruction set of the processor, wherein the single instruction comprises a vector convolution instruction, wherein performing the vector convolution function comprises producing, for each of a plurality of time shifts, a finite impulse response output value based on the plurality of data samples and the plurality of coefficients, wherein producing, for each of the plurality of time shifts, the finite impulse response output value comprises producing, for each of the plurality of time shifts, only a portion of the finite impulse response output value based on the plurality of data samples and only a portion of each coefficient of the plurality of coefficients in one clock cycle of the processor.
2. The method of claim 1, wherein said processor comprises one or more of a digital signal processor or a vector processor.
3. The method of claim 1, wherein a plurality of signals from a plurality of processors are each processed on a separate processor.
4. The method of claim 1, further comprising performing, by the processor, a complex exponential function on the signal by executing, in response to a single complex exponentiation software instruction, a single complex exponentiation instruction of the hardware instruction set.
5. The method of claim 1 further comprising performing, by the processor, an x.sup.k function on the signal by executing, in response to a single x.sup.k software instruction, a single x.sup.k instruction of the hardware instruction set.
6. The method of claim 1, further comprising performing, by the processor, a digital up conversion function that multiplies said signal by a complex exponential by executing, in response to a single up conversion software instruction, a single up conversion instruction of the hardware instruction set.
7. The method of claim 1 further comprising performing, by the processor, one or more user-defined non-linear instructions.
8. The method of claim 7, wherein said one or more user-defined non-linear instructions comprise at least one user-specified parameter.
9. The method of claim 8, wherein performing, by the processor, the one or more user defined non-linear instructions comprises: invoking at least one functional unit that applies said non-linear function to an input value, x; and generating an output corresponding to said non-linear function for said input value, x.
10. The method of claim 8, further comprising the step of loading said at least one user-specified parameter from memory into at least one register.
11. The method of claim 8, wherein said user-specified parameter comprises a look-up table storing values of said non-linear function for a finite number of input values.
12. The method of claim 1, wherein only the portion of each coefficient of the plurality of coefficients comprises only the first two bits of each coefficient of the plurality of coefficients, wherein each coefficient of the plurality of coefficients comprises more than two bits.
13. A processor for performing a vector convolution function on a signal in software, comprising: a memory; and at least one hardware device, coupled to the memory, operative to: receive, by the at least one hardware device, the signal, wherein the signal comprises a plurality of data samples, and perform, by the at least one hardware device, the vector convolution function on the signal and a plurality of coefficients by executing, in response to a single software instruction, a single instruction of a hardware instruction set of the processor, wherein the single instruction comprises a vector convolution instruction, wherein to perform the vector convolution function comprises to produce, for each of a plurality of time shifts, a finite impulse response output value based on the plurality of data samples and the plurality of coefficients, wherein to produce, for each of the plurality of time shifts, the finite impulse response output value comprises to produce, for each of the plurality of time shifts, only a portion of the finite impulse response output value based on the plurality of data samples and only a portion of each coefficient of the plurality of coefficients in one clock cycle of the at least one hardware device.
14. The processor of claim 13, wherein only the portion of each coefficient of the plurality of coefficients comprises only the first two bits of each coefficient of the plurality of coefficients, wherein each coefficient of the plurality of coefficients comprises more than two bits.
15. The processor of claim 13, wherein said processor comprises one or more of a digital signal processor or a vector processor.
16. The processor of claim 13, wherein a plurality of signals from a plurality of processors are each processed on a separate processor.
17. The processor of claim 13, wherein the hardware device is further to perform a complex exponential function on the signal by executing, in response to a single complex exponentiation software instruction, a single complex exponentiation instruction of the hardware instruction set of the processor.
18. The processor of claim 13, wherein the hardware device is further to perform an x.sup.k function on the signal by executing, in response to a single x.sup.k software instruction, a single x.sup.k instruction of the hardware instruction set of the processor.
19. The processor of claim 13, wherein the hardware device is further to perform a digital up conversion function that multiplies said signal by a complex exponential signal by executing, in response to a single up conversion software instruction, a single up conversion instruction of the hardware instruction set of the processor.
20. The processor of claim 13, wherein the hardware device is further to perform one or more user-defined non-linear instructions.
21. The processor of claim 20, wherein the one or more user-defined non-linear instructions comprises at least one user-specified parameter.
22. The processor of claim 21, wherein to perform the one or more user-defined non-linear instructions comprises to: invoke at least one functional unit that applies the non-linear function to an input value x; and generate an output corresponding to the non-linear function for the input value x.
23. The processor of claim 21, wherein the hardware device is further to load the at least one user-specified parameter into at least one register.
24. The processor of claim 21, wherein the at least one user-specified parameter comprises a look-up table storing values of said non-linear function for a finite number of input values.
25. One or more non-transitory computer readable media comprising a plurality of instructions stored thereon that, when executed by at least one hardware device, causes the at least one hardware device to: receive a signal, wherein the signal comprises a plurality of data samples; perform a vector convolution function on the signal and a plurality of coefficients by executing, in response to a single software instruction, a single instruction of a hardware instruction set of the at least one hardware device, wherein the single instruction comprises a vector convolution instruction, wherein to perform the vector convolution function comprises to produce, for each of a plurality of time shifts, a finite impulse response output value based on the plurality of data samples and the plurality of coefficients, wherein to produce, for each of the plurality of time shifts, the finite impulse response output value comprises to produce, for each of the plurality of time shifts, only a portion of the finite impulse response output value based on the plurality of data samples and only a portion of each coefficient of the plurality of coefficients in one clock cycle of the at least one hardware device.
26. The one or more non-transitory computer-readable storage media of claim 25, wherein only the portion of each coefficient of the plurality of coefficients comprises only the first two bits of each coefficient of the plurality of coefficients, wherein each coefficient of the plurality of coefficients comprises more than two bits.
27. The one or more non-transitory computer-readable storage media of claim 25, wherein the plurality of instructions further causes the hardware device to perform a complex exponential function on the signal by executing, in response to a single complex exponentiation software instruction, a single complex exponentiation instruction of the hardware instruction set.
28. The one or more non-transitory computer-readable storage media of claim 25, wherein the plurality of instructions further causes the hardware device to perform an x.sup.k function on the signal by executing, in response to a single x.sup.k software instruction, a single x.sup.k instruction of the hardware instruction set.
29. The one or more non-transitory computer-readable storage media of claim 25, wherein the plurality of instructions further causes the hardware device to perform a digital up conversion function that multiplies said signal by a complex exponential signal by executing, in response to a single up conversion software instruction, a single up conversion instruction of the hardware instruction set.
30. The one or more non-transitory computer-readable storage media of claim 25, wherein the plurality of instructions further causes the hardware device to perform one or more user-defined non-linear instructions.
31. The one or more non-transitory computer-readable storage media of claim 30, wherein the one or more user-defined non-linear instructions comprises at least one user-specified parameter.
32. The one or more non-transitory computer-readable storage media of claim 31, wherein to perform the one or more user-defined non-linear instructions comprises to: invoke at least one functional unit that applies the non-linear function to an input value x; and generate an output corresponding to the non-linear function for the input value x.
33. The one or more non-transitory computer-readable storage media of claim 31, wherein the plurality of instructions further causes the hardware device to load the at least one user-specified parameter into at least one register.
34. The one or more non-transitory computer-readable storage media of claim 31, wherein the at least one user-specified parameter comprises a look-up table storing values of said non-linear function for a finite number of input values.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(24)
(25) As shown in
(26) As shown in
(27) According to one aspect of the invention, one or more of the blocks of the digital front end (DFE) of the communication system 100 of
(28)
(29) The data blocks 250 are optionally stored in a buffer. In one exemplary implementation, two data blocks 250 can be stored in the buffer at a time. Thus, the buffer has a size of at least two block lengths.
(30) Channel Filter and Digital Up Conversion Stage 110
(31) As indicated above, the channel filter and digital up conversion stage 110 performs channel filtering using, for example, finite impulse response (FIR) filters, and digital up conversion to convert a digitized baseband signal to a radio frequency (RF) signal. As discussed hereinafter, one or more functions of the channel filter and digital up conversion stage 110 are implemented in software on one or more vector processors, accelerated using either vector multiplication with vector addition and reduction or, alternatively, a vector convolution instruction. Digital up conversion, for example, requires multiplying the input signal by a complex exponential (a vector multiplication, i.e., the component-wise product of two vectors: the signal and the rotator vector), and an aspect of the present invention employs an accelerated complex exponential function. Digital modulation is optionally performed using a numerically controlled oscillator (NCO) based on the complex exponential (computed as a vector).
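To make the up-conversion step concrete, the following sketch (hypothetical helper name `nco_upconvert`; not part of the patent's instruction set) multiplies a block of complex baseband samples by an NCO-generated complex exponential, carrying the accumulated phase across block boundaries as block processing requires:

```python
import cmath

def nco_upconvert(samples, f_norm, phase=0.0):
    """Digitally up-convert a block of complex baseband samples by
    multiplying with the rotator vector exp(j*(phase + 2*pi*f_norm*n)).
    f_norm is the carrier frequency normalized to the sample rate.
    Returns the rotated block and the phase to carry into the next block."""
    step = 2 * cmath.pi * f_norm
    out = [x * cmath.exp(1j * (phase + step * n)) for n, x in enumerate(samples)]
    phase = (phase + step * len(samples)) % (2 * cmath.pi)
    return out, phase
```

With `f_norm = 0.25` each sample is rotated by an additional quarter turn, so a block of ones becomes 1, j, -1, -j.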
(32)
(33) The output of the interpolation filters 330 is applied to a multiplier 340 and multiplied by a complex exponential function exp(jω₀n). For a more detailed discussion of the complex exponential function exp(jω₀n), see International Patent Application Serial No. PCT/US12/62191, entitled Digital Processor Having Instruction Set With Complex Exponential Non-Linear Function, filed contemporaneously herewith and incorporated by reference herein.
(34) The various channels are then aggregated and applied to the CFR 120 of
(35)
(36) The following table describes an exemplary implementation of the filters 410, 420, 430 of
(37) TABLE-US-00001
  Filter Stage     Input Rate (MHz)   Output Rate (MHz)   Number of Taps   Bit Width   Filter Type
  Stage 1 (410)    3.84               7.68                125              14          Root Raised Cosine
  Stage 2 (420)    7.68               15.36               31               18          Half-Band
  Stage 3 (430)    15.36              30.72               15               16          Half-Band
(38) The present invention recognizes that the filtering operations described herein, including the filtering operations of filters 410, 420, 430 can be accelerated using a vector convolution function, discussed further below in conjunction with
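As a sketch of what each interpolation stage computes (illustrative helper names; the exemplary taps and rates are those of the table above), a stage upsamples by two via zero-stuffing and then applies an FIR filter — exactly the kind of inner loop a vector convolution instruction accelerates:

```python
def fir(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], with zero state before x[0]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y

def interpolate_by_2(x, h):
    """Upsample by 2 (zero-stuffing) then FIR filter, as in each stage
    of the exemplary channel-filter chain."""
    up = []
    for s in x:
        up.extend((s, 0.0))   # insert a zero after every input sample
    return fir(up, h)
```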
(39)
(40) The following table describes an exemplary implementation of the filters 450, 460 of
(41) TABLE-US-00002
  Filter Stage     Input Rate (MHz)   Output Rate (MHz)   Number of Taps   Bit Width   Filter Type
  Stage 1 (450)    30.72              61.44               29               16          HBF
  Stage 2 (460)    61.44              307.2               45               16          M-th band Nyquist poly-phase filter (M = 5), with 5 poly-phase banks, each bank consisting of 9 taps. Being M-th band (M = 5), every 5th sample is zero. This implies only 4 poly-phase banks need to be processed and the 5th poly-phase bank is just an impulse tap in the middle.
(42)
(43) Crest Factor Reduction Stage 120
(44) As indicated above, the crest factor reduction stage 120 limits the PAR of the transmitted signal. As discussed hereinafter, crest factor reduction requires peak detection and peak cancellation. The peak detection can leverage a vector compare instruction or a specialized max( ) instruction. Likewise, peak cancellation involves multiplications and additions of vectors, and hard clipping involves envelope computation (vector sqrt( ) and vector x*conj(x)), comparison to a threshold, and scaling (1/x, component-wise for a vector), all of which can be accelerated using a vector processor. The sqrt( ) and 1/x operations can additionally be combined and performed using a vector x^-0.5 operation/instruction.
(45)
(46) The exemplary Crest Factor Reduction algorithm 600 can optionally be performed iteratively to address peak regrowth. For example, the number of iterations, N_iter, can have a typical value between 1 and 4. Generally, peak regrowth results when new peaks are introduced while canceling other peaks, due to the ringing on both sides of the cancellation pulse (the pulse is traditionally designed as a linear-phase symmetrical FIR filter with a plurality of taps). Because there are taps on both sides of the center tap, new peaks can be introduced in current or past sample values. In order to address the peaks introduced in past samples, existing CFR algorithms require multiple iterations to cancel all peaks.
(47) During the peak search phase 610, a search is conducted through the signal to determine the number of peaks, their locations, and the magnitudes above the threshold level. The exemplary Crest Factor Reduction algorithm 600 initially computes the magnitude of the antenna samples. The sample values above a threshold are then identified. For example, the threshold can be established based on the PAR target. Thereafter, the peak positions can be identified, for example, using a vector max( ) instruction. The peak detection can optionally leverage a vector compare instruction or a specialized vector max( ) instruction.
(48) During the pulse cancellation phase 640, the cancellation pulses are arranged at each of the peaks, then all of the pulses are subtracted from the peaks. The exemplary Crest Factor Reduction algorithm 600 computes the pulse cancellation gains (e.g., threshold divided by magnitude of the detected peaks). Thereafter, the exemplary Crest Factor Reduction algorithm 600 enters a loop to separately process each peak. For each peak, a pulse is generated, for example, using a vector multiplication instruction, and then the pulse is cancelled from the antenna, for example, using a vector addition instruction. Peak cancellation involves multiplication and additions of vectors, which can be accelerated on a vector processor.
(49) During the hard clipping phase 680, the exemplary Crest Factor Reduction algorithm 600 hard clips the output waveform, for example, using non-linear operations for the magnitude inverse. The clipping threshold level R is set based on the PAR target. The hard clipping may be performed, for example, using a polar clipping technique. Generally, polar clipping involves computing |x|, comparing |x| to a threshold R, and scaling by R/|x|: if |x| is greater than R, x is replaced by x·R/|x|. Again, 1/|x| can be efficiently computed on a vector processor using a vector x^-0.5 operation/instruction.
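A minimal sketch of the polar clipping step described above (illustrative helper name; the x^-0.5 acceleration is written here as an ordinary square root):

```python
import math

def polar_clip(samples, R):
    """Polar hard clipping: if |x| > R, replace x with x * R/|x|,
    preserving phase while limiting the envelope to the PAR target R."""
    out = []
    for x in samples:
        mag2 = (x * x.conjugate()).real        # |x|^2 via x*conj(x)
        if mag2 > R * R:
            out.append(x * (R / math.sqrt(mag2)))   # scale by R/|x|
        else:
            out.append(x)
    return out
```

On a vector engine, the `R / sqrt(mag2)` factor is the quantity a combined x^-0.5 instruction would produce in one step.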
(50) In a further variation, crest factor reduction can be performed in the frequency domain.
(51) As indicated above, one aspect of the present invention recognizes that CFR processing can be performed on blocks of data to improve efficiency. For example, a vector engine (VE) can be employed to perform CFR on blocks of data. For example, in a software implementation, block processing allows latency to be maintained constant, independent of processor load. In addition, in a software implementation, block processing improves efficiency by amortizing the overhead over an entire block of data and not just individual data samples 310.
(52)
(53) Thus, according to another aspect of the invention, continuity of processing between blocks of data is ensured using one or more pre-cursor and/or post-cursor block samples.
(54) In one exemplary embodiment, the size of each cursor block 810, 860 is selected to be approximately equal to half the size of a cancellation pulse 710, 720. In addition, to keep the cursor overhead small, the size of each data block 850 should be considerably larger than the size of each cursor block 810, 860. Generally, however, the larger the size of each data block 850, the larger the required memory and the higher the latency.
(55) The pre-cursor blocks 810 are populated with input data from the end of the prior data block, and the post-cursor block 860 is populated with input data from the beginning of the subsequent data block.
(56) In one exemplary embodiment, peaks are detected and canceled in the block 850 and in the first pre-cursor block 810-1, but not in the post-cursor block 860, because the post-cursor data will be processed during the processing of the next block. The post-cursor input samples associated with the post-cursor block 860 are only needed to cancel peaks inside the block 850.
(57) In addition, when canceling a peak at the left edge of the block 850, peak re-growth occurs in the first pre-cursor block 810-1. Thus, in order to cancel these new peaks in the first pre-cursor block 810-1, the second pre-cursor block 810-2 is needed (but no cancellation is performed in the second pre-cursor block 810-2).
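The cursor arrangement above can be sketched as follows (hypothetical helper; each block carries two pre-cursor segments taken from the end of the prior data and one post-cursor segment taken from the start of the next, with zero-padding at the stream edges):

```python
def make_cursored_blocks(samples, block_len, cursor_len):
    """Split a sample stream into blocks, each extended by two pre-cursor
    segments (from the preceding data) and one post-cursor segment (from
    the following data); the stream edges are zero-padded."""
    blocks = []
    n = len(samples)
    for start in range(0, n, block_len):
        pre_start = start - 2 * cursor_len
        post_end = start + block_len + cursor_len
        pad_left = max(0, -pre_start)
        chunk = [0.0] * pad_left + samples[max(0, pre_start):min(n, post_end)]
        chunk += [0.0] * (post_end - min(n, post_end))   # right-edge padding
        blocks.append(chunk)
    return blocks
```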
(58)
(59) Hard clipping involves envelope computation (vector sqrt( ) and vector x*conj(x)), comparison to a threshold, and scaling (1/x, component-wise for a vector), which can be accelerated using a vector processor. These complex multiplications can be accelerated using vector multipliers as well as a vector square root operation.
(60) In addition, aspects of the present invention recognize that 1/|x| can be computed directly as (x*conj(x))^-0.5, which can be accelerated using a specialized vector x^k (vec_x_pow_k) instruction.
(61)
(62) The input to the vector-based digital signal processor 1000 is a vector, x, comprised of a plurality of scalar numbers, x_n, that are processed in parallel. For example, assume a vector-based digital signal processor 1000 supports an x^K function for a vector, x, where x is comprised of scalar numbers x_1 through x_4. The exemplary x^K function may be expressed as follows:
Pow_vec4(x_1, x_2, x_3, x_4, K).
(63) See also U.S. patent application Ser. No. 12/362,874, filed Jan. 30, 2009, entitled Digital Signal Processor Having Instruction Set with an x.sup.k Function Using Reduced Look-Up Table, incorporated by reference herein.
(64) The exemplary vector-based digital processor 1000 can be implemented as a 16-way vector processor to compute 32 x.sup.K operations using a pow(x, K) instruction implemented as:
(65) vec_pow(x_1, x_2, . . . , x_32, K), where the K values are, for example, 0.5, -0.5, 1.
(66) In this manner, the vector-based digital processor 1000 can perform 16 such operations and combine them in a single cycle.
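A software model of the vec_pow idea (illustrative only; a real vector engine would evaluate all lanes in a single cycle, and the per-lane exponents shown are assumptions):

```python
def vec_pow(xs, ks):
    """Model of a vector x**k instruction: apply x[i]**k[i] element-wise
    in one 'cycle'. k = 0.5 gives sqrt, and k = -0.5 gives the combined
    sqrt-and-reciprocal (1/|x|) used in crest factor reduction."""
    return [x ** k for x, k in zip(xs, ks)]
```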
(67) Digital Pre-Distortion Stage 130
(68) As indicated above, the digital pre-distortion stage 130 linearizes the power amplifier to improve efficiency. As discussed hereinafter, digital pre-distortion involves computing non-linear functions for a vector. The non-linear functions could be a polynomial or another basis function. This can be accelerated using non-linear instructions that combine a look-up table and Taylor series.
(69) The digital pre-distortion stage 130 of
(70)
(71) The output of the digital pre-distorter 1130 is applied in parallel to two digital to analog converters (DACs) 1140-1, 1140-2, and the analog signals are then processed by a quadrature modulation stage 1150 that further up converts the signals to an RF signal.
(72) The output 1155 of the quadrature modulation stage 1150 is applied to a power amplifier 1160, such as a Doherty amplifier or a drain modulator. As indicated above, the digital pre-distorter 1130 linearizes the power amplifier 1160 to improve the efficiency of the power amplifier 1160.
(73) In a feedback path 1165, the output of the power amplifier 1160 is applied to an attenuator 1170 before being applied to a demodulation stage 1180 that down converts the signal to baseband. The down converted signal is applied to an analog to digital converter (ADC) 1190 to digitize the signal. The digitized samples are then processed by a complex adaptive algorithm 1195 that generates parameters w for the digital pre-distorter 1130. The complex adaptive algorithm 1195 is outside the scope of the present application. Known techniques such as least squares (LS) or recursive least squares (RLS) can be employed to generate the parameters for the digital pre-distorter 1130.
(74) Non-Linear Filter Implementation of Digital Pre-Distorter
(75) A digital pre-distorter 1130 can be implemented as a non-linear filter using a Volterra series model of non-linear systems. The Volterra series is a model for non-linear behavior in a similar manner to a Taylor series. The Volterra series differs from the Taylor series in its ability to capture memory effects. The Taylor series can be used to approximate the response of a non-linear system to a given input if the output of this system depends strictly on the input at that particular time (static non-linearity). In the Volterra series, the output of the non-linear system depends on the input to the system at other times. Thus, the Volterra series allows the memory effect of devices to be captured.
(76) Generally, a causal linear system with memory can be expressed as:
y(t) = ∫ h(τ)·x(t−τ) dτ
(77) In addition, a static weakly non-linear system without memory can be modeled using a polynomial expression:
y(t) = Σ_{k=1..K} a_k·[x(t)]^k
(78) The Volterra series can be considered as a combination of the two:
y(t) = Σ_{k=1..K} y_k(t)
y_k(t) = ∫ . . . ∫ h_k(τ_1, . . . , τ_k)·x(t−τ_1) . . . x(t−τ_k) dτ_1 . . . dτ_k
(79) In the discrete domain, the Volterra Series can be expressed as follows:
y(n) = Σ_{k=1..K} y_k(n)
y_k(n) = Σ_{m_1} . . . Σ_{m_k} h_k(m_1, . . . , m_k)·x(n−m_1) . . . x(n−m_k)
(80) The complexity of a Volterra series can grow exponentially making its use impractical in many common applications, such as DPD. Thus, a number of simplified models for non-linear systems have been proposed. For example, a memory polynomial is a commonly used model:
(81) y(n) = Σ_{k=1..K} Σ_{m=0..M} a_{k,m}·x(n−m)·|x(n−m)|^{k−1}
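A reference implementation of the standard memory polynomial (the indexing conventions and the zero initial state are assumptions made for illustration):

```python
def memory_polynomial(x, a, K, M):
    """Memory-polynomial pre-distorter:
    y(n) = sum_{k=1..K} sum_{m=0..M} a[k-1][m] * x(n-m) * |x(n-m)|^(k-1).
    Samples before x[0] are taken as zero."""
    y = []
    for n in range(len(x)):
        acc = 0j
        for k in range(1, K + 1):
            for m in range(M + 1):
                if n - m >= 0:
                    xm = x[n - m]
                    acc += a[k - 1][m] * xm * abs(xm) ** (k - 1)
        y.append(acc)
    return y
```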
(82) Another simplified model referred to as a Generalized Memory Polynomial Model, can be expressed as follows (where M indicates the memory depth and K indicates the polynomial order):
(83)
(84) An equivalent expression of the Generalized Memory Polynomial with cross-products. can be expressed as follows:
(85)
(86) where:
(87)
where f(x) is a non-linear function having one or more user-specified parameters, assumed to be accelerated in accordance with an aspect of the invention using the user-defined non-linear instruction vec_nl, discussed below. It is noted that basis functions other than x^k are possible for the non-linear decomposition.
(88) As discussed hereinafter, the user-defined non-linear instruction φ_{m,l} can be processed, for example, by a vector processor. The φ_{m,l} is an m×l array of non-linear functions. Each non-linear function can have a user-specified parameter, such as a look-up table or coefficients. The look-up table can be a polynomial approximation of the user-defined non-linear instruction φ_{m,l}. As discussed further below in conjunction with
(89)
(90)
(91)
(92) The exemplary functional block diagram 1250 also comprises a plurality of multipliers (x) 1275 that receive the appropriate x(n−m) term and multiply it with the summed output of a column of corresponding φ_{m,l} functional units 1270. In this manner, the non-linear gains from adders 1280 are applied to the input data (complex multiply-accumulate (CMAC) operations). The outputs of the multipliers are added by adders (+) 1285 to generate the output y(n).
(93)
(94) As indicated above, if a desired x value is not in the look-up table but rather falls between two values in the look-up table, then an interpolation is performed in hardware within the functional unit to obtain the result. A Taylor series computation can be performed as a cubic interpolation by evaluating a small cubic polynomial, as follows:
f(δ) = a_0 + a_1·δ + a_2·δ^2 + a_3·δ^3
where the coefficients a_i are obtained from the look-up table. The complexity of this expression, however, is significant (with a number of multipliers needed to perform the multiplications and squaring operations).
(95) The complexity can be reduced using the Horner algorithm (factorization), such that f(δ) can be computed as follows. See, also, U.S. patent application Ser. No. 12/324,934, filed Nov. 28, 2008, entitled Digital Signal Processor With One Or More Non-Linear Functions Using Factorized Polynomial Interpolation, incorporated by reference herein.
f(δ) = ((b_3·δ + b_2)·δ + b_1)·δ + b_0 (3)
The complexity in equation (3) has been reduced to only 3 multiplication and 3 addition operations. δ is an offset from the value stored in the look-up table.
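The factorized evaluation of equation (3) can be sketched directly:

```python
def cubic_horner(b, d):
    """Evaluate f(d) = b3*d^3 + b2*d^2 + b1*d + b0 in Horner form:
    ((b3*d + b2)*d + b1)*d + b0 -- 3 multiplies and 3 adds."""
    b0, b1, b2, b3 = b
    return ((b3 * d + b2) * d + b1) * d + b0
```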
(96)
(97)
(98) Generally, the vector-based digital processor 1500 processes a vector of inputs x and generates a vector of outputs, y(n). The exemplary vector-based digital processor 1500 is shown for a 16-way vector processor nl instruction implemented as:
(99) vec_nl(x_1, x_2, . . . , x_16), where the range of each x[k] is from 0 to 1
(100) In this manner, the vector-based digital processor 1500 can perform 16 such non-linear operations and linearly combine them in a single cycle. For example, the user-defined non-linear function can be expressed as:
(101)
(102) It is noted that in the more general case, different functions f.sub.0( ), f.sub.1( ), . . . , f.sub.15( ) may be applied to each component of the vector data of the vector processor.
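A software model of such a user-defined non-linear vector instruction (the per-lane look-up over [0, 1] with linear interpolation between adjacent table entries is an assumed behavior; the optional `fns` argument models a different function parameterization per lane):

```python
def vec_nl(xs, table, fns=None):
    """Model of a user-defined non-linear vector instruction: each lane
    looks up f(x[i]) in a table covering [0, 1] and linearly interpolates
    between adjacent entries. fns, if given, supplies one table per lane."""
    out = []
    for i, x in enumerate(xs):
        t = fns[i] if fns else table
        pos = min(max(x, 0.0), 1.0) * (len(t) - 1)   # position in table units
        idx = min(int(pos), len(t) - 2)
        frac = pos - idx
        out.append(t[idx] + frac * (t[idx + 1] - t[idx]))
    return out
```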
(103) As shown in
(104) DPD Parameter Estimation 160
(105) As indicated above, the digital signal from the analog-to-digital converter (ADC) is stored in an on-chip memory 170 for DPD parameter estimation 160. As discussed hereinafter, DPD parameter estimation involves computing matrices containing non-linear terms such as x·|y|^k. Envelope operations involve vector operations of the type x*conj(x) and vector sqrt( ), which can be accelerated using a vector processor. Multiplication of matrices can use vector multiplication, addition, and reduction. Convolution can be accelerated using a vector convolution instruction.
(106)
(107) Thereafter, the coefficients w of the inverse model generated by the estimation algorithm 1650 are copied to pre-distorter 1610 to pre-distort the input to the amplifier 1620.
(108)
(109) Thereafter, the coefficients w of the inverse model generated by the estimation algorithm 1750 are provided to pre-distorter 1710 to pre-distort the input to the amplifier 1720.
(110) The DFE output can be expressed as z(n) and the observation signal (PA feedback receiver input) can be expressed as y(n). The inverse model of the power amplifier 1620, 1720 is desired. Correlations are needed for all r, p, and q:
(111)
where h_{k,m,l} are the desired coefficients for the inverse model of the power amplifier 1620, 1720.
(112)
(113) So the following must also be computed:
B(k,r,l,m,p,q) = E(|y(n−p)|^r · |y(n−l)|^k · y*(n−q) · y(n−m))
(114) The following is obtained:
(115)
(116) By re-ordering/renaming indices:
(117)
(118) h can be computed using a matrix inversion (performed in CPU):
h = B^−1·C
(119) h is used for the DPD coefficients.
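The matrix inversion step h = B⁻¹·C can be sketched as plain Gaussian elimination on the CPU (illustrative only; in the system described above, B would be the correlation matrix built from the expectations and C the cross-correlation vector):

```python
def solve(B, C):
    """Solve B h = C by Gauss-Jordan elimination with partial pivoting,
    i.e. the h = B^-1 C step of the DPD estimator, done on the CPU."""
    n = len(B)
    M = [row[:] + [C[i]] for i, row in enumerate(B)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]
```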
(120) Estimation of mathematical expectations:
(121)
(122) Vector Convolution
(123)
(124) In the exemplary embodiment of
(125) The disclosed vector convolution function (vec_conv( )) accelerates the FIR filter within the vector convolution function 1800, where the coefficients are represented with, e.g., a small number of bits (such as 2-bit or 4-bit values). Additionally, the operation can be further accelerated and performed in a single cycle using a sufficient number of bits for the coefficients, such as 18 bits. Generally, each time-shifted operation comprises an FIR filtering of the shifted input value 1820 and the coefficient.
(126) For an exemplary convolution with 2 bit values, an FIR filter/convolution operation can be written as follows:
(127)
where:
(128)
where h(k) indicates the coefficients and x(n−k) indicates the time-shifted input values. In the case of a multi-phase filter, the coefficients h_k can be changed for each phase of the filter.
(129) The convolution of an input signal x by a filter having an impulse response h can be written as follows:
(130)
(131) The correlation or cross-correlation of an input signal x with an input signal y can be written as follows (where signal x and/or signal y can be a known reference signal, such as a pilot signal or a CDMA binary/bipolar code):
(132)
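The two sums defined above can be sketched in reference form. This is an illustrative sketch (not the patent's implementation): convolution as y(n) = Σ_k h(k)·x(n−k) and cross-correlation at lag n as Σ_k x(k)·y*(k+n), with indices outside the finite sequences treated as zero.

```python
def convolve(x, h):
    """Full convolution of finite sequences: y(n) = sum_k h(k) x(n-k)."""
    n_out = len(x) + len(h) - 1
    y = [0j] * n_out
    for n in range(n_out):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y

def xcorr(x, y, n):
    """Cross-correlation at lag n: sum_k x(k) * conj(y(k+n))."""
    return sum(x[k] * y[k + n].conjugate()
               for k in range(len(x)) if 0 <= k + n < len(y))

x = [1 + 0j, 2 + 0j]
h = [1 + 0j, 1 + 0j]
print(convolve(x, h))  # [(1+0j), (3+0j), (2+0j)]
print(xcorr(x, x, 0))  # (5+0j)
```

Note the structural difference that motivates separate convolution and correlation instructions: convolution time-reverses one sequence, while correlation conjugates and slides it without reversal.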
(133) For an exemplary convolution with a 12-bit representation of the coefficients, there are six iterations to compute the FIR filter output (six 2-bit slices).
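The iteration just described can be sketched as follows. This is a hedged sketch, not the patent's implementation: each 12-bit coefficient is split into six 2-bit slices, each pass computes a partial FIR output using only that slice, and the partial outputs are weighted by the slice position and summed. Unsigned coefficients are assumed here for simplicity; a real pre-distorter would also handle signed values.

```python
def fir_by_slices(x, coeffs, slice_bits=2, total_bits=12):
    """FIR output accumulated from low-precision coefficient slices."""
    n_slices = total_bits // slice_bits          # 6 iterations for 12-bit coeffs
    n_out = len(x) - len(coeffs) + 1
    y = [0] * n_out
    for s in range(n_slices):
        # Extract the s-th 2-bit slice of every coefficient.
        mask = (1 << slice_bits) - 1
        sl = [(c >> (s * slice_bits)) & mask for c in coeffs]
        for n in range(n_out):
            partial = sum(sl[k] * x[n + k] for k in range(len(coeffs)))
            y[n] += partial << (s * slice_bits)  # weight by the slice position
    return y

def fir_direct(x, coeffs):
    """Reference FIR with full-precision coefficients."""
    n_out = len(x) - len(coeffs) + 1
    return [sum(coeffs[k] * x[n + k] for k in range(len(coeffs)))
            for n in range(n_out)]

x = [1, 2, 3, 4, 5]
coeffs = [100, 2000, 3500]  # each fits in 12 bits
print(fir_by_slices(x, coeffs) == fir_direct(x, coeffs))  # True
```

Because the FIR sum is linear in the coefficients, the slice-by-slice accumulation reproduces the full-precision result exactly, which is what lets a 2-bit vector convolution instruction serve coefficients of any width at the cost of more passes.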
(134) For a more detailed discussion of a convolution instruction for a vector processor, see, for example, International Patent Application Serial No. PCT/US2012/062182, entitled Vector Processor Having Instruction Set With Vector Convolution Function for FIR Filtering, filed contemporaneously herewith and incorporated by reference herein.
(135) Equalization/IQ Imbalance Correction 140
(136) As indicated above, the equalization/IQ imbalance correction 140 performs IQ correction and employs RF channel equalization to mitigate channel impairments. As discussed hereinafter, RF channel equalization and/or I/Q imbalance correction can be implemented using vector multiplication, addition and reduction, or a convolution instruction. Likewise, the corresponding coefficient estimation can be implemented using vector multiplication/addition/reduction or a correlation instruction. In an exemplary embodiment, RF channel equalization and I/Q imbalance correction are combined in the equalization/IQ imbalance correction 140.
(137)
(138)
(139) For example, each FIR filter 1900 can be implemented as an FIR filter having 32 taps at a sampling rate of 307.2 MSPS. The two parallel FIR filters 1900-1, 1900-2 can have complex inputs and complex coefficients. In the exemplary embodiment of
(140) Thus, frequency-dependent I/Q imbalance correction is performed using two FIR filters with input x and conjugate of x where x is the input to I/Q imbalance correction processing.
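The two-filter structure described above can be sketched as follows. This is an illustrative sketch (the filter names h1 and h2 are assumptions, not from the text): one FIR filter runs on the input x, a second runs in parallel on conj(x), and their outputs are summed to produce the frequency-dependent I/Q-corrected signal.

```python
def fir(x, h):
    """FIR output at each n: sum_k h(k) x(n-k), with zero padding."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
            for n in range(len(x))]

def iq_correct(x, h1, h2):
    """Two parallel FIRs: main path on x, image-cancellation path on conj(x)."""
    a = fir(x, h1)
    b = fir([s.conjugate() for s in x], h2)
    return [u + v for u, v in zip(a, b)]

x = [1 + 1j, 2 - 1j]
h1 = [1 + 0j]   # pass-through main path (illustrative)
h2 = [0.5 + 0j] # image-cancellation path (illustrative)
print(iq_correct(x, h1, h2))  # [(1.5+0.5j), (3-0.5j)]
```

In the combined stage, both filters would carry the 32 complex taps mentioned above, and both convolutions map onto the same vector convolution instruction used elsewhere in the SoftDFE.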
(141) The combined RF equalizer and IQ imbalance correction (IQIC) stage 1900 can be implemented in hardware or in software using the convolution instruction in a vector processor, as discussed further above in conjunction with
(142) Channel Filters/Channel Digital Down Conversion (DDC) Block 180
(143) The channel filters/channel digital down conversion (DDC) block 180 can be implemented in a similar manner as the channel filter and digital up conversion stage 110 of
(144) Incorporated Applications
(145) For a more detailed discussion of a number of the non-linear functions and other functions discussed herein, see, for example, U.S. patent application Ser. No. 12/324,926, filed Nov. 28, 2008, entitled Digital Signal Processor Having Instruction Set with One or More Non-Linear Complex Functions; U.S. patent application Ser. No. 12/324,927, filed Nov. 28, 2008, entitled Digital Signal Processor Having Instruction Set With One Or More Non-Linear Functions Using Reduced Look-Up Table; U.S. patent application Ser. No. 12/324,934, filed Jan. 8, 2008, entitled Digital Signal Processor With One Or More Non-Linear Functions Using Factorized Polynomial Interpolation; U.S. patent application Ser. No. 12/362,874, filed Jan. 30, 2009, entitled Digital Signal Processor Having Instruction Set With An Xk Function Using Reduced Look-Up Table; U.S. patent application Ser. No. 12/849142, filed Aug. 3, 2010, entitled System and Method for Providing Memory Bandwidth Efficient Correlation Acceleration; and/or Lei Ding et al., Compensation of Frequency-Dependent Gain/Phase Imbalance in Predistortion Linearization Systems, IEEE Transactions on Circuits and Systems, Vol. 55, No. 1, 390-97 (February 2008), each incorporated by reference herein.
(146) Conclusion
(147) While exemplary embodiments of the present invention have been described with respect to digital logic blocks and memory tables within a digital processor, as would be apparent to one skilled in the art, various functions may be implemented in the digital domain as processing steps in a software program, in hardware by circuit elements or state machines, or in a combination of both software and hardware. Such software may be employed in, for example, a digital signal processor, application specific integrated circuit or micro-controller. Such hardware and software may be embodied within circuits implemented within an integrated circuit.
(148) Thus, the functions of the present invention can be embodied in the form of methods and apparatuses for practicing those methods. One or more aspects of the present invention can be embodied in the form of program code, for example, whether stored in a storage medium, loaded into and/or executed by a machine, wherein, when the program code is loaded into and executed by a machine, such as a processor, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code segments combine with the processor to provide a device that operates analogously to specific logic circuits. The invention can also be implemented in one or more of an integrated circuit, a digital processor, a microprocessor, and a micro-controller.
(149) It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.