METHOD OF REALIZING ACCELERATED PARALLEL JACOBI COMPUTING FOR FPGA

Abstract

The invention discloses a method of realizing accelerated parallel Jacobi computing for an FPGA. Data of an n×n-dimensional matrix are input to the FPGA, and a rotation transformation process is carried out by using parallel Jacobi computing. Processors are initialized. A diagonal processor computes a symbol set corresponding to a rotation angle and outputs the symbol set to a non-diagonal processor. Elements of the diagonal processor are updated. Elements of the non-diagonal processor are updated. After the elements of the respective processors are updated, the updated elements are exchanged between the processors. The invention requires fewer FPGA resources while yielding higher internal computational processing performance of the FPGA. Accordingly, the invention improves the efficiency of realizing eigenvalue decomposition in the FPGA and is highly applicable in actual processing.

Claims

1. A method of realizing accelerated parallel Jacobi computing for an FPGA, comprising:

step (1) initializing processors: inputting data of an n×n-dimensional matrix into the FPGA and carrying out a rotation transformation process using the parallel Jacobi computing, wherein a CORDIC algorithm is adopted in the parallel Jacobi computing to carry out a planar rotation, and a two-dimensional X-Y coordinate system is established in the planar rotation, a plurality of processors are provided in the FPGA, the processors are arranged in an array, each of the processors is connected with an adjacent processor via a data interface to exchange data and elements, and each element in the n×n-dimensional matrix for carrying out the parallel Jacobi computing is assigned to a processor P.sub.ij of the processors according to a formula as follows: $P_{ij} = (\begin{matrix} a_{2i-1,2j-1} & a_{2i-1,2j} \\ a_{2i,2j-1} & a_{2i,2j} \end{matrix}), i \leq j, j = 1, 2, \ldots, \frac{n}{2},$ wherein P.sub.ij represents the processor in an i.sup.th row and a j.sup.th column, a.sub.2i,2j represents an element in a 2i.sup.th row and a 2j.sup.th column in the n×n-dimensional matrix, and n represents dimensionality of the n×n-dimensional matrix, and the processor P.sub.ij whose subscripted symbol satisfies i=j is a diagonal processor and the processor P.sub.ij whose subscripted symbol does not satisfy i=j is a non-diagonal processor, and in the processor P.sub.ij an element whose subscripted symbol satisfies 2i=2j and 2i−1=2j−1 is a diagonal element, and an element whose subscripted symbol does not satisfy 2i=2j and 2i−1=2j−1 is a non-diagonal element;

step (2) computing a symbol set corresponding to a rotation angle 2θ by the diagonal processor and outputting the symbol set to the non-diagonal processor: obtaining a symbol set {d.sub.2θ,k}, k=1, 2, . . . , N, which corresponds to a rotation angle 2θ of the CORDIC algorithm, through iterations by using a formula as follows, wherein a total number of the iterations is the same as a total number of iterations of the CORDIC algorithm: $\tan(θ_{k}) = \frac{α_{k}}{β_{k}} = \tan(θ_{k-1} - d_{2θ,k} ϕ_{k-1}) = \frac{\tan θ_{k-1} - d_{2θ,k} \tan ϕ_{k-1}}{1 + d_{2θ,k} \tan θ_{k-1} \tan ϕ_{k-1}} = \frac{α_{k-1} 2^{k-1} - d_{2θ,k} β_{k-1}}{β_{k-1} 2^{k-1} + d_{2θ,k} α_{k-1}},$ $d_{2θ,k} = \begin{cases} -1, & \frac{α_{k-1}}{β_{k-1}} < 0 \\ 1, & \frac{α_{k-1}}{β_{k-1}} \geq 0 \end{cases},$ $θ_{0} = 2θ, \tan(ϕ_{k-1}) = 2^{-(k-1)}, k = 1, 2, \ldots, N,$ wherein k represents an ordinal number of an iteration, N represents the total number of the iterations and is set as a data bit number adopted by the FPGA, α.sub.k represents a first symbol parameter of a k.sup.th iteration, β.sub.k represents a second symbol parameter of the k.sup.th iteration, θ.sub.0 represents a rotation angle initial value, that is, 2θ, θ.sub.k represents a residual rotation angle through k times of iterations, ϕ.sub.k−1 represents an angle parameter of a (k−1).sup.th iteration, and d.sub.2θ,k represents a symbol corresponding to the rotation angle 2θ at the k.sup.th iteration, and the diagonal processor outputs the rotation angle 2θ obtained through computing carried out by itself and the corresponding symbol set {d.sub.2θ,k} to the non-diagonal processor on the same row and the non-diagonal processor on the same column;

step (3) updating elements of the diagonal processor: carrying out the CORDIC algorithm on first to-be-rotated coordinates (2a.sub.pq,a.sub.pp−a.sub.qq) by using d.sub.2θ,k obtained in each of the iterations in the step (2) as a rotation symbol of the k.sup.th iteration in the CORDIC algorithm, so as to carry out a planar rotation by using the rotation angle 2θ; after all the iterations in the step (2) are completed, multiplying a final planar rotation result by a first compensation factor to obtain a rotated Y coordinate, that is, y.sub.1=2a.sub.pq sin 2θ+(a.sub.pp−a.sub.qq) cos 2θ, wherein the first compensation factor is obtained according to a formula as follows: $C_{1} = \prod_{k=1}^{N} \cos(ϕ_{k-1}),$ wherein C.sub.1 represents the first compensation factor; updating diagonal elements in the diagonal processor by using a formula as follows, and setting non-diagonal elements to 0: $a'_{pp} = \frac{a_{qq} + a_{pp} + y_{1}}{2}, a'_{qq} = \frac{a_{qq} + a_{pp} - y_{1}}{2},$ wherein a′.sub.pp, a′.sub.qq represent two updated diagonal elements in the diagonal processor, y.sub.1 represents a rotated Y-axis coordinate of the first to-be-rotated coordinates;

step (4) updating elements of the non-diagonal processor;

step (5) exchanging the elements between the processors;

step (6) updating the non-diagonal elements in all the diagonal processors in the n×n-dimensional matrix by the parallel Jacobi computing after the exchanging, returning to the step (2) for another round of processing and updating, repeating the updating until the non-diagonal elements in the n×n-dimensional matrix gradually converge to 0, finishing the updating when a predetermined convergence accuracy is met, and ending the parallel Jacobi computing.

2. The method of realizing the accelerated parallel Jacobi computing for the FPGA as claimed in claim 1, wherein in the step (2), an initial rotation angle corresponding to the non-diagonal elements in the diagonal processor when iterative computing starts is θ, and a computation is as follows: $\tan (2 θ) = \frac{α_{0}}{β_{0}}, α_{0} = 2 a_{pq}, β_{0} = a_{pp} - a_{qq},$ wherein a.sub.pq, a.sub.qp respectively represent two non-diagonal elements initially included in the diagonal processor, a.sub.qp=a.sub.pq, a.sub.pp and a.sub.qq respectively represent diagonal elements initially included in the diagonal processing unit, α.sub.0 represents an initial first symbol parameter, and β.sub.0 represents an initial second symbol parameter.

3. The method of realizing the accelerated parallel Jacobi computing for the FPGA as claimed in claim 1, wherein the n×n-dimensional matrix is a covariance matrix of data collected by an antenna array or data before image dimensionality reduction, and is a real symmetric matrix.

4. The method of realizing the accelerated parallel Jacobi computing for the FPGA as claimed in claim 1, wherein in the step (1), if n in the n×n-dimensional matrix is an odd number, the n×n-dimensional matrix is expanded into a matrix with even-numbered dimensionality by adding an (n+1).sup.th column and an (n+1).sup.th row, and element values of the added (n+1).sup.th column and (n+1).sup.th row are all set to 0.

5. The method of realizing the accelerated parallel Jacobi computing for the FPGA as claimed in claim 1, wherein the step (4) comprises:

step (4.1) receiving, by the non-diagonal processor P.sub.ij, symbol sets output from two diagonal processors P.sub.ii, P.sub.jj and represented as {d.sub.2θ.sub.i.sub.,k}, {d.sub.2θ.sub.j.sub.,k}, wherein d.sub.2θ.sub.i.sub.,k and d.sub.2θ.sub.j.sub.,k respectively represent symbols corresponding to a rotation angle 2θ.sub.i and a rotation angle 2θ.sub.j at the k.sup.th iteration, and two symbols d.sub.θ.sub.i.sub.+θ.sub.j.sub.,k and d.sub.θ.sub.i.sub.−θ.sub.j.sub.,k are respectively computed by using formulae as follows to obtain two symbol sets {d.sub.θ.sub.i.sub.+θ.sub.j.sub.,k} and {d.sub.θ.sub.i.sub.−θ.sub.j.sub.,k}:
d.sub.θ.sub.i.sub.+θ.sub.j.sub.,k=½(d.sub.2θ.sub.i.sub.,k+d.sub.2θ.sub.j.sub.,k), k=1,2, . . . ,N,
d.sub.θ.sub.i.sub.−θ.sub.j.sub.,k=½(d.sub.2θ.sub.i.sub.,k−d.sub.2θ.sub.j.sub.,k), k=1,2, . . . ,N,
wherein d.sub.θ.sub.i.sub.+θ.sub.j.sub.,k and d.sub.θ.sub.i.sub.−θ.sub.j.sub.,k respectively represent symbols corresponding to a rotation angle θ.sub.i+θ.sub.j and a rotation angle θ.sub.i−θ.sub.j, and 2θ.sub.i and 2θ.sub.j respectively represent double angles of rotation angles corresponding to the non-diagonal elements of the two diagonal processors P.sub.ii and P.sub.jj;

step (4.2) computing values of a second compensation factor and a third compensation factor corresponding to all possible symbol combinations formed by first $\lceil \frac{N}{2} \rceil$ symbols in the two symbol sets {d.sub.θ.sub.i.sub.+θ.sub.j.sub.,k} and {d.sub.θ.sub.i.sub.−θ.sub.j.sub.,k} by using formulae as follows, one symbol combination being formed by $\lceil \frac{N}{2} \rceil$ symbols, so as to establish lookup table data by using the values of the second compensation factor and the third compensation factor corresponding to the respective, different symbol combinations, an absolute value of each symbol in the first $\lceil \frac{N}{2} \rceil$ symbols serves as a lookup address, and a lookup table is generated by using a block memory (block random access memory), an address bit number of the lookup table is set as $\lceil \frac{N}{2} \rceil,$ and a data depth is $2^{\lceil \frac{N}{2} \rceil}$: $C_{2} = \prod_{k=1}^{\lceil \frac{N}{2} \rceil} \cos(d_{θ_{i} - θ_{j}, k} ϕ_{k-1}), d_{θ_{i} - θ_{j}, k} \in \{-1, 0, 1\},$ $C_{3} = \prod_{k=1}^{\lceil \frac{N}{2} \rceil} \cos(d_{θ_{i} + θ_{j}, k} ϕ_{k-1}), d_{θ_{i} + θ_{j}, k} \in \{-1, 0, 1\},$ wherein C.sub.2 represents the second compensation factor, and C.sub.3 represents the third compensation factor;

step (4.3) for the non-diagonal processor, representing four elements included in the non-diagonal processor as $(\begin{matrix} a_{p_{1} q_{1}} & a_{p_{1} q_{2}} \\ a_{p_{2} q_{1}} & a_{p_{2} q_{2}} \end{matrix}),$ using the obtained d.sub.θ.sub.i.sub.−θ.sub.j.sub.,k as the rotation symbol of the k.sup.th iteration in the CORDIC algorithm, carrying out the CORDIC algorithm on second to-be-rotated coordinates (a.sub.p.sub.1.sub.q.sub.1+a.sub.p.sub.2.sub.q.sub.2, a.sub.p.sub.1.sub.q.sub.2−a.sub.p.sub.2.sub.q.sub.1) to carry out the planar rotation by using the rotation angle θ.sub.i−θ.sub.j and multiplying a planar rotation result by the second compensation factor whose value is obtained by accessing the lookup table of the step (4.2) to obtain rotated coordinates represented as: $\begin{cases} x_{2} = (a_{p_{1} q_{1}} + a_{p_{2} q_{2}}) \cos(θ_{i} - θ_{j}) - (a_{p_{1} q_{2}} - a_{p_{2} q_{1}}) \sin(θ_{i} - θ_{j}) \\ y_{2} = (a_{p_{1} q_{2}} - a_{p_{2} q_{1}}) \cos(θ_{i} - θ_{j}) + (a_{p_{1} q_{1}} + a_{p_{2} q_{2}}) \sin(θ_{i} - θ_{j}) \end{cases},$ wherein x.sub.2 and y.sub.2 respectively represent rotated coordinates of the second to-be-rotated coordinates; using the obtained d.sub.θ.sub.i.sub.+θ.sub.j.sub.,k as the rotation symbol of the k.sup.th iteration in the CORDIC algorithm, carrying out the CORDIC algorithm on third to-be-rotated coordinates (a.sub.p.sub.1.sub.q.sub.2+a.sub.p.sub.2.sub.q.sub.1,a.sub.p.sub.1.sub.q.sub.1−a.sub.p.sub.2.sub.q.sub.2) to carry out a planar rotation by using the rotation angle θ.sub.i+θ.sub.j, and multiplying a planar rotation result by the third compensation factor whose value is obtained by accessing the lookup table of the step (4.2) to obtain rotated coordinates represented as: $\begin{cases} x_{3} = (a_{p_{1} q_{2}} + a_{p_{2} q_{1}}) \cos(θ_{i} + θ_{j}) - (a_{p_{1} q_{1}} - a_{p_{2} q_{2}}) \sin(θ_{i} + θ_{j}) \\ y_{3} = (a_{p_{1} q_{1}} - a_{p_{2} q_{2}}) \cos(θ_{i} + θ_{j}) + (a_{p_{1} q_{2}} + a_{p_{2} q_{1}}) \sin(θ_{i} + θ_{j}) \end{cases},$ wherein x.sub.3 and y.sub.3 respectively represent rotated coordinates of the third to-be-rotated coordinates; and

step (4.4) adopting formulae as follows to update the elements in the non-diagonal processor:
a′.sub.p.sub.1.sub.q.sub.1=½(x.sub.2+y.sub.3),
a′.sub.p.sub.1.sub.q.sub.2=½(x.sub.3+y.sub.2),
a′.sub.p.sub.2.sub.q.sub.1=½(x.sub.3−y.sub.2),
a′.sub.p.sub.2.sub.q.sub.2=½(x.sub.2−y.sub.3),
wherein a′.sub.p.sub.1.sub.q.sub.1, a′.sub.p.sub.1.sub.q.sub.2, a′.sub.p.sub.2.sub.q.sub.1, and a′.sub.p.sub.2.sub.q.sub.2 respectively represent four updated elements included in the non-diagonal processor.

6. The method of realizing the accelerated parallel Jacobi computing for the FPGA as claimed in claim 1, wherein in the step (5), the updated elements are exchanged between the processors after the elements of each processor are updated, and the step (5) further comprises: step (5.A) exchanging the diagonal elements in the diagonal processor, wherein it is assumed that a current diagonal processor P.sub.ii comprises a diagonal element a.sub.p.sub.i.sub.p.sub.i and a diagonal element a.sub.q.sub.i.sub.q.sub.i, and, for the diagonal element a.sub.p.sub.i.sub.p.sub.i, i represents a diagonal processor row/column ordinal number, if i=1, the diagonal element a.sub.p.sub.i.sub.p.sub.i is not changed, if i=2, a value of the diagonal element a.sub.p.sub.i.sub.p.sub.i is changed to a value of a diagonal element a.sub.q.sub.i−1.sub.q.sub.i−1, and if i>2, the value of the diagonal element a.sub.p.sub.i.sub.p.sub.i is changed to a value of a diagonal element a.sub.p.sub.i−1.sub.p.sub.i−1, and for the diagonal element a.sub.q.sub.i.sub.q.sub.i, if $i < \frac{n}{2},$ a value of the diagonal element a.sub.q.sub.i.sub.q.sub.i is changed to a value of a diagonal element a.sub.q.sub.i+1.sub.q.sub.i+1, and if $i = \frac{n}{2},$ the value of the diagonal element a.sub.q.sub.i.sub.q.sub.i is changed to the value of the diagonal element a.sub.p.sub.i.sub.p.sub.i; and step (5.B) exchanging the non-diagonal elements in the diagonal processor and elements in the non-diagonal processor by changing positions according to the following: positions of the non-diagonal elements in the diagonal processor and the elements in the non-diagonal processor are shifted, so that a subscripted row symbol of an element is the same as a row number of a diagonal element shifted to the same row after the exchanging of the step (5.A), and a subscripted column symbol of the element is the same as a column number of a diagonal element shifted to the same column after the exchanging of the step (5.A).

7. The method of realizing the accelerated parallel Jacobi computing for the FPGA as claimed in claim 1, wherein the steps (2), (3), and (4) are carried out simultaneously.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0058] FIG. 1 is a schematic diagram illustrating a framework of a diagonal processor according to an embodiment of the invention.

[0059] FIG. 2 is a schematic diagram illustrating a framework of a non-diagonal processor according to an embodiment of the invention.

[0060] FIG. 3 is a schematic diagram illustrating a framework of a processor array according to an embodiment of the invention.

[0061] FIG. 4 is a flowchart illustrating a computing method according to an embodiment of the invention.

DESCRIPTION OF THE EMBODIMENTS

[0062] In the following, details of the invention will be described with reference to the accompanying drawings in combination with exemplary embodiments of the invention.

[0063] The framework realized in an FPGA of the invention mainly includes a diagonal processor and a non-diagonal processor. The framework of the diagonal processor is as shown in FIG. 1, and the framework of the non-diagonal processor is as shown in FIG. 2. A framework of a processor array is as shown in FIG. 3. A flowchart for executing a computing method is as shown in FIG. 4.

[0064] The embodiment of the invention and the implementation process thereof are described in the following.

[0065] The embodiment is implemented in a Xilinx Virtex-7 XC7VX690T FPGA chip. Specifically, wireless signals emitted by a collector drone with a four-element antenna array are adopted in the implementation, and the signal incident direction is 0 degrees. A 4×4 real symmetric covariance matrix computed from the four data sets received by the four-element antenna array is represented as A.

[0066] A 16-bit fixed-point representation is adopted to obtain the eigenvalues, where

[00019] $A = (\begin{matrix} 237.4904 & 231.6229 & - 104.5409 & 16.3696 \\ 231.6229 & 541.7360 & - 78.0729 & - 10.1869 \\ - 104.5409 & - 78.0729 & 273.6585 & - 34.0290 \\ 16.3696 & - 10.1869 & - 34.0290 & 170.5949 \end{matrix}) .$

Specifically, the following steps are included.

[0067] (1) Initializing the processors: The respective elements in A are assigned to the processors P.sub.ij. Each processor is connected with the adjacent processor via a data interface. A processor whose subscripted symbol satisfies i=j is defined as a diagonal processor, and a processor whose subscripted symbol does not satisfy such condition is defined as a non-diagonal processor. A matrix element whose subscripted symbol satisfies 2i=2j and 2i−1=2j−1 is defined as a diagonal element, and a matrix element whose subscripted symbol does not satisfy such condition is defined as a non-diagonal element.
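The assignment of Step (1) can be modeled in software as follows. This is a minimal sketch rather than the FPGA implementation: the function name `assign_processors` and the dictionary layout are assumptions made for illustration, and the zero padding for an odd dimension follows claim 4.

```python
def assign_processors(a):
    """Return {(i, j): 2x2 block} for 1-based processor indices with i <= j."""
    n = len(a)
    if n % 2:
        # Odd dimension: append a zero column and a zero row first.
        a = [list(row) + [0] for row in a] + [[0] * (n + 1)]
        n += 1
    blocks = {}
    for i in range(1, n // 2 + 1):
        for j in range(i, n // 2 + 1):
            r, c = 2 * i - 2, 2 * j - 2  # 0-based position of a_{2i-1,2j-1}
            blocks[(i, j)] = [[a[r][c], a[r][c + 1]],
                              [a[r + 1][c], a[r + 1][c + 1]]]
    return blocks


# Small made-up 3x3 symmetric matrix, padded to 4x4 before the split.
blocks = assign_processors([[1, 2, 3],
                            [2, 4, 5],
                            [3, 5, 6]])
diagonal = [key for key in blocks if key[0] == key[1]]
```

Only the blocks with i ≤ j are stored, because the matrix is symmetric and the remaining blocks are mirror images.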

[0068] (2) Computing a symbol set corresponding to a rotation angle by the diagonal processor and outputting the symbol set to the non-diagonal processor: It is assumed that the non-diagonal elements included in the diagonal processor are a.sub.pq, a.sub.qp, and a.sub.qp=a.sub.pq. It is assumed that the diagonal elements included in the diagonal processor are a.sub.pp and a.sub.qq. It is also assumed that α.sub.0=2a.sub.pq, β.sub.0=a.sub.pp−a.sub.qq. It is assumed that the rotation angle corresponding to the non-diagonal element in the current diagonal processor is θ. A symbol set d.sub.2θ,k, k=1, 2 . . . , 16, which corresponds to a rotation angle 2θ of the CORDIC algorithm is obtained through iterations. The number of iterations equals the number of iterations in the CORDIC algorithm and is set to the data bit number adopted in the current system, i.e., 16.
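The iteration of Step (2) can be sketched in Python as below. The sketch assumes floating-point arithmetic instead of the 16-bit fixed-point format, the name `symbol_set` is hypothetical, and the sign of α/β is taken from the product α·β to avoid a division (an assumption that gives the same result whenever β is nonzero). The input values are those of P.sub.11 in the first round.

```python
import math


def symbol_set(alpha0, beta0, n_iter=16):
    """Return the symbols d_{2θ,k}, k = 1..n_iter, for tan(2θ) = alpha0 / beta0."""
    alpha, beta = float(alpha0), float(beta0)
    symbols = []
    for k in range(1, n_iter + 1):
        # Sign of alpha/beta, taken from the product so no division is needed.
        d = -1 if alpha * beta < 0 else 1
        t = 2.0 ** -(k - 1)  # tan(phi_{k-1}); a right shift in hardware
        # Tuple assignment uses the old alpha and beta on the right-hand side.
        alpha, beta = alpha - d * beta * t, beta + d * alpha * t
        symbols.append(d)
    return symbols


# P_11 of the first round: a_pp = 237.4904, a_pq = 231.6229, a_qq = 541.7360.
a_pp, a_pq, a_qq = 237.4904, 231.6229, 541.7360
symbols = symbol_set(2 * a_pq, a_pp - a_qq)
# The signed micro-angles should add up to approximately 2θ.
approx_2theta = sum(d * math.atan(2.0 ** -k) for k, d in enumerate(symbols))
```

Because each update only needs a shift and an addition, the ratio α.sub.k/β.sub.k tracks tan(θ.sub.k) without any angle ever being computed explicitly.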

[0069] (3) Updating the elements of the diagonal processor: A compensation factor is obtained. The symbol d.sub.2θ,k obtained in Step (2) is used as the rotation symbol of the k.sup.th iteration in the CORDIC algorithm, which replaces the step of computing the rotation symbol after each iteration in the conventional CORDIC algorithm. The CORDIC algorithm is executed to rotate (2a.sub.pq,a.sub.pp−a.sub.qq) by 2θ, and the result is multiplied by the compensation factor to obtain the rotated y coordinate, i.e., y.sub.1=2a.sub.pq sin 2θ+(a.sub.pp−a.sub.qq) cos 2θ. The diagonal elements in the diagonal processor are then updated, and the non-diagonal elements are set to 0.
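A floating-point sketch of Step (3) is given below, again using the first-round P.sub.11 as input. The helper names `symbol_set`, `cordic_rotate`, and `update_diagonal` are hypothetical, and the compensation factor is accumulated inside the loop for clarity, whereas the hardware would use a precomputed constant.

```python
import math


def symbol_set(alpha0, beta0, n_iter=16):
    """Symbols d_{2θ,k} obtained as in Step (2)."""
    alpha, beta = float(alpha0), float(beta0)
    symbols = []
    for k in range(1, n_iter + 1):
        d = -1 if alpha * beta < 0 else 1
        t = 2.0 ** -(k - 1)
        alpha, beta = alpha - d * beta * t, beta + d * alpha * t
        symbols.append(d)
    return symbols


def cordic_rotate(x, y, symbols):
    """Rotate (x, y) by sum(d_k * phi_{k-1}) and undo the CORDIC gain."""
    gain = 1.0
    for k, d in enumerate(symbols):          # k = 0 corresponds to phi_0
        t = 2.0 ** -k
        x, y = x - d * y * t, y + d * x * t  # one shift-and-add micro-rotation
        gain *= math.cos(math.atan(t))       # factor of C_1 = prod cos(phi_{k-1})
    return x * gain, y * gain


def update_diagonal(a_pp, a_pq, a_qq, n_iter=16):
    """Return the updated (a'_pp, a'_qq); the off-diagonal pair becomes 0."""
    symbols = symbol_set(2 * a_pq, a_pp - a_qq, n_iter)
    _, y1 = cordic_rotate(2 * a_pq, a_pp - a_qq, symbols)
    return (a_qq + a_pp + y1) / 2, (a_qq + a_pp - y1) / 2


new_pp, new_qq = update_diagonal(237.4904, 231.6229, 541.7360)
```

The two returned values land close to the entries 112.5 and 666.75 reported after the first update; the small remaining gap is expected because the sketch does not model 16-bit fixed-point rounding.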

[0070] (4) Updating the elements of the non-diagonal processor: The non-diagonal processor P.sub.ij receives the symbol sets output from the two diagonal processors P.sub.ii, P.sub.jj, and the symbol sets are represented as d.sub.2θ.sub.i.sub.,k, d.sub.2θ.sub.j.sub.,k, k=1, 2, . . . , 16. d.sub.θ.sub.i.sub.+θ.sub.j.sub.,k and d.sub.θ.sub.i.sub.−θ.sub.j.sub.,k are respectively computed.
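The half-sum and half-difference of the two received symbol sets can be illustrated with a few lines of Python. The function name `combine_symbols` and the four-element symbol lists are made up for the example; an actual run would use 16 symbols per set.

```python
def combine_symbols(d_i, d_j):
    """Return ({d_{θi+θj,k}}, {d_{θi-θj,k}}) from two ±1 symbol sets."""
    # Sums and differences of ±1 values are always even, so // 2 is exact.
    plus = [(a + b) // 2 for a, b in zip(d_i, d_j)]
    minus = [(a - b) // 2 for a, b in zip(d_i, d_j)]
    return plus, minus


d_i = [1, -1, 1, 1]   # made-up symbols for illustration
d_j = [1, 1, -1, 1]
d_plus, d_minus = combine_symbols(d_i, d_j)
```

Since the symbols of 2θ.sub.i and 2θ.sub.j weight the same micro-angles, their half-sum and half-difference weight those micro-angles so that they add up to θ.sub.i+θ.sub.j and θ.sub.i−θ.sub.j. At every k exactly one of the two results is 0 and the other is ±1.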

[0071] d.sub.θ.sub.i.sub.±θ.sub.j.sub.,k has three values, i.e., {−1,0,1}. The values of the compensation factors corresponding to all the possible value combinations for the first 8 symbols of the symbol set d.sub.θ.sub.i.sub.±θ.sub.j.sub.,k are computed. The values of the compensation factors are used as lookup table data. The absolute values of the respective symbols of the first 8 symbols in the symbol set are used as lookup addresses, and a lookup table is generated by using a block memory. When the number of iterations of the CORDIC algorithm exceeds 8, the difference between the compensation factor and 1 is less than 2.sup.−7, and the accuracy of 8-bit data is at most 2.sup.−7. Therefore, the remaining compensation factors may be directly considered as 1, i.e., no compensation is required. 8 is set as the address bit number of the lookup table, and the data depth is 2.sup.8. The lookup table of the example is as shown in Table 1.

TABLE 1 Lookup Table for Compensation Value

Address | Compensation value
0000 0000 | 1
0000 0001 | 0.99996948
. . . | . . .
1111 1111 | 0.60726543
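The construction of such a lookup table can be sketched as follows. The bit ordering, with the first symbol in the most significant address bit, is an assumption chosen so that address 0000 0001 reproduces the 0.99996948 entry of Table 1; the names `build_compensation_lut` and `lut_address` are hypothetical, and the values are floating-point rather than the stored fixed-point words.

```python
import math


def build_compensation_lut(address_bits=8):
    """table[address] = product of cos(phi_{k-1}) over the bits set in address."""
    table = []
    for address in range(2 ** address_bits):
        factor = 1.0
        for k in range(1, address_bits + 1):
            bit = (address >> (address_bits - k)) & 1  # |d_k|; k = 1 is the MSB
            if bit:
                factor *= math.cos(math.atan(2.0 ** -(k - 1)))
        table.append(factor)
    return table


def lut_address(symbols, address_bits=8):
    """Pack |d_1| ... |d_address_bits| into an integer address, |d_1| first."""
    address = 0
    for d in symbols[:address_bits]:
        address = (address << 1) | abs(d)
    return address


lut = build_compensation_lut()
example_address = lut_address([1, 0, -1, 0, 0, 0, 0, 1])
```

A symbol equal to 0 contributes cos(0)=1, which is why only the absolute values of the symbols are needed to form the address.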

[0072] It is assumed that the matrix elements included in the current non-diagonal processor are

[00020] $(\begin{matrix} a_{p_{1} q_{1}} & a_{p_{1} q_{2}} \\ a_{p_{2} q_{1}} & a_{p_{2} q_{2}} \end{matrix}),$

and d.sub.θ.sub.i.sub.−θ.sub.j.sub.,k is used as the rotation symbol of the k.sup.th iteration in the CORDIC algorithm. A CORDIC algorithm rotation by θ.sub.i−θ.sub.j is carried out on (a.sub.p.sub.1.sub.q.sub.1+a.sub.p.sub.2.sub.q.sub.2,a.sub.p.sub.1.sub.q.sub.2−a.sub.p.sub.2.sub.q.sub.1). A compensation factor is obtained from the lookup table. The result is multiplied by the compensation factor, thereby deriving the rotated coordinates.

[0073] d.sub.θ.sub.i.sub.+θ.sub.j.sub.,k is used as the rotation symbol of the k.sup.th iteration in the CORDIC algorithm. A CORDIC algorithm rotation by θ.sub.i+θ.sub.j is carried out on (a.sub.p.sub.1.sub.q.sub.2+a.sub.p.sub.2.sub.q.sub.1, a.sub.p.sub.1.sub.q.sub.1−a.sub.p.sub.2.sub.q.sub.2). A compensation factor is obtained from the lookup table. The result is multiplied by the compensation factor, thereby deriving the rotated coordinates.

[0074] The elements of the non-diagonal processor are updated.
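The update of the non-diagonal processor can be checked numerically with the sketch below. It assumes exact trigonometric functions in place of the CORDIC rotations and lookup-table compensation, uses hypothetical helper names, and compares the sum/difference formulas of claim 5 with a direct two-sided rotation R(θ.sub.i).sup.T·P·R(θ.sub.j), where R(θ) = [[cos θ, −sin θ], [sin θ, cos θ]]. The input is P.sub.12 of the first round.

```python
import math


def jacobi_angle(a_pp, a_pq, a_qq):
    """θ with tan(2θ) = 2 a_pq / (a_pp - a_qq); assumes a_pp != a_qq."""
    return 0.5 * math.atan(2 * a_pq / (a_pp - a_qq))


def rotate(x, y, angle):
    """Exact planar rotation standing in for CORDIC plus compensation."""
    c, s = math.cos(angle), math.sin(angle)
    return x * c - y * s, y * c + x * s


def update_non_diagonal(block, theta_i, theta_j):
    """Steps (4.3) and (4.4): two rotations, then four half-sums."""
    (a11, a12), (a21, a22) = block
    x2, y2 = rotate(a11 + a22, a12 - a21, theta_i - theta_j)
    x3, y3 = rotate(a12 + a21, a11 - a22, theta_i + theta_j)
    return [[(x2 + y3) / 2, (x3 + y2) / 2],
            [(x3 - y2) / 2, (x2 - y3) / 2]]


def two_sided(block, theta_i, theta_j):
    """Reference result: R(θi)^T · block · R(θj)."""
    (a, b), (c, d) = block
    ci, si = math.cos(theta_i), math.sin(theta_i)
    cj, sj = math.cos(theta_j), math.sin(theta_j)
    r1 = (ci * a + si * c, ci * b + si * d)      # first row of R(θi)^T · block
    r2 = (-si * a + ci * c, -si * b + ci * d)    # second row
    return [[r1[0] * cj + r1[1] * sj, -r1[0] * sj + r1[1] * cj],
            [r2[0] * cj + r2[1] * sj, -r2[0] * sj + r2[1] * cj]]


theta_1 = jacobi_angle(237.4904, 231.6229, 541.7360)   # from P_11
theta_2 = jacobi_angle(273.6585, -34.0290, 170.5949)   # from P_22
p12 = [[-104.5409, 16.3696], [-78.0729, -10.1869]]
updated = update_non_diagonal(p12, theta_1, theta_2)
reference = two_sided(p12, theta_1, theta_2)
```

Both routes give the same block, and the values lie close to the upper-right block reported after the first update; the design benefit is that the two rotations are independent and can run concurrently instead of back to back.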

[0075] (5) Exchanging the elements between the processors: After the elements of the respective processors are updated, the matrix elements symmetric thereto are also updated to the same values. The updated elements are exchanged with the elements of other processors.
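The exchange can be modeled as a symmetric permutation of rows and columns. The sketch below derives the source index of every position from the rule of claim 6, reading the last case as i = n/2, and checks it against the first-round matrices of this embodiment. The function names are hypothetical and n is assumed to be even and at least 4.

```python
def exchange_sources(n):
    """0-based index that each row/column position reads from during exchange."""
    m = n // 2                          # number of diagonal processors
    src = list(range(n))
    if m < 2:
        return src                      # a single processor has nothing to exchange
    for i in range(1, m + 1):
        p_i, q_i = 2 * i - 2, 2 * i - 1     # 0-based positions of p_i and q_i
        if i == 2:
            src[p_i] = 2 * (i - 1) - 1      # takes q_{i-1}
        elif i > 2:
            src[p_i] = 2 * (i - 1) - 2      # takes p_{i-1}
        src[q_i] = 2 * (i + 1) - 1 if i < m else p_i  # q_{i+1}, or p_i when i = m
    return src


def exchange(a):
    """Apply the same permutation to rows and columns (steps 5.A and 5.B)."""
    src = exchange_sources(len(a))
    return [[a[r][c] for c in src] for r in src]


a_bar_1 = [[112.5, 0, -58.0625, 2.5625],
           [0, 666.75, -112.9375, -35.1875],
           [-58.0625, -112.9375, 283.875, 0],
           [2.5625, -35.1875, 0, 160.375]]
a_1 = exchange(a_bar_1)
```

Because rows and columns are permuted identically, the matrix stays symmetric and its eigenvalues are unchanged; only the pairing of indices inside the diagonal processors changes.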

[0076] Then, the flow returns to Steps 2, 3, and 4 again for another round of computation and update. After three times of exchange, all the non-diagonal elements in the matrix have been updated once by the diagonal processors through Jacobi computing. Through multiple times of update, the non-diagonal elements in the matrix gradually converge to 0. The update ends after a predetermined convergence accuracy set by the user is met, and the parallel Jacobi computing ends.
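The complete loop can be simulated in floating point as sketched below. This is an illustrative model under several assumptions: exact trigonometric functions replace CORDIC, the exchange pattern is hard-coded for n = 4 as read off the first-round matrices, the function name `parallel_jacobi` is hypothetical, and the (1,4) entry of the input is taken as 16.3696, the value listed in P.sub.12 for the first round.

```python
import math


def parallel_jacobi(a, tol=1e-3, max_steps=100):
    """Floating-point model of Steps (2) to (5) for a 4x4 symmetric matrix."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    src = [0, 3, 1, 2]  # exchange pattern for n = 4
    for step in range(1, max_steps + 1):
        # One angle per diagonal processor, tan(2θ) = 2 a_pq / (a_pp - a_qq).
        theta = []
        for i in range(0, n, 2):
            alpha, beta = 2 * a[i][i + 1], a[i][i] - a[i + 1][i + 1]
            if beta != 0:
                theta.append(0.5 * math.atan(alpha / beta))
            else:
                theta.append(math.copysign(math.pi / 4, alpha) if alpha else 0.0)
        # Every 2x2 block (I, J) becomes R(θ_I)^T · block · R(θ_J).
        new = [[0.0] * n for _ in range(n)]
        for bi in range(n // 2):
            ci, si = math.cos(theta[bi]), math.sin(theta[bi])
            for bj in range(n // 2):
                cj, sj = math.cos(theta[bj]), math.sin(theta[bj])
                r, c = 2 * bi, 2 * bj
                b11, b12 = a[r][c], a[r][c + 1]
                b21, b22 = a[r + 1][c], a[r + 1][c + 1]
                u1, u2 = ci * b11 + si * b21, ci * b12 + si * b22
                v1, v2 = -si * b11 + ci * b21, -si * b12 + ci * b22
                new[r][c], new[r][c + 1] = u1 * cj + u2 * sj, -u1 * sj + u2 * cj
                new[r + 1][c], new[r + 1][c + 1] = v1 * cj + v2 * sj, -v1 * sj + v2 * cj
        off = max(abs(new[r][c]) for r in range(n) for c in range(n) if r != c)
        if off < tol:
            return new, step            # converged before the next exchange
        a = [[new[r][c] for c in src] for r in src]
    return a, max_steps


A = [[237.4904, 231.6229, -104.5409, 16.3696],
     [231.6229, 541.7360, -78.0729, -10.1869],
     [-104.5409, -78.0729, 273.6585, -34.0290],
     [16.3696, -10.1869, -34.0290, 170.5949]]
final, steps = parallel_jacobi(A)
eigenvalues = sorted(final[i][i] for i in range(4))
```

The step count of the floating-point model need not equal that of the fixed-point hardware, so the sketch only relies on the convergence test itself.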

[0077] The specific results are as follows:

[0078] First Round:

[00021] $P_{11} = (\begin{matrix} 237.4904 & 231.6229 \\ 231.6229 & 541.7360 \end{matrix}), P_{22} = (\begin{matrix} 273.6585 & - 34.0290 \\ - 34.0290 & 170.5949 \end{matrix}), P_{12} = (\begin{matrix} - 104.5409 & 16.3696 \\ - 78.0729 & - 10.1869 \end{matrix}),$

and the following is rendered after the update:

[00022] ${\overline{A}}_{1} = (\begin{matrix} 112.5 & 0 & - 58.0625 & 2.5625 \\ 0 & 666.75 & - 112.9375 & - 35.1875 \\ - 58.0625 & - 112.9375 & 283.875 & 0 \\ 2.5625 & - 35.1875 & 0 & 160.375 \end{matrix}),$

and the following is rendered after the element exchange:

[00023] $A_{1} = (\begin{matrix} 112.5 & 2.5625 & 0 & - 58.0625 \\ 2.5625 & 160.375 & - 35.1875 & 0 \\ 0 & - 35.1875 & 666.75 & - 112.9375 \\ - 58.0625 & 0 & - 112.9375 & 283.875 \end{matrix}) .$

[0079] Second Round:

[00024] $P_{11} = (\begin{matrix} 112.5 & 2.5625 \\ 2.5625 & 160.375 \end{matrix}), P_{22} = (\begin{matrix} 666.75 & - 112.9375 \\ - 112.9375 & 283.875 \end{matrix}), P_{12} = (\begin{matrix} 0 & - 58.0625 \\ - 35.1875 & 0 \end{matrix}),$

and the following is rendered after the update:

[00025] ${\overline{A}}_{2} = (\begin{matrix} 112.375 & 0 & 17.125 & - 55.4375 \\ 0 & 160.5 & - 33.0625 & - 12.25 \\ 17.125 & - 33.0625 & 697.625 & 0 \\ - 55.4375 & - 12.25 & 0 & 253 \end{matrix}),$

and the following is rendered after the element exchange:

[00026] $A_{2} = (\begin{matrix} 112.375 & - 55.4375 & 0 & 17.125 \\ - 55.4375 & 253 & - 12.25 & 0 \\ 0 & - 12.25 & 160.5 & - 33.0625 \\ 17.125 & 0 & - 33.0625 & 697.625 \end{matrix}) .$

[0080] Eighth Round:

[00027] $P_{11} = (\begin{matrix} 92.5 & - 0.0625 \\ - 0.0625 & 700.1875 \end{matrix}), P_{22} = (\begin{matrix} 273.5 & - 0.0625 \\ - 0.0625 & 157.3125 \end{matrix}), P_{12} = (\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix}),$

and the following is rendered after the update:

[00028] ${\overline{A}}_{8} = (\begin{matrix} 92.5 & 0 & 0 & 0 \\ 0 & 700.25 & 0 & 0 \\ 0 & 0 & 273.4375 & 0 \\ 0 & 0 & 0 & 157.3125 \end{matrix})$

[0081] As shown above, the non-diagonal elements of the matrix have met the convergence condition (parallel Jacobi computing is an algorithm that drives the non-diagonal elements toward 0; because a fixed-point number with a limited number of bits is used in the actual implementation to represent fractional values, errors are introduced even though the values of the non-diagonal elements may reach 0). At this time, the elements on the diagonal of Ā.sub.8 are the obtained eigenvalues. By applying the obtained eigenvalues to a signal direction-of-arrival (DOA) estimation algorithm, as shown in the following diagram, the power spectrum function of a multiple signal classification (MUSIC) algorithm reaches its peak value at 0 degrees, which shows that the invention functions properly.

[0082] In the embodiment, the performance of the invention in the actual application is demonstrated in two aspects, i.e., operation time and FPGA resource consumption.

[0083] Operation time: Since a 16-bit fixed-point number is set for the data, the number of internal iterations in the CORDIC algorithm is 16. Including the result compensation, one CORDIC algorithm cycle takes 17 FPGA clock cycles. The elements also need to be exchanged between steps of the parallel Jacobi computing, which takes one clock cycle, so the method of realizing the accelerated parallel Jacobi computing of the invention takes a total of 18 clock cycles to realize one step of the parallel Jacobi computing. In the embodiment, the convergence condition is that the maximum absolute value of the non-diagonal elements in the covariance matrix is less than 0.001. The convergence condition is met after 8 iterations, which takes 144 clock cycles. The clock frequency used in the example is 250 MHz, so the computation takes 0.576 microseconds.
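The timing figures follow from simple arithmetic, reproduced here as a short Python check. The variable names are illustrative, and the clock is taken as 250 MHz, i.e., 4 ns per cycle.

```python
cordic_iterations = 16      # one per data bit
compensation_cycles = 1     # multiplication by the compensation factor
exchange_cycles = 1         # element exchange between processors
cycles_per_step = cordic_iterations + compensation_cycles + exchange_cycles

steps = 8                   # parallel Jacobi steps until convergence
clock_hz = 250e6            # 250 MHz

total_cycles = cycles_per_step * steps
elapsed_seconds = total_cycles / clock_hz   # 144 cycles at 4 ns each
```

With these inputs, 144 cycles at 4 ns each amount to 576 ns.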

[0084] Resource consumption: A Verilog program realizing the example is synthesized on the Vivado 2017.1 software platform. The results show that the example consumes 2360 lookup tables (LUTs) and 688 registers (REGs), which respectively take up 0.54% and 0.79% of the total resources. According to the above, the design consumes only a limited amount of FPGA resources.

[0085] According to the conventional method for realizing parallel Jacobi computing, the diagonal processors require one CORDIC algorithm cycle to obtain the rotation angles, and then require two successive CORDIC algorithm cycles using the obtained rotation angles to update the elements of the diagonal processors. In other words, the conventional method requires a total of three CORDIC algorithm cycles. The non-diagonal processor needs to wait for the diagonal processor to obtain the rotation angle. Then, the non-diagonal processor also requires two consecutive CORDIC algorithm cycles to update the elements of the non-diagonal processor. The angles of the two rotations are respectively the rotation angle transmitted from the diagonal processor on the same row and the rotation angle transmitted from the diagonal processor on the same column. Although the respective processors operate in parallel, at least three CORDIC algorithm cycles are required to realize one step of parallel Jacobi computing. According to the CORDIC algorithm used in the invention, the operations are carried out in parallel and require only one CORDIC algorithm cycle. The comparison between the invention and the conventional method in terms of the processes of the processors is as shown in Table 2.

TABLE 2 Comparison on Processes of Processors between the Invention and the Conventional Parallel Jacobi Method

Time | Conventional method, diagonal processor | Conventional method, non-diagonal processor | Present invention, diagonal processor | Present invention, non-diagonal processor
First CORDIC algorithm cycle | Compute a rotation angle | Wait | Compute a symbol set corresponding to a double angle of the rotation angle and carry out a rotation | Compute a symbol set corresponding to an angle sum and an angle difference of two rotation angles and carry out a rotation
Second CORDIC algorithm cycle | First rotation | First rotation | - | -
Third CORDIC algorithm cycle | Second rotation | Second rotation | - | -

[0086] Based on the above, the invention obtains eigenvalues significantly faster than the conventional method and is applicable when eigenvalue decomposition needs to be carried out quickly in actual processing.

[0087] The equivalent structural changes made by those skilled in the art based on the contents of the description and drawings of the invention shall be comprehensively covered in the scope of the invention.

Cpc classification

G06F9/3885
G06F7/4818
G06F17/16
G06F9/30014

International classification

G06F17/16
G06F7/48
G06F9/30
G06F9/38
