CORDIC COMPUTATION OF SIN/COS USING COMBINED APPROACH IN ASSOCIATIVE MEMORY
20220413799 · 2022-12-29
Inventors
Cpc classification
International classification
Abstract
A method for an associative memory device includes the steps of providing a look up table (LUT) with all possible solutions for N first iterations of a CORDIC algorithm, receiving a plurality of input angles, concurrently computing a location index for each angle of the plurality of angles and concurrently storing each index in a column of the associative memory device, copying a solution from the LUT in the location index to a plurality of columns associated with the index and concurrently performing M additional iterations of the CORDIC algorithm on the columns to compute a value of a trigonometric function for each angle.
Claims
1. A method for an associative memory device, the method comprising: providing a look up table (LUT) with all possible solutions for N first iterations of a CORDIC algorithm; receiving a plurality of input angles; concurrently computing a location index for each input angle of said plurality of input angles and concurrently storing each index in a column of said associative memory device; copying a solution from said LUT in said location index to a plurality of columns associated with said index; and concurrently performing M additional iterations of the CORDIC algorithm on said plurality of columns to compute a value of a trigonometric function for each said input angle of said plurality of input angles.
2. The method of claim 1 wherein said concurrently computing a location index comprises: concurrently for each said input angle: initiating a temporary angle to the value of said input angle; determining a value of a bit and a sign based on a comparison between said temporary angle and zero; assigning said bit to said location index; computing a rotating angle based on said sign and a predetermined angle; adding said rotating angle to said temporary angle; repeating the steps of initiating, determining, assigning and computing N times thereby creating said location index with N bits.
3. The method of claim 1 wherein N=5 and M=5.
4. The method of claim 1 wherein said copying a solution comprises: sequentially going over all entries of said LUT and concurrently copying an X value and a Y value from a location index in said LUT to all columns having a same computed location index.
5. A sine and cosine estimator comprising: a lookup table (LUT) to store sine and cosine values computed in advance for N first iterations of a CORDIC algorithm; an associative memory array to store information related to a plurality of angles; a LUT index builder to concurrently build for each of said plurality of angles an index reflecting rotations of predefined rotating angles; a LUT value assigner to assign values from entries in said LUT to columns of said associative memory array sharing a same index; and a CORDIC computer to concurrently compute M additional iterations of said CORDIC algorithm on a plurality of columns of said associative memory array thereby providing a sine and cosine value after N+M iterations of said CORDIC algorithm to said plurality of angles.
6. The sine and cosine estimator of claim 5 wherein N=5 and said LUT comprises 2.sup.5 entries.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034] It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
DETAILED DESCRIPTION OF THE PRESENT INVENTION
[0035] In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present invention.
[0036] Numerous applications, such as the Synthetic Aperture Radar (SAR) algorithm used for creating an image from radar pulses, need concurrent efficient computation of trigonometric functions of multiple angles. Applicant has realized that associative memory devices, such as the ones described in U.S. Pat. No. 9,558,812 (entitled “SRAM multi-cell operations”) and U.S. Pat. No. 10,832,746 (entitled “Non-volatile in-memory computing device”), commonly owned by Applicant and incorporated herein by reference, may concurrently and efficiently compute sine and cosine values for multiple angles.
[0037] Applicant has realized that implementing the CORDIC algorithm in such devices may provide concurrent trigonometrical function computation with constant complexity. The complexity may depend only on the total number of iterations (T) performed by the CORDIC algorithm and not on the number of input angles (which may be very large, e.g., 32K angles) for which the sine or cosine values are needed.
[0038] Applicant has also realized that a combined approach, where the result of the first N iterations of the CORDIC algorithm are stored in a LUT and then the computation defined by the CORDIC algorithm is performed for additional M iterations, may improve the performance of the total computation (of T=M+N iterations) compared to using a LUT for the total of T iterations or to performing T iterations of the CORDIC algorithm.
[0039] In one embodiment of the combined approach, the LUT may store in advance all possible values that may be obtained after 5 iterations of the CORDIC algorithm and the next 5 iterations of the CORDIC algorithm may be performed concurrently in an associative memory array. The accuracy of the results after 10 iterations of the CORDIC algorithm may be ±0.00097656203.
[0040]
[0041] Column 21 of LUT 20 may provide the index to the LUT. The index to the LUT may be built such that each bit i of the index indicates the direction of the rotation of angle γ.sub.i in iteration i. The value of LSB of the index indicates the direction of the first rotation, the next bit indicates the direction of the next rotation and so on until the MSB, which indicates the direction of the last rotation. A value 0 for a bit i in the index indicates a counterclockwise rotation by γ.sub.i and the value 1 for bit i in the index indicates a clockwise rotation by γ.sub.i.
[0042] Column 22 of LUT 20 may provide the value of X (cos(α)) after 5 iterations of the CORDIC algorithm for each possible index and column 23 may provide the value of Y (sin(α)) after 5 iterations of the CORDIC algorithm for each possible index.
[0043] Each row 24 in LUT 20 may provide the sine (Y) and cosine (X) values of an angle Σγ.sub.i derived by the consecutive rotations of the predefined angles γ.sub.i.
[0044]
[0045] In step 320, flow 300 may receive multiple angles α.sub.k for which the cosine and or sine values are needed.
[0046] In step 340, flow 300 may perform the first part of the combined approach using a LUT. In step 340, flow 300 may concurrently, for each angle α.sub.k, compute the index to the LUT and may read from the LUT the values X.sub.k and Y.sub.k for each angle α.sub.k computed in advance for the first N iterations of the CORDIC algorithm.
[0047] In step 360, flow 300 may perform the second part of the combined approach and may perform M additional iterations of the CORDIC computation. In step 360, flow 300 may concurrently, for each angle α.sub.k, compute the next M iterations of the CORDIC algorithm starting with the values X.sub.k and Y.sub.k provided by step 340 and generating the final values of X.sub.k and Y.sub.k after the total of T iterations.
[0048] In step 380, flow 300 may provide the final values X.sub.k and Y.sub.k that are the estimated value of cosine and sine trigonometric functions after T=N+M iterations.
[0049]
[0050] Associative memory array 410 may comprise a plurality of cells 411 arranged in a matrix having bit lines 413 (columns) and word lines 415 (rows). All cells 411 in the same column may be connected to the same bit line 413 and all cells 411 in the same row may be connected to the same word line 415. Associative memory array 410 is detailed in
[0051] LUT 420 may be an embodiment of LUT 20 (
[0052] LUT index builder 430, constructed and operative in accordance with an embodiment of the present invention, may concurrently, for each angle α.sub.k, create an index J.sub.k to be used as an index to LUT 420. LUT index builder 430 may write the created index J.sub.k to those bit lines 413 related to angle α.sub.k. of associative memory array 410. The flow for creating an index J.sub.k for each angle α.sub.k is described in
[0053] LUT value assigner 440, constructed and operative in accordance with an embodiment of the present invention, may go through all entries in LUT 420 and may concurrently write the values X.sub.k and Y.sub.k for each index J.sub.k to those columns of associative memory array 410 associated with index J.sub.k. The flow performed by LUT value assigner 440 is described in
[0054] CORDIC computer 450, constructed and operative in accordance with an embodiment of the present invention, may concurrently, for each angle α.sub.k, compute M iterations of the CORDIC algorithm, starting with values X.sub.k and Y.sub.k stored in columns of associative memory array 410. The flow performed by CORDIC computer 450 is described in
[0055]
[0056] In step 501, LUT index builder 430 may receive a plurality of angles α.sub.k for which the sine or cosine values are required.
[0057] In step 510, LUT index builder 430 may initialize each β.sub.k to the value of an input angle α.sub.k and the iterator i (used to iterate over bits of the index) to 0.
[0058] In step 520, LUT index builder 430 may start the first iteration concurrently on all angles β.sub.k. In steps 530, 543, 546, 550 and 560, LUT index builder 430 may perform the rotations and may determine the value of the bits in the plurality of indexes J.sub.k (the index related to each angle α.sub.k) to LUT 420. LUT index builder 430 may perform N rotations to compute N bits for each index J.sub.k. At each iteration i, bit i of index J.sub.k may be determined by comparing the value of a temporary angle β.sub.k to zero and assigning the relevant value (0 or 1) to bit i of J.sub.k.
[0059] In step 530, LUT index builder 430 may compare the value of each angle β.sub.k 0. If angle β.sub.k is smaller than 0, LUT index builder 430 may continue to step 546 where Sign.sub.k may be assigned the value (−1) and a value of a bit i of each index J.sub.k may be assigned the value 1. If angle β.sub.k is larger than 0, LUT index builder 430 may continue to step 543 where Sign.sub.k may be assigned the value 1 and a value of a bit i of each index J.sub.k may be assigned the value 0. In step 550, LUT index builder 430 may update the value of bit i of each index J.sub.k, compute the current rotating rotatingAngle.sub.k (by multiplying γ.sub.i by Sign.sub.k), that may be added to temporary angle β.sub.k. LUT index builder 430 may then increment iterator i to handle the next bit of each index J.sub.k.
[0060] In step 560, LUT index builder 430 may check if all bits of indexes J.sub.k have been computed. If the index is not ready, LUT index builder 430 may return to step 520, and if the index is completed, LUT index builder 430 may provide (step 570) as output an index J.sub.k for each input angle α.sub.k and write the computed indexes J.sub.k to columns of associative memory array 410.
[0061]
[0062] In step 601, LUT value assigner 440 may receive a plurality of angles α.sub.k with an index J.sub.k associated to each angle α.sub.k.
[0063] In step 610, LUT value assigner 440 may initialize iterator i (used to iterate over the entries of LUT 420, each entry identified by an index) to 0. In step 620, LUT value assigner 440 may start the first iteration concurrently on all columns of associative memory array 410. In step 630 the value of each index J.sub.k may be compared to i. If the value of J.sub.k equals i, LUT value assigner 440 may continue to step 640 and copy the value of X from entry i in LUT 420 to a column associated with angles α.sub.k and continue to step 650. If the value of J.sub.k is not equal to i LUT value assigner 440 may continue directly to step 650. In step 650 the iterator i may be incremented and in step 660 LUT value assigner 440 may check if flow 600 reached the last entry of LUT 420. If the entry i is not the last entry, LUT value assigner 440 may return to step 620 to handle the next entry of LUT 420. Otherwise, LUT value assigner 440 may finish (step 670) having the values of X.sub.k and Y.sub.k associated with each angle α.sub.k written to columns of memory array 410.
[0064]
[0065] The input to CORDIC computer 450, in step 701, may be all input angles α.sub.k, the values of X.sub.k and Y.sub.k and the value of temporary angle β.sub.k computed by LUT index builder 430 all stored in columns of memory array 410. The next M iterations of the CORDIC algorithm may also be concurrently executed on all columns of associative memory array 410 providing the final step of the computation of sine/cosine estimator 400 which is the sine and cosine values for all input angles α.sub.k.
[0066] In step 710, CORDIC computer 450 may create for each angle α.sub.k temporary parameters tempX.sub.k and tempY.sub.k and may initialize them to X.sub.k and Y.sub.k respectively. The temporary parameters may be used throughout the computation of the CORDIC algorithm as input for the next iteration.
[0067] In step 720, the value of each angle β.sub.k may be compared to 0. If angle β.sub.k is smaller than 0, flow 700 may continue to step 736 where the value of Sign.sub.k may be set to (−1). If angle β.sub.k is larger than 0, flow 700 may continue to step 733 where the value of Sign.sub.kα.sub.k may be set to (+1).
[0068] In step 740, CORDIC computer 450 may concurrently, for all angles α.sub.k, compute the next iteration of the CORDIC algorithm and may update the values of X.sub.k and Y.sub.k for all of the current iteration i using the temporary parameters as defined in equations 7 and 8:
X.sub.k=tempX.sub.k−Sign.sub.k*(tempY.sub.k*2.sup.i) Equation 7
Y.sub.k=Sign.sub.k*tempX.sub.k*2.sup.−i+tempY.sub.k Equation 8
[0069] The current rotating angle γ.sub.i may be added to or deleted from temporary angle β.sub.k according to the Sign.sub.k and iterator i (used to iterate over predefined angles γ.sub.i) may be incremented.
[0070] In step 750, CORDIC computer 450 may check if the final iteration has been completed. If the iteration is not the last (i.e., iteration i is smaller than T), CORDIC computer 450 may return to step 710, and if the last iteration has be performed, as checked in step 750, CORDIC computer 450 may continue to step 760 where the computed values of each α.sub.k—X.sub.k and Y.sub.k—may provide the values of cos(α.sub.k) and sin(α.sub.k), respectively.
[0071] It may be appreciated that the number of iterations performed by CORDIC computer 450 (e.g., M=5) may be optimized to current capabilities of the hardware of sine/cosine estimator 400 but is not limiting and may change to achieve the best performance according to hardware capabilities in the future.
[0072]
[0073] Memory array 810 may be any suitable memory array, volatile or non-volatile, destructive, or non-destructive and may comprise pure memory cells arranged in rows and columns. The cells in a column may be connected by a bit line processor capable of performing computation on the column. The cells in a row may be connected by a word line capable of activating cells in multiple columns. Data including input, intermediate results and output may be stored in columns of memory array 810.
[0074] Multiple row decoder 820 may be any suitable row decoder capable of concurrently activating a plurality of rows. Multiple row decoder 820 may activate two or more rows of memory array 810 at a time. When multiple rows are activated, all columns of memory array 810 may provide concurrent computation for the activated rows when a read operation is performed and may provide a concurrent write operation when a write operation is performed.
[0075] Multiple column decoder 830 may comprise any suitable column decoder capable of concurrently activating a plurality of columns and any suitable sensing circuitry that may be capable of sensing the value on any bit-line connecting cells of a column. Multiple column decoder 830 may provide the result of a Boolean function performed between multiple cells of each column, concurrently activated by multiple row decoder 820. Multiple column decoder 830 may select which sensed columns to write back to memory array 810 and may be capable of writing the value from a plurality of sensing circuitry components concurrently.
[0076] Controller 840 may control the activating of multiple row decoder 820 and multiple column decoder 830. Controller 840 may indicate to multiple row decoder 820 which rows to activate for the current operation, read or write, and may also indicate to multiple column decoder 830 from which columns to write the output back into memory array 810 and the rows to which the data may be written in a selective write operation.
[0077] Controller 840 may comprise various parts of sine/cosine estimator 400, such as LUT index builder 430, LUT value assigner 440 and CORDIC computer 450.
[0078] It may be appreciated that the computations of sine/cosine estimator 400 may occur within the associative memory array, as a result of multi read and multi write operations. Thus, sine/cosine estimator 400 may implement concurrently any Boolean operation, on all the columns of memory associative memory array 410, resulting in a massive, in place, parallel computation. (Each column may perform the needed computation for a single angle and activating multiple columns may result in concurrent computation of the trigonometric function for multiple angles).
[0079] It may be appreciated that the complexity of the computation of sine/cosine estimator 400 does not depend on the number of input angles α.sub.k.
[0080] Sine/cosine estimator 400 may receive multiple angles as input and may handle the computation of each angle α.sub.k in one or more dedicated columns of associative memory array 410. The complexity of computing sine and cosine of a single angle is the same as the complexity of computing sine and cosine of multiple angles.
[0081] The LUT index builder 430 may concurrently, for each angle α.sub.k, compute an index J.sub.k. The complexity of this operation may be O(N) where N is the number of bits of the LUT index.
[0082] For each entry in the LUT 420, LUT value assigner 440 may concurrently copy the values of X and Y from entry J.sub.k of LUT 420 to columns of associative memory array 410 sharing the same index J.sub.k. The complexity of this operation is O(2.sup.N) where N is the number of bits of the LUT index.
[0083] Concurrently, for each angle, CORDIC computer 450 may compute the value of X and Y starting from iteration N (the value for the first N iterations is taken from the LUT) for M additional iterations. The complexity of this operation is O(M) where M is the number of iterations of computing the CORDIC algorithm.
[0084] It may also be noted that sin/cosine estimator 400 may modify the representation of input angles α.sub.k to a normalized sign fixed point with 14 bits after the dot in radians. The normalization may comprise dividing each angle α.sub.k (which may be in the range [−π, π]) by π which may result in a new range [−1, 1].
[0085] The standard CORDIC operates with angles in the range of [−π/2, π/2] while sin/cosine estimator 400 may operate on the entire range [−π, π] of angles which may be normalized. Sin/cosine estimator 400 may transform the results to align them back to the original values before the normalization and may normalize them again (divide by 2).
[0086] It may be appreciated that, for 10 iterations, the smallest rotating angle is arctan ( 1/1024) radians which is approximately 0.000976562 radians; therefore, the maximum error of sin/cosine estimator 400 may be approximately 0.00097656203. (The maximum error is estimated by converting the max error in radians to a pure number (number without units). The smallest rotation angle when T=10 is 1/1024 therefore the maximum error for cosine is 4.768*10{circumflex over ( )}-7 and for sine is 0.00097656203.)
[0087] It may be appreciated that the computation time of each method separately in associative memory array 410 (LUT with 10-bit index or 10 iterations of the CORDIC algorithm) is higher than the computation time of the combined approach as described herein below.
[0088] For 10 iterations of the CORDIC algorithm: each iteration takes (68+2*i) when i is the iteration number with a total of 770 cycles.
[0089] For a LUT covering 10 iteration (size of the LUT 2.sup.10): each iteration for computing the indexes J.sub.k takes 24 cycles per iteration with a total of 240 cycles.
[0090] The complexity of the lookup for getting the values from LUT 420 to associative memory array 410 is 1024 iterations with 5 cycles per iterations, which sum up to 5 k cycles.
[0091] The complexity of 10 iterations of the combined approach, 5 iterations covered by each part, (LUT of size 2.sup.5 and 5 iterations of the CORDIC algorithm) includes step 340 (
[0092] Step 340 (LUT) includes 5 iterations that takes 24 cycles each to build index J.sub.k with a total of 5*24=120 cycles, and 32 lookups that takes 5 cycles each to assign the values from the LUT to each angle with a total of 5*32=180 which sum up to 120+160=280 for the entire LUT operation.
[0093] The CORDIC part includes 5 iterations that takes 68+2*i cycles each which sum up to less than 400 cycles.
[0094] The total number of cycles for the combined approach is therefore 280+400=680 cycles, which is less than using LUT for 10 iterations (3000) or performing 10 iterations of the CORDIC algorithm (770).
[0095] It may be appreciated that the steps shown for the flows herein above are not intended to be limiting and that each flow may be practiced with variations. These variations may include more steps, less steps, changing the sequence of steps, skipping steps, among other variations which may be evident to one skilled in the art.
[0096] While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.