Digital approximate multipliers for machine learning and artificial intelligence applications
11467805 · 2022-10-11
Inventors
Cpc classification
G06F7/5235
PHYSICS
G06F2207/5523
PHYSICS
International classification
Abstract
Digital approximate multipliers (aMULT) utilizing interpolative apparatuses, circuits, and methods are described in this disclosure. The disclosed aMULT interpolative methods can be arranged or programmed to operate asynchronously and/or synchronously. For applications where less precision is acceptable, fewer interpolations can yield less precise multiplication results, while such approximate multiplication can be computed faster and at lower power consumption. Conversely, for applications where higher precision is required, more interpolations can generate more precise multiplication results. As such, by utilizing the disclosed aMULT method, the resolution and precision objectives of an approximate multiplication function can be pre-programmed or adjusted in real time and/or on the fly, which enables optimizing for different and flexible power consumption and speed of multiplication, in addition to enabling the optimization of an approximate multiplier's die size and cost in accordance with cost-performance objectives.
Claims
1. An approximate digital multiplication method in a digital state machine in an integrated circuit, the method comprising: operating a digital cell (Z), the digital cell (Z) comprising: a pair of digital input ports (Px,Py) for receiving a pair of digital input words (x,y), a pair of digital output ports (PX.sub.o,PY.sub.o) for outputting a pair of digital output words (X.sub.o,Y.sub.o), wherein the pair of digital input words (x,y) span between a negative full-scale (−FS) and a positive full-scale (+FS); computing each digital output word X.sub.o of the digital cell (Z) as a function of an f-scaled absolute value of a sum of the digital input word x plus the digital input word y minus the f-scaled absolute value of the digital input word x minus the digital input word y; computing each digital output word Y.sub.o of the digital cell (Z) as a function of the f-scaled absolute value of a sum of the digital input word x plus the digital input word y plus the f-scaled absolute value of the digital input word x minus the digital input word y minus the positive full-scale (+FS); wherein
2. The approximate digital multiplication method in a digital state machine in an integrated circuit of claim 1, the method further comprising: receiving the pair of digital input words (x,y) at the digital input ports (Px,Py) of the first digital cell (Z) corresponding to i=0; generating a pair of digital output words X.sub.Oi+1 and Y.sub.Oi+1 at the digital output ports (PX.sub.o,PY.sub.o) of a respective i.sup.th digital cell (Z) in the cascaded sequence of the plurality (n) of the digital cell (Z)s; generating an interpolated approximate multiplication digital output word xy.sub.n in each of the respective i.sup.th digital cell (Z) in the cascaded sequence of the plurality (n) of the digital cell (Z)s, wherein xy.sub.i+1 is substantially equal to a SUM from i=0 to i=n−1 of a product of X.sub.Oi+1 and −½.sup.(2i) wherein
3. The approximate digital multiplication method in a digital state machine in an integrated circuit of claim 2, the method further comprising: generating an at least another interpolated approximate multiplication digital output word (xy.sub.nA) in at least another digital cell (Z) wherein xy.sub.nA=xy.sub.(i+1)A=xy.sub.i+1+(Xo.sub.i+1×yo.sub.i+1)/2.sup.2i, wherein
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The subject matter presented herein is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and illustrations, and in which like reference numerals refer to similar elements, and in which:
(4) FIG. 1AAA is a circuit simulation showing the relation between the number of interpolations and the error (i.e., deviation from an ideal square) attributed to the aSQR method (illustrated in
DETAILED DESCRIPTION
(23) Numerous embodiments are described in the present application and are presented for illustrative purposes only and are not intended to be exhaustive. The embodiments were chosen and described to explain principles of operation and their practical applications. The present disclosure is not a literal description of all embodiments of the disclosure(s). The described embodiments also are not, and are not intended to be, limiting in any sense. One of ordinary skill in the art will recognize that the disclosed embodiment(s) may be practiced with various modifications and alterations, such as structural, logical, and electrical modifications. For example, the present disclosure is not a listing of features which must necessarily be present in all embodiments. On the contrary, a variety of components are described to illustrate the wide variety of possible embodiments of the present disclosure(s). Although particular features of the disclosed embodiments may be described with reference to one or more particular embodiments and/or drawings, it should be understood that such features are not limited to usage in the one or more particular embodiments or drawings with reference to which they are described, unless expressly specified otherwise. The scope of the disclosure is to be defined by the claims.
(24) Although process (or method) steps may be described or claimed in a particular sequential order, such processes may be configured to work in different orders. In other words, any sequence or order of steps that may be explicitly described or claimed does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order possible. Further, some steps may be performed simultaneously despite being described or implied as occurring non-simultaneously (e.g., because one step is described after the other step). Moreover, the illustration of a process by its depiction in a drawing does not imply that the illustrated process is exclusive of other variations and modifications thereto, does not imply that the illustrated process or any of its steps are necessary to the embodiment(s). In addition, although a process may be described as including a plurality of steps, that does not imply that all or any of the steps are essential or required. Various other embodiments within the scope of the described disclosure(s) include other processes that omit some or all of the described steps. In addition, although a circuit may be described as including a plurality of components, aspects, steps, qualities, characteristics and/or features, that does not indicate that any or all of the plurality are essential or required. Various other embodiments may include other circuit elements or limitations that omit some or all of the described plurality.
(25) Throughout this disclosure, the following terms are used: FET is field-effect-transistor; MOS is metal-oxide-semiconductor; MOSFET is MOS FET; PMOS is p-channel MOS; NMOS is n-channel MOS; BiCMOS is bipolar CMOS; LSP of a signal is the Least-Significant-Portion of the signal; MSP of the signal is the Most-Significant-Portion of the signal; and the sum of the MSP of the signal plus the LSP of the signal is equal to the whole signal; and the MSP or LSP can be represented in analog or digital form or combination thereof; MSB is Most-Significant-Bit and LSB is Least-Significant-Bit; SPICE is Simulation Program with Integrated Circuit Emphasis which is an industry standard circuit simulation program; micro is μ which is 10.sup.−6; nano is n which is 10.sup.−9; and pico is p which is 10.sup.−12. Bear in mind that V.sub.DD (as a positive power supply) and V.sub.SS (as a negative power supply) are applied to all the circuitries, blocks, or systems in this disclosure, but may not be shown for clarity of illustrations. The V.sub.SS may be connected to a negative power supply or to the ground (zero) potential. Body terminal of MOSFETs can be connected to their respective source terminals or to the MOSFET's respective power supplies, V.sub.DD and V.sub.SS.
(26) Keep in mind that for descriptive clarity, illustrations of the disclosed inventions are simplified, and their improvements beyond simple illustrations would be obvious to one skilled in the arts.
Section 1A—Description of FIG. 1A
(27)
(28)
wherein j loops-up to n times and wherein P.sub.j=(|O.sub.j|+O.sub.j)/2. The generated P.sub.j are scaled binarily sequentially and added together according to
(29)
(30) In section A1′ and FIG. 1AAA the precision of an approximate squarer's digital IC (arranged asynchronously in accordance with the aSQR method) as a function of n will be described in more detail. Also, section A1′ will describe the flexibility of increasing the precision of the aSQR method by combining it with a conventional multiplier function at the tail end of the interpolation chain.
(31) Some of the benefits of the aSQR method, operating in synchronous mode, are summarized below:
(32) First, the aSQR method enables a digital IC state machine to perform on-the-fly or pre-programmed trade-offs of precision versus power consumption and speed of an approximate squarer. The lower the precision requirement, the faster the squaring and the lower the power consumption per squaring operation. As such, the precision of the squarer approximation can be traded off against cost, speed, and power consumption depending on the application's cost-performance objectives.
(33) Second, relatively speaking while addition (subtraction) occupies a large area in the digital domain, a digital IC state machine arranged in accordance with the disclosed aSQR method utilizes fewer adders compared to conventional digital IC squarers. Instead, the disclosed aSQR method requires functions such as multiply or divide by two, that can be implemented by a simple shift to the right or left in the digital domain, which takes a small die area. Moreover, the aSQR method utilizes functions such as adding or subtracting a fixed digital value (in proportion to an input digital word's full scale), which also take a relatively small area.
(34) Third, the disclosed digital IC approximate squaring can be arranged to perform approximate multiplication by utilizing the quarter square algorithm. Accordingly, digital IC multiplication can be performed by deducting the square of the difference of two digital words (x, y) from the square of their sum, as in (x+y).sup.2−(x−y).sup.2=4xy.
(35) Fourth, the disclosed digital IC approximate squaring can be arranged to perform square and accumulate (SAC) and multiply and accumulate (MAC) functions in pure digital and/or mixed-mode. For example, a plurality of digital outputs of approximate digital IC squarers or approximate digital IC multipliers can be inputted to a plurality of current mode Digital-to-Analog-Converters (iDAC), wherein the function of summation (e.g., adding two multiplications) can be performed simply by coupling together the current output terminals of the plurality of iDACs.
(36) Fifth, there is the option and flexibility to increase the precision of the aSQR method substantially by combining it with a conventional multiplier function at the tail end of the interpolation chain.
Section 1AA—Description of FIG. 1AA
(37)
(38) In
(39) In the first interpolation (n=1), the D.sub.i word is shifted by half of full-scale (FS/2) to arrange a digital word O.sub.1 which is a D.sub.i word that is digitally offset by FS/2. As such, O.sub.1=D.sub.i−FS×2.sup.−1. Then, a maximum of the O.sub.1 word and zero-scale is selected that outputs a P.sub.1 digital word P.sub.1=max (O.sub.1, ZS). Accordingly, the digital P.sub.1 word represents the maximum portion of the D.sub.i word above FS/2.
(40) In the second interpolation (n=2) stage, the D.sub.i word is shifted by a sum of 2×P.sub.1 word and a quarter of full-scale to generate a digital word O.sub.2 which is a D.sub.i word that is offset down by FS×2.sup.−2+2×P.sub.1. That is to say O.sub.2=D.sub.i−(FS×2.sup.−2+2×P.sub.1). Then, a maximum of the digital O.sub.2 word and zero-scale is selected that generates a digital P.sub.2 word which is a positive word with respect to zero-scale or P.sub.2=max (O.sub.2, ZS). Accordingly, the digital P.sub.2 word represents the positive portion of the D.sub.i word above the sum of FS/4 and 2×P.sub.1 word. Here at the second interpolation point, an approximate squared digital word S.sub.2=D.sub.i2.sup.2 (that is an approximate representation of the square of the digital D.sub.i word) is generated by summing the binarily scaled digital P.sub.1 and P.sub.2 words. Stated mathematically, D.sub.i.sup.2≈S.sub.2=D.sub.i2.sup.2=2.sup.1×P.sub.1+2.sup.0×P.sub.2. As depicted in FIG. 1AAA, notice that with two interpolations (n=2) the S.sub.2=D.sub.i2.sup.2 word is ˜93.6% accurate as compared to ideal D.sub.i.sup.2 (with ˜6.4% error).
(41) If a squarer with greater than 93.6% precision is required, another interpolation (n=3) can be implemented in accordance with the aSQR method. In the third interpolation stage, the D.sub.i word is shifted by a sum of 2×(P.sub.1+P.sub.2) word and one eighth of full-scale (FS/8) to generate a digital word O.sub.3 which is a D.sub.i word that is offset by FS×2.sup.−3+2×(P.sub.1+P.sub.2). That is to say O.sub.3=D.sub.i−{FS×2.sup.−3+2×(P.sub.1+P.sub.2)}. Then, the maximum of the O.sub.3 word and zero-scale (ZS) is selected which generates a P.sub.3 word that is a positive word with respect to ZS or P.sub.3=max (O.sub.3, ZS). Accordingly, the P.sub.3 word represents the positive portion of the D.sub.i word above the sum of FS/8 and 2×(P.sub.1+P.sub.2) word. Here at the third interpolation point, an approximate squared digital word S.sub.3=D.sub.i3.sup.2 (that is an approximate representation of the square of the D.sub.i word) is generated by summing the binarily proportioned P.sub.1, P.sub.2, and P.sub.3 words. Stated mathematically, D.sub.i.sup.2≈S.sub.3=D.sub.i3.sup.2=2.sup.1×P.sub.1+2.sup.0×P.sub.2+2.sup.−1×P.sub.3. As depicted in FIG. 1AAA, notice that with three interpolations (n=3) the S.sub.3=D.sub.i3.sup.2 word is ˜98.4% accurate as compared to ideal D.sub.i.sup.2 (with ˜1.6% error).
(42) When a squarer with greater than 98.4% precision is required, another interpolation (n=4) can be implemented in accordance with the aSQR method. In the fourth interpolation stage, the D.sub.i word is shifted by a sum of 2×(P.sub.1+P.sub.2+P.sub.3) word and one sixteenth of full-scale (FS/16) to arrange a digital word O.sub.4 which is a D.sub.i word that is offset by FS×2.sup.−4+2×(P.sub.1+P.sub.2+P.sub.3). Put differently, O.sub.4=D.sub.i−{FS×2.sup.−4+2×(P.sub.1+P.sub.2+P.sub.3)}. Then, a maximum of the O.sub.4 word and zero-scale is selected that generates a P.sub.4 word which is a positive word with respect to ZS or P.sub.4=max (O.sub.4, ZS). Accordingly, the P.sub.4 word represents the positive portion of the D.sub.i word above the sum of 1/16 of FS and 2×(P.sub.1+P.sub.2+P.sub.3) word. Here again at the fourth interpolation point, an approximate squared digital word S.sub.4=D.sub.i4.sup.2 (that is an approximate representation of the square of the D.sub.i word) is generated by summing binarily proportioned P.sub.1, P.sub.2, P.sub.3, and P.sub.4 words. Stated mathematically, D.sub.i.sup.2≈S.sub.4=D.sub.i4.sup.2=2.sup.1×P.sub.1+2.sup.0×P.sub.2+2.sup.−1×P.sub.3+2.sup.−2×P.sub.4. As depicted in FIG. 1AAA, notice that with four interpolations (n=4) the S.sub.4=D.sub.i4.sup.2 word is ˜99.6% accurate as compared to ideal D.sub.i.sup.2 (with ˜0.4% error).
(43) Again, if an approximate squarer with better than ˜0.4% precision (accurate to ˜8-bits) is required, another interpolation (n=5) can be implemented in accordance with the aSQR method. As such, the D.sub.i word can be shifted by a sum of 2×(P.sub.1+P.sub.2+P.sub.3+P.sub.4) words and 1/32 of full-scale (FS/32) to arrange a digital word O.sub.5 which is a D.sub.i word that is offset by FS×2.sup.−5+2×(P.sub.1+P.sub.2+P.sub.3+P.sub.4). Said differently, O.sub.5=D.sub.i−{FS×2.sup.−5+2×(P.sub.1+P.sub.2+P.sub.3+P.sub.4)}. Then, a maximum of the O.sub.5 word and ZS is selected that generates a P.sub.5 word which is a positive word with respect to ZS or P.sub.5=max (O.sub.5, ZS). Accordingly, the P.sub.5 word represents the positive portion of the D.sub.i word above the sum of FS/32 and 2×(P.sub.1+P.sub.2+P.sub.3+P.sub.4) word. Here again, an approximate squared digital word S.sub.5=D.sub.i5.sup.2 can be arranged by summing binarily proportioned P.sub.1, P.sub.2, P.sub.3, P.sub.4, and P.sub.5 words, wherein S.sub.5=D.sub.i5.sup.2 word is an approximate representation of the square of the D.sub.i word. Stated mathematically, D.sub.i.sup.2≈S.sub.5=D.sub.i5.sup.2=2.sup.1×P.sub.1+2.sup.0×P.sub.2+2.sup.−1×P.sub.3+2.sup.−2×P.sub.4+2.sup.−3×P.sub.5. As depicted in FIG. 1AAA, with five interpolations (n=5), observe that S.sub.5=D.sub.i5.sup.2 word is ˜99.9% accurate as compared to ideal D.sub.i.sup.2 (with ˜0.1% error).
(44) Similarly, if an approximate squarer with better than ˜0.1% precision (accuracy of ˜10-bits) is needed, another interpolation (n=6) can be implemented in accordance with the aSQR method. As such, the D.sub.i word is shifted by a sum of 2×(P.sub.1+P.sub.2+P.sub.3+P.sub.4+P.sub.5) words and 1/64 of full-scale (FS/64) to arrange a digital word O.sub.6 which is a D.sub.i word that is offset by FS×2.sup.−6+2×(P.sub.1+P.sub.2+P.sub.3+P.sub.4+P.sub.5). Stated differently, O.sub.6=D.sub.i−{FS×2.sup.−6+2×(P.sub.1+P.sub.2+P.sub.3+P.sub.4+P.sub.5)}. Then, a maximum of the O.sub.6 word and ZS is selected that generates a P.sub.6 word which is a positive word with respect to ZS or P.sub.6=max (O.sub.6, ZS). Accordingly, the P.sub.6 word represents the positive portion of the D.sub.i word above the sum of 1/64 of FS and 2×(P.sub.1+P.sub.2+P.sub.3+P.sub.4+P.sub.5) words. Here again at the sixth interpolation point, an approximate squared digital word S.sub.6=D.sub.i6.sup.2 (that is an approximate representation of the square of the D.sub.i word) is generated by summing binarily proportioned P.sub.1, P.sub.2, P.sub.3, P.sub.4, P.sub.5, and P.sub.6 words. Stated mathematically, D.sub.i.sup.2≈S.sub.6=D.sub.i6.sup.2=2.sup.1×P.sub.1+2.sup.0×P.sub.2+2.sup.−1×P.sub.3+2.sup.−2×P.sub.4+2.sup.−3×P.sub.5+2.sup.−4×P.sub.6. As depicted in FIG. 1AAA, with six interpolations (n=6), observe that S.sub.6=D.sub.i6.sup.2 word is ˜99.975% accurate as compared to ideal D.sub.i.sup.2 (with ˜0.025% error). Notice that if an approximate squarer with better than ˜0.025% precision (accuracy of ˜12-bits) is needed, then more interpolations (n >6) can be implemented in accordance with the aSQR method.
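The interpolation stages walked through above all follow one recurrence, which can be sketched in a few lines of Python. This is a minimal sketch and not the disclosed circuit: it assumes zero-scale ZS = 0, a full scale FS that defaults to 1, and exact rational arithmetic in place of fixed-width digital words. The function name asqr and its parameters are hypothetical. With FS = 1 the returned word S.sub.n approximates D.sub.i.sup.2 (for a general FS it approximates D.sub.i.sup.2/FS).

```python
from fractions import Fraction


def asqr(d, n, fs=1):
    """Sketch of the aSQR recurrence: returns S_n, an approximation of d*d/fs.

    d  : input word D_i, expected in [0, fs]
    n  : number of interpolations
    fs : full scale (assumed normalization; ZS is taken as 0)
    """
    d, fs = Fraction(d), Fraction(fs)
    p_sum = Fraction(0)  # P_1 + ... + P_(j-1)
    s = Fraction(0)      # running approximate square
    for j in range(1, n + 1):
        o_j = d - (fs / 2**j + 2 * p_sum)  # O_j = D_i - (FS*2^-j + 2*sum of earlier P)
        p_j = max(o_j, Fraction(0))        # P_j = max(O_j, ZS)
        s += p_j * Fraction(4, 2**j)       # binary weight 2^(2-j): 2, 1, 1/2, ...
        p_sum += p_j
    return s


if __name__ == "__main__":
    for n in range(2, 7):
        worst = max(
            Fraction(k, 256) ** 2 - asqr(Fraction(k, 256), n) for k in range(257)
        )
        print(f"n={n}: worst-case error = {float(worst) * 100:.4f}% of FS^2")
```

On this 257-point grid the worst-case error of the sketch is 1/4.sup.n of full scale squared (6.25%, about 1.56%, 0.39%, 0.098%, and 0.024% for n=2 to n=6), which is in line with the simulated figures of roughly 6.4%, 1.6%, 0.4%, 0.1%, and 0.025% quoted above.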
(45) There is an option of increasing precision of the aSQR method by combining it with a conventional squarer at the tail end of the interpolation chain and here is how: The precision of D.sub.i approximate squaring can be increased from S.sub.i to S.sub.iA by summing the S.sub.i term to a (O.sub.(i+1)A).sup.2, wherein O.sub.(i+1)A=O.sub.i+1+FS/2.sup.i at the tail-end of interpolation chain. With more interpolation down the interpolative cascade the width of the intermediate digital words gets smaller. Conventional squarers generally occupy substantially smaller area when squaring smaller bit-width (e.g., 2-bit or 3-bit digital squarer can be arranged with much smaller gate count than a 6-bit squarer), and a divide by 2 digital function requires a simple bit shift-right in a shift register which is also small. Hence, the digital implementation of (O.sub.(i+1)A).sup.2 can occupy a small gate count down the interpolation chain. Accordingly, the increased gate count area attributed to utilizing a conventional squarer at the tail-end, may be worth the increase in precision of approximate squaring and provide additional cost-performance flexibility in accordance with different application requirements. For example, in
(46) In summary, some of the benefits of utilizing the aSQR method in an approximate asynchronous squarer are as follows:
(47) First, conventional digital squarers require many full adders which occupy large areas, generally speaking. The aSQR method can be implemented in the digital domain with fewer adders (compared to a conventional digital squarer) which makes it more area efficient.
(48) Second, implementing the aSQR method requires a number of multiply-by-two or divide-by-two operations, which can be implemented inexpensively in the digital domain by a shift-left or shift-right operation, respectively.
(49) Third, when the aSQR method is utilized with more interpolations, the peak-to-peak digital value of sequential P.sub.i digital words diminishes, which can help reduce the overall logic gate-count of its implementation.
(50) Fourth, the aSQR method generates a number of points (i.e., digital words) that exactly fit the square function, and it linearly interpolates in-between those points. The larger the number of interpolations (n), the greater the number of points that exactly fit an ideal square function and thus the less the error associated with linearly interpolating in between those exact fit points.
(51) Fifth, fewer gates in a digital circuit generally go hand-in-hand with lower dynamic power consumption and faster speed. As such, since the aSQR method requires fewer gates for implementing a square function, it can function with higher speed and lower dynamic power consumption compared to a conventional digital IC squarer implementation, for a given resolution.
(52) Sixth, there is the option and flexibility to increase the precision of the aSQR method substantially by combining it with a conventional multiplier function at the tail end of the interpolation chain.
(53) Seventh, the disclosed digital IC approximate squaring can be arranged to perform approximate multiplication by utilizing the quarter square method. Accordingly, digital IC multiplication can be performed by deducting the square of the difference of two digital words (x, y) from the square of their sum, as in (x+y).sup.2−(x−y).sup.2=4xy.
(54) Eighth, the disclosed digital IC approximate squaring can be arranged to perform square and accumulate (SAC) and multiply and accumulate (MAC) functions in mixed-mode. For example, a plurality of outputs of approximate squarer ICs or approximate multiplier ICs can be inputted to a plurality of current mode Digital-to-Analog-Converters (iDACs), wherein the function of summation (e.g., adding two multiplications) can be performed simply by coupling together the current output terminals of the plurality of iDACs.
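Two of the options listed above, the conventional squarer at the tail end of the chain and quarter-square multiplication, can be illustrated with a short Python sketch. It is not the disclosed circuit: full scale is normalized to 1, zero-scale is 0, exact rationals stand in for digital words, and the helper names asqr_with_tail and quarter_square_mult are hypothetical. The tail term is written here as (O.sub.n+1+FS×2.sup.−(n+1)).sup.2, which is what the O.sub.j definitions of paragraphs (39) to (44) yield when worked through; that index convention is an assumption of this sketch and may be offset from the O.sub.(i+1)A expression of paragraph (45).

```python
from fractions import Fraction


def asqr_with_tail(d, n):
    """Return (S_n, tail) for d in [0, 1], full scale normalized to 1.

    S_n  : n-step aSQR approximation of d*d
    tail : (O_(n+1) + 2^-(n+1))^2, the term a small conventional squarer
           could add at the end of the chain (index convention assumed)
    """
    d = Fraction(d)
    p_sum = Fraction(0)
    s = Fraction(0)
    for j in range(1, n + 1):
        p_j = max(d - (Fraction(1, 2**j) + 2 * p_sum), Fraction(0))  # P_j
        s += p_j * Fraction(4, 2**j)                                 # weight 2^(2-j)
        p_sum += p_j
    o_next = d - (Fraction(1, 2 ** (n + 1)) + 2 * p_sum)  # O_(n+1)
    tail = (o_next + Fraction(1, 2 ** (n + 1))) ** 2
    return s, tail


def quarter_square_mult(x, y, n):
    """Approximate x*y for x, y in [0, 1] as ((x+y)/2)^2 - ((x-y)/2)^2."""
    u = (Fraction(x) + Fraction(y)) / 2     # lies in [0, 1]
    v = abs(Fraction(x) - Fraction(y)) / 2  # lies in [0, 1/2]
    return asqr_with_tail(u, n)[0] - asqr_with_tail(v, n)[0]


if __name__ == "__main__":
    s, tail = asqr_with_tail(Fraction(3, 5), 2)
    print(s, tail, s + tail)  # S_2, tail term, and their sum
    print(quarter_square_mult(Fraction(3, 4), Fraction(1, 2), 4))
```

Under these assumptions S.sub.n plus the tail term reproduces the exact square, and the quarter-square product inherits the squarer's error bound, since each of the two approximate squares is low by at most 1/4.sup.n of full scale squared.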
Section 1AAA—Description of FIG. 1AAA
(55) FIG. 1AAA is a circuit simulation showing the relation between the number of interpolations and the error (i.e., deviation from ideal) attributed to the aSQR method of
(56) The horizontal axis shows the digital input word Di spanning from zero scale (ZS) at zero milli-seconds (ms) to full scale (FS) at 10 ms.
(57) The vertical axis shows the percent (%) of inaccuracy of the squarer approximation (S.sub.2 to S.sub.6) as compared to an ideal square (D.sub.i.sup.2).
(58) Bear in mind that for the sake of clarity (e.g., to avoid overlapping graphs), an artificial offset is added to some of the error waveforms in the upper and lower graphs of FIG. 1AAA.
(59) The lower part of FIG. 1AAA depicts simulated precision of aSQR method with n=2 interpolation having an error of about 6.4% for S.sub.2−Di.sup.2 (offset by 0.4%), with n=3 interpolation having an error of about 1.6% for S.sub.3−Di.sup.2 (offset by 0.2%), and with n=4 interpolation having an error of about 0.4% for S.sub.4−Di.sup.2.
(60) The upper part of FIG. 1AAA depicts simulated precision of aSQR method with n=5 interpolation having an error of about 0.1% for S.sub.5−Di.sup.2 (offset by 0.01%), and with n=6 interpolation having an error of about 0.025% for S.sub.6−Di.sup.2.
Section 1B & 1BB—Description of FIGS. 1B & 1BB
(61)
(62) The A and B are 1-bit wide digital input ports, C.sub.i is the carry-in 1-bit port, S.sub.o is the summation output 1-bit port, and C.sub.o is the carry-out 1-bit port.
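As a reference for the port names above, the behavior of a 1-bit full adder can be sketched as follows. The Boolean equations are the textbook ones and are not a transcription of the gate-level schematic. Chaining cells in ripple-carry fashion to form a wider adder is an assumption made here for illustration, since the internal carry structure of the wider adders is not stated in this text. The function names are hypothetical.

```python
def full_adder(a, b, c_i):
    """1-bit full adder: returns (S_o, C_o) for input bits A, B and carry-in C_i."""
    s_o = a ^ b ^ c_i                # sum bit
    c_o = (a & b) | (c_i & (a ^ b))  # carry-out bit
    return s_o, c_o


def ripple_add(a_bits, b_bits, c_i=0):
    """Add two equal-length bit lists (MSB first) by chaining full adders.

    Returns (sum_bits, carry_out), with sum_bits also MSB first.
    """
    out = []
    carry = c_i
    for a, b in zip(reversed(a_bits), reversed(b_bits)):  # start at the LSB
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out[::-1], carry


if __name__ == "__main__":
    # 4-bit example, a1:a4 = 0111 (7) plus b1:b4 = 0001 (1)
    print(ripple_add([0, 1, 1, 1], [0, 0, 0, 1]))
```

Bits are ordered MSB first to match the convention used elsewhere in this description, where index 1 denotes the most significant bit.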
Section 1C & 1D—Description of FIGS. 1C & 1D
(63)
(64) The a1 through a4 (a1: a4) are the first 4-bit wide input port and b1 to b4 (b1: b4) are the second 4-bit wide input port of the 4-bit wide full adder of
Section 1E & 1F—Description of FIGS. 1E & 1F
(65)
(66) The a1 through a6 (a1: a6) are the first 6-bit wide input port and b1 to b6 (b1: b6) are the second 6-bit wide input port of the 6-bit wide full adder of
Section 1G & 1H—Description of FIGS. 1G & 1H
(67)
(68) The a1 through a8 (a1: a8) are the first 8-bit wide input port and b1 through b8 (b1: b8) are the second 8-bit wide input port of the 8-bit wide full adder of
Section 2A—Description of FIG. 2A
(69)
(70) Here, the digital input word Di is an 8-bit wide word (D1: D8) where D1 is the Most-Significant-Bit (MSB) and D8 is the Least-Significant-bit (LSB).
(71) In the asynchronous embodiment of aSQR method depicted in
(72) The D1/
(73) The D1/
(74) The D1/
(75) The D2/
(76) The 4-bit full adder 4FA.sub.2a adds the 2-bit wide digital word Z7:Z8 (with proper scaling via programming a1:a2=0) to the 4-bit wide Y5: Y8 digital word. Then, the Q1:Q4 four-bit wide digital output word of 4FA.sub.2a (with proper scaling by programming a1′: a2′=0) is added to the 6-bit wide digital word X3:X8 through the 6-bit full adder 6FA.sub.2a. Next, the Q1:Q6 six-bit wide digital output word of 6FA.sub.2a (with proper scaling via arranging a1″: a2″=0) is added to the 8-bit wide digital word W2: W8 (with b8=0) through the 8-bit full adder 8FA.sub.2a.
(77) The 8-bit digital output word Q1: Q8 of 8FA.sub.2a represents the equivalent to the S4 digital word of
(78) The SPICE simulation of digital circuit in
(79) It is obvious to one skilled in the art that other combinational logic designs can be implemented in accordance with the aSQR method. Moreover, it is known by those skilled in the arts that for asynchronous logic, alternative digital IC embodiments (e.g., flip-flops, clocked latches, etc.) may be utilized to prevent (e.g., adder output, etc.) glitches due to intermediate digital signal rippling through the stages of digital IC logic paths. Also, keep in mind that for clarity of illustration of
(80) The benefits of approximate squarer summarized in sections 1A and 1AA are also applicable here to
Section 3A—Description of FIG. 3A
(81)
(82) The horizontal axis shows the digital input word Di spanning from zero scale (ZS) at zero milli-seconds (ms) to full scale (FS) at 50 μs.
(83) The vertical axis shows the percent (%) of inaccuracy of the asynchronous squarer of
Section 4A—Description of FIG. 4A
(84)
(85) In a digital multiplier IC that is arranged in accordance with the aMULT method, a pair of digital input words x.sub.in and y.sub.in can be approximately multiplied (˜xy.sub.i≈x.sub.in×y.sub.in) through a series of interpolations (n) asynchronously (i.e., without a clock), wherein the accuracy of the ˜xy.sub.i digital word multiplication results can be increased with more interpolations (n). For example, n=2 interpolations generate an approximate multiplication digital word ˜xy.sub.2 whereas n=4 interpolations generate an approximate multiplication digital word ˜xy.sub.4, wherein ˜xy.sub.2 is a less precise multiplication result than ˜xy.sub.4. Generally speaking, the fewer the interpolations (smaller n), the lower the precision of the approximate multiplication results. But fewer interpolations can be done faster with less power consumption and less logic gate count (i.e., cheaper IC). This feature of the aMULT method would enable the end application to pre-program and optimize the approximate multiplication function in accordance with the end application cost-performance objectives. As noted earlier, for cost sensitive applications the aMULT method can be utilized synchronously where an approximate multiplication digital IC cell block (e.g., cell 2) is re-used in a digital time-multiplexed loop in a sequence of cycles (i.e., n-times). Conversely, for speed sensitive applications, the aMULT method can be utilized asynchronously (clock-free) through a cascaded series of n interpolations implemented in combinational logic. In the following description, note that the range of the x.sub.in and y.sub.in digital words is from negative full-scale (−FS) which can be all zero-bits to positive full-scale (+FS) which can be all one-bits. Keep in mind that the aMULT method utilizing six interpolations here is for illustrative clarity, and not as a limitation of the aMULT method that can accommodate higher interpolation (n >6) which will result in higher precision.
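The link between a cell's two output words and the product can be seen from the quarter-square identity. The Python sketch below assumes unit f-scaling and takes the cell outputs in the form recited in claim 1 (X.sub.o from |x+y|−|x−y|, and Y.sub.o from |x+y|+|x−y|−FS). Under those assumptions 4×x×y=X.sub.o×FS+X.sub.o×Y.sub.o holds exactly. The sketch illustrates that identity only and does not reproduce the scaling or the cascade of the disclosed cells; the function name cell and the full-scale value are hypothetical.

```python
def cell(x, y, fs):
    """One cell with unit f-scaling (an assumption): returns (X_o, Y_o)."""
    x_o = abs(x + y) - abs(x - y)
    y_o = abs(x + y) + abs(x - y) - fs
    return x_o, y_o


if __name__ == "__main__":
    fs = 8  # hypothetical full scale for signed integer words in [-fs, +fs]
    for x, y in [(3, 5), (-7, 2), (-4, -6), (8, -8)]:
        x_o, y_o = cell(x, y, fs)
        # (|a| - |b|) * (|a| + |b|) = a^2 - b^2 = 4xy with a = x + y, b = x - y
        print(x, y, x * y, (x_o * fs + x_o * y_o) / 4)
```

In this form the X.sub.o×FS part is a plain scaling of one output word, while the remaining X.sub.o×Y.sub.o part is itself a product of two words, which is the kind of term that either a further stage or a small conventional multiplier at the tail end would have to supply.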
(86) With the first interpolation (n=1), the x.sub.in and y.sub.in digital input words are inputted to a combinational logic block Cell 1 which generates digital output words Xo.sub.1 and yo.sub.1. The digital output words Xo.sub.1 and yo.sub.1 are functions of the x.sub.in and y.sub.in digital input words, in accordance to the following digital-input to digital-output transfer functions of Cell 1:
(87)
(88) and
(89)
The first approximate digital multiplication in Cell 1 is generated in accordance with the mathematical transfer function
(90)
or
(91)
The Cell 1's first interpolation at i=0 (n=1) and by programming xy.sub.0=0, then
(92)
With n=1,
(93) In the second interpolation (n=2), the pair of Xo.sub.1 and yo.sub.1 digital output words from Cell 1 are fed as digital inputs onto a next digital combinational logic block Cell2.sub.1. The digital input to digital output transfer function of Cell 2 is in accordance with X.sub.o=|x+y|−|x−y| and y.sub.o=|x+y|+|x−y|−FS, wherein x and y are the inputs and X.sub.o and y.sub.o are the outputs of Cell 2, respectively. Similarly, the approximate digital multiplication for cell 2 is generated in accordance with the mathematical transfer function
(94)
or
(95)
As such for Cell2.sub.1, the second interpolation at i=1, then
(96)
As noted above, bear in mind that Xo.sub.2=|Xo.sub.1+y.sub.o1|−|Xo.sub.1−y.sub.o1|. For n=2,
(97) Also, in the next interpolation n=3, the pair of Xo.sub.2 and yo.sub.2 digital words that were generated by cell 2 are fed as digital inputs onto the next digital combinational logic block Cell2.sub.2. The digital input to digital output transfer function of cell 2 is in accordance with X.sub.o=|x+y|−|x−y| and y.sub.o=|x+y|+|x−y|−FS, wherein x and y are the inputs and X.sub.o and y.sub.o are the outputs of Cell 2, respectively. Likewise, for Cell2.sub.2 the next interpolation with i=2, then
(98)
As noted above, bear in mind that Xo.sub.3=|Xo.sub.2+y.sub.o2|−|Xo.sub.2−y.sub.o2|. For n=3,
(99) Similarly, in the next interpolation n=4, the pair of Xo.sub.3 and yo.sub.3 digital words that were generated by cell 2 are fed as digital inputs onto the next digital combinational logic block Cell2.sub.3. The digital input to digital output transfer function of cell 2 is also in accordance with X.sub.o=|x+y|−|x−y| and y.sub.o=|x+y|+|x−y|−FS, wherein x and y are the inputs and X.sub.o and y.sub.o are the outputs of Cell 2, respectively. Similarly, for Cell2.sub.3, the next interpolation with i=3, then
(100)
As noted above, bear in mind that Xo.sub.4=|Xo.sub.3+y.sub.o3|−|Xo.sub.3−y.sub.o3|. For n=4,
(101) Likewise, in the next interpolation n=5, the pair of Xo.sub.4 and yo.sub.4 digital words that were generated by cell 2 are fed as digital inputs onto the next digital combinational logic block Cell2.sub.4. The digital input to digital output transfer function of cell 2 is also in accordance with X.sub.o=|x+y|−|x−y| and y.sub.o=|x+y|+|x−y|−FS, wherein x and y are the inputs and X.sub.o and y.sub.o are the outputs of Cell 2, respectively. Again, for Cell2.sub.4 the next interpolation with i=4, then
(102)
As noted earlier, Xo.sub.5=|Xo.sub.4+y.sub.o4|−|Xo.sub.4−y.sub.o4|. For n=5,
(103) Lastly, for
(104)
As noted earlier, Xo.sub.6=|Xo.sub.5+y.sub.o5|−|Xo.sub.5−y.sub.o5|. For n=6,
(105) There is an option and flexibility to increase the precision of the aMULT method by combining it with a conventional multiplier function at the tail end of the interpolation chain, as follows: The precision of the x.sub.in×y.sub.in approximate multiplication can be increased by summing the xy.sub.i+1 term with a scaled multiplication term (Xo.sub.i+1×yo.sub.i+1)/2.sup.2i at the tail end of the interpolation chain. With more interpolations down the Cell 2 cascade, the effective weight of the X.sub.o and y.sub.o digital words (on the approximate multiplication results in accordance with the aMULT method) gets smaller. Conventional multipliers occupy substantially smaller area when multiplying smaller bit-widths (e.g., a 2-bit by 2-bit multiplier), and a divide-by-2 digital function requires only a simple shift-right in a shift register, which is small. Hence, the digital implementation of (Xo.sub.i+1×yo.sub.i+1)/2.sup.2i could take a small gate count. Accordingly, the increased gate count attributed to utilizing a conventional multiplier at the tail end may be worth the increase in precision of approximate multiplication, and may provide additional cost-performance flexibility in accordance with different application requirements. For example, in
(106) In summary, some of the benefits of an asynchronous digital multiplier utilizing the aMULT method are the following:
(107) First, conventional multipliers generally require many full adders, which occupy a large area in the digital domain. The aMULT method can be implemented in the digital domain with fewer adders (compared to a conventional digital multiplier), which makes it more area efficient.
(108) Second, when the aMULT method is utilized with more interpolations n, the peak-to-peak digital value of the sequential xy.sub.n digital words diminishes as n increases, which can help reduce the overall logic gate count of its implementation.
(109) Third, the aMULT method generates a number of points (digital words) that exactly fit the ideal multiplication function, and linearly interpolates in between those points. The greater the number of interpolations (n), the greater the number of points that exactly fit an ideal multiplication result, and thus the smaller the error associated with linearly interpolating in between those exact-fit points.
(110) Fourth, fewer gates in a digital circuit generally go hand-in-hand with lower dynamic power consumption and faster speed. As such, since the aMULT method requires fewer gates for implementing a multiplication function, it can function with higher speed and lower dynamic power consumption compared to conventional digital IC multiplier implementations, for a given resolution.
(111) Fifth, the disclosed digital IC approximate multiplier can be arranged to perform multiply and accumulate (MAC) functions in mixed-mode. For example, a plurality of outputs of approximate digital multiplier ICs can be inputted to a plurality of current mode Digital-to-Analog-Converters (iDACs), wherein the function of summation (e.g., adding two multiplications) can be performed simply by coupling together the current output terminals of the plurality of iDACs.
(112) Sixth, the similarities between Cell 1 and Cell 2, and the similarities between the X.sub.o=|x+y|−|x−y| and the y.sub.o=|x+y|+|x−y|−FS digital functions, can be taken advantage of to help share logic circuitry, lower dynamic power consumption, and save on gate count.
(113) Seventh, there is the option and flexibility to increase the precision of the aMULT method substantially by combining it with a conventional multiplier function at the tail end of the interpolation chain.
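The cascade and the optional tail-end multiplier described in this section can be illustrated in software. The following is a minimal sketch under stated assumptions, not the disclosed circuit: the equations of this section are not reproduced in the text above, so the scaling is assumed (both absolute-value terms of Cell 1 scaled by 1/2, Cell 2 unscaled, and the k-th stage weighted by FS/4.sup.k), and the names cell1, cell2, and amult_cascade are hypothetical. Floating-point numbers stand in for digital words.

```python
def cell1(x, y, fs=1.0):
    """First cell. ASSUMPTION: both absolute-value terms are scaled by 1/2."""
    s, d = abs(x + y), abs(x - y)
    return (s - d) / 2, (s + d) / 2 - fs


def cell2(x, y, fs=1.0):
    """Repeated cell: Xo = |x+y| - |x-y| and Yo = |x+y| + |x-y| - FS."""
    s, d = abs(x + y), abs(x - y)
    return s - d, s + d - fs


def amult_cascade(x, y, n, fs=1.0, tail_multiplier=False):
    """Approximate x*y with n cascaded cells (Cell 1, then n-1 copies of Cell 2)."""
    xo, yo = cell1(x, y, fs)
    acc = fs * xo                        # contribution of the first interpolation
    for k in range(1, n):
        xo, yo = cell2(xo, yo, fs)
        acc += fs * xo / 4**k            # assumed weight: 1/4 of the previous stage
    if tail_multiplier:
        acc += xo * yo / 4**(n - 1)      # small conventional multiply at the tail end
    return acc


if __name__ == "__main__":
    x, y = 0.3, -0.7
    for n in (1, 2, 4, 6):
        print(n, amult_cascade(x, y, n), amult_cascade(x, y, n, tail_multiplier=True))
```

Under these assumptions, adding the tail-end product recovers x×y up to floating-point rounding for any n, which is consistent with the observation above that the tail-end multiplier only has to handle a term of small weight.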
Section 5A—Description of FIG. 5A
(114)
(115) The pair of digital input words x.sub.in=X and y.sub.in=Y are inputted to Cell 1 to generate a respective pair of digital output words X.sub.1 and Y.sub.1 with the following digital-input to digital-output transfer function:
(116)
and
(117)
The pair of digital output words X′=x.sub.i and Y′=y.sub.i are then fed onto a Cell 2 as part of a clocked state machine loop, which generates a clocked sequence of digital output words X.sub.i+1 and Y.sub.i+1, wherein the number of clocked steps (i) in the digital state machine is a function of the number of objective interpolations (n). The Cell 2 digital input to digital output transfer function is as follows: X.sub.i+1=|x.sub.i+y.sub.i|−|x.sub.i−y.sub.i| and Y.sub.i+1=|x.sub.i+y.sub.i|+|x.sub.i−y.sub.i|−FS. The digital state machine loop continues cycling while i<n, and when i=n the state machine stops. While i in the state machine incrementally counts up, Cell 2 continues generating incremental approximate multiplication results (˜XY.sub.i+1≈˜xy.sub.n≈x.sub.in×y.sub.in) according to the following transfer function of
(118)
or
(119)
with xy.sub.0=0 and i counting incrementally from 0 up to n−1, and wherein the accuracy of the approximate multiplication increases by 4 times for every additional interpolation step i.
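The clocked loop described above, in which a single Cell 2 is reused once per step until the programmed number of interpolations n is reached, can be sketched as a generator that emits one running result per clock. This is an illustrative sketch only: the scaling (Cell 1 terms scaled by 1/2, step i weighted by FS/4.sup.i), the step numbering, and the name amult_state_machine are assumptions, since the transfer-function equations are not reproduced in the text above.

```python
def amult_state_machine(x_in, y_in, n, fs=1.0):
    """Yield the running approximation of x_in*y_in after each clocked step."""
    # Cell 1 runs once before the loop (ASSUMED 1/2 scaling of both terms).
    s, d = abs(x_in + y_in), abs(x_in - y_in)
    x_i, y_i = (s - d) / 2, (s + d) / 2 - fs
    xy = fs * x_i
    yield xy
    i = 1
    while i < n:                         # the loop keeps cycling while i < n
        # Cell 2 is the single block reused on every clock cycle.
        s, d = abs(x_i + y_i), abs(x_i - y_i)
        x_i, y_i = s - d, s + d - fs
        xy += fs * x_i / 4**i            # in fixed point: a right shift by 2*i bits
        yield xy
        i += 1


if __name__ == "__main__":
    # n is the run-time precision knob: fewer steps give a coarser, faster result.
    for step, value in enumerate(amult_state_machine(0.3, -0.7, 6), start=1):
        print(step, value)
```

Because n is only a loop bound, the same hardware-like loop serves both low-precision and high-precision requests; stopping early simply returns an earlier element of the sequence.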
(120) Although the flow chart of
(121)
and
(122)
for Cell 1 and |x.sub.i+y.sub.i|−|x.sub.i−y.sub.i| for Cell 2).
(123) As indicated in the previous section,
(124) Bear in mind that as explained in section 4A, there is an option for increasing the precision of the aMULT method, also when utilized in a synchronous mode, by combining it with a conventional multiplier at the tail end of the interpolation chain.
(125) Some of the benefits of the aMULT method, operating in a synchronous mode, are summarized below:
(126) First, the aMULT method enables a digital IC state machine to perform on-the-fly or pre-programmed trade-offs of precision versus power consumption and speed of an approximate digital multiplier. The lower the precision requirement, the faster the multiplication and the lower the power consumption per multiplying operation. As such, the precision of the multiplication approximation can be traded off against cost, speed, and power consumption depending on the application's cost-performance objectives.
(127) Second, the disclosed digital IC approximate multiplication can be arranged for multiply and accumulate (MAC) functions in pure digital and/or mixed-mode. For example, a plurality of outputs of approximate digital IC multipliers can be inputted to a plurality of current mode Digital-to-Analog-Converters (iDACs), wherein the function of summation (e.g., adding two multiplications) can be performed simply by coupling together the current output terminals of the plurality of said iDACs.
(128) Third, the commonalities between Cell 1 and Cell 2, and the shared functionalities between the X.sub.i+1=|x.sub.i+y.sub.i|−|x.sub.i−y.sub.i| and the Y.sub.i+1=|x.sub.i+y.sub.i|+|x.sub.i−y.sub.i|−FS digital functions, can be taken advantage of to help share logic circuitry, lower dynamic power consumption, and save on gate count.
(129) Fourth, there is the option and flexibility to increase the precision of the aMULT method substantially here by combining it with a conventional multiplier function at the tail end of the interpolation chain.
Section 6A—Description of FIG. 6A
(130)
(131) The horizontal axis indicates the pair of digital input words x.sub.in and y.sub.in that span between −FS to +FS over 10 milli-seconds (ms).
(132) The vertical axis shows the approximate x.sub.in×y.sub.in multiplication results XY.sub.1 to XY.sub.6 of the aMULT method, as a function of the number of interpolations n=1 to n=6. Keep in mind that each of the approximate multiplication results XY.sub.1 through XY.sub.6 is offset by a fraction of FS from its neighboring XY.sub.n, for clarity of illustration.
Section 7A—Description of FIG. 7A
(133)
(134) The horizontal axis indicates the pair of digital input words x.sub.in and y.sub.in that span between −FS to +FS over 10 milli-seconds (ms).
(135) The vertical axis shows the percent (%) of inaccuracy (eXY.sub.1 to eXY.sub.6) of the aMULT method as compared to an ideal multiplier, as function of n=1 to n=6 number of interpolations.
(136) The aMULT approximate multiplication error as a function of the number of interpolations n is: eXY.sub.1=25.6% for n=1, eXY.sub.2=6.4% for n=2, eXY.sub.3=1.6% for n=3, eXY.sub.4=0.4% for n=4, eXY.sub.5=0.1% for n=5, and eXY.sub.6=0.0125% for n=6. Notice that the precision of the approximate multiplication improves by 2.sup.2=4 times for every +1 incremental interpolation.
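One way to see where a factor of four per interpolation can come from is the identity below, written for the Cell 2 transfer functions quoted earlier (X.sub.i+1=|x.sub.i+y.sub.i|−|x.sub.i−y.sub.i| and Y.sub.i+1=|x.sub.i+y.sub.i|+|x.sub.i−y.sub.i|−FS). It is offered as a sketch and not as the derivation of this disclosure; the symbols X_i and Y_i simply denote the inputs of the i-th Cell 2.

```latex
\begin{aligned}
X_{i+1} &= |X_i + Y_i| - |X_i - Y_i| = 2\,\operatorname{sgn}(X_i Y_i)\,\min(|X_i|,|Y_i|),\\
Y_{i+1} + FS &= |X_i + Y_i| + |X_i - Y_i| = 2\,\max(|X_i|,|Y_i|),\\
X_i\,Y_i &= \tfrac{1}{4}\,X_{i+1}\,(Y_{i+1} + FS)
          = \tfrac{1}{4}\,FS\,X_{i+1} + \tfrac{1}{4}\,X_{i+1}\,Y_{i+1}.
\end{aligned}
```

Each pass through Cell 2 therefore converts the current product into a term that is linear in X.sub.i+1 plus a residual product that carries one quarter of the weight. Provided the residual product stays bounded from stage to stage (it does when the first cell applies a scale factor of 1/2, which is an assumption here), stopping the chain one stage later lowers the bound on the error by 2.sup.2=4.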
Section 8A—Description of FIG. 8A
(137)
(138) In a digital squarer IC that is arranged in accordance with the aSQR′ method, a digital input word (x.sub.in−R)=X.sub.0 can be approximately squared ˜S.sub.n≈(x.sub.in−R).sup.2 through a series of n interpolations asynchronously (without a clock), wherein the accuracy of the ˜S.sub.n digital word square result can be increased with more interpolations n. Keep in mind that R is a digital reference scale for +FS and −FS, which shall be explained shortly.
(139) As an example, two interpolations generate an approximate square digital word ˜S.sub.2 whereas n >2 interpolations generate an approximate square digital word ˜S.sub.n, wherein ˜S.sub.2 is less precise than ˜S.sub.n. Generally speaking, the fewer the interpolations n, the less the precision of the approximate square results. But fewer interpolations can be done faster with less power consumption and less logic gate count (cheaper). This feature of the aSQR′ method would enable the end application to pre-program and optimize the approximate squaring function in accordance with the end application cost-performance objectives. As noted earlier, for cost sensitive applications the aSQR′ method can be utilized synchronously where an approximate squarer digital IC cell block (e.g., cell 2) is re-used in a time-multiplexed loop through a sequence of cycles (i.e., n-times). Conversely, for speed sensitive applications, the aSQR′ method can be utilized asynchronously (clock-free) through a cascaded series of n interpolations implemented in combinational logic.
(140) In the following description, note that the range of the x.sub.in digital word is from negative full-scale (−FS=0), which can be all zero-bits, to positive full-scale (+FS=2R), which can be all one-bits. Note that R is a digital reference scale to show that 0×R can be −FS, 1×R can be zero-scale (ZS), and 2×R can be +FS. Keep in mind that the aSQR′ method utilizing six interpolations here is for illustrative simplicity, and not a limitation of the aSQR′ method, which can accommodate more interpolations (n>6) that will result in higher precision.
(141) With the first interpolation (n=1), the (x.sub.in−R)=X.sub.0 digital input word is inputted to a digital combinational logic block Cell 1, which generates a digital output word X.sub.i+1 in accordance with the following input-output transfer function: X.sub.i+1=|X.sub.i−FS|. The approximate digital squaring in Cell 1 is generated in accordance with the mathematical transfer function
(142)
(wherein S.sub.0=0 and i=0→n−1) or
(143)
As such for Cell 1, in the first interpolation with i=0 and programming S.sub.0=0, C.sub.i+1=a digital constant proportional to full-scale (FS) value, then ˜S.sub.0+1=X.sub.0+1+C.sub.0+1 ⇒ ˜S.sub.1=X.sub.1+C.sub.1. With n=1,
(144) In the second interpolation (n=2), the X.sub.1 digital output word from cell 1 is inputted to a digital combinational logic block Cell 2, which generates a digital output word X.sub.i+1=X.sub.2 in accordance with the following input-output transfer function: X.sub.i+1=2×|X.sub.i−1.5×FS|+FS. The approximate digital squaring in Cell 2 is also generated in accordance with the mathematical transfer function
(145)
(wherein S.sub.0=0 and i=0→n−1) or
(146)
As such for Cell 2.sub.1, in the second interpolation with i=1 and similarly programming C.sub.i+1=a digital constant proportional to FS, then
(147)
With n=2,
(148) Likewise, in the third interpolation (n=3), the X.sub.2 digital output word from the previous cell 2 is inputted to the next one, which generates a digital output word X.sub.i+1=X.sub.3 also in accordance with the following input-output transfer function: X.sub.i+1=2×|X.sub.i−1.5×FS|+FS. Similarly, for Cell 2.sub.2 in the third interpolation with i=2 and similarly programming C′.sub.i+1 as a digital constant proportional to FS, then
(149)
With n=3,
(150) Similarly, in the fourth interpolation (n=4), the X.sub.3 digital output word from the previous cell 2 is inputted to the next one, which generates a digital output word X.sub.i+1=X.sub.4 also in accordance with the following input-output transfer function: X.sub.i+1=2×|X.sub.i−1.5×FS|+FS. Again, for Cell 2.sub.3, in the fourth interpolation with i=3 and similarly programming C.sub.i+1=a constant, then
(151)
With n=4,
(152) Also, in the fifth interpolation (n=5), the X.sub.4 digital output word from the previous cell 2 is inputted to the next one, which generates a digital output word X.sub.i+1=X.sub.5 also in accordance with the following input-output transfer function: X.sub.i+1=2×|X.sub.i−1.5×FS|+FS. Also, for Cell 2.sub.4 in the fifth interpolation with i=4 and similarly programming C.sub.i+1=a constant, then
(153)
With n=5,
(154) Again, in the sixth interpolation (n=6), the X.sub.5 digital output word from the previous cell 2 is inputted to the next one, which generates a digital output word X.sub.i+1=X.sub.6 also in accordance with the following input-output transfer function: X.sub.i+1=2×|X.sub.i−1.5×FS|+FS. Similarly, for Cell 2.sub.5 in the sixth interpolation with i=5 and similarly programming C.sub.i+1=a constant, then
(155)
With n=6,
(156) In summary, some of the benefits of an asynchronous digital multiplier utilizing the aSQR′ method are the following:
(157) First, conventional multipliers generally require many full adders, which occupy a large area in the digital domain. The aSQR′ method can be implemented in the digital domain with fewer adders (compared to a conventional digital squarer), which makes it more area efficient.
(158) Second, implementing the aSQR′ method requires a number of square or divide-by-two operations, which can be implemented inexpensively in the digital domain by a shift-right or shift-left operation in, for example, a shift register, whose bit-width requirement decreases with increasing division.
(159) Third, when the aSQR′ method is utilized with more interpolations n, the peak-to-peak digital value of the sequential ˜S.sub.n digital words diminishes as n increases, which can help reduce the overall logic gate count of its implementation.
(160) Fourth, the aSQR′ method generates a number of points (digital words) that exactly fit the square function, and linearly interpolates in between those points. The greater the number of interpolations (n), the greater the number of points that exactly fit an ideal squarer result, and thus the smaller the error associated with linear interpolation in between those exact-fit points.
(161) Fifth, fewer gates in a digital circuit generally go hand-in-hand with lower dynamic power consumption and faster speed. As such, since the aSQR′ method requires fewer gates for implementing a squarer function, it can function with higher speed and lower dynamic power consumption compared to conventional digital IC multiplier implementations, for a given resolution.
(162) Sixth, the disclosed digital IC approximate squarer can be arranged to perform approximate multiplication by utilizing the quarter square algorithm. Accordingly, digital IC multiplication can be performed by deducting the square of the difference of two digital words (x, y) from the square of their sum, as in (x+y).sup.2−(x−y).sup.2=4xy. Also, note that the constant terms (C.sub.i+1) in the approximate squaring (˜S.sub.i+1) get canceled out by the subtraction in the quarter square method, which reduces the logic gate count when utilizing the aSQR′ method within the quarter square algorithm to perform a multiplication function.
(163) Seventh, the disclosed digital IC approximate squarer can be arranged to perform square and accumulate (SAC) and multiply and accumulate (MAC) functions in mixed-mode. For example, a plurality of outputs of approximate digital multiplier ICs can be inputted to a plurality of current mode Digital-to-Analog-Converters (iDACs), wherein the function of summation (e.g., adding two squarers) can be performed simply by coupling together the current output terminals of the plurality of iDACs.
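The exact-fit and linear-interpolation behavior listed above can be illustrated with a small software model of a fold-and-scale squarer. This is a minimal sketch under stated assumptions and is not the cell arithmetic of this section: it uses a signed input in the range −FS to +FS rather than the R-offset representation, a repeated fold t ← |2t−FS| in place of the Cell 1 and Cell 2 transfer functions quoted above, and stage weights of FS/4.sup.k. The name asqr_cascade is hypothetical, and floating-point numbers stand in for digital words.

```python
def asqr_cascade(x, n, fs=1.0):
    """Approximate x*x with n fold stages (ASSUMED scaling, see lead-in)."""
    t = abs(x)                           # first stage: full-wave rectification
    acc = fs * t
    const = 0.0
    for k in range(1, n):
        t = abs(2 * t - fs)              # fold about mid-scale; the 2x is a left shift
        acc += fs * t / 4**k             # the /4**k is a right shift by 2*k bits
        const += fs * fs / 4**k          # input-independent offset, proportional to FS
    return acc - const


if __name__ == "__main__":
    n = 3
    # With n stages the result is exact where |x| is a multiple of FS / 2**(n-1).
    for j in range(0, 2**(n - 1) + 1):
        x = j / 2**(n - 1)
        print(x, asqr_cascade(x, n), x * x)
```

In this model the output is a chord approximation of the parabola: it touches x.sup.2 at the exact-fit points, lies on straight segments between them, and each added stage doubles the number of segments.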
Section 9A—Description of FIG. 9A
(164)
(165) In a looped state machine digital squarer IC that is arranged in accordance with the aSQR′ method, a digital input word (x.sub.in−R)=X.sub.i=X.sub.0 can be approximately squared ˜S.sub.n≈(x.sub.in−R).sup.2 through a looped series of n interpolations synchronously (with a clock), wherein the accuracy of the ˜S.sub.n digital word square result can be increased with more interpolations n.
(166) The precision (i.e., the degree of squarer approximation) of ˜S.sub.n can be pre-programmed or programmed into the digital IC state machine in real-time by inputting a digital word n that is the number of times (or the interpolations) the aSQR′ method is cycled or time-multiplexed through the digital state machine. Keep in mind that the aSQR′ method utilizing six interpolations in the simulation waveforms depicted in
(167) In the first interpolation (n=1), the (x.sub.in−R)=X.sub.0 digital input word is inputted to a digital Cell 1, which generates a digital output word X.sub.i+1 in accordance with the following input-output transfer function: X.sub.i+1=|X.sub.i−FS|. The approximate digital squaring in Cell 1 is generated in accordance with the mathematical transfer function
(168)
(wherein S.sub.0=0 and i=0→n−1) or
(169)
As such for Cell 1, in the first interpolation with i=0 and programming S.sub.0=0, C.sub.i+1=a digital constant proportional to full-scale (FS), then ˜S.sub.0+1=X.sub.0+1+C.sub.0+1 ⇒ ˜S.sub.1=X.sub.1+C.sub.1.
(170) The Cell 2 digital input to digital output transfer function is as follows: X.sub.i+1=2×|X.sub.i−1.5FS|+FS. Accordingly, the digital state machine loop continues cycling while i<n, and when i=n the digital state machine stops. While i in the state machine is incrementing up, Cell 2 continues generating digital output words in accordance with the mathematical transfer function
(171)
(wherein S.sub.0=0 and i=0→n−1) or
(172)
and wherein the accuracy of approximate squarer increases by 4 times for every +1 incremental interpolation step i.
(173) As indicated in the previous section,
(174) Similar to the last sections,
(175) Some of the benefits of the aSQR′ method, operating in a synchronous mode, are summarized below:
(176) First, the aSQR′ method enables a digital IC state machine to perform on-the-fly or pre-programmed trade-offs of precision versus power consumption and speed of an approximate digital multiplier. The lower the precision requirement, the faster the squaring and the lower the power consumption per squaring operation. As such, the precision of the squarer approximation can be traded off against cost, speed, and power consumption depending on the application's cost-performance objectives.
(177) Second, while addition (subtraction) occupies a relatively large area in the digital domain, a digital IC state machine arranged in accordance with the disclosed aSQR′ method utilizes fewer adders compared to conventional digital IC squarers. Instead, the disclosed aSQR′ method requires functions such as square or divide by two, which can be implemented by a simple shift to the right or left in the digital domain and take a small die area. Moreover, the aSQR′ method can utilize functions such as, for example, full-wave or half-wave rectification and adding or subtracting a fixed digital value (in proportion to an input digital word's full scale), which can also take a relatively small area.
(178) Third, the disclosed digital IC approximate squarer can be arranged to perform approximate multiplication by utilizing the quarter square algorithm. Accordingly, digital IC multiplication can be performed by deducting the square of the difference of two digital words (x, y) from the square of their sum, as in (x+y).sup.2−(x−y).sup.2=4xy. Also, keep in mind that the constant terms (e.g., C.sub.i+1 and C′.sub.i+1) in the approximate squaring (˜S.sub.i+1) get canceled out by the subtraction in the quarter square method, which reduces the logic gate count (otherwise attributed to the constant terms) when utilizing the aSQR′ method within the quarter square algorithm to perform a multiplication function.
(179) Fourth, the disclosed digital IC approximate squarer can be arranged to perform square and accumulate (SAC) and multiply and accumulate (MAC) functions in mixed-mode. For example, a plurality of outputs of approximate digital multiplier ICs can be inputted to a plurality of current mode Digital-to-Analog-Converters (iDACs), wherein the function of summation (e.g., adding two squarers) can be performed simply by coupling together the current output terminals of the plurality of iDACs.
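The cancellation of the constant terms in the quarter square arrangement can be illustrated as follows. This is a sketch under assumptions rather than the disclosed implementation: the squarer is the same simplified fold-and-scale model with a signed input and stage weights of FS/4.sup.k (not the Cell 1 and Cell 2 equations of this section), the operands are pre-halved so that the identity reads ((x+y)/2).sup.2−((x−y)/2).sup.2=xy, and the names folded_sum, offset, approx_square, and quarter_square_multiply are hypothetical.

```python
def folded_sum(x, n, fs=1.0):
    """Input-dependent part of the assumed squarer: FS * sum of t_k / 4**k."""
    t = abs(x)
    acc = fs * t
    for k in range(1, n):
        t = abs(2 * t - fs)              # fold about mid-scale
        acc += fs * t / 4**k
    return acc


def offset(n, fs=1.0):
    """Input-independent constant of the assumed squarer."""
    return sum(fs * fs / 4**k for k in range(1, n))


def approx_square(x, n, fs=1.0):
    return folded_sum(x, n, fs) - offset(n, fs)


def quarter_square_multiply(x, y, n, fs=1.0):
    """Approximate x*y as the difference of two approximate squares."""
    # Both squarers share the same offset, so it cancels and is never computed.
    return folded_sum((x + y) / 2, n, fs) - folded_sum((x - y) / 2, n, fs)


if __name__ == "__main__":
    x, y = 0.3, -0.7
    for n in (1, 2, 4, 6):
        print(n, quarter_square_multiply(x, y, n), x * y)
```

Because the offset depends only on n and FS, the multiplier built this way needs only the two folded sums and one subtraction, which is the gate-count saving the quarter square item above refers to.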
Section 10A—Description of FIG. 10A
(180)
(181) The horizontal axis indicates the digital input word X.sub.i that spans between −FS to +FS over 10 milli-seconds (ms).
(182) The vertical axis shows the approximate X.sub.i.sup.2 squaring results ˜S.sub.1 to ˜S.sub.6 of the aSQR′ method, as a function of n=1 to n=6, which is the number of interpolations. Keep in mind that each of the approximate squarer results ˜S.sub.1 through ˜S.sub.6 is offset by a fraction of FS from its neighboring ˜S.sub.n, for clarity of illustration.
Section 11A—Description of FIG. 11A
(183)
(184) The horizontal axis indicates the digital input word X.sub.i that spans between −FS to +FS over 10 milli-seconds (ms).
(185) The vertical axis shows the percent (%) of inaccuracy (eS.sub.1 to eS.sub.6) of the aSQR′ method as compared to an ideal squarer, as function of n=1 to n=6 number of interpolations.
(186) The aSQR′ approximate squaring errors as a function of the number of interpolations n are: eS.sub.1=25.6% for n=1, eS.sub.2=6.4% for n=2, eS.sub.3=1.6% for n=3, eS.sub.4=0.4% for n=4, eS.sub.5=0.1% for n=5, and eS.sub.6=0.0125% for n=6. Notice that the precision of the approximate squarer improves by 2.sup.2=4 times for every +1 incremental interpolation.
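The trend of a fourfold improvement per interpolation can be checked numerically on a simplified software model. The sketch below sweeps a signed input across full scale and records the worst-case error as a percentage of FS.sup.2. It is illustrative only: the model (repeated fold t ← |2t−FS| with stage weights of FS/4.sup.k), the grid size, and the names approx_square and worst_case_error_percent are assumptions, so its output is not expected to reproduce the simulated percentages reported above, only the ratio between successive values of n.

```python
def approx_square(x, n, fs=1.0):
    """Simplified n-stage fold-and-scale squarer (ASSUMED scaling)."""
    t = abs(x)
    acc = fs * t
    for k in range(1, n):
        t = abs(2 * t - fs)              # fold about mid-scale
        acc += fs * (t - fs) / 4**k      # stage term with its constant folded in
    return acc


def worst_case_error_percent(n, steps=4096, fs=1.0):
    """Largest |approximation - x*x| over a uniform sweep, in percent of FS**2."""
    worst = 0.0
    for j in range(-steps, steps + 1):
        x = fs * j / steps
        worst = max(worst, abs(approx_square(x, n, fs) - x * x))
    return 100.0 * worst / (fs * fs)


if __name__ == "__main__":
    previous = None
    for n in range(1, 7):
        err = worst_case_error_percent(n)
        ratio = None if previous is None else previous / err
        print(n, err, ratio)
        previous = err
```

In this model the worst case sits midway between two neighboring exact-fit points, and halving the spacing of those points with each added stage quarters the error there.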