Current mode multiply-accumulate for compute in memory binarized neural networks
10915298 · 2021-02-09
Inventors
CPC classification
H03M1/664
ELECTRICITY
H03M1/687
ELECTRICITY
International classification
Abstract
Methods of performing mixed-signal current-mode multiply-accumulate (MAC) operations for binarized neural networks in an integrated circuit are described in this disclosure. While digital machine learning circuits are fast, scalable, and programmable, they typically require bleeding-edge deep sub-micron manufacturing, consume high currents, and reside in the cloud, which can exhibit long latency and fail to meet the privacy and safety requirements of some applications. Digital machine learning circuits also tend to be pricey, given that machine learning digital chips typically require the expensive tooling and wafer fabrication associated with advanced bleeding-edge deep sub-micron semiconductor manufacturing. This disclosure utilizes mixed-signal current-mode signal processing for machine learning binarized neural networks (BNN), including Compute-In-Memory (CIM), which can enable on-or-near-device machine learning and or on-sensor machine learning chips to operate more privately, more securely, with low power and low latency, asynchronously, and be manufacturable on non-advanced standard sub-micron fabrication (with node portability) that is more mature and rugged with lower costs. Enabling features of this disclosure include the following: saving power in an always-on setting, reducing chip costs, processing signals asynchronously, and reducing dynamic power consumption. Current-mode signal processing is utilized in combination with CIM (to further reduce the dynamic power consumption associated with read/write cycles in and out of memory) for bitwise counting of a plurality of logic state 1s of a plurality of XOR outputs for MAC arithmetic in BNNs.
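As a behavioral illustration of the bitwise-counting MAC described above (a software sketch only, not part of the disclosed circuitry; all names are illustrative), a BNN MAC reduces to an element-wise XNOR across the (x, w) bit pairs followed by a count of logic-1 outputs:

```python
# Software sketch of the bitwise-counting MAC (illustrative only; the
# disclosure performs this count in analog current mode, not in software).

def bnn_mac(x_bits, w_bits):
    """Count the logic-1 XNOR outputs across all (x, w) bit pairs."""
    assert len(x_bits) == len(w_bits)
    return sum(1 for x, w in zip(x_bits, w_bits) if x == w)

def bnn_dot(x_bits, w_bits):
    """Bipolar (+1/-1) dot product via the standard BNN popcount identity:
    dot = 2 * (number of matches) - N."""
    n = len(x_bits)
    return 2 * bnn_mac(x_bits, w_bits) - n
```

In the disclosed circuits, the popcount is realized by coupling unit output currents, so no digital adder tree is needed.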
Claims
1. A method of performing a multiply-accumulate operation for binarized neural networks in an integrated circuit, the method comprising: supplying a regulated current source (I.sub.PSR) having a value of I.sub.PSRv, mirroring and scaling I.sub.PSR onto a plurality of current sources (I.sub.S), each having a value I.sub.Sv that is proportional to I.sub.PSRv; individually gating each I.sub.S in the plurality of I.sub.S current sources, each gating responsive to a logical combination of one of an XOR, and an XNOR of a corresponding pair of digital input signals (x, w) to generate a plurality of corresponding analog-output current sources (I.sub.o), wherein each of the I.sub.o current sources has a value that swings between substantially zero and substantially I.sub.Sv responsive to the logical combination of the one of the XOR, and the XNOR of the corresponding pair of digital input signals x, w; combining the plurality of I.sub.o current sources to generate an analog summation current (I.sub.S.sub.
2. The method of performing a multiply-accumulate operation for binarized neural networks in an integrated circuit of claim 1, the method further comprising: controlling the individually gating each I.sub.S in the plurality of I.sub.S current sources in a respective analog current switch (iSW).
3. The method of performing a multiply-accumulate operation for binarized neural networks in an integrated circuit of claim 1, the method further comprising: controlling the individually gating each I.sub.S in the plurality of I.sub.S current sources via a corresponding analog current switch (iSW) at one of a gate port, a source port, and a drain port of each I.sub.S current source.
4. The method of performing a multiply-accumulate operation for binarized neural networks in an integrated circuit of claim 1, the method further comprising: coupling the at least one of the I.sub.SO to a single current output of a current mode Bias DAC (iDAC).
5. The method of performing a multiply-accumulate operation for binarized neural networks in an integrated circuit of claim 1, the method further comprising: coupling the at least one of the I.sub.SO1, I.sub.SO2 pair to a differential current output of a current mode Bias DAC (iDAC).
6. The method of performing a multiply-accumulate operation for binarized neural networks in an integrated circuit of claim 1, the method further comprising: storing at least one of the pair of digital input signals x, w in a digital memory array comprising at least one of a Latch array, a SRAM array, an EPROM array, and an E.sup.2PROM array.
7. The method of performing multiply-accumulate for binarized neural networks in integrated circuits of claim 1, the method further comprising: controlling the individually gating each I.sub.S in the plurality of I.sub.S current sources to generate each corresponding I.sub.o1 and I.sub.o2 comprising: operating a quiescent current source (I.sub.Q) through a differential pair comprising a first transistor (M.sub.1) and a second transistor (M.sub.2), wherein I.sub.Q has a current source value proportional to the I.sub.S current source value, and wherein each transistor comprises a drain-port, a gate-port, and a source-port; controlling the gate-ports of the M.sub.1 and the M.sub.2 with a pair of first digital signals (w,
8. The method of performing multiply-accumulate for binarized neural networks in an integrated circuit of claim 1, the method further comprising: controlling the individually gating each I.sub.S in the plurality of I.sub.S current sources to generate each corresponding I.sub.o comprising: coupling a first and a second transistor in series to arrange a first serialized composite transistor (iSW.sub.1), wherein the first and the second transistors each function as an analog switch comprising two analog ports and a digital control port, and wherein the iSW.sub.1 functions as a first composite analog switch comprising two analog ports and two digital control ports; coupling a third and a fourth transistor in series to arrange a second serialized composite transistor (iSW.sub.2), wherein the third and the fourth transistors each function as an analog switch comprising two analog ports and a digital control port, and wherein the iSW.sub.2 functions as a second composite analog switch comprising two analog ports and two digital control ports; coupling the two respective analog ports of the iSW.sub.1 and the iSW.sub.2 to arrange the iSW.sub.1 and the iSW.sub.2 in parallel forming a series-parallel composite transistor (iSW.sub.SP) comprising two analog ports and two digital control ports; controlling the first and the third transistors with at least one pair of digital signals w,
9. The method of performing multiply-accumulate for binarized neural networks in integrated circuits of claim 1, the method further comprising: receiving a reference current signal (I.sub.R) into an input port of a power supply desensitization (PSR) circuit; and regulating the I.sub.PSR current source to track the I.sub.R, wherein the I.sub.PSR signal is desensitized from power supply variations.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The subject matter presented herein is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and illustrations, and in which like reference numerals refer to similar elements, and in which:
(16) The embodiment disclosed in
DETAILED DESCRIPTION
(17) Numerous embodiments are described in the present application and are presented for illustrative purposes only; they are not intended to be exhaustive. The embodiments were chosen and described to explain principles of operation and their practical applications. The present disclosure is not a literal description of all embodiments of the disclosure(s). The described embodiments also are not, and are not intended to be, limiting in any sense. One of ordinary skill in the art will recognize that the disclosed embodiment(s) may be practiced with various modifications and alterations, such as structural, logical, and electrical modifications. For example, the present disclosure is not a listing of features which must necessarily be present in all embodiments. On the contrary, a variety of components are described to illustrate the wide variety of possible embodiments of the present disclosure(s). Although particular features of the disclosed embodiments may be described with reference to one or more particular embodiments and/or drawings, it should be understood that such features are not limited to usage in the one or more particular embodiments or drawings with reference to which they are described, unless expressly specified otherwise. The scope of the disclosure is to be defined by the claims.
(18) Although process (or method) steps may be described or claimed in a particular sequential order, such processes may be configured to work in different orders. In other words, any sequence or order of steps that may be explicitly described or claimed does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order possible. Further, some steps may be performed simultaneously despite being described or implied as occurring non-simultaneously (e.g., because one step is described after the other step). Moreover, the illustration of a process by its depiction in a drawing does not imply that the illustrated process is exclusive of other variations and modifications thereto, and does not imply that the illustrated process or any of its steps is necessary to the embodiment(s). In addition, although a process may be described as including a plurality of steps, that does not imply that all or any of the steps are essential or required. Various other embodiments within the scope of the described disclosure(s) include other processes that omit some or all of the described steps. In addition, although a circuit may be described as including a plurality of components, aspects, steps, qualities, characteristics, and/or features, that does not indicate that any or all of the plurality are essential or required. Various other embodiments may include other circuit elements or limitations that omit some or all of the described plurality. In U.S. applications, only those claims specifically reciting "means for" or "step for" should be construed in the manner required under 35 U.S.C. 112(f).
(19) Throughout this disclosure: FET is field-effect-transistor; MOS is metal-oxide-semiconductor; MOSFET is MOS FET; PMOS is p-channel MOS; NMOS is n-channel MOS; BiCMOS is bipolar CMOS; SPICE is Simulation Program with Integrated Circuit Emphasis, which is an industry-standard circuit simulation program; micro is μ, which is 10.sup.-6; nano is n, which is 10.sup.-9; and pico is p, which is 10.sup.-12. Bear in mind that V.sub.DD (as a positive power supply) and V.sub.SS (as a negative power supply) are applied to all the circuits, blocks, or systems in this disclosure, but may not be shown for clarity of illustration. The V.sub.SS may be connected to a negative power supply or to the ground (zero) potential. The body terminals of MOSFETs can be connected to their respective source terminals or to their respective power supplies, V.sub.DD and V.sub.SS.
(20) Keep in mind that, for descriptive clarity, the illustrations of this disclosure are simplified, and their improvements beyond the simple illustrations would be obvious to one skilled in the art. For example, it would be obvious to one skilled in the art that MOSFET current sources can be cascoded for higher output impedance and lower sensitivity to power supply variations, whereas throughout this disclosure current sources are depicted with a single MOSFET for clarity of illustration. As another example, it would also be obvious to one skilled in the art that a circuit design (such as the ones illustrated in this disclosure) can be arranged with NMOS transistors, or as its complementary version utilizing PMOS-type transistors.
(21) The illustrated circuit schematics of the embodiments described in the following sections have the benefits summarized here, to avoid repeating them in each section for the sake of clarity and brevity:
(22) First, mixed-signal current-mode circuit designs in this disclosure are suitable for MAC in BNNs.
(23) Second, the plurality of mixed-signal current-mode circuit designs in this disclosure occupy a small silicon die area, which makes them cost-effective for MAC in BNNs that may require thousands of such circuits in one chip.
(24) Third, because voltage swings are small in current mode signal processing, the disclosed mixed-signal current-mode circuit designs can enable MAC in BNNs that are fast.
(25) Fourth, also because current mode signal processing can be made fast, the disclosed mixed-signal current-mode circuit designs utilized in MAC in BNNs can provide a choice of trade-off and flexibility between running at moderate speeds and operating with low currents to save on power consumption.
(26) Fifth, the disclosed mixed-signal current-mode circuit designs can be arranged on a silicon die right next to memory (e.g., SRAM, EPROM, E.sup.2PROM, etc.), as in Compute-In-Memory (CIM) MAC in BNNs. Such an arrangement reduces the read/write cycles in and out of memory and thus lowers dynamic power consumption.
(27) Sixth, the disclosed mixed-signal current-mode circuit designs can be clock-free, enabling computations for MAC in BNNs to operate asynchronously, which minimizes latency.
(28) Seventh, the disclosed mixed-signal current-mode circuit designs can be clock-free and capacitor-free for MAC in BNNs which provide an option of not requiring switching capacitors for mixed-mode signal processing. This arrangement avoids the extra cost of capacitors on silicon and lowers the dynamic power consumption attributed to switching the capacitors and the clocking updates.
(29) Eighth, the performance of the disclosed mixed-signal current-mode circuit designs can be arranged to be independent of resistor and capacitor values and their normal variations in manufacturing. The benefits derived from such independence are passed onto the MAC in BNNs that utilize the disclosed circuits. As such, the die yield to specifications can be made mostly independent of passive resistor and or capacitor values and their respective manufacturing variations, which could otherwise reduce die yield and increase cost.
(30) Ninth, because voltage swings are small in current mode signal processing, the disclosed mixed-signal current-mode circuit designs here can enable MAC in BNNs to operate with low power supply voltage (V.sub.DD).
(31) Tenth, also because voltage swings are small in current-mode signal processing, the disclosed mixed-signal current-mode circuit designs can enable an internal analog signal to span between full-scale and zero-scale (e.g., at the summing node of a MAC, the analog input of an Analog-To-Digital-Converter, or the analog input of a comparator), which enables the full-scale dynamic range of MAC in BNNs to be less restricted by V.sub.DD.
(32) Eleventh, the disclosed mixed-signal current-mode circuit designs for MAC in BNNs can be manufactured on low-cost, standard, and conventional Complementary-Metal-Oxide-Semiconductor (CMOS) fabrication, which is more mature, readily available, and process-node portable. This helps MAC-in-BNN ICs with more rugged reliability and multi-source manufacturing flexibility as well as lower manufacturing cost.
(33) Twelfth, digital addition and digital subtraction occupy a large die area. Because the disclosed circuit designs operate in current mode, the function of addition in current mode simply requires the coupling of output current ports. The function of subtraction in current mode can be arranged via a current mirror. Thus, the disclosed circuit designs utilized in MAC in BNNs can be smaller and cost less.
(34) Thirteenth, digital XOR and XNOR functions are required in BNNs. The present disclosure arranges the XOR and XNOR functions to be performed in mixed-signal current-mode for MAC in BNNs.
(35) Fourteenth, a plurality of XOR and or XNOR outputs is required to be accumulated for BNNs. The present disclosure provides digital-input to analog-output-current XOR and or XNOR circuit designs suitable for mixed-signal MAC in BNNs. The plurality of output currents of the plurality of XORs and or XNORs are coupled together to perform the function of addition asynchronously, which reduces latency substantially.
(36) Fifteenth, as noted earlier, digital addition and subtraction functions occupy large die areas and can be expensive. The present disclosure eliminates the digital adding function of bitwise count of logic state 1 (or ON state) in BNNs. Instead, the present disclosure performs the bitwise count of logic state 1 (or ON state) in BNNs in analog current-mode. By utilizing digital input to analog current output XOR (iXOR) and or digital input to analog current output XNOR (iXNOR), the current outputs of plurality (e.g., 1000s) of iXOR and or iXNOR are simply coupled together, which performs the summation (counting of logic state 1) in mixed-signal current mode and in an area efficient (low cost) manner.
(37) Sixteenth, the disclosed mixed-signal current-mode circuit designs utilized in MAC in BNNs help reduce inaccuracies, attributed to the function of addition, that stem from random but normal manufacturing variation (e.g., random transistor mismatches in normal fabrication). In the disclosed mixed-signal current-mode circuit designs, the non-linearity due to the non-systematic random statistical contribution of mismatches in adding the current signals roughly equals the square root of the sum of the squares of such non-systematic random mismatches (attributed to the plurality of the summing signals). Such a benefit of attenuated impact of imperfections (due to random manufacturing variations) on overall accuracy is an inherent advantage of the disclosed designs, which can improve manufacturing yield to specifications and is passed on to the MAC in BNNs.
(38) Seventeenth, cascoding a current source can help increase output impedance and reduce the sensitivity of output currents to power supply variation, but it requires two cascoded transistors. This disclosure provides the option of utilizing a power supply desensitization circuit for a current source that is not cascoded (e.g., a single-MOSFET current source), which saves area considering the large number (e.g., 10s of 1000s) of iXORs (and or iXNORs) that may be required in MAC in BNNs.
(39) Eighteenth, because each unit of cumulative current (representing the bitwise count of logic state 1 at the output of each iXOR and or iXNOR) is equal to every other, the incremental summation of the plurality of output currents is thermometer-like. Accordingly, the disclosed mixed-signal current-mode circuit designs utilized in MAC for BNNs provide monotonic incremental accumulation of the added current signals, which is beneficial in converging on a minimum cost function during training of BNNs.
(40) Nineteenth, the disclosed mixed-signal current-mode circuit designs utilized in MAC for BNNs enable a meaningful portion of the computation circuitry to shut itself off (i.e., smart self-power-down) in the face of no incoming signal, so that the computation circuits can remain always on while drawing a low stand-by current.
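The square-root accumulation of random mismatch noted in the Sixteenth benefit above can be checked numerically. The sketch below (illustrative Python, not part of the disclosure; the per-unit mismatch standard deviation `sigma` is an assumed parameter) sums N unit currents, each perturbed by independent Gaussian mismatch, and estimates the standard deviation of the total:

```python
# Numerical sketch of the sqrt-of-sum-of-squares mismatch claim: the error
# std of an N-unit current sum grows as sigma*sqrt(N), so the *relative*
# error of the sum shrinks as 1/sqrt(N). Illustrative model only.
import math
import random

def summed_current_error_std(n_units, sigma, trials=20000, seed=1):
    rng = random.Random(seed)
    errs = []
    for _ in range(trials):
        # each unit current is 1.0 plus an independent Gaussian mismatch
        total = sum(1.0 + rng.gauss(0.0, sigma) for _ in range(n_units))
        errs.append(total - n_units)
    mean = sum(errs) / trials
    return math.sqrt(sum((e - mean) ** 2 for e in errs) / trials)
```

For example, with sigma = 0.05 (5% per unit) and N = 100, the total's error std is about 0.05 * sqrt(100) = 0.5, i.e., 0.5% of the 100-unit full scale, which is the attenuation described above.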
Section 1: Description of FIG. 1
(42) The XOR of x and w at U1.sub.1 controls analog switches N2.sub.1 and N3.sub.1, which enable or disable the current mirror N1.sub.1, N4.sub.1 and thus control the value of I.sub.O to swing to either the I1.sub.1 value (analog current equivalent of logic 1) or zero (analog current equivalent of logic 0).
(43) MAC for BNNs can be arranged to receive a plurality of x, w digital bits inputted to a plurality of XORs to generate a plurality of I.sub.O currents that can be summed to generate I.sub.OS (i.e., utilizing current-mode summation for bitwise counting of the plurality of logic state 1s of the plurality of iXOR outputs for MAC in BNNs).
(44) In this disclosure, unless otherwise specified, the I1 value is the analog current equivalent of logic 1, and zero current is the analog current equivalent of logic 0.
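The FIG. 1 behavior described in this section can be summarized in a small behavioral model (an illustrative sketch, not the netlist; the mirror and switches are idealized, and currents are normalized to a unit I1):

```python
# Behavioral sketch of the single-ended iXOR of this section: the XOR of
# (x, w) gates a mirrored unit current, so each output swings between I1
# (logic 1) and zero (logic 0); tying outputs together sums them, which
# models the current-mode (Kirchhoff) summation of the disclosure.

I1 = 1.0  # unit current, the analog equivalent of logic 1 (normalized)

def ixor_current(x, w, i1=I1):
    """Output current of one iXOR cell for digital inputs x, w."""
    return i1 if (x ^ w) else 0.0

def ios_sum(x_bits, w_bits, i1=I1):
    """Current-mode summation: couple all I_O ports onto one node."""
    return sum(ixor_current(x, w, i1) for x, w in zip(x_bits, w_bits))
```

The summed current I.sub.OS is then a thermometer-like analog count of the logic-1 XOR outputs.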
Section 2: Description of FIG. 2
(46) The XOR of x and w at U1.sub.2 and inverter U2.sub.2 control analog switches N3.sub.2 and N4.sub.2, which steer the N2.sub.2 current (mirrored and scaled from N1.sub.2) to flow either through N3.sub.2 to form the I.sub.O1 current (swinging to either zero or I1.sub.2) or through N4.sub.2 to form the I.sub.O2 current (swinging to either I1.sub.2 or zero).
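A behavioral sketch of this current steering (illustrative only; the switches are idealized, and the assignment of the XOR-true state to the I.sub.O1 branch is an assumption) shows that the two outputs are complementary and always sum to the unit current:

```python
# Behavioral sketch of the current-steering differential iXOR: the XOR
# output and its inverse steer one mirrored unit current into I_O1 or
# I_O2, so exactly one branch carries I1 at any time.

def ixor_differential(x, w, i1=1.0):
    """Return (I_O1, I_O2) for digital inputs x, w (assumed polarity)."""
    s = x ^ w
    i_o1 = i1 if s else 0.0
    i_o2 = i1 - i_o1  # complementary branch carries the remainder
    return i_o1, i_o2
```

Because the steered current is constant, the supply draw of the cell does not change with the data, only its routing does.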
(47) Similar to prior section, utilizing the disclosed embodiment of
Section 3: Description of FIG. 3
(49) The XOR of x and w at U1.sub.3 controls the analog switch N4.sub.3, which enables or disables the current mirror N1.sub.3, N2.sub.3 and thus controls the value of I.sub.O to swing to either the I1.sub.3 value (analog current equivalent of logic 1) or zero (analog current equivalent of logic 0). Notice that N4.sub.3 and N3.sub.3 are arranged with the same size for current mirror N1.sub.3, N2.sub.3 matching.
(50) Like preceding sections, utilizing the embodiment of
Section 4: Description of FIG. 4
(52) Analog switches arranged by N5.sub.4 in series with N6.sub.4 are controlled by x and w digital signals, respectively. Also, analog switches are arranged by placing N7.sub.4 in series with N8.sub.4 which are controlled by
(53) As such, the disclosed digital-input analog-output current XNOR (iXNOR) functions as an analog XNOR (whose output effectively controls analog current switches). The analog output current here is controlled by a meshed composite series-parallel analog current switch iSW.sub.SP comprising four transistors, which can be meaningfully area efficient.
(54) Similarly, utilizing the embodiment illustrated in
Section 5: Description of FIG. 5
(56) An iSW.sub.SP1 comprises a series combination of analog switches N3.sub.5 and N4.sub.5 placed in parallel with another series combination of analog switches N5.sub.5 and N6.sub.5. When digital bits x, w are both HIGH (logic state 1) or both LOW (logic state 0), then either N3.sub.5 and N4.sub.5 or N5.sub.5 and N6.sub.5 connect the gate-drain port of N1.sub.5 to the gate port of N2.sub.5, thus causing the operating current I.sub.O of N2.sub.5 to mirror and scale the operating current I1.sub.5 of N1.sub.5.
(57) Note that, concurrently, an iSW.sub.SP2 comprises a series combination of analog switches N7.sub.5 and N8.sub.5 placed in parallel with another series combination of analog switches N9.sub.5 and N10.sub.5. When digital bits x, w are both HIGH or both LOW, then both the series combination of N7.sub.5 and N8.sub.5 and the series combination of N9.sub.5 and N10.sub.5 remain open (i.e., the composite switch is off). Conversely, when x, w are in any state other than both HIGH or both LOW, then either the series combination of N7.sub.5 and N8.sub.5 or the series combination of N9.sub.5 and N10.sub.5 turns on and shorts the gate-port voltage of N2.sub.5 to V.sub.SS, which keeps the I.sub.O of N2.sub.5 at zero.
(58) Also, keep in mind that if digital bits x, w are in any state other than both HIGH or both LOW, then both the series combination of N3.sub.5 and N4.sub.5 and the series combination of N5.sub.5 and N6.sub.5 turn off and isolate the gate-drain port of N1.sub.5 from the grounded gate port of N2.sub.5.
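The series-parallel switch logic of paragraphs (56) through (58) reduces to Boolean AND (series) and OR (parallel). A sketch (illustrative function names; the NMOS switches are idealized as Booleans):

```python
# Boolean sketch of the series-parallel composite switch iSW_SP: a series
# pair conducts only when both controls are HIGH (AND); two series branches
# in parallel conduct when either branch does (OR).

def series(a, b):
    """Two switches in series conduct only when both controls are HIGH."""
    return bool(a and b)

def isw_sp(c1, c2, c3, c4):
    """Parallel combination of two series branches (c1,c2) and (c3,c4)."""
    return series(c1, c2) or series(c3, c4)

def mirror_enabled(x, w):
    """Branches driven by (x, w) and their complements: conducts iff x == w."""
    xb, wb = 1 - x, 1 - w
    return isw_sp(x, w, xb, wb)

def pull_down_enabled(x, w):
    """Branches driven by the mixed pairs: conducts iff x != w
    (shorting the mirror gate to V_SS, forcing I_O to zero)."""
    xb, wb = 1 - x, 1 - w
    return isw_sp(x, wb, xb, w)
```

The two composites are complementary: for any x, w, exactly one of them conducts, which is the XNOR gating this section describes.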
(59) Similarly, utilizing the embodiment illustrated in
Section 6: Description of FIG. 6
(61) The disclosed iXNOR of
(62) Like prior sections, utilizing the embodiment illustrated in
Section 7: Description of FIG. 7
(64) Analog switches arranged by N5.sub.7 in series with N6.sub.7 are controlled by
(65) Similar to the functional operation of a digital XNOR, when x and w are both HIGH (logic 1), the series analog switches N3.sub.7 and N4.sub.7 are both ON, which enables N1.sub.7 to scale and mirror its current I1.sub.7 onto the drain port of N2.sub.7 (and through the iSW.sub.SP) to generate I.sub.O. Similarly, when digital bits x and w are both LOW (logic 0), the series analog switches N5.sub.7 and N6.sub.7 are both ON, which again enables N1.sub.7 to scale and mirror its current I1.sub.7 onto the drain port of N2.sub.7 (and through the iSW.sub.SP) to generate I.sub.O.
(66) As noted in the previous sections, utilizing the embodiment illustrated in
Section 8: Description of FIG. 8
(68) The disclosed iXNOR of
(69) Like prior sections, utilizing the embodiment illustrated in
Section 9: Description of FIG. 9
(71) The single-ended output currents of a plurality of digital-input to analog-output-current XORs are accumulated (I.sub.SO) and added to a single bias current I.sub.B generated by a Bias iDAC. A single-ended current-input iADC digitizes the net I.sub.SO+I.sub.B.
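Behaviorally, this signal path adds a DAC-generated bias current to the accumulated current and digitizes the net result. The sketch below uses an ideal current-output DAC and an ideal clamped quantizer as stand-ins (illustrative only; the disclosed Bias iDAC and iADC circuit designs are not specified by this model):

```python
# Behavioral sketch of the single-ended readout path: I_SO + I_B -> iADC.
# i_lsb and n_bits are assumed illustrative parameters, not from the patent.

def bias_idac(code, i_lsb=1.0):
    """Ideal current-output DAC: output current proportional to the code."""
    return code * i_lsb

def iadc(i_in, i_lsb=1.0, n_bits=8):
    """Ideal current-input ADC: quantize and clamp to the code range."""
    code = int(i_in / i_lsb)
    return max(0, min(code, 2 ** n_bits - 1))

def mac_readout(i_so, bias_code):
    """Digitize the accumulated MAC current offset by the bias current."""
    return iadc(i_so + bias_idac(bias_code))
```

The bias current lets the system shift the accumulated popcount (e.g., to center it) before quantization, entirely in the current domain.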
(72) The embodiment disclosed in
(73) The PSR circuit of
(74) Be mindful that the DC voltage of the summing node, through which the summation current I.sub.SO+I.sub.B flows, can be arranged to be (an equivalent of a diode-connected) V.sub.GS of a PMOSFET that can be arranged to track the diode-connected V.sub.GS of P3.sub.9.
(75) Accordingly, the single-ended sum of I.sub.SO+I.sub.B currents flowing through the equivalent of a diode connected V.sub.GS of a PMOSFET as the input of a single-ended current mode ADC (iADC) or a single-ended current mode comparator (iCOMP) can be regulated by the PSR circuit to follow the constant current reference proportional to I1.sub.9.
(76) Thus, the disclosed embodiments provide the option of desensitizing the I.sub.SO and I.sub.B currents from power supply variations with a single-transistor current source (e.g., N2.sub.9 and N7.sub.9) instead of a cascoded current source, which can save substantial die area considering the plurality (e.g., 1000s) of iXORs that may be required for a typical MAC in BNNs.
(77) It can be noticed that in
(78) Utilizing the disclosed embodiment illustrated in
(79) Keep in mind that the disclosed embodiment illustrated in
Section 10: Description of FIG. 10
(82) Each latch cell that is inputted with the digital weight signals (w) and digital row select weight write signals (e.g., row a digital write signals Wwa &
(83) To lower dynamic power consumption associated with reading/writing digital weight data in-and-out of memory, the weight data can be stored in a respective latch, wherein each respective latch cell on the silicon die is not only laid-out right next to its respective iXNOR cell but also the training data-set is pre-loaded and latched onto the respective iXNOR (e.g., N5.sub.10 through N9.sub.10) such as the one described and illustrated in section 7 and
(84) Training data-set is loaded onto respective array of latch cells one row at a time via write control signals (e.g., Wwa &
(85) Accordingly, the respective digital outputs of the latch array (laid-out on the silicon die right next to their respective iXNOR array) receive their respective digital weight data-set (e.g., w.sub.1a, w.sub.2a) into the respective weight digital ports of iXNOR arrays (e.g., gate ports of N7.sub.10, N9.sub.10).
(86) The signal data-set (e.g., x1 &
(87) The outputs of plurality of iXNORs (along a latch array row or a latch array column, depending on the system architecture and software specifications) can be coupled together to generate plurality of summation currents (e.g., I.sub.SOa, I.sub.SOb).
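The row-wise accumulation described in this section can be modeled behaviorally (an illustrative sketch; the latch cells, row selects, and write timing are omitted, and names are not from the disclosure):

```python
# Behavioral sketch of the compute-in-memory arrangement: each latched
# weight bit sits next to its iXNOR; broadcasting the x inputs produces,
# per row, a summed current counting the logic-1 iXNOR outputs
# (modeling I_SOa, I_SOb, ... as normalized unit currents).

def ixnor_current(x, w, i1=1.0):
    """Output current of one iXNOR cell: I1 when x == w, else zero."""
    return i1 if x == w else 0.0

def cim_row_sums(x_bits, weight_rows, i1=1.0):
    """weight_rows: one latched weight bit per column, one list per row.
    Returns the coupled (summed) output current of each row."""
    return [sum(ixnor_current(x, w, i1) for x, w in zip(x_bits, row))
            for row in weight_rows]
```

Because the weights stay latched beside their iXNORs, evaluating a new x vector requires no memory read/write traffic, which is the dynamic-power saving this section describes.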
(88) As such, utilizing the embodiment illustrated in
Section 11: Description of FIG. 11
(90) The embodiment disclosed in
(91) Plurality of circuits similar to that of
(92) The digital weight (w.sub.i) data-set is stored in the SRAM array while the respective x.sub.i digital signals are inputted to the iMAC in BNNs.
(93) Consequently, a plurality of rows (e.g., rows a and b) of single-ended sums of analog-output currents (e.g., I.sub.SOa and I.sub.SOb) are generated that represent the analog equivalent of the digital sum of the plurality of respective
(95) Each SRAM cell that is inputted with the digital weight signals (e.g., w1a &
(96) To lower dynamic power consumption associated with reading/writing digital weight data in-and-out of memory, the weight data can be stored in its respective SRAM cell, wherein each respective SRAM cell on the silicon die is not only laid-out right next to its respective iXNOR cell but also the training data-set is locked onto the respective iXNOR (e.g., N5.sub.11 through N9.sub.11) similar to the one described in section 7
(97) The training data-set is loaded onto the respective array of SRAM cells one row at a time via a write control signal (e.g., Wwa), which controls the row of SRAM array input switches (e.g., N1.sub.11-N2.sub.11). Once the said digital weights data-set (e.g., w1a &
(98) The signal data-set (e.g., x1 &
(99) Accordingly, the outputs of plurality of iXNORs (along a SRAM array row or a SRAM array column, depending on the system and software requirements) can be coupled together to generate plurality of summation currents (e.g., I.sub.SOa, I.sub.SOb).
(100) In summary, utilizing the embodiment illustrated in
Section 12: Description of FIG. 12
(102) Here, the differential output currents of a plurality of iXORs (digital input to analog output currents) are accumulated differentially (dI.sub.SO=I.sub.SO1-I.sub.SO2) and added to a differential bias current dI.sub.B, wherein dI.sub.B is generated by a differential Bias iDAC. A differential current-input comparator (diCOMP) generates the sign of the net result of dI.sub.SO+dI.sub.B.
(103) The embodiment disclosed in
(104) The PSR section of the circuit of
(105) Be mindful that the DC voltage of the summing node, through which the summation differential current dI.sub.SO+dI.sub.B flows, can be arranged to be (an equivalent of a diode-connected pair of) V.sub.GS of a PMOSFET that can be arranged to track the (diode-connected) V.sub.GS of P3.sub.9. Accordingly, the differential sum of dI.sub.SO+dI.sub.B currents flowing through the equivalent of a pair of diode-connected V.sub.GS of a PMOSFET (as the differential input of a current-mode ADC or the current-mode comparator diCOMP.sub.12) can be regulated by the PSR circuit to follow the constant current reference proportional to I1.sub.12. The disclosed embodiments provide the option of desensitizing the I.sub.SO and I.sub.B currents from power supply variations with a single-transistor current source (e.g., N5.sub.12) per each iXOR instead of a cascoded current source, which can save substantial die area considering the plurality (e.g., 1000s) of iXORs that may be required for a typical MAC in BNNs.
(106) It can be noticed that in
(107) As an example, the mixed-mode differential iXOR of
(108) When w.sub.1=0, I.sub.1 flows through N6.sub.12 (while N7.sub.12 is starved, which cuts off any operating current from flowing through both N10.sub.12 and N11.sub.12). With w.sub.1=0, if x.sub.1=0, the I.sub.1 that flows through N6.sub.12 is passed on to flow through N8.sub.12 and onto the positive port of diCOMP. Also, with w.sub.1=0, if x.sub.1=1, the I.sub.1 that flows through N6.sub.12 is passed on to flow through N9.sub.12 and onto the negative port of diCOMP.
(109) When w.sub.1=1, I.sub.1 flows through N7.sub.12 (while N6.sub.12 is starved, which cuts off both N8.sub.12 and N9.sub.12). With w.sub.1=1, if x.sub.1=0, the I.sub.1 that flows through N7.sub.12 is passed on to flow through N11.sub.12 and onto the negative port of diCOMP. Also, with w.sub.1=1, if x.sub.1=1, the I.sub.1 that flows through N7.sub.12 is passed on to flow through N10.sub.12 and onto the positive port of diCOMP.
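The steering truth table of paragraphs (108) and (109) condenses into a small behavioral model (illustrative, with idealized switches): the positive diCOMP port receives I.sub.1 exactly when x.sub.1 equals w.sub.1, and the negative port otherwise.

```python
# Truth-table sketch of the FIG.-12-style differential steering: w1 selects
# which differential pair carries the unit current I_1, and x1 then routes
# it to the positive or negative comparator port.

def steer(x1, w1, i1=1.0):
    """Return (positive-port current, negative-port current)."""
    pos = i1 if x1 == w1 else 0.0   # x1 XNOR w1 -> positive diCOMP port
    neg = i1 - pos                  # x1 XOR w1  -> negative diCOMP port
    return pos, neg
```

Coupling many such cells onto the two ports makes dI.sub.SO the difference between the XNOR-true count and the XOR-true count, whose sign the comparator resolves.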
(110) In summary, utilizing the disclosed embodiment illustrated in
(111) Keep in mind that the disclosed embodiment illustrated in
Section 13: Description of FIG. 13
(113) The mixed-mode differential iXOR of
(114) When w=0, then I.sub.1 flows through N3.sub.13 parallel pair, while N4.sub.13 is starved which cuts off both N7.sub.13 and N8.sub.13 from current. With w=0, if x=0, then I.sub.1 is passed on to flow through N5.sub.13 and onto I.sub.O2 current port. Also, with w=0, if x=1, then I.sub.1 flows through N6.sub.13 and onto I.sub.O1 current port.
(115) When w=1, then I.sub.1 flows through N4.sub.13 parallel pair, while N3.sub.13 is starved which cuts off both N5.sub.13 and N6.sub.13 from current. With w=1, if x=0, then I.sub.1 is passed on to flow through N7.sub.13 and onto I.sub.O1 current port. Also, with w=1, if x=1, then I.sub.1 flows through N8.sub.13 and onto I.sub.O2 current port.
(116) As discussed in section 12 and illustrated in
Section 14: Description of FIG. 14
(118) A plurality of mixed-mode differential iXNORs or iXORs (with digital input to differential analog output currents) can be utilized here in a SRAM memory array with CIM. The digital data that is stored in the SRAM array, such as the weights (w.sub.i), along with the array of respective x.sub.i digital signals, are inputted to the differential iMAC for BNN. As a result, a plurality of rows (e.g., rows a and b) of differential sums of analog-output differential currents (e.g., dI.sub.SOa=I.sub.SO1a-I.sub.SO2a and dI.sub.SOb=I.sub.SO1b-I.sub.SO2b) are generated that represent the analog equivalent of the digital sum of the plurality of respective x.sub.iw.sub.i and or
(120) Each SRAM cell that is inputted with the digital weight signals (e.g., w1a &
(121) To lower dynamic power consumption associated with reading/writing digital weight data in-and-out of memory, the weight data can be stored in its respective SRAM, wherein each respective SRAM cell on the silicon die is not only laid-out right next to its respective differential iXNOR or iXOR cell but also the training data-set is pre-loaded and locked onto the respective differential iXNOR or iXOR (e.g., N9.sub.14 through N11.sub.14).
(122) The training data-set is loaded onto the respective array of SRAM cells one row at a time via a write control signal (e.g., Wwa), which controls the row of SRAM array input switches (e.g., N1.sub.14-N2.sub.14). Once the said digital weights data-set (e.g., w1a &
(123) The signal data-set (e.g., x1 &
(124) The outputs of a plurality of differential iXNORs or iXORs (along a SRAM array row or a SRAM array column, depending on the system and software requirements) can be coupled together to generate a plurality of summation currents (e.g., I.sub.SOa, I.sub.SOb).
(125) As such, utilizing the embodiment illustrated in