Low area multiply and accumulate unit
11544037 · 2023-01-03
Cpc classification
G11C7/1006
PHYSICS
G11C11/16
PHYSICS
Abstract
An improved electronic mixed mode multiplier and accumulate circuit for artificial intelligence and computing system applications that perform vector-vector, vector-matrix and other multiply-accumulate computations. The circuit provided is a high resolution, high linearity, low area, low power multiply-accumulate (MAC) unit to interface with a memory device for storing computation output results. The MAC unit uses fewer current carrying elements, resulting in much lower integrated circuit area, and provides tight matching between the current elements, thus preserving the inherent linearity requirements of current mode operation. Further, the MAC performs current scaling using switches and current division, where the current switches are minimum size transistors requiring a small area to implement, which renders it compatible with MRAM such as a magnetic tunnel junction device. The MAC is hierarchically extended for an increased number of bits to provide a delay implementation using orthogonal vector and current addition.
Claims
1. An electronic device comprising: a plurality of current carrying elements, each current carrying element of a uniform size to each provide a matched current output; one or more current splitters, a current splitter coupled to one current carrying element of said plurality for receiving a matched current output from its coupled current carrying element, each current splitter comprising a series of weighted switching transistor structures, each respective weighted switching transistor of the series configured for dividing current received from its coupled current carrying element according to a weighting scheme, a respective weighted switching transistor of the series further receiving a respective bit of an input digital word to be multiplied, the respective bit of the input digital word controlling a current flow at a respective path of that respective weighted switching transistor from an input current carrying element; and an output conductor connected to an output of each weighted switching transistor of the series for accumulating each current flow output of a respective path of the series of weighted switching transistors of the current splitter as controlled by the respective input digital word bits, the accumulated current flow output representing an analog current representation of a multiply and accumulate result.
2. The electronic device of claim 1, further comprising; a non-volatile memory storage element or volatile memory storage element coupled to the output conductor for receiving and storing said analog current representation of the multiply and accumulate result.
3. The electronic device of claim 2, wherein said memory storage element is one of an MRAM, PCM or ReRAM memory storage cell.
4. The electronic device of claim 1, wherein the weighting scheme is a binary weighting scheme such that each successive switching transistor of the current splitter is sized to provide divide input current according to 2.sup.k, where k is a whole number representing a bit position of the input digital word being multiplied.
5. The electronic device of claim 1, wherein the weighting scheme is a different radix other than binary weighting.
6. The electronic device of claim 1, comprising first and second current splitters coupled with respective first and second matched current carrying elements, the first current splitter having a first series of weighted switching transistor structures receiving a first input digital word to be multiplied and the second current splitter having a second series of weighted switching transistor structures receiving a second input digital word to be multiplied, each respective weighted switching transistor of the first series configured for dividing current received from its coupled current carrying element according to the weighting scheme to provide a first multiplication path current flow output of said first current splitter, and each respective weighted switching transistor of the second series configured for dividing current received from its coupled current carrying element according to the weighting scheme to provide a second multiplication path current flow output of said second current splitter, and the output conductor connected to sum the first multiplication path current flow output and said second multiplication path current flow output of each respective first and second current splitter.
7. The electronic device of claim 6, wherein said first input digital word to be multiplied and said second input digital word to be multiplied are of a same bit resolution or different bit resolutions.
8. The electronic device of claim 4, wherein the input digital word is programmed to receive a N-bit resolution digital word, said electronic device configured with a quantity 2.sup.U matched current carrying elements, wherein a number of weighted switching transistor structures in a series of a current splitter includes a number K of binary weighted switches, where U+K=N.
9. The electronic device of claim 8, wherein K<N, said electronic device further comprising: a control device receiving a quantity N−K excess digital bits of said received input digital word, said controller responding to said excess digital bits for selecting and activating additional one or more said 2.sup.U matched current carrying elements.
10. The electronic device of claim 9, further comprising a switch device associated with each said 2.sup.U matched current carrying elements, an associated switch device receiving a control signal for activating or de-activating one or more said 2.sup.U matched current carrying elements responsive to said excess digital bits.
11. An electronic device comprising: a plurality of current carrying elements, each current carrying element of a uniform size to each provide a matched current output; a first current splitter and a second current splitter each first and second current splitter coupled with respective first matched current carrying element and a second matched current carrying element, the first and second current splitters comprising a first hierarchy current scaling network, the first current splitter having a first series of weighted switching transistor structures receiving a first input digital word to be multiplied and the second current splitter having a second series of weighted switching transistor structures receiving a second input digital word to be multiplied, each respective weighted switching transistor of the first series configured for dividing current received from its coupled current carrying element according to a weighting scheme to provide a first multiplication path current flow output of said first current splitter, and each respective weighted switching transistor of the second series configured for dividing current received from its coupled current carrying element according to the weighting scheme to provide a second multiplication path current flow output of said second current splitter; a first output conductor receiving the first multiplication path current flow output; a second output conductor receiving the second multiplication path current flow output; and a second hierarchy current scaling network connected to the first hierarchy current scaling network, said second hierarchy current scaling network comprising: third and fourth current splitters, the third current splitter coupled with said first output conductor, and said fourth current splitter coupled with said second output conductor.
12. The electronic device of claim 11, wherein the third current splitter of said second hierarchy current scaling network comprises: a third first series of weighted switching transistor structures receiving a third input digital word to be multiplied, and the fourth current splitter of said second hierarchy current scaling network comprises: a fourth first series of weighted switching transistor structures receiving a third input digital word to be multiplied, wherein said first multiplication path current flow output at said first output conductor is further divided according to the received third input digital input word, and said second multiplication path current flow output at said second output conductor is further divided according to the received fourth input digital word.
13. The electronic device of claim 12, further comprising: a further output conductor, said further output conductor providing a summed analog output current comprising: a third multiplication path current flow output of said third current splitter resulting from a further dividing of said first multiplication path current flow output according to the received third input digital word, and a fourth multiplication path current flow output of said fourth current splitter resulting from a further dividing of said second multiplication path current flow output according to the received fourth digital input word; and a memory storage element coupled to the further output conductor for receiving and storing said summed analog output current.
14. The electronic device of claim 13, wherein the weighting scheme is one of: a binary weighting scheme such that each successive weighted switching transistor of a respective first current splitter, second current splitter, third current splitter and fourth current splitter is sized to provide divide input current according to 2.sup.k, where k is a whole number representing a bit position of a respective first, second, third, and fourth input digital word being multiplied; or is a different radix weighting scheme that is other than binary weighting.
15. The electronic device of claim 13, wherein the third input digital word at said third current splitter is of a same or different bit resolution as said first input digital word at said first current splitter; and said fourth input digital word at said fourth current splitter is of a same or different bit resolution as said second input digital word at said second current splitter.
16. The electronic device of claim 11, wherein the second input digital word to said second current splitter of said first hierarchy current scaling network is a delayed version of said first input digital word to said first current splitter of said first hierarchy current scaling network, wherein the first input digital word and second input digital word is obtained from a data sampled according to a clock frequency, said first input digital word comprising a sampled sequence at a rate according to a fundamental clock frequency and said second input digital word comprising a sampled sequence delayed by a quarter phase relative to said fundamental clock frequency.
17. The electronic device of claim 15, wherein the first input digital word and second input digital word at respective first and second current splitters of said first hierarchy current scaling network represent a first vector quantity to be multiplied; and the third input digital word and fourth input digital word at respective third and fourth current splitters of said second hierarchy current scaling network represent a second vector quantity to be multiplied, wherein the summed analog output current at said further output conductor comprising an analog current representation of a multiplication result of said first vector and second vector quantities.
18. The electronic device of claim 17, wherein the plurality of current carrying elements are configured to provide a respective weight value for use in a multiplication of said first vector and second vector quantities.
19. The electronic device of claim 17, wherein said first and second current splitters receiving respective first input digital word and second input digital word at said first hierarchy current scaling network receive current from a first subset of a first total amount of first current carrying elements sourcing current to said first and second current splitters, and said third and fourth current splitters receiving said third input digital word and fourth input digital word at said second hierarchy current scaling network receive current from a second subset of a second total amount of second current carrying elements sourcing current to said third and fourth current splitters, and a ratio comprising a number of said second subset of said second amount of current carrying elements over a number of said first subset of said first amount of current carrying elements represent a phase shift angle.
20. The electronic device of claim 19, wherein a ratio of a number of said first subset of first current carrying elements over a number of said first total amount of first current carrying elements provide a first coefficient value for a trigonometric sin θ vector represented by first and second input digital words; and a ratio of a number of said second subset of second current carrying elements over a number of said second total amount of second current carrying elements provide a second coefficient value for a trigonometric cosine θ vector represented by third and fourth input digital words.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The details of the present disclosure, both as to its structure and operation, can be understood by referring to the accompanying drawings, in which like reference numbers and designations refer to like elements.
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DETAILED DESCRIPTION
(10) The present disclosure relates to improvements in MAC circuits for deep learning/AI applications, and particularly to multiplication engines that perform a multiply operation using data such as that received from a memory element and generate a multiplication/accumulation output based on a received memory element output, a received signal and an accumulation. The array also includes an accumulation engine to sum multiplication outputs from the number of multiplication engines.
(11)
(12) In particular, the current-steering DAC circuit architecture 100 shown in
(13) Each identical current source 120 of the array 200 is equally unary-weighted, i.e., providing a unit current 104 of a scale of “1” as shown in
(14) As shown in
(15) In the embodiment shown in
(16) In an embodiment, for MAC operations, a respective current splitter 125 may be associated with a different digital word to be multiplied, e.g., a first current splitter 125 connected to a first unary-weighted current source 120 is configured to receive bits of a first digital word, and a second current splitter 125 connected to a second unary-weighted current source is configured to receive bits of a second digital word, etc. Each first digital word and second digital word can be of the same bit resolution (e.g., each 4 bits or 8 bits) or be of different bit resolutions.
(17) In an embodiment, each set of parallel connected differential-pair switching transistors 126 of a current splitter 125 includes MOS transistors (e.g., PMOS transistors) of a minimal size, e.g., on the order of 100 times smaller than a size of a unit current source transistor, where each respective differential pair of transistors receives differential input bits to perform a current division of a unary-weighted current 104, i.e., a unit current value, provided by its connected unary-weighted current source.
(18) In an alternate embodiment, the DAC/MAC unit 100 of
(19) In the embodiment depicted in
(20) In embodiments, each parallel connected set 125 of differential-pair transistors 126 including a first weighted differential-pair switching transistors 126 can be radix weighted, e.g., having a weighting scale in between unary and binary, e.g., 1.5.
(21) Generally, in the embodiment depicted in
(22) For example, in the case of a binary DAC having plural (=2.sup.(U)) equal sized current sources, a starting switching transistor of a current splitter branch (e.g., a branch N+1) can be equal to twice the size of the largest switch of its adjacent prior current splitter branch (e.g., a branch N). For a DAC with a different radix system, the multiplier is simply given by the radix. As an example, for a radix=1.5 system on the first set of current splitters, each current source is 1.5 times the weight of the previous one (i.e., each successive source is weighted according to 1.5.sup.0, 1.5.sup.1, 1.5.sup.2, . . . etc.), and the individual segments can be binary weighted. For the binary system, the weights of the current sources become 2.sup.0, 2.sup.1, 2.sup.2 . . . etc.
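The radix progression described above can be sketched numerically. The following is a minimal illustration only; the function name branch_weights and its parameters are hypothetical and are not part of the disclosure.

```python
def branch_weights(radix, count):
    """Return the relative weights radix**0, radix**1, ... for `count` branches.

    Hypothetical helper: it only reproduces the weight progression named in
    the text (1.5**0, 1.5**1, ... or 2**0, 2**1, ...), not any circuit detail.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return [radix ** i for i in range(count)]


if __name__ == "__main__":
    print(branch_weights(2, 4))    # binary system
    print(branch_weights(1.5, 3))  # radix = 1.5 system
```

In this sketch each successive weight is the previous weight multiplied by the radix, which is the only property the paragraph relies on.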
(23) In an example, to implement an 8 bit DAC/MAC, i.e., N=8 bit DAC resolution, a current division according to 256 levels is realized with K=6. That is, with U=2, unit 100 is configured with four (2.sup.(U)=2.sup.2=4) equal sized current sources 120 and four (4) respective current splitters, each current splitter having six (6) binary weighted switches to implement a total of 64 current levels (2.sup.6) at each of the four current splitters (i.e., U+K=N). This means the last received input bit b.sub.5 (small k=5) of a respective splitter will be received by binary weighted differential-pair switching transistors scaled to draw 32 times more current weight (2.sup.(6−1)=2.sup.5=32) than the weighted current drawn from differential-pair switching transistors receiving bit b.sub.0 (scaled to unary weight=1). Similarly, for N=8 (bit DAC), if U=1, then K=7 (i.e., U+K=8), and unit 100 is configured with two equal sized current sources 120 with each of the two current splitters of binary weighted switches implementing 128 levels (=2.sup.7). In this embodiment, the last received input bit b.sub.6 (small k=6) of a respective splitter will be received by binary weighted differential-pair switching transistors scaled to draw 64 times more current weight than the weighted current drawn from differential-pair switching transistors receiving bit b.sub.0 (i.e., 2.sup.(7−1)=2.sup.6=64) at each of the two current splitters.
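The arithmetic relating N, U and K in this example can be checked with a short sketch. The function name segmentation and the dictionary keys are assumptions introduced for illustration; only the relations U+K=N, 2.sup.U sources, 2.sup.K levels per splitter and the 2.sup.(K−1) weight of the last switch come from the text.

```python
def segmentation(n_bits, u):
    """Derive the segmentation quantities for an N-bit DAC/MAC with 2**U sources.

    Hypothetical helper: K = N - U binary weighted switches per splitter,
    2**K levels per splitter, and the last switch weighted 2**(K - 1)
    relative to the switch receiving bit b0.
    """
    if not 0 <= u <= n_bits:
        raise ValueError("U must satisfy 0 <= U <= N")
    k = n_bits - u
    return {
        "current_sources": 2 ** u,
        "switches_per_splitter": k,
        "levels_per_splitter": 2 ** k,
        "last_switch_weight": 2 ** (k - 1) if k > 0 else 0,
        "total_levels": (2 ** u) * (2 ** k),
    }


if __name__ == "__main__":
    for u in (1, 2, 3):
        print(u, segmentation(8, u))
```

For N=8 the total number of levels stays at 256 for every choice of U; only the split between unary sources and binary weighted switches changes.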
(24) Further, in this example, to implement an 8 bit DAC/MAC, with four current sources 120 (U=2, K=6), as each current splitter 125 will receive six (6) digital input bits, i.e., bits b.sub.0, b.sub.1, . . . , b.sub.5 (small k=5), then for an 8-bit device resolution, the two remaining MSB bits of the 8 bit digital input word are used to select one or more of the four (4) current splitters, i.e., 4*2.sup.6=2.sup.2*2.sup.6=2.sup.P*2.sup.K=256 total current levels, where P+K=8, where “P” is the number of bits for selecting the current splitter.
(25) In the example 8 bit DAC/MAC implementation where N=8, U=2, K=6, the two most significant bits (MSBs) of the 8-bit digital word are used to select one or more of the four current source units depending upon their values. For example, MSB values ‘00’ of the 8 bit digital input signal are used to select only 1 current source and its corresponding splitter will receive the six (6) digital input bits, i.e., bits b.sub.0, b.sub.1, . . . , b.sub.5; MSB values ‘01’ of the 8 bit digital input signal will select two of the four current sources and the six (6) digital input bits are received at each of the two attached current splitters; MSB values ‘10’ will select three current sources and the six (6) digital input bits are received at each of the three attached current splitters; and for MSB values ‘11’ all four current sources are selected to each receive the six (6) digital input bits. To select the one or more of the four (4) current sources, the two most significant bits (MSBs) of the 8 bit digital input signal are input to a controller, and the remaining 6 LSB bits are applied to the current switching devices according to binary logic.
(26) Similarly, in the example 8 bit DAC/MAC implementation where N=8, U=3 there are eight current sources and K=5 such that a respective current splitter would receive 5 bits b.sub.0, b.sub.1, . . . , b.sub.4 and thus the 3 remaining MSBs of the input digital word select among the eight current sources. In the example 8 bit DAC/MAC implementation where U=1, K=7, if the single most significant bit (MSB) of the 8-bit digital word is a value of ‘1’ then the second of the two current sources (in addition to the first current source cell) is also selected and both will receive the seven (7) digital input bits, i.e., bits b.sub.0, b.sub.1, . . . , b.sub.6.
(27) As shown in
(28) Thus, as shown in
(29) In an embodiment, the controller 198 is a binary-to-thermometer decoder such that, for the N=8, U=2, K=6 example implementation, the 2 MSBs of the binary word map to one of four possible outcomes ‘00’, ‘01’, ‘10’, ‘11’ and only one current source switches at a time, e.g., MSBs ‘00’ decode to 0001 (to select the first current source); MSBs ‘01’ of the 8 bit digital input signal decode to 0011 and are used to switch in the second current source; MSBs ‘10’ decode to 0111 and are used to switch in the third current source; and MSBs ‘11’ of the 8 bit digital input signal decode to 1111 and are used to switch in the fourth current source, so only one current source is switched at a time.
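The binary-to-thermometer decoding described for controller 198 can be sketched in software as follows. This is a behavioral illustration under stated assumptions, not the controller circuit: the function names split_word and thermometer are hypothetical, and the mapping reproduces only the four codes given in the text (‘00’ to 0001, ‘01’ to 0011, ‘10’ to 0111, ‘11’ to 1111).

```python
from typing import Tuple


def split_word(word: int, n_bits: int = 8, u: int = 2) -> Tuple[int, int]:
    """Split an N-bit word into its U MSBs (source select) and K = N - U LSBs."""
    if not 0 <= word < (1 << n_bits):
        raise ValueError("word does not fit in n_bits")
    k = n_bits - u
    msbs = word >> k                 # bits routed to the controller
    lsbs = word & ((1 << k) - 1)     # bits routed to the splitter switches
    return msbs, lsbs


def thermometer(msbs: int, u: int = 2) -> str:
    """Decode U MSBs into a thermometer code with msbs + 1 ones.

    Follows the text's convention that MSBs '00' already enable one source.
    """
    width = 2 ** u
    if not 0 <= msbs < width:
        raise ValueError("msbs out of range for u")
    ones = msbs + 1
    return "0" * (width - ones) + "1" * ones


if __name__ == "__main__":
    msbs, lsbs = split_word(0b10110101)
    print(format(msbs, "02b"), thermometer(msbs), format(lsbs, "06b"))
```

Because consecutive MSB codes differ by exactly one additional ‘1’ in the thermometer output, stepping the MSBs up or down switches one current source at a time.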
(30) Thus, each sub-array consisting of K switching elements can be also realized using fully equal size of current sources (e.g., thermometer weighted), i.e., the 256 elements can be implemented using U=4 (2.sup.4=16) and in each sub-array there are 16 equally weighted elements, with the constraint that starting point of branch N+1 switches equals to twice the size of the largest switch of prior branch N.
(31) Generally, the choice of segmentation, i.e., number 2.sup.(U) of current source cells 120 and connected splitters and the corresponding number of differential-pair switching transistors receiving bits at each splitter is reconfigurable by software/digital bits. The binary weighted paired differential switching transistors 126 of array 201 consume significantly smaller area compared to large current source arrays, thereby significantly lowering power consumption. For example, a conventional scheme in binary coding of
(32) In a further embodiment, each of the 2.sup.(U) current sources can be binary weighted, and each of the k=K−1 differential-pair switching transistors receiving bits at each splitter are binary weighted. In another variation, each of 2.sup.(U) current sources are binary weighted and each of the k=K−1 differential-pair switching transistors receiving bits at each splitter are thermometer weighted.
(33) A different radix system is also possible for a general purpose multiply and accumulate such that chip area of the current sources is reduced with very little penalty in power consumption.
(34) As further shown in
(35) In a further embodiment, a multiply and accumulate (MAC) circuit 150 is illustrated in
(36) In an alternate embodiment having a plurality of NMOS transistor current sinks (instead of PMOS current sources) receiving current from a respective configuration of transistor switches, a memory cell 101 is coupled to a terminal, e.g., a drain, of the respective NMOS transistor.
(37)
(38) In particular, MAC unit 300 of
(39) In the embodiment of
(40) In the embodiment of
(41) In an embodiment, each set of parallel connected differential-pair switching transistors 326 of a current splitters 325A, . . . , 325N includes MOS transistors (e.g., PMOS transistors) of a minimal size, e.g., on the order of 100 times smaller than a size of the unit current source transistor.
(42) In the embodiment depicted in
(43) As further shown in
(44) In additional embodiments, besides coupling the summed analog output currents from each current splitter of the first hierarchy current scaling network to a second current scaling network layer, further hierarchical current division layers can be configured to provide further current division of a desired bit division, e.g., summed analog output currents from each current splitter of the second hierarchy current scaling network can be respectively coupled to a further current splitter of a third hierarchy current scaling network in like manner as shown in
(45)
(46) In an example AI implementation, neuronal “weight” values such as learned by iterative flow of training in neural networks such as obtained during a network training phase can be represented using selected current sources 120, and physical variables or vectors to be multiplied can be represented by the digital words input as represented by the digital input bit representation variables b.sub.0, b.sub.1, . . . b.sub.k of a first vector for multiplication and current summing at the first hierarchical current scaling layer 201 and represented by the digital input bit representation variables α.sub.0, α.sub.1, . . . α.sub.g, of a second vector for multiplication and current summing at the second hierarchical current scaling layer 301.
(47) That is, in MAC sub-units 401, 402 the current sources 120 in each are configured as the “weight” values “w” and may be hard-coded or configured in the system to provide the digital representation of the weights. A first vector to be multiplied is obtained by the current source 120 “weights” multiplied by the digital bits b.sub.0, b.sub.1, . . . b.sub.k of the digital word inputs representing a first vector at the first hierarchical current division level 201 of MAC sub-units 401, 402 to provide output currents I.sub.out+ carried on respective first conductors 130A, 130N and I.sub.out− carried on respective second conductors 140A, 140N as a result of the respective multiplication and current summation operations as output at first hierarchy level 201. A second vector to be multiplied is formed using the digital bits a.sub.0, a.sub.1, . . . a.sub.g of the digital word inputs representing a second vector at the second hierarchical current division level 301 of MAC sub-units 401, 402 to provide output currents I.sub.out+ carried on first conductor 330 and I.sub.out− carried on second conductor 340 as a result of the respective multiplication and current summation operations as output at second hierarchy level 301.
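The two-level multiplication and summation described above can be illustrated with an idealized numerical model. The sketch assumes ideal, perfectly matched elements and assumes that each splitter passes the fraction code/(2.sup.bits−1) of its input current to its positive output; that fraction, the function names and the parameters are assumptions made for illustration and are not taken from the disclosure.

```python
def splitter_fraction(code, bits):
    """Fraction of the input current steered to the positive output.

    Assumption: binary weighted switches with total weight 2**bits - 1,
    so an all-ones code passes the full input current.
    """
    full_scale = (1 << bits) - 1
    if not 0 <= code <= full_scale:
        raise ValueError("code out of range")
    return code / full_scale


def mac_output(weights, first_codes, second_codes, k_bits, g_bits, i_unit=1.0):
    """Sum of weight * first-level division * second-level division per path.

    weights: number of unit current sources feeding each path (the "w" values)
    first_codes / second_codes: digital words applied at levels 201 / 301
    """
    total = 0.0
    for w, b, a in zip(weights, first_codes, second_codes):
        path_current = i_unit * w * splitter_fraction(b, k_bits)
        total += path_current * splitter_fraction(a, g_bits)
    return total


if __name__ == "__main__":
    # Two paths, 4-bit words at both hierarchy levels.
    print(mac_output([1, 2], [15, 5], [15, 3], k_bits=4, g_bits=4))
```

In this model the summed positive output current plays the role of the accumulated result; the complementary current on the negative conductor is omitted for brevity.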
(48) In an example AI implementation, neuronal “weight” values such as learned by iterative flow of training in neural networks such as obtained during a network training phase can be represented using selected current sources 120, and physical variables or vectors to be multiplied can be represented by the digital words input as represented by the digital input bit representation variables b.sub.0, b.sub.1, . . . b.sub.k of a first vector for multiplication and current summing at the first hierarchical current scaling network layer 201 and represented by the digital input bit representation variables α.sub.0, α.sub.1, . . . α.sub.g, of a second vector for multiplication and current summing at the second hierarchical current scaling network layer 301.
(49) In a further embodiment,
(50) In an example implementation, the MAC sub-unit 401 can provide an “I phase” processing to provide a “sin θ” vector representation and MAC sub-unit 402 can provide a “Q phase” processing to provide a “cos θ” vector representation to perform trigonometric operations etc. related to artificial intelligence neural network applications (e.g., for dot products, gradients, convolutions, transformations, etc.).
(51) In this embodiment, the digital words of a vector input to the MAC sub-unit 402 as digital bits b.sub.0, b.sub.1, . . . b.sub.k are the same as or different from the digital bit values b.sub.0, b.sub.1, . . . b.sub.j of the vector word inputs (e.g., j=k or j≠k) to the MAC sub-unit 401; however, they are delayed by a quarter phase cycle of the clock used to sequence the data input at MAC sub-unit 401. As such, a current representation of a quadrature vector input is obtained by sampling a data sequence using a data sampling clock at a time shifted (delayed) by a quarter phase (e.g., 90°) with respect to the digital words input to the respective MAC sub-unit 401 from the data sampled at a main or fundamental clock frequency. Thus, the MAC circuit 400 in the embodiment of
(52) In a further embodiment, as shown in
(53) For example, for the “α sin θ” vector representation, the “α” phase vector coefficient is obtained as:
α=N1/M1,
(54) and for the “β cos θ” vector representation the “β” phase vector coefficient is obtained as:
β=N2/M2.
(55) Thus, the phase shifter is obtained as
(56)
(57) For example, given a number of selected current cells, e.g., 26 cells out of a total of 64 current cells 120, for I phase vector current processing, and given a number of selected current cells, e.g., 39 cells out of a total of 64 current cells 120, for Q phase vector current processing, the arctangent of the ratio of the two currents, arctan (39/26), will represent the phase shift, i.e., the degree of mixing or summing of the I and Q phase components.
(58) In embodiments, a phase shift relation between the I phase and Q phase vector current representation is further related to:
(N1/M1).sup.2+(N2/M2).sup.2=1.
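The coefficient and phase arithmetic of the preceding paragraphs can be sketched as follows. The function name iq_coefficients is hypothetical, and taking the phase as the arctangent of β/α is an assumption drawn from the arctan (39/26) example in the text; the helper also returns α.sup.2+β.sup.2 so the relation above can be checked for any chosen pair of cell counts.

```python
import math


def iq_coefficients(n1, m1, n2, m2):
    """Return (alpha, beta, phase_degrees, alpha**2 + beta**2).

    alpha = N1/M1 and beta = N2/M2 are the selected-to-total cell ratios;
    phase_degrees = atan(beta / alpha) is an assumed reading of the
    arctan(39/26) example, computed with atan2 to tolerate alpha = 0.
    """
    if m1 <= 0 or m2 <= 0:
        raise ValueError("total cell counts must be positive")
    alpha = n1 / m1
    beta = n2 / m2
    phase_degrees = math.degrees(math.atan2(beta, alpha))
    return alpha, beta, phase_degrees, alpha ** 2 + beta ** 2


if __name__ == "__main__":
    alpha, beta, phase, norm = iq_coefficients(26, 64, 39, 64)
    print(alpha, beta, round(phase, 2), norm)
```

When both phases draw from the same total number of cells (M1=M2), the ratio β/α reduces to N2/N1, which is why only the two cell counts appear in the arctangent of the example.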
(59) The present specification further describes a method for performing multiply-accumulate operations wherein the MAC units 100, 300, 400 are used to perform a digital to analog conversion function or a multiplication and accumulate function and receive a number of input signals represented as sampled digital word bits b.sub.0, b.sub.1, . . . b.sub.k and the digital input bits α.sub.0, α.sub.1, . . . α.sub.g, whether sampled at a fundamental clocking rate or frequency or sampled at a clock phase offset from the fundamental clocking rate as applied at the multiplication engine. An MRAM memory element can store the result of such multiplication operations.
(60) Further, in applications, a volatile system memory such as dynamic random access memory (DRAM) or a non-volatile magnetoresistive RAM (MRAM) system provides the memory elements receiving data bits from the multiplication engine current steering units 100, 300, 400. In the case of an MRAM memory element, an MTJ MRAM memory cell has a low latency and is thus well suited for storage in AI/MAC computing operations, as such a memory cell enables a fast readout speed for fast computation. Further, an MRAM cell area is very small.
(61) Arrays of such MTJ memory elements coupled with the multiplication engine current steering units 100, 300, 400 may be used in a variety of AI applications, e.g., pattern recognition, and other applications. The multiplication engine current steering units 300, 400 can interface with an MRAM memory array and increase the efficiency of digital signal processing used to perform multiply-accumulate (MAC) operations on matrix values and input vector values, e.g., perform multiply-accumulate computations for vector-matrix operations, dot product operations, filtering or Fast Fourier Transform operations of the input signals.
(62)
(63) Through each respective memory cell 601, 602 flows a respective differential output current I.sub.out+ carried on a directly coupled first conductor 603 from memory cell 601 and I.sub.out− current carried on second conductor 604 directly coupled to memory cell 602. Each differential output current I.sub.out+ carried on first conductor 603 and I.sub.out− current carried on second conductor 604 is received at a first hierarchy current scaling network 611 connecting a plurality of current splitter elements 625A, . . . , 625N, that each provides a current division operation.
(64) More particularly, in the embodiment shown in
(65) In the embodiment depicted in
(66) In the embodiment shown in
(67) For example, at each current splitter 625A, . . . , 625N, each first parallel connected differential-pair transistors 626 is a first binary weighted differential-pair (NMOS) switching transistors 626 having a unit “1” scale (i.e., weight=1) for receiving at a respective gate terminal of each a first differential pair of input bits a.sub.0, ā.sub.0 corresponding to the least significant bits (LSB) of the input digital signal, a second binary weighted differential-pair switching transistors (not shown) having a weight=2, for receiving at respective gate terminals a second differential pair of input bits a.sub.1, ā.sub.1, a third binary weighted differential-pair switching transistors (not shown) having a weight=4, for receiving a third differential pair of input bits a.sub.2, ā.sub.2, etc. up to and including a final binary weighted differential-pair switching transistors 626 having a corresponding weighting 2.sup.(G−1) for receiving a final differential pair of input bits a.sub.g, ā.sub.g corresponding to a most significant bit (MSB) position of the input digital signal. The output differential currents of each respective current splitter 625A, . . . , 625N are differential currents scaled (divided) down from the I.sub.out+ carried on first conductor 603 and I.sub.out− current carried on second conductor 604.
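The current division performed by one binary weighted differential-pair splitter can be sketched with an idealized model. The assumptions are loud ones: each pair of weight 2.sup.k is taken to carry the share 2.sup.k/(2.sup.G−1) of the input current and to steer that share to the positive output when its bit is 1 and to the negative output otherwise. The function name differential_split is hypothetical, and device non-idealities are ignored.

```python
def differential_split(i_in, bits):
    """Split i_in between positive and negative outputs of one splitter.

    bits[0] is the LSB (weight 1), bits[-1] the MSB (weight 2**(G - 1)).
    Idealized assumption: the input current divides among the pairs in
    proportion to their weights, and each bit steers its pair's share.
    """
    if not bits:
        raise ValueError("at least one bit is required")
    weights = [2 ** k for k in range(len(bits))]
    total_weight = sum(weights)  # equals 2**G - 1
    i_pos = sum(i_in * w / total_weight for w, b in zip(weights, bits) if b)
    i_neg = i_in - i_pos
    return i_pos, i_neg


if __name__ == "__main__":
    # 4-bit word 0101 (LSB first: 1, 0, 1, 0) applied to 15 units of current.
    print(differential_split(15.0, [1, 0, 1, 0]))
```

The two outputs always sum to the input current in this model, which reflects that the differential pairs steer current between conductors rather than discard it.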
(68) For example, in
(69) In an embodiment, the respective conductors 630A, 640A, . . . , 630N, 640N carrying differential currents divided down by each respective splitter 625A, . . . , 625N according to an input digital word(s) are connected to a respective current splitter 725A, . . . , 725N of a second hierarchy current scaling network 621 that provides a further level of current division according to a further input digital word(s).
(70) That is, each corresponding further current splitter 725A, . . . , 725N of the second hierarchy current scaling network 621 connects with a corresponding current splitter 625A, . . . , 625N of the first hierarchy current scaling network 611. Thus, as shown in
(71) In the embodiment depicted in
(72) In the embodiment shown in
(73) For example, at each current splitter 725A, . . . , 725N, each first parallel connected differential-pair transistors 626 is a first binary weighted differential-pair (NMOS) switching transistors 626 having a unit “1” scale (i.e., weight=1) for receiving at respective gate terminal of each a first differential pair of input bits b.sub.0,
(74) Generally, as shown at splitter 725N a parallel connected set of differential-pair NMOS switching transistors includes a first binary weighted differential-pair switching transistors 726 having a weighting scale 2.sup.(X−1) corresponding to received first differential bits b.sub.0,
(75) As further shown in
(76) In particular, the current-steering circuit architecture 600 shown in
(77) In one embodiment, each identical current sink element 620 of the array 640 is equally unary-weighted, i.e., sinks a unit current of a scale of “1”. Each current sink element 620 includes a cascode connection of NMOS transistors, including a respective current sink transistor 615 for current source matching and a cascode-connected transistor 616 for providing high output impedance, where the current sink transistor 615 includes a source terminal connected to a ground reference 645. A biasing voltage V.sub.B is provided for biasing each current sink transistor 615 and a biasing voltage V.sub.BC is provided for biasing each cascode-connected transistor 616.
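As a rough illustration of an array of equally unary-weighted sink elements, the following sketch models each element as sinking one unit current with an optional random relative error. The Gaussian mismatch model, the parameter `sigma_rel` and the function names are assumptions made for illustration only and are not taken from the present disclosure.

```python
import random


def sink_array_currents(n_sinks, i_unit, sigma_rel=0.0, seed=0):
    """Current sunk by each element of an array of nominally identical sinks.

    sigma_rel is a hypothetical relative standard deviation modeling
    device mismatch; 0.0 gives perfectly matched unit currents.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [i_unit * (1.0 + rng.gauss(0.0, sigma_rel)) for _ in range(n_sinks)]


def total_sink_current(currents):
    """Total current drawn by the array (simple sum of element currents)."""
    return sum(currents)


# Example: four ideally matched sinks of 1 microampere each
ideal = sink_array_currents(4, 1e-6)
# Example: the same array with an assumed 1% mismatch
mismatched = sink_array_currents(4, 1e-6, sigma_rel=0.01)
```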
(78) Thus, in operation, MTJ elements 601, 602 will store a result of a current flow path from VDD to ground as modified by respective analog current division and accumulation operations: at first hierarchy current scaling network 611, the current is scaled (divided) in accordance with input digital word(s), e.g., as represented by the digital input bits a.sub.0, a.sub.1, . . . a.sub.g, and at second hierarchy current scaling network 621, the current is further scaled in accordance with further input digital word(s), e.g., as represented by the digital input bits b.sub.0, b.sub.1, . . . b.sub.k.
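The effect of cascading two levels of current division can be sketched as a product of two fractions. This is a simplified, single-ended behavioral model under assumed ideal binary weighted division at each level; it tracks only the current steered to one output path and ignores the complementary path. The names `scale_fraction` and `two_level_output` are hypothetical.

```python
from fractions import Fraction


def scale_fraction(bits):
    """Fraction of the input current passed by one idealized splitter.

    bits is least significant bit first and must be non-empty;
    branch i is assumed to have weight 2**i.
    """
    weights = [2 ** i for i in range(len(bits))]
    selected = sum(w for b, w in zip(bits, weights) if b)
    return Fraction(selected, sum(weights))


def two_level_output(i_in, a_bits, b_bits):
    """Current remaining after a first-level division by word a and a
    second-level division by word b (idealized model)."""
    return Fraction(i_in) * scale_fraction(a_bits) * scale_fraction(b_bits)


# Example: 21 units in, a = 1 0 1 (5 of 7), b = 0 1 (1 of 3)
i_out = two_level_output(21, [1, 0, 1], [1, 0])
```

Because each level only divides the current it receives, the output in this model is proportional to the product of the two input words, which is the multiplication the hierarchy is intended to provide.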
(79) In embodiments, the first scaling network 611 is optional and the memory elements 601, 602 can directly connect to the current splitters of the second hierarchy current scaling network 621. Further, in embodiments, additional current scaling network layers having current splitters may be added to provide further levels of current division.
(80) Moreover, the DAC/MAC unit 600 of
(81) As in the above-described embodiments, the size of the switching transistors 625 is much smaller than that of the current sink elements 620, leading to a small overall area for the DAC/MAC implementation. Matching of the current sources becomes worse at scaled CMOS technologies, and using a smaller number of current sources/sinks leads to superior performance of the DAC/MAC functions. Hence, the disclosed techniques provide superior performance as CMOS technology scales.
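The element-count argument can be made concrete with a simple counting sketch. It compares the number of unit current cells in a conventional fully unary (thermometer-coded) current-steering DAC, which is 2.sup.n−1 for n bits, with the number of switching differential pairs in a splitter-based arrangement, assuming one differential pair per input bit per splitter. The function names are hypothetical, and the sketch counts elements only; it does not estimate actual silicon area.

```python
def conventional_unit_cells(n_bits):
    """Unit current cells in a conventional fully unary current-steering DAC."""
    return 2 ** n_bits - 1


def splitter_switch_pairs(n_splitters, n_bits):
    """Switching differential pairs, assuming one pair per bit per splitter."""
    return n_splitters * n_bits


# Example: 8-bit resolution
cells = conventional_unit_cells(8)       # large, matching-critical current cells
pairs = splitter_switch_pairs(1, 8)      # small switching pairs on one sink
```

The current cells in the first count must be sized for matching, whereas the switching transistors in the second count can be small, so the gap in area is expected to be larger than the gap in element count alone suggests.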
(82) While the present application has been particularly shown and described with respect to preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in forms and details may be made without departing from the spirit and scope of the present application. It is therefore intended that the present application not be limited to the exact forms and details described and illustrated, but fall within the scope of the appended claims.