Asynchronous analog accelerator for fully connected artificial neural networks
11610104 · 2023-03-21
Inventors
CPC classification
G06F13/4022
PHYSICS
H03M1/122
ELECTRICITY
H03M1/765
ELECTRICITY
Y02D10/00
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G11C7/16
PHYSICS
G11C7/1006
PHYSICS
G11C7/1012
PHYSICS
International classification
G06N3/06
PHYSICS
Abstract
Methods of performing mixed-signal/analog multiply-accumulate (MAC) operations used for matrix multiplication in fully connected artificial neural networks in integrated circuits (ICs) are described in this disclosure, having traits such as: (1) inherently fast and efficient for approximate computing due to current-mode signal processing, where summation is performed by simply coupling wires; (2) free from noisy and power-hungry clocks, with asynchronous fully-connected operations; (3) saving on silicon area and power consumption by requiring neither data-converters nor memory for intermediate activation signals; (4) reduced dynamic power consumption due to Compute-In-Memory operations; (5) avoiding overflow conditions along key signal paths and lowering power consumption by training MACs in neural networks in such a manner that the population and/or combinations of multi-quadrant activation signals and multi-quadrant weight signals follow a programmable statistical distribution profile; (6) programmable current consumption versus degree of precision/approximate computing; (7) suitable for ‘always-on’ operations and capable of ‘self power-off’; (8) inherently simple arrangement for non-linear activation operations such as the Rectified Linear Unit (ReLu); and (9) manufacturable on mainstream, low-cost, lagging-edge standard digital CMOS processes, requiring neither resistors nor capacitors.
Claims
1. A scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method comprising: operating an at least one of a plurality of scalar multipliers (X.sub.P,Q), each having a weight-input port (W), a scalar activation-input port (S), a reference-input port (R), and a multiplication product-output port (Z), wherein a difference signal V.sub.S−V.sub.R is the difference between an S port voltage signal (V.sub.S) and an R port voltage signal (V.sub.R) of each scalar multiplier, and the V.sub.S−V.sub.R difference signal is substantially equal to a voltage signal difference between the Z port and the W port (V.sub.Z−V.sub.W) of each scalar multiplier, and wherein a multiplication product (I.sub.W×I.sub.S) of current signals at the W port (I.sub.W) and the S port (I.sub.S) of each scalar multiplier is substantially equal to a multiplication product (I.sub.R×I.sub.Z) of current signals at the R port (I.sub.R) and the Z port (I.sub.Z) of each scalar multiplier, and wherein in each scalar multiplier, (V.sub.S−V.sub.R) is substantially equal to (V.sub.Z−V.sub.W) and (I.sub.W×I.sub.S) is substantially equal to (I.sub.R×I.sub.Z): performing scalar multiplication by sharing a circuit that generates at least one of (1) the difference signal V.sub.S−V.sub.R, and (2) the V.sub.S and the V.sub.R, with a selected one of the at least one of the plurality of scalar multipliers; arranging a 1-1 multiplier (X.sub.1,1) in a first layer of a neural network (NN) comprising: supplying the W port of the X.sub.1,1 multiplier with a W.sub.1,1 input-signal; supplying the S port of the X.sub.1,1 multiplier with a S.sub.1 input-signal; supplying the R port of the X.sub.1,1 multiplier with a R.sub.1,1 input-signal; generating a W.sub.1,1×S.sub.1/R.sub.1,1 output-signal (Z.sub.1,1) at the Z output port of the X.sub.1,1 multiplier; arranging a 1-2 multiplier (X.sub.1,2) in the first layer of the NN comprising: supplying the W port of the 
X.sub.1,2 multiplier with a W.sub.1,2 input-signal; supplying the S port of the X.sub.1,2 multiplier with a S.sub.1 input-signal; supplying the R port of the X.sub.1,2 multiplier with a R.sub.1,2 input-signal; generating a W.sub.1,2×S.sub.1/R.sub.1,2 output-signal (Z.sub.1,2) at the Z output port of the X.sub.1,2 multiplier; arranging a 2-1 multiplier (X.sub.2,1) in the first layer of the NN comprising: supplying the W port of the X.sub.2,1 multiplier with a W.sub.2,1 input-signal; supplying the S port of the X.sub.2,1 multiplier with a S.sub.2 input-signal; supplying the R port of the X.sub.2,1 multiplier with a R.sub.2,1 input-signal; generating a W.sub.2,1×S.sub.2/R.sub.2,1 output-signal (Z.sub.2,1) at the Z output port of the X.sub.2,1 multiplier; arranging a 2-2 multiplier (X.sub.2,2) in the first layer of the NN comprising: supplying the W port of the X.sub.2,2 multiplier with a W.sub.2,2 input-signal; supplying the S port of the X.sub.2,2 multiplier with a S.sub.2 input-signal; supplying the R port of the X.sub.2,2 multiplier with a R.sub.2,2 input-signal; generating a W.sub.2,2×S.sub.2/R.sub.2,2 output-signal (Z.sub.2,2) at the Z output port of the X.sub.2,2 multiplier; summing the Z.sub.1,1 and the Z.sub.2,1 signals to generate a Z.sub.1 signal (Z.sub.1,1+Z.sub.2,1) in the first layer of the NN; summing the Z.sub.1,2 and the Z.sub.2,2 signals to generate a Z.sub.2 signal (Z.sub.1,2+Z.sub.2,2) in the first layer of the NN; subtracting a 1 Bias-Offset signal (B.sub.1) from the Z.sub.1 signal to generate a S′.sub.1 intermediate activation signal, in the first layer of the NN, that is substantially equal to Z.sub.1−B.sub.1; and subtracting a 2 Bias-Offset signal (B.sub.2) from the Z.sub.2 signal to generate a S′.sub.2 intermediate activation signal, in the first layer of the NN, that is substantially equal to Z.sub.2−B.sub.2.
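The port conditions recited in claim 1 (V.sub.S−V.sub.R substantially equal to V.sub.Z−V.sub.W, and I.sub.W×I.sub.S substantially equal to I.sub.R×I.sub.Z) describe a translinear-style cell whose output current is Z = W×S/R, with the column sums formed by simply coupling wires. A minimal behavioral sketch of the 2×2 first layer, using idealized arithmetic only (function names and signal values are illustrative, not circuit-level models from the disclosure):

```python
def scalar_multiply(w, s, r):
    """Translinear cell of claim 1: I_W * I_S = I_R * I_Z, so Z = W * S / R."""
    return w * s / r

def first_layer(s, w, r, b):
    """s: activation currents [S1, S2]; w: weight currents w[p][q];
    r: reference currents r[p][q]; b: Bias-Offset currents [B1, B2].
    Returns the intermediate activation signals [S'1, S'2]."""
    # Column sums are performed by coupling wires (Kirchhoff's current law).
    z1 = scalar_multiply(w[0][0], s[0], r[0][0]) + scalar_multiply(w[1][0], s[1], r[1][0])
    z2 = scalar_multiply(w[0][1], s[0], r[0][1]) + scalar_multiply(w[1][1], s[1], r[1][1])
    # Bias-Offset subtraction yields S'1 = Z1 - B1 and S'2 = Z2 - B2.
    return [z1 - b[0], z2 - b[1]]
```

With a common reference (all r equal to 1), this reduces to an ordinary 2×2 matrix-vector product followed by bias subtraction.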
2. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 1, the method further comprising: arranging a 1-1′ multiplier (X′.sub.1,1) in a second layer of the NN comprising: supplying the W port of the X′.sub.1,1 multiplier with a W′.sub.1,1 input-signal; supplying the S port of the X′.sub.1,1 multiplier with a S′.sub.1 input-signal; supplying the R port of the X′.sub.1,1 multiplier with a R′.sub.1,1 input-signal; generating a W′.sub.1,1×S′.sub.1/R′.sub.1,1 output-signal (Z′.sub.1,1) at the Z output port of the X′.sub.1,1 multiplier; arranging a 1-2′ multiplier (X′.sub.1,2) in the second layer of the NN comprising: supplying the W port of the X′.sub.1,2 multiplier with a W′.sub.1,2 input-signal; supplying the S port of the X′.sub.1,2 multiplier with a S′.sub.1 input-signal; supplying the R port of the X′.sub.1,2 multiplier with a R′.sub.1,2 input-signal; generating a W′.sub.1,2×S′.sub.1/R′.sub.1,2 output-signal (Z′.sub.1,2) at the Z output port of the X′.sub.1,2 multiplier; arranging a 2-1′ multiplier (X′.sub.2,1) in the second layer of the NN comprising: supplying the W port of the X′.sub.2,1 multiplier with a W′.sub.2,1 input-signal; supplying the S port of the X′.sub.2,1 multiplier with a S′.sub.2 input-signal; supplying the R port of the X′.sub.2,1 multiplier with a R′.sub.2,1 input-signal; generating a W′.sub.2,1×S′.sub.2/R′.sub.2,1 output-signal (Z′.sub.2,1) at the Z output port of the X′.sub.2,1 multiplier; arranging a 2-2′ multiplier (X′.sub.2,2) in the second layer of the NN comprising: supplying the W port of the X′.sub.2,2 multiplier with a W′.sub.2,2 input-signal; supplying the S port of the X′.sub.2,2 multiplier with a S′.sub.2 input-signal; supplying the R port of the X′.sub.2,2 multiplier with a R′.sub.2,2 input-signal; generating a W′.sub.2,2×S′.sub.2/R′.sub.2,2 output-signal (Z′.sub.2,2) at the Z output port of the X′.sub.2,2 multiplier; summing the Z′.sub.1,1 and the Z′.sub.2,1 signals to generate a 
Z′.sub.1 signal (Z′.sub.1,1+Z′.sub.2,1) in the second layer of the NN; summing the Z′.sub.1,2 and the Z′.sub.2,2 signals to generate a Z′.sub.2 signal (Z′.sub.1,2+Z′.sub.2,2) in the second layer of the NN; subtracting a 1′ Bias-Offset signal (B′.sub.1) from the Z′.sub.1 signal to generate a S″.sub.1 intermediate activation signal, in the second layer of the NN, that is substantially equal to Z′.sub.1−B′.sub.1; and subtracting a 2′ Bias-Offset signal (B′.sub.2) from the Z′.sub.2 signal to generate a S″.sub.2 intermediate activation signal, in the second layer of the NN, that is substantially equal to Z′.sub.2−B′.sub.2.
3. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 2, the method further comprising: supplying the R port of each multiplier in the second NN layer with at least one of a common reference input signal (R′), and individual reference input signals.
4. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 2, the method further comprising: arranging each scalar multiplier in the second NN layer with at least one of (1) scalar substrate vertical BJT-based multipliers, (2) scalar subthreshold MOSFET-based multipliers, and (3) current-mode analog multipliers.
5. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 2, the method further comprising: performing a batch normalization function in the second NN layer on the S″.sub.1 intermediate activation signal to generate an S″.sub.1N normalized intermediate activation signal; and performing a batch normalization function in the second NN layer on the S″.sub.2 intermediate activation signal to generate an S″.sub.2N normalized intermediate activation signal.
6. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 2, the method further comprising: performing a ReLu function in the second NN layer on the S″.sub.1 intermediate activation signal to generate an S″.sub.1R intermediate activation signal substantially equal to ReLu{Z′.sub.1−B′.sub.1}; and performing a ReLu function in the second NN layer on the S″.sub.2 intermediate activation signal to generate an S″.sub.2R intermediate activation signal substantially equal to ReLu{Z′.sub.2−B′.sub.2}.
7. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 2, the method further comprising: operating a plurality of Digital-to-Analog-Converters (DAC), each having a current-output signal (O) proportional to its reference input signal (r) and responsive to its digital input code (D); supplying at least one W code (D.sub.W) to at least one of the plurality of DACs (DAC.sub.W), generating an at least one O signal (W.sub.O); supplying at least one W port of at least one scalar multiplier in the second NN layer with the at least one W.sub.O signal; supplying at least one R code (D.sub.R) to at least one other of the plurality of DACs (DAC.sub.R), generating an at least one R signal (R.sub.O); supplying at least one R port of at least one other scalar multiplier in the second NN layer with the at least one R.sub.O signal; and receiving at least one of the D.sub.W and D.sub.R codes from at least one of a Latch array, a Static-Random-Access-Memory (SRAM) array, an Erasable-Programmable-Read-Only-Memory (EPROM) array, and an Electrically-Erasable-Programmable-Read-Only-Memory (E.sup.2PROM) array.
8. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 1, the method further comprising: supplying the R port of each multiplier in the first NN layer with at least one of a common reference input signal (R), and individual reference input signals.
9. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 1, the method further comprising: arranging each scalar multiplier in the first NN layer with at least one of (1) scalar substrate vertical BJT-based multipliers, (2) scalar subthreshold MOSFET-based multipliers, and (3) current-mode analog multipliers.
10. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 1, the method further comprising: performing a batch normalization function in the first NN layer on the S′.sub.1 intermediate activation signal to generate an S′.sub.1N normalized intermediate activation signal; and performing a batch normalization function in the first NN layer on the S′.sub.2 intermediate activation signal to generate an S′.sub.2N normalized intermediate activation signal.
11. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 1, the method further comprising: performing a ReLu function in the first NN layer on the S′.sub.1 intermediate activation signal to generate an S′.sub.1R intermediate activation signal substantially equal to ReLu{Z.sub.1−B.sub.1}; and performing a ReLu function in the first NN layer on the S′.sub.2 intermediate activation signal to generate an S′.sub.2R intermediate activation signal substantially equal to ReLu{Z.sub.2−B.sub.2}.
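In current-mode signal processing, the ReLu of claims 6, 11, and 16 is inherently simple: a one-directional (diode-like) element passes positive current and blocks reverse current. A behavioral sketch of the operation on the intermediate activations (names are illustrative; this models only the recited arithmetic, not the circuit):

```python
def relu(i):
    """Current-mode ReLu: pass positive current, block reverse current."""
    return max(i, 0.0)

def relu_activations(z, b):
    """Claim 11: generate S'_iR substantially equal to ReLu{Z_i - B_i}
    for each column sum Z_i and Bias-Offset B_i."""
    return [relu(zi - bi) for zi, bi in zip(z, b)]
```

The resulting S′.sub.1R and S′.sub.2R signals can feed the S ports of the second-layer multipliers directly, as recited in claim 12.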
12. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 11, the method further comprising: supplying the S port of each of an X′.sub.1,1 multiplier and an X′.sub.1,2 multiplier in a second NN layer with the intermediate activation signal S′.sub.1R; and supplying the S port of each of an X′.sub.2,1 multiplier and an X′.sub.2,2 multiplier in a second NN layer with the intermediate activation signal S′.sub.2R.
13. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 1, the method further comprising: operating a plurality of Digital-to-Analog-Converters (DAC), each having a current-output signal (O) proportional to its reference input signal (r) and responsive to its digital input code (D); supplying at least one W code (D.sub.W) to at least one of the plurality of DACs (DAC.sub.W), generating an at least one O signal (W.sub.O); supplying at least one W port of at least one scalar multiplier in the first NN layer with the at least one W.sub.O signal; supplying at least one R code (D.sub.R) to at least one other of the plurality of DACs (DAC.sub.R), generating an at least one R signal (R.sub.O); supplying at least one R port of at least one other scalar multiplier in the first NN layer with the at least one R.sub.O signal; and receiving at least one of the D.sub.W and D.sub.R codes from at least one of a Latch array, a Static-Random-Access-Memory (SRAM) array, an Erasable-Programmable-Read-Only-Memory (EPROM) array, and an Electrically-Erasable-Programmable-Read-Only-Memory (E.sup.2PROM) array.
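The DACs of claims 7 and 13 each produce a current output O proportional to the reference input r and responsive to the digital code D; for a binary-weighted current-output DAC this behavior is O = r·D/2^N. A hedged numeric sketch (the 8-bit width and scaling convention are illustrative assumptions, not stated in the claims):

```python
def dac_current_out(d_code, r, n_bits=8):
    """Current-output DAC of claim 13: output O proportional to the
    reference input r and responsive to the digital input code D."""
    assert 0 <= d_code < (1 << n_bits), "digital code out of range"
    return r * d_code / (1 << n_bits)
```

The D.sub.W and D.sub.R codes driving such DACs may be held in a latch, SRAM, EPROM, or E.sup.2PROM array, which is what enables the Compute-In-Memory operation described in the abstract.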
14. A scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method comprising: operating a plurality of Digital-to-Analog-Converters (DAC), each having a current-output signal (O) proportional to its reference input signal (r) and responsive to its digital input code (D); operating a plurality of Reference Bias Networks (RBN), the output of each generating a plurality of reference voltage bus signals (VRi) responsive to binary weighted currents, each binary weighted current proportional to a respective RBN input current signal (I.sub.RBN); operating a plurality of Tiny DACs (tDAC), the current-output signal (OT) of each being proportional to a plurality of voltage reference signals (REF′) and responsive to its digital input code (DT); supplying a S.sub.1 activation input code (D.sub.S1) to a DAC.sub.S1 to generate an O.sub.S1 output signal in a first neural network (NN) layer, wherein the O.sub.S1 signal is substantially the same as the I.sub.RBN of a RBN.sub.S1, and generating a plurality of VRi.sub.S1 reference voltage signals; supplying a W.sub.1,1 digital input code to a tDAC.sub.1,1 in the first NN layer, wherein the plurality of VRi.sub.S1 reference voltage signals is substantially the same as the REF.sub.i of the tDAC.sub.1,1, and generating a Z.sub.1,1=S.sub.1×W.sub.1,1×1/r scalar multiplication product output signal; supplying a W.sub.1,2 digital input code to a tDAC.sub.1,2 in the first NN layer, wherein the plurality of VRi.sub.S1 reference voltage signals is substantially the same as the REF.sub.i of the tDAC.sub.1,2, and generating a Z.sub.1,2=S.sub.1×W.sub.1,2×1/r scalar multiplication product output signal; supplying a S.sub.2 activation input code (D.sub.S2) to a DAC.sub.S2 to generate an O.sub.S2 output signal in the first NN layer, wherein the O.sub.S2 signal is substantially the same as the I.sub.RBN of a RBN.sub.S2, and generating a plurality of VRi.sub.S2 reference voltage signals; supplying a 
W.sub.2,1 digital input code to a tDAC.sub.2,1 in the first NN layer, wherein the plurality of VRi.sub.S2 reference voltage signals is substantially the same as the REF.sub.i of the tDAC.sub.2,1, and generating a Z.sub.2,1=S.sub.2×W.sub.2,1×1/r scalar multiplication product output signal; supplying a W.sub.2,2 digital input code to a tDAC.sub.2,2 in the first NN layer, wherein the plurality of VRi.sub.S2 reference voltage signals is substantially the same as the REF.sub.i of the tDAC.sub.2,2, and generating a Z.sub.2,2=S.sub.2×W.sub.2,2×1/r scalar multiplication product output signal; summing the Z.sub.1,1 and the Z.sub.2,1 signals to generate a Z.sub.1 signal (Z.sub.1,1+Z.sub.2,1) in the first layer of the NN; summing the Z.sub.1,2 and the Z.sub.2,2 signals to generate a Z.sub.2 signal (Z.sub.1,2+Z.sub.2,2) in the first layer of the NN; subtracting a 1 Bias-Offset signal (B.sub.1) from the Z.sub.1 signal to generate a S′.sub.1 intermediate activation signal, in the first layer of the NN, that is substantially equal to Z.sub.1−B.sub.1; and subtracting a 2 Bias-Offset signal (B.sub.2) from the Z.sub.2 signal to generate a S′.sub.2 intermediate activation signal, in the first layer of the NN, that is substantially equal to Z.sub.2−B.sub.2.
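Claim 14 performs each multiplication inside a "Tiny DAC" whose reference ladder is derived from the activation itself: the activation DAC output O.sub.S biases an RBN, the RBN's binary-weighted reference voltages feed the tDAC, and the tDAC's weight code then scales that reference, yielding Z = S×W×1/r. An idealized numeric sketch of this chain (the 8-bit code width and shared reference value are illustrative assumptions; no circuit behavior is modeled):

```python
R_REF = 1.0   # shared DAC reference current (illustrative value)
N = 8         # digital code width (illustrative assumption)

def dac(d_code):
    """Activation DAC: current output O = r * D / 2^N."""
    return R_REF * d_code / (1 << N)

def tdac(w_code, i_rbn):
    """Tiny DAC fed by an RBN biased with I_RBN: because its reference
    ladder is proportional to I_RBN, its output is Z = I_RBN * W / 2^N,
    i.e. the S x W x (1/r) product of claim 14 up to code scaling."""
    return i_rbn * w_code / (1 << N)

def first_layer_codes(d_s, d_w, b):
    """d_s: activation codes [D_S1, D_S2]; d_w: weight codes d_w[p][q];
    b: Bias-Offset currents [B1, B2]."""
    s = [dac(d) for d in d_s]                    # O_S1, O_S2 bias the RBNs
    z1 = tdac(d_w[0][0], s[0]) + tdac(d_w[1][0], s[1])   # wire-summed column 1
    z2 = tdac(d_w[0][1], s[0]) + tdac(d_w[1][1], s[1])   # wire-summed column 2
    return [z1 - b[0], z2 - b[1]]                # S'1 = Z1 - B1, S'2 = Z2 - B2
```

No data converter is needed between layers: the intermediate activation currents can bias the next layer's RBNs directly, as claims 17 and 19 recite.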
15. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 14, the method further comprising: supplying a S′.sub.1 activation input code (D.sub.S′1) to a DAC.sub.S′1 to generate an O.sub.S′1 output signal in a second neural network (NN) layer, wherein the O.sub.S′1 signal is substantially the same as the I.sub.RBN of a RBN.sub.S′1, and generating a plurality of VRi.sub.S′1 reference voltage signals; supplying a W′.sub.1,1 digital input code to a tDAC′.sub.1,1 in the second NN layer, wherein the plurality of VRi.sub.S′1 reference voltage signals is substantially the same as the REF.sub.i of the tDAC′.sub.1,1, and generating a Z′.sub.1,1=S′.sub.1×W′.sub.1,1×1/r scalar multiplication product output signal; supplying a W′.sub.1,2 digital input code to a tDAC′.sub.1,2 in the second NN layer, wherein the plurality of VRi.sub.S′1 reference voltage signals is substantially the same as the REF.sub.i of the tDAC′.sub.1,2, and generating a Z′.sub.1,2=S′.sub.1×W′.sub.1,2×1/r scalar multiplication product output signal; supplying a S′.sub.2 activation input code (D.sub.S′2) to a DAC.sub.S′2 to generate an O.sub.S′2 output signal in the second NN layer, wherein the O.sub.S′2 signal is substantially the same as the I.sub.RBN of a RBN.sub.S′2, and generating a plurality of VRi.sub.S′2 reference voltage signals; supplying a W′.sub.2,1 digital input code to a tDAC′.sub.2,1 in the second NN layer, wherein the plurality of VRi.sub.S′2 reference voltage signals is substantially the same as the REF.sub.i of the tDAC′.sub.2,1, and generating a Z′.sub.2,1=S′.sub.2×W′.sub.2,1×1/r scalar multiplication product output signal; supplying a W′.sub.2,2 digital input code to a tDAC′.sub.2,2 in the second NN layer, wherein the plurality of VRi.sub.S′2 reference voltage signals is substantially the same as the REF.sub.i of the tDAC′.sub.2,2, and generating a Z′.sub.2,2=S′.sub.2×W′.sub.2,2×1/r scalar multiplication product output signal; summing the Z′.sub.1,1 and the 
Z′.sub.2,1 signals to generate a Z′.sub.1 signal (Z′.sub.1,1+Z′.sub.2,1) in the second layer of the NN; summing the Z′.sub.1,2 and the Z′.sub.2,2 signals to generate a Z′.sub.2 signal (Z′.sub.1,2+Z′.sub.2,2) in the second layer of the NN; subtracting a 1′ Bias-Offset signal (B′.sub.1) from the Z′.sub.1 signal to generate a S″.sub.1 intermediate activation signal, in the second layer of the NN, that is substantially equal to Z′.sub.1−B′.sub.1; and subtracting a 2′ Bias-Offset signal (B′.sub.2) from the Z′.sub.2 signal to generate a S″.sub.2 intermediate activation signal, in the second layer of the NN, that is substantially equal to Z′.sub.2−B′.sub.2.
16. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 14, the method further comprising: performing a ReLu function on the S′.sub.1 signal to generate an S′.sub.1R intermediate activation signal substantially equal to ReLu{Z.sub.1−B.sub.1}; and performing a ReLu function on the S′.sub.2 signal to generate an S′.sub.2R intermediate activation signal substantially equal to ReLu{Z.sub.2−B.sub.2}.
17. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 16, the method further comprising: wherein the intermediate activation signal S′.sub.1R is substantially the same as the I.sub.RBN of a RBN.sub.S′1; and wherein the intermediate activation signal S′.sub.2R is substantially the same as the I.sub.RBN of a RBN.sub.S′2.
18. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 14, the method further comprising: performing a batch normalization function on the S′.sub.1 signal to generate an S′.sub.1N normalized intermediate activation signal; and performing a batch normalization function on the S′.sub.2 signal to generate an S′.sub.2N normalized intermediate activation signal.
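The batch normalization of claims 5, 10, and 18 scales and shifts each intermediate activation; at inference time it reduces to a fixed affine transform with precomputed statistics. A hedged sketch of that arithmetic (the mean/variance statistics and the scale/shift parameters are illustrative assumptions, not values from the disclosure):

```python
import math

def batch_normalize(s, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Inference-time batch normalization of an intermediate activation:
    S_N = gamma * (S - mean) / sqrt(var + eps) + beta."""
    return gamma * (s - mean) / math.sqrt(var + eps) + beta
```

As with the ReLu outputs, a normalized intermediate activation current can serve directly as the I.sub.RBN of the next layer's Reference Bias Network (claim 19), avoiding any intermediate data conversion.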
19. The scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit of claim 18, the method further comprising: wherein the normalized intermediate activation signal S′.sub.1N is substantially the same as the I.sub.RBN of a RBN.sub.S′1; and wherein the normalized intermediate activation signal S′.sub.2N is substantially the same as the I.sub.RBN of a RBN.sub.S′2.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The subject matter presented herein is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and illustrations, in which like reference numerals refer to similar elements.
SUMMARY OF THE DISCLOSURE
(10) An aspect of the present disclosure is a scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method comprising: Operating an at least one of a plurality of scalar multipliers (X.sub.P,Q), each having a weight-input port (W), a scalar activation-input port (S), a reference-input port (R), and a multiplication product-output port (Z), wherein a difference signal V.sub.S−V.sub.R is the difference between a S port voltage signal (V.sub.S) and a R port voltage signal (V.sub.R) of each scalar multiplier and the V.sub.S−V.sub.R difference signal is substantially equal to a voltage signal difference between the Z and W ports (V.sub.Z−V.sub.W) of each scalar multiplier, and wherein a multiplication product (I.sub.W×I.sub.S) of current signals at the W port (I.sub.W) and S port (I.sub.S) of each scalar multiplier is substantially equal to a multiplication product (I.sub.R×I.sub.Z) of current signals at the R port (I.sub.R) and Z port (I.sub.Z) of each scalar multiplier, and wherein in each scalar multiplier, (V.sub.S−V.sub.R) is substantially equal to (V.sub.Z−V.sub.W) and (I.sub.W×I.sub.S) is substantially equal to (I.sub.R×I.sub.Z): Performing scalar multiplication by sharing a circuit that generates at least one of (1) the V.sub.S−V.sub.R difference voltage signal, and (2) the V.sub.S and the V.sub.R, with a selected one of the at least one of the plurality of scalar multipliers; Arranging a 1-1 multiplier (X.sub.1,1) in a first layer of a neural network (NN) comprising: Supplying the W port of X.sub.1,1 multiplier with a W.sub.1,1 input-signal; Supplying the S port of X.sub.1,1 multiplier with a S.sub.1 input-signal; Supplying the R port of X.sub.1,1 multiplier with a R.sub.1,1 input-signal; Generating a W.sub.1,1×S.sub.1/R.sub.1,1 output-signal (Z.sub.1,1) at the Z output port of the X.sub.1,1 multiplier; Arranging a 1-2 multiplier (X.sub.1,2) in the first layer of the NN comprising: Supplying the W 
port of X.sub.1,2 multiplier with a W.sub.1,2 input-signal; Supplying the S port of X.sub.1,2 multiplier with a S.sub.1 input-signal; Supplying the R port of X.sub.1,2 multiplier with a R.sub.1,2 input-signal; Generating a W.sub.1,2×S.sub.1/R.sub.1,2 output-signal (Z.sub.1,2) at the Z output port of the X.sub.1,2 multiplier; Arranging a 2-1 multiplier (X.sub.2,1) in the first layer of the NN comprising: Supplying the W port of X.sub.2,1 multiplier with a W.sub.2,1 input-signal; Supplying the S port of X.sub.2,1 multiplier with a S.sub.2 input-signal; Supplying the R port of X.sub.2,1 multiplier with a R.sub.2,1 input-signal; Generating a W.sub.2,1×S.sub.2/R.sub.2,1 output-signal (Z.sub.2,1) at the Z output port of the X.sub.2,1 multiplier; Arranging a 2-2 multiplier (X.sub.2,2) in the first layer of the NN comprising: Supplying the W port of X.sub.2,2 multiplier with a W.sub.2,2 input-signal; Supplying the S port of X.sub.2,2 multiplier with a S.sub.2 input-signal; Supplying the R port of X.sub.2,2 multiplier with a R.sub.2,2 input-signal; Generating a W.sub.2,2×S.sub.2/R.sub.2,2 output-signal (Z.sub.2,2) at the Z output port of the X.sub.2,2 multiplier; Summing the Z.sub.1,1 and the Z.sub.2,1 signals to generate a Z.sub.1 signal (Z.sub.1,1+Z.sub.2,1) in the first layer of the NN; Summing the Z.sub.1,2 and the Z.sub.2,2 signals to generate a Z.sub.2 signal (Z.sub.1,2+Z.sub.2,2) in the first layer of the NN; Subtracting a 1 Bias-Offset signal (B.sub.1) from the Z.sub.1 signal to generate a S′.sub.1 intermediate activation signal, in the first layer of the NN, that is substantially equal to Z.sub.1−B.sub.1; and Subtracting a 2 Bias-Offset signal (B.sub.2) from the Z.sub.2 signal to generate a S′.sub.2 intermediate activation signal, in the first layer of the NN, that is substantially equal to Z.sub.2−B.sub.2. 
Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: Arranging a 1-1′ multiplier (X′.sub.1,1) in a second layer of the NN comprising: Supplying the W port of X′.sub.1,1 multiplier with a W′.sub.1,1 input-signal; Supplying the S port of X′.sub.1,1 multiplier with a S′.sub.1 input-signal; Supplying the R port of X′.sub.1,1 multiplier with a R′.sub.1,1 input-signal; Generating a W′.sub.1,1×S′.sub.1/R′.sub.1,1 output-signal (Z′.sub.1,1) at the Z output port of the X′.sub.1,1 multiplier; Arranging a 1-2′ multiplier (X′.sub.1,2) in the second layer of the NN comprising: Supplying the W port of X′.sub.1,2 multiplier with a W′.sub.1,2 input-signal; Supplying the S port of X′.sub.1,2 multiplier with a S′.sub.1 input-signal; Supplying the R port of X′.sub.1,2 multiplier with a R′.sub.1,2 input-signal; Generating a W′.sub.1,2×S′.sub.1/R′.sub.1,2 output-signal (Z′.sub.1,2) at the Z output port of the X′.sub.1,2 multiplier; Arranging a 2-1′ multiplier (X′.sub.2,1) in the second layer of the NN comprising: Supplying the W port of X′.sub.2,1 multiplier with a W′.sub.2,1 input-signal; Supplying the S port of X′.sub.2,1 multiplier with a S′.sub.2 input-signal; Supplying the R port of X′.sub.2,1 multiplier with a R′.sub.2,1 input-signal; Generating a W′.sub.2,1×S′.sub.2/R′.sub.2,1 output-signal (Z′.sub.2,1) at the Z output port of the X′.sub.2,1 multiplier; Arranging a 2-2′ multiplier (X′.sub.2,2) in the second layer of the NN comprising: Supplying the W port of X′.sub.2,2 multiplier with a W′.sub.2,2 input-signal; Supplying the S port of X′.sub.2,2 multiplier with a S′.sub.2 input-signal; Supplying the R port of X′.sub.2,2 multiplier with a R′.sub.2,2 input-signal; Generating a W′.sub.2,2×S′.sub.2/R′.sub.2,2 output-signal (Z′.sub.2,2) at the Z output port of the X′.sub.2,2 multiplier; Summing the Z′.sub.1,1 and the Z′.sub.2,1 signals to generate a Z′.sub.1 signal 
(Z′.sub.1,1+Z′.sub.2,1) in the second layer of the NN; Summing the Z′.sub.1,2 and the Z′.sub.2,2 signals to generate a Z′.sub.2 signal (Z′.sub.1,2+Z′.sub.2,2) in the second layer of the NN; Subtracting a 1′ Bias-Offset signal (B′.sub.1) from the Z′.sub.1 signal to generate a S″.sub.1 intermediate activation signal, in the second layer of the NN, that is substantially equal to Z′.sub.1−B′.sub.1; and Subtracting a 2′ Bias-Offset signal (B′.sub.2) from the Z′.sub.2 signal to generate a S″.sub.2 intermediate activation signal, in the second layer of the NN, that is substantially equal to Z′.sub.2−B′.sub.2. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: Wherein the R port of each scalar multiplier in the first NN layer can be supplied with at least one of a common reference input signal (R), and individual reference input signals. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: Wherein the R port of each scalar multiplier in the second NN layer can be supplied with at least one of a common reference input signal (R′), and individual reference input signals. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: Wherein each scalar multiplier in the first NN layer is arranged with at least one of (1) scalar substrate vertical BJT-based multipliers, (2) scalar subthreshold MOSFET-based multipliers, and (3) current-mode analog multipliers. 
Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: Wherein each scalar multiplier in the second NN layer is arranged with at least one of (1) scalar substrate vertical BJT-based multipliers, (2) scalar subthreshold MOSFET-based multipliers, and (3) current-mode analog multipliers. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: performing a batch normalization function in the first NN layer on the S′.sub.1 intermediate activation signal to generate an S′.sub.1N normalized intermediate activation signal; and performing a batch normalization function in the first NN layer on the S′.sub.2 intermediate activation signal to generate an S′.sub.2N normalized intermediate activation signal. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: performing a batch normalization function in the second NN layer on the S″.sub.1 intermediate activation signal to generate an S″.sub.1N normalized intermediate activation signal; and performing a batch normalization function in the second NN layer on the S″.sub.2 intermediate activation signal to generate an S″.sub.2N normalized intermediate activation signal. 
Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: performing a ReLu function in the first NN layer on the S′.sub.1 intermediate activation signal to generate an S′.sub.1R intermediate activation signal substantially equal to ReLu{Z.sub.1−B.sub.1}; and performing a ReLu function in the first NN layer on the S′.sub.2 intermediate activation signal to generate an S′.sub.2R intermediate activation signal substantially equal to ReLu{Z.sub.2−B.sub.2}. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: performing a ReLu function in the second NN layer on the S″.sub.1 intermediate activation signal to generate an S″.sub.1R intermediate activation signal substantially equal to ReLu{Z′.sub.1−B′.sub.1}; and performing a ReLu function in the second NN layer on the S″.sub.2 intermediate activation signal to generate an S″.sub.2R intermediate activation signal substantially equal to ReLu{Z′.sub.2−B′.sub.2}. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: Supplying the S port of each of the X′.sub.1,1 and X′.sub.1,2 multipliers in the second NN layer with the intermediate activation signal S′.sub.1R; and Supplying the S port of each of the X′.sub.2,1 and X′.sub.2,2 multipliers in the second NN layer with the intermediate activation signal S′.sub.2R. 
Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: Operating a plurality of Digital-to-Analog-Converters (DAC), each having a current-output signal (O) proportional to its reference input signal (r) and responsive to its digital input code (D); Supplying at least one W code (D.sub.W) to at least one of the plurality of DACs (DAC.sub.W), generating an at least one O signal (W.sub.O); Supplying at least one W port of at least one scalar multiplier in the first NN layer with the at least one W.sub.O signal; Supplying at least one R code (D.sub.R) to at least one other of the plurality of DACs (DAC.sub.R), generating an at least one R signal (R.sub.O); Supplying at least one R port of at least one other scalar multiplier in the first NN layer with the at least one R.sub.O signal; and Receiving at least one of the D.sub.W and D.sub.R codes from at least one of a Latch array, a Static-Random-Access-Memory (SRAM) array, an Erasable-Programmable-Read-Only-Memory (EPROM) array, and an Electrically-Erasable-Programmable-Read-Only-Memory (E.sup.2PROM) array. 
Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: Operating a plurality of Digital-to-Analog-Converters (DAC), each having a current-output signal (O) proportional to its reference input signal (r) and responsive to its digital input code (D); Supplying at least one W code (D.sub.W) to at least one of the plurality of DACs (DAC.sub.W), generating an at least one O signal (W.sub.O); Supplying at least one W port of at least one scalar multiplier in the second NN layer with the at least one W.sub.O signal; Supplying at least one R code (D.sub.R) to at least one other of the plurality of DACs (DAC.sub.R), generating an at least one R signal (R.sub.O); Supplying at least one R port of at least one other scalar multiplier in the second NN layer with the at least one R.sub.O signal; and Receiving at least one of the D.sub.W and D.sub.R codes from at least one of a Latch array, a Static-Random-Access-Memory (SRAM) array, an Erasable-Programmable-Read-Only-Memory (EPROM) array, and an Electrically-Erasable-Programmable-Read-Only-Memory (E.sup.2PROM) array.
(11) An aspect of the present disclosure is a scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method comprising: Operating a plurality of Digital-to-Analog-Converters (DAC), each having a current-output signal (O) proportional to its reference input signal (r) and responsive to its digital input code (D); Operating a plurality of Reference Bias Networks (RBN), the output of each generating a plurality of reference voltage bus signals (VRi) responsive to binary weighted currents, each binary weighted current proportional to a respective RBN input current signal (I.sub.RBN); Operating a plurality of Tiny DACs (tDAC), the current-output signal (OT) of each being proportional to a plurality of voltage reference signals (REF.sub.i) and responsive to its digital input code (DT); Supplying a S.sub.1 activation input code (D.sub.S1) to a DAC.sub.S1 to generate a O.sub.S1 output signal in a first neural network (NN) layer, wherein the O.sub.S1 signal is substantially the same as the I.sub.RBN of an RBN.sub.S1, and generating a plurality of VRi.sub.S1 reference voltage signals; Supplying a W.sub.1,1 digital input code to a tDAC.sub.1,1 in the first NN layer, wherein the plurality of VRi.sub.S1 reference voltage signals is substantially the same as the REF.sub.i of the tDAC.sub.1,1, and generating a Z.sub.1,1=S.sub.1×W.sub.1,1×1/r scalar multiplication product output signal; Supplying a W.sub.1,2 digital input code to a tDAC.sub.1,2 in the first NN layer, wherein the plurality of VRi.sub.S1 reference voltage signals is substantially the same as the REF.sub.i of the tDAC.sub.1,2, and generating a Z.sub.1,2=S.sub.1×W.sub.1,2×1/r scalar multiplication product output signal; Supplying a S.sub.2 activation input code (D.sub.S2) to a DAC.sub.S2 to generate a O.sub.S2 output signal in the first NN layer, wherein the O.sub.S2 signal is substantially the same as the I.sub.RBN of an RBN.sub.S2, and generating a plurality of VRi.sub.S2 
reference voltage signals; Supplying a W.sub.2,1 digital input code to a tDAC.sub.2,1 in the first NN layer, wherein the plurality of VRi.sub.S2 reference voltage signals is substantially the same as the REF.sub.i of the tDAC.sub.2,1, and generating a Z.sub.2,1=S.sub.2×W.sub.2,1×1/r scalar multiplication product output signal; Supplying a W.sub.2,2 digital input code to a tDAC.sub.2,2 in the first NN layer, wherein the plurality of VRi.sub.S2 reference voltage signals is substantially the same as the REF.sub.i of the tDAC.sub.2,2, and generating a Z.sub.2,2=S.sub.2×W.sub.2,2×1/r scalar multiplication product output signal; Summing the Z.sub.1,1 and the Z.sub.2,1 signals to generate a Z.sub.1 signal (Z.sub.1,1+Z.sub.2,1) in the first layer of the NN; Summing the Z.sub.1,2 and the Z.sub.2,2 signals to generate a Z.sub.2 signal (Z.sub.1,2+Z.sub.2,2) in the first layer of the NN; Subtracting a 1 Bias-Offset signal (B.sub.1) from the Z.sub.1 signal to generate a S′.sub.1 intermediate activation signal, in the first layer of the NN, that is substantially equal to Z.sub.1−B.sub.1; and Subtracting a 2 Bias-Offset signal (B.sub.2) from the Z.sub.2 signal to generate a S′.sub.2 intermediate activation signal, in the first layer of the NN, that is substantially equal to Z.sub.2−B.sub.2. 
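The 2x2 first-layer arithmetic claimed above (products Z.sub.p,q=S.sub.p×W.sub.p,q×1/r, column-wise summation, and bias-offset subtraction) can be sketched behaviorally. This is a plain numeric model rather than a circuit description, and the example values are invented:

```python
# Behavioral sketch of the claimed 2x2 first-layer current-mode MAC:
# each tDAC forms Z[p][q] = S_p * W[p][q] / r, the per-column currents
# are summed (by wire coupling in the circuit), and a Bias-Offset is
# subtracted. Names (S, W, B, r) mirror the claim; values illustrative.

def first_layer(S, W, B, r=1.0):
    # Z[p][q]: scalar multiplication product of activation S_p and weight W_pq
    Z = [[S[p] * W[p][q] / r for q in range(2)] for p in range(2)]
    # Column-wise summation: Z_q = Z_{1,q} + Z_{2,q}
    Z1 = Z[0][0] + Z[1][0]
    Z2 = Z[0][1] + Z[1][1]
    # Intermediate activations S'_q = Z_q - B_q
    return Z1 - B[0], Z2 - B[1]

print(first_layer([2.0, 3.0], [[1.0, 0.5], [0.25, 2.0]], [0.5, 1.0]))
# -> (2.25, 6.0)
```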
Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: Supplying a S′.sub.1 activation input code (D.sub.S′1) to a DAC.sub.S′1 to generate an O.sub.S′1 output signal in a second neural network (NN) layer, wherein the O.sub.S′1 signal is substantially the same as the I.sub.RBN of a RBN.sub.S′1, and generating a plurality of VRi.sub.S′1 reference voltage signals; Supplying a W′.sub.1,1 digital input code to a tDAC′.sub.1,1 in the second NN layer, wherein the plurality of VRi.sub.S′1 reference voltage signals is substantially the same as the REF.sub.i of the tDAC′.sub.1,1, and generating a Z′.sub.1,1=S′.sub.1×W′.sub.1,1×1/r scalar multiplication product output signal; Supplying a W′.sub.1,2 digital input code to a tDAC′.sub.1,2 in the second NN layer, wherein the plurality of VRi.sub.S′1 reference voltage signals is substantially the same as the REF.sub.i of the tDAC′.sub.1,2, and generating a Z′.sub.1,2=S′.sub.1×W′.sub.1,2×1/r scalar multiplication product output signal; Supplying a S′.sub.2 activation input code (D.sub.S′2) to a DAC.sub.S′2 to generate an O.sub.S′2 output signal in the second NN layer, wherein the O.sub.S′2 signal is substantially the same as the I.sub.RBN of a RBN.sub.S′2, and generating a plurality of VRi.sub.S′2 reference voltage signals; Supplying a W′.sub.2,1 digital input code to a tDAC′.sub.2,1 in the second NN layer, wherein the plurality of VRi.sub.S′2 reference voltage signals is substantially the same as the REF.sub.i of the tDAC′.sub.2,1, and generating a Z′.sub.2,1=S′.sub.2×W′.sub.2,1×1/r scalar multiplication product output signal; Supplying a W′.sub.2,2 digital input code to a tDAC′.sub.2,2 in the second NN layer, wherein the plurality of VRi.sub.S′2 reference voltage signals is substantially the same as the REF.sub.i of the tDAC′.sub.2,2, and generating a Z′.sub.2,2=S′.sub.2×W′.sub.2,2×1/r scalar multiplication product output 
signal; Summing the Z′.sub.1,1 and the Z′.sub.2,1 signals to generate a Z′.sub.1 signal (Z′.sub.1,1+Z′.sub.2,1) in the second layer of the NN; Summing the Z′.sub.1,2 and the Z′.sub.2,2 signals to generate a Z′.sub.2 signal (Z′.sub.1,2+Z′.sub.2,2) in the second layer of the NN; Subtracting a 1′ Bias-Offset signal (B′.sub.1) from the Z′.sub.1 signal to generate a S″.sub.1 intermediate activation signal, in the second layer of the NN, that is substantially equal to Z′.sub.1−B′.sub.1; and Subtracting a 2′ Bias-Offset signal (B′.sub.2) from the Z′.sub.2 signal to generate a S″.sub.2 intermediate activation signal, in the second layer of the NN, that is substantially equal to Z′.sub.2−B′.sub.2. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: performing a ReLu function on the S′.sub.1 signal to generate an S′.sub.1R intermediate activation signal substantially equal to ReLu{Z.sub.1−B.sub.1}; and performing a ReLu function on the S′.sub.2 signal to generate an S′.sub.2R intermediate activation signal substantially equal to ReLu{Z.sub.2−B.sub.2}. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: wherein the intermediate activation signal S′.sub.1R is substantially the same as the I.sub.RBN of a RBN.sub.S′1; and wherein the intermediate activation signal S′.sub.2R is substantially the same as the I.sub.RBN of a RBN.sub.S′2. 
Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: performing a batch normalization function on the S′.sub.1 signal to generate an S′.sub.1N normalized intermediate activation signal; and performing a batch normalization function on the S′.sub.2 signal to generate an S′.sub.2N normalized intermediate activation signal. Another aspect of the present disclosure is the scalar current-mode mixed-signal fully-connected neural-network method in an integrated circuit, the method further comprising: wherein the normalized intermediate activation signal S′.sub.1N is substantially the same as the I.sub.RBN of a RBN.sub.S′1; and wherein the normalized intermediate activation signal S′.sub.2N is substantially the same as the I.sub.RBN of a RBN.sub.S′2.
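As a minimal numeric sketch of the claimed batch-normalization of intermediate activation signals (the learned scale and shift terms of a full batch-norm are omitted, and the two-element "batch" is invented for illustration):

```python
import statistics

# Hedged behavioral sketch: each intermediate activation S' is recentered
# and rescaled by the batch mean and sigma to produce S'_N. Learned
# scale/shift parameters of a full batch-norm are deliberately omitted.

def batch_norm(acts):
    mu = statistics.mean(acts)
    sd = statistics.pstdev(acts) or 1.0   # guard against a zero-spread batch
    return [(a - mu) / sd for a in acts]

print(batch_norm([1.0, 3.0]))   # S'_1, S'_2 -> normalized S'_1N, S'_2N
# -> [-1.0, 1.0]
```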
(12) An aspect of the present disclosure is a programmable statistical distribution of MAC signals method in a mixed-signal fully-connected neural-network in an integrated circuit, the method comprising: Programming at least one of (1) an at least one of a plurality of multi-quadrant weight signals W, (2) an at least one of a plurality of multi-quadrant activation signals S, (3) an at least one of a plurality of multi-quadrant multiplication product signals W×S, and (4) an at least one of a plurality of multi-quadrant Multiply-Accumulate signals ΣW×S, having a signal distribution profile at key summation nodes that is at least one of (1) a uniform distribution, (2) a specific dynamic range, and (3) a Gaussian distribution with an average and a sigma.
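Claim (12) can be illustrated numerically: when the weight population is programmed to a zero-mean Gaussian profile and the activations to a uniform profile, the accumulated signal at the summation node stays bounded. The profiles, counts, and bound below are invented for illustration:

```python
import random

# Illustrative sketch of claim (12): program weight and activation
# populations to chosen statistical profiles (Gaussian weights with a
# target mean/sigma, uniform activations over a fixed dynamic range)
# so the multiply-accumulate summation node stays bounded. All the
# numeric parameters here are made up.

random.seed(0)
N = 10_000
mu, sigma = 0.0, 0.1                              # programmed weight profile
W = [random.gauss(mu, sigma) for _ in range(N)]
S = [random.uniform(0.0, 1.0) for _ in range(N)]  # uniform activation profile

mac = sum(w * s for w, s in zip(W, S))            # key summation node

# With zero-mean weights the accumulated signal stays near zero, which
# is what bounds the swing and avoids overflow at the summation node.
print(abs(mac) < 4 * sigma * (N ** 0.5))
```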
DETAILED DESCRIPTION
(13) Numerous embodiments are described in the present application; they are presented for illustrative purposes only and are not intended to be exhaustive. The embodiments were chosen and described to explain principles of operation and their practical applications. The present disclosure is not a literal description of all embodiments of the disclosure(s). The described embodiments are not, and are not intended to be, limiting in any sense. One of ordinary skill in the art will recognize that the disclosed embodiment(s) may be practiced with various modifications and alterations, such as structural, logical, and electrical modifications. For example, the present disclosure is not a listing of features which must necessarily be present in all embodiments. On the contrary, a variety of components are described to illustrate the wide variety of possible embodiments of the present disclosure(s). Although particular features of the disclosed embodiments may be described with reference to one or more particular embodiments and/or drawings, it should be understood that such features are not limited to usage in the one or more particular embodiments or drawings with reference to which they are described, unless expressly specified otherwise. The scope of the disclosure is to be defined by the claims.
(14) Although process (or method) steps may be described or claimed in a particular sequential order, such processes may be configured to work in different orders. In other words, any sequence or order of steps that may be explicitly described or claimed does not necessarily indicate a requirement that the steps be performed in that order. The steps of the processes described herein may be performed in any order possible. Further, some steps may be performed simultaneously despite being described or implied as occurring non-simultaneously (e.g., because one step is described after the other step). Moreover, the illustration of a process by its depiction in a drawing does not imply that the illustrated process is exclusive of other variations and modifications thereto, and does not imply that the illustrated process or any of its steps is necessary to the embodiment(s). In addition, although a process may be described as including a plurality of steps, that does not imply that all or any of the steps are essential or required. Various other embodiments within the scope of the described disclosure(s) include other processes that omit some or all of the described steps. In addition, although a circuit may be described as including a plurality of components, aspects, steps, qualities, characteristics, and/or features, that does not indicate that any or all of the plurality are essential or required. Various other embodiments may include other circuit elements or limitations that omit some or all of the described plurality.
(15) Be mindful that all of the circuits, blocks, and systems illustrated in this disclosure are powered by a positive power supply V.sub.DD and a negative power supply V.sub.SS, wherein V.sub.SS can be connected to the ground potential or zero volts. The terms Deep Neural Networks (DNN), Fully Connected Artificial Neural Networks (ANN), neural networks composed of a series of fully connected layers, and Dense Neural Networks (DNN) can be used interchangeably. MAC stands for Multiply-Accumulate. SRAM is static random-access memory. DRAM is dynamic random-access memory. EPROM is erasable programmable read-only memory. E.sup.2PROM is electrically erasable PROM. CIM stands for Compute-In-Memory, IMC stands for In-Memory-Compute, CNM stands for Compute-Near-Memory, and NMC stands for Near-Memory-Compute. The term PSR stands for power supply desensitization circuit. MSB is Most-Significant-Bit, LSB is Least-Significant-Bit, and LSP is Least-Significant-Portion (e.g., a digital word not including its MSB). FET is Field-Effect-Transistor; MOS is Metal-Oxide-Semiconductor; MOSFET is MOS FET; PMOS is P-channel or P-type MOS; NMOS is N-channel or N-type MOS; BiCMOS is Bipolar CMOS. The term BJT is Bipolar-Junction Transistor. The terms 'port', 'terminal', and 'node' are used interchangeably throughout this disclosure, as are the terms 'power supply voltage' and 'supply voltage'. Throughout this disclosure, the body terminal of an NMOSFET can be connected to its source terminal or to V.sub.SS, and the body terminal of a PMOSFET can be connected to its source terminal or to V.sub.DD. The term VGS or V.sub.gs is the gate-to-source voltage of a MOSFET. The term VDS is the drain-to-source voltage of a MOSFET. The term IDS or ID is the drain current of a MOSFET (e.g., I.sub.M1, I.sub.dM1, or I.sub.DM1 denote the drain current of M.sub.1, which is a MOSFET). 
The term V.sub.BE or v.sub.BE is the base-to-emitter voltage of a BJT. The term I.sub.C is the collector current of a BJT, and I.sub.E is the emitter current of a BJT (e.g., I.sub.Q, I.sub.cQ1, I.sub.cq1, I.sub.CEq1, or I.sub.CEQ1 denote a current of Q.sub.1, which is a BJT). The terms I.sub.N, I.sub.P, and I.sub.M denote the drain current of a p-channel or n-channel MOSFET. Channel width over channel length is W/L, which is the size of a MOSFET. This disclosure utilizes BJTs or subthreshold MOS transistors (T) whose input-voltage (v.sub.I) to output-current (i.sub.O) transfer function approximately follows an exponential profile.
(16) Keep in mind that for descriptive clarity, illustrations of this disclosure may be simplified, and their improvements beyond simple illustrations would be obvious to one skilled in the art. For example, it would be obvious for one skilled in the art that MOSFET current sources can be cascoded for higher output impedance and lower sensitivity to power supply variations, whereas throughout this disclosure current sources may be depicted with a single MOSFET for clarity of illustration. It would also be obvious to one skilled in the art that a circuit schematic illustrated in this disclosure may be arranged with NMOS transistors or arranged in a complementary version utilizing transistors such as PMOS.
(17) The MOSFETs that operate in the subthreshold region follow an approximate exponential v.sub.I to i.sub.O transfer function that can be represented as follows:
(18) i.sub.D≈I.sub.DO·(W/L)·e^[(v.sub.GS−V.sub.TH)/(n·V.sub.t)]
or
(19) v.sub.GS≈V.sub.TH+n·V.sub.t·ln[i.sub.D/(I.sub.DO·(W/L))]
where for a MOSFET, V.sub.TH is the threshold voltage, v.sub.GS is the voltage between the gate terminal and the source terminal, i.sub.D is the current through the drain terminal,
(20) W/L
is the channel-width over channel-length ratio, V.sub.t is the thermal voltage, n is the slope factor, and I.sub.DO is the characteristic current when v.sub.GS≈V.sub.TH. Note that in the case of a MOSFET operating in subthreshold, v.sub.I corresponds to v.sub.GS and i.sub.O corresponds to i.sub.D or i.sub.DS. Moreover, note that for two equally sized and same-type subthreshold MOSFETs the approximate relationship
(21) i.sub.D1/i.sub.D2≈e^[(v.sub.GS1−v.sub.GS2)/(n·V.sub.t)]
holds, wherein v.sub.GS1 and v.sub.GS2 are the first and second MOSFET's v.sub.GS or v.sub.I values, and i.sub.D1 and i.sub.D2 are the first and second MOSFET's i.sub.D or i.sub.O values. Note that throughout this disclosure, MOSFETs that operate in subthreshold (which are utilized as the core four MOSFETs in current subthreshold MOSFET-based multipliers) have equal W/L, unless otherwise specified.
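The subthreshold relations (18)–(21) can be checked numerically. The bias points and the parameters I.sub.DO, n, and V.sub.TH below are illustrative placeholders, not values from the disclosure:

```python
import math

# Numeric check of the subthreshold relations (18)-(21): with the
# exponential i_D model, the ratio of two equally sized devices' drain
# currents depends only on their gate-source voltage difference.
# I_DO, n, V_TH, and the bias voltages are illustrative placeholders.

VT = 0.026      # thermal voltage V_t at room temperature (V)
n = 1.5         # slope factor
I_DO = 1e-7     # characteristic current at v_GS ~ V_TH (A)
V_TH = 0.5      # threshold voltage (V)

def i_d(v_gs, w_over_l=1.0):
    # equation (18): i_D ~ I_DO * (W/L) * exp((v_GS - V_TH) / (n * V_t))
    return I_DO * w_over_l * math.exp((v_gs - V_TH) / (n * VT))

v1, v2 = 0.40, 0.35
ratio = i_d(v1) / i_d(v2)
# equation (21): the ratio depends only on v_GS1 - v_GS2
print(math.isclose(ratio, math.exp((v1 - v2) / (n * VT))))
```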
(22) A bipolar-junction-transistor (BJT) follows an approximate exponential v.sub.I to i.sub.O transfer function that can be represented as follows:
(23) i.sub.E≈I.sub.ES·e^(v.sub.BE/V.sub.t)
or
(24) v.sub.BE≈V.sub.t·ln(i.sub.E/I.sub.ES)
where for a BJT, i.sub.E is the emitter current, v.sub.BE is the base-emitter voltage, V.sub.t is the thermal voltage, and I.sub.ES is the reverse saturation current of the base-emitter diode. In the case of a BJT, v.sub.I corresponds to v.sub.BE and i.sub.O corresponds to i.sub.E or i.sub.C. Moreover, keep in mind that for two same-type BJTs with equally sized emitter areas the relationship
(25) i.sub.E1/i.sub.E2≈e^[(v.sub.BE1−v.sub.BE2)/V.sub.t]
holds, where v.sub.BE1 and v.sub.BE2 are the first and second BJT's v.sub.BE or v.sub.I values, and i.sub.E1 and i.sub.E2 are the first and second BJT's i.sub.E or i.sub.O values. Be mindful that throughout this disclosure, parasitic vertical substrate BJTs (which are utilized as the core four BJTs in current multipliers) have equal emitter areas, unless otherwise specified.
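The same exponential law is what makes the four-device current multiplier of the claims work: imposing the voltage-loop constraint V.sub.S−V.sub.R=V.sub.Z−V.sub.W on exponential devices forces I.sub.W·I.sub.S=I.sub.R·I.sub.Z. A numeric sketch with invented currents:

```python
import math

# Translinear-principle sketch behind the four-BJT multiplier: if
# v_BE(S) - v_BE(R) = v_BE(Z) - v_BE(W) and every device obeys
# i_E = I_ES * exp(v_BE / V_t), then I_W * I_S = I_R * I_Z, i.e. the
# Z output is the product I_W * I_S normalized by I_R. Values invented.

VT, I_ES = 0.026, 1e-14

def v_be(i_e):
    # invert equation (23): v_BE = V_t * ln(i_E / I_ES)
    return VT * math.log(i_e / I_ES)

I_W, I_S, I_R = 2e-6, 3e-6, 1e-6
# solve the voltage-loop constraint for the Z-device current
v_z = v_be(I_W) + (v_be(I_S) - v_be(I_R))
I_Z = I_ES * math.exp(v_z / VT)

print(math.isclose(I_Z, I_W * I_S / I_R))
```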
(26) Keep in mind that other manufacturing technologies, such as FINFET, Bipolar, BiCMOS, junction field effect transistors (JFET), Gallium Arsenide, Gallium Nitride, Dielectric Isolation, Silicon on Sapphire, Silicon on Oxide, and others can utilize this disclosure in whole or part.
(27) The illustrated circuit schematics of the embodiments described in the forthcoming sections may have some of the following benefits, which are outlined here to avoid repetition in each section, in the interest of clarity and brevity:
(28) First, because voltage swings are small in current-mode signal processing (all else being equal), the disclosed mixed-signal current-mode circuits are inherently faster than their voltage-mode counterparts. As such, the disclosed circuits provide a trade-off between running at moderate speeds and operating with low currents to save on power consumption. Moreover, smaller voltage swings along the current-mode signal paths enable operation at lower V.sub.DD and V.sub.SS levels, which helps lower power consumption. Also, because voltage swings are small in current-mode signal processing, the disclosed circuits enable internal analog signals to span between full-scale and zero-scale (i.e., full-scale dynamic range) while being less restricted by the power supply voltage V.sub.DD and V.sub.SS levels, as compared with voltage-mode signal processing.
(29) Second, the disclosed mixed-signal current-mode circuit designs (such as the weight data-converters) can be arranged on a silicon die right next to memory such as SRAM to facilitate Compute-In-Memory (CIM) or Near-Memory-Compute (NMC) operation. Such an arrangement can lower the overall dynamic power consumption associated with read/write cycles into and out of memory.
(30) Third, mixed-mode and analog MACs generally require a plurality of ADCs and memory (e.g., SRAM), as well as DACs, to store and process intermediate activation signals through the hidden layers of a neural network. The disclosed mixed-signal current-mode circuit designs eliminate the need for intermediate ADCs. They also avoid the need for the respective memory or register functions that store intermediate or partial digital summations, as well as the subsequent respective intermediate Digital-to-Analog Converters (DAC). This trait has the additional benefit of saving digital memory area and reducing the dynamic power consumption associated with read/write cycles into and out of memory for intermediate activation signals.
(31) Fourth, the disclosed mixed-signal circuit designs operating in current-mode facilitate simple, low-cost, and fast summation and/or subtraction functions. For example, summation of a plurality of analog currents can be accomplished by simply coupling the current signals' wires. Depending on accuracy and speed requirements, subtraction of analog current signals can be accomplished by utilizing a current mirror, with the two analog current signals applied to opposite sides of the current mirror, for example.
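The fourth point reduces to Kirchhoff's current law plus a mirror. A deliberately simple behavioral model (ideal unit-gain mirror, no headroom limits, invented current values):

```python
# Behavioral sketch of current-mode summation and subtraction: summation
# is Kirchhoff's current law on a shared wire, and subtraction can use a
# current mirror that sinks one signal from the other. The mirror here is
# ideal (unit gain, no non-idealities); currents are invented examples.

def kcl_sum(*currents):
    return sum(currents)            # coupling wires adds the currents

def mirror_subtract(i_plus, i_minus):
    # i_minus is mirrored and pulled from the i_plus node
    return i_plus - i_minus

node = kcl_sum(1.2e-6, 0.3e-6, 0.5e-6)
print(mirror_subtract(node, 0.4e-6))
```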
(32) Fifth, the majority of the disclosed mixed-signal current-mode circuits can operate with low power supply voltages, since their operating headroom is generally limited to a FET's V.sub.GS+V.sub.DS. Moreover, operating some of the MOSFETs of the disclosed circuits in the subthreshold region enables the disclosed mixed-signal current-mode circuits to operate with ultra-low currents, even lower supply voltage levels, and ultra-low power consumption.
(33) Sixth, for some data-sets of some machine learning applications, the population and/or combinations of activation and weight signals can be programmable, and/or their combination can be trained to a specific distribution and dynamic range at key summation nodes (e.g., a Gaussian distribution with an average and a sigma). Some of the disclosed circuits are arranged for such statistical distribution profiles of summation signals so that, for example, a MAC summation node's signal swings follow a more defined and bounded profile along the signal processing paths, which avoids over-flow conditions, can reduce power consumption, and improves the speed attributed to such signal paths and summation nodes/wires.
(34) Seventh, some of the disclosed mixed-signal current-mode circuits can perform 'scalar multiplications' by arranging iMULT and iMAC in neural networks such that a reference bias network (RBN) distributes an activation signal (scalar). As such, a scalar/activation signal can be multiplied with a plurality of weight signals (arranged with tiny multiplying data-converters) without being constrained by subthreshold operation, thereby enabling full-scale signal swings at summation nodes while reducing power consumption and silicon die size. Also, some of the other disclosed mixed-signal current-mode circuits can perform 'scalar multiplications' by utilizing MOSFETs that operate in subthreshold. In such a subthreshold MOSFET-based scalar multiplier, the activation signal (scalar) circuitry and the reference signal circuitry are shared with a plurality of weight input circuits and product output circuits, which saves silicon area, reduces current consumption, and improves matching between product outputs. Furthermore, some of the other disclosed mixed-signal current-mode circuits can perform 'scalar multiplications' by arranging parasitic substrate vertical BJTs (in standard CMOS) that are not constrained by subthreshold operation. Such parasitic substrate vBJT-based multipliers enable full-scale signal swings at summation nodes, wherein the activation signal (scalar) and reference signal circuitry are shared with a plurality of weight input circuits and product output circuits. Such arrangements also save silicon area, reduce current consumption, and improve matching between product outputs, in an area- and current-efficient manner.
(35) Eighth, the disclosed mixed-signal current-mode circuits require neither capacitors nor resistors, which reduces die size and die cost, and facilitates their fabrication in main-stream and readily available standard digital CMOS that is low cost and suitable for rugged, high-quality, high-volume mass production.
(36) Ninth, the disclosed mixed-signal current-mode circuits are free of clocks and suitable for asynchronous (clock-free) computation. As such, there is no clock-related noise on the power supplies and no dynamic power consumption due to digital logic.
(37) Tenth, some of the disclosed mixed-signal current-mode circuits are arranged in a symmetric, matched, and scaled manner. This trait helps device parameters track each other over process, temperature, and operating-condition variations. Accordingly, the disclosed circuits' temperature coefficient and power supply rejection performance can be enhanced.
(38) Eleventh, the disclosed mixed-signal current-mode circuits enable 'always-on' operation wherein a meaningful portion of the computation circuitry shuts itself off (i.e., 'smart self-power-down') in the face of no incoming signal. For example, when a zero (scalar) activation input signal is supplied to an RBN (in a multiplying DAC arrangement), or when a zero-weight signal is supplied to the input of a subthreshold CMOS multiplier or a vBJT multiplier, the output of the respective multiplier generates zero current and consumes zero power.
(39) Twelfth, digital computation is generally accurate, but it may be excessively power hungry. The current-mode analog and mixed-signal computation disclosed here leverages the trade-off in analog signal processing between low power and analog accuracy, in the form of graceful signal degradation rather than total failure. This trait can provide the AI and ML end-application with approximate results to work with instead of failed results.
(40) Thirteenth, to reduce sensitivity to power supply variations, the disclosed mixed-signal current-mode circuits can be arranged with power supply desensitization circuits that regulate a central reference current or an RBN output signal before supplying the raw reference signal to the weight data-converters. Such an arrangement keeps the weight data-converters small, which saves silicon die area. Alternatively, cascoding the current sources of the weight data-converters can help increase output impedance and reduce the sensitivity of the data-converters' output currents to power supply variations.
(41) Fourteenth, the disclosed MAC ICs operating in current-mode (having smaller voltage swings along the signal computation path) are less subject to analog signal overflow at summation nodes, as compared to voltage-mode circuits wherein the summation of MAC voltages is generally bounded by the power supply voltage V.sub.DD and V.sub.SS levels.
(42) Fifteenth, the disclosed mixed-signal voltage-mode or current-mode circuit designs can be manufactured with low-cost, standard, and conventional Complementary-Metal-Oxide-Semiconductor (CMOS) fabrication, which is more mature, readily available, and process-node portable than 'bleeding-edge' technologies, thereby facilitating embodiments of ICs having relatively more rugged reliability, multi-source manufacturing flexibility, and lower manufacturing cost.
(43) Sixteenth, the performance of some of the disclosed mixed-signal current-mode circuit embodiments can be arranged to be independent of resistor and capacitor values and their normal manufacturing variations. As such, manufactured dies can perform to specifications mostly independent of passive resistor or capacitor values and their respective manufacturing variations, which could otherwise reduce die yield and increase cost.
(44) Seventeenth, the disclosed MAC ICs operate in current-mode wherein the digital weight signals are converted to analog currents utilizing the RBN circuit, which saves substantial die area. The RBN eliminates the need to construct binary-weighted references from individually scaled current sources. The RBN feeds a group of binary weighted currents onto a respective group of equally 1X-scaled (diode-connected) FETs, which generates a 'bias voltage bus', i.e., a plurality of reference voltage signals. The 'bias voltage bus' can then be supplied to a sea of tiny data-converters, whose tiny reference networks can be programmed with binary weights (without being binarily scaled) by tapping into the equally 1X-scaled diode-connected FETs of the RBN (bias voltage bus). Accordingly, scalar multiplications can be performed while saving silicon die area and improving the dynamic response of the (sea of tiny) data-converters, whose binary weighted current reference networks are comprised of tiny, equally sized 1X current switches instead of large binary-sized current switches. Such an arrangement also saves substantial die area and power consumption, enhances speed, and improves matching between scalar multiplication (product) outputs.
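The RBN plus tiny-DAC arrangement can be modeled behaviorally as follows; the 4-bit width, the currents, and the weight code are invented, and the model ignores switch and headroom non-idealities:

```python
# Behavioral model of the RBN + tiny-DAC arrangement: the RBN turns one
# activation current I_RBN into binary-weighted currents held on a shared
# 'bias voltage bus'; each tiny weight DAC taps that bus with equally
# sized 1X switches, so its output is I_RBN scaled by its weight code.
# Bit width, currents, and code are invented for illustration.

BITS = 4

def rbn_bus(i_rbn):
    # binary weighted currents, MSB first: I/2, I/4, ..., I/2^BITS
    return [i_rbn / (2 ** (b + 1)) for b in range(BITS)]

def tiny_dac(bus, code):
    # code is an unsigned BITS-wide weight word; each set bit taps one
    # equally sized branch of the bus (no binary-scaled switches needed)
    return sum(i for b, i in enumerate(bus) if (code >> (BITS - 1 - b)) & 1)

bus = rbn_bus(16e-6)                 # one activation current, shared bus
print(tiny_dac(bus, 0b1010))         # scalar product: 16e-6 * 10/16
```

Because the same bus serves every tiny DAC, one activation current is multiplied by many weight codes in parallel, which is the claimed scalar multiplication.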
(45) Eighteenth, some of the disclosed mixed-signal current-mode circuits can perform the ReLu function by supplying summation signals across a circuit that functions like a diode, which saves silicon area and improves speed. For example, in some of the disclosed circuits, the activation inputs to the scalar multiplication circuits are diode-connected (e.g., the reference input port of RBN circuits, the input port of subthreshold MOSFET-based current-mode multipliers, and the input port of vBJT-based current-mode multipliers). Moreover, a current-mode data-converter can supply a B signal as a Bias-Offset current term (Z.sub.O=ΣS.sub.iW.sub.i+B) onto the same diode-connected summation port. Thus, the functions of current-mode summation, ReLu, and Bias-Offset insertion can be performed in an area-efficient manner, avoiding complex circuitry.
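The diode-style ReLu with bias-offset insertion amounts to clamping the net summed current at zero. A behavioral sketch with invented values:

```python
# Sketch of the diode-style ReLu: the summed MAC currents and a
# bias-offset current meet at a diode-connected node, which conducts
# current of only one polarity, i.e. S' = max(0, sum(S_i*W_i) - B).
# The products and bias below are invented example values.

def relu_node(products, bias):
    net = sum(products) - bias      # currents summed on the diode node
    return max(0.0, net)            # the diode conducts one polarity only

print(relu_node([2.0, -1.5, 0.25], 1.0))   # net negative -> 0.0
print(relu_node([2.0, 1.5, 0.25], 1.0))    # net positive -> 2.75
```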
(46) Nineteenth, the operating current of the disclosed mixed-signal current-mode circuits for 'approximate computation' can be dynamically tuned at the package level in accordance with the end-application's power-speed-accuracy requirements. For example, applications requiring ultra-low power consumption can program the operating currents, speed, and accuracy of the MACs to lower levels within certain limits, and vice versa.
Section 1—Description of FIG. 1
(47)
(48) [S′.sub.1 S′.sub.2 S′.sub.3]=ReLu{[S.sub.1 S.sub.2 S.sub.3]×[w.sub.1,1 w.sub.1,2 w.sub.1,3; w.sub.2,1 w.sub.2,2 w.sub.2,3; w.sub.3,1 w.sub.3,2 w.sub.3,3]−[B.sub.1 B.sub.2 B.sub.3]}
(49) In other words:
S′.sub.1=ReLu{S.sub.1×w.sub.1,1+S.sub.2×w.sub.2,1+S.sub.3×w.sub.3,1−B.sub.1}
S′.sub.2=ReLu{S.sub.1×w.sub.1,2+S.sub.2×w.sub.2,2+S.sub.3×w.sub.3,2−B.sub.2}
S′.sub.3=ReLu{S.sub.1×w.sub.1,3+S.sub.2×w.sub.2,3+S.sub.3×w.sub.3,3−B.sub.3}
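The three equations above are one matrix operation, S′=ReLu{S×W−B}. A small numeric sketch with invented values:

```python
# The first-layer equations written as one matrix operation:
# S' = ReLu(S @ W - B), with S a 1x3 activation row vector, W a 3x3
# weight matrix (w_{i,j}), and B a 1x3 bias-offset vector.
# All numeric values below are invented for illustration.

def relu(x):
    return max(0.0, x)

def layer(S, W, B):
    return [relu(sum(S[i] * W[i][j] for i in range(3)) - B[j])
            for j in range(3)]

S = [1.0, 2.0, 3.0]
W = [[0.5, -1.0, 0.0],
     [0.25, 0.5, 1.0],
     [-0.5, 0.0, 0.5]]
B = [0.0, 0.5, 1.0]
print(layer(S, W, B))   # -> [0.0, 0.0, 2.5]
```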
(50) Similarly, matrix multiplication attributed to next layer can be represented as follows:
(51)
(52) [S″.sub.1 S″.sub.2 S″.sub.3]=ReLu{[S′.sub.1 S′.sub.2 S′.sub.3]×[w′.sub.1,1 w′.sub.1,2 w′.sub.1,3; w′.sub.2,1 w′.sub.2,2 w′.sub.2,3; w′.sub.3,1 w′.sub.3,2 w′.sub.3,3]−[B′.sub.1 B′.sub.2 B′.sub.3]}
(53) In other words:
S″.sub.1=ReLu{S′.sub.1×w′.sub.1,1+S′.sub.2×w′.sub.2,1+S′.sub.3×w′.sub.3,1−B′.sub.1}
S″.sub.2=ReLu{S′.sub.1×w′.sub.1,2+S′.sub.2×w′.sub.2,2+S′.sub.3×w′.sub.3,2−B′.sub.2}
S″.sub.3=ReLu{S′.sub.1×w′.sub.1,3+S′.sub.2×w′.sub.2,3+S′.sub.3×w′.sub.3,3−B′.sub.3}
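Chaining the layers shows why no intermediate data-converters or activation memory are needed: the first layer's outputs feed the second layer directly, S″=ReLu{ReLu{S×W−B}×W′−B′}. A numeric sketch with invented values:

```python
# Two fully connected layers chained back to back: the intermediate
# activations S' pass straight into the second layer, mirroring the
# disclosed analog signal path that needs no intermediate ADC/DAC or
# activation memory. All numeric values are invented for illustration.

def layer(S, W, B):
    return [max(0.0, sum(s * W[i][j] for i, s in enumerate(S)) - B[j])
            for j in range(len(B))]

S  = [1.0, 0.5, 2.0]
W1 = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 0.0]]
B1 = [0.5, 0.0, 0.0]
W2 = [[1.0, -1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
B2 = [0.0, 0.0, 1.0]

Sp = layer(S, W1, B1)        # S'  = ReLu(S @ W1 - B1)
print(layer(Sp, W2, B2))     # S'' = ReLu(S' @ W2 - B2) -> [2.25, 0.0, 1.25]
```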
(54) In the forthcoming sections, a series of disclosed circuits are arranged in analog and mixed-mode to perform the scalar multiplication function (for neural networks composed of fully connected layers, to represent the matrix multiplication function), offering combined benefits of silicon die area savings, enhanced matching, faster speed, and lower power consumption.
Section 2—Description of FIG. 2
(55)
(56) Free from intermediate Analog-to-Digital Converters (ADCs), intermediate activation memory, and intermediate activation Digital-to-Analog Converters (DACs), the disclosed simplified embodiment of FIG. 2 performs fully connected neural-network operations asynchronously in analog current mode.
(57) For clarity of illustration and description of FIG. 2, a simplified neural network with two fully connected layers, two activation signals, and two weight signals per activation is depicted.
(58) It would be obvious to one skilled in the art that variations of neural networks with more layers, more activation signals, and more weight signals can be arranged utilizing the scalar current-mode mixed-signal fully-connected neural-network method disclosed herein. Please also note that weight code signals can be stored in local memory (e.g., SRAM) right next to the weight DACs, which facilitates CIM and saves on the dynamic power consumption associated with loading the neural network with weight training codes.
(59) An activation signal code S.sub.D1 programs DAC.sub.S1 to generate a S.sub.1 current output signal that is fed onto RBN.sub.S1. In RBN.sub.S1 of FIG. 2, the S.sub.1 current signal generates a 'bias voltage bus' (a plurality of reference voltage signals, e.g., V.sub.1S1, V.sub.2S1, V.sub.4S1) that biases the reference networks of its corresponding multiplying tiny weight DACs (e.g., DAC.sub.1,1; DAC.sub.2,1).
(60) Notice that the 'bias voltage bus', which is a plurality of reference voltage signals, can be shared with a sea of multiplying tiny weight DACs, and as such the disclosed scalar current-mode mixed-signal fully-connected neural-network method can generate a sea of scalar multiplication products in a manner that is scalable and area efficient. Moreover, because the multiplying tiny DACs' reference networks are comprised of equally sized 1X FETs, they are fast and generate less glitch. Moreover, a multiplying tiny DAC's output port is fast because it carries significantly less parasitic switch capacitance when compared to conventional DACs, which rely on large binarily sized current switches for their reference networks.
(61) Also, notice that in the face of zero signals (e.g., a zero activation signal and/or a zero weight signal), the scalar current-mode mixed-signal fully-connected neural-network method facilitates an inherent 'self power-down' (e.g., if either the S.sub.1 or the W.sub.1,1 signal is zero, then the output current and thus the power consumption of DAC.sub.1,1 are zero).
(62) Similar to the S.sub.1 signal path, an activation signal code S.sub.D2 programs DAC.sub.S2 to generate a S.sub.2 current output signal that is fed onto RBN.sub.S2. In RBN.sub.S2 of FIG. 2, the S.sub.2 current signal generates a 'bias voltage bus' that biases the reference networks of its corresponding multiplying tiny weight DACs (e.g., DAC.sub.1,2; DAC.sub.2,2).
(63) Respective scalar multiplication product current signals are coupled together to form accumulation current signals (e.g., Z.sub.1=S.sub.1×W.sub.1,1+S.sub.2×W.sub.1,2; Z.sub.2=S.sub.1×W.sub.2,1+S.sub.2×W.sub.2,2; etc.) in the first layer of the neural network of FIG. 2.
(64) The respective accumulation current signals (e.g., Z.sub.1, Z.sub.2, etc.) are then summed with their corresponding 'offset bias' current signals (e.g., B.sub.1, B.sub.2, etc., which can also be programmed with an offset-bias current-mode DAC). Next, a resultant 'biased accumulation' current signal (e.g., Z.sub.1−B.sub.1, Z.sub.2−B.sub.2, etc.) is inputted to a corresponding input port of the RBNs (e.g., RBN.sub.S′1, RBN.sub.S′2, etc.) of the next layer of the current-mode neural network, wherein the RBN's input port (e.g., diode connected P.sub.S′1, P.sub.S′2, etc.) can be arranged to perform a ReLu function.
(65) If an accumulation current signal (e.g., Z.sub.1, Z.sub.2, etc.,) is smaller than its respective offset bias current signal (e.g., B.sub.1, B.sub.2, etc.,), then the diode connected FET remains off and zero signal is processed through that RBN.
(66) For example, if Z.sub.1=S.sub.1×W.sub.1,1+S.sub.2×W.sub.1,2<B.sub.1, then FET P.sub.S′1 is off, zero current is passed through RBN.sub.S′1, and thus zero volts are generated at the 'bias voltage bus' of RBN.sub.S′1 (e.g., V.sub.1S′1=V.sub.2S′1=V.sub.4S′1=0, etc.), which subsequently biases the reference current networks of its corresponding tiny DACs (e.g., DAC′.sub.1,1; DAC′.sub.2,1; etc.) to zero.
(67) Similarly, if Z.sub.2=S.sub.1×W.sub.2,1+S.sub.2×W.sub.2,2<B.sub.2, then FET P.sub.S′2 is off, zero current is passed through RBN.sub.S′2, and thus zero volts are generated at the 'bias voltage bus' of RBN.sub.S′2 (e.g., V.sub.1S′2=V.sub.2S′2=V.sub.4S′2=0, etc.), which subsequently biases the reference current networks of its corresponding tiny DACs (e.g., DAC′.sub.1,2; DAC′.sub.2,2; etc.) to zero.
(68) Conversely, if an accumulation current signal (e.g., Z.sub.1, Z.sub.2, etc.) is greater than its respective offset bias current signal (e.g., B.sub.1, B.sub.2, etc.), then the diode connected RBN's FET (e.g., P.sub.S′1, P.sub.S′2, etc.) conducts a difference signal (e.g., Z.sub.1−B.sub.1, Z.sub.2−B.sub.2, etc.), which is the 'biased accumulation' current signal.
(69) For example, if Z.sub.1=S.sub.1×W.sub.1,1+S.sub.2×W.sub.1,2>B.sub.1, then FET P.sub.S′1 is on, and the Z.sub.1−B.sub.1 (or S.sub.1×W.sub.1,1+S.sub.2×W.sub.1,2−B.sub.1) current is passed through RBN.sub.S′1, which generates binary weighted currents that in turn program the 'bias voltage bus' (a plurality of reference voltage signals) of RBN.sub.S′1 (e.g., V.sub.1S′1; V.sub.2S′1; V.sub.4S′1; etc.) that subsequently biases the reference current networks of a group of corresponding multiplying current-mode tiny DACs (e.g., DAC′.sub.1,1; DAC′.sub.2,1; etc.). Similarly, if Z.sub.2=S.sub.1×W.sub.2,1+S.sub.2×W.sub.2,2>B.sub.2, then FET P.sub.S′2 is on, and the Z.sub.2−B.sub.2 (or S.sub.1×W.sub.2,1+S.sub.2×W.sub.2,2−B.sub.2) current is passed through RBN.sub.S′2, which generates binary weighted currents that in turn program the 'bias voltage bus' of RBN.sub.S′2 (e.g., V.sub.1S′2; V.sub.2S′2; V.sub.4S′2; etc.) that subsequently biases the reference current networks of a group of corresponding multiplying current-mode tiny DACs (e.g., DAC′.sub.1,2; DAC′.sub.2,2; etc.).
(70) Accordingly, S′.sub.1=ReLu{Z.sub.1−B.sub.1}, and S′.sub.2=ReLu{Z.sub.2−B.sub.2}.
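The diode-input ReLu behavior described in the paragraphs above can be summarized with a tiny numerical model (a hypothetical sketch; the current values are illustrative, not from the disclosure):

```python
def rbn_input_current(Z, B):
    """Model of the diode-connected FET at an RBN input port.

    If the accumulation current Z is below the offset-bias current B,
    the diode FET stays off: zero current flows, the 'bias voltage bus'
    collapses to zero, and the downstream tiny DACs are biased off
    (inherent self power-down). Otherwise the 'biased accumulation'
    difference Z - B is conducted, i.e., S' = ReLu{Z - B}.
    """
    return max(Z - B, 0.0)

# Z below the bias: diode off, zero signal processed through the RBN
off_case = rbn_input_current(Z=2.0e-6, B=5.0e-6)
# Z above the bias: diode conducts the biased accumulation Z - B
on_case = rbn_input_current(Z=8.0e-6, B=5.0e-6)
```

The off-branch is what gives the circuit its 'self power-off' trait: a zero output current also means zero power drawn by the dependent DAC reference networks.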
(71) Again, notice that the intermediate activation signals S′.sub.1 and S′.sub.2 are not digitized by ADCs and require no register/memory; instead, they are inputted as scalar activation signals in analog current mode onto the next layer, which saves on silicon area, reduces power consumption, and enhances signal processing time.
(72) In the second layer of the fully connected neural network of FIG. 2, the intermediate activation signals S′.sub.1 and S′.sub.2 similarly program their corresponding RBNs, whose 'bias voltage buses' bias a second group of multiplying tiny weight DACs (e.g., DAC′.sub.1,1; DAC′.sub.2,1; DAC′.sub.1,2; DAC′.sub.2,2) to generate the respective scalar multiplication products.
(73) Coupling the respective scalar multiplication products would generate Z′.sub.1=S′.sub.1×W′.sub.1,1+S′.sub.2×W′.sub.1,2, and Z′.sub.2=S′.sub.1×W′.sub.2,1+S′.sub.2×W′.sub.2,2 which are offset biased by B′.sub.1 and B′.sub.2 to generate S″.sub.1=ReLu{Z′.sub.1−B′.sub.1} and S″.sub.2=ReLu{Z′.sub.2−B′.sub.2} when inputted to the diode connected input port of the RBN of the next layer of neural network.
(74) It would be obvious to one skilled in the art that other functions, such as a normalization DAC to scale and normalize accumulation current signals, can be utilized along the summing-node signal paths, for example. Moreover, it would be obvious to one skilled in the art to improve performance by adding such functions as a current buffer to provide capacitive buffering at summation nodes, by arranging ReLu functions with circuits that have a sharper turn on-off edge (instead of the single diode connected FET at the input of the RBNs, for example), and/or by utilizing current clamp circuits to control the maximum amplitude of activation signals at intermediate layers, if required by some applications.
(75) Please refer to this disclosure's introduction section titled DETAILED DESCRIPTION which outlines some of the benefits relevant to the embodiment disclosed in the section above.
Section 3—Description of FIG. 3
(76)
(77) The disclosed simplified circuit illustrated in FIG. 3 performs a multi-quadrant scalar multiply-accumulate (MAC) operation together with Bias-Offset and ReLu functions in analog current mode.
(78) In the simplified embodiment illustrated in FIG. 3, an activation DAC (sDAC), a weight DAC (wDAC), a bias-offset DAC (bDAC), and an ADD+ReLu functional block are arranged to generate an intermediate activation signal for a next layer of the neural network.
(79) For clarity of illustration, the sDAC of the simplified embodiment of FIG. 3 is programmed by an activation digital signal in the form of a sign-magnitude 4-bit wide word, comprised of a MSB sign bit (S.sub.MSB) and the rest of the bits as a S.sub.LSP (Least-Significant-Portion of the activation word), to generate an activation current signal S.
(80) Note that other digital word formats, such as two's complement, and other bit-widths (e.g., greater than 4-bit wide) can be arranged for the activation and weight digital signal words.
(81) Next, the S signal is inputted onto the RBN circuit, which generates a 'bias voltage bus' (a plurality of reference voltage signals) comprising V.sub.1S, V.sub.2S, and V.sub.4S. The 'bias voltage bus' is then supplied onto the wDAC, whose output (S×W=s.Math.w) is proportional to the S signal (via the 'bias voltage bus') and in accordance with a weight digital signal, also in the form of a sign-magnitude 4-bit wide digital word comprised of a MSB weight bit W.sub.MSB and the rest of the bits as a W.sub.LSP (Least-Significant-Portion of the weight word), that programs the wDAC.
(82) Bear in mind that the 'bias voltage bus' (a plurality of reference voltage signals comprising V.sub.1S, V.sub.2S, and V.sub.4S and representing the S signal, e.g., the activation signal as a scalar multiplier) can be shared amongst the reference networks of a plurality of weight DACs to perform a plurality of scalar multiplications. Such an arrangement can save on silicon area, improve dynamic response, improve matching between multiplication products, and be utilized in an arrangement similar to the depiction in the simplified embodiment of FIG. 2.
(83) In accordance with operating an XNOR function on S.sub.MSB and W.sub.MSB, the s.Math.w signal is passed through either a N+ or a N− transistor to generate a multi-quadrant multiplication product current signal (±s.Math.w), which is fed onto a current mirror comprising an amplifier (A), a P+ or a P−, and a P.sub.O transistor to facilitate the multi-quadrant MAC and ReLu operations, as part of an ADD+ReLu functional block illustrated in the simplified embodiment of FIG. 3.
(84) A bias-offset Digital-to-Analog Converter (bDAC) injects a bias-offset current (b) that is proportional with a bias reference current signal (IRB) and in accordance with a bias digital word signal (B).
(85) If ±s.Math.w<b, then the P.sub.O transistor shuts off and its current signal trends towards zero, which is fed as an intermediate activation signal onto the next layer (e.g., S′=0). If ±s.Math.w>b, then the P.sub.O transistor current signal trends towards ±s.Math.w−b, which is then passed onto the next RBN in the next layer of the neural network as S′=ReLu{±s.Math.w−b}.
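The sign-magnitude and XNOR handling described above can be sketched behaviorally as follows (an illustrative Python model; the bit convention shown, with 1 denoting a positive sign, is an assumption for illustration only):

```python
def xnor(a, b):
    # XNOR of two sign bits: true when the signs agree (positive product)
    return a == b

def add_relu(s_msb, s_lsp, w_msb, w_lsp, b):
    """Behavioral model of the FIG.-3-style signal chain:
    sign-magnitude multiply, sign steering by XNOR of the MSBs,
    bias-offset subtraction, then ReLu: S' = ReLu{+/- s*w - b}."""
    magnitude = s_lsp * w_lsp                        # |s| * |w| from sDAC/wDAC
    sw = magnitude if xnor(s_msb, w_msb) else -magnitude
    return max(sw - b, 0)                            # P_O mirror clamps at zero

pos = add_relu(s_msb=1, s_lsp=3, w_msb=1, w_lsp=5, b=7)   # +15 - 7
neg = add_relu(s_msb=1, s_lsp=3, w_msb=0, w_lsp=5, b=7)   # -15 - 7, clamped
```

When the signs disagree, the negative product falls below the bias and the output clamps to zero, mirroring the P.sub.O shut-off case.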
(86) Also keep in mind that for performing a multi-quadrant scalar MAC operation, a plurality of ±s.Math.w signals can share the same ADD+ReLu functional block, similar to what is depicted in the simplified embodiment of FIG. 3.
(87) Please also note that the intermediate activation signal S′ does not require being digitized by an ADC and requires no register/memory; instead, it can be inputted as a scalar activation signal in analog current mode onto the RBN of a next layer, which saves on silicon area, reduces power consumption, and enhances signal processing time.
(88) A method disclosed herein trains the weight signals (i.e., w) and/or the activation signals (i.e., s), and/or their combination, to follow a programmable statistical distribution profile (e.g., a Gaussian distribution with an average and a sigma) within a specific dynamic range at key summation nodes, which provides a boundary around the MAC signals (i.e., ΣS.sub.i.Math.W.sub.i or ±ΣS.sub.i.Math.W.sub.i) at corresponding summation nodes or through corresponding summation wires. MAC summation signals (at nodes or through wires) with programmable swings provide the benefit of lower power consumption and enhanced dynamic performance.
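The effect of such a distribution constraint can be pictured with a small Monte-Carlo sketch (illustrative only; this is not the disclosed training procedure, and the sigma values and term counts are arbitrary):

```python
import random

def mac_swing(n_terms, sigma, trials=2000, seed=0):
    """Estimate the standard deviation of MAC sums when activations and
    weights are drawn from a zero-mean Gaussian with the given sigma.
    A tighter s/w distribution at the summation node means a smaller
    current swing, hence lower power and relaxed headroom requirements."""
    rng = random.Random(seed)
    sums = [sum(rng.gauss(0.0, sigma) * rng.gauss(0.0, sigma)
                for _ in range(n_terms))
            for _ in range(trials)]
    mean = sum(sums) / trials
    return (sum((x - mean) ** 2 for x in sums) / trials) ** 0.5

narrow = mac_swing(16, sigma=0.5)   # tighter distribution of s and w
wide = mac_swing(16, sigma=1.0)     # wider distribution of s and w
```

For independent zero-mean terms, the swing of the sum scales with sigma squared per term, so halving the sigma of s and w cuts the summation-node swing by roughly a factor of four.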
(89) The simplified embodiment in
(90)
(91) Please refer to this disclosure's introduction section titled DETAILED DESCRIPTION which outlines some of the benefits relevant to the embodiment disclosed in the section above.
Section 4—Description of FIG. 4
(92)
(93) Free from intermediate Analog-to-Digital Converters (ADCs), intermediate activation memory, and intermediate activation Digital-to-Analog Converters (DACs), the disclosed simplified embodiment of FIG. 4 performs fully connected neural-network operations asynchronously in analog current mode.
(94) In the simplified neural network embodiment illustrated in FIG. 4, scalar multiplications are performed by scalar subthreshold MOSFET-based multipliers (e.g., X.sub.S1, X.sub.S2) whose weight, activation, multiplication reference, and bias-offset signals are programmed by current-mode DACs.
(95) The simplified neural network depiction of FIG. 4 comprises, for clarity of illustration, two fully connected layers with two activation signals and two weight signals per activation.
(96) Notice that depending on the end-application requirements, the simplified neural network embodiment of FIG. 4 can be arranged with more layers, more activation signals, and more weight signals.
(97) Also, keep in mind that for CIM-like operations, SRAMs that store the weight training codes, for example, can be placed on the silicon die right next to the Weight, Bias-Offset, and Multiplication Reference DACs, which reduces the dynamic power consumption associated with digital data write cycles.
(98) An RBN.sub.R generates a 'bias voltage bus' (a plurality of reference voltage signals) comprising V.sub.R1, V.sub.R2, and V.sub.R4, wherein RBN.sub.R's operating current is programmed by the digital word (r) of an RBN DAC (D/A: I.sub.R). Accordingly, the V.sub.R1, V.sub.R2, and V.sub.R4 are shared amongst the reference networks of a plurality of tiny Weight, Activation, Multiplication Reference, and Bias-Offset DACs.
(99) A digital word S.sub.1, representing an activation signal (which is also a scalar signal), programs D/A: S.sub.1 to generate a scalar current signal that is inputted onto a transistor M.sub.Y.
(100) A digital word C.sub.1, representing a Multiplication Reference Signal, programs D/A: C.sub.1 to generate a current signal that is inputted onto diode connected transistor M.sub.R.
(101) A digital word W.sub.1,1, representing a weight signal, programs D/A: W.sub.1,1 to generate a current signal that is inputted onto a transistor M.sub.X1.
(102) A digital word W.sub.1,2, representing a weight signal, programs D/A: W.sub.1,2 to generate a current signal that is inputted onto a transistor M.sub.X2.
(103) The transistors M.sub.X1 and M.sub.X2, as well as M.sub.Z1 and M.sub.Z2, are parts of a scalar subthreshold MOSFET-based multiplier X.sub.S1. The M.sub.R and M.sub.Y transistors generate signals which are shared with M.sub.X1 and M.sub.X2 as well as M.sub.Z1 and M.sub.Z2 to perform a plurality (e.g., a pair) of scalar multiplication functions in accordance with the following approximate equations:
(104) Because the FETs in X.sub.S1 operate in the subthreshold region, the following equations are applicable, by operation of the Kirchhoff Voltage Law (KVL), to the loops comprising M.sub.R, M.sub.Y, M.sub.A, M.sub.Z1, M.sub.X1 and M.sub.R, M.sub.Y, M.sub.A, M.sub.Z2, M.sub.X2:
(105) V.sub.GS(M.sub.X1)+V.sub.GS(M.sub.Y)≈V.sub.GS(M.sub.R)+V.sub.GS(M.sub.Z1)
Therefore,
(106) I.sub.X1×I.sub.Y≈I.sub.R×I.sub.Z1
and I.sub.Z1≈(I.sub.X1×I.sub.Y)/I.sub.R. Programming I.sub.R=1, and substituting for I.sub.X1=w.sub.1,1 and I.sub.Y=s.sub.1, the scalar multiplication product I.sub.Z1≈w.sub.1,1×s.sub.1 is generated.
(107) Similarly,
(108) V.sub.GS(M.sub.X2)+V.sub.GS(M.sub.Y)≈V.sub.GS(M.sub.R)+V.sub.GS(M.sub.Z2)
Therefore,
(109) I.sub.X2×I.sub.Y≈I.sub.R×I.sub.Z2
and I.sub.Z2≈(I.sub.X2×I.sub.Y)/I.sub.R. Programming I.sub.R=1, and substituting for I.sub.X2=w.sub.1,2 and I.sub.Y=s.sub.1, the scalar multiplication product I.sub.Z2≈w.sub.1,2×s.sub.1 is generated.
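The translinear relation above follows from the exponential subthreshold law, I.sub.D ≈ I.sub.0·exp(V.sub.GS/(n·V.sub.T)): summing gate-source voltages around the loop turns into multiplying drain currents. A numerical check under that idealized model (the I.sub.0, n, and V.sub.T values are illustrative, not from the disclosure):

```python
import math

I0, N_SLOPE, VT = 1e-12, 1.3, 0.026   # illustrative subthreshold parameters

def vgs(i):
    # Invert I = I0 * exp(Vgs / (n * VT)) to get the gate-source voltage
    return N_SLOPE * VT * math.log(i / I0)

def ids(v):
    # Idealized subthreshold drain current for a gate-source voltage v
    return I0 * math.exp(v / (N_SLOPE * VT))

# KVL around the loop: Vgs(Mx1) + Vgs(My) = Vgs(Mr) + Vgs(Mz1)
I_R, I_Y, I_X1 = 1e-6, 3e-6, 2e-6          # reference, activation, weight
V_Z1 = vgs(I_X1) + vgs(I_Y) - vgs(I_R)     # voltage KVL forces on M_Z1
I_Z1 = ids(V_Z1)                           # resulting product current
```

Because the exponentials cancel, the loop realizes I.sub.Z1 = (I.sub.X1×I.sub.Y)/I.sub.R regardless of the particular I.sub.0, n, or V.sub.T, which is why the circuit tolerates a lagging-edge CMOS process.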
(110) Keep in mind that transistor M.sub.A functions like an inverting amplifier to regulate and supply the needed current to node V.sub.S1 for KVL to hold. Also, notice that only one voltage node (e.g., one wire carrying V.sub.S1) is shared with a plurality of Weight FETs (e.g., M.sub.X1, M.sub.X2) and a plurality of Multiplication Product FETs (e.g., M.sub.Z1, M.sub.Z2) in the scalar subthreshold MOSFET-based multiplier of FIG. 4.
(111) Note that at the expense of extra circuitry and current consumption, independent multiplication reference signals can be programmed to perform batch normalization for a plurality of MACs (e.g., programming I.sub.R to different values in I.sub.Z≈(I.sub.X×I.sub.Y)/I.sub.R for a group of multipliers).
(112) In the simplified neural network embodiment of FIG. 4, a second scalar subthreshold MOSFET-based multiplier X.sub.S2 is arranged in a similar manner and programmed by an activation digital word S.sub.2, a Multiplication Reference digital word C.sub.2, and weight digital words W.sub.2,1 and W.sub.2,2 to generate the scalar multiplication products w.sub.2,1×s.sub.2 and w.sub.2,2×s.sub.2.
(113) The functions of accumulating the multiplication products (i.e., MAC) are performed around X.sub.S1 and X.sub.S2 to respectively generate Z.sub.1=w.sub.1,1×s.sub.1+w.sub.2,1×s.sub.2 and Z.sub.2=w.sub.1,2×s.sub.1+w.sub.2,2×s.sub.2, by coupling wires that carry the current signals w.sub.1,1×s.sub.1 and w.sub.2,1×s.sub.2 and by coupling the wires that carry the current signals w.sub.1,2×s.sub.1 and w.sub.2,2×s.sub.2, respectively.
(114) Next, a digital word B.sub.1, representing a biasing-offset signal, programs D/A: B.sub.1 to generate a current signal that is summed with Z.sub.1.
(115) A digital word B.sub.2, representing a biasing-offset signal, programs D/A: B.sub.2 to generate a current signal that is summed with Z.sub.2.
(116) As such, the first layer of the neural network generates an intermediate activation analog current signal S′.sub.1=Z.sub.1−B.sub.1=w.sub.1,1×s.sub.1+w.sub.2,1×s.sub.2−B.sub.1 that is inputted to the next layer of the neural network's scalar subthreshold MOSFET-based multiplier X.sub.S′1.
(117) Similarly, the first layer of neural network generates an intermediate activation analog current signal S′.sub.2=Z.sub.2−B.sub.2=w.sub.1,2×s.sub.1+w.sub.2,2×s.sub.2−B.sub.2 that is inputted to a next layer of neural network's scalar subthreshold MOSFET-based multiplier X.sub.S′2.
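The ADC-free layer-to-layer hand-off described above amounts to directly chaining the Z−B differences of one layer into the multipliers of the next. In software terms (an idealized numerical sketch with made-up weights and biases; in FIG. 4 the first layer passes Z−B on directly, with ReLu performed at the receiving diode port):

```python
def mac_layer(S, w, B):
    """One fully connected layer: Z_j = sum_i w_i,j * s_i, output Z_j - B_j.

    The max(..., 0) models the ReLu performed at the receiving
    diode-connected input port of the next layer's multiplier."""
    outs = []
    for j in range(len(B)):
        Z = sum(w[i][j] * S[i] for i in range(len(S)))   # wire-coupled sum
        outs.append(max(Z - B[j], 0.0))
    return outs

S = [4.0, 2.0]                           # first-layer activations s_1, s_2
w1 = [[0.5, 1.0], [1.5, 0.25]]           # first-layer weights w[i][j]
B1 = [1.0, 3.0]
w2 = [[1.0, 2.0], [0.5, 1.0]]            # second-layer weights w'[i][j]
B2 = [0.5, 0.5]

S_p = mac_layer(S, w1, B1)               # intermediate activations S'_1, S'_2
S_pp = mac_layer(S_p, w2, B2)            # second-layer outputs
```

No quantization, storage, or re-conversion step appears between the two calls, which is the software analogue of omitting the intermediate ADC, memory, and DAC.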
(118) Please notice the following arrangements in the next layer, which are similar to those in the first layer of the simplified neural network embodiment of FIG. 4:
(119) The S′.sub.1 intermediate activation analog current signal is inputted into transistor M′.sub.Y of scalar multiplier X.sub.S′1. Please notice that in the simplified embodiment of FIG. 4, the intermediate activation signal is received directly in analog current mode, without any intermediate data conversion or memory.
(120) In the simplified neural network of FIG. 4, the next layer's Multiplication Reference and Weight DACs (e.g., D/A: C′.sub.1; D/A: W′.sub.1,1; D/A: W′.sub.1,2) are arranged similarly to those of the first layer.
(121) Next, a digital word C′.sub.1 representing a Multiplication Reference Signal, programs D/A: C′.sub.1 to generate a current signal that is inputted onto diode connected transistor M′R of scalar multiplier X.sub.S′1.
(122) A digital word W′.sub.1,1 representing a weight signal, programs D/A: W′.sub.1,1 to generate a current signal that is inputted onto a transistor M′.sub.X1 of scalar multiplier X.sub.S′1.
(123) A digital word W′.sub.1,2 representing a weight signal, programs D/A: W′.sub.1,2 to generate a current signal that is inputted onto a transistor M′.sub.X2 of scalar multiplier X.sub.S′1.
(124) In the simplified neural network embodiment of FIG. 4, the scalar multiplication products w′.sub.1,1×s′.sub.1 and w′.sub.1,2×s′.sub.1 are thereby generated by applying equations for X.sub.S′1 similar to those pertaining to X.sub.S1 of the first layer.
(125) Likewise, the S′.sub.2 intermediate activation analog current signal is fed onto scalar multiplier X.sub.S′2.
(126) A digital word C′.sub.2 representing a Multiplication Reference Signal, programs D/A: C′.sub.2 to generate a current signal that is fed onto scalar multiplier X.sub.S′2.
(127) A digital word W′.sub.2,1 representing a weight signal, programs D/A: W′.sub.2,1 to generate a current signal that is fed onto scalar multiplier X.sub.S′2.
(128) A digital word W′.sub.2,2 representing a weight signal, programs D/A: W′.sub.2,2 to generate a current signal that is fed onto scalar multiplier X.sub.S′2.
(129) Therefore, the scalar multiplication products w′.sub.2,1×s′.sub.2 and w′.sub.2,2×s′.sub.2 are generated by applying similar equations for X.sub.S′2 in the next layer (to those of the equations pertaining to X.sub.S2 of the first layer).
(130) A w′.sub.1,1×s′.sub.1+w′.sub.2,1×s′.sub.2=Z′.sub.1 current signal is generated, representing the accumulation of a group (e.g., a couple) of scalar multiplication products (for communications with a subsequent layer of the neural network), by coupling the wire carrying the w′.sub.1,1×s′.sub.1 current signal with the wire carrying w′.sub.2,1×s′.sub.2.
(131) A w′.sub.1,2×s′.sub.1+w′.sub.2,2×s′.sub.2=Z′.sub.2 current signal is generated, representing the accumulation of a group (e.g., a couple) of scalar multiplication products (for communications with the subsequent layer of the neural network), by coupling the wire carrying the w′.sub.1,2×s′.sub.1 current signal with the wire carrying w′.sub.2,2×s′.sub.2.
(132) Please refer to this disclosure's introduction section titled DETAILED DESCRIPTION which outlines some of the benefits relevant to the embodiment disclosed in the section above.
Section 5—Description of FIG. 5
(133)
(134) Also, free from intermediate Analog-to-Digital Converters (ADCs), intermediate activation memory, and intermediate activation Digital-to-Analog Converters (DACs), the disclosed simplified embodiment of FIG. 5 performs fully connected neural-network operations asynchronously in analog current mode.
(135) In the simplified neural network embodiment illustrated in FIG. 5, scalar multiplications are performed by scalar subthreshold MOSFET-based multipliers (e.g., X.sub.1,1, X.sub.1,2, X.sub.2,1, X.sub.2,2) whose weight, activation, multiplication reference, and bias-offset signals are programmed by current-mode DACs.
(136) The simplified neural network depiction of FIG. 5 comprises, for clarity of illustration, two fully connected layers with two activation signals and two weight signals per activation.
(137) Notice that depending on the end-application requirements, the simplified neural network embodiment of FIG. 5 can be arranged with more layers, more activation signals, and more weight signals.
(138) Also, keep in mind that for CIM-like operations, SRAMs (not shown for clarity of illustration) that store the weight training codes, for example, can be placed on the silicon die right next to the Weight, Bias-Offset, and Multiplication Reference DACs, which reduces the dynamic power consumption associated with digital data write cycles.
(139) An RBN.sub.R (not shown for clarity of illustration) can be arranged to generate a 'bias voltage bus' (a plurality of reference voltage signals) that, for example, can be shared amongst the reference networks of a plurality of tiny Weight, Activation, Multiplication Reference, and Bias-Offset DACs.
(140) A digital word S.sub.1, representing an activation signal (which is also a scalar signal), programs d/a: S.sub.1 to generate a scalar current signal that is inputted onto a transistor N.sub.Y, thereby generating a voltage signal V.sub.S1.
(141) A digital word C.sub.1, representing a Multiplication Reference Signal, programs d/a: C.sub.1 to generate a current signal that is inputted onto diode connected transistor N.sub.R, thereby generating a voltage signal V.sub.C1.
(142) A digital word W.sub.1,1, representing a weight signal, programs d/a: W.sub.1,1 to generate a current signal that is inputted onto a transistor N.sub.X1 whose gate port is coupled with V.sub.C1.
(143) A digital word W.sub.1,2, representing a weight signal, programs d/a: W.sub.1,2 to generate a current signal that is inputted onto a transistor N.sub.X2 whose gate port is also coupled with V.sub.C1.
(144) Gate ports of transistors N.sub.Z1 and N.sub.Z2 are coupled with V.sub.S1.
(145) The transistors N.sub.X1 and N.sub.X2, as well as N.sub.Z1 and N.sub.Z2, are parts of scalar subthreshold MOSFET-based multipliers X.sub.1,1 and X.sub.1,2. As noted earlier, the N.sub.R and N.sub.Y transistors generate gate voltage signals which are respectively coupled to the gate ports of N.sub.X1-N.sub.X2 and N.sub.Z1-N.sub.Z2 to perform a plurality (e.g., a pair) of scalar multiplication functions in accordance with the following approximate equations:
(146) Because the FETs operate in the subthreshold region, the following equations are applicable, by operation of the Kirchhoff Voltage Law (KVL), to the loops comprising N.sub.R, N.sub.Y, N.sub.A1, N.sub.Z1, N.sub.X1 and N.sub.R, N.sub.Y, N.sub.A2, N.sub.Z2, N.sub.X2:
(147) V.sub.GS(N.sub.X1)+V.sub.GS(N.sub.Y)≈V.sub.GS(N.sub.R)+V.sub.GS(N.sub.Z1)
Therefore,
(148) I.sub.X1×I.sub.Y≈I.sub.R×I.sub.Z1
and I.sub.Z1≈(I.sub.X1×I.sub.Y)/I.sub.R. Programming I.sub.R=1, and substituting for I.sub.X1=w.sub.1,1 and I.sub.Y=s.sub.1, the scalar multiplication product I.sub.Z1≈w.sub.1,1×s.sub.1 is generated.
(149) Similarly,
(150) V.sub.GS(N.sub.X2)+V.sub.GS(N.sub.Y)≈V.sub.GS(N.sub.R)+V.sub.GS(N.sub.Z2)
Therefore,
(151) I.sub.X2×I.sub.Y≈I.sub.R×I.sub.Z2
and I.sub.Z2≈(I.sub.X2×I.sub.Y)/I.sub.R. Programming I.sub.R=1, and substituting for I.sub.X2=w.sub.1,2 and I.sub.Y=s.sub.1, the scalar multiplication product I.sub.Z2≈w.sub.1,2×s.sub.1 is generated.
(152) Note that at the expense of extra circuitry and current consumption, independent multiplication reference signals can be programmed to perform batch normalization for plurality of MACs (e.g., programming I.sub.R to different values in I.sub.Z≈(I.sub.X×I.sub.Y)/I.sub.R for group of multipliers).
(153) Keep in mind that for KVL to hold, transistors N.sub.A1 and N.sub.A2 function as inverting amplifiers to regulate and supply the needed current to N.sub.X1-N.sub.Z1 and N.sub.X2-N.sub.Z2, respectively. Also, notice that two voltage nodes (e.g., two wires carrying V.sub.C1 and V.sub.S1) are shared with a plurality of Weight FETs (e.g., N.sub.X1, N.sub.X2) and a plurality of Multiplication Product FETs (e.g., N.sub.Z1, N.sub.Z2) in the scalar subthreshold MOSFET-based multipliers of FIG. 5.
(154) In the simplified neural network embodiment of FIG. 5, a second pair of scalar multipliers X.sub.2,1 and X.sub.2,2 is arranged in a similar manner and programmed by an activation digital word S.sub.2, a Multiplication Reference digital word C.sub.2, and weight digital words W.sub.2,1 and W.sub.2,2, thereby generating the gate voltage signals V.sub.C2 and V.sub.S2.
(155) The functions of accumulating the multiplication products (i.e., MAC) are performed via feeding V.sub.C1-V.sub.S1 onto X.sub.1,1-X.sub.1,2 and feeding V.sub.C2-V.sub.S2 onto X.sub.2,1-X.sub.2,2 to respectively generate Z.sub.1=w.sub.1,1×s.sub.1+w.sub.2,1×s.sub.2 and Z.sub.2=w.sub.1,2×s.sub.1+w.sub.2,2×s.sub.2, by coupling wires that carry the current signals w.sub.1,1×s.sub.1 and w.sub.2,1×s.sub.2 and by coupling the wires that carry the current signals w.sub.1,2×s.sub.1 and w.sub.2,2×s.sub.2, respectively.
(156) Next, a digital word B.sub.1, representing a biasing-offset signal, programs d/a: B.sub.1 to generate a current signal that is summed with Z.sub.1.
(157) A digital word B.sub.2, representing a biasing-offset signal, programs d/a: B.sub.2 to generate a current signal that is summed with Z.sub.2.
(158) As such, the first layer of neural network generates an intermediate activation analog current signal S′.sub.1=Z.sub.1−B.sub.1=w.sub.1,1×s.sub.1+w.sub.2,1×s.sub.2−B.sub.1 that is inputted to a next layer of neural network's scalar subthreshold MOSFET-based multiplier.
(159) In the simplified neural network of FIG. 5, the intermediate activation analog current signals are received directly by the next layer's scalar subthreshold MOSFET-based multipliers, without any intermediate data conversion or memory.
(160) Similarly, the first layer of neural network generates an intermediate activation analog current signal S′.sub.2=Z.sub.2−B.sub.2=w.sub.1,2×s.sub.1+w.sub.2,2×s.sub.2−B.sub.2 that is inputted to a next layer of neural network's scalar subthreshold MOSFET-based multiplier.
(161) The description of the arrangements in the next layer is similar to that of the first layer of the simplified neural network embodiment of FIG. 5:
(162) The S′.sub.1 intermediate activation analog current signal is inputted into a p-channel transistor P′.sub.Y, which generates a gate voltage of V.sub.S′1. Please notice that in the simplified embodiment of FIG. 5, the next layer's scalar multiplier is arranged with complementary (p-channel) transistors that receive the intermediate activation current signal directly.
(163) Next, a digital word C′.sub.1 representing a Multiplication Reference Signal, programs d/a: C′.sub.1 to generate a current signal that is inputted onto diode connected transistor P′.sub.R, which generates a gate voltage of V.sub.C′1.
(164) A digital word W′.sub.1,1 representing a weight signal, programs d/a: W′.sub.1,1 to generate a current signal that is inputted onto a transistor P′.sub.X1 (of X′.sub.1,1) whose gate port is coupled with V.sub.C′1.
(165) A digital word W′.sub.1,2, representing a weight signal, programs d/a: W′.sub.1,2 to generate a current signal that is inputted onto a transistor P′.sub.X2 (of X′.sub.1,1) whose gate port is also coupled with V.sub.C′1.
(166) In the simplified neural network embodiment of FIG. 5, the scalar multiplication products w′.sub.1,1×s′.sub.1 and w′.sub.1,2×s′.sub.1 are thereby generated by applying equations similar to those pertaining to the first layer.
(167) Likewise, the S′.sub.2 intermediate activation analog current signal is fed onto a diode connected transistor N.sub.Y′1 of a scalar multiplier that generates V.sub.S′2.
(168) A digital word C′.sub.2 representing a Multiplication Reference Signal, programs d/a: C′.sub.2 to generate a current signal that is fed onto a diode connected transistor N.sub.R′1 of a scalar multiplier that generates V.sub.C′2.
(169) A digital word W′.sub.2,1 representing a weight signal, programs d/a: W′.sub.2,1 to generate a current signal that is fed onto X′.sub.2,1.
(170) A digital word W′.sub.2,2 representing a weight signal, programs d/a: W′.sub.2,2 to generate a current signal that is fed onto X′.sub.2,2.
(171) In the simplified neural network embodiment of FIG. 5, the scalar multiplication products w′.sub.2,1×s′.sub.2 and w′.sub.2,2×s′.sub.2 are thereby generated by applying equations similar to those pertaining to the first layer.
(172) A w′.sub.1,1×s′.sub.1+w′.sub.2,1×s′.sub.2=Z′.sub.1 current signal is generated, representing the accumulation of a group (e.g., a couple) of scalar multiplication products (for communications with a subsequent layer of the neural network) by coupling the wire carrying the w′.sub.1,1×s′.sub.1 current signal with the wire carrying w′.sub.2,1×s′.sub.2.
(173) A w′.sub.1,2×s′.sub.1+w′.sub.2,2×s′.sub.2=Z′.sub.2 current signal is generated, representing the accumulation of a group (e.g., a couple) of scalar multiplication products (for communications with the subsequent layer of the neural network), by coupling the wire carrying the w′.sub.1,2×s′.sub.1 current signal with the wire carrying w′.sub.2,2×s′.sub.2.
(174) Please refer to this disclosure's introduction section titled DETAILED DESCRIPTION which outlines some of the benefits relevant to the embodiment disclosed in the section above.
Section 6—Description of FIG. 6
(175)
(176) The disclosed simplified embodiment of FIG. 6 performs multi-quadrant scalar MAC operations together with Bias-Offset and ReLu functions in analog current mode.
(177) The neural network depiction of FIG. 6 comprises, for clarity of illustration, a first fully connected layer with two activation signals and two weight signals per activation.
(178) Keep in mind that depending on the end-application requirements, the simplified neural network embodiment of FIG. 6 can be arranged with more layers, more activation signals, and more weight signals.
(179) Also, keep in mind that for CIM-like operations, SRAMs (not shown for clarity of illustration) that, for example, store the weight training, bias-offset, and multiplication reference codes can be placed on the silicon die right next to the Weight (e.g., d/a: w.sub.1,1), Bias-Offset (e.g., d/a: B.sub.1), and Multiplication Reference (e.g., d/a: C.sub.1) DACs, which can reduce the dynamic power consumption associated with digital data communication cycles.
(180) An RBN.sub.R (not shown for clarity of illustration) can be arranged to generate a 'bias voltage bus' (a plurality of reference voltage signals) that, for example, can be shared amongst the reference networks of a plurality of tiny Weight, Activation, Multiplication Reference, and Bias-Offset DACs.
(181) Notice in the embodiment illustrated in FIG. 6 that the activation and weight digital words are arranged in sign-magnitude format, wherein their MSBs (sign bits) operate XNOR functions to facilitate multi-quadrant operations.
(182) A code S.sub.1, representing an activation signal (which is also a scalar signal), programs d/a: S.sub.1, in accordance with XNOR operations (e.g., S.sub.1MSB is a sign bit, and S.sub.1LSP is P-bit wide), to generate a scalar current signal that is inputted onto a transistor N.sub.Y1, thereby generating a voltage signal V.sub.S1.
(183) A digital word C.sub.1, representing a Multiplication Reference Signal, programs d/a: C.sub.1 to generate a current signal that is inputted onto diode connected transistor N.sub.R1, and generating a voltage signal V.sub.C1.
(184) A code W.sub.1,1, representing a weight signal, programs d/a: W.sub.1,1, in accordance with XNOR operations (e.g., W.sub.1,1MSB is a sign bit, and W.sub.1,1LSP is Q-bit wide), to generate a current signal that is inputted onto a transistor N.sub.X1 whose gate port is coupled with V.sub.C1.
(185) A digital word W.sub.1,2, representing a weight signal, and in accordance with XNOR operations, programs d/a: W.sub.1,2 to generate a current signal that is inputted onto a transistor N.sub.X2 whose gate port is also coupled with V.sub.C1.
(186) Gate ports of transistors N.sub.Z1 and N.sub.Z2 are coupled with V.sub.S1.
(187) Transistors N.sub.X1 and N.sub.X2, as well as N.sub.Z1 and N.sub.Z2, are parts of a scalar subthreshold MOSFET-based multiplier comprising the X.sub.1,1 and X.sub.1,2 blocks. As noted earlier, the N.sub.R1 and N.sub.Y1 diode connected transistors generate gate voltage signals which are respectively coupled to the gate ports of N.sub.X1-N.sub.X2 and N.sub.Z1-N.sub.Z2 to perform a plurality (e.g., a pair) of scalar multiplication functions in accordance with the following approximate equations:
(188) Because the FETs operate in the subthreshold region, the following equations are applicable, by operation of the Kirchhoff Voltage Law (KVL), to the loops comprising N.sub.R1, N.sub.Y1, N.sub.A1, N.sub.Z1, N.sub.X1 and N.sub.R1, N.sub.Y1, N.sub.A2, N.sub.Z2, N.sub.X2:
(189) V.sub.GS(N.sub.X1)+V.sub.GS(N.sub.Y1)≈V.sub.GS(N.sub.R1)+V.sub.GS(N.sub.Z1)
Therefore,
(190) I.sub.X1×I.sub.Y1≈I.sub.R1×I.sub.Z1
and I.sub.Z1≈(I.sub.X1×I.sub.Y1)/I.sub.R1. Programming I.sub.R1=1, and substituting for I.sub.X1=w.sub.1,1 and I.sub.Y1=s.sub.1, the scalar multiplication product I.sub.Z1≈w.sub.1,1×s.sub.1 is generated.
(191) Similarly,
(192) V.sub.GS(N.sub.X2)+V.sub.GS(N.sub.Y1)≈V.sub.GS(N.sub.R1)+V.sub.GS(N.sub.Z2)
Therefore,
(193) I.sub.X2×I.sub.Y1≈I.sub.R1×I.sub.Z2
and I.sub.Z2≈(I.sub.X2×I.sub.Y1)/I.sub.R1. Programming I.sub.R1=1, and substituting for I.sub.X2=w.sub.1,2 and I.sub.Y1=s.sub.1, the scalar multiplication product I.sub.Z2≈w.sub.1,2×s.sub.1 is generated.
(194) Note that at the expense of extra circuitry and current consumption, independent multiplication reference signals can be programmed to perform batch normalization for plurality of MACs (e.g., programming I.sub.R to different values in I.sub.Z≈(I.sub.X×I.sub.Y)/I.sub.R for group of multipliers).
(195) Keep in mind that for KVL to hold, transistors N.sub.A1 and N.sub.A2 function as inverting amplifiers to regulate and supply the needed current to the N.sub.X1-N.sub.Z1 transistors and the N.sub.X2-N.sub.Z2 transistors, respectively. Also, notice that two voltage nodes (e.g., two wires carrying V.sub.C1 and V.sub.S1) are shared with a plurality of Weight transistors (e.g., N.sub.X1, N.sub.X2) and a plurality of Multiplication Product transistors (e.g., N.sub.Z1, N.sub.Z2) in the scalar subthreshold MOSFET-based multipliers of FIG. 6.
(196) In the simplified neural network embodiment of FIG. 6, a second scalar subthreshold MOSFET-based multiplier (comprising the X.sub.2,1 and X.sub.2,2 blocks) is arranged in a similar manner to generate the scalar multiplication products w.sub.2,1×s.sub.2 and w.sub.2,2×s.sub.2.
(197) For multi-quadrant operations, the respective multiplication products w.sub.1,1×s.sub.1, w.sub.1,2×s.sub.1, w.sub.2,1×s.sub.2, and w.sub.2,2×s.sub.2 are each fed onto corresponding ‘differential switches’ that are gated by XNOR operations in accordance with respective (sign-bit) MSB of weight and activation codes.
(198) To generate multi-quadrant ±Z.sub.1=w.sub.1,1×s.sub.1+w.sub.2,1×s.sub.2 and multi-quadrant ±Z.sub.2=w.sub.1,2×s.sub.1+w.sub.2,2×s.sub.2, the outputs of the corresponding ‘differential switches’ are respectively summed together and with the corresponding offset-bias current signals (e.g., d/a: B1 and d/a: B2) and coupled to current mirrors (e.g., P.sub.1, P.sub.1′ and P.sub.2, P.sub.2′) which are part of the 1.sub.SUM,OFFSET,ReLU and 2.sub.SUM,OFFSET,ReLU circuits.
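One reading of the sign-bit XNOR gating and wire summation above can be sketched in Python; the magnitudes, sign-bit encoding (1 = positive), and bias value are hypothetical:

```python
def xnor(a, b):
    # sign-bit XNOR: the product is positive when the weight and
    # activation sign bits (MSBs) agree
    return 1 if a == b else 0

def multi_quadrant_mac(terms, bias):
    """terms: list of (|w*s| magnitude, weight MSB, activation MSB)."""
    i_plus = i_minus = 0.0
    for mag, w_msb, s_msb in terms:
        # the differential switch steers each product onto the + or - wire
        if xnor(w_msb, s_msb):
            i_plus += mag    # signs agree -> positive quadrant
        else:
            i_minus += mag   # signs disagree -> negative quadrant
    # summation is just coupling wires; the offset-bias current adds in
    return (i_plus - i_minus) + bias

# Z1 = w11*s1 + w21*s2 with the second term negative
z1 = multi_quadrant_mac([(6.0, 1, 1), (10.0, 0, 1)], bias=0.0)
print(z1)  # 6 - 10 = -4.0
```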
(199) The ReLu function for S′.sub.1=ReLu{±Z.sub.1−B.sub.1} is performed as follows: When ±Z.sub.1<B.sub.1, then diode connected transistor N.sub.S′1 turns off and the current signal through N.sub.S′1 is zero (voltage V.sub.S′1 that is applied to next layer's scalar subthreshold MOSFET multiplier is zero), otherwise diode connected transistor N.sub.S′1 turns on and the current through N.sub.S′1 is ±Z.sub.1−B.sub.1 which corresponds to a value of V.sub.S′1 as an intermediate activation scalar signal that is inputted to next layer's scalar subthreshold MOSFET multiplier.
(200) Similarly, the ReLu function for S′.sub.2=ReLu{±Z.sub.2−B.sub.2} is performed as follows: When ±Z.sub.2<B.sub.2, then diode connected transistor N.sub.S′2 turns off and the current signal through it is zero (the voltage V.sub.S′2 that is applied to the next layer's scalar subthreshold MOSFET multiplier is zero); otherwise diode connected transistor N.sub.S′2 turns on and the current through N.sub.S′2 is ±Z.sub.2−B.sub.2, which corresponds to a value of V.sub.S′2 that is inputted to the next layer's scalar subthreshold MOSFET multiplier.
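The diode-cutoff behavior described above reduces to the familiar max(0, ·) rectification; a minimal sketch with hypothetical values:

```python
def relu_diode(z, b):
    # the diode-connected transistor conducts only when Z exceeds the
    # offset B; otherwise its current (and the passed-on activation) is zero
    return z - b if z > b else 0.0

print(relu_diode(7.0, 3.0))  # conducting: 4.0
print(relu_diode(2.0, 3.0))  # cut off: 0.0
```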
(201) Silicon die area is saved by operating in current-mode, and matching is improved by sharing the same current mirrors (e.g., P.sub.1, P.sub.1′ and P.sub.2, P.sub.2′) to perform summation and subtraction functions by coupling a plurality of respective multiplication products (e.g., a plurality of W.sub.1.Math.S.sub.1).
(202) The embodiment of the circuit illustrated in
(203) As described earlier, the disclosed method of programming weight signals (i.e., w) and/or activation signals (i.e., s), and/or their combination, can train them so that their distribution (e.g., a Gaussian distribution with an average and a sigma) fits a specific dynamic range at a key summation node, which provides a boundary around MAC signals (i.e., ΣS.sub.1.Math.W.sub.1 or ±ΣS.sub.1.Math.W.sub.1) at corresponding summation nodes or through corresponding summation wires.
(204) MAC summation signals (at node or through wires) with programmable swings can provide the benefit of lower power consumption and enhanced dynamic performance.
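At training time, such a programmable boundary on the MAC swing could be encouraged with a penalty term; the following regularizer is a hypothetical sketch and not part of the disclosure:

```python
import random

def mac_swing_penalty(weights, activations, bound):
    """Hypothetical training-time regularizer: penalize MAC sums whose
    magnitude exceeds the programmed dynamic-range bound at the
    summation node (so the trained network stays inside the swing)."""
    z = sum(w * s for w, s in zip(weights, activations))
    excess = abs(z) - bound
    return max(0.0, excess) ** 2

# weights/activations drawn from an assumed Gaussian profile
random.seed(0)
w = [random.gauss(0.0, 0.5) for _ in range(8)]
s = [random.gauss(0.0, 0.5) for _ in range(8)]
print(mac_swing_penalty(w, s, bound=2.0))
```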
(205) The simplified embodiment in
(206)
(207) Please refer to this disclosure's introduction section titled DETAILED DESCRIPTION which outlines some of the benefits relevant to the embodiment disclosed in the section above.
Section 7—Description of FIG. 7
(208)
(209) Also, being free from intermediate Analog-to-Digital Converters (ADCs), intermediate activation memory, and intermediate activation Digital-to-Analog Converters (iDACs), the disclosed simplified embodiment of
(210) In the simplified neural network embodiment illustrated in
(211) The simplified neural network depiction of
(212) Keep in mind that depending on the end-application requirements, the simplified neural network embodiment of
(213) Also, keep in mind that for CIM-like operations, SRAMs (not shown for clarity of illustration) that, for example, store the weight, bias-offset, and multiplication reference codes can be placed on the silicon die right next to the Weight (e.g., d/a: W.sub.1,1), Bias-Offset (e.g., d/a: B.sub.1), and Multiplication Reference (e.g., d/a: C.sub.1) DACs, which can reduce the dynamic power consumption associated with digital data communication cycles.
(214) A RBN.sub.R (not shown for clarity of illustration) can be arranged to generate a ‘bias voltage bus’, which is a plurality of reference voltage signals that, for example, can be shared amongst the reference networks of a plurality of tiny Weight, Activation, Multiplication Reference, and Bias-Offset DACs.
(215) Notice in the embodiment illustrated in
(216) A code S.sub.1, representing an activation signal (which is also a scalar signal), programs d/a: S.sub.1 to generate a scalar current signal that is inputted to an emitter port of the diode connected parasitic substrate transistor Q.sub.Y, thereby generating a voltage signal V.sub.S1 which is shared between multipliers X.sub.1,1 and X.sub.1,2.
(217) A digital word C.sub.1, representing a Multiplication Reference Signal, programs d/a: C.sub.1 to generate a current signal that is inputted to an emitter port of the diode connected parasitic substrate transistor Q.sub.R, thereby generating a voltage signal V.sub.R1, which is also shared between multipliers X.sub.1,1 and X.sub.1,2.
(218) A code W.sub.1,1, representing a weight signal, programs d/a: W.sub.1,1 to generate a current signal that is inputted to an emitter port of the diode connected parasitic substrate transistor Q.sub.X, thereby generating a voltage signal V.sub.w1,1.
(219) A digital word W.sub.1,2, representing a weight signal, and in accordance with XNOR operations, programs d/a: W.sub.1,2 to generate a current signal in X.sub.1,2 that generates V.sub.w1,2.
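The code-to-voltage path above can be sketched with the ideal diode law Veb=V.sub.T×ln(I/I.sub.S) for the diode connected substrate PNPs; the DAC LSB and saturation current below are illustrative assumptions:

```python
import math

VT = 0.026   # thermal voltage (V) at room temperature
IS = 1e-14   # assumed BJT saturation current (A)

def veb(i_e):
    # emitter-base voltage of a diode-connected substrate PNP carrying i_e
    return VT * math.log(i_e / IS)

def dac_current(code, i_lsb=1e-9):
    # a tiny current-output DAC: code -> code * I_LSB (hypothetical 1 nA LSB)
    return code * i_lsb

# codes S1 and C1 program the shared activation and reference voltages,
# each wire shared between multipliers X_1,1 and X_1,2
v_s1 = veb(dac_current(12))
v_r1 = veb(dac_current(1))
print(v_s1, v_r1)  # larger programmed current -> larger Veb
```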
(220) Here is a brief description of how scalar multipliers X.sub.1,1 and X.sub.1,2 operate:
(221) Let's denote Veb.sub.Q.sub.
(222) Setting aside non-idealities in scalar multiplier X.sub.1,1, the gain of the first amplifier's differential pair N.sub.1-N.sub.2 through P.sub.1-P.sub.2-P.sub.3 and of the second amplifier's differential pair N.sub.3-N.sub.4 (also) through P.sub.1-P.sub.2-P.sub.3 is substantially the same, denoted G.
(223) For scalar multiplier X.sub.1,1, the gate voltage at node g.sub.1,1 regulates transistor P.sub.3's current onto diode connected transistor Q.sub.Z so that the following equation holds:
(224) G×{(Veb.sub.QY+Veb.sub.QX)−(Veb.sub.QR+Veb.sub.QZ)}≈0, so that
Veb.sub.QY+Veb.sub.QX≈Veb.sub.QR+Veb.sub.QZ
(225) and, by the exponential emitter-base law, I.sub.QY×I.sub.QX≈I.sub.QR×I.sub.QZ.
For I.sub.QR programmed to unity, I.sub.QZ≈S.sub.1.Math.W.sub.1,1≈S.sub.1×W.sub.1,1.
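The amplifier's regulation of node g.sub.1,1 can be mimicked by a crude fixed-point iteration that drives the Veb loop error to zero; the device constants and the update rule are illustrative assumptions, not the disclosed circuit:

```python
import math

VT, IS = 0.026, 1e-14  # assumed thermal voltage and BJT saturation current

def veb(i):
    return VT * math.log(i / IS)

def regulate_iz(i_s, i_w, i_r, steps=2000):
    """Sketch of the amplifier loop: the current P3 pushes into the
    diode-connected Q_Z is nudged until (Veb_S + Veb_W) ~= (Veb_R + Veb_Z)."""
    i_z = i_r  # arbitrary starting current
    for _ in range(steps):
        err = (veb(i_s) + veb(i_w)) - (veb(i_r) + veb(i_z))
        i_z *= math.exp(err / 2)  # move Veb_Z toward the KVL balance point
    return i_z

i_r = 1e-9  # reference programmed to one "unit" (1 nA)
i_z = regulate_iz(i_s=4e-9, i_w=7e-9, i_r=i_r)
print(i_z / i_r)  # converges to ~4 * 7 = 28
```

At the balance point the same translinear identity as the MOSFET case holds: I.sub.QZ equals I.sub.S×I.sub.W/I.sub.R.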
(226) Note that at the expense of extra circuitry and current consumption, independent multiplication reference signals can be programmed to perform batch normalization for a plurality of MACs (e.g., programming I.sub.QR to different values for a group of multipliers).
(227) Next, the current signal representing S.sub.1.Math.W.sub.1,1 is mirrored onto transistor P.sub.4.
(228) Bear in mind that voltage at node g.sub.1,1 can be shared in plurality of current mirrors (with a single PMOS) to copy S.sub.1.Math.W.sub.1,1 multiplication product signal onto other channels as required by an end-application.
(229) The MSB of W.sub.1,1 controls the polarity of the S.sub.1×W.sub.1,1 signal that is directed through either the P.sub.SW1+ or the P.sub.SW1− transistor (current-switch) to generate multi-quadrant ±S.sub.1×W.sub.1,1 signals, which are ‘multiplication product differential current signals’ that are fed onto a current mirror comprising N.sub.a1 and N.sub.b1.
(230) Similar equations and descriptions provided above also apply to scalar multiplier X.sub.2,1, which generates S.sub.2×W.sub.2,1≈S.sub.2.Math.W.sub.2,1. Here also, the MSB of W.sub.2,1 controls the polarity of the S.sub.2×W.sub.2,1 signal that is directed through either the P.sub.SW2+ or the P.sub.SW2− current-switch to generate a multi-quadrant ±S.sub.2×W.sub.2,1 signal, which is then coupled/summed with ±S.sub.1×W.sub.1,1 (and fed onto the same current mirror comprising N.sub.a1 and N.sub.b1) to generate ±Z.sub.1.
(231) As a result, a differential multi-quadrant ±Z.sub.1=±S.sub.1×W.sub.1,1±S.sub.2×W.sub.2,1 is applied across the current mirror comprising N.sub.a1 and N.sub.b1, whose outputs are summed with the offset-bias current of d/a: B.sub.1. The output port of the N.sub.a1 and N.sub.b1 current mirror is also coupled to a diode-connected substrate vertical PNP transistor Q.sub.Y′1 (of the next layer's scalar multiplier) to facilitate the ReLu operation, as part of a 1.sub.SUM,OFFSET,ReLu functional block illustrated in the simplified embodiment of
(232) If ±Z.sub.1>B.sub.1, then the diode connected Q.sub.Y′1 transistor shuts off and its current signal trends towards zero. If ±Z.sub.1<B.sub.1, then the diode connected Q.sub.Y′1 transistor's current signal trends towards ±Z.sub.1+B.sub.1, which is then passed onto a diode connected transistor (e.g., Q.sub.Y′1) of the next scalar multiplier in the next layer of the neural network as an intermediate scalar/activation signal S′.sub.1=ReLu{±Z.sub.1+B.sub.1}.
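Under one consistent reading of the sign conventions above, the mirror-plus-diode arrangement reduces to a differential-to-single-ended subtraction followed by a max(0, ·) rectification; the currents below are hypothetical:

```python
def mirror_diff_to_single(i_plus, i_minus):
    # the N_a1/N_b1 mirror copies the '-' wire current so the output
    # node carries the single-ended difference i_plus - i_minus
    return i_plus - i_minus

def diode_relu(i_net):
    # the next layer's diode-connected PNP conducts only net positive current
    return i_net if i_net > 0 else 0.0

# +/-Z1 = +S1*W1,1 - S2*W2,1, then offset by the bias current B1
i_pos = 6.0   # S1*W1,1 steered positive by its sign-bit XNOR
i_neg = 10.0  # S2*W2,1 steered negative
b1 = 5.0
z1 = mirror_diff_to_single(i_pos, i_neg)  # -4.0
s_prime1 = diode_relu(z1 + b1)            # ReLu{+/-Z1 + B1} = 1.0
print(s_prime1)
```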
(233) Similar equations and descriptions provided above also apply to the other scalar multipliers: X.sub.1,2 generates S.sub.1×W.sub.1,2≈S.sub.1.Math.W.sub.1,2, and X.sub.2,2 generates S.sub.2×W.sub.2,2≈S.sub.2.Math.W.sub.2,2. Accordingly, the 2.sub.SUM,OFFSET,ReLu (functional block illustrated in the simplified embodiment of
(234) Notice that for performing a multi-quadrant scalar MAC operation, a plurality of ±S.sub.1×W.sub.1 signals can share the same Add+ReLu functional block similar to what is depicted in the simplified embodiment of
(235) Please also note that intermediate activation signals (e.g., V.sub.S′1, V.sub.S′2, etc.) do not require being digitized by an ADC and require no register/memory; instead, an intermediate activation signal can be inputted as a scalar activation signal into a corresponding diode connected substrate vertical PNP transistor of a corresponding scalar multiplier in the next layer, which saves on silicon area, reduces power consumption, and enhances signal processing time.
(236) Also, the disclosed method of programming weight signals (i.e., w) and/or activation signals (i.e., s), and/or their combination, can train them so that their distribution (e.g., a Gaussian distribution with an average and a sigma) fits a specific dynamic range at a key summation node, which provides a programmable boundary around MAC signals (i.e., ΣS.sub.1.Math.W.sub.1 or ±ΣS.sub.1.Math.W.sub.1 or multi-quadrant ±S.sub.1×W.sub.1 signals) at corresponding summation nodes or through corresponding summation wires. MAC summation signals (at a node or through wires) with programmable swings can provide the benefit of lower power consumption and enhanced dynamic performance.
(237) The simplified embodiment in
(238)
(239) Please refer to this disclosure's introduction section titled DETAILED DESCRIPTION which outlines some of the benefits relevant to the embodiment disclosed in the section above.
Section 8—Description of FIG. 8
(240)
(241) Also, being free from intermediate Analog-to-Digital Converters (ADCs), intermediate activation memory, and intermediate activation Digital-to-Analog Converters (iDACs), the disclosed simplified embodiment of
(242) In the simplified neural network embodiment illustrated in
(243) The simplified neural network depiction of
(244) Keep in mind that depending on the end-application requirements, the simplified neural network embodiment of
(245) Furthermore, keep in mind that for CIM-like operations, SRAMs (not shown for clarity of illustration) that, for example, store the weight, bias-offset, and multiplication reference codes can be placed on the silicon die right next to the Weight (e.g., d/a: w.sub.1,1), Bias-Offset (e.g., d/a: B.sub.1), and Multiplication Reference (e.g., d/a: C.sub.1) DACs, which can reduce the dynamic power consumption associated with digital data communication cycles.
(246) Moreover, a RBN.sub.R (not shown for clarity of illustration) can be arranged, in the simplified embodiment illustrated in
(247) A code S.sub.1, representing an activation signal (which is also a scalar signal), programs d/a: S.sub.1 to generate a scalar current signal that is inputted onto the diode connected parasitic substrate transistor Q.sub.Y, thereby generating a voltage signal V.sub.S1 which is shared between scalar multipliers X.sub.1,1 and X.sub.1,2.
(248) A digital word C.sub.1, representing a Multiplication Reference Signal, programs d/a: C.sub.1 to generate a current signal that is inputted onto the diode connected parasitic substrate transistor Q.sub.R, thereby generating a voltage signal V.sub.R1, which is also shared between multipliers X.sub.1,1 and X.sub.1,2.
(249) A code W.sub.1,1, representing a weight signal, programs d/a: W.sub.1,1 to generate a current signal that is inputted onto the diode connected parasitic substrate transistor Q.sub.1X, thereby generating a voltage signal V.sub.w1,1.
(250) A digital word W.sub.1,2, representing a weight signal, programs d/a: W.sub.1,2 to generate a current signal in X.sub.1,2 that is fed onto the diode connected parasitic substrate transistor Q.sub.2X, thereby generating V.sub.w1,2.
(251) Similar to the description provided in the previous section, here is a brief description of how scalar multipliers X.sub.1,1 and X.sub.1,2 operate:
(252) Let's denote Veb.sub.Q.sub.
(253) Setting aside non-idealities in scalar multiplier X.sub.1,1, the gain through the first amplifier's differential transistor pair N.sub.1-N.sub.2 through the P.sub.1-P.sub.2-P.sub.3 transistors and through the second amplifier's differential transistor pair N.sub.3-N.sub.4, also through the P.sub.1-P.sub.2-P.sub.3 transistors, is substantially the same, denoted G.
(254) For scalar multiplier X.sub.1,1, the gate voltage at node g.sub.1,1 regulates transistor P.sub.3's current onto diode connected transistor Q.sub.Z so that the following equation holds:
(255) G×{(Veb.sub.QY+Veb.sub.Q1X)−(Veb.sub.QR+Veb.sub.QZ)}≈0, so that
Veb.sub.QY+Veb.sub.Q1X≈Veb.sub.QR+Veb.sub.QZ
(256) and, by the exponential emitter-base law, I.sub.QY×I.sub.Q1X≈I.sub.QR×I.sub.QZ.
For I.sub.QR programmed to unity, I.sub.QZ≈S.sub.1.Math.W.sub.1,1≈S.sub.1×W.sub.1,1.
(257) Note that at the expense of extra circuitry and current consumption, independent multiplication reference signals can be programmed to perform batch normalization for a plurality of MACs (e.g., programming I.sub.QR to different values for a group of multipliers).
(258) Next, the current signal representing S.sub.1.Math.W.sub.1,1 is mirrored onto transistor P.sub.4.
(259) Bear in mind that voltage at node g.sub.1,1 can be shared in plurality of current mirrors (with a single PMOS) to copy S.sub.1.Math.W.sub.1,1 multiplication product signal onto other scalar multiplier channels as required by an end-application.
(260) Similar equations and descriptions provided above also apply to the other scalar multipliers, such as X.sub.2,1, which generates an S.sub.2×W.sub.2,1≈S.sub.2.Math.W.sub.2,1 signal that is mirrored and summed with S.sub.1.Math.W.sub.1,1 to generate Z.sub.1.
(261) As a result, a summation signal Z.sub.1=S.sub.1×W.sub.1,1+S.sub.2×W.sub.2,1 is summed with the offset-bias current of d/a: B.sub.1 and is also coupled to a diode-connected substrate vertical PNP transistor Q.sub.Y′1 to facilitate the ReLu operation. Thus, a signal S′.sub.1=ReLu{Z.sub.1−B.sub.1} is generated, and here is how:
(262) When Z.sub.1<B.sub.1, or when I.sub.S′1<0, the diode connected Q.sub.Y′ transistor shuts off and its current signal trends towards zero. When Z.sub.1>B.sub.1, or when I.sub.S′1>0, the diode connected Q.sub.Y′ transistor's current signal trends towards the Z.sub.1−B.sub.1 signal, which feeds a diode connected transistor (e.g., Q.sub.Y′) in the next layer of the neural network, representing the next intermediate activation/scalar signal S′.sub.1=ReLu{Z.sub.1−B.sub.1}.
(263) Similar equations and descriptions provided above also apply to the other scalar multipliers: X.sub.1,2 generates S.sub.1×W.sub.1,2≈S.sub.1.Math.W.sub.1,2, and X.sub.2,2 generates S.sub.2×W.sub.2,2≈S.sub.2.Math.W.sub.2,2. Accordingly, an S′.sub.2=ReLu{Z.sub.2−B.sub.2} signal is generated, wherein Z.sub.2=S.sub.1×W.sub.1,2+S.sub.2×W.sub.2,2.
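Putting the FIG. 8 pieces together, one layer's behavior reduces to wire-summed MACs followed by the diode ReLu, S′.sub.j=ReLu{Z.sub.j−B.sub.j}; a minimal Python sketch with hypothetical code values:

```python
def layer_forward(s, w, b):
    """One fully connected layer: Z_j = sum_i s_i * w[i][j] (summation by
    coupling wires), then the diode-connected Q_Y' cutoff implements
    S'_j = ReLu{Z_j - B_j}."""
    outs = []
    for j in range(len(b)):
        z = sum(s[i] * w[i][j] for i in range(len(s)))
        outs.append(z - b[j] if z > b[j] else 0.0)
    return outs

# two activations, two neurons: Z1 = S1*W1,1 + S2*W2,1, Z2 = S1*W1,2 + S2*W2,2
s = [2.0, 3.0]
w = [[1.0, 4.0],
     [2.0, 0.5]]
b = [5.0, 20.0]
print(layer_forward(s, w, b))  # Z1 = 8 -> 3.0 ; Z2 = 9.5 < 20 -> 0.0
```

The outputs feed the next layer's diode connected transistors directly, which is why no intermediate ADC or activation memory appears between layers.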
(264) Please also note that intermediate activation signals (e.g., V.sub.S′1, V.sub.S′2, etc.) do not require being digitized by an ADC and require no register/memory; instead, intermediate activation signals can be inputted as scalar activation signals into a corresponding diode connected substrate vertical PNP transistor of a corresponding scalar multiplier of the next layer, which saves on silicon area, reduces power consumption, and enhances signal processing time.
(265) Please refer to this disclosure's introduction section titled DETAILED DESCRIPTION which outlines some of the benefits relevant to the embodiment disclosed in the section above.