General purpose neural processor
11640524 · 2023-05-02
Assignee
Inventors
Cpc classification
International classification
G11C13/00
PHYSICS
Abstract
A computer processor includes an on-chip network and a plurality of tiles. Each tile includes an input circuit to receive a voltage signal from the network, and a crossbar array, including at least one neuron. The neuron includes first and second bit lines, a programmable resistor connecting the voltage signal to the first bit line, and a comparator to receive inputs from the two bit lines and to output a voltage, when a bypass condition is not active. Each tile includes a programming circuit to set a resistance value of the resistor, a pass-through circuit to provide the voltage signal to an input circuit of a first additional tile, when a pass-through condition is active, a bypass circuit to provide values of the bit lines to a second additional tile, when the bypass condition is active; and at least one output circuit to provide an output signal to the network.
Claims
1. A comparator configured for use in a computer processor, the comparator comprising: an input stage, comprising: a first input line; a second input line; a first input calibration line comprising a transistor and a programmable resistor and configured to add current to the first input line; a second input calibration line comprising a transistor and a programmable resistor and configured to add current to the second input line; a first output calibration line comprising a transistor and a programmable resistor and configured to drain current from the first input line; a second output calibration line comprising a transistor and a programmable resistor and configured to drain current from the second input line; a first plurality of parallel diodes configured to receive current from the first input line, wherein a voltage operating range across the first plurality of parallel diodes is configurable by selecting a subset of the first plurality of diodes to be activated; and a second plurality of diodes configured to receive current from the second input line, wherein a voltage operating range across the second plurality of parallel diodes is configurable by selecting a subset of the second plurality of diodes to be activated; an amplifier stage, electrically coupled to the input stage; and an output stage, electrically coupled to the amplifier stage.
2. The comparator as recited in claim 1, the amplifier stage comprising: a first amplifier stage, comprising: a first differential amplifier having a first voltage output and a second voltage output; and a first transmission gate electrically connecting the first voltage output to the second voltage output; and a second amplifier stage, comprising: a second differential amplifier; and a second transmission gate electrically connecting the second differential amplifier to an output stage; wherein the amplifier stage is configured such that whenever the first transmission gate is open, the second transmission gate is closed.
3. A method of calibrating the comparator of claim 1, wherein the comparator includes a plurality of input lines including the first input line and the second input line, and a plurality of calibration lines including the first input calibration line, the second input calibration line, the first output calibration line, and the second output calibration line, the method comprising: (a) repeatedly performing until a minimum number of +1 input values and −1 input values are set to on the processes of: setting weights associated with the input lines such that there are an equal number of +1 input values and −1 input values; setting each input line to be on; simultaneously cycling one +1 weight and one −1 weight by turning the +1 input off and then back on, while simultaneously turning the −1 input on and then back off, and recording the resulting output values; and reducing the number of input lines that are set to on by deactivating one input line having a +1 value and deactivating one input line having a −1 value; (b) adjusting programmable resistor values of the calibration lines based on the recorded output values; (c) repeatedly performing processes (a) and (b) until the output values indicate that acceptable operation has been reached.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(14) Definitions: The term “memristor” refers to an electrical component whose resistance can be increased or decreased in a controllable fashion by modifying voltage/current inputs.
(15) Tiled Arrays
(16) The variety of applications known in the art of neuromorphic computing calls for a variety of crossbar array sizes for efficient mapping of neural nets onto the hardware. Finding a single array size that can fit this variety, as is desirable for a general purpose neural processor, is problematic. Tiled arrays may be used to overcome this limitation, as discussed in detail below. A tile can be a complete array (when small arrays are efficient), and tiles can also be combined to create the much larger arrays that certain neural networks require. This novel contribution can be used as a key component of a general purpose neural processor.
(17) Comparator Design
(18) The tiled array architecture is made effective by the use of a 1/High Z neuron architecture and the design of a compact, power-efficient, fast comparator. Many memristor-based neural nets use a very fast, but large and power-hungry, analog-to-digital converter (ADC) for neuron evaluation. This would be a problematic approach for the tiled array concept, in which every tile needs its own neuron evaluation circuitry. A comparator design is described herein that can efficiently support small tiles.
(19) Architecture and Key Components
(20) Because of their programmable conductance capability, memristors can function as dynamic weights in neuromorphic computing designs. Referencing
(21)
(22) Memristor arrays implementing a threshold gate network (TGN) can be organized in multiple ways. Three approaches in particular are discussed below.
(23) 1. Single-Ended Voltage
(24)
(25) 2. Differential Voltage
(26)
(27) A differential voltage architecture should also have improved common mode noise rejection. This indicates that the differential voltage (DV) architecture may provide benefits over the single-ended voltage (SV) architecture in certain embodiments.
(28) 3. Differential Current with 1/High Z Inputs
(29)
(30) Circuit analysis indicates the 1/High Z differential current architecture (DZ) is power efficient compared to the SV and DV architectures. It also has a desirable property for circuit analysis: the current for each input is directly proportional to the weighted input for the neuron. This enables certain mathematical properties of the TGN to be verified as correctly implemented via simple analysis or simulation of the circuits.
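The proportionality property noted above can be illustrated with a minimal Python sketch of a single DZ neuron. This is an idealized toy model, not the circuit of any embodiment. It assumes each weight is stored as a pair of conductances (one per bit line), that an input is either driven to a read voltage ("1") or left at high impedance (contributing no current), and that an ideal comparator outputs 1 when the positive bit-line current exceeds the negative one. The function name, the 0.1 V read voltage, and the conductance values are hypothetical.

```python
def dz_neuron(inputs, g_pos, g_neg, v_read=0.1):
    """Toy model of a 1/High Z differential-current neuron.

    inputs : sequence of 0/1 flags; 0 models a high-impedance (undriven) input
    g_pos  : conductances (siemens) to the positive bit line, one per input
    g_neg  : conductances (siemens) to the negative bit line, one per input
    v_read : assumed drive voltage for a "1" input (hypothetical value)
    """
    # Only driven inputs contribute current; each contributes I = V * G.
    i_pos = sum(v_read * g for x, g in zip(inputs, g_pos) if x)
    i_neg = sum(v_read * g for x, g in zip(inputs, g_neg) if x)
    # Ideal comparator: fires when the positive bit line carries more current.
    return i_pos, i_neg, int(i_pos > i_neg)


# Hypothetical 3-input neuron; conductances in siemens (1e-6 S = 1 uS).
inputs = [1, 0, 1]
g_pos = [2e-6, 5e-6, 1e-6]
g_neg = [1e-6, 1e-6, 1e-6]
i_pos, i_neg, out = dz_neuron(inputs, g_pos, g_neg)

# The differential current equals v_read times the weighted sum of the
# driven inputs, where each weight is (g_pos - g_neg).
weighted_sum = sum(gp - gn for x, gp, gn in zip(inputs, g_pos, g_neg) if x)
```

In this model the undriven second input contributes nothing to either bit line, so the differential current depends only on the weights of the driven inputs.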
(31) Comparator Design
(32) The comparator is an important element of the architecture. Its speed is one of the main factors in estimating the neural network throughput (the router network is another important factor). Since the comparator sinks the currents from the array, it has to be large enough to handle the total current while still being able to discriminate a minimum conductance difference (ΔGmin=1 μS). This requirement can have a major impact on the overall area and timing, and can limit the number of inputs allowed into a single neuron. The comparator is also a significant consumer of the overall power. A comparator architecture 901 in accordance with an embodiment of the present invention is shown in
(33) Comparator Architecture
(34)
(35) For traditional applications this design would be impractical for at least two reasons:
(36) 1. The maximum current input and minimum current difference that can be sensed are inversely related, limiting the operating range of the design; and
(37) 2. The design is very sensitive to device mismatches (such as small V.sub.t differences).
(38) In using the comparator for neural net applications, however, we can take advantage of some conditions that are not typically available.
(39) 1. The weights in the neural network must be programmed, and are therefore known in advance. For any given set of weights, the comparator needs to operate correctly in only a subset of the total range required.
(40) 2. The memristors are programmable conductance devices that can be used to ensure correct operation even under device mismatch conditions.
(41) We take advantage of this knowledge by modifying the base comparator design (see
(42) The series memristors have greater effect when V.sup.+ (or V.sup.−) is low; the parallel memristors have greater effect for higher V.sup.+ (or V.sup.−). The memristor values would most likely be found as part of a chip calibration procedure. This procedure would be done before setting the desired programming weights into the array, and uses a majority function for this purpose:
(43) 1. Set the weights to create equal numbers of +1 and −1 values, and set all inputs high.
(44) 2. During each major time interval, cycle one +1 weight, and then one −1 weight by turning the +1 input off, then on, simultaneously turning the −1 input on, then off (each for one clock cycle of 5 ns, giving a total time interval of 10 ns).
(45) 3. After this, reduce both the total positive and negative weights by 1 (or any other equal decrement).
(46) 4. Repeat until the “common mode” weight (i.e., the base number of negative and positive weights) is a minimum.
(47) 5. Based on the outputs, adjust the G.sub.p and G.sub.s devices as follows:
(48) High common mode weights that create “0” errors require an increase in G.sub.p.sup.+, while low common mode weights that create “1” errors require an increase in G.sub.s.sup.+. G.sub.p.sup.− and G.sub.s.sup.− would be adjusted if the opposite conditions exist.
(49) Repeat the procedure until acceptable operation is reached.
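The stimulus schedule of steps 1 to 4 can be sketched in Python as follows. This is one reading of the procedure, not a definitive implementation: it assumes that each major time interval has two phases (first one +1 input off, giving a weighted sum of −1 and an expected output of 0; then that input back on with one −1 input off, giving a sum of +1 and an expected output of 1), and that the minimum common mode weight is one pair. The scoring helper follows the description's convention of assigning +1 to each correct output and −1 to each incorrect one. All function and parameter names are hypothetical, and the adjustment of the G.sub.p and G.sub.s devices in step 5 is not modeled.

```python
def calibration_schedule(n_pairs, min_pairs=1, step=1):
    """List the test points for the majority-function calibration.

    Each entry is (common_mode, active_pos, active_neg, expected_output),
    where common_mode is the number of +1/-1 weight pairs currently enabled.
    """
    schedule = []
    common = n_pairs
    while common >= min_pairs:
        # Phase 1: one +1 input off -> weighted sum = -1 -> expect output 0.
        schedule.append((common, common - 1, common, 0))
        # Phase 2: +1 input back on, one -1 input off -> sum = +1 -> expect 1.
        schedule.append((common, common, common - 1, 1))
        # Step 3: reduce positive and negative weights by an equal decrement.
        common -= step
    return schedule


def score(expected, observed):
    """+1 per correct output, -1 per incorrect output."""
    return sum(1 if e == o else -1 for e, o in zip(expected, observed))


schedule = calibration_schedule(n_pairs=3)
expected = [point[3] for point in schedule]

# A fully correct comparator scores len(expected); each error costs 2 points.
perfect = score(expected, expected)
one_error = score(expected, [1] + expected[1:])
```

Under these assumptions, calibration would continue adjusting the bias memristors until the score equals the number of recorded outputs.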
(50) This procedure assigns a value of 1 to each correct output, −1 to each incorrect output, and adjusts the comparator bias memristors until the total value equals the number of outputs measured (fully correct functionality). This procedure can be modified in many ways. For example, heavier emphasis can be given to getting correct values for high total conductance values and ignoring incorrect values at very low conductance values (or specific biases can be used to ensure extremely low conductance levels are never seen). Other optimizing algorithms can be used as desired. Using 45 nm design technology, simulations have shown that up to ±10 mV (20 mV total) V.sub.t mismatch and up to ±5 nm (10 nm total) dimensional mismatch can be tolerated between the critical pairs of devices in this differential design (N1A and N1B in the input stage in
(51) Another design modification used to improve the comparator performance is illustrated in
(52) 1. How fast can the bias current change V.sub.amp1?
(53) 2. How much does the bias current need to change V.sub.amp1?
(54) This can be expressed using the fact that V.sub.amp1 has a node capacitance, and therefore
I=C*ΔV/Δt
or
Δt=C*ΔV/I
(55) To reduce Δt, one can increase I or reduce ΔV. The first factor is essentially a design optimization: higher bias currents can swing the output faster, but they take more power and create a larger design (increased capacitance) that will slow down the amplifier. Larger amplifier input transistors will also slow down the rate at which the diode can swing the input voltages, but that is a smaller, secondary influence. The second factor is input dependent. In a situation where the previous weighted sum is highly negative and the current weighted sum=+1 (or where the previous sum is highly positive and the current sum=−1), the ΔV.sub.amp1 value is very high, and the final V.sub.amp1 voltage will be very close to V.sub.gate1. The bias current will take a relatively long time Δt to switch V.sub.amp1 past V.sub.gate1. Only then will the second amplifier switch as well. The second factor, however, can be managed by the architecture.
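As a worked illustration of the relation Δt=C*ΔV/I, the short Python sketch below evaluates the switching time for two voltage swings. The node capacitance (10 fF), bias current (10 μA), and voltage swings are assumed round numbers chosen only to show the scaling; they are not taken from the design.

```python
def switching_time(c_node, delta_v, i_bias):
    """Time for a constant current i_bias to move a node of capacitance
    c_node through a voltage change delta_v (delta_t = C * delta_V / I)."""
    if i_bias <= 0:
        raise ValueError("bias current must be positive")
    return c_node * delta_v / i_bias


C_NODE = 10e-15   # assumed node capacitance: 10 fF
I_BIAS = 10e-6    # assumed bias current: 10 uA

# Large swing (previous sum strongly opposite in sign): about 0.5 ns.
t_large = switching_time(C_NODE, 0.5, I_BIAS)
# Small swing (node already held near the switching point): about 50 ps.
t_small = switching_time(C_NODE, 0.05, I_BIAS)
```

With the current held fixed, a tenfold reduction in ΔV gives a tenfold reduction in Δt, which is why limiting the swing is attractive compared with raising the bias current.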
(56) By adding a transmission gate (T-gate) connecting V.sub.amp1 to V.sub.gate1 1304, we can controllably force V.sub.amp1 to be very close to V.sub.gate1. ΔV.sub.amp1 will then be very small (and relatively constant under all input conditions). This is done by using a strobe signal that turns this T-gate on during the early part of the comparison operation and turns it off during the later part. The diodes are always on. While this T-gate is on, a second T-gate 1305 (connected to V.sub.amp2) is turned off, and the T-gate in the output driver (not shown) is turned on, which keeps the previous output valid and avoids Vout glitching. The T-gates are turned off (or on, respectively) during the later portion of the comparison operation. The simulations in
(57)
(58) The use of the 1/High Z differential current architecture (DZ), together with the comparator design described above, enables an important architectural option, which we call a tile. One of the major difficulties in designing a general purpose neural processor is that the desired array sizes span a wide range. To take a few exemplary applications in the realm of neuromorphic computing (described, e.g., in D. J. Mountain, “A general purpose neural processor,” dissertation, University of Maryland, Baltimore County, 2017): the MNIST application maps well to 256×64 arrays, the CSlite decoder stage naturally fits into an 8×256 array, the CSlite detector stage has one layer in the network that requires 512×32 arrays (fewer than 512 inputs could not be mapped), and the AES-256 State Machine would prefer a 16×16 array mapping. Finding a single array size that can efficiently map all of these is a daunting task. The availability of tiles makes it more practical. The DZ architecture and our comparator allow for the use of control FETs to divert the differential current to specific diodes at the input stage of the comparator. This means that we can add one set of two additional control FETs per comparator 1901 (per neuron) that enable the current to be passed to a comparator in a different array. Because the current being passed represents the weighted sum of the inputs, the function of the neural net is maintained: the second array evaluates its own inputs plus the inputs from the first array, and the two arrays are thereby combined into a single neuron (or set of neurons). This concept is illustrated in
(59) The design optimization is now to find the optimum tile size, not the optimum array size. This new architectural option also greatly expands the set of possible solutions. Without this, no array smaller than 512 inputs could have been used for the general purpose neural processor design for the applications discussed above. Instead, tiles that are very small (8×2 or 16×1, for example) are possible solutions.
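The reason small tiles can stand in for a large array is that bit-line currents add. The Python sketch below is an idealized model under stated assumptions: weights are integers in units of one unit conductance, inputs are 1 or high impedance (modeled as 0), the bypass path passes currents between tiles without loss, and array sizes are read as inputs×neurons. The function names are hypothetical.

```python
from math import ceil


def bitline_currents(inputs, weights):
    """Return (i_pos, i_neg) for one tile, in units of one unit current."""
    i_pos = sum(w for x, w in zip(inputs, weights) if x and w > 0)
    i_neg = sum(-w for x, w in zip(inputs, weights) if x and w < 0)
    return i_pos, i_neg


def tiled_neuron(inputs, weights, tile_inputs):
    """Evaluate one neuron split across tiles of tile_inputs rows each.

    Each tile adds its bit-line currents to those passed in from the
    previous tile; only the last tile's comparator makes the decision.
    """
    i_pos = i_neg = 0
    for start in range(0, len(inputs), tile_inputs):
        stop = start + tile_inputs
        p, n = bitline_currents(inputs[start:stop], weights[start:stop])
        i_pos += p
        i_neg += n
    return int(i_pos > i_neg)


def tiles_needed(array_inputs, array_neurons, tile_inputs, tile_neurons):
    """Number of tiles required to cover an array of the given size."""
    return ceil(array_inputs / tile_inputs) * ceil(array_neurons / tile_neurons)


# A deterministic 512-input neuron evaluated as one array and as 16-input tiles.
inputs = [1 if i % 3 else 0 for i in range(512)]
weights = [1 if i % 2 else -1 for i in range(512)]
whole = tiled_neuron(inputs, weights, tile_inputs=512)
split = tiled_neuron(inputs, weights, tile_inputs=16)
```

In this model the tiled and untiled evaluations always agree, because the split only regroups the terms of the same sum. For example, `tiles_needed(512, 32, 16, 1)` gives the number of 16×1 tiles that would cover a 512×32 array.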
(60) The tile concept is further enhanced by the ability to control the current (and therefore the power) in the unused portions of the tile (unit cells, comparators). Simulations of a 256×32 array show that the active power in these unused portions can be completely eliminated. The leakage power is an extremely small fraction (much less than 1%) of the total. The input circuits may need to send the input value across a tile to a neighboring tile, through another driver/latch circuit. This adds a small delay (≈30 ps per tile). As long as the number of horizontal tiles connected is reasonable (10 or fewer), the effect on performance is small.
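The size of this delay can be put in perspective with a short calculation. The Python sketch below treats the ≈30 ps figure as a per-tile cost that accumulates linearly, which is an upper-bound assumption, and compares the total against a 5 ns clock cycle. The 5 ns value is the clock cycle mentioned in the calibration procedure; using it as the reference for normal operation is an assumption made here for illustration.

```python
PER_TILE_DELAY_PS = 30    # approximate pass-through delay per tile
CLOCK_PERIOD_PS = 5000    # assumed reference: 5 ns clock cycle


def pass_through_delay_ps(n_tiles, per_tile_ps=PER_TILE_DELAY_PS):
    """Accumulated input pass-through delay across n_tiles horizontal tiles."""
    if n_tiles < 0:
        raise ValueError("number of tiles must be non-negative")
    return n_tiles * per_tile_ps


total_ps = pass_through_delay_ps(10)           # ten horizontally connected tiles
fraction_of_cycle = total_ps / CLOCK_PERIOD_PS
```

Under these assumptions, ten tiles add a few hundred picoseconds, a small fraction of the assumed cycle time.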
(61) The control PFETs added to the comparator design need to pass a large amount of current and are therefore large (W/L=1200/45 nm). This sizing keeps ΔV.sub.ds below 3 mV and adds about 1.9 μm.sup.2 in area to the comparator. A more important issue is that all the tiles need to have comparators, which are relatively large, because a tile needs to be an array itself, not just part of a larger array. The compact, low-power comparator design disclosed herein makes this practical.
(62) While the above description has shown, described, and pointed out novel features as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the devices or algorithms illustrated can be made without departing from the spirit of the disclosure. As will be recognized, certain embodiments described herein can be embodied within a form that may not provide all of the features and benefits set forth herein, as some features can be used or practiced separately from others. The scope of the invention is indicated by the appended claims rather than the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.