Patent classifications
G06F7/52
Reconfigurable input precision in-memory computing
Technology for reconfigurable input precision in-memory computing is disclosed herein. Reconfigurable input precision allows the bit resolution of input data to be changed to meet the requirements of in-memory computing operations. Voltage sources (that may include DACs) provide voltages that represent input data to memory cell nodes. The resolution of the voltage sources may be reconfigured to change the precision of the input data. In one parallel mode, the number of DACs in a DAC node is used to configure the resolution. In one serial mode, the number of cycles over which a DAC provides voltages is used to configure the resolution. The memory system may include relatively low resolution voltage sources, which avoids the need to have complex high resolution voltage sources (e.g., high resolution DACs). Lower resolution voltage sources can take up less area and/or use less power than higher resolution voltage sources.
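The serial mode described above can be illustrated with a small behavioural sketch. It assumes that each cycle applies one low-resolution slice of every input, and that the per-cycle results are recombined by a shift-and-add step; the abstract does not state how partial results are combined, so that step, the integer weights, and the names `serial_mac`, `total_bits` and `dac_bits` are all hypothetical.

```python
def serial_mac(inputs, weights, total_bits, dac_bits):
    """Multiply-accumulate using a low-resolution DAC over several cycles.

    Assumption: each cycle feeds a dac_bits-wide slice of every input
    (least significant slice first) and partial sums are shift-added.
    """
    if total_bits % dac_bits != 0:
        raise ValueError("total_bits must be a multiple of dac_bits")
    mask = (1 << dac_bits) - 1
    n_cycles = total_bits // dac_bits  # more cycles -> higher input precision
    acc = 0
    for cycle in range(n_cycles):
        shift = cycle * dac_bits
        # One "analog" cycle: every cell multiplies its weight by one input slice.
        partial = sum(((x >> shift) & mask) * w for x, w in zip(inputs, weights))
        acc += partial << shift  # weight the slice by its bit position
    return acc


# 8-bit inputs handled by a 2-bit DAC over four cycles.
print(serial_mac([200, 17, 255], [3, 5, 2], total_bits=8, dac_bits=2))
```

Changing `total_bits` while keeping `dac_bits` fixed reconfigures the input precision by changing only the cycle count, which is the trade-off the serial mode relies on.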
TRUNCATED ARRAY FOR PERFORMING DIVISION
A computer-implemented method for deriving a hardware representation of a fixed logic circuit for performing division of an input x by a divisor selectable from a plurality of divisors, where x is an m-bit integer, includes normalising each of the plurality of divisors to form a plurality of multipliers; forming a summation array arranged to multiply the input x by any one of the plurality of multipliers; truncating the summation array by discarding all columns less significant than the kth column of the summation array below the position of a binary point, where k = [log₂ m]; determining a corrective constant in dependence on the maximum sum of the partial products discarded from the summation array for at least one of the multipliers; and generating a hardware representation of a fixed logic circuit implementing the truncated summation array including the corrective constant.
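The first step of the method, turning each selectable divisor into a multiplier, can be sketched in software. The example below is a simplification: it uses the well-known reciprocal-multiplication technique with a rounded-up multiplier and a full-width product. It does not model the column truncation or the corrective constant of the abstract. The helper names and the choice of shift are assumptions made for illustration.

```python
M_BITS = 8  # width of the input x (hypothetical choice)


def reciprocal_multiplier(d, m_bits):
    """Return (mult, shift) such that (x * mult) >> shift == x // d
    for every 0 <= x < 2**m_bits."""
    l = (d - 1).bit_length()          # ceil(log2(d))
    shift = m_bits + l
    mult = -(-(1 << shift) // d)      # ceil(2**shift / d)
    return mult, shift


# One multiplier per selectable divisor, chosen at run time.
DIVISORS = (3, 5, 7, 10)
TABLE = {d: reciprocal_multiplier(d, M_BITS) for d in DIVISORS}


def divide(x, d):
    mult, shift = TABLE[d]
    return (x * mult) >> shift


print(divide(200, 7))
```

A fixed logic circuit would replace the full product with a summation array of partial products; the method in the abstract then discards the low-order columns and compensates with a constant.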
MULTIPLICATION BY A RATIONAL IN HARDWARE WITH SELECTABLE ROUNDING MODE
A fixed logic circuit for performing multiplication of an input x by a constant rational p/q so as to calculate an output y according to a directed rounding or round-to-nearest rounding mode. Fixed logic hardware is derived comprising an addition array configured to operate on canonical signed digit (CSD) forms of binary values (a CSD array) so as to form an approximation of a multiplication of an input x [m−1:0] by a rational p/q. A truncated summation array of a finite sequence of most significant bits of an infinite CSD expansion of the rational p/q operating on the bits of the input x satisfies
Registers define a plurality of corrective constants for a respective plurality of rounding modes, and selection logic selects the corrective constant that corresponds to the rounding mode in which the truncated summation array is to operate.
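As background for the CSD array mentioned above, the following sketch converts a non-negative integer to canonical signed digit form and multiplies by it using shifts, additions and subtractions. It covers only finite integer constants; the truncated infinite expansion of p/q and the rounding-mode constants of the abstract are not modelled, and the function names are hypothetical.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or 1, and no two adjacent digits are both non-zero.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in digits using shift and add/subtract."""
    acc = 0
    for i, d in enumerate(digits):
        if d == 1:
            acc += x << i
        elif d == -1:
            acc -= x << i
    return acc


print(to_csd(7))                     # 7 = 8 - 1
print(csd_multiply(13, to_csd(23)))
```

CSD form tends to reduce the number of non-zero digits, and hence the number of rows in an addition array, compared with plain binary.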
Execution unit
An execution unit comprising a processing pipeline configured to perform calculations to evaluate a plurality of mathematical functions. The processing pipeline comprises a plurality of stages through which each calculation for evaluating a mathematical function progresses to an end result. Each of a plurality of processing circuits in the pipeline is configured to perform an operation on input values during at least one stage of the plurality of stages. The plurality of processing circuits include multiplier circuits. A first multiplier circuit and a second multiplier circuit are configured to operate in parallel, such that at the same stage in the processing pipeline, the first multiplier circuit and the second multiplier circuit perform their processing. A third multiplier circuit is arranged in series with the first multiplier circuit and the second multiplier circuit and processes outputs from the first multiplier circuit and the second multiplier circuit.
Auto weight scaling for RPUs
Techniques for auto weight scaling a bounded weight range of RPU devices with the size of the array during ANN training are provided. In one aspect, a method of ANN training includes: initializing weight values w_init in the array to a random value, wherein the array represents a weight matrix W with m rows and n columns; calculating a scaling factor β based on a size of the weight matrix W; providing digital inputs x to the array; dividing the digital inputs x by a noise and bound management factor α to obtain adjusted digital inputs x′; performing a matrix-vector multiplication of the adjusted digital inputs x′ with the array to obtain digital outputs y′; multiplying the digital outputs y′ by the noise and bound management factor α; and multiplying the digital outputs y′ by the scaling factor β to provide digital outputs y of the array.
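The sequence of steps in this abstract can be traced with a short numerical sketch. Two quantities are assumptions, because the abstract does not define them: α is taken as the largest input magnitude, and β is given a Xavier-style form that depends only on m and n. The array itself is modelled as an ideal, noise-free matrix-vector product.

```python
import math
import random


def init_weights(m, n, bound=1.0, seed=0):
    """Random initial weights inside a bounded range (m rows, n columns)."""
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(n)] for _ in range(m)]


def scaling_factor(m, n):
    # Assumption: a size-dependent factor; the exact formula is not in the abstract.
    return math.sqrt(6.0 / (m + n))


def scaled_forward(W, x, beta):
    alpha = max(abs(v) for v in x) or 1.0           # assumed noise/bound factor
    x_adj = [v / alpha for v in x]                  # x' = x / alpha
    y_adj = [sum(w * v for w, v in zip(row, x_adj)) for row in W]  # y' = W x'
    y_adj = [v * alpha for v in y_adj]              # undo the input scaling
    return [v * beta for v in y_adj]                # y = beta * y'


W = init_weights(3, 4)
beta = scaling_factor(3, 4)
print(scaled_forward(W, [0.5, -2.0, 1.5, 0.25], beta))
```

In this ideal model the result equals β·W·x; the division by α only matters on real hardware, where it keeps the inputs within the range the array can represent.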
Accelerated mathematical engine
Various embodiments of the disclosure relate to an accelerated mathematical engine. In certain embodiments, the accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register. This architecture supports a clocked, two-dimensional architecture in which image data and weights are multiplied in a synchronized manner to allow a large number of mathematical operations to be performed in parallel.
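A rough software model of such a two-dimensional matrix processor is sketched below. The abstract names the parts of each sub-circuit but not their behaviour, so this model assumes that the ALU performs a multiply-accumulate into the output register on every clock, and that the shadow register captures the finished sum so it can be read while the next accumulation starts. The class and function names are hypothetical.

```python
class Cell:
    """One sub-circuit: ALU, output register and shadow register (assumed roles)."""

    def __init__(self):
        self.output = 0   # running sum produced by the ALU
        self.shadow = 0   # copy of the last completed sum

    def clock(self, data, weight):
        self.output += data * weight   # multiply-accumulate

    def latch(self):
        self.shadow = self.output      # hand the result to the shadow register
        self.output = 0                # free the accumulator for the next pass


def matrix_multiply(data, weights):
    """data is R x K, weights is K x C; returns the R x C product."""
    rows, depth, cols = len(data), len(weights), len(weights[0])
    grid = [[Cell() for _ in range(cols)] for _ in range(rows)]
    for k in range(depth):             # one clock per element of the inner dimension
        for i in range(rows):
            for j in range(cols):      # in hardware all cells update in parallel
                grid[i][j].clock(data[i][k], weights[k][j])
    for row in grid:
        for cell in row:
            cell.latch()
    return [[cell.shadow for cell in row] for row in grid]


print(matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

A convolution can be mapped onto this structure by arranging image patches as rows of `data` and filter coefficients as columns of `weights`.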
Computation in-memory architecture for analog-to-digital conversion
A device includes a comparator to provide an indication of a difference between Vp on a first terminal coupled to a top plate of each of a first group of differential capacitors, and Vn on a second terminal coupled to a top plate of each of a second group of differential capacitors. The device includes a control circuit coupled to the comparator. The control circuit is to receive a first indication of a difference between Vp and Vn; responsive to the first indication, cause a first driver to provide a reference voltage to bottom plates of one of the first and second groups, and cause a second driver to provide a ground voltage to bottom plates of the other of the first and second groups; receive a second indication of a difference between Vp and Vn; and provide a digital value responsive to the first indication and the second indication.
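The two comparisons described above resemble the first two steps of a successive-approximation conversion, which the behavioural sketch below models. It assumes that driving one group's bottom plates to the reference raises that group's top-plate voltage by Vref/2, and that the group on the lower side is the one raised; the abstract specifies neither the capacitor ratios nor which group receives the reference, so both are hypothetical.

```python
def two_step_convert(vp, vn, vref):
    """Return a 2-bit code from two comparisons of Vp and Vn.

    Assumption: after the first comparison, the lower side is raised by vref / 2.
    """
    first = 1 if vp > vn else 0        # first indication from the comparator
    step = vref / 2                    # assumed charge-redistribution shift
    if first:
        vn += step                     # reference applied to the Vn group
    else:
        vp += step                     # reference applied to the Vp group
    second = 1 if vp > vn else 0       # second indication
    return (first << 1) | second       # digital value from both indications


for vp, vn in [(0.8, 0.2), (0.65, 0.35), (0.35, 0.65), (0.15, 0.85)]:
    print(vp, vn, two_step_convert(vp, vn, vref=1.0))
```

Under these assumptions the code divides a differential range of −Vref to +Vref into four equal bins.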