G06F2207/4828

KERNEL TRANSFORMATION TECHNIQUES TO REDUCE POWER CONSUMPTION OF BINARY INPUT, BINARY WEIGHT IN-MEMORY CONVOLUTIONAL NEURAL NETWORK INFERENCE ENGINE

Techniques are presented for performing in-memory matrix multiplication operations for binary input, binary weight valued convolution neural network (CNN) inferencing. The weights of a filter are stored in pairs of memory cells of a storage class memory device, such as a ReRAM or phase change memory based devices. To reduce current consumption, the binary valued filters are transformed into ternary valued filters by taking sums and differences of binary valued filter pairs. The zero valued weights of the transformed filters are stored as a pair of high resistance state memory cells, reducing current consumption during convolution. The results of the in-memory multiplications are pair-wise combined to compensate for the filter transformations. To compensate for zero valued weights, a zero weight register stores the number of zero weights along each bit line and is used to initialize counter values for accumulating the multiplication operations.

Memory processing unit

An in-memory computing system for computing vector-matrix multiplications includes an array of resistive memory devices arranged in columns and rows, such that resistive memory devices in each row of the array are interconnected by a respective wordline and resistive memory devices in each column of the array are interconnected by a respective bitline. The in-memory computing system also includes an interface circuit electrically coupled to each bitline of the array of resistive memory devices and computes the vector-matrix multiplication between an input vector applied to a given set of wordlines and data values stored in the array. For each bitline, the interface circuit receives an output in response to the input being applied to the given wordline, compares the output to a threshold, and increments a count maintained for each bitline when the output exceeds the threshold. The count for a given bitline represents a dot-product.

TERNARY IN-MEMORY ACCELERATOR

A ternary processing cell used as a memory cell and capable of in-memory arithmetic is disclosed which includes a first memory cell, adapted to hold a first digital value, a second memory cell, adapted to hold a second digital value, wherein a binary combination of the first digital value and the second digital value establishes a first ternary operand, a ternary input establishing a second ternary operand, and a ternary output, wherein the ternary output represents a multiplication of the first ternary operand and the second ternary operand.

Update management for RPU array

A computer-implemented method and computer processing system are provided for update management for a neural network. The method includes performing an isotropic update process on the neural network using a Resistive Processing Unit. The isotropic update process uses a multiplicand and a multiplier from a multiplication operation. The performing step includes scaling the multiplicand and the multiplier to have a same order of magnitude.

MEMORY PROCESSING UNIT
20210210138 · 2021-07-08 ·

An in-memory computing system for computing vector-matrix multiplications includes an array of resistive memory devices arranged in columns and rows, such that resistive memory devices in each row of the array are interconnected by a respective word line and resistive memory devices in each column of the array are interconnected by a respective bitline. The in-memory computing system also includes an interface circuit electrically coupled to each bitline of the array of resistive memory devices and computes the vector-matrix multiplication between an input vector applied to a given set of word lines and data values stored in the array. For each bitline, the interface circuit receives an output in response to the input being applied to the given wordline, compares the output to a threshold, and increments a count maintained for each bitline when the output exceeds the threshold. The count for a given bitline represents a dot-product.

PRODUCT-SUM CALCULATION DEVICE AND PRODUCT-SUM CALCULATION METHOD
20210011687 · 2021-01-14 ·

To provide a product-sum calculation device and a product-sum calculation method capable of more efficient operation. A product-sum calculation device includes: a plurality of synapses including a transistor and having a variable resistance value; a plurality of input lines extending in a first direction and configured to propagate an input signal to each of the plurality of synapses; a plurality of output lines extending in a second direction orthogonal to the first direction, and configured to output a product-sum calculation result of the input signal from each of the plurality of synapses; and a charge and discharge control unit configured to control an output state of the product-sum calculation result by controlling a charge and discharge state of the output line on the basis of a polarity of the transistor.

Auto Weight Scaling for RPUs
20200380349 · 2020-12-03 ·

Techniques for auto weight scaling a bounded weight range of RPU devices with the size of the array during ANN training are provided. In one aspect, a method of ANN training includes: initializing weight values w.sub.init in the array to a random value, wherein the array represents a weight matrix W with m rows and n columns; calculating a scaling factor based on a size of the weight matrix W; providing digital inputs x to the array; dividing the digital inputs x by a noise and bound management factor to obtain adjusted digital inputs x; performing a matrix-vector multiplication of the adjusted digital inputs x with the array to obtain digital outputs y; multiplying the digital outputs y by the noise and bound management factor ; and multiplying the digital outputs y by the scaling factor to provide digital outputs y of the array.

FAULT-TOLERANT ANALOG COMPUTING
20200382135 · 2020-12-03 ·

A fault-tolerant analog computing device includes a crossbar array having a number l rows and a number n columns intersecting the l rows to form ln memory locations. The l rows of the crossbar array receive an input signal as a vector of length l. The n columns output an output signal as a vector of length n that is a dot product of the input signal and the matrix values defined in the ln memory locations. Each memory location is programmed with a matrix value. A first set of k columns of the n columns is programmed with continuous analog target matrix values with which the input signal is to be multiplied, where k<n. A second set of m columns of the n columns is programmed with continuous analog matrix values for detecting an error in the output signal that exceeds a threshold error value, where m<n.

ARITHMETIC APPARATUS

An arithmetic apparatus according to an embodiment outputs a multiplicative value obtained by multiplying a weight value and an input value. The arithmetic apparatus includes a memristor, a logarithmic transform circuit, and a current-voltage converter circuit. The memristor is a device capable of changing voltage-current characteristic, and the memristor is preset to voltage-current characteristic according to the weight value. The logarithmic transform circuit applies an intermediate voltage, to the memristor, that is obtained by logarithmically transforming an input voltage according to the input value in accordance with a logarithmic transform function obtained by multiplying a natural logarithm function by a preset coefficient. The current-voltage converter circuit outputs an output voltage obtained by performing current-voltage conversion of current flowing through the memristor according to a preset linear function, as a multiplicative value.

Bit-Ordered Binary-Weighted Multiplier-Accumulator
20200356620 · 2020-11-12 · ·

Various arrangements for performing vector-matrix multiplication are provided here. Digital input vectors that include binary-encoded values can be converted into a plurality of analog signals using a plurality of one-bit digital to analog converters (DACs). Using an analog vector matrix multiplier, a vector-matrix multiplication operation can be performed using a weighting matrix for each bit-order of the plurality of analog signals. For each performed vector-matrix multiplication operation, a bit-ordered indication of an output of the analog vector matrix multiplier may be stored. A bit-order weighted summation of the sequentially performed vector-matrix multiplication operation may be performed.