G06F7/5443

METHOD AND DEVICE FOR BINARY CODING OF SIGNALS IN ORDER TO IMPLEMENT DIGITAL MAC OPERATIONS WITH DYNAMIC PRECISION

A computer-implemented method for coding a digital signal intended to be processed by a digital computing system includes the steps of: receiving a sample of the digital signal quantized on a number N.sub.d of bits; decomposing the sample into a plurality of binary words of parameterizable bit size N.sub.p; coding the sample through a plurality of pairs of values, each pair comprising one of the binary words and an address corresponding to the position of the binary word in the sample; and transmitting the pairs of values to an integration unit in order to carry out a MAC operation between the sample and a weighting coefficient.
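The decomposition described in this abstract can be illustrated with a minimal sketch. It assumes unsigned samples, an address that counts words from the least significant end, and an integration step that shifts each partial product by `address * n_p` bits; the function names `decompose` and `mac` are hypothetical and do not come from the source.

```python
def decompose(sample: int, n_d: int, n_p: int) -> list[tuple[int, int]]:
    """Split an unsigned n_d-bit sample into (word, address) pairs.

    Assumption: address 0 is the least significant n_p-bit word.
    """
    if not 0 <= sample < (1 << n_d):
        raise ValueError("sample does not fit in n_d bits")
    n_words = -(-n_d // n_p)          # ceiling division
    mask = (1 << n_p) - 1
    return [((sample >> (a * n_p)) & mask, a) for a in range(n_words)]


def mac(acc: int, pairs: list[tuple[int, int]], weight: int, n_p: int) -> int:
    """Accumulate weight * sample from its (word, address) pairs.

    Assumption: the integration unit re-aligns each partial product
    by shifting it left by address * n_p bits.
    """
    for word, address in pairs:
        acc += (word * weight) << (address * n_p)
    return acc


pairs = decompose(0b10110110, n_d=8, n_p=2)
result = mac(0, pairs, weight=5, n_p=2)
```

Because the word size `n_p` is a parameter, the same sample can be processed as many narrow words or a few wide ones, which is one way to read the "dynamic precision" in the title.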

High-precision anchored-implicit processing

An apparatus includes a processing circuit and a storage device. The processing circuit is configured to perform one or more processing operations in response to one or more instructions to generate an anchored-data element. The storage device is configured to store the anchored-data element. A format of the anchored-data element includes an identification item, an overlap item, and a data item. The data item is configured to hold a data value of the anchored-data element. The identification item indicates an anchor value for the data value or one or more special values.

Integrated circuit chip apparatus

Provided are an integrated circuit chip apparatus and a related product, the integrated circuit chip apparatus being used for executing a multiplication operation, a convolution operation or a training operation of a neural network. The present technical solution has the advantages of a small amount of calculation and low power consumption.

Multiplier and Adder in Systolic Array
20230015148 · 2023-01-19

The subject matter described herein provides systems and techniques for the design and use of multiply-and-accumulate (MAC) units to perform matrix multiplication by systolic arrays, such as those used in accelerators for deep neural networks (DNNs). These MAC units may take advantage of the particular way in which matrix multiplication is performed within a systolic array. For example, when a matrix A is multiplied with a matrix B, the scalar value, a, of the matrix A is reused many times; the scalar value, b, of the matrix B may be streamed into the systolic array and forwarded to a series of MAC units in the systolic array; and only the final values of the dot products computed for the matrix multiplication, and not the intermediate values, may need to be correct. MAC unit hardware that is particularized to take advantage of these observations is described herein.

ADAPTIVE MAC ARRAY SCHEDULING IN A CONVOLUTIONAL NEURAL NETWORK

The present invention relates to convolutional neural networks (CNN) and methods for improving the computational efficiency of a multiply-accumulate (MAC) array structure. Specifically, the invention relates to cutting activation data into a number of tiles to increase overall computation efficiency. The invention discloses techniques to cut activation data into a plurality of tiles by using a 3-D convolution computation core and to support bigger tensor sizes. Lastly, the invention provides adaptive scheduling of the MAC array to achieve high utilization in multi-precision neural network acceleration.

CALCULATING DEVICE
20230221962 · 2023-07-13

According to one embodiment, a calculating device includes a first memory, a second memory, a third memory, a first arithmetic module, a second arithmetic module, a first conductive line electrically connecting a first output terminal of the first memory and a first input terminal of the first arithmetic module, a second conductive line electrically connecting a second output terminal of the first memory and a first input terminal of the second arithmetic module, a third conductive line electrically connecting a first output terminal of the second memory and a second input terminal of the second arithmetic module, a fourth conductive line electrically connecting a first output terminal of the third memory and a third input terminal of the second arithmetic module, and a fifth conductive line electrically connecting a first output terminal of the second arithmetic module and a second input terminal of the first arithmetic module.

Simulation of quantum circuits
11556686 · 2023-01-17

Methods, systems and apparatus for simulating quantum circuits including multiple quantum logic gates. In one aspect, a method includes the actions of representing the multiple quantum logic gates as functions of one or more classical Boolean variables that define an undirected graphical model, with each classical Boolean variable representing a vertex in the model and each function of respective classical Boolean variables representing a clique between vertices corresponding to the respective classical Boolean variables; representing the probability of obtaining a particular output bit string from the quantum circuit as a first sum of products of the functions; and calculating the probability of obtaining the particular output bit string from the quantum circuit by directly evaluating the sum of products of the functions. The calculated partition function is used to (i) calibrate, (ii) validate, or (iii) benchmark quantum computing hardware implementing a quantum circuit.

Auto weight scaling for RPUs

Techniques for auto weight scaling a bounded weight range of RPU devices with the size of the array during ANN training are provided. In one aspect, a method of ANN training includes: initializing weight values w.sub.init in the array to a random value, wherein the array represents a weight matrix W with m rows and n columns; calculating a scaling factor β based on a size of the weight matrix W; providing digital inputs x to the array; dividing the digital inputs x by a noise and bound management factor α to obtain adjusted digital inputs x′; performing a matrix-vector multiplication of the adjusted digital inputs x′ with the array to obtain digital outputs y′; multiplying the digital outputs y′ by the noise and bound management factor α; and multiplying the digital outputs y′ by the scaling factor β to provide digital outputs y of the array.
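The sequence of steps in this abstract can be sketched in plain Python. The abstract does not give formulas for α or β, so both are assumptions here: α is taken as the largest absolute input value, and β as 1/sqrt(n) for n columns. The function names are hypothetical, and an ordinary floating-point matrix-vector product stands in for the analog RPU array.

```python
import math
import random


def init_weights(m: int, n: int, bound: float = 1.0, seed: int = 0) -> list[list[float]]:
    """Initialize an m x n weight matrix W with random values in [-bound, bound]."""
    rng = random.Random(seed)
    return [[rng.uniform(-bound, bound) for _ in range(n)] for _ in range(m)]


def scaling_factor(m: int, n: int) -> float:
    """ASSUMPTION: beta depends only on the number of columns n."""
    return 1.0 / math.sqrt(n)


def forward(W: list[list[float]], x: list[float]) -> list[float]:
    """Apply the scaling steps around a matrix-vector product."""
    beta = scaling_factor(len(W), len(W[0]))
    # ASSUMPTION: alpha is the largest absolute input (1.0 if all inputs are zero).
    alpha = max(abs(v) for v in x) or 1.0
    x_adj = [v / alpha for v in x]                       # x' = x / alpha
    y_adj = [sum(w * v for w, v in zip(row, x_adj)) for row in W]  # y' = W x'
    y_adj = [v * alpha for v in y_adj]                   # undo the input scaling
    return [v * beta for v in y_adj]                     # y = beta * y'


W = [[1.0, 2.0], [3.0, 4.0]]
y = forward(W, [2.0, 4.0])
```

Dividing by α keeps the inputs within a bounded range before they reach the array, and the multiplication afterwards restores the original scale; β then rescales the result according to the array size.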

SYSTEM AND METHOD APPLIED WITH COMPUTING-IN-MEMORY

A system is provided. The system includes a multiply-and-accumulate circuit and a local generator. The multiply-and-accumulate circuit is coupled to a memory array and generates a multiply-and-accumulate signal indicating a computational output of the memory array. The local generator is coupled to the memory array and generates at least one reference signal at a node in response to one of a plurality of global signals that are generated according to a number of the computational output. The local generator is further configured to generate an output signal according to the signal and a summation of the at least one reference signal at the node.

MEMORY DEVICE AND OPERATING METHOD THEREOF

A memory device includes a memory array for storing a plurality of vector data, each of which has an MSB vector and an LSB vector. The memory array includes a plurality of memory units, each of which has a first bit and a second bit. The first bit is used to store the MSB vector of each vector data, and the second bit is used to store the LSB vector of each vector data. Each vector data is executed with a multiplying operation, and the MSB vector and the LSB vector of each vector data are executed with a first group-counting operation and a second group-counting operation, respectively. The threshold voltage distribution of each memory unit is divided into N states, where N is a positive integer less than 2 to the power of 2, and the effective bit number stored by each memory unit is less than 2.
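The abstract does not specify bit widths or how the group-counting operations work, so the following sketch illustrates only the underlying arithmetic: a dot product against stored values can be recovered from separate partial results over their MSB and LSB parts. The 4-bit split, the non-negative integer values, and all function names are assumptions made for illustration.

```python
def split_msb_lsb(values: list[int], low_bits: int = 4) -> tuple[list[int], list[int]]:
    """Split each non-negative integer into its high part and its low `low_bits` bits."""
    mask = (1 << low_bits) - 1
    msb = [v >> low_bits for v in values]
    lsb = [v & mask for v in values]
    return msb, lsb


def dot(a: list[int], b: list[int]) -> int:
    """Plain integer dot product."""
    return sum(x * y for x, y in zip(a, b))


def dot_from_parts(inputs: list[int], msb: list[int], lsb: list[int], low_bits: int = 4) -> int:
    """Recombine two partial dot products, weighting the MSB result by 2**low_bits."""
    return (dot(inputs, msb) << low_bits) + dot(inputs, lsb)


stored = [0x3A, 0x7F, 0x05]
inputs = [1, 2, 3]
msb, lsb = split_msb_lsb(stored)
total = dot_from_parts(inputs, msb, lsb)
```

In this reading, the two partial results correspond loosely to the first and second group-counting operations, each working on one half of the stored data.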