Patent classifications
G06F7/5318
DIGITAL NEURAL NETWORK
Various embodiments provide for a digital neural network (DNN) that forms part of a machine learning (ML) processor and performs a compute-intensive function, such as convolutions or matrix multiplies, to facilitate operation of an ML model. According to some embodiments, the DNN comprises a combinatorial tree that uses multiply-accumulate (MAC) units, and a sequencer that reads values from memory devices and feeds the combinatorial tree.
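The abstract above can be sketched behaviorally as a sequencer feeding MAC lanes whose products are reduced by a combinational adder tree. This is an illustrative model only; the function and variable names are our assumptions, not taken from the patent.

```python
def adder_tree(values):
    """Reduce a list of partial products with a combinational binary tree."""
    while len(values) > 1:
        if len(values) % 2:            # pad odd levels with an identity term
            values = values + [0]
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]

def mac_tree_dot(weights, activations):
    """Sequencer step: multiply element-wise, then reduce through the tree."""
    products = [w * a for w, a in zip(weights, activations)]
    return adder_tree(products)
```

For example, `mac_tree_dot([1, 2, 3, 4], [5, 6, 7, 8])` reduces the four products 5, 12, 21, 32 through two tree levels to 70.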
ELECTRO-PHOTONIC NETWORK FOR MACHINE LEARNING
Various embodiments provide for electro-photonic networks, including a plurality of processing elements connected by bidirectional photonic channels, suited for implementing neural-network models. Weights of the model may be preloaded into memory of the processing elements based on assignments of neural nodes to processing elements implementing them, and routers of the processing elements can be configured to stream activations between the processing elements based on a predetermined flow of activations in the model.
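The preload-then-stream flow described above can be modeled in software. The class and method names below are purely illustrative assumptions; the sketch only shows nodes assigned to processing elements, weights preloaded per assignment, and activations delivered to the owning element before compute.

```python
class ProcessingElement:
    def __init__(self):
        self.weights = {}          # node_id -> weight vector (preloaded)
        self.inbox = {}            # node_id -> streamed activation vector

    def preload(self, node_id, weights):
        self.weights[node_id] = weights

    def receive(self, node_id, activations):
        self.inbox[node_id] = activations

    def compute(self, node_id):
        # A node's output: dot product of its preloaded weights with the
        # activations streamed to it over the channel.
        w = self.weights[node_id]
        a = self.inbox[node_id]
        return sum(wi * ai for wi, ai in zip(w, a))

# Predetermined node -> PE assignment drives the weight preload.
pes = {0: ProcessingElement(), 1: ProcessingElement()}
assignment = {"n0": 0, "n1": 1}
pes[assignment["n0"]].preload("n0", [1, 2])
pes[assignment["n1"]].preload("n1", [3, 4])
pes[0].receive("n0", [10, 20])             # activations streamed to PE 0
out = pes[0].compute("n0")                 # 1*10 + 2*20 = 50
```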
Full adder cell with improved power efficiency
An adder circuit that includes a first operand input and a second operand input to an XNOR cell. The XNOR cell is configured to provide the first operand input and the second operand input to both a NAND gate and a first OAI cell. A second OAI cell transforms the output of the XNOR cell into a carry-out signal.
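One plausible gate-level reading of such a cell (the patent's exact OAI wiring may differ) produces the sum as XNOR(XNOR(a, b), cin) and the carry from an OR-AND-Invert (OAI) cell fed by the XNOR output, the inverted carry-in, and NAND(a, b):

```python
def xnor(a, b):
    return 1 - (a ^ b)

def nand(a, b):
    return 1 - (a & b)

def oai21(x, y, z):
    """OR-AND-Invert: NOT((x OR y) AND z)."""
    return 1 - ((x | y) & z)

def full_adder(a, b, cin):
    x = xnor(a, b)
    s = xnor(x, cin)                       # sum = a XOR b XOR cin
    cout = oai21(x, 1 - cin, nand(a, b))   # cout = a&b OR cin&(a^b)
    return s, cout
```

Checking the full truth table confirms `2*cout + s == a + b + cin` for all eight input combinations.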
SYSTEMS AND METHODS FOR DATA PLACEMENT FOR IN-MEMORY-COMPUTE
According to one embodiment, a memory module includes: a memory die including dynamic random access memory (DRAM) banks, each including: an array of DRAM cells arranged in pages; a row buffer to store values of one of the pages; an input/output (IO) module; and an in-memory compute (IMC) module including: an arithmetic logic unit (ALU) to receive operands from the row buffer or the IO module and to compute an output based on the operands and one of a plurality of ALU operations; and a result register to store the output of the ALU; and a controller to: receive, from a host processor, operands and an instruction; determine, based on the instruction, a data layout; supply the operands to the DRAM banks in accordance with the data layout; and control an IMC module to perform one of the ALU operations on the operands in accordance with the instruction.
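A behavioral sketch of this controller-to-bank flow follows. The structure, the interleaved layout policy, and all names are our assumptions, not the patent's RTL; the sketch only shows operands steered into per-bank row buffers and an ALU result latched in each bank's result register.

```python
class Bank:
    def __init__(self):
        self.row_buffer = []
        self.result = None          # IMC result register

    def imc_execute(self, op):
        a, b = self.row_buffer[0], self.row_buffer[1]
        alu = {"add": a + b, "sub": a - b, "mul": a * b}
        self.result = alu[op]       # latch the ALU output

class Controller:
    def __init__(self, n_banks):
        self.banks = [Bank() for _ in range(n_banks)]

    def issue(self, operands, instruction):
        # Layout decision: pair consecutive operands and spread the pairs
        # across banks so each bank's ALU works on a local pair.
        pairs = list(zip(operands[0::2], operands[1::2]))
        for bank, pair in zip(self.banks, pairs):
            bank.row_buffer = list(pair)
            bank.imc_execute(instruction)
        return [bank.result for bank in self.banks]

ctrl = Controller(n_banks=2)
results = ctrl.issue([3, 4, 10, 6], "add")   # bank 0: 3+4, bank 1: 10+6
```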
NEURAL NETWORK DEVICE, NEURAL NETWORK SYSTEM, AND OPERATION METHOD EXECUTED BY NEURAL NETWORK DEVICE
According to an embodiment, a neural network device includes a circuit configured to receive a first bit sequence representing a first value and output a second bit sequence representing a threefold value of the first value. The device includes a circuit configured to generate a fourth bit sequence based on the first and second bit sequences and two adjacent bits of a third bit sequence representing a second value, and output a fifth bit sequence representing a product of the first and second values based on the fourth bit sequence, and to generate a seventh bit sequence based on the first and second bit sequences and two adjacent bits of a sixth bit sequence representing a third value, and output an eighth bit sequence representing a product of the first and third values based on the seventh bit sequence.
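The scheme above reads as radix-4 multiplication with a precomputed threefold multiple (our mapping of the terminology, not quoted from the patent): the 3x value is generated once and then reused, while the other operand is scanned two adjacent bits at a time to select a partial product from {0, x, 2x, 3x}. The same precomputed 3x serves the products with both the second and the third value.

```python
def radix4_multiply(x, y, bits=8):
    """Multiply x by an unsigned y of up to `bits` bits, two bits at a time."""
    triple = 3 * x                     # precomputed once, reused per digit
    multiples = [0, x, 2 * x, triple]
    product = 0
    for i in range(0, bits, 2):        # two adjacent bits form one digit
        digit = (y >> i) & 0b11
        product += multiples[digit] << i
    return product
```

For example, multiplying 7 by 13 (binary 1101) selects the digit 01 (adding 7) and the digit 11 (adding 3*7 shifted by 2, i.e. 84), giving 91.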
COUNTER-BASED MULTIPLICATION USING PROCESSING IN MEMORY
The present disclosure is directed to systems and methods for a memory device such as, for example, a Processing-In-Memory Device that is configured to perform multiplication operations in memory using a popcount operation. A multiplication operation may include a summation of multipliers being multiplied with corresponding multiplicands. The inputs may be arranged in particular configurations within a memory array. Sense amplifiers may be used to perform the popcount by counting active bits along bit lines. One or more registers may accumulate results for performing the multiplication operations.
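A software model of the popcount idea follows; the exact in-array layout is our assumption. Each bit-plane of the multipliers is ANDed with each bit-plane of the multiplicands, the active bits along the "bit line" are counted (the sense-amplifier popcount), and the counts are accumulated with positional weights to produce the sum of products.

```python
def popcount_dot(multipliers, multiplicands, bits=8):
    """Sum of element-wise products via weighted bit-plane popcounts."""
    total = 0
    for i in range(bits):                  # bit-plane of the multipliers
        for j in range(bits):              # bit-plane of the multiplicands
            # Popcount along a bit line: count elements where both bits are 1.
            count = sum(((a >> i) & 1) & ((b >> j) & 1)
                        for a, b in zip(multipliers, multiplicands))
            total += count << (i + j)      # accumulate with positional weight
    return total
```

For instance, `popcount_dot([3, 4], [5, 6])` returns 3*5 + 4*6 = 39.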
Semiconductor device including an adder
According to the embodiments, a semiconductor device includes: an adder configured to generate positive multiple data of the multiplicand, which is shared by a plurality of multiplier circuits and is not a value of 2^n (n is a positive integer) times the multiplicand; a Wallace tree circuit provided in each of the multiplier circuits and configured to compute a sum of a plurality of partial products by using a plurality of adders; and a selection circuit provided in each of the multiplier circuits and configured to select, according to a plurality of bits selected from the multiplier, among data equal to one times the multiplicand, data equal to 2^n times the multiplicand, and the positive multiple data from the adder, for output as one partial product of the plurality of partial products to the Wallace tree circuit.
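A minimal word-level Wallace-tree sketch follows (the patent's gate-level structure is not reproduced): 3:2 carry-save adders compress the partial-product rows in layers until two rows remain, which one final carry-propagate addition merges.

```python
def csa(a, b, c):
    """3:2 compressor on whole words: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def wallace_sum(rows):
    """Reduce partial-product rows to two with CSAs, then add them."""
    rows = list(rows)
    while len(rows) > 2:
        a, b, c, *rest = rows
        s, carry = csa(a, b, c)
        rows = rest + [s, carry]
    return rows[0] + (rows[1] if len(rows) > 1 else 0)
```

For example, the radix-2 partial products of 13 * 11 are the rows [13, 26, 104] (13 shifted by each set bit of 11), and `wallace_sum([13, 26, 104])` yields 143.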
Apparatus and method of fast floating-point adder tree for neural networks
A computing device to implement a fast floating-point adder tree for neural network applications is disclosed. The fast floating-point adder tree comprises a data preparation module, a fast fixed-point Carry-Save Adder (CSA) tree, and a normalization module. The floating-point input data comprise a sign bit, an exponent part, and a fraction part. The data preparation module aligns the fraction parts of the input data and prepares the data for subsequent processing. The fast adder uses a signed fixed-point CSA tree to quickly add a large number of fixed-point data into two output values and then uses a normal adder to add the two output values into one output value. The fast adder for a large number of operands is built from multiple levels of fast adders for a small number of operands. The output from the signed fixed-point CSA tree is converted to a selected floating-point format.
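The pipeline can be sketched numerically under simplifying assumptions (Python floats as the source format, exact integers for the fixed-point stage): align all fractions to a common scale, compress the signed fixed-point values with a carry-save tree down to two terms, add those with one normal adder, and renormalize.

```python
import math

def csa(a, b, c):
    """3:2 carry-save step on signed integers: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def fp_adder_tree(values):
    # Data preparation: decompose into (mantissa, exponent) and align every
    # fraction to the smallest scale, giving signed fixed-point integers.
    parts = [math.frexp(v) for v in values]          # v == m * 2**e
    fixed = [(int(m * (1 << 53)), e - 53) for m, e in parts]
    scale = min(e for _, e in fixed)
    rows = [mant << (e - scale) for mant, e in fixed]

    # Fixed-point CSA tree: compress to two terms, then one normal add.
    while len(rows) > 2:
        a, b, c, *rest = rows
        s, carry = csa(a, b, c)
        rows = rest + [s, carry]
    total = sum(rows)

    # Normalization back to the floating-point output format.
    return math.ldexp(total, scale)
```

Because the fixed-point stage is exact, sums of exactly representable inputs such as 1.5 + 2.25 - 0.75 + 4.0 come out exact (7.0).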
SUPPORT FOR DIFFERENT MATRIX MULTIPLICATIONS BY SELECTING ADDER TREE INTERMEDIATE RESULTS
A first group of elements is element-wise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit, and either the final result of the hierarchical tree of adders or any of a plurality of its intermediate results is selectively provided for use in determining an output result matrix. A control unit instructs the matrix multiplication hardware unit to perform a plurality of different matrix multiplications in parallel by using a combined matrix that includes elements of a plurality of different operand matrices, and to utilize one or more selected intermediate results of the hierarchical tree of adders in determining the output result matrix, which includes different groups of elements representing the multiplication results corresponding to the different operand matrices.
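The intermediate-result selection can be sketched as follows (the array width and the grouping are our example choices, not the patent's): 8 multipliers feed a 3-level binary adder tree; each level-2 node already holds a 4-element dot product, so a combined operand vector packing two different 4-element rows yields both products in one pass by tapping intermediate results instead of the root.

```python
def multiply_and_tree(a, b):
    """Element-wise multiply, then return every level of the adder tree.
    Assumes the input length is a power of two."""
    level = [x * y for x, y in zip(a, b)]
    levels = [level]
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

# Combined operands: two different 4-element rows packed side by side.
combined_a = [1, 2, 3, 4,  5, 6, 7, 8]
combined_b = [1, 1, 1, 1,  2, 2, 2, 2]
levels = multiply_and_tree(combined_a, combined_b)
dot0, dot1 = levels[2]          # selected intermediate results (level 2)
full = levels[3][0]             # root: the single 8-element result
```

Here `dot0` is the first row's product (10), `dot1` the second's (52), and the root would have merged them into 62, which is why the selection matters.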
Programmable multiply-add array hardware
An integrated circuit including a data architecture with N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders. The selected data flow is one of two options: (1) a first data flow that uses the N multipliers and the N adders to provide a multiply-accumulate mode, and (2) a second data flow that provides a multiply-reduce mode.
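The two selectable data flows can be sketched as follows (our own framing of the modes): in multiply-accumulate mode each lane's adder folds its product into a per-lane accumulator, while in multiply-reduce mode the same multipliers feed adders wired as a reduction that collapses the N products into one value.

```python
def multiply_accumulate(acc, a, b):
    """MAC mode: N independent lanes, each adder updates its accumulator."""
    return [acc_i + x * y for acc_i, x, y in zip(acc, a, b)]

def multiply_reduce(a, b):
    """Reduce mode: the adders collapse all N products into one result."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total
```

For instance, `multiply_accumulate([0, 0], [1, 2], [3, 4])` returns the per-lane accumulators `[3, 8]`, whereas `multiply_reduce([1, 2], [3, 4])` returns the single value 11.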