Patent classifications
G06F7/499
PROCESSING-IN-MEMORY DEVICES HAVING MULTIPLE OPERATION CIRCUITS
A processing-in-memory (PIM) device may include a plurality of memory banks configured to provide plural groups of weight data, a global buffer configured to provide plural sets of vector data, and a plurality of multiplication/accumulation (MAC) operators configured to perform MAC operations on the plural groups of weight data and the plural sets of vector data. Each of the plurality of MAC operators includes a plurality of multiple operation circuits. Each of the plurality of multiple operation circuits is configured to perform an arithmetic operation in a first operation mode, a second operation mode, or a third operation mode according to first to third selection signals.
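The MAC operation these operators perform can be sketched in software; the function below is a minimal illustration, where the names, shapes, and the pairing of one weight group per bank with one vector set are assumptions for illustration, not details from the claims:

```python
import numpy as np

def mac_operate(weight_groups, vector_sets):
    """Multiply-accumulate: each weight group is paired with a vector set,
    and the elementwise products are summed into one accumulated result."""
    acc = 0.0
    for w, v in zip(weight_groups, vector_sets):
        acc += np.dot(w, v)  # multiply, then accumulate
    return acc

# One group of weights per bank, one set of vector data from the global buffer
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
vectors = [np.array([0.5, 0.5]), np.array([1.0, 1.0])]
print(mac_operate(weights, vectors))  # 1.5 + 7.0 = 8.5
```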
METHOD AND APPARATUS FOR IMPLIED BIT HANDLING IN FLOATING POINT MULTIPLICATION
A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
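The leading-bit encoding referenced above can be illustrated with IEEE-754 binary32: the implied (hidden) bit is determined solely from the exponent field, which is why it can be resolved in parallel with multiplication of the stored mantissas. A hedged sketch of the decomposition only; the parallel multiplier datapath itself is not modeled, and the function names are illustrative:

```python
import struct

def decompose(f):
    """Split a float32 bit pattern into sign, biased exponent, and stored mantissa."""
    bits = struct.unpack('<I', struct.pack('<f', f))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    return sign, exp, frac

def implied_bit(exp):
    """IEEE-754 leading-bit encoding: the hidden bit is 1 for normal numbers
    (exponent field nonzero) and 0 for subnormals (exponent field zero)."""
    return 0 if exp == 0 else 1

s, e, m = decompose(1.5)
# Full significand = implied bit prepended to the 23 stored fraction bits
significand = (implied_bit(e) << 23) | m
```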
METHODS AND SYSTEMS OF OPERATING A NEURAL CIRCUIT IN A NON-VOLATILE MEMORY BASED NEURAL-ARRAY
In one aspect, a method of a neuron circuit includes the step of providing a plurality of 2^(N-1) single-level-cell (SLC) flash cells for each synapse (Y_i) connected to a bit line forming a neuron. The method includes the step of providing an input vector (X_i) for each synapse Y_i, wherein each input vector is translated into an equivalent electrical signal ES_i (current I_DACi, pulse T_PULSEi, etc.). The method includes the step of providing an input current to each synapse sub-circuit varying from 2^0 * ES_i to 2^(N-1) * ES_i. The method includes the step of providing a set of weight vectors per synapse (Y_i), wherein each weight vector is translated into an equivalent threshold voltage level or resistance level to be stored in one of the many non-volatile memory cells assigned to each synapse (Y_i). The method includes the step of providing for 2^N possible threshold voltage levels or resistance levels in the 2^(N-1) non-volatile memory cells of each synapse, wherein each cell is configured to store one of the two possible threshold voltage levels. The method includes the step of converting the N digital bits of the weight vector of synapse Y_i into an equivalent threshold voltage level and storing that level in the appropriate one of the many SLC cells assigned to the weight vector of synapse (Y_i). The method includes the step of turning off all remaining 2^(N-1) flash cells of the respective synapse (Y_i).
Various other methods are presented of forming neuron circuits by providing a plurality of single-level-cell (SLC) and many-level-cell (MLC) non-volatile memory cells for each synapse (Y_i) electrically connected to form a neuron. The disclosure shows methods of forming neurons in various configurations of non-volatile memory cells (flash, RRAM, etc.) with different storage capabilities per cell, covering both SLC and MLC cells.
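A highly simplified functional model of what such a synapse array computes: each synapse multiplies its stored weight by its input signal ES_i, and the shared bit line sums the contributions. The cell-programming and analog signaling details of the claims are not modeled here, and the function name and values are illustrative assumptions:

```python
def neuron_current(weights, es_inputs):
    """Toy digital model of an analog neuron: each synapse contributes its
    integer weight times its equivalent electrical signal ES_i, and the
    bit line sums all synapse contributions into one neuron output."""
    return sum(w * es for w, es in zip(weights, es_inputs))

# Three synapses with 4-bit weights and per-synapse input signals
print(neuron_current([3, 10, 1], [1.0, 0.5, 2.0]))  # 3 + 5 + 2 = 10.0
```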
MODEL TRAINING METHOD AND APPARATUS FOR FEDERATED LEARNING, DEVICE, AND STORAGE MEDIUM
A model training method and apparatus for federated learning, a device, and a storage medium are provided, which belong to the technical field of machine learning. The method includes: generating an i-th scalar operator based on a (t-1)-th round of training data and a t-th round of training data (201); transmitting an i-th fusion operator to a next node device based on the i-th scalar operator (202); determining an i-th second-order gradient descent direction of an i-th sub-model based on an acquired second-order gradient scalar, an i-th model parameter, and an i-th first-order gradient; and updating the i-th sub-model based on the i-th second-order gradient descent direction to obtain a model parameter of the i-th sub-model during a (t+1)-th round of iterative training.
Increased precision neural processing element
Neural processing elements are configured with a hardware AND gate configured to perform a logical AND operation between a sign extend signal and a most significant bit ("MSB") of an operand. The state of the sign extend signal can be based upon the type of the layer of a deep neural network ("DNN") that generated the operand. If the sign extend signal is logical FALSE, no sign extension is performed. If the sign extend signal is logical TRUE, a concatenator concatenates the output of the hardware AND gate and the operand, thereby extending the operand from an N-bit unsigned binary value to an (N+1)-bit signed binary value. The neural processing element can also include another hardware AND gate and another concatenator for similarly processing another operand. The outputs of the concatenators for both operands are provided to a hardware binary multiplier.
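The extension path described above can be sketched at the bit level; the function name and the N = 8 bit width are illustrative assumptions:

```python
def extend_operand(operand, sign_extend, n=8):
    """Extend an N-bit operand to N+1 bits. The new top bit is the AND of
    the sign extend signal and the operand's MSB (the hardware AND gate);
    the concatenator then prepends that bit to the original operand."""
    msb = (operand >> (n - 1)) & 1
    top = msb & (1 if sign_extend else 0)       # hardware AND gate
    return (top << n) | (operand & ((1 << n) - 1))  # concatenator

# MSB set: with sign extension the top bit is replicated; without it, a 0
# is effectively prepended and the value stays unsigned.
print(bin(extend_operand(0b10000001, sign_extend=True)))   # 0b110000001
print(bin(extend_operand(0b10000001, sign_extend=False)))  # 0b10000001
```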
METHOD AND APPARATUS FOR GENERATING FIXED-POINT QUANTIZED NEURAL NETWORK
A method of generating a fixed-point quantized neural network includes analyzing a statistical distribution for each channel of floating-point parameter values of feature maps and a kernel for each channel from data of a pre-trained floating-point neural network, determining a fixed-point expression of each of the parameters for each channel statistically covering a distribution range of the floating-point parameter values based on the statistical distribution for each channel, determining fractional lengths of a bias and a weight for each channel among the parameters of the fixed-point expression for each channel based on a result of performing a convolution operation, and generating a fixed-point quantized neural network in which the bias and the weight for each channel have the determined fractional lengths.
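The per-channel fractional-length selection can be sketched as follows. The 8-bit width and the simple max-magnitude coverage rule are illustrative assumptions, not the patent's exact statistical coverage criterion:

```python
import math

def fractional_length(values, bit_width=8):
    """Pick a fractional length so the signed fixed-point range covers the
    largest observed magnitude among this channel's parameter values."""
    max_abs = max(abs(v) for v in values)
    int_bits = max(0, math.ceil(math.log2(max_abs)))  # bits for the integer part
    return bit_width - 1 - int_bits                   # minus one sign bit

def quantize(v, frac_len):
    """Round a float to the nearest fixed-point value with frac_len fractional bits."""
    return round(v * (1 << frac_len)) / (1 << frac_len)

channel = [0.8, -1.4, 0.05, 1.9]     # observed floating-point values
fl = fractional_length(channel)       # max |v| = 1.9 needs 1 integer bit
print(fl, quantize(0.05, fl))
```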
COMPUTING DEVICE, RECOGNITION DEVICE, AND CONTROL DEVICE
The present invention aims to reduce power consumption in a recognition device (1000) that includes, in a multi-layer neural network that outputs the type and coordinates of an object based on external environment information: a selector (103) that selects input data for convolution operation units (107-1 to L) from the external environment information; convolution operation units (107-1 to L) configured as a plurality of layers connected in cascade; and a parameter storage unit (109) that stores a weight parameter, a cumulative addition count, and an omitted-bit number for each layer. The recognition device also includes operation stop signal generation units (116-1 to L) that, for each layer, transmit one or more stop signals to the convolution operation units (107-1 to L) to stop some or all of their computing units.
Method and apparatus with bit-serial data processing of a neural network
A processor-implemented data processing method includes encoding a plurality of weights of a filter of a neural network using an inverted two's complement fixed-point format; generating weight data based on values of the encoded weights corresponding to same filter positions of a plurality of filters; and performing an operation on the weight data and input activation data using a bit-serial scheme to control when to perform an activation function with respect to the weight data and input activation data.
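The bit-serial scheme mentioned above can be sketched generically. The patent's inverted two's-complement encoding is not reproduced here; this shows only the plain bit-plane accumulation such schemes build on, with unsigned weights and assumed names:

```python
def bit_serial_dot(weights, activations, bits=8):
    """Bit-serial dot product: process one weight bit plane per cycle,
    gating the activations with each weight bit and accumulating the
    partial sum with a shift for that bit's significance."""
    acc = 0
    for b in range(bits):
        # AND each weight's b-th bit with its activation, then sum the plane
        plane = sum(((w >> b) & 1) * a for w, a in zip(weights, activations))
        acc += plane << b  # weight the bit plane by 2**b
    return acc

print(bit_serial_dot([3, 5], [2, 4]))  # 3*2 + 5*4 = 26
```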