Patent classifications
G06F7/5332
Radix-4 multiplier partial product generation with improved area and power
A system including a series of partial product select encoders and partial product muxes, each of the partial product select encoders receiving a multiplier, receiving a carry input from a multiplier tree, and outputting a select signal to an associated partial product mux based on the multiplier and carry input, and each of the partial product muxes outputting a partial product based on the select signal and a multiplicand received.
Neural processing accelerator
A system for calculating. A scratch memory is connected to a plurality of configurable processing elements by a communication fabric including a plurality of configurable nodes. The scratch memory sends out a plurality of streams of data words. Each data word is either a configuration word used to set the configuration of a node or of a processing element, or a data word carrying an operand or a result of a calculation. Each processing element performs operations according to its current configuration and returns the results to the communication fabric, which conveys them back to the scratch memory.
Multiply-accumulate “0” data gating
In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
MICROPROCESSOR WITH BOOTH MULTIPLICATION
A microprocessor with Booth multiplication, in which several acquisition registers are used. In a first word length, a first acquisition register stores an unsigned ending acquisition of a first multiplier number carried in multiplier number supply data, and a third acquisition register stores a starting acquisition of a second multiplier number carried in the multiplier number supply data. In a second word length that is longer than the first word length, a fourth acquisition register stores a middle acquisition of a third multiplier number carried in the multiplier number supply data. A partial product selection circuit is required for selection of a partial product, to get the partial product from Booth multiplication based on the third acquisition register (corresponding to the first word length) or based on the fourth acquisition register (corresponding to the second word length).
MICROPROCESSOR FOR NEURAL NETWORK COMPUTING AND PROCESSING METHOD OF MACROINSTRUCTION
A microprocessor for neural network computing having a mapping table, a microcode memory, and a microcode decoding finite-state machine (FSM) is disclosed. According to the mapping table, a macroinstruction is mapped to an address on the microcode memory. The microcode decoding FSM decodes contents which are retrieved from the microcode memory according to the address, to get microinstructions involving at least one microinstruction loop that is repeated to operate a datapath to complete the macroinstruction.
MICROPROCESSOR WITH BOOTH MULTIPLICATION
A microprocessor provides at least two storage areas and uses a datapath for Booth multiplication. According to a first and second field of a microinstruction, the datapath gets multiplicand number supply data from the first storage area and multiplier number supply data from the second storage area. The datapath operates according to a word length indicated in a third field of the microinstruction. The datapath gets multi-bit acquisitions for Booth multiplication from the multiplier number supply data. The datapath divides the multiplicand number supply data into multiplicand numbers according to the word length, and performs Booth multiplication on the multiplicand numbers based on the multi-bit acquisitions to get partial products. According to the word length, the datapath selects a part of the partial products to be shifted and added for generation of a plurality of products.
MICROPROCESSOR WITH DYNAMICALLY ADJUSTABLE BIT WIDTH FOR PROCESSING DATA
A microprocessor with dynamically adjustable bit width is provided, which has a bit width register, a datapath, a statistical register, and a bit width adjuster. The bit width register stores at least one bit width. The datapath operates according to the bit width stored in the bit width register to acquire input operands from received data and process input operands. The statistical register collects calculation results of the datapath. The bit width adjuster adjusts the bit width stored in the bit width register based on the calculation results collected in the statistical register.
NEURAL PROCESSING ACCELERATOR
A system for calculating. A scratch memory is connected to a plurality of configurable processing elements by a communication fabric including a plurality of configurable nodes. The scratch memory sends out a plurality of streams of data words. Each data word is either a configuration word used to set the configuration of a node or of a processing element, or a data word carrying an operand or a result of a calculation. Each processing element performs operations according to its current configuration and returns the results to the communication fabric, which conveys them back to the scratch memory.
MULTIPLY-ACCUMULATE "0" DATA GATING
In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
System and method for implementing a multiplication
The present system and method relate to a system for performing a multiplication. The system is arranged for receiving a first data value, and comprises means for calculating at run time a set of instructions for performing a multiplication using the first data value, storage means for storing the set of instructions calculated at run time, multiplication means arranged for receiving a second data value and at least one instruction from the stored set of instructions and arranged for performing multiplication of the first and the second data values using the at least one instruction.