Patent classifications
G06F7/4988
MULTIPLICATION
A device includes a memory, which, in operation, stores one or more look-up tables, and cryptographic circuitry coupled to the memory. The cryptographic circuitry, in operation, multiplies first data masked with a first mask by second data masked with a second mask, and protects the first data and the second data during the multiplying. The multiplying and protecting includes remasking the first data with a third mask, remasking the second data with a fourth mask, executing one or more compensation operations using one or more of the one or more look-up tables, and generating third data masked with a fifth mask. The fifth mask is independent of the first, second, third, and fourth masks. The third data corresponds to the first data multiplied by the second data.
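The remask, compensate, and re-mask flow described in this abstract can be illustrated with a small sketch. It rests on several assumptions that the abstract does not state: Boolean (XOR) masking, multiplication in GF(2^8) modulo the AES polynomial, and log/antilog look-up tables standing in for the stored tables. The names `gf_mul` and `masked_mul` are hypothetical, and the sketch makes no claim about side-channel resistance or about the patented circuit.

```python
import secrets

def _xtime(x):
    """Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11B (assumed field)."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

# Log/antilog look-up tables for the generator 3 (assumption: the LUTs hold these).
EXP = [0] * 255
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = _v
    LOG[_v] = _i
    _v ^= _xtime(_v)  # multiply by 3: (2 * v) XOR v

def gf_mul(a, b):
    """Table-based GF(2^8) product."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]

def masked_mul(a_masked, m1, b_masked, m2):
    """Return (a*b XOR m5, m5) given a XOR m1 and b XOR m2."""
    m3, m4, m5 = (secrets.randbits(8) for _ in range(3))
    # Remask: apply the new mask before removing the old one,
    # so the unmasked operand is never an intermediate value.
    a_r = (a_masked ^ m3) ^ m1  # a XOR m3
    b_r = (b_masked ^ m4) ^ m2  # b XOR m4
    acc = m5                    # fresh output mask, independent of m1..m4
    acc ^= gf_mul(a_r, b_r)     # ab ^ a*m4 ^ m3*b ^ m3*m4
    # Compensation terms, each computed from masked operands only.
    acc ^= gf_mul(a_r, m4)      # cancels a*m4, leaves ab ^ m3*b
    acc ^= gf_mul(m3, b_r)      # cancels m3*b, leaves ab ^ m3*m4
    acc ^= gf_mul(m3, m4)       # cancels m3*m4, leaves ab
    return acc, m5
```

Unmasking the returned value with `m5` yields the plain product, which is a convenient way to check the compensation terms.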
Configurable Processor with In-Package Look-Up Table
The present invention discloses a configurable processor with an in-package look-up table. The configurable processor comprises a programmable memory die and a logic die located in a same package. The programmable memory die comprises a look-up table circuit (LUT) for storing data related to a desired function. The logic die comprises an arithmetic logic circuit (ALC) for performing arithmetic operations on the data read out from the LUT.
Load exploitation and improved pipelineability of hardware instructions
A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.
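As an illustration of the reduce, restore, and fused-multiply-add pattern in this abstract, the following sketch evaluates a natural logarithm. The choice of function, the 16-entry table granularity, the polynomial, and all names are assumptions made for the example; the abstract does not specify them.

```python
import math

TABLE_BITS = 4  # hypothetical: 16 table entries per binade

def _centre_and_exponent(x):
    """Split x into a table-entry centre c and exponent e with x ~ c * 2**e."""
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= m < 1
    idx = int((m - 0.5) * 2 * (1 << TABLE_BITS))      # 0 .. 15
    centre = 0.5 + (idx + 0.5) / (2 << TABLE_BITS)    # midpoint of the bucket
    return centre, e

def reduction_factor(x):
    """Range reduction factor r ~ 1/x, so that x * r is close to 1."""
    centre, e = _centre_and_exponent(x)
    return math.ldexp(1.0 / centre, -e)

def restoration_factor(x):
    """Range restoration factor -ln(r), added back in the final step."""
    centre, e = _centre_and_exponent(x)
    return math.log(centre) + e * math.log(2.0)

def log_approx(x):
    """ln(x) for positive finite x via reduction, approximation, restoration."""
    t = x * reduction_factor(x) - 1.0          # reduced value, |t| < 1/32
    q = 1.0 - t * (0.5 - t * (1.0 / 3.0 - t * 0.25))  # ln(1+t) ~ t * q
    # Final step has the shape of a fused multiply-add: t * q + restoration.
    return t * q + restoration_factor(x)
```

In hardware the two factor computations would each be a single instruction with no dependency on one another, which is what allows them to be pipelined ahead of the final multiply-add.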
METHOD AND APPARATUS WITH DATA PROCESSING
A processor-implemented data processing method includes: normalizing input data of an activation function comprising a division operation; determining dividend data corresponding to a dividend of the division operation by reading, from a memory, a value of a first lookup table addressed by the normalized input data; determining divisor data corresponding to a divisor of the division operation by accumulating the dividend data; and determining output data of the activation function corresponding to an output of the division operation obtained by reading, from the memory, a value of a second lookup table addressed by the dividend data and the divisor data.
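The two-table flow above can be sketched for a softmax, which is one activation function that contains a division. The table sizes, the quantisation step, the 4-bit dividend resolution, and the name `lut_softmax` are hypothetical choices for illustration, not values from the abstract.

```python
import math

STEP = 0.25    # hypothetical quantisation step of the normalised input
N_IN = 32      # entries in the first table
SCALE = 15     # dividend resolution (fits in 4 bits)
MAX_LEN = 8    # longest input vector the second table supports

# First table: quantised exp(-k * STEP), addressed by the normalised input.
LUT_EXP = [round(SCALE * math.exp(-k * STEP)) for k in range(N_IN)]
# Second table: quotient addressed by (dividend, divisor).
LUT_DIV = [[d / s for s in range(1, SCALE * MAX_LEN + 1)]
           for d in range(SCALE + 1)]

def lut_softmax(x):
    """Softmax of up to MAX_LEN values using table reads only."""
    top = max(x)
    # Normalise so every input maps to a non-negative table index.
    idx = [min(int((top - v) / STEP), N_IN - 1) for v in x]
    dividends = [LUT_EXP[i] for i in idx]      # first table read
    divisor = sum(dividends)                   # accumulate the dividends
    return [LUT_DIV[d][divisor - 1] for d in dividends]  # second table read
```

Because the largest input always maps to index 0, the divisor is at least `SCALE`, so the second table is never addressed with a zero divisor.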
Residual quantization of bit-shift weights in an artificial neural network
A neural network accelerator reads encoded weights from memory. All 1 bits in a weight except the first three are discarded. The first three leading 1 bits in the weight are encoded as three bit-shift values to form the encoded weight. The three bit-shift values are applied to a bit shifter to shift a node input to obtain three shifted inputs that are accumulated to generate the node output. Node complexity is reduced since only 3 shifts are performed rather than up to 15 shifts for a 16-bit weight. The bit shifter and accumulator for a node can be implemented by Look-Up Tables (LUTs) without requiring a Multiply-Accumulate (MAC) cell in a Field-Programmable Gate Array (FPGA). Quantization bias is reduced using a histogram analyzer that determines a weighted average for each interval between quantized weights. The third bit-shift value is incremented for weights in the interval above the weighted average.
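A minimal sketch of the three-shift encoding follows, assuming non-negative integer weights and inputs. The function names are hypothetical, and the histogram-based bias correction described in the abstract is deliberately left out.

```python
def encode_weight(w, n_shifts=3):
    """Return the bit positions of the first n_shifts leading 1 bits of w."""
    shifts = []
    for pos in range(w.bit_length() - 1, -1, -1):
        if (w >> pos) & 1:
            shifts.append(pos)
            if len(shifts) == n_shifts:
                break  # all lower 1 bits are discarded
    return shifts

def shift_accumulate(x, shifts):
    """Approximate x * w by shifting x once per encoded position and summing."""
    return sum(x << s for s in shifts)
```

For the weight `0b10110110` (182), the encoding keeps positions 7, 5, and 4, so the node multiplies by 176 instead of 182 and the two lowest 1 bits are the quantisation error.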
LOAD EXPLOITATION AND IMPROVED PIPELINEABILITY OF HARDWARE INSTRUCTIONS
A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.
MECHANISM TO PERFORM SINGLE PRECISION FLOATING POINT EXTENDED MATH OPERATIONS
A processor to facilitate execution of a single-precision floating point operation on an operand is disclosed. The processor includes one or more execution units, each having a plurality of floating point units to execute one or more instructions to perform the single-precision floating point operation on the operand, including performing a floating point operation on an exponent component of the operand; and performing a floating point operation on a mantissa component of the operand, comprising dividing the mantissa component into a first sub-component and a second sub-component, determining a result of the floating point operation for the first sub-component and determining a result of the floating point operation for the second sub-component, and returning a result of the floating point operation.
DEEP NEURAL NETWORK ACCELERATOR INCLUDING LOOKUP TABLE BASED BIT-SERIAL PROCESSING ELEMENTS
A deep neural network accelerator includes a feature loader that stores input features, a weight memory that stores a weight, and a processing element. The processing element applies 1-bit weight values to the input features to generate results according to the 1-bit weight values, receives a target weight corresponding to the input features from the weight memory, and selects a target result corresponding to the received target weight from among the results to generate output features.
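One way to picture the processing element is as a table of partial sums, one per combination of 1-bit weights, from which the entry matching the actual weight bits is selected. The sketch below assumes 1-bit weights take the values 0 or 1 and, following the "bit-serial" wording of the title, handles multi-bit unsigned weights one bit plane at a time; both points and all names are assumptions rather than details from the abstract.

```python
def build_table(features):
    """Partial sums of the features for every pattern of 1-bit weights."""
    n = len(features)
    return [
        sum(f for i, f in enumerate(features) if (pattern >> i) & 1)
        for pattern in range(1 << n)
    ]

def lut_dot(features, weights, bits=4):
    """Dot product with unsigned `bits`-bit weights using table selection."""
    table = build_table(features)
    acc = 0
    for b in range(bits):
        # Gather bit b of every weight into a table address.
        pattern = 0
        for i, w in enumerate(weights):
            pattern |= ((w >> b) & 1) << i
        acc += table[pattern] << b  # select the result, weight it by 2**b
    return acc
```

The table is built once per group of input features and then reused for every weight vector applied to that group, which is where the saving over repeated multiplication would come from.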
Mechanism to perform single precision floating point extended math operations
A processor to facilitate execution of a single-precision floating point operation on an operand is disclosed. The processor includes one or more execution units, each having a plurality of floating point units to execute one or more instructions to perform the single-precision floating point operation on the operand, including performing a floating point operation on an exponent component of the operand; and performing a floating point operation on a mantissa component of the operand, comprising dividing the mantissa component into a first sub-component and a second sub-component, determining a result of the floating point operation for the first sub-component and determining a result of the floating point operation for the second sub-component, and returning a result of the floating point operation.