G06F7/49936

COMPUTING ACCELERATOR USING A LOOKUP TABLE
20200334012 · 2020-10-22 ·

A computing accelerator using a lookup table. The accelerator may accelerate floating-point multiplications by retrieving the fraction portion of the product of two floating-point operands from a lookup table, or by retrieving the product of two floating-point operands from a lookup table, or it may retrieve dot products of floating-point vectors from a lookup table. The accelerator may be implemented in a three-dimensional memory assembly. It may use approximation, the symmetry of a multiplication lookup table, and zero-skipping to improve performance.
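The fraction lookup, table symmetry, and zero-skipping described above can be sketched in Python. This is a minimal illustration, not the patented design: the 8-bit significand width, the dictionary standing in for the memory, and the names `BITS`, `TABLE`, `split`, and `lut_multiply` are all assumptions made for the example. Inputs are assumed to be finite.

```python
import math

BITS = 8  # assumed significand width; truncating to it is the "approximation"

# Multiplication is commutative, so only pairs with qa <= qb are stored.
# This roughly halves the table (the "symmetry" optimisation).
TABLE = {
    (qa, qb): qa * qb
    for qa in range(1 << (BITS - 1), 1 << BITS)
    for qb in range(qa, 1 << BITS)
}

def split(x):
    """Return (q, e) with abs(x) ~= q * 2**(e - BITS), q truncated to BITS bits."""
    m, e = math.frexp(abs(x))        # abs(x) == m * 2**e with 0.5 <= m < 1
    return int(m * (1 << BITS)), e

def lut_multiply(a, b):
    """Approximate a * b using a table lookup for the significand product."""
    if a == 0.0 or b == 0.0:
        return 0.0                   # zero-skipping: no lookup needed
    qa, ea = split(a)
    qb, eb = split(b)
    key = (qa, qb) if qa <= qb else (qb, qa)   # fold onto the stored half
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    # Exponents are added separately; only the fraction product is looked up.
    return sign * math.ldexp(TABLE[key], ea + eb - 2 * BITS)
```

Operands whose significands fit in 8 bits multiply exactly, for example `lut_multiply(1.5, 2.0)` gives `3.0`. Other operands are truncated first, so the result carries a relative error of up to roughly 1.6%.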

USING FUZZY-JBIT LOCATION OF FLOATING-POINT MULTIPLY-ACCUMULATE RESULTS
20200310757 · 2020-10-01 ·

Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a Jbit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
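The two possible Jbit positions follow from ordinary significand arithmetic. Two normalized significands each lie in [1, 2), so their product lies in [1, 4) and its leading one can sit in either of two adjacent bit positions. The Python sketch below only demonstrates that fact. The 24-bit width and the name `product_jbit_position` are assumptions for illustration and are not taken from the abstract.

```python
def product_jbit_position(sig_a, sig_b, bits=24):
    """Return the bit index of the leading one in sig_a * sig_b.

    Both inputs are assumed to be normalized `bits`-wide significands,
    i.e. integers with their most significant bit set.
    """
    low, high = 1 << (bits - 1), 1 << bits
    assert low <= sig_a < high and low <= sig_b < high
    product = sig_a * sig_b          # lies in [2**(2*bits-2), 2**(2*bits))
    return product.bit_length() - 1  # either 2*bits - 2 or 2*bits - 1
```

With 24-bit inputs the result is always 46 or 47. A format that tolerates both positions can therefore skip the normalizing shift after the multiply.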

Look Ahead Normaliser
20240012613 · 2024-01-11 ·

Apparatus includes hardware logic arranged to normalise an n-bit input number. The hardware logic comprises at least a first hardware logic stage, an intermediate hardware logic stage and a final hardware logic stage. Each stage comprises a left shifting logic element, the first and intermediate stages each also comprise a plurality of OR-reduction logic elements and the intermediate and final stages each also comprise one or more multiplexers. The OR-reduction logic elements operate on different subsets of bits from the number input to the particular stage. In the intermediate and final hardware logic stages, a first of the multiplexers selects an OR-reduction result received from a previous hardware logic stage and the left shifting logic element is arranged to perform left shifting on the updated binary number received from an immediately previous hardware logic stage dependent upon the selected OR-reduction result.
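As a point of reference, a plain staged normaliser can be sketched in Python. Each stage tests the OR-reduction of the top bits and left-shifts when they are all zero. This sketch does not model the look-ahead feature itself, in which OR-reduction results are computed one stage early and selected by multiplexers. The function name, the power-of-two width, and the halving shift amounts are assumptions for illustration.

```python
def normalise(value, n=16):
    """Left-shift an n-bit value until its MSB is set.

    Returns (shifted_value, shift_amount). A zero input stays zero and
    reports the maximum shift of n - 1.
    """
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    mask = (1 << n) - 1
    value &= mask
    total = 0
    step = n // 2
    while step:                          # one loop iteration per stage
        top = value >> (n - step)        # the top `step` bits
        if top == 0:                     # OR-reduction of those bits is 0
            value = (value << step) & mask
            total += step
        step //= 2
    return value, total
```

For example, `normalise(0x00F0)` returns `(0xF000, 8)`. In hardware, each `top == 0` test corresponds to an OR-reduction whose result controls that stage's shifter.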

Transcendental calculation unit apparatus and method
10761806 · 2020-09-01 ·

A Transcendental Calculation Unit includes a Configuration Table that stores a set of constants and provides a selected one of the constants, a Power Series Multiplier that iteratively develops a power series, a Coefficient Series Multiplier and Accumulator that develops an accumulated product of the power series and the constant, and a Round and Normalize Stage that rounds the accumulated product and normalizes the rounded product.
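The data flow described above (a table of constants, an iteratively developed power series, and a multiply-accumulate of the two) can be sketched in Python. The Taylor coefficients for exp, the 16-term length, and the names `CONFIG_TABLE` and `evaluate` are assumptions chosen for illustration. The abstract does not say which constants the table holds. The final round-and-normalize stage is left to Python's own floating-point arithmetic.

```python
import math

# Hypothetical configuration table: one coefficient list per function.
CONFIG_TABLE = {
    "exp": [1.0 / math.factorial(k) for k in range(16)],
}

def evaluate(func, x):
    """Accumulate sum(c_k * x**k) using the coefficients selected by func."""
    power = 1.0                      # power series term, starts at x**0
    acc = 0.0                        # accumulated product
    for coeff in CONFIG_TABLE[func]:
        acc += coeff * power         # coefficient multiply and accumulate
        power *= x                   # develop the next power iteratively
    return acc
```

With these 16 terms, `evaluate("exp", 1.0)` agrees with `math.e` to within about 1e-13.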

Providing efficient floating-point operations using matrix processors in processor-based systems

Providing efficient floating-point operations using matrix processors in processor-based systems is disclosed. In this regard, a matrix-processor-based device provides a matrix processor comprising a positive partial sum accumulator and a negative partial sum accumulator. As the matrix processor processes pairs of floating-point operands, the matrix processor calculates an intermediate product based on a first floating-point operand and a second floating-point operand and determines a sign of the intermediate product. Based on the sign, the matrix processor normalizes the intermediate product with a partial sum fraction of the positive partial sum accumulator or the negative partial sum accumulator, then adds the intermediate product to the positive partial sum accumulator or the negative partial sum accumulator. After processing all pairs of floating-point operands, the matrix processor subtracts the negative partial sum accumulator from the positive partial sum accumulator to generate a final sum, then renormalizes the final sum a single time.
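The split-accumulator data flow can be sketched in Python. This shows only the routing of products by sign and the single final subtraction. Python floats renormalize on every addition, so the hardware saving from deferring renormalization is not reproduced. The function name is an assumption for illustration.

```python
def dot_split_accumulators(xs, ys):
    """Dot product using separate positive and negative partial sums."""
    pos = 0.0                        # sum of non-negative products
    neg = 0.0                        # sum of magnitudes of negative products
    for x, y in zip(xs, ys):
        product = x * y
        if product >= 0:
            pos += product
        else:
            neg += -product          # store the magnitude only
    return pos - neg                 # one subtraction at the very end
```

For example, `dot_split_accumulators([1.0, 2.0, 3.0], [4.0, -5.0, 6.0])` accumulates 22.0 and 10.0 separately and returns 12.0. Within each accumulator every addition has the same sign, so no cancellation occurs until the final subtraction.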

Computing accelerator using a lookup table

A computing accelerator using a lookup table. The accelerator may accelerate floating-point multiplications by retrieving the fraction portion of the product of two floating-point operands from a lookup table, or by retrieving the product of two floating-point operands from a lookup table, or it may retrieve dot products of floating-point vectors from a lookup table. The accelerator may be implemented in a three-dimensional memory assembly. It may use approximation, the symmetry of a multiplication lookup table, and zero-skipping to improve performance.

Apparatuses for integrating arithmetic with logic operations

An apparatus integrates arithmetic with logic operations. The apparatus includes a calculation device that calculates source data to generate and output first destination data. The apparatus further includes a normalization unit, coupled to the calculation device, that normalizes the first destination data to generate second destination data of a first type when receiving a signal indicating an output of first-type data, and normalizes the first destination data to generate the second destination data of a second type when receiving the signal indicating an output of second-type data.

Denormalization in multi-precision floating-point arithmetic circuitry
10678510 · 2020-06-09 ·

The present embodiments relate to integrated circuits with floating-point arithmetic circuitry that handles normalized and denormalized floating-point numbers. The floating-point arithmetic circuitry may include a normalization circuit and a rounding circuit, and the floating-point arithmetic circuitry may generate a first result in the form of a normalized, unrounded floating-point number and a second result in the form of a normalized, rounded floating-point number. If desired, the floating-point arithmetic circuitry may be implemented in specialized processing blocks.

Methods and apparatus for performing fixed-point normalization using floating-point functional blocks
10671345 · 2020-06-02 ·

An integrated circuit may include normalization circuitry that can be used when converting a fixed-point number to a floating-point number. The normalization circuitry may include at least a floating-point generation circuit that receives the fixed-point number and that creates a corresponding floating-point number. The normalization circuitry may then leverage an embedded digital signal processing (DSP) block on the integrated circuit to perform an arithmetic operation by removing the leading one from the created floating-point number. The resulting number may have a fractional component and an exponent value, which can then be used to derive the final normalized value.
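A well-known software analogue of this idea is sketched below in Python. Whether it matches the patented circuit is an assumption. The integer is placed in the fraction field of a double whose exponent encodes 2**52, and a floating-point subtraction then removes the leading one. The subtraction hardware (here, the host FPU) performs the normalization as a side effect. The function name and the 52-bit limit belong to this sketch only.

```python
import struct

def uint_to_float(n):
    """Convert an unsigned integer below 2**52 to a float via subtraction."""
    assert 0 <= n < (1 << 52)
    # Biased exponent 0x433 encodes 2**52. With n in the fraction field the
    # created double equals 2**52 + n exactly.
    bits = (0x433 << 52) | n
    created = struct.unpack("<d", struct.pack("<Q", bits))[0]
    # Subtracting 2**52 removes the leading one. The result is n, already
    # normalized by the floating-point subtractor.
    return created - 2.0 ** 52
```

For example, `uint_to_float(40)` returns `40.0`, whose normalized form is 1.25 x 2**5. No explicit leading-zero count or shifter appears in the code.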

DISTRIBUTED BATCH NORMALIZATION USING PARTIAL POPULATIONS
20200160112 · 2020-05-21 ·

A technique for performing data parallel training of a neural network model is disclosed that incorporates batch normalization techniques using partial populations to generate normalization parameters. The technique involves processing, by each processor of a plurality of processors in parallel, a first portion of a sub-batch of training samples allocated to the processor to generate activations for the first portion of the sub-batch. Each processor analyzes the activations and transmits statistical measures for the first portion to an additional processor that reduces the statistical measures from multiple processors to generate normalization parameters for a partial population of the training samples that includes the first portion from each of the plurality of processors. The normalization parameters are then transmitted back to each of the processors to normalize the activations for both the first portion and a second portion of the sub-batch of training samples allocated to each processor.
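The flow of statistics can be sketched in Python. Each worker summarizes only the first portion of its sub-batch, a reducer combines the summaries into one mean and variance, and every worker then normalizes its whole sub-batch with those parameters. The sketch runs the workers sequentially rather than in parallel and treats activations as plain lists of floats. The function names, the `(count, sum, sum of squares)` summary, and the `eps` value are assumptions for illustration.

```python
def partial_stats(values):
    """Per-worker summary: count, sum, and sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)

def reduce_stats(stats):
    """Combine worker summaries into the partial population's mean and variance."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    return mean, total_sq / n - mean * mean

def apply_norm(values, mean, var, eps=1e-5):
    """Normalize activations with the shared parameters."""
    scale = (var + eps) ** 0.5
    return [(v - mean) / scale for v in values]

def distributed_batch_norm(sub_batches, first_len):
    """Normalize every sub-batch using statistics from first portions only."""
    stats = [partial_stats(sb[:first_len]) for sb in sub_batches]
    mean, var = reduce_stats(stats)            # done by the extra processor
    return [apply_norm(sb, mean, var) for sb in sub_batches]
```

With sub-batches `[1.0, 3.0, 100.0]` and `[5.0, 7.0, -100.0]` and `first_len=2`, the parameters come from 1, 3, 5, and 7 only (mean 4.0, variance 5.0). The third element of each sub-batch is normalized with those parameters but does not influence them.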