Patent classifications
G06F7/49936
Fused Multiply-Add operator for mixed precision floating-point numbers with correct rounding
A fused multiply-add hardware operator comprising a multiplier receiving two multiplicands as floating-point numbers encoded in a first precision format; an alignment circuit associated with the multiplier configured to convert the result of the multiplication into a first fixed-point number; and an adder configured to add the first fixed-point number and an addition operand. The addition operand is a floating-point number encoded in a second precision format, and the operator comprises an alignment circuit associated with the addition operand, configured to convert the addition operand into a second fixed-point number of reduced dynamic range relative to the dynamic range of the addition operand, having a number of bits equal to the number of bits of the first fixed-point number, extended on both sides by at least the size of the mantissa of the addition operand; the adder configured to add the first and second fixed-point numbers without loss.
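The idea of adding the product and the addition operand in fixed point without loss, then rounding once, can be illustrated in software. The following is a minimal sketch only, assuming ordinary Python floats (binary64) for all three operands and a single very wide fixed-point format; the function name `fma_exact` and the parameter `frac_bits` are hypothetical, and the sketch does not model the reduced dynamic range or the mixed-precision formats described in the abstract.

```python
from fractions import Fraction

def fma_exact(a: float, b: float, c: float, frac_bits: int = 2200) -> float:
    """Compute a*b + c with a single rounding, using a wide fixed-point sum.

    frac_bits is an assumed width: 2200 fractional bits are enough to hold
    the product of any two finite binary64 values exactly.
    """
    scale = 1 << frac_bits
    # Convert the exact product and the addend to fixed-point integers.
    prod_fx = Fraction(a) * Fraction(b) * scale
    add_fx = Fraction(c) * scale
    assert prod_fx.denominator == 1 and add_fx.denominator == 1
    # Integer addition loses nothing; the final division rounds once.
    total = int(prod_fx) + int(add_fx)
    return total / scale

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
fused = fma_exact(a, b, -1.0)   # keeps the low-order term of the product
unfused = a * b - 1.0           # the product is rounded before the addition
```

In this example the exact product is 1 - 2^-60, which an unfused multiply rounds to 1.0 before the addition, so the two results differ.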
Quantizing machine learning models with balanced resolution via damped encoding
A method for quantizing a machine learning model during an inference phase, including determining a normalization factor using a set of floating-point values and a damped value of a damped value sequence; and assigning a quantized value for each floating-point value of the set of floating-point values based on the damped value sequence and the normalization factor.
Systolic array including fused multiply accumulate with efficient prenormalization and extended dynamic range
Systems and methods are provided to perform multiply-accumulate operations of normalized numbers in a systolic array to enable greater computational density, reduce the size of systolic arrays required to perform multiply-accumulate operations of normalized numbers, and/or enable higher throughput operation. The systolic array can be provided normalized numbers by a column of normalizers and can lack support for denormal numbers. Each normalizer can normalize the inputs to each processing element in the systolic array. The systolic array can include a multiplier and an adder. The multiplier can have multiple data paths that correspond to the data type of the input. The multiplier and adder can employ an expanded exponent range to operate on normalized floating-point numbers and can lack support for denormal numbers.
Look Ahead Normaliser
Apparatus includes hardware logic arranged to normalise an n-bit input number. The hardware logic comprises at least a first hardware logic stage, an intermediate hardware logic stage and a final hardware logic stage. Each stage comprises a left shifting logic element, the first and intermediate stages each also comprise a plurality of OR-reduction logic elements and the intermediate and final stages each also comprise one or more multiplexers. The OR-reduction logic elements operate on different subsets of bits from the number input to the particular stage. In the intermediate and final hardware logic stages, a first of the multiplexers selects an OR-reduction result received from a previous hardware logic stage and the left shifting logic element is arranged to perform left shifting on the updated binary number received from an immediately previous hardware logic stage dependent upon the selected OR-reduction result.
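A staged normaliser of this general kind can be sketched in software as a sequence of conditional left shifts, each gated by an OR-reduction of the current top bits. The sketch below is an illustration under stated assumptions, not the claimed circuit: the function name `normalise`, the power-of-two width, and the halving shift schedule are all assumptions, and the look-ahead selection of precomputed OR-reduction results through multiplexers is not modelled.

```python
def normalise(x: int, n: int = 16) -> tuple[int, int]:
    """Left-shift an n-bit value until its top bit is set.

    Returns (normalised value, total shift). n is assumed to be a power of
    two. An all-zero input has no leading one and comes back as (0, n - 1).
    """
    assert 0 <= x < (1 << n)
    assert n & (n - 1) == 0, "sketch assumes a power-of-two width"
    mask = (1 << n) - 1
    total_shift = 0
    step = n // 2
    while step >= 1:
        # OR-reduction of the top `step` bits: nonzero if any bit is set.
        top_bits = x >> (n - step)
        if top_bits == 0:
            x = (x << step) & mask
            total_shift += step
        step //= 2
    return x, total_shift
```

For example, `normalise(0x00F0)` shifts by 8 in the first stage and by nothing afterwards, giving `(0xF000, 8)`.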
High performance floating-point adder with full in-line denormal/subnormal support
According to one general aspect, an apparatus may include a floating-point addition unit that includes a far path circuit, a close path circuit, and a final result selector circuit. The far path circuit may be configured to compute a far path result based upon either the addition or the subtraction of the two floating point numbers regardless of whether the operands or the result include normal or denormal numbers. The close path circuit may be configured to compute a close path result based upon the subtraction of the two floating point operands regardless of whether the operands or the result include normal or denormal numbers. The final result selector circuit may be configured to select between the far path result and the close path result based, at least in part, upon an amount of difference in the exponent portions of the two floating point operands.
Computing accelerator using a lookup table
A computing accelerator using a lookup table. The accelerator may accelerate floating-point multiplications by retrieving the fraction portion of the product of two floating-point operands from a lookup table, or by retrieving the product of two floating-point operands from a lookup table, or it may retrieve dot products of floating-point vectors from a lookup table. The accelerator may be implemented in a three-dimensional memory assembly. It may use approximation, the symmetry of a multiplication lookup table, and zero-skipping to improve performance.
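Retrieving the fraction part of a product from a table, and exploiting the table's symmetry, might look like the following sketch. It assumes a toy format with a 4-bit fraction and an unbounded exponent; the names `FRAC_BITS`, `PRODUCT_LUT`, and `lut_multiply`, and the `(sign, exponent, frac)` tuple encoding, are hypothetical and not taken from the source.

```python
FRAC_BITS = 4                      # assumed toy fraction width
ONE = 1 << FRAC_BITS               # integer value of the implicit leading 1

# Multiplication is commutative, so only entries with i <= j are stored.
PRODUCT_LUT = {
    (i, j): (ONE + i) * (ONE + j)
    for i in range(ONE)
    for j in range(i, ONE)
}

def lut_multiply(x, y):
    """Multiply two toy floats given as (sign, exponent, frac) tuples.

    The value of (s, e, f) is (-1)**s * (1 + f / 16) * 2**e.
    """
    s1, e1, f1 = x
    s2, e2, f2 = y
    key = (f1, f2) if f1 <= f2 else (f2, f1)   # use the table's symmetry
    significand = PRODUCT_LUT[key] / (ONE * ONE)
    sign = -1.0 if (s1 ^ s2) else 1.0
    return sign * significand * 2.0 ** (e1 + e2)
```

Here `lut_multiply((0, 1, 8), (1, 0, 4))` multiplies 3.0 by -1.25 with one table access and one exponent addition. The table holds 136 entries rather than 256 because of the symmetry.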
Method and system for processing floating point numbers
A method and system for processing a set of ‘k’ floating point numbers to perform addition and/or subtraction is disclosed. Each floating-point number comprises a mantissa (m_i) and an exponent (e_i). The method comprises receiving the set of ‘k’ floating point numbers in a first format, each floating-point number in the first format comprising a mantissa (m_i) with a bit-length of ‘b’ bits. The method further comprises creating a set of ‘k’ numbers (y_i) based on the mantissas of the ‘k’ floating-point numbers, the numbers having a bit-length of ‘n’ bits obtained by adding both extra most-significant bits and extra least-significant bits to the bit length ‘b’ of the mantissa (m_i). The method includes identifying a maximum exponent (e_max) among the exponents e_i, aligning the magnitude bits of the numbers (y_i) based on the maximum exponent (e_max) and processing the set of ‘k’ numbers concurrently.
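The alignment step can be sketched as follows, assuming each input is given as a signed integer mantissa and an exponent with value mantissa * 2**exponent. The function name `aligned_sum` and the parameter `guard_bits` (the extra least-significant bits) are hypothetical; Python integers are unbounded, so the extra most-significant bits that absorb carries are implicit here rather than sized explicitly.

```python
def aligned_sum(terms, guard_bits=8):
    """Sum (mantissa, exponent) pairs after aligning them to the largest exponent.

    Returns (total, exponent) such that the sum is total * 2**exponent.
    Bits shifted out below the guard bits are truncated.
    """
    e_max = max(e for _, e in terms)
    total = 0
    for m, e in terms:
        y = m << guard_bits            # widen with extra low-order bits
        total += y >> (e_max - e)      # align to the maximum exponent
    return total, e_max - guard_bits

# 3 * 2**0 + 1 * 2**2 - 1 * 2**1
total, exponent = aligned_sum([(3, 0), (1, 2), (-1, 1)])
value = total * 2.0 ** exponent
```

Once aligned, the terms are plain integers, so they could be summed in any order or in a single multi-operand adder, which is what makes concurrent processing possible.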
Apparatus and method of processing numeric calculation
A method and apparatus for processing numeric calculation are provided. The method includes determining a shift bit and an index bit that falls within an index range of a lookup table from among bits representing a divisor scaled up by an offset, obtaining a replacement value corresponding to an index value of the determined index bit by using the lookup table, multiplying a dividend scaled up by the offset by the obtained replacement value, and outputting a value corresponding to a division operation by correcting a scale of a result of the multiplication using a right shift operation.
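A division replaced by a table lookup, a multiplication, and a right shift can be sketched as below. This is a loose illustration under stated assumptions, not the claimed method: the names `INDEX_BITS`, `FRAC_BITS`, `RECIP_LUT`, and `approx_div` are hypothetical, the table is assumed to hold scaled reciprocals, and the scaling of the operands by an offset is omitted. Because the low bits of large divisors are dropped to form the index, the result is approximate.

```python
INDEX_BITS = 4     # assumed width of the table index
FRAC_BITS = 16     # assumed fixed-point scale of the table entries

# Entry i holds ceil(2**FRAC_BITS / i); index 0 is unused.
RECIP_LUT = [0] + [-(-(1 << FRAC_BITS) // i) for i in range(1, 1 << INDEX_BITS)]

def approx_div(dividend: int, divisor: int) -> int:
    """Approximate dividend // divisor without a divide instruction."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("sketch handles dividend >= 0 and divisor > 0 only")
    # Drop low bits of the divisor until it fits the table's index range.
    shift = max(0, divisor.bit_length() - INDEX_BITS)
    index = divisor >> shift
    # Multiply by the reciprocal, then undo both scalings with one shift.
    return (dividend * RECIP_LUT[index]) >> (FRAC_BITS + shift)
```

For instance, `approx_div(1000, 37)` indexes the table with 9 (the top four bits of 37) and a shift of 2, so it effectively divides by 36 rather than 37.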
Distributed batch normalization using partial populations
A technique for performing data parallel training of a neural network model is disclosed that incorporates batch normalization techniques using partial populations to generate normalization parameters. The technique involves processing, by each processor of a plurality of processors in parallel, a first portion of a sub-batch of training samples allocated to the processor to generate activations for the first portion of the sub-batch. Each processor analyzes the activations and transmits statistical measures for the first portion to an additional processor that reduces the statistical measures from multiple processors to generate normalization parameters for a partial population of the training samples that includes the first portion from each of the plurality of processors. The normalization parameters are then transmitted back to each of the processors to normalize the activations for both the first portion and a second portion of the sub-batch of training samples allocated to each processor.
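The flow of statistics can be sketched in a single process, with each inner list standing in for the activations one processor holds. This is a minimal sketch under assumptions: the function name, the `first_fraction` split, the choice of count, sum, and sum of squares as the statistical measures, and the `eps` term are all hypothetical, and no real inter-processor communication is shown.

```python
import math

def partial_population_batchnorm(sub_batches, first_fraction=0.5, eps=1e-5):
    """Normalize every activation using statistics from first portions only."""
    # Each "processor" reports (count, sum, sum of squares) for its first portion.
    stats = []
    for acts in sub_batches:
        k = max(1, int(len(acts) * first_fraction))
        first = acts[:k]
        stats.append((len(first), sum(first), sum(a * a for a in first)))

    # The reducing processor combines the measures into normalization parameters.
    n = sum(s[0] for s in stats)
    mean = sum(s[1] for s in stats) / n
    var = sum(s[2] for s in stats) / n - mean * mean

    # The parameters are applied to both portions of every sub-batch.
    inv_std = 1.0 / math.sqrt(var + eps)
    normalized = [[(a - mean) * inv_std for a in acts] for acts in sub_batches]
    return mean, var, normalized

mean, var, out = partial_population_batchnorm(
    [[1.0, 3.0, 100.0, 100.0], [5.0, 7.0, -50.0, 0.0]]
)
```

Only the values 1, 3, 5, and 7 contribute to the mean and variance here, yet all eight activations are normalized with them.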
Computational units for batch normalization
Herein are disclosed computation units for batch normalization. A computation unit may include a first circuit to traverse a batch of input elements x_i having a first format, to produce a mean μ_1 in the first format and a mean μ_2 in a second format, the second format having more bits than the first format. The computation unit may further include a second circuit operatively coupled to the first circuit to traverse the batch of input elements x_i to produce a standard deviation σ for the batch using the mean μ_1 in the first format. The computation unit may also include a third circuit operatively coupled to the second circuit to traverse the batch of input elements x_i to produce a normalized set of values y_i using the mean μ_2 in the second format and the standard deviation σ.
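The three traversals can be sketched in software as three passes over the batch. The sketch below assumes, purely for illustration, that the first format is IEEE binary16 and the second is Python's native float; the abstract does not name the formats. The helper `to_half`, the function name, and the `eps` parameter are hypothetical.

```python
import math
import struct

def to_half(x: float) -> float:
    """Round a Python float to the nearest binary16 value (assumed first format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def batchnorm_two_means(xs, eps=0.0):
    # Pass 1: the mean in the wider format, and the same mean in the narrow format.
    mean_wide = sum(xs) / len(xs)
    mean_narrow = to_half(mean_wide)
    # Pass 2: the standard deviation, computed with the narrow-format mean.
    variance = sum((x - mean_narrow) ** 2 for x in xs) / len(xs)
    sigma = math.sqrt(variance + eps)
    # Pass 3: the normalized outputs, computed with the wide-format mean.
    return [(x - mean_wide) / sigma for x in xs]

ys = batchnorm_two_means([1.0, 2.0, 3.0, 4.0])
```

In this small example the mean 2.5 is exactly representable in binary16, so both means coincide; with less convenient data the narrow mean would differ slightly from the wide one.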