Patent classifications
G06F7/4833
Unified multifunction circuitry
One embodiment provides a unified multifunction circuitry. The unified multifunction circuitry includes a logarithm circuitry and an antilogarithm circuitry. The logarithm circuitry is to determine a log output operand. The log output operand includes a piecewise linear approximation of a base 2 logarithm of a significand of a log input operand. The antilogarithm circuitry is to determine an antilog output operand. The antilog output operand includes a piecewise linear approximation of a base 2 antilogarithm of a fraction of a selected input operand.
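The two approximations named in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: the segment count `SEGMENTS`, the uniform segment spacing, and the helper names are assumptions made for illustration, and the segment endpoints that hardware would hold as stored constants are computed here with `math`.

```python
import math

SEGMENTS = 8  # hypothetical segment count; not taken from the source


def _pwl(f, t, segments=SEGMENTS):
    """Linearly interpolate f on [0, 1) using uniform segments."""
    i = min(int(t * segments), segments - 1)   # index of the segment holding t
    t0, t1 = i / segments, (i + 1) / segments  # segment endpoints
    y0, y1 = f(t0), f(t1)                      # would be table constants in hardware
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def approx_log2_significand(m):
    """Approximate log2(m) for a significand m in [1, 2)."""
    return _pwl(lambda t: math.log2(1.0 + t), m - 1.0)


def approx_antilog2_fraction(f):
    """Approximate 2**f for a fraction f in [0, 1)."""
    return _pwl(lambda t: 2.0 ** t, f)
```

With eight uniform segments the interpolation error stays below about 0.003 for both functions; a real design would trade segment count against table size.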
Neural network accelerator using logarithmic-based arithmetic
Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
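The quotient/remainder addition described above can be sketched as follows. This is an illustrative reading of the abstract, not the accelerator's actual datapath: it assumes each value is 2**(e/n) for a non-negative integer exponent e, and the names `N` and `log_sum` are hypothetical. Grouping by remainder stands in for the sorting step.

```python
N = 4  # hypothetical number of fractional exponent steps per power of two


def log_sum(exponents, n=N):
    """Sum the values 2**(e/n) for non-negative integer exponents e."""
    partial = [0] * n                 # one partial sum per remainder value
    for e in exponents:
        q, r = divmod(e, n)           # e = q*n + r, so 2**(e/n) = 2**q * 2**(r/n)
        partial[r] += 1 << q          # integer add of 2**q into the bucket for r
    # Only n multiplications remain, one per remainder constant 2**(r/n).
    return sum(p * 2.0 ** (r / n) for r, p in enumerate(partial))
```

The point of the decomposition is that the inner loop uses only shifts and integer adds; the costly multiplications by 2**(r/n) happen once per remainder bucket rather than once per input.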
POWER SERIES TRUNCATION USING CONSTANT TABLES FOR FUNCTION INTERPOLATION IN TRANSCENDENTAL FUNCTIONS
A primary interval for convergence of at least one power series in a transcendental function is interpolated, while selecting a number of one or more interpolation points for a truncated expansion of the at least one power series by a selected order of truncation. A function and at least one derivative of the function of the truncated expansion of the selected order of truncation is evaluated at the one or more interpolation points. Each separate value evaluated for the function and each of the at least one derivative is saved in a table, wherein the table is looked up for efficiently computing a result of the truncated expansion of the at least one power series.
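As a rough illustration of the table-driven evaluation described above, the sketch below tabulates a function and its derivatives at interpolation points and evaluates a truncated Taylor expansion around the nearest point. The choice of sine, the interval [0, pi/2), the truncation order, and the point count are all assumptions for the example, not values from the source.

```python
import math

ORDER = 3    # hypothetical truncation order
POINTS = 16  # hypothetical number of interpolation points
LO, HI = 0.0, math.pi / 2  # assumed primary interval

# Derivatives of sin up to ORDER: sin, cos, -sin, -cos.
DERIVS = [math.sin, math.cos, lambda t: -math.sin(t), lambda t: -math.cos(t)]


def _center(k):
    """Midpoint of the k-th subinterval, used as the expansion point."""
    return LO + (k + 0.5) * (HI - LO) / POINTS


# TABLE[k][j] holds the j-th derivative of sin at the k-th interpolation point.
TABLE = [[d(_center(k)) for d in DERIVS] for k in range(POINTS)]


def sin_primary(x):
    """Approximate sin(x) for x in [LO, HI) from the precomputed table."""
    k = min(int((x - LO) / (HI - LO) * POINTS), POINTS - 1)
    h = x - _center(k)
    row = TABLE[k]
    result = row[ORDER]
    for j in range(ORDER - 1, -1, -1):  # Horner form of sum(row[j] * h**j / j!)
        result = row[j] + result * h / (j + 1)
    return result
```

At run time only a table lookup and a short polynomial evaluation are needed; the transcendental calls occur once, when the table is built.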
PROCESSING WITH COMPACT ARITHMETIC PROCESSING ELEMENT
A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (LPHDR arithmetic). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
Mechanism to perform single precision floating point extended math operations
A processor to facilitate execution of a single-precision floating point operation on an operand is disclosed. The processor includes one or more execution units, each having a plurality of floating point units to execute one or more instructions to perform the single-precision floating point operation on the operand, including performing a floating point operation on an exponent component of the operand; and performing a floating point operation on a mantissa component of the operand, comprising dividing the mantissa component into a first sub-component and a second sub-component, determining a result of the floating point operation for the first sub-component and determining a result of the floating point operation for the second sub-component, and returning a result of the floating point operation.
FRACTIONAL LOGARITHMIC NUMBER SYSTEM ADDER
An adder for fractional logarithmic number system (FLNS) format operands includes a compare-and-swap circuit that inputs first and second FLNS operands represented by fixed point values and provides the greater one as operand x and the lesser or equal one as operand y. The sign bits of x and y are $s_x$ and $s_y$, respectively; the integer portions of x and y are $q_x$ and $q_y$, respectively; and the fraction portions of x and y have integer values $r_x$ and $r_y$, respectively. The compare-and-swap circuit is configured to provide $s_x$ as the sign bit $s_z$ of a sum $z = x(1 + y/x)$ for x0. A subtraction circuit subtracts $(q_y + r_y/n) - (q_x + r_x/n)$ and outputs q.sub. and r.sub., such that =y/x, where n=2.sup.w.sup.
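The identity z = x(1 + y/x) behind this adder can be sketched in software. The example below is a simplified illustration, not the claimed circuit: it represents each operand as a hypothetical (sign, log2 magnitude) pair with a floating-point logarithm instead of the fixed-point integer and fraction fields, and it computes the correction term with `math.log2` where hardware would use a dedicated circuit or table.

```python
import math


def lns_add(a, b):
    """Add two values given as (sign, log2 magnitude) pairs; sign 0 is positive."""
    # Compare-and-swap: x gets the larger magnitude, y the lesser or equal one.
    x, y = (a, b) if a[1] >= b[1] else (b, a)
    (sx, lx), (sy, ly) = x, y
    d = ly - lx                   # log2(|y| / |x|), always <= 0
    ratio = 2.0 ** d              # |y| / |x|, in (0, 1]
    term = 1.0 + ratio if sx == sy else 1.0 - ratio
    if term == 0.0:
        return (0, -math.inf)     # exact cancellation: the sum is zero
    # log2|z| = log2|x| + log2(1 +/- |y|/|x|); the sign follows x.
    return (sx, lx + math.log2(term))
```

For example, adding 6 and -2 as `lns_add((0, math.log2(6)), (1, math.log2(2)))` yields a positive result whose logarithm is close to 2, that is, a value of about 4.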
Implementing logarithmic and antilogarithmic operations based on piecewise linear approximation
Implementations of the disclosure provide logarithm and antilogarithm operations on a hardware processor based on piecewise linear approximation. An example processor includes a piecewise linear log approximation circuit that receives an input of a floating-point number comprising a sign, an exponent, and a mantissa. The piecewise linear log approximation circuit approximates the fractional portion of a fixed-point number using a linear approximation of the mantissa of the floating-point number. The piecewise linear log approximation circuit also derives the integer portion of the fixed-point number from the exponent.
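A bit-level sketch of this float-to-fixed-point logarithm is given below. It assumes an IEEE 754 single-precision input and uses the simplest possible linear approximation, log2(1 + m) ≈ m, with a single segment per octave; the fraction width `FRAC_BITS` and the function name are hypothetical, and a real circuit would likely use several segments with correction constants.

```python
import struct

FRAC_BITS = 23  # assumed fixed-point fraction width, matching the float32 mantissa


def approx_log2_fixed(x):
    """Approximate log2(x) as a fixed-point integer with FRAC_BITS fraction bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw float32 bit pattern
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if sign or exponent == 0 or exponent == 0xFF:
        raise ValueError("expects a positive, normal, finite float32 value")
    integer = exponent - 127  # integer part comes from the unbiased exponent
    fraction = mantissa       # one-segment linear approximation of log2(1 + m)
    return (integer << FRAC_BITS) + fraction
```

Dividing the result by `2**FRAC_BITS` gives the approximate logarithm as a real number; for an input of 3.0 the sketch returns the fixed-point encoding of 1.5, against a true value of about 1.585.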
Constant depth, near constant depth, and subcubic size threshold circuits for linear algebraic calculations
A method of increasing an efficiency at which a plurality of threshold gates arranged as neuromorphic hardware is able to perform a linear algebraic calculation having a dominant size of N. The computer-implemented method includes using the plurality of threshold gates to perform the linear algebraic calculation in a manner that is simultaneously efficient and at a near constant depth. Efficient is defined as a calculation algorithm that uses fewer of the plurality of threshold gates than a naive algorithm. The naive algorithm is a straightforward algorithm for solving the linear algebraic calculation. Constant depth is defined as an algorithm that has an execution time that is independent of a size of an input to the linear algebraic calculation. The near constant depth comprises a computing depth equal to or between O(log(log(N))) and the constant depth.
Method of Neural Network Training Using Floating-Point Signed Digit Representation
A method of training a neural network that includes multiple neural network weights and multiple neurons. The method includes using floating-point signed digit numbers to represent each of the multiple neural network weights, wherein a mantissa of each of the multiple neural network weights is represented by multiple mantissa signed digit groups and an exponent of each of the multiple neural network weights is represented by an exponent digit group; and using the exponent digit group and at least one of the multiple mantissa signed digit groups to perform weight adjustment computation and neural network inference computation.