Patent classifications
G06F7/556
MATRIX COMPUTING METHOD AND RELATED DEVICE
The present disclosure relates to matrix computing methods, chips, devices, and systems. One example method includes obtaining a computing instruction, where the computing instruction indicates a matrix computing type and a to-be-computed matrix. The to-be-computed matrix is disassembled to obtain a plurality of disassembled matrices, where the precision of a floating point number in a disassembled matrix is lower than the precision of a floating point number in the to-be-computed matrix. Computing processing is then performed on the plurality of disassembled matrices based on the matrix computing type.
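The disassembly idea can be sketched as follows. The patent does not specify the decomposition; this sketch assumes a common hi/lo splitting scheme in which each high-precision value is split into two lower-precision pieces whose sum approximates it, and the cross-products are accumulated in wide precision. All names are illustrative.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to float32 precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

def disassemble(x):
    """Split one high-precision value into two lower-precision pieces
    with hi + lo ~= x (hypothetical scheme, not from the patent)."""
    hi = to_f32(x)
    lo = to_f32(x - hi)
    return hi, lo

def matmul_disassembled(A, B):
    """Multiply matrices via their lower-precision pieces, accumulating
    the four cross-products in wide precision. Pure-Python illustration;
    real hardware would perform the multiplies in the low precision."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for t in range(k):
                a_hi, a_lo = disassemble(A[i][t])
                b_hi, b_lo = disassemble(B[t][j])
                acc += a_hi * b_hi + a_hi * b_lo + a_lo * b_hi + a_lo * b_lo
            C[i][j] = acc
    return C
```

Because the low pieces capture the rounding residue of the high pieces, the recombined result closely tracks the full-precision product.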
OPERATING METHOD OF FLOATING POINT OPERATION CIRCUIT AND INTEGRATED CIRCUIT INCLUDING FLOATING POINT OPERATION CIRCUIT
An operating method of a floating point operation circuit includes, in response to receiving a first instruction, generating a first output by performing a fused multiplication and addition operation on a first input, a second input, and a third input. The method further includes, in response to receiving a second instruction, generating a second output by inverting one input of a fourth input, a fifth input, and a sixth input. Generating the second output includes generating a transform factor and a simplified value from the one input.
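Functionally, the two instructions can be sketched as below. The sign-inversion reading and the function names are assumptions; the actual circuit derives a transform factor and a simplified value from the inverted input rather than negating it directly, and hardware performs the fused operation with a single rounding, which plain Python does not.

```python
def fma(a, b, c):
    """First instruction: fused multiply-add, a*b + c.
    (Hardware fuses the multiply and add with one rounding.)"""
    return a * b + c

def fma_inverted(a, b, c):
    """Second instruction (hypothetical reading): invert one of the
    three inputs and reuse the same FMA datapath, here -(a*b) + c."""
    return fma(-a, b, c)
```

Reusing one datapath for both instructions is the apparent motivation: the negated variant costs only the input transform, not a second multiplier.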
PARTIAL SUM COMPRESSION
A method for performing a neural network operation. In some embodiments, the method includes: calculating a first plurality of products, each of the first plurality of products being the product of a weight and an activation; calculating a first partial sum, the first partial sum being the sum of the products; and compressing the first partial sum to form a first compressed partial sum.
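The three steps map directly to code. The compression scheme here, truncating the significand to a reduced mantissa width, is an assumption for illustration; the patent's compressor is not specified in this abstract.

```python
import struct

def compress(x, mantissa_bits=10):
    """Hypothetical lossy compressor: keep only the top mantissa_bits
    of a binary64 significand (stand-in for the claimed compression)."""
    if x == 0.0:
        return 0.0
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    drop = 52 - mantissa_bits
    mask = ~((1 << drop) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack('>d', struct.pack('>Q', bits & mask))[0]

weights = [0.5, -1.25, 2.0]
acts = [1.0, 0.5, -0.75]
products = [w * a for w, a in zip(weights, acts)]  # first plurality of products
partial = sum(products)                            # first partial sum
compressed = compress(partial)                     # first compressed partial sum
```

Storing or forwarding the compressed partial sum instead of the full-width one is what saves bandwidth between accumulation stages.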
Power Saving Floating Point Multiplier-Accumulator With a High Precision Accumulation Detection Mode
A floating point multiplier-accumulator (MAC) multiplies and accumulates N pairs of floating point values using N MAC processors operating simultaneously, each pair comprising an input value and a coefficient value to be multiplied and accumulated. The pairs of floating point values are simultaneously processed by the plurality of MAC processors, each of which outputs a signed integer form fraction with a first bitwidth and a second bitwidth, along with a maximum exponent. The first bitwidth signed integer form fractions are summed by an adder tree using the first bitwidth to form a first sum, and when an excess leading-zero condition is detected, a second adder tree operating on the second bitwidth integer form fractions forms a second sum. The first sum or the second sum, along with the maximum exponent, is converted into a floating point result.
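The first-path accumulation can be sketched as follows: products are aligned to the maximum exponent as signed integer fractions and summed as integers. The fraction width and names are assumptions, and the second, wider adder tree that activates on the excess leading-zero (cancellation) condition is omitted for brevity.

```python
import math

def mac_aligned(pairs, frac_bits=24):
    """Sketch of the first adder-tree path: multiply each (input, coeff)
    pair, align the products to the maximum exponent as signed integer
    form fractions, then sum the integers and rescale."""
    products = [x * c for x, c in pairs]
    max_exp = max((math.frexp(p)[1] for p in products if p != 0.0), default=0)
    # signed integer form fractions, aligned to max_exp
    ints = [int(round(p * 2 ** (frac_bits - max_exp))) for p in products]
    total = sum(ints)  # the adder tree at the first bitwidth
    return total * 2.0 ** (max_exp - frac_bits), max_exp
```

When the summands nearly cancel, the narrow integer sum loses its leading bits; that is the condition under which the patent's second, higher-precision tree would take over.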
Implementation of Softmax and Exponential in Hardware
Methods for implementing an exponential operation, and a softmax neural network layer, in neural network accelerator hardware, together with data processing systems for implementing each. The exponential operation or softmax layer is mapped to a plurality of elementary neural network operations, which the neural network accelerator hardware evaluates to produce the result of the operation or layer, respectively.
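As a sketch of this kind of decomposition (the accelerator's actual elementary operations are not listed in the abstract), softmax breaks into a max, a subtract, an exponential, a sum, and a divide:

```python
import math

def softmax(xs):
    """Softmax expressed as a chain of elementary operations, the style
    of mapping an accelerator might evaluate step by step."""
    m = max(xs)                          # max (for numerical stability)
    shifted = [x - m for x in xs]        # elementwise subtract
    exps = [math.exp(x) for x in shifted]  # elementwise exponential
    s = sum(exps)                        # reduction sum
    return [e / s for e in exps]         # elementwise divide
```

Subtracting the maximum first keeps every exponent non-positive, so the intermediate exponentials never overflow, which matters in reduced-precision hardware.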
System to perform unary functions using range-specific coefficient sets
A method comprising storing a plurality of entries, each entry of the plurality of entries associated with a portion of a range of input values, each entry comprising a set of coefficients defining a power series approximation; selecting a first entry of the plurality of entries based on a determination that a floating point input value is within the portion of the range of input values that is associated with the first entry; and calculating an output value by evaluating the power series approximation defined by the set of coefficients of the first entry at the floating point input value.
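The claimed steps, store a coefficient table, select the entry covering the input, evaluate the polynomial, can be sketched as below. The table values here are illustrative Taylor coefficients for exp(x) on two sub-ranges, not coefficients from the patent.

```python
import math

# Entries of (range lower bound, coefficient set); each set defines a
# power series approximation valid on its sub-range (illustrative: exp(x)
# on [0, 1) and [1, 2), expanded around 0 and 1 respectively).
TABLE = [
    (0.0, [1.0, 1.0, 0.5, 1.0 / 6]),
    (1.0, [math.e, math.e, math.e / 2, math.e / 6]),
]

def approx(x):
    """Select the entry whose sub-range contains x, then evaluate its
    power series at x via Horner's method."""
    x0, coeffs = max((e for e in TABLE if x >= e[0]), key=lambda e: e[0])
    t, result = x - x0, 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result
```

Splitting the range keeps each polynomial's argument small, so a short, cheap series already gives good accuracy across the whole domain.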