G06F7/4836

SYSTEM AND METHOD PERFORMING FLOATING-POINT OPERATIONS
20230161555 · 2023-05-25

A method of performing floating-point operations may include: obtaining operands having a floating-point format, calculating a gain based on a range of exponents for the operands, generating intermediate values having a fixed-point format by applying the gain to the operands, generating a fixed-point result value having the fixed-point format by performing an operation on the intermediate values, and transforming the fixed-point result value into a floating-point output value having the floating-point format.
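
A minimal Python sketch of the claimed flow, as one reading of this abstract rather than the patent's implementation: the gain is derived from the operands' exponent range, the operation runs on fixed-point intermediates, and the result is scaled back. The frexp-based gain choice and the 32-bit fraction width are illustrative assumptions.

```python
import math

def fixed_point_add(operands, frac_bits=32):
    # Gain chosen from the operands' exponent range so the largest
    # operand still fits the assumed fixed-point fraction width.
    max_exp = max((math.frexp(x)[1] for x in operands if x != 0.0), default=0)
    gain = 2 ** (frac_bits - max_exp)            # gain derived from the exponent range
    fixed = [round(x * gain) for x in operands]  # intermediate fixed-point values
    fixed_result = sum(fixed)                    # the fixed-point operation (addition here)
    return fixed_result / gain                   # transform back to floating point

print(fixed_point_add([1.5, 2.25, -0.75]))       # -> 3.0
```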

Error allocation format selection for hardware implementation of deep neural network
11734553 · 2023-08-22

Methods for determining a fixed point format for one or more layers of a DNN based on the portion of the output error of the DNN attributed to the fixed point formats of the different layers. Specifically, in the methods described herein the output error of a DNN attributable to the quantisation of the weights or input data values of each layer is determined using a Taylor approximation, and the fixed point number format of one or more layers is adjusted based on the attribution. For example, where the fixed point number formats used by a DNN comprise an exponent and a mantissa bit length, the mantissa bit length of the layer allocated the lowest portion of the output error may be reduced, or the mantissa bit length of the layer allocated the highest portion of the output error may be increased. Such a method may be iteratively repeated to determine an optimum set of fixed point number formats for the layers of a DNN.
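
A rough Python sketch of the attribution step as the abstract describes it; the quantisation format, the per-layer [mantissa_bits, exponent] layout, and the argmin adjustment rule are assumed stand-ins, not the patent's data structures. The first-order Taylor estimate charges each layer with the sum of |dE/dw| · |w − q(w)| over its weights.

```python
import numpy as np

def quantise(w, mantissa_bits, exponent):
    # Assumed concrete format: signed mantissa of the given bit length,
    # scaled by 2**exponent.
    scale = 2.0 ** exponent
    qmax = 2 ** (mantissa_bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def attribute_and_adjust(weights, grads, fmts):
    # fmts: mutable per-layer [mantissa_bits, exponent] pairs (hypothetical layout).
    attribution = []
    for w, g, (mbits, exp) in zip(weights, grads, fmts):
        err = w - quantise(w, mbits, exp)                          # per-weight quantisation error
        attribution.append(float(np.sum(np.abs(g) * np.abs(err)))) # first-order Taylor estimate
    fmts[int(np.argmin(attribution))][0] -= 1   # shrink the mantissa of the least-blamed layer
    return attribution, fmts

w = [np.array([0.5, -1.25]), np.array([2.0, 0.1])]
g = [np.array([0.2, 0.4]), np.array([0.01, 0.3])]
print(attribute_and_adjust(w, g, [[8, -6], [8, -4]]))
```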

PROCESSOR FOR FINE-GRAIN SPARSE INTEGER AND FLOATING-POINT OPERATIONS
20220147312 · 2022-05-12

A processor for fine-grain sparse integer and floating-point operations and a method of operation thereof. In some embodiments, the method includes forming a first set of products and forming a second set of products. The forming of the first set of products may include: multiplying, in a first multiplier, a first activation value by a least significant sub-word and a most significant sub-word of a first weight to form a first partial product and a second partial product; and adding the first partial product and the second partial product. The forming of the second set of products may include: multiplying, in the first multiplier, a second activation value by a first sub-word and a second sub-word of a mantissa to form a third partial product and a fourth partial product; and adding the third partial product and the fourth partial product.
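
A short Python sketch of the sub-word recombination described above; the 8-bit weight width and 4-bit sub-word split are assumptions. Multiplying the activation by each sub-word on a narrow multiplier and shift-adding the two partial products reproduces the full product.

```python
def subword_multiply(activation, weight, sub_bits=4):
    lsw = weight & ((1 << sub_bits) - 1)  # least significant sub-word
    msw = weight >> sub_bits              # most significant sub-word
    pp0 = activation * lsw                # first partial product (narrow multiplier)
    pp1 = activation * msw                # second partial product (same multiplier)
    return pp0 + (pp1 << sub_bits)        # recombine: pp1 is weighted by 2**sub_bits

assert subword_multiply(13, 0xB7) == 13 * 0xB7
```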

ROUNDING HEXADECIMAL FLOATING POINT NUMBERS USING BINARY INCREMENTORS

Rounding hexadecimal floating point numbers using binary incrementors, including: incrementing, by a first incrementor, a first subset of bits of an operand comprising a binary hexadecimal floating point operand; incrementing, by a second incrementor, a second subset of bits of the operand; generating an intermediate result based on a carryout of the second incrementor; and generating an incremented result based on a carryout of the first incrementor and one or more of: a first bit of the intermediate result or the carryout of the second incrementor.
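
The claimed carry combination can be modelled in a carry-select style; the Python sketch below simplifies the abstract's selection logic, and the 56-bit operand width and 28-bit split point are assumptions.

```python
def split_increment(operand, width=56, low_bits=28):
    mask_lo = (1 << low_bits) - 1
    lo, hi = operand & mask_lo, operand >> low_bits
    lo_inc = lo + 1                # first incrementor (low subset of bits)
    hi_inc = hi + 1                # second incrementor (high subset, speculative)
    carry = lo_inc >> low_bits     # carry-out of the first incrementor
    hi_sel = hi_inc if carry else hi  # select the high half based on that carry
    return ((hi_sel << low_bits) | (lo_inc & mask_lo)) & ((1 << width) - 1)

assert split_increment(0x0FFFFFFF) == 0x10000000
```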

Error Allocation Format Selection for Hardware Implementation of Deep Neural Network
20220327366 · 2022-10-13

Methods for determining a fixed point format for one or more layers of a DNN based on the portion of the output error of the DNN attributed to the fixed point formats of the different layers. Specifically, in the methods described herein the output error of a DNN attributable to the quantisation of the weights or input data values of each layer is determined using a Taylor approximation, and the fixed point number format of one or more layers is adjusted based on the attribution. For example, where the fixed point number formats used by a DNN comprise an exponent and a mantissa bit length, the mantissa bit length of the layer allocated the lowest portion of the output error may be reduced, or the mantissa bit length of the layer allocated the highest portion of the output error may be increased. Such a method may be iteratively repeated to determine an optimum set of fixed point number formats for the layers of a DNN.

Error allocation format selection for hardware implementation of deep neural network
11392823 · 2022-07-19

Methods for determining a fixed point format for one or more layers of a DNN based on the portion of the output error of the DNN attributed to the fixed point formats of the different layers. Specifically, in the methods described herein the output error of a DNN attributable to the quantisation of the weights or input data values of each layer is determined using a Taylor approximation, and the fixed point number format of one or more layers is adjusted based on the attribution. For example, where the fixed point number formats used by a DNN comprise an exponent and a mantissa bit length, the mantissa bit length of the layer allocated the lowest portion of the output error may be reduced, or the mantissa bit length of the layer allocated the highest portion of the output error may be increased. Such a method may be iteratively repeated to determine an optimum set of fixed point number formats for the layers of a DNN.

Interleaved pipeline of floating-point adders

Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and locations of an M by K first source matrix, a K by N second source matrix, and an M by N destination matrix, the opcode indicating that execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which a FP element (M, K) of the first source matrix and a corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXPDIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypass the accumulated sum to a subsequent pipeline instance.
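
A behavioral Python model of the dataflow for one destination element; the hardware's concurrent stages are serialized here, and the frexp-based exponent difference merely mirrors the EXPDIFF stage. Purely illustrative, not the pipeline itself.

```python
import math

def tile_element_pipeline(a_row, b_col, acc=0.0):
    # One destination element (M, N): K pipeline instances run back to back.
    for a, b in zip(a_row, b_col):
        product = a * b                                         # first stage: MULTIPLY
        exp_diff = math.frexp(product)[1] - math.frexp(acc)[1]  # concurrent EXPDIFF:
        # exp_diff is the alignment shift the adder would apply; it is
        # computed here only to mirror the stage, and is otherwise unused.
        acc = acc + product  # second stage: ADD, with acc bypassed to the next instance
    return acc

# 2x2 example: one element of A @ B
print(tile_element_pipeline([1.5, 2.0], [4.0, 0.25]))  # -> 6.5
```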

USING FUZZY-JBIT LOCATION OF FLOATING-POINT MULTIPLY-ACCUMULATE RESULTS
20210279038 · 2021-09-09

Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a Jbit and, in a second cycle, to accumulate the FP product with the third FP operand while concurrently, based on the Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
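
A Python sketch of the Fuzzy-Jbit idea: keep the raw significand product unnormalized, so the Jbit may sit in either of two positions, and read the exponent adjustment off the Jbit position instead of waiting for a normalization shift. The 24-bit significand inputs follow FP32, but the 25-bit packing and the fix-up shown are assumptions.

```python
def multiply_fuzzy_jbit(m1, m2, e1, e2):
    # m1, m2: 24-bit significands (Jbit at bit 23); e1, e2: unbiased exponents.
    raw = m1 * m2                   # 47- or 48-bit product, left unnormalized
    jbit_high = (raw >> 47) & 1     # which of the two Jbit positions holds the leading 1
    mantissa25 = raw >> 23          # 25-bit field covering both Jbit positions
    exponent = e1 + e2 + jbit_high  # exponent fix-up decided from the Jbit position,
                                    # without a normalization shift
    return mantissa25, exponent
```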

Error Allocation Format Selection for Hardware Implementation of Deep Neural Network
20210125047 · 2021-04-29

Methods for determining a fixed point format for one or more layers of a DNN based on the portion of the output error of the DNN attributed to the fixed point formats of the different layers. Specifically, in the methods described herein the output error of a DNN attributable to the quantisation of the weights or input data values of each layer is determined using a Taylor approximation, and the fixed point number format of one or more layers is adjusted based on the attribution. For example, where the fixed point number formats used by a DNN comprise an exponent and a mantissa bit length, the mantissa bit length of the layer allocated the lowest portion of the output error may be reduced, or the mantissa bit length of the layer allocated the highest portion of the output error may be increased. Such a method may be iteratively repeated to determine an optimum set of fixed point number formats for the layers of a DNN.

METHOD, APPARATUS, AND COMPUTER PROGRAM STORED IN COMPUTER READABLE MEDIUM FOR CONDUCTING ARITHMETIC OPERATION EFFICIENTLY IN DATABASE MANAGEMENT SERVER
20210081384 · 2021-03-18

Provided are a method, an apparatus, and a computer program stored in a computer-readable medium for conducting an arithmetic operation efficiently in a database management server. In a computer-readable medium storing a computer program of encoded commands, configured to cause one or more processors of a computer system to perform operations when the computer program is executed, the operations include: an operation of receiving a structure body creation request for performing a predetermined arithmetic operation; an operation of creating a structure body in response to the structure body creation request; an operation of receiving an arithmetic operation processing request that requests processing of the predetermined arithmetic operation with respect to a plurality of numerical values; an operation of creating structure body number data for each of the plurality of numerical values by applying each of the plurality of numerical values to the created structure body, the created structure body including one or more array elements, with at least some of the numerical values allocated to the one or more array elements to create the structure body number data; and an operation of performing the predetermined arithmetic operation based on the structure body number data for each of the plurality of numerical values.
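
A minimal Python sketch of one plausible reading of the "structure body" scheme, with the limb layout and base as assumptions: the numerical value is allocated across array elements, and the predetermined operation runs element-wise with carries.

```python
BASE = 10 ** 9   # each array element holds 9 decimal digits (assumed layout)

class StructureNumber:
    """A 'structure body' number: value allocated across array elements
    (little-endian limbs). Layout and base are assumptions."""
    def __init__(self, limbs):
        self.limbs = limbs

    @classmethod
    def from_int(cls, n):
        limbs = []
        while n:
            limbs.append(n % BASE)
            n //= BASE
        return cls(limbs or [0])

    def add(self, other):
        # The 'predetermined arithmetic operation': element-wise add with carry.
        out, carry = [], 0
        for i in range(max(len(self.limbs), len(other.limbs))):
            s = carry \
                + (self.limbs[i] if i < len(self.limbs) else 0) \
                + (other.limbs[i] if i < len(other.limbs) else 0)
            out.append(s % BASE)
            carry = s // BASE
        if carry:
            out.append(carry)
        return StructureNumber(out)

a = StructureNumber.from_int(999_999_999_999)
b = StructureNumber.from_int(1)
print(a.add(b).limbs)  # -> [0, 1000], i.e. 10**12
```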