Patent classifications
G06F7/4836
Processor for fine-grain sparse integer and floating-point operations
A processor for fine-grain sparse integer and floating-point operations and method of operation thereof are provided. In some embodiments, the method includes forming a first set of products and forming a second set of products. The forming of the first set of products may include: multiplying, in a first multiplier, a first activation value by a least significant sub-word and a most significant sub-word of a first weight to form a first partial product and a second partial product; and adding the first partial product and the second partial product. The forming of the second set of products may include: multiplying, in the first multiplier, a second activation value by a first sub-word and a second sub-word of a mantissa to form a third partial product and a fourth partial product; and adding the third partial product and the fourth partial product.
Error allocation format selection for hardware implementation of deep neural network
Methods for determining a fixed point format for one or more layers of a DNN based on the portion of the output error of the DNN attributed to the fixed point formats of the different layers. Specifically, in the methods described herein the output error of a DNN attributable to the quantisation of the weights or input data values of each layer is determined using a Taylor approximation and the fixed point number format of one or more layers is adjusted based on the attribution. For example, where the fixed point number formats used by a DNN comprises an exponent and a mantissa bit length, the mantissa bit length of the layer allocated the lowest portion of the output error may be reduced, or the mantissa bit length of the layer allocated the highest portion of the output error may be increased. Such a method may be iteratively repeated to determine an optimum set of fixed point number formats for the layers of a DNN.
Method, apparatus, and computer program stored in computer readable medium for conducting arithmetic operation efficiently in database management server
Provided are a method, an apparatus, and a computer program stored in a computer readable medium for conducting an arithmetic operation efficiently in a database management server. In a computer-readable medium including a computer program including encoded commands, which is configured to cause one or more processors to perform operations when the computer program is executed by the one or more processors of a computer system, the operations include: an operation of receiving a structure body creation request for performing a predetermined arithmetic operation; an operation of creating a structure body in response to the structure body creation request; an operation of receiving an arithmetic operation processing request of requesting processing of the predetermined arithmetic operation with respect to a plurality of numerical values; an operation of creating structure body number data for each of the plurality of numerical values by applying each of the plurality of numerical values to the created structure body, the created structure body including one or more array elements and at least some numerical values being allocated to the one or more array elements to create the structure body number data; and an operation of performing the predetermined arithmetic operation based on the structure body number data for each of the plurality of numerical values.
USING FUZZY-JBIT LOCATION OF FLOATING-POINT MULTIPLY-ACCUMULATE RESULTS
Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.
INTERLEAVED PIPELINE OF FLOATING-POINT ADDERS
Disclosed embodiments relate to an interleaved pipeline of floating-point (FP) adders. In one example, a processor is to execute an instruction specifying an opcode and locations of a M by K first source matrix, a K by N second source matrix, and a M by N destination matrix, the opcode indicating execution circuitry, for each FP element (M, N) of the destination matrix, is to: launch K instances of a pipeline having a first, MULTIPLY stage, during which a FP element (M, K) of the first source matrix and a corresponding FP element (K, N) of the second source matrix are multiplied; concurrently, in an EXPDIFF stage, determine an exponent difference between the product and a previous FP value of the element (M, N) of the destination matrix; and in a second, ADD-BYPASS stage, accumulate the product with the previous FP value and, concurrently, bypassing the accumulated sum to a subsequent pipeline instance.
Method, apparatus, and computer program stored in computer readable medium for conducting arithmetic operation efficiently in database management server
Provided are a method, an apparatus, and a computer program stored in a computer readable medium for conducting an arithmetic operation efficiently in a database management server. In a computer-readable medium including a computer program including encoded commands, which is configured to cause one or more processors to perform operations when the computer program is executed by the one or more processors of a computer system, the operations include: an operation of receiving a structure body creation request for performing a predetermined arithmetic operation; an operation of creating a structure body in response to the structure body creation request; an operation of receiving an arithmetic operation processing request of requesting processing of the predetermined arithmetic operation with respect to a plurality of numerical values; an operation of creating structure body number data for each of the plurality of numerical values by applying each of the plurality of numerical values to the created structure body, the created structure body including one or more array elements and at least some numerical values being allocated to the one or more array elements to create the structure body number data; and an operation of performing the predetermined arithmetic operation based on the structure body number data for each of the plurality of numerical values.
Hierarchical Mantissa Bit Length Selection for Hardware Implementation of Deep Neural Network
Hierarchical methods for selecting fixed point number formats with reduced mantissa bit lengths for representing values input to, and/or output, from, the layers of a DNN. The methods begin with one or more initial fixed point number formats for each layer. The layers are divided into subsets of layers and the mantissa bit lengths of the fixed point number formats are iteratively reduced from the initial fixed point number formats on a per subset basis. If a reduction causes the output error of the DNN to exceed an error threshold, then the reduction is discarded, and no more reductions are made to the layers of the subset. Otherwise a further reduction is made to the fixed point number formats for the layers in that subset. Once no further reductions can be made to any of the subsets the method is repeated for continually increasing numbers of subsets until a predetermined number of layers per subset is achieved.
Error Allocation Format Selection for Hardware Implementation of Deep Neural Network
Methods for determining a fixed point format for one or more layers of a DNN based on the portion of the output error of the DNN attributed to the fixed point formats of the different layers. Specifically, in the methods described herein the output error of a DNN attributable to the quantisation of the weights or input data values of each layer is determined using a Taylor approximation and the fixed point number format of one or more layers is adjusted based on the attribution. For example, where the fixed point number formats used by a DNN comprises an exponent and a mantissa bit length, the mantissa bit length of the layer allocated the lowest portion of the output error may be reduced, or the mantissa bit length of the layer allocated the highest portion of the output error may be increased. Such a method may be iteratively repeated to determine an optimum set of fixed point number formats for the layers of a DNN.
Floating point chained multiply accumulate
Floating point chained multiply accumulation is performed using a multiplier to multiply a first floating point operand by a second floating point operand to generate an unrounded multiplication result. An adder then adds a third floating point operand to the unrounded multiplication result to generate an unrounded accumulation result. Rounding circuitry then applies both the rounding associated with the unrounded multiplication result and rounding associated with the unrounded accumulation result to generate a rounded accumulation result.
METHOD, APPARATUS, AND COMPUTER PROGRAM STORED IN COMPUTER READABLE MEDIUM FOR CONDUCTING ARITHMETIC OPERATION EFFICIENTLY IN DATABASE MANAGEMENT SERVER
Provided are a method, an apparatus, and a computer program stored in a computer readable medium for conducting an arithmetic operation efficiently in a database management server. In a computer-readable medium including a computer program including encoded commands, which is configured to cause one or more processors to perform operations when the computer program is executed by the one or more processors of a computer system, the operations include: an operation of receiving a structure body creation request for performing a predetermined arithmetic operation; an operation of creating a structure body in response to the structure body creation request; an operation of receiving an arithmetic operation processing request of requesting processing of the predetermined arithmetic operation with respect to a plurality of numerical values; an operation of creating structure body number data for each of the plurality of numerical values by applying each of the plurality of numerical values to the created structure body, the created structure body including one or more array elements and at least some numerical values being allocated to the one or more array elements to create the structure body number data; and an operation of performing the predetermined arithmetic operation based on the structure body number data for each of the plurality of numerical values.