G06F7/483

Integrated circuit chip apparatus

Provided are an integrated circuit chip apparatus and a related product, the integrated circuit chip apparatus being used for executing a multiplication operation, a convolution operation or a training operation of a neural network. The present technical solution has the advantages of a small amount of calculation and low power consumption.

Integrated circuit chip apparatus

Provided are an integrated circuit chip apparatus and a related product, the integrated circuit chip apparatus being used for executing a multiplication operation, a convolution operation or a training operation of a neural network. The present technical solution has the advantages of a small amount of calculation and low power consumption.

COMPRESSED WALLACE TREES IN FMA CIRCUITS

An embodiment of an apparatus comprises one or more fractional width fused multiply-accumulate (FMA) circuits configured as a shared Wallace tree, and circuitry coupled to the one or more fractional width FMA circuits to provide one or more fractional width FMA operations through the one or more fractional width FMA circuits. Other embodiments are disclosed and claimed.

DATATYPE CONVERSION TECHNIQUE

Apparatuses, systems, and techniques to generate numbers. In at least one embodiment, one or more circuits are to cause one or more thirty-two bit floating point numbers to be truncated to generate one or more rounded numbers based, at least in part, on one or more rounding attributes.

INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.

INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.

Apparatus and method for converting a floating-point value from half precision to single precision

An embodiment of the invention is a processor including execution circuitry to, in response to a decoded instruction, convert a half-precision floating-point value to a single-precision floating-point value and store the single-precision floating-point value in each of the plurality of element locations of a destination register. The processor also includes a decoder and the destination register. The decoder is to decode an instruction to generate the decoded instruction.

Numeric Operations on Logarithmically Encoded Data Values
20220357919 · 2022-11-10 ·

Numeric operations on a series of values, each encoded as a logarithm, may be performed. A first offset is determined based on values of an initial subset of the series. Numerical operations on individual elements of the series may then be performed by performing an exponentiation of a difference between the individual elements and the first offset. A particular element in the series may be identified as being unable to be included without causing an overflow or underflow of an exponentiation or numerical operation. In this event, a second offset may be determined. The numerical operations may then proceed by adjusting a current accumulated result using the second offset and continuing to perform numerical operations on individual elements of the series by performing an exponentiation of a difference between the individual elements and the second offset. Additional offsets may be optionally determined until the numerical operation is complete.

METHOD AND APPARATUS WITH FLOATING POINT PROCESSING
20230042954 · 2023-02-09 · ·

A processor-implemented includes receiving a first floating point operand and a second floating point operand, each having an n-bit format comprising a sign field, an exponent field, and a significand field, normalizing a binary value obtained by performing arithmetic operations for fields corresponding to each other in the first and second floating point operands for an n-bit multiplication operation, determining whether the normalized binary value is a number that is representable in the n-bit format or an extended normal number that is not representable in the n-bit format, according to a result of the determining, encoding the normalized binary value using an extension bit format in which an extension pin identifying whether the normalized binary value is the extended normal number is added to the n-bit format, and outputting the encoded binary value using the extended bit format, as a result of the n-bit multiplication operation.

FPGA specialist processing block for machine learning

The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.