G06F7/4873

Floating-point Division Circuitry with Subnormal Support
20240053960 · 2024-02-15 ·

Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.

Floating-point division alternative techniques

Techniques are disclosed relating to circuitry configured to perform reciprocal-based floating-point division. In some embodiments, floating-point circuitry includes reciprocal circuitry configured to generate a reciprocal of a divisor, multiplication circuitry configured to multiply the reciprocal results with a dividend, and circuitry configured to clear a least significant bit of an integer representation of the multiplication output to generate a modified multiplication output. The floating-point circuitry may be configured to convert the modified multiplication output to a representation using the first precision to generate a division output. In some embodiments, the refinement using the integer representation may provide correctly-rounded subnormal division results. The disclosed techniques may improve accuracy, reduce processing time, and/or reduce instructions needed for floating-point division, with little to no increase in chip area.

Float division by constant integer

A binary logic circuit for determining the ratio x/d where x is a variable integer input, the binary logic circuit comprising: a logarithmic tree of modulo units each configured to calculate x[a: b]mod d for respective block positions a and b in x where b>a with the numbering of block positions increasing from the most significant bit of x up to the least significant bit of x, the modulo units being arranged such that a subset of M?1 modulo units of the logarithmic tree provide x[0: m]mod d for all m?{1, M}, and, on the basis that any given modulo unit introduces a delay of 1: all of the modulo units are arranged in the logarithmic tree within a delay envelope of [log.sub.2 M]; and more than M?2.sup.u of the subset of modulo units are arranged at the maximal delay of [log.sub.2 M], where 2.sup.u is the power of 2 immediately smaller than M.

Circuitry and method for performing division
10353671 · 2019-07-16 · ·

A data processing apparatus comprises signal receiving circuitry to receive a signal corresponding to a divide instruction that identifies a dividend x and a divisor d. Processing circuitry performs, in response to said divide instruction, a radix-N division algorithm to generate a result value q=x/d, where N is an integer power of 2 and greater than 1. Said division algorithm comprises a plurality of iterations, each of said plurality of iterations being performed by quotient digit calculation circuitry to determine a quotient value of that iteration q[i+1] based on a remainder value of a previous iteration rem[i]; and remainder calculation circuitry to determine a remainder value of that iteration rem[i+1] based on said quotient value of that iteration q[i+1] and said remainder value of said previous iteration rem[i]. Result calculation circuitry derives said result value q based on each quotient value selected by said digit selection circuitry for each of said plurality of iterations. For at least some of said plurality of iterations, said quotient digit calculation circuitry speculatively determines a set of candidate values before a quotient value of said previous iteration is known and, in response to said quotient value of said previous iteration becoming known, determines said quotient value of that iteration q[i+1] based on one of said candidate values.

Masked decomposition of polynomials for lattice-based cryptography

Various implementations relate to a data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation including a masked decomposition of a polynomial a having n.sub.s arithmetic shares into a high part a.sub.1 and a low part a.sub.0 for lattice-based cryptography in a processor, the instructions, including: performing a rounded Euclidian division of the polynomial a by a base ? to compute t.sup.(?)A; extracting Boolean shares a.sub.1.sup.(?)B from n low bits of t by performing an arithmetic share to Boolean share (A2B) conversion on t.sup.(?)A and performing an AND with ??1, where ?=??.sup.?1 is a power of 2; unmasking a.sub.1 by combining Boolean shares of a.sub.1.sup.(?)B; calculating arithmetic shares a.sub.0.sup.(?)A of the low part a.sub.0; and performing a cryptographic function using a.sub.1 and a.sub.0.sup.(?)A.

MECHANISM TO PERFORM SINGLE PRECISION FLOATING POINT EXTENDED MATH OPERATIONS

A processor to facilitate execution of a single-precision floating point operation on an operand is disclosed. The processor includes one or more execution units, each having a plurality of floating point units to execute one or more instructions to perform the single-precision floating point operation on the operand, including performing a floating point operation on an exponent component of the operand; and performing a floating point operation on a mantissa component of the operand, comprising dividing the mantissa component into a first sub-component and a second sub-component, determining a result of the floating point operation for the first sub-component and determining a result of the floating point operation for the second sub-component, and returning a result of the floating point operation.

Method and apparatus with data processing

A processor-implemented data processing method includes: normalizing input data of an activation function comprising a division operation; determining dividend data corresponding to a dividend of the division operation by reading, from a memory, a value of a first lookup table addressed by the normalized input data; determining divisor data corresponding to a divisor of the division operation by accumulating the dividend data; and determining output data of the activation function corresponding to an output of the division operation obtained by reading, from the memory, a value of a second lookup table addressed by the dividend data and the divisor data.

Vector and Matrix Computing Device
20180329868 · 2018-11-15 ·

A computing device and related products are provided. The computing device is configured to perform machine learning calculations. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit includes a data input/output (I/O) unit, a register, and a cache. Technical solution provided by the present disclosure has advantages of fast calculation speed and energy saving.

METHOD AND APPARATUS WITH DATA PROCESSING

A processor-implemented data processing method includes: normalizing input data of an activation function comprising a division operation; determining dividend data corresponding to a dividend of the division operation by reading, from a memory, a value of a first lookup table addressed by the normalized input data; determining divisor data corresponding to a divisor of the division operation by accumulating the dividend data; and determining output data of the activation function corresponding to an output of the division operation obtained by reading, from the memory, a value of a second lookup table addressed by the dividend data and the divisor data.

Shared hardware integer/floating point divider and square root logic unit and associated methods
09983850 · 2018-05-29 · ·

Embodiments of the inventive concept include a shared hardware integer/floating point divider and square root logic unit, which combines floating point division, floating point square root operations, and/or integer division into one shared hardware design. The shared hardware logic unit can share, for example, a sparse random access memory (sparse RAM) in place of a full partial remainder divisor (PD) table, one or more on-the-fly (OTF) state machines, and/or a data path for integer division, floating point division, and square root operations. The normalization of subnormal numbers and the normalization of signed and unsigned integers can be handled with shared hardware. The division operations and the square root operations can be of the same radix. Early out exceptions and special cases can be automatically handled. Both improved latency and less die area can be achieved in accordance with embodiments of the inventive concept.