Patent classifications
G06F7/537
Small multiplier after initial approximation for operations with increasing precision
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
Small multiplier after initial approximation for operations with increasing precision
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
Small multiplier after initial approximation for operations with increasing precision
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
Calculation of a number of iterations
Performing an arithmetic operation in a data processing unit, including calculating a number of iterations for performing the arithmetic operation with a given number of bits per iteration. The number of bits per iteration is a positive natural number. A number of consecutive digit positions of a digit in a sequence of bits represented in the data processing unit is counted. The length of the sequence is a multiple of the number of bits per iteration. A quotient of the number of consecutive digit positions divided by the number of bits per iteration is calculated, as well as a remainder of the division.
Circuitry and method for performing division
A data processing apparatus comprises signal receiving circuitry to receive a signal corresponding to a divide instruction that identifies a dividend x and a divisor d. Processing circuitry performs, in response to said divide instruction, a radix-N division algorithm to generate a result value q=x/d, where N is an integer power of 2 and greater than 1. Said division algorithm comprises a plurality of iterations, each of said plurality of iterations being performed by quotient digit calculation circuitry to determine a quotient value of that iteration q[i+1] based on a remainder value of a previous iteration rem[i]; and remainder calculation circuitry to determine a remainder value of that iteration rem[i+1] based on said quotient value of that iteration q[i+1] and said remainder value of said previous iteration rem[i]. Result calculation circuitry derives said result value q based on each quotient value selected by said digit selection circuitry for each of said plurality of iterations. For at least some of said plurality of iterations, said quotient digit calculation circuitry speculatively determines a set of candidate values before a quotient value of said previous iteration is known and, in response to said quotient value of said previous iteration becoming known, determines said quotient value of that iteration q[i+1] based on one of said candidate values.
Circuitry and method for performing division
A data processing apparatus comprises signal receiving circuitry to receive a signal corresponding to a divide instruction that identifies a dividend x and a divisor d. Processing circuitry performs, in response to said divide instruction, a radix-N division algorithm to generate a result value q=x/d, where N is an integer power of 2 and greater than 1. Said division algorithm comprises a plurality of iterations, each of said plurality of iterations being performed by quotient digit calculation circuitry to determine a quotient value of that iteration q[i+1] based on a remainder value of a previous iteration rem[i]; and remainder calculation circuitry to determine a remainder value of that iteration rem[i+1] based on said quotient value of that iteration q[i+1] and said remainder value of said previous iteration rem[i]. Result calculation circuitry derives said result value q based on each quotient value selected by said digit selection circuitry for each of said plurality of iterations. For at least some of said plurality of iterations, said quotient digit calculation circuitry speculatively determines a set of candidate values before a quotient value of said previous iteration is known and, in response to said quotient value of said previous iteration becoming known, determines said quotient value of that iteration q[i+1] based on one of said candidate values.
SMALL MULTIPLIER AFTER INITIAL APPROXIMATION FOR OPERATIONS WITH INCREASING PRECISION
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
SMALL MULTIPLIER AFTER INITIAL APPROXIMATION FOR OPERATIONS WITH INCREASING PRECISION
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
OPTIMIZED INTEGER DIVISION CIRCUIT
The disclosed embodiments relate to the design of an integer division circuit, which comprises: a dividend-input that receives an integer dividend A; a divisor-input that receives an integer divisor B; a quotient-output that outputs an integer quotient q; and a division engine that executes the Goldschmidt method to divide A by B to produce q. During a pre-processing operation, which commences executing before the Goldschmidt method starts executing, the division engine determines whether A<B. If A<B, the division engine sets q=0 without having to execute the Goldschmidt method.
Small multiplier after initial approximation for operations with increasing precision
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.