Patent classifications
G06F7/552
Small multiplier after initial approximation for operations with increasing precision
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
Small multiplier after initial approximation for operations with increasing precision
In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
SECURE SQUARE ROOT COMPUTATION SYSTEM, SECURE NORMALIZATION SYSTEM, METHODS THEREFOR, SECURE COMPUTATION APPARATUS, AND PROGRAM
A flag sequence generator (12) generates {x.sub.0}, . . . , {x.sub.λ−11} indicating a msb of a. A bit sequence generator (13) calculates {y.sub.i}:={x.sub.2i} XOR {x.sub.2i+1} to generate {y.sub.0}, . . . , {y.sub.λ′−1}. A flag calculator (14) calculates an exclusive logical sum of all {x.sub.j} to calculate [r] for each odd j. A public value multiplier setting-unit (16) sets r′ that becomes √2 when λ is an odd and 1 when λ is an even. A normalization multiplier generator (17) bit-connects {y.sub.0}, . . . to generate [c′]. A normalization multiplier generator (18) bit-connects {x.sub.λ−1}, . . . to generate [c]. A normalizer (19) calculates [b]:=[a][c]. A square root calculator (20) calculates [w]:=[√b]*(r′/√2) when r=1, and [w′]:=[√b]*r′ when r=0. An inverse normalizer (21) calculates [w][c′] and performs λ′ bits right-shift.
SECURE SQUARE ROOT COMPUTATION SYSTEM, SECURE NORMALIZATION SYSTEM, METHODS THEREFOR, SECURE COMPUTATION APPARATUS, AND PROGRAM
A flag sequence generator (12) generates {x.sub.0}, . . . , {x.sub.λ−11} indicating a msb of a. A bit sequence generator (13) calculates {y.sub.i}:={x.sub.2i} XOR {x.sub.2i+1} to generate {y.sub.0}, . . . , {y.sub.λ′−1}. A flag calculator (14) calculates an exclusive logical sum of all {x.sub.j} to calculate [r] for each odd j. A public value multiplier setting-unit (16) sets r′ that becomes √2 when λ is an odd and 1 when λ is an even. A normalization multiplier generator (17) bit-connects {y.sub.0}, . . . to generate [c′]. A normalization multiplier generator (18) bit-connects {x.sub.λ−1}, . . . to generate [c]. A normalizer (19) calculates [b]:=[a][c]. A square root calculator (20) calculates [w]:=[√b]*(r′/√2) when r=1, and [w′]:=[√b]*r′ when r=0. An inverse normalizer (21) calculates [w][c′] and performs λ′ bits right-shift.
Use of a single instruction set architecture (ISA) instruction for vector normalization
Embodiments described herein are generally directed to an improved vector normalization instruction. An embodiment of a method includes responsive to receipt by a GPU of a single instruction specifying a vector normalization operation to be performed on V vectors: (i) generating V squared length values, N at a time, by a first processing unit, by, for each N sets of inputs, each representing multiple component vectors for N of the vectors, performing N parallel dot product operations on the N sets of inputs. Generating V sets of outputs representing multiple normalized component vectors of the V vectors, N at a time, by a second processing unit, by, for each N squared length values of the V squared length values, performing N parallel operations on the N squared length values, wherein each of the N parallel operations implement a combination of a reciprocal square root function and a vector scaling function.
COMBINED DIVIDE/SQUARE ROOT PROCESSING CIRCUITRY AND METHOD
An apparatus comprises combined divide/square root processing circuitry to perform, in response to a divide instruction, a given radix-64 iteration of a radix-64 divide operation, and in response to a square root instruction, a given radix-64 iteration of a radix-64 square root operation; in which: the combined divide/square root processing circuitry comprises shared circuitry to generate at least one output value for the given radix-64 iteration on a same data path used for both the radix-64 divide operation and the radix-64 square root operation.
ON-THE-FLY CONVERSION
A data processing apparatus that converts a plurality of signed digits representing an input value in redundant representation comprises receiver circuitry to receive, at each of a plurality of iterations, a signed digit from the plurality of signed digits, and previous intermediate data from a previous iteration. Concatenation circuitry performs a concatenation of bits corresponding to the signed digit and bits of the previous intermediate data to produce updated intermediate data. Output circuitry provides the updated intermediate data as previous intermediate data of a next iteration. The previous intermediate data comprises S3[i] in non-redundant representation, which is at least part of the input value multiplied by 3 in non-redundant representation.
SQUARE ROOT PROCESSING CIRCUITRY AND METHOD
Square root processing circuitry performs a given radix-r iteration of a radix-r square root operation, by performing multiple radix-n sub-iterations in a same processing cycle, where n<r. The square root processing circuitry comprises, for a given radix-n sub-iteration: digit selection circuitry to select, based on a previous remainder estimate, a next radix-n result digit for a square root result; remainder update circuitry to adjust a previous remainder value to generate an updated remainder value; and remainder estimate circuitry to generate an updated remainder estimate indicative of an estimate of a portion of the updated remainder value. In a final radix-n sub-iteration of the given radix-r iteration, the remainder estimate circuitry generates the updated remainder estimate in parallel with the remainder update circuitry generating the updated remainder value.
SQUARE ROOT PROCESSING CIRCUITRY AND METHOD
Square root processing circuitry performs a given radix-r iteration of a radix-r square root operation, by performing multiple radix-n sub-iterations in a same processing cycle, where n<r. The square root processing circuitry comprises, for a given radix-n sub-iteration: digit selection circuitry to select, based on a previous remainder estimate, a next radix-n result digit for a square root result; remainder update circuitry to adjust a previous remainder value to generate an updated remainder value; and remainder estimate circuitry to generate an updated remainder estimate indicative of an estimate of a portion of the updated remainder value. In a final radix-n sub-iteration of the given radix-r iteration, the remainder estimate circuitry generates the updated remainder estimate in parallel with the remainder update circuitry generating the updated remainder value.
SQUARE ROOT CALCULATIONS ON AN ASSOCIATIVE PROCESSING UNIT
A method for calculating a square root B having N bits of a number X having 2N bits includes iterating on bits b.sub.i of square root B starting from the most significant bit until the least significant bit of square root B. For each iteration, the method includes locating a 1 at the squared location of bit b.sub.i in a CHECK variable, determining the value of bit b.sub.i from the result of a comparison of number X with a function of all previously found bits and a previous comparison outcome, shifting all previously found bits right 1 location in a CHECK variable, and adding the determined value of bit b.sub.i into its squared location in the CHECK variable.