G06F7/552

PROCESSING UNIT, METHOD AND COMPUTER PROGRAM FOR MULTIPLYING AT LEAST TWO MULTIPLICANDS
20210224037 · 2021-07-22

A processing unit and a method for multiplying at least two multiplicands. The multiplicands are present in an exponential notation, that is, each multiplicand is assigned an exponent and a base. The processing unit is configured to carry out a multiplication of the multiplicands and includes at least one bitshift unit, the bitshift unit shifting a binary number a specified number of places, in particular, to the left; an arithmetic unit, which carries out an addition of two input variables and a subtraction of two input variables; and a storage device. A computer program, which is configured to execute the method, and a machine-readable storage element, in which the computer program is stored, are also described.
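
As a minimal sketch, assume every multiplicand uses base 2 and an integer exponent. The abstract does not fix the base. Under that assumption, the product of two such numbers needs only an exponent addition in the arithmetic unit and a left shift in the bitshift unit. The function names and the fixed-point output format with `frac_bits` fractional bits are hypothetical choices for illustration. The abstract's use of subtraction is not modeled here.

```python
def multiply_exponential(e_a: int, e_b: int, frac_bits: int = 16) -> int:
    """Multiply 2**e_a by 2**e_b (hypothetical base-2 assumption).

    Returns an integer with `frac_bits` fractional bits, i.e. the real
    value of the product is result / 2**frac_bits.
    """
    e_sum = e_a + e_b              # arithmetic unit: add the exponents
    shift = e_sum + frac_bits      # position of the single set bit
    if shift < 0:
        raise ValueError("product underflows the fixed-point format")
    return 1 << shift              # bitshift unit: shift a 1 to the left


def scale_fixed_point(x_fixed: int, e: int) -> int:
    """Multiply a fixed-point value by 2**e using only a shift.

    Negative e shifts right, which floors (drops low-order bits).
    """
    return x_fixed << e if e >= 0 else x_fixed >> -e
```

For example, `multiply_exponential(-3, 1) / 2**16` evaluates to 0.25, that is, 2^-3 · 2^1 = 2^-2.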

ARITHMETIC PROCESSING APPARATUS AND CONTROL METHOD FOR ARITHMETIC PROCESSING APPARATUS
20210224040 · 2021-07-22

An arithmetic processing apparatus computes a square root of a radicand and includes: a memory; and a processor coupled to the memory and configured to: determine a part of a bit string of a quotient; calculate a first partial remainder from the bit string and a partial remainder by performing a first operation, namely the part of the partial remainder operation other than the exponentiation operation; and calculate the partial remainder by performing a second operation that includes the exponentiation operation, using the first partial remainder and the bit string.
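
The structure described above, with result bits produced one step at a time and a partial remainder updated after each step, is that of digit-recurrence square root. Below is a minimal sketch of the textbook restoring, radix-2 version on non-negative integers. It does not reproduce the patented split of the remainder update into two operations, and the name `isqrt_digit_recurrence` is hypothetical. The point it shows is that the squaring in the remainder update reduces to an add and a shift: appending a 1 bit to a root r changes 4r^2 into (2r+1)^2 = 4r^2 + 4r + 1, so only 4r + 1 has to be subtracted.

```python
from typing import Tuple


def isqrt_digit_recurrence(radicand: int) -> Tuple[int, int]:
    """Restoring binary square root, one root bit per iteration.

    Returns (root, remainder) with root = floor(sqrt(radicand)) and
    remainder = radicand - root**2.
    """
    if radicand < 0:
        raise ValueError("radicand must be non-negative")
    root = 0
    remainder = 0  # partial remainder
    pairs = (radicand.bit_length() + 1) // 2
    for i in range(pairs - 1, -1, -1):
        # Bring down the next two radicand bits into the partial remainder.
        remainder = (remainder << 2) | ((radicand >> (2 * i)) & 0b11)
        # Trial subtrahend 4r + 1 replaces an explicit squaring step.
        trial = (root << 2) | 1
        if remainder >= trial:
            remainder -= trial
            root = (root << 1) | 1  # next root bit is 1
        else:
            root <<= 1              # next root bit is 0
    return root, remainder
```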

Computational Units for Batch Normalization

Herein are disclosed computation units for batch normalization. A computation unit may include a first circuit to traverse a batch of input elements x_i having a first format, to produce a mean μ_1 in the first format and a mean μ_2 in a second format, the second format having more bits than the first format. The computation unit may further include a second circuit, operatively coupled to the first circuit, to traverse the batch of input elements x_i to produce a standard deviation σ for the batch using the mean μ_1 in the first format. The computation unit may also include a third circuit, operatively coupled to the second circuit, to traverse the batch of input elements x_i to produce a normalized set of values y_i using the mean μ_2 in the second format and the standard deviation σ.
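
The three traversals can be modeled as three passes in software. This is a minimal sketch under stated assumptions. The first format is taken to be IEEE 754 binary16, emulated by rounding through `struct`'s `e` format. The second format is taken to be binary64, which is Python's `float`. The abstract names neither format. `to_fp16`, `batch_norm`, and the stabilizing constant `eps` are hypothetical. The arithmetic inside each pass runs in binary64; only the stored means are rounded to their formats.

```python
import math
import struct
from typing import List


def to_fp16(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def batch_norm(xs: List[float], eps: float = 1e-5):
    """Three-pass batch normalization with a narrow mean and a wide mean.

    Assumed first format: binary16 (emulated). Assumed second format: binary64.
    """
    if not xs:
        raise ValueError("empty batch")
    xs = [to_fp16(x) for x in xs]          # inputs held in the first format
    n = len(xs)
    # Pass 1: one traversal yields the mean in both formats.
    mu2 = math.fsum(xs) / n                # second (wider) format
    mu1 = to_fp16(mu2)                     # first (narrower) format
    # Pass 2: standard deviation around the narrow mean mu1
    # (the differences themselves are computed in binary64 here).
    var = math.fsum((x - mu1) ** 2 for x in xs) / n
    sigma = math.sqrt(var + eps)
    # Pass 3: normalize with the wide mean mu2 and sigma.
    ys = [(x - mu2) / sigma for x in xs]
    return ys, mu1, mu2, sigma
```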

USE OF A SINGLE INSTRUCTION SET ARCHITECTURE (ISA) INSTRUCTION FOR VECTOR NORMALIZATION

Embodiments described herein are generally directed to an improved vector normalization instruction. An embodiment of a method includes, responsive to receipt by a GPU of a single instruction specifying a vector normalization operation to be performed on V vectors: (i) generating V squared length values, N at a time, by a first processing unit, which, for each N sets of inputs (each set representing the multiple components of one of N of the vectors), performs N parallel dot product operations on the N sets of inputs; and (ii) generating V sets of outputs representing the multiple normalized components of the V vectors, N at a time, by a second processing unit, which, for each N of the V squared length values, performs N parallel operations on those N squared length values, wherein each of the N parallel operations implements a combination of a reciprocal square root function and a vector scaling function.
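
The two-stage dataflow can be modeled sequentially. Dot products first produce squared lengths. A combined reciprocal-square-root-and-scale step then produces the normalized vectors. In this minimal sketch, `n_lanes` stands in for N, and each inner loop over a group stands in for N parallel hardware operations. The names are hypothetical, and no actual GPU instruction or ISA encoding is modeled.

```python
import math
from typing import List


def normalize_vectors(vectors: List[List[float]], n_lanes: int = 4) -> List[List[float]]:
    """Normalize every vector to unit length, n_lanes vectors at a time."""
    out = []
    for start in range(0, len(vectors), n_lanes):
        group = vectors[start:start + n_lanes]
        # Stage 1 (first unit): N dot products v . v give N squared lengths.
        sq_lengths = [math.fsum(c * c for c in v) for v in group]
        # Stage 2 (second unit): reciprocal square root, then scale each vector.
        for v, sq in zip(group, sq_lengths):
            if sq == 0.0:
                raise ValueError("cannot normalize a zero-length vector")
            inv_len = 1.0 / math.sqrt(sq)
            out.append([c * inv_len for c in v])
    return out
```

Folding the reciprocal square root and the scaling into one stage means each vector's length is never materialized as a separate divide. This is the same idea the single instruction exploits.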

Small multiplier after initial approximation for operations with increasing precision
10983756 · 2021-04-20

In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, for evaluating functions such as square root and reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation, which can include a lookup table (LUT). The LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value with a given number of bits of precision. A limited-precision multiplier multiplies that initial approximation by another value; the output of the limited-precision multiplier goes to a full-precision multiplier circuit that performs the remaining multiplications required for the iteration(s) of the particular refinement process being implemented. For example, in division, the value being calculated is the reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within that first number of clock cycles.
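
This is a minimal Python sketch of the division example. A 16-entry table, indexed by the leading fraction bits of a divisor normalized to [1, 2), seeds the Newton-Raphson reciprocal iteration x ← x(2 − d·x). Because the seed has only a few significant bits, the first product is formed by a "small multiplier". That multiplier is emulated here by rounding the divisor to 12 fraction bits. All table sizes and widths (`LUT_BITS`, `LUT_FRAC_BITS`, `SMALL_MULT_BITS`) are hypothetical. Python floats stand in for the hardware multipliers, so clock-cycle overlap is not modeled.

```python
import math

LUT_BITS = 4          # hypothetical: table indexed by 4 fraction bits of d
LUT_FRAC_BITS = 8     # hypothetical: each entry stored with 8 fraction bits
SMALL_MULT_BITS = 12  # hypothetical width of the limited-precision multiplier


def _round_to_bits(x: float, frac_bits: int) -> float:
    scale = 1 << frac_bits
    return round(x * scale) / scale


# Entry i approximates 1/d at the midpoint of [1 + i/16, 1 + (i+1)/16).
LUT = [_round_to_bits(1.0 / (1.0 + (i + 0.5) / (1 << LUT_BITS)), LUT_FRAC_BITS)
       for i in range(1 << LUT_BITS)]


def reciprocal(d: float, iterations: int = 4) -> float:
    """Newton-Raphson reciprocal of d in [1, 2): one narrow step, then full ones."""
    if not 1.0 <= d < 2.0:
        raise ValueError("d must be normalized to [1, 2)")
    x = LUT[int((d - 1.0) * (1 << LUT_BITS))]   # initial approximation, ~5 bits
    # First iteration: d * x via the emulated limited-precision multiplier.
    x = x * (2.0 - _round_to_bits(d, SMALL_MULT_BITS) * x)
    # Remaining iterations use full precision; each roughly doubles correct bits.
    for _ in range(iterations - 1):
        x = x * (2.0 - d * x)
    return x


def divide(a: float, b: float) -> float:
    """Compute a / b as a * (1 / b), normalizing b into [1, 2) first."""
    if b == 0.0:
        raise ZeroDivisionError("division by zero")
    m, e = math.frexp(abs(b))                    # abs(b) = m * 2**e, 0.5 <= m < 1
    r = reciprocal(2.0 * m) * math.ldexp(1.0, 1 - e)
    return math.copysign(1.0, b) * a * r
```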

Execution unit for evaluating functions using Newton Raphson iterations
11847428 · 2023-12-19

An execution unit for a processor, the execution unit comprising: a lookup table having a plurality of entries, each of the plurality of entries comprising an initial estimate for a result of an operation; a preparatory circuit configured to search the lookup table using an index value dependent upon an operand of the operation to locate an entry comprising a first initial estimate for a result of the operation; a plurality of processing circuits comprising at least one multiplier circuit; and control circuitry configured to provide the first initial estimate to the at least one multiplier circuit of the plurality of processing circuits so as to perform processing, by the plurality of processing circuits, of the first initial estimate to generate the function result, said processing comprising applying one or more Newton-Raphson iterations to the first initial estimate.
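
This minimal sketch assumes the function is a reciprocal square root; the abstract does not name one. A preparatory step normalizes the operand to m in [1, 4) with an even exponent. The leading bits of m then index a 16-entry table of initial estimates. Newton-Raphson iterations y ← y(3 − m·y²)/2 refine that estimate. The table size, `TABLE_BITS`, and `rsqrt` are hypothetical.

```python
import math

TABLE_BITS = 4  # hypothetical: 16 entries covering m in [1, 4)
TABLE = [1.0 / math.sqrt(1.0 + 3.0 * (i + 0.5) / (1 << TABLE_BITS))
         for i in range(1 << TABLE_BITS)]


def rsqrt(x: float, iterations: int = 4) -> float:
    """Reciprocal square root via table lookup plus Newton-Raphson iterations."""
    if not x > 0.0 or math.isinf(x):
        raise ValueError("x must be positive and finite")
    f, e = math.frexp(x)          # x = f * 2**e, 0.5 <= f < 1
    m, k = 2.0 * f, e - 1         # x = m * 2**k, 1 <= m < 2
    if k % 2:                     # make the exponent even so that 1 <= m < 4
        m, k = 2.0 * m, k - 1
    # Preparatory step: index derived from the operand's leading bits.
    index = min(int((m - 1.0) * (1 << TABLE_BITS) / 3.0), (1 << TABLE_BITS) - 1)
    y = TABLE[index]              # first initial estimate (~4-5 correct bits)
    for _ in range(iterations):   # y <- y * (3 - m * y**2) / 2
        y = y * (1.5 - 0.5 * m * y * y)
    # rsqrt(m * 2**k) = rsqrt(m) * 2**(-k/2); k is even, so this is exact.
    return math.ldexp(y, -k // 2)
```

Forcing an even exponent keeps the final rescaling an exact power of two. The table therefore only has to cover a single factor-of-four range of m.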

Secure web browsing via homomorphic encryption
10972251 · 2021-04-06

Systems and methods for end-to-end encryption of a web browsing process are described herein. A web query is encrypted at a client using a homomorphic encryption scheme. The encrypted query is sent to a server where the encrypted query is evaluated over web content to generate an encrypted response without decrypting the encrypted query and without decrypting the response. The encrypted response is sent to the client where it is decrypted to obtain the results of the query without revealing the query or results to the owner of the web content, an observer, or an attacker.
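
The abstract names no encryption scheme or query model. As a toy illustration only, this sketch assumes an additively homomorphic scheme (Paillier) with demonstration-sized, insecure primes. It also assumes the "query" is a private selection of one item from the server's content. The client encrypts a one-hot selection vector. The server combines the ciphertexts with its plaintext content and never decrypts. Only the client can read the selected item. `keygen`, `client_query`, `server_answer`, and the example content are hypothetical. Real web queries such as keyword search or ranking would need much richer homomorphic evaluation.

```python
import math
import secrets

# Toy Paillier parameters: known primes, far too small for real security.
P, Q = 999_983, 1_000_003


def keygen(p: int = P, q: int = Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)       # valid because the generator is g = n + 1
    return n, (n, lam, mu)     # (public key, private key)


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(sk, c: int) -> int:
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def client_query(n: int, size: int, wanted: int):
    """Client: encrypt a one-hot selection vector for the wanted item."""
    return [encrypt(n, 1 if i == wanted else 0) for i in range(size)]


def server_answer(n: int, content, query) -> int:
    """Server: prod(c_i ** content_i) = Enc(sum(sel_i * content_i)), no decryption."""
    n2 = n * n
    acc = 1
    for c, value in zip(query, content):
        acc = acc * pow(c, value, n2) % n2
    return acc
```

Each content value must be smaller than n. The server learns neither which index was requested nor the decrypted result.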