Patent classifications
G06F7/48
Oblivious carry runway registers for performing piecewise additions
Methods and apparatus for piecewise addition into an accumulation register using one or more carry runway registers, where the accumulation register includes a first plurality of qubits with each qubit representing a respective bit of a first binary number and where each carry runway register includes multiple qubits representing a respective binary number. In one aspect, a method includes inserting the one or more carry runway registers into the accumulation register at respective predetermined qubit positions of the accumulation register; initializing each qubit of each carry runway register in a plus state; applying one or more subtraction operations to the accumulation register, where each subtraction operation subtracts a state of a respective carry runway register from a corresponding portion of the accumulation register; and adding one or more input binary numbers into the accumulation register using piecewise addition.
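The four steps in this abstract (insert runways, initialize, subtract, add piecewise) can be mimicked classically. The sketch below is only an analogy and not the claimed quantum method: a uniformly random integer stands in for a runway register prepared in the plus state, and the names `piecewise_add`, `cuts`, and `runway_bits` are hypothetical. It shows why the subtraction step leaves the represented value unchanged while letting each piece absorb its own carries.

```python
import random

def piecewise_add(initial, addends, n_bits, cuts, runway_bits, runways=None):
    """Classical stand-in for piecewise addition with carry runways.

    cuts: ascending bit positions (0 < cut < n_bits) where runways are inserted.
    runways: optional fixed runway values; random values are an assumption
    standing in for qubits initialized in the plus state.
    """
    bounds = [0] + list(cuts) + [n_bits]
    widths = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    if runways is None:
        runways = [random.getrandbits(runway_bits) for _ in cuts]

    # Subtract each runway value from the accumulator portion at and above its cut.
    acc = initial % (1 << n_bits)
    for cut, r in zip(cuts, runways):
        acc = (acc - (r << cut)) % (1 << n_bits)

    # Split into pieces; every piece except the top one carries its runway on top.
    pieces = []
    for j, (lo, width) in enumerate(zip(bounds, widths)):
        value = (acc >> lo) & ((1 << width) - 1)
        if j < len(cuts):
            value |= runways[j] << width
        pieces.append(value)

    # Piecewise addition: each piece is updated independently of the others.
    for a in addends:
        for j, (lo, width) in enumerate(zip(bounds, widths)):
            extra = runway_bits if j < len(cuts) else 0
            chunk = (a >> lo) & ((1 << width) - 1)
            pieces[j] = (pieces[j] + chunk) % (1 << (width + extra))

    # Recombine: runway contents fold back into the higher pieces.
    total = sum(p << lo for p, lo in zip(pieces, bounds))
    return total % (1 << n_bits)
```

In this classical model the result is exact unless a runway overflows, which requires the runway value plus the number of additions to reach 2**runway_bits; wider runways make that less likely.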
ON-THE-FLY CONVERSION
A data processing apparatus that converts a plurality of signed digits representing an input value in redundant representation comprises receiver circuitry to receive, at each of a plurality of iterations, a signed digit from the plurality of signed digits, and previous intermediate data from a previous iteration. Concatenation circuitry performs a concatenation of bits corresponding to the signed digit and bits of the previous intermediate data to produce updated intermediate data. Output circuitry provides the updated intermediate data as previous intermediate data of a next iteration. The previous intermediate data comprises S3[i] in non-redundant representation, which is at least part of the input value multiplied by 3 in non-redundant representation.
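For orientation, here is a minimal sketch of the textbook on-the-fly conversion scheme, in which each signed digit is appended by concatenation to one of two running values, Q or Q - 1. It does not model the S3[i] (times-three) intermediate data described in this abstract; the function name, the radix-4 default, and the use of Python integers in place of fixed-width registers are assumptions made for illustration.

```python
def on_the_fly_convert(digits, radix=4):
    """Convert signed digits (most significant first) to an ordinary integer.

    Keeps q (value so far) and qm (q - 1). Each step only concatenates one
    non-negative digit onto q or qm, so no carry propagation is required.
    """
    q, qm = 0, -1
    for d in digits:
        # Concatenation of register X with digit y is modeled as X * radix + y.
        if d >= 0:
            new_q = q * radix + d
        else:
            new_q = qm * radix + (radix + d)
        if d > 0:
            new_qm = q * radix + (d - 1)
        else:
            new_qm = qm * radix + (radix - 1 + d)
        q, qm = new_q, new_qm
    return q
```

For example, the radix-4 digit string 1, -2, 0, -1 represents 64 - 32 + 0 - 1 = 31, and the function returns 31 using concatenations only.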
CASCADED COMPUTING FOR CONVOLUTIONAL NEURAL NETWORKS
Techniques are described for efficiently reducing the amount of total computation in convolutional neural networks (CNNs) without affecting the output result or classification accuracy. Computation redundancy in CNNs is reduced by exploiting the computing nature of the convolution and subsequent pooling (e.g., sub-sampling) operations. In some implementations, the input features may be divided into a group of precision values and the operation(s) may be cascaded. A maximum may be identified (e.g., with 90% probability) using a small number of bits in the input features, and the full-precision convolution may then be performed on the maximum input. Accordingly, the total number of bits used to perform the convolution is reduced without affecting the output features or the final classification accuracy.
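The cascade idea can be illustrated with a small sketch: pick the likely maximum of a pooling window from a coarse convolution that uses only the top bits of each input, then compute the full-precision convolution for that one position. Everything here is an assumption for illustration (8-bit unsigned inputs, a dot product standing in for the convolution, and the hypothetical names `truncate` and `cascaded_max_pool`); the coarse pass can occasionally select the wrong position, which is why the abstract speaks of a probability.

```python
def truncate(value, keep_bits, total_bits=8):
    """Keep only the top keep_bits of an unsigned total_bits-wide value."""
    shift = total_bits - keep_bits
    return (value >> shift) << shift

def dot(patch, weights):
    """Dot product standing in for one convolution output."""
    return sum(v * w for v, w in zip(patch, weights))

def cascaded_max_pool(patches, weights, keep_bits=4):
    """Return (index, value): full-precision output of the coarse-pass winner."""
    coarse = [dot([truncate(v, keep_bits) for v in p], weights) for p in patches]
    winner = max(range(len(patches)), key=coarse.__getitem__)
    # Only the winning patch is evaluated at full precision.
    return winner, dot(patches[winner], weights)
```

With patches `[[200, 10], [90, 100], [15, 20]]` and weights `[1, 2]`, the coarse pass selects the second patch, whose full-precision output 290 is also the true maximum of the window.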
FLOATING POINT FUSED MULTIPLY ADD WITH REDUCED 1'S COMPLEMENT DELAY
A method includes receiving a carry-sum value corresponding to a first portion of inputs to an adder, and receiving a second value corresponding to a second portion of inputs to the adder that do not overlap the first portion. The method includes providing an intermediate sum of carry and sum values of the carry-sum value, which generates a carry out (Cout). The method further includes determining a sign of an incremented second value and a sign of a non-incremented second value; complementing or passing, responsive to the sign of the incremented result, the incremented result as a first output; complementing or passing, responsive to the sign of the non-incremented result, the non-incremented result as a second output; complementing or passing, responsive to Cout, the sign of the incremented result, and the sign of the non-incremented result, the intermediate sum as a third output; selecting one of the first and second outputs responsive to Cout; and providing a final sum comprising the third output and the selected output.
Method and apparatus for configuring a reduced instruction set computer processor architecture to execute a fully homomorphic encryption algorithm
Systems and methods for configuring a reduced instruction set computer processor architecture to execute fully homomorphic encryption (FHE) logic gates as a streaming topology. The method includes parsing sequential FHE logic gate code, transforming the FHE logic gate code into a set of code modules that each have an input and an output that is a function of the input and which do not pass control to other functions, creating a node wrapper around each code module, configuring at least one of the primary processing cores to implement the logic element equivalents of each element in a manner which operates in a streaming mode wherein data streams out of corresponding arithmetic logic units into the main memory and other ones of the plurality of arithmetic logic units.
CORDIC COMPUTATION OF SIN/COS USING COMBINED APPROACH IN ASSOCIATIVE MEMORY
A method for an associative memory device includes the steps of providing a look up table (LUT) with all possible solutions for N first iterations of a CORDIC algorithm, receiving a plurality of input angles, concurrently computing a location index for each angle of the plurality of angles and concurrently storing each index in a column of the associative memory device, copying a solution from the LUT in the location index to a plurality of columns associated with the index and concurrently performing M additional iterations of the CORDIC algorithm on the columns to compute a value of a trigonometric function for each angle.
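The combined approach can be sketched in scalar form: a table holds every possible outcome of the first N CORDIC rotations, an index derived from the angle selects one entry, and M further iterations refine it. This sketch does not model the associative memory or its column-parallel execution; the function names and the way the index is derived (replaying only the sign decisions on the angle) are assumptions, since the abstract does not say how the location index is computed.

```python
import math

def build_lut(n):
    """All 2**n outcomes of the first n rotations, indexed by sign pattern."""
    lut = []
    for pattern in range(2 ** n):
        x, y, rotated = 1.0, 0.0, 0.0
        for i in range(n):
            d = 1 if (pattern >> i) & 1 else -1
            x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
            rotated += d * math.atan(2.0 ** -i)
        lut.append((x, y, rotated))
    return lut

def location_index(theta, n):
    """Sign pattern of the first n rotations for angle theta (assumed scheme)."""
    z, index = theta, 0
    for i in range(n):
        d = 1 if z >= 0 else -1
        if d == 1:
            index |= 1 << i
        z -= d * math.atan(2.0 ** -i)
    return index

def cordic_sin_cos(theta, lut, n, m):
    """Return (sin, cos) of theta using the LUT entry plus m more iterations."""
    x, y, rotated = lut[location_index(theta, n)]
    z = theta - rotated
    for i in range(n, n + m):
        d = 1 if z >= 0 else -1
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    # Remove the accumulated CORDIC gain of all n + m rotations.
    gain = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n + m))
    return y / gain, x / gain
```

A call such as `cordic_sin_cos(0.3, build_lut(4), 4, 20)` approximates `(math.sin(0.3), math.cos(0.3))`; input angles are assumed to lie within the usual CORDIC convergence range of roughly ±1.74 radians.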
INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM
An information processing system according to an embodiment is configured to: acquire numerical representations and combination ratios for a plurality of component objects; acquire numerical representations for a plurality of reference objects; calculate a plurality of component feature vectors and a plurality of reference feature vectors by inputting the numerical representations of each of the plurality of component objects and the plurality of reference objects into a first machine learning model; calculate a probability vector for each of the plurality of component objects by inputting those feature vectors into a second machine learning model; and calculate a composite feature vector for a composite object obtained by combining the plurality of component objects, based on a plurality of probability vectors and a plurality of combination ratios.
ACCELERATION OF ELLIPTIC CURVE-BASED ISOGENY CRYPTOSYSTEMS
Provided are embodiments for a circuit for performing hardware acceleration for elliptic curve cryptography (ECC). The circuit includes a code array comprising instructions for performing complex modular arithmetic; and a data array storing values corresponding to one or more complex numbers. The modular arithmetic unit includes a first multiplier and a first accumulation unit, a second multiplier and a second accumulation unit, and a third multiplier and a third accumulation unit, wherein the first, second, and third multiplier and accumulation units are cascaded and configured to perform hardware computation of complex modular operations. Also provided are embodiments of a computer program product and a method for performing the hardware acceleration of super-singular isogeny key encryption (SIKE) operations.
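As background on the "complex modular operations" mentioned here, the sketch below multiplies two elements a + bi of the field with i² = -1 modulo a prime p using three modular multiplications (a Karatsuba-style arrangement). Whether the claimed three multiplier and accumulation units are organized this way is not stated in the abstract, so this is an illustrative assumption; the function name and the toy prime 431 are likewise hypothetical choices.

```python
def fp2_mul(u, v, p):
    """Multiply u = (a, b) and v = (c, d), read as a + b*i and c + d*i, mod p.

    Assumes i*i = -1, which gives a field when p is a prime with p % 4 == 3.
    Uses three modular multiplications instead of the schoolbook four.
    """
    a, b = u
    c, d = v
    ac = a * c % p
    bd = b * d % p
    cross = (a + b) * (c + d) % p  # equals ac + ad + bc + bd
    return ((ac - bd) % p, (cross - ac - bd) % p)
```

For instance, with p = 431, `fp2_mul((3, 5), (7, 11), 431)` gives `(397, 68)`, matching (3·7 - 5·11) mod 431 and 3·11 + 5·7.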