G06F7/575

CONSTANT TIME SECURE ARITHMETIC-TO-BOOLEAN MASK CONVERSION
20210406406 · 2021-12-30 ·

A first arithmetic input share and a second arithmetic input share of an initial arithmetically-masked cryptographic value are received. A sequence of operations using the arithmetic input shares and a randomly generated number is performed, where a current operation in the sequence of operations generates a corresponding intermediate value that is used in a subsequent operation. At the end of the sequence of operations, a first Boolean output share and a second Boolean output share are generated. The arithmetic-to-Boolean mask conversion is independent of the input bit length.

ADDER CIRCUIT USING LOOKUP TABLES
20220206758 · 2022-06-30 ·

A four-input lookup table (“LUT4”) is modified to operate in a first mode as an ordinary LUT4 and in a second mode as a 1-bit adder providing a sum output and a carry output. A six-input lookup table (“LUT6”) is modified to operate in a first mode as an ordinary LUT6 with a single output and in a second mode as a 2-bit adder providing a sum output and a carry output. Both possible results for the two different possible carry inputs can be determined and selected between when the carry input is available, implementing a 2-bit carry-select adder when in the second mode and retaining the ability to operate as an ordinary LUT6 in the first mode. Using the novel LUT6 design in a circuit chip fabric allows a 2-bit adder slice to be built that efficiently makes use of the LUT6 without requiring additional logic blocks.

ADDER CIRCUIT USING LOOKUP TABLES
20220206758 · 2022-06-30 ·

A four-input lookup table (“LUT4”) is modified to operate in a first mode as an ordinary LUT4 and in a second mode as a 1-bit adder providing a sum output and a carry output. A six-input lookup table (“LUT6”) is modified to operate in a first mode as an ordinary LUT6 with a single output and in a second mode as a 2-bit adder providing a sum output and a carry output. Both possible results for the two different possible carry inputs can be determined and selected between when the carry input is available, implementing a 2-bit carry-select adder when in the second mode and retaining the ability to operate as an ordinary LUT6 in the first mode. Using the novel LUT6 design in a circuit chip fabric allows a 2-bit adder slice to be built that efficiently makes use of the LUT6 without requiring additional logic blocks.

Scheduling tasks using targeted pipelines

A method of scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state, and checking, by an instruction controller, if an ALU targeted by the decoded instruction is a primary instruction pipeline. If the targeted ALU is a primary instruction pipeline, a list associated with the primary instruction pipeline is checked to determine whether the scheduled task is already included in the list. If the scheduled task is already included in the list, the decoded instruction is sent to the primary instruction pipeline.

Scheduling tasks using targeted pipelines

A method of scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state, and checking, by an instruction controller, if an ALU targeted by the decoded instruction is a primary instruction pipeline. If the targeted ALU is a primary instruction pipeline, a list associated with the primary instruction pipeline is checked to determine whether the scheduled task is already included in the list. If the scheduled task is already included in the list, the decoded instruction is sent to the primary instruction pipeline.

MULTI-DIMENSION DMA CONTROLLER AND COMPUTER SYSTEM INCLUDING THE SAME

Disclosed is a multi-dimension DMA controller for performing a direct memory access (DMA) of multi-dimension data stored in a memory, according to the present disclosure, which includes a descriptor including a microcode descriptor, a normal descriptor, and a three-dimensional blob descriptor for accessing the multi-dimension data, a microcode controller that executes an instruction included in the microcode descriptor, and a transmission controller that automatically transmits at least a portion of the multi-dimension data depending on a parameter stored in the descriptors.

MULTI-DIMENSION DMA CONTROLLER AND COMPUTER SYSTEM INCLUDING THE SAME

Disclosed is a multi-dimension DMA controller for performing a direct memory access (DMA) of multi-dimension data stored in a memory, according to the present disclosure, which includes a descriptor including a microcode descriptor, a normal descriptor, and a three-dimensional blob descriptor for accessing the multi-dimension data, a microcode controller that executes an instruction included in the microcode descriptor, and a transmission controller that automatically transmits at least a portion of the multi-dimension data depending on a parameter stored in the descriptors.

COMPUTATIONAL MEMORY
20220171829 · 2022-06-02 ·

A processing device includes a two-dimensional array of processing elements, each processing element including an arithmetic logic unit to perform an operation. The device further includes interconnections among the two-dimensional array of processing elements to provide direct communication among neighboring processing elements of the two-dimensional array of processing elements. A processing element of the two-dimensional array of processing elements is connected to a first neighbor processing element that is immediately adjacent the processing element in a first dimension of the two-dimensional array. The processing element is further connected to a second neighbor processing element that is immediately adjacent the processing element in a second dimension of the two-dimensional array.

COMPUTATIONAL MEMORY
20220171829 · 2022-06-02 ·

A processing device includes a two-dimensional array of processing elements, each processing element including an arithmetic logic unit to perform an operation. The device further includes interconnections among the two-dimensional array of processing elements to provide direct communication among neighboring processing elements of the two-dimensional array of processing elements. A processing element of the two-dimensional array of processing elements is connected to a first neighbor processing element that is immediately adjacent the processing element in a first dimension of the two-dimensional array. The processing element is further connected to a second neighbor processing element that is immediately adjacent the processing element in a second dimension of the two-dimensional array.

BIT STRING ACCUMULATION
20220156046 · 2022-05-19 ·

Systems, apparatuses, and methods related to bit string accumulation are described. A method for bit string accumulation can include performing an iteration of a recursive operation using a first bit string and a second bit string and modifying a quantity of bits of a result of the iteration of the recursive operation, wherein the modified quantity of bits is less than a threshold quantity of bits. The method can further include writing a first value comprising the modified bits indicative of the result of the iteration of the recursive operation to a first register and writing a second value indicative of the factor corresponding to the result of the iteration of the recursive operation to a second register.