G06F7/5324

Device and method for accelerating matrix multiply operations

A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.

MULTI-PARTITIONING DATA FOR COMBINATION OPERATIONS
20230144450 · 2023-05-11 ·

Systems and methods are disclosed for processing and executing queries against one or more dataset. As part of processing the query, the system determines whether the query is susceptible to a significantly imbalanced partition. In the event, the query is susceptible to an imbalanced partition, the system monitors the query and determines whether to perform a multi-partitioning determination to avoid a significantly imbalanced partition.

DEVICE AND METHOD FOR MULTIPLICATION FOR IMPEDING SIDE-CHANNEL ATTACKS

A device for multiplying two bit sequences has a controller that selects and activates exactly one multiplier unit from a plurality of parallel multiplier units, according to a random signal. A partial multiplier unit shared by all the multiplier units receives and multiplies operands formed by the respectively activated multiplier unit. Each multiplier unit implements a different multiplication method with a respective selector unit that selects segments of the bit sequences to be multiplied, in accordance with a selection plan adapted to the respective multiplication method, to form operands from one or more segments and outputs the operands. The respective accumulation unit receives step by step partial products from the partial multiplier unit, accumulates the partial products in accordance with an accumulation plan adapted to the implemented multiplication method and matching the selection plan, and outputs the calculated product of after accumulation has been completed.

Multiple mode arithmetic circuit

A tile of an FPGA includes a multiple mode arithmetic circuit. The multiple mode arithmetic circuit is configured by control signals to operate in an integer mode, a floating-point mode, or both. In some example embodiments, multiple integer modes (e.g., unsigned, two's complement, and sign-magnitude) are selectable, multiple floating-point modes (e.g., 16-bit mantissa and 8-bit sign, 8-bit mantissa and 6-bit sign, and 6-bit mantissa and 6-bit sign) are supported, or any suitable combination thereof. The tile may also fuse a memory circuit with the arithmetic circuits. Connections directly between multiple instances of the tile are also available, allowing multiple tiles to be treated as larger memories or arithmetic circuits. By using these connections, referred to as cascade inputs and outputs, the input and output bandwidth of the arithmetic circuit is further increased.

AREA AND ENERGY EFFICIENT MULTI-PRECISION MULTIPLY-ACCUMULATE UNIT-BASED PROCESSOR

Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein the plurality of arithmetic blocks each contain multiple multipliers, and wherein the logic is to combine multipliers one or more of within each arithmetic block or across multiple arithmetic blocks. In one example, one or more intermediate multipliers are of a size that is less than precisions supported by arithmetic blocks containing the one or more intermediate multipliers.

Device For Performing Multiply/Accumulate Operations
20210382689 · 2021-12-09 ·

A circuit for performing multiply/accumulate operations evaluates a type of each value of a pair of input values. Signed values are split into sign and magnitude. One or more pairs of arguments are input to a multiplier such that the arguments have fewer bits than the magnitude of signed values or unsigned values. This may include splitting input values into multiple arguments and inputting multiple pairs of arguments to the multiplier for a single pair of input values.

METHOD FOR MULTIPLYING POLYNOMIALS FOR A CRYPTOGRAPHIC OPERATION

A method is provided for multiplying two polynomials. In the method, first and second polynomials are evaluated at 2t inputs, where t is greater than or equal to one, and where each input is a fixed power of two custom-character multiplied with a different power of a primitive root of unity, thereby creating 2 times 2t integers, where custom-character is an integer such that custom-character is at least as large as the largest coefficient of the resulting product multiplying the first and second polynomials. The 2 times 2t integers are then multiplied pairwise, and a modular reduction is performed to get 2t integers. A linear combination of the 2t integers multiplied with primitive roots of unity is computed to get 2t integers whose limbs in the base custom-character-bit representation correspond to coefficients of the product of the first and second polynomials. The method can be implemented on a processor designed for performing RSA and/or ECC type cryptographic operations.

COMPUTING APPARATUS AND METHOD, BOARD CARD, AND COMPUTER READABLE STORAGE MEDIUM
20220188071 · 2022-06-16 ·

The present disclosure relates to a computing device for processing a multi-bit width value, an integrated circuit board card, a method, and a computer readable storage medium. The computing device may be included in a combined processing apparatus, and the combined processing apparatus may further include a general interconnection interface, and an other processing device. The computing device interacts with the other processing device to jointly complete a computing operation specified by a user. The combined processing apparatus may further include a storage device connected to an apparatus and the other processing device and configured to store data of the apparatus and the other processing device. The solution of the present disclosure can split the multi-bit width value so that the processing capability of the processor is not influenced by the bit width.

SYSTEMS AND METHODS FOR CALCULATING LARGE POLYNOMIAL MULTIPLICATIONS
20220188072 · 2022-06-16 ·

This disclosure is directed to multiplier circuitry that includes a multiplier that is configurable to generate a plurality of subproducts by performing a plurality of multiplication operations involving values having a first precision using a recursive multiplication process in which a second multiplier of the multiplier performs a second plurality of multiplication operations involving values having a second precision that are derived from the values having the first precision.

COMPUTATIONAL MEMORY
20220171829 · 2022-06-02 ·

A processing device includes a two-dimensional array of processing elements, each processing element including an arithmetic logic unit to perform an operation. The device further includes interconnections among the two-dimensional array of processing elements to provide direct communication among neighboring processing elements of the two-dimensional array of processing elements. A processing element of the two-dimensional array of processing elements is connected to a first neighbor processing element that is immediately adjacent the processing element in a first dimension of the two-dimensional array. The processing element is further connected to a second neighbor processing element that is immediately adjacent the processing element in a second dimension of the two-dimensional array.