Patent classifications
G06F7/4981
BIGNUM ADDITION AND/OR SUBTRACTION WITH CARRY PROPAGATION
A processing unit includes a plurality of adders and a plurality of carry bit generation circuits. The adders add first and second X-bit binary portions of a first Y-bit binary value and a second Y-bit binary value, where Y is a multiple of X, and generate first carry bits. The carry bit generation circuits are coupled to the respective adders and receive the first carry bits, from which they generate second carry bits. The adders then use the second carry bits to add the first and second X-bit binary portions of the first and second Y-bit binary values, respectively.
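The two-pass scheme in this abstract can be modeled in software. The following is a minimal sketch, not the patented circuit: the function name `limb_add`, the default widths (X = 16, Y = 64), and the rule that a limb propagates an incoming carry when its partial sum is all ones are assumptions made for illustration.

```python
def limb_add(a, b, x_bits=16, y_bits=64):
    """Add two unsigned y_bits-wide integers limb by limb.

    Returns (sum modulo 2**y_bits, final carry-out).
    """
    assert y_bits % x_bits == 0, "Y must be a multiple of X"
    mask = (1 << x_bits) - 1
    n = y_bits // x_bits
    a_limbs = [(a >> (i * x_bits)) & mask for i in range(n)]
    b_limbs = [(b >> (i * x_bits)) & mask for i in range(n)]

    # Pass 1: every limb is added independently, with no carry-in.
    partial = [x + y for x, y in zip(a_limbs, b_limbs)]
    first_carry = [p >> x_bits for p in partial]  # "first carry bits"
    # Assumption: a limb passes an incoming carry along when its
    # partial sum is all ones.
    propagate = [int((p & mask) == mask) for p in partial]

    # Carry generation: second_carry[i] is the carry into limb i.
    second_carry = [0] * (n + 1)
    for i in range(n):
        second_carry[i + 1] = first_carry[i] | (propagate[i] & second_carry[i])

    # Pass 2: add the same limbs again, now with the resolved carry-in.
    result = 0
    for i in range(n):
        s = (a_limbs[i] + b_limbs[i] + second_carry[i]) & mask
        result |= s << (i * x_bits)
    return result, second_carry[n]
```

In this model the carry loop is sequential; hardware could resolve the same recurrence with parallel logic, which the sketch does not attempt to show.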
SCATTER REDUCTION INSTRUCTION
Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a first register value and perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.
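The sequence described above (one operation applied to several inputs against a shared register value, followed by a horizontal add and a store) can be sketched as follows. This is an illustrative scalar model only: the function name `op_then_horizontal_add`, the choice of multiplication as the operation, and the dictionary standing in for memory are all assumptions, and the lanes are simulated one after another rather than run in parallel.

```python
def op_then_horizontal_add(inputs, register_value, op):
    """Apply op(input, register_value) per lane, then sum across lanes."""
    # Each list element models one SIMD lane; real hardware would
    # compute these lane results in parallel.
    lane_results = [op(x, register_value) for x in inputs]
    # Horizontal add: reduce the lane results to a single value.
    return sum(lane_results)


# Hypothetical usage: two input values share one register value.
memory = {}
memory["result"] = op_then_horizontal_add([3, 5], 2, lambda x, r: x * r)
```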
Apparatus and Methods of Providing an Efficient Radix-R Fast Fourier Transform
In some embodiments, an apparatus can include a memory configured to store data at a plurality of addresses and a generalized radix-r fast Fourier transform (FFT) processor configured to determine a plurality of FFTs for a Discrete Fourier Transform (DFT) of any positive integer size by utilizing three counters to access the data and the coefficient multipliers at each stage of the FFT processor.
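The abstract does not say what the three counters count. As a rough illustration of counter-driven addressing, the sketch below uses the textbook iterative radix-2 FFT (the r = 2 special case, for power-of-two lengths only), in which a stage counter, a group counter, and a butterfly counter together determine both the data addresses and the twiddle factor. That mapping of counters is an assumption, and `fft_radix2` is a hypothetical name; this is not the generalized radix-r design claimed in the patent.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT for power-of-two lengths."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    stages = n.bit_length() - 1

    # Reorder the input into bit-reversed address order.
    data = [0j] * n
    for i in range(n):
        rev = 0
        for bit in range(stages):
            if i & (1 << bit):
                rev |= 1 << (stages - 1 - bit)
        data[rev] = complex(x[i])

    for stage in range(stages):              # counter 1: FFT stage
        half = 1 << stage
        span = half << 1
        for group in range(0, n, span):      # counter 2: butterfly group
            for k in range(half):            # counter 3: butterfly in group
                # The coefficient (twiddle factor) depends only on k and span.
                w = cmath.exp(-2j * cmath.pi * k / span)
                top = data[group + k]
                bottom = data[group + k + half] * w
                data[group + k] = top + bottom
                data[group + k + half] = top - bottom
    return data
```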
Connectivity in coarse grained reconfigurable architecture
A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, the tiles can be arranged in a one-dimensional array and each tile can be coupled to its respective adjacent neighbor tiles using a direct bus coupling. Each tile can be further coupled to at least one non-adjacent neighbor tile that is one tile, or device space, away using a passthrough bus. The passthrough bus can extend through intervening tiles.
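The one-dimensional example above can be pictured as a small adjacency table. The sketch below is an assumption-laden model: it reads "one tile away" as index distance 2 (exactly one intervening tile), and the function name `tile_links` and the dictionary layout are hypothetical.

```python
def tile_links(num_tiles):
    """Return, for each tile index, its direct and passthrough neighbors."""
    links = {}
    for i in range(num_tiles):
        # Direct bus: the immediately adjacent tiles.
        direct = [j for j in (i - 1, i + 1) if 0 <= j < num_tiles]
        # Passthrough bus (assumed): tiles two positions away, reached
        # through the single intervening tile.
        passthrough = [j for j in (i - 2, i + 2) if 0 <= j < num_tiles]
        links[i] = {"direct": direct, "passthrough": passthrough}
    return links
```

For a five-tile array under these assumptions, the middle tile (index 2) has direct links to tiles 1 and 3 and passthrough links to tiles 0 and 4.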
NON-LINEAR FUNCTION COMPUTING APPARATUS AND NON-LINEAR FUNCTION COMPUTING METHOD
The present embodiment relates to a computing apparatus for computing an interpolated non-linear activation function for an input. The computing apparatus includes a plurality of unit processing elements (PEs), and each unit PE includes: a multiplier that multiplies the input and an output of an accumulator; an adder that adds the output of the multiplier and a coefficient of the interpolated non-linear activation function; and an accumulator that accumulates and outputs the output of the adder.
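The multiply, add, accumulate loop of a single unit PE has the same shape as Horner's rule for evaluating a polynomial. The sketch below models one PE under assumptions not stated in the abstract: the accumulator starts at zero, the coefficients arrive highest degree first, and the interpolated function is a polynomial on the segment of interest. The name `unit_pe` is hypothetical.

```python
def unit_pe(x, coefficients):
    """Model one PE: acc <- acc * x + coefficient, once per coefficient."""
    acc = 0.0  # accumulator, assumed to start at zero
    for c in coefficients:  # assumed order: highest-degree coefficient first
        product = acc * x    # multiplier: input times accumulator output
        acc = product + c    # adder: multiplier output plus coefficient
    return acc
```

For example, `unit_pe(2.0, [1, 2, 3])` evaluates x^2 + 2x + 3 at x = 2.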
Computing device and method for processing multi-bit width data
The present disclosure provides a computing device for processing a multi-bit width value, an integrated circuit board card, a method, and a computer readable storage medium. The computing device is included in a combined processing apparatus, which further includes a general interconnection interface and other processing devices. The computing device interacts with the other processing devices to jointly complete a computing operation specified by a user. The combined processing apparatus further includes a storage device that is connected to the computing device and the other processing devices and is configured to store their data. The solution of the present disclosure can split the multi-bit width value so that the processing capability of the processor is not influenced by the bit width.
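The abstract does not say how the value is split or which operations are applied to the pieces. As one possible illustration, the sketch below splits unsigned integers into fixed-width chunks and multiplies them chunk by chunk, so that each elementary product involves only narrow operands. The chunk width of 8 bits, the choice of multiplication, and the names `split_value` and `multiply_split` are assumptions.

```python
def split_value(value, chunk_bits=8):
    """Split a non-negative integer into chunks, least significant first."""
    assert value >= 0, "this sketch handles unsigned values only"
    mask = (1 << chunk_bits) - 1
    chunks = [value & mask]
    value >>= chunk_bits
    while value:
        chunks.append(value & mask)
        value >>= chunk_bits
    return chunks


def multiply_split(a, b, chunk_bits=8):
    """Multiply two unsigned integers using only chunk-sized products."""
    result = 0
    for i, chunk_a in enumerate(split_value(a, chunk_bits)):
        for j, chunk_b in enumerate(split_value(b, chunk_bits)):
            # Each partial product is shifted to its positional weight.
            result += (chunk_a * chunk_b) << ((i + j) * chunk_bits)
    return result
```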