G06F7/498

CONNECTIVITY IN COARSE GRAINED RECONFIGURABLE ARCHITECTURE
20230053439 · 2023-02-23 ·

A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, the tiles can be arranged in a one-dimensional array and each tile can be coupled to its respective adjacent neighbor tiles using a direct bus coupling. Each tile can be further coupled to at least one non-adjacent neighbor tile that is one tile, or device space, away using a passthrough bus. The passthrough bus can extend through intervening tiles.

MULTIPLICATION
20230111089 · 2023-04-13 · ·

A device includes a memory, which, in operation, stores one or more look-up tables, and cryptographic circuitry coupled to the memory. The cryptographic circuitry, in operation, multiplies first data masked with a first mask by second data masked with a second mask, and protects the first data and the second data during the multiplying. The multiplying and protecting includes remasking the first data with a third mask, remasking the second data with a fourth mask, executing one or more compensation operations using one or more of the one or more look-up tables, and generating third data masked with a fifth mask. The fifth mask is independent of the first, second, third, and fourth masks. The third data corresponds to the first data multiplied by the second data.

Data computing system

The present disclosure provides a data computing system. The data computing system comprises: a memory, a processor and an accelerator, wherein the memory is communicatively coupled to the processor and configured to store data to be computed and a computed result, the data being written by the processor; the processor is communicatively coupled to the accelerator and configured to control the accelerator; and the accelerator is communicatively coupled to the memory and configured to access the memory according to pre-configured control information, implement a computing process to produce the computed result and write the computed result back to the memory. The present disclosure also provides an accelerator and a method performed by an accelerator of a data computing system. The present disclosure can improve the execution efficiency of the processor and reduce the computing overhead of the processor.

iNecklace & iAlphabet & iUniverse
20170220326 · 2017-08-03 ·

A method for iNecklace-iAlphabet-iUniverse comprises receiving a request for intuitive structures from the environment, constructing the iAlphabet, computing the identity, performing algebraic, categorical, and homotopy type constructions, performing constructions with measures, recalling relevant instances with reconstructions, identifying the analytic device, enabling composability to construct iUniverse.

Configurable Processor with In-Package Look-Up Table
20170322771 · 2017-11-09 · ·

The present invention discloses a configurable processor with an in-package look-up table. The configurable processor comprises a programmable memory die and a logic die located in a same package. The programmable memory die comprises a look-up table circuit (LUT) for storing data related to a desired function. The logic die comprises an arithmetic logic circuit (ALC) for performing arithmetic operations on the data read out from the LUT.

Selectively dispatching waves based on accumulators holding behavioral characteristics of waves currently executing

An apparatus such as a graphics processing unit (GPU) includes a plurality of processing elements configured to concurrently execute a plurality of first waves and accumulators associated with the plurality of processing elements. The accumulators are configured to store accumulated values representative of behavioral characteristics of the plurality of first waves that are concurrently executing on the plurality of processing elements. The apparatus also includes a dispatcher configured to dispatch second waves to the plurality of processing elements based on comparisons of values representative of behavioral characteristics of the second waves and the accumulated values stored in the accumulators. In some cases, the behavioral characteristics of the plurality of first waves comprise at least one of fetch bandwidths, usage of an arithmetic logic unit (ALU), and number of export operations.

Selectively dispatching waves based on accumulators holding behavioral characteristics of waves currently executing

An apparatus such as a graphics processing unit (GPU) includes a plurality of processing elements configured to concurrently execute a plurality of first waves and accumulators associated with the plurality of processing elements. The accumulators are configured to store accumulated values representative of behavioral characteristics of the plurality of first waves that are concurrently executing on the plurality of processing elements. The apparatus also includes a dispatcher configured to dispatch second waves to the plurality of processing elements based on comparisons of values representative of behavioral characteristics of the second waves and the accumulated values stored in the accumulators. In some cases, the behavioral characteristics of the plurality of first waves comprise at least one of fetch bandwidths, usage of an arithmetic logic unit (ALU), and number of export operations.

Compute in/near memory (CIM) circuit architecture for unified matrix-matrix and matrix-vector computations

A memory circuit includes a number (X) of multiply-accumulate (MAC) circuits that are dynamically configurable. The MAC circuits can either compute an output based on computations of X elements of the input vector with the weight vector, or to compute the output based on computations of a single element of the input vector with the weight vector, with each element having a one bit or multibit length. A first memory can hold the input vector having a width of X elements and a second memory can store the weight vector. The MAC circuits include a MAC array on chip with the first memory.

DATA COMPUTING SYSTEM
20220147357 · 2022-05-12 ·

The present disclosure provides a data computing system. The data computing system comprises: a memory, a processor and an accelerator, wherein the memory is communicatively coupled to the processor and configured to store data to be computed and a computed result, the data being written by the processor; the processor is communicatively coupled to the accelerator and configured to control the accelerator; and the accelerator is communicatively coupled to the memory and configured to access the memory according to pre-configured control information, implement a computing process to produce the computed result and write the computed result back to the memory. The present disclosure also provides an accelerator and a method performed by an accelerator of a data computing system. The present disclosure can improve the execution efficiency of the processor and reduce the computing overhead of the processor.

COMPUTING APPARATUS, METHOD, BOARD CARD AND COMPUTER-READABLE STORAGE MEDIUM
20220253280 · 2022-08-11 ·

The present disclosure provides a computing device for processing a multi-bit width value, an integrated circuit board card, a method, and a computer readable storage medium. The computing device is included in the combined processing apparatus, and the combined processing apparatus further includes a general interconnection interface, and other processing devices. The computing device interacts with the other processing device to jointly complete a computing operation specified by a user. The combined processing apparatus further includes a storage device connected to an apparatus and the other processing devices and configured to store data of the apparatus and the other processing device. The solution of the present disclosure can split the multi-bit width value so that the processing capability of the processor is not influenced by the bit width.