Patent classifications
G06F15/7867
DISTRIBUTED ACCELERATOR
Systems, methods, and devices are described coordinating a distributed accelerator. A command that includes instructions for performing a task is received. One or more sub-tasks of the task are determined to generate a set of sub-tasks. For each sub-task of the set of sub-tasks, an accelerator slice of a plurality of accelerator slices of a distributed accelerator is allocated, sub-task instructions for performing the sub-task are determined. Sub-task instructions are transmitted to the allocated accelerator slice for each sub-task. Each allocated accelerator slice is configured to generate a corresponding response indicative of the allocated accelerator slice having completed a respective sub-task. In a further example aspect, corresponding responses are received from each allocated accelerator slice and a coordinated response indicative of the corresponding responses is generated.
METHOD OF NOTIFYING A PROCESS OR PROGRAMMABLE ATOMIC OPERATION TRAPS
Disclosed in some examples, are methods, systems, programmable atomic units, and machine-readable mediums that provide an exception as a response to the calling processor. That is, the programmable atomic unit will send a response to the calling processor. The calling processor will recognize that the exception has been raised and will handle the exception. Because the calling processor knows which process triggered the exception, the calling processor (e.g., the Operating System) can take appropriate action, such as terminating the calling process. The calling processor may be a same processor as that executing the programmable atomic transaction, or a different processor (e.g., on a different chiplet).
Control registers to store thread identifiers for threaded loop execution in a self-scheduling reconfigurable computing fabric
Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array. A representative configurable circuit includes a configurable computation circuit and a configuration memory having a first, instruction memory storing a plurality of data path configuration instructions to configure a data path of the configurable computation circuit; and a second, instruction and instruction index memory storing a plurality of spoke instructions and data path configuration instruction indices for selection of a master synchronous input, a current data path configuration instruction, and a next data path configuration instruction for a next configurable computation circuit.
Anti-congestion flow control for reconfigurable processors
A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.
ZERO-COPY SPARSE MATRIX FACTORIZATION SYNTHESIS FOR HETEROGENEOUS COMPUTE SYSTEMS
A system, method, and computer-readable medium for synthesizing zero-copy sparse matrix factorization operations in heterogeneous compute systems are provided. The system includes a host and an accelerator device. The host device is configured to divide an input matrix into a plurality of blocks which are transferred to a memory of the accelerator device. The host device is also configured to generate at least one index buffer that includes pointers to the block in the accelerator's memory, where each index buffer represents a frontal matrix associated with a matrix decomposition algorithm. The host processor is configured to receive one or more kernels configured to process the index buffer(s) on an accelerator device. The index buffers are processed by the accelerator device and the modified block data is written back to a memory of the host device to generate a factorized output matrix.
Storage device including reconfigurable logic and method of operating the storage device
A storage device includes a reconfigurable logic circuit, a control logic circuit, and non-volatile memory. The reconfigurable logic circuit is changeable from a first accelerator to a second accelerator during an operation of the storage device. The control logic circuit is configured to receive, from the host, a host command including information about a function required by the host and dynamically reconfigure the reconfigurable logic circuit such that the reconfigurable logic circuit performs the function according to the received host command. The non-volatile memory is connected to the control logic circuit.
Discrete three-dimensional processor
A discrete three-dimensional (3-D) processor comprises first and second dice. The first die comprises 3-D memory (3D-M) arrays, whereas the second die comprises logic circuits and at least an off-die peripheral-circuit component of the 3D-M array(s). Typical off-die peripheral-circuit component could be an address decoder, a sense amplifier, a programming circuit, a read-voltage generator, a write-voltage generator, a data buffer, or a portion thereof.
Non-volatile memory based processors and dataflow techniques
A monolithic integrated circuit (IC) including one or more compute circuitry, one or more non-volatile memory circuits, one or more communication channels and one or more communication interface. The one or more communication channels can communicatively couple the one or more compute circuitry, the one or more non-volatile memory circuits and the one or more communication interface together. The one or more communication interfaces can communicatively couple one or more circuits of the monolithic integrated circuit to one or more circuits external to the monolithic integrated circuit.
Reconfigurable circuit array using instructions including a fetch configuration data portion and a transfer configuration data portion
A method and system are provided for configurable computation and data processing. A logical processor includes an array of logic elements. The processor may be a combinatorial circuit that can be applied to modify computational aspects of an array of reconfigurable circuits. A memory stores a plurality of instructions, each instruction including an instruction-fetch data portion and an output data transfer data portion. One or more memory controllers are coupled to the memory and receive instructions and/or output data from the memory. A back buffer is coupled with the memory controller and receives instructions from the memory controller. The back buffer sequentially asserts each received instruction upon one or more memory controllers. The memory controllers transfer data received from the memory to a target, such as an array of reconfigurable logic circuits that are optionally coupled to the memory, the back buffer, and one or more additional memory controllers.
Logic repository service supporting adaptable host logic
The following description is directed to a logic repository service supporting adaptable host logic. In one example, a method of a logic repository service can include receiving a first request to generate configuration data for configurable hardware using a specification for application logic. The method can include selecting a particular host logic shell from a group of host logic shells. The particular host logic shell can be used to encapsulate the application logic when the configurable hardware is configured. Configuration data for the configurable hardware can be generated. The configuration data can include data for implementing the application logic and at least a portion of the particular host logic shell. The method can include receiving a second request to download the configuration data to a host server computer comprising the configurable hardware. The configuration data can be transmitted to the host server computer in response to the second request.