G06F15/825

SCHEDULING TASKS FOR EXECUTION BY A PROCESSOR SYSTEM

A computer-implemented method schedules a plurality of tasks for execution by a processor system. A first execution model for the plurality of tasks is accessed. Data is generated that identifies which tasks in the execution model are not direct-feedthrough tasks. The data is used to determine an order for executing the tasks at least partly in dependence on whether or not each task is a direct-feedthrough task.
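A minimal sketch (not taken from the patent text) of one way such an ordering could be derived: dependency edges into tasks flagged as non-direct-feedthrough are ignored when topologically sorting, since those tasks' outputs do not depend on their current inputs. The task names and graph representation below are hypothetical.

```python
from graphlib import TopologicalSorter

def schedule(tasks, edges, direct_feedthrough):
    """Order tasks for execution.

    tasks: iterable of task names.
    edges: (producer, consumer) pairs, i.e. consumer reads producer's output.
    direct_feedthrough: tasks whose output depends on their current input.
    """
    ts = TopologicalSorter({t: set() for t in tasks})
    for producer, consumer in edges:
        # Only direct-feedthrough consumers must wait for their producers;
        # other tasks emit state computed on a previous step, so the edge
        # does not constrain this step's ordering.
        if consumer in direct_feedthrough:
            ts.add(consumer, producer)
    return list(ts.static_order())

# Example: 'delay' is not direct-feedthrough, so the gain -> delay edge
# does not force 'gain' to run before 'delay', and the feedback loop is legal.
print(schedule(
    tasks=["source", "gain", "delay"],
    edges=[("source", "gain"), ("gain", "delay"), ("delay", "gain")],
    direct_feedthrough={"gain"},
))
```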

DEPENDENCY-AWARE SERVER PROCESSING OF DATAFLOW APPLICATIONS
20230205580 · 2023-06-29 ·

A computer-implemented method comprises a server processing work requests of a work requester. The work requester can communicate to the server a processing dependency of one work request on a second work request. The server can associate the dependency with the work requests and/or a queue of work requests. The dependency includes a condition to be met in association with processing the work requests, and the condition can include an action for the server to take in association with processing a work request. A computing system can comprise a work requester, a server, and a set of dependency-aware queues for processing a set of work requests. A queue and/or work requests on the queues can be associated with a processing dependency, and the server can process work requests enqueued to the queues in an order based on the dependencies. A work requester/server interface can comprise a dependency framework.
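As an illustration only (the queue structure, condition model, and request names below are hypothetical, not taken from the filing), a server might defer a work request until the request it depends on has completed, then perform an associated action before processing it:

```python
from collections import deque

class DependencyAwareServer:
    def __init__(self):
        self.queue = deque()
        self.completed = set()

    def enqueue(self, request_id, work, depends_on=None, on_ready=None):
        # depends_on: id of a request that must complete first (the condition).
        # on_ready: optional action the server takes once the condition is met.
        self.queue.append((request_id, work, depends_on, on_ready))

    def run(self):
        # No cycle or starvation handling in this sketch.
        while self.queue:
            request_id, work, depends_on, on_ready = self.queue.popleft()
            if depends_on is not None and depends_on not in self.completed:
                self.queue.append((request_id, work, depends_on, on_ready))  # defer
                continue
            if on_ready:
                on_ready(request_id)
            work()
            self.completed.add(request_id)

server = DependencyAwareServer()
server.enqueue("B", lambda: print("process B"), depends_on="A",
               on_ready=lambda rid: print(f"dependency met for {rid}"))
server.enqueue("A", lambda: print("process A"))
server.run()
```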

PROGRAM COUNTER ALIGNMENT ACROSS A RECONFIGURABLE HUM FABRIC
20170364473 · 2017-12-21 ·

Techniques are disclosed for circuit synchronization. Information is obtained on logical distances between circuits on a semiconductor chip. A plurality of clusters is determined within the chip circuits, where a cluster within the plurality of clusters is synchronized to a tic cycle boundary. A tic cycle count separation is evaluated across the clusters using the information on the logical distances. A plurality of counter initializations is calculated where the counter initializations compensate for the tic cycle count separation across the clusters. A plurality of counters is initialized, with a counter from the plurality of counters being associated with each cluster from the plurality of clusters, where the counters are distributed across the clusters, and where the initializing is based on the counter initializations that were calculated. The plurality of counters is started to coordinate calculation across the plurality of clusters. Reset, debug, and calculation stoppage are provided through the plurality of counters.
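A toy sketch of the compensation step (variable names and the exact relationship are assumptions, not details from the disclosure): each cluster's counter could be pre-loaded with the tic-cycle separation between it and the farthest cluster, so that once started, all counters reach the same value on the same tic boundary.

```python
def counter_initializations(logical_distance):
    """logical_distance: dict mapping cluster id -> tic cycles from a reference point."""
    farthest = max(logical_distance.values())
    # Pre-load each counter so that, after start, all clusters agree on the count.
    return {cluster: farthest - d for cluster, d in logical_distance.items()}

inits = counter_initializations({"c0": 0, "c1": 3, "c2": 5})
print(inits)  # {'c0': 5, 'c1': 2, 'c2': 0}
```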

Execution engine for executing single assignment programs with affine dependencies

The execution engine is a new organization for a digital data processing apparatus, suitable for highly parallel execution of structured fine-grain parallel computations. The execution engine includes a memory for storing data and a domain flow program, a controller for requesting the domain flow program from the memory and further for translating the program into programming information, a processor fabric for processing the domain flow programming information, and a crossbar for sending tokens and the programming information to the processor fabric.
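Below is only a toy software model of the listed components (the program contents and token format are invented), illustrating the described flow: the controller requests the domain flow program from memory, translates it into programming information, and the crossbar delivers that information, along with tokens, to the processor fabric.

```python
class ExecutionEngine:
    def __init__(self, memory, fabric_size=4):
        self.memory = memory                              # stores data and the domain flow program
        self.fabric = [[] for _ in range(fabric_size)]    # processing elements

    def crossbar(self, destination, item):
        self.fabric[destination % len(self.fabric)].append(item)

    def run(self, program_name, tokens):
        program = self.memory[program_name]                           # controller requests the program
        programming_info = [("config", stmt) for stmt in program]     # controller translates it
        for i, item in enumerate(programming_info + [("token", t) for t in tokens]):
            self.crossbar(i, item)                                    # crossbar routes to the fabric

engine = ExecutionEngine(memory={"saxpy": ["y[i,j] = a*x[i,j] + y[i,j]"]})
engine.run("saxpy", tokens=[("x", 0, 1.5), ("y", 0, 2.0)])
print(engine.fabric)
```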

FPGA coprocessor with sparsity and density modules for execution of low and high parallelism portions of graph traversals

An FPGA-based graph data processing method is provided for executing graph traversals on a graph having characteristics of a small-world network by using a first processor, being a CPU, and a second processor that is an FPGA and is in communicative connection with the first processor. The first processor sends graph data to be traversed to the second processor, and obtains result data of the graph traversals from the second processor for result output after the second processor has completed the graph traversals of the graph data by executing level traversals. The second processor comprises a sparsity processing module and a density processing module; the sparsity processing module operates in a beginning stage and/or an ending stage of the graph traversals, and the density processing module, with a higher degree of parallelism than the sparsity processing module, operates in the intermediate stage of the graph traversals.
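For illustration only, the stage switching resembles direction-optimizing BFS: a sparse, frontier-driven step when the frontier is small at the beginning and end of the traversal, and a dense, all-vertices step with more available parallelism in the middle. The threshold and graph encoding below are assumptions, not details from the patent.

```python
def hybrid_bfs(adj, source, dense_threshold=0.3):
    """adj: list of neighbor lists; returns BFS levels."""
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        if len(frontier) < dense_threshold * n:
            # Sparse step: expand only the frontier vertices (beginning/ending stages).
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth + 1
                        next_frontier.append(v)
        else:
            # Dense step: scan all unvisited vertices; each checks for a visited
            # neighbor, exposing much more parallelism (intermediate stage).
            frontier_set = set(frontier)
            for v in range(n):
                if level[v] == -1 and any(u in frontier_set for u in adj[v]):
                    level[v] = depth + 1
                    next_frontier.append(v)
        frontier = next_frontier
        depth += 1
    return level

# Small undirected example graph: levels come out as [0, 1, 1, 2].
print(hybrid_bfs([[1, 2], [0, 3], [0, 3], [1, 2]], source=0))
```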

SPECULATIVE AND ITERATIVE EXECUTION OF DELAYED DATA FLOW GRAPHS
20170308489 · 2017-10-26 ·

A system for executing a data flow graph comprises: at least two first actors, each comprising means for independently executing a computation of a same data set comprising at least one datum and producing a quality descriptor of the data set, the execution of the computation by each of the at least two first actors being triggered by a synchronization system; a third actor, comprising means for triggering the execution of the computation by each of the at least two first actors and initializing a clock configured to emit an interrupt signal when a duration has elapsed; and a fourth actor, comprising means for executing, at the latest upon the interrupt signal from the clock: the selection, from the set of the at least two first actors having produced a quality descriptor, of the one whose descriptor exhibits the most favorable value; and the transfer of the data set computed by the selected actor.
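A rough analogue in ordinary concurrent code (the worker functions and the quality metric here are invented for illustration): several workers compute the same data set, a deadline bounds the wait, and whichever completed result carries the most favorable quality descriptor is forwarded.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def run_speculative(workers, data, deadline_s=0.5):
    """workers: callables returning (result, quality); higher quality is better."""
    pool = ThreadPoolExecutor(max_workers=len(workers))
    futures = [pool.submit(w, data) for w in workers]      # third actor: trigger all workers
    done, _ = wait(futures, timeout=deadline_s)            # clock: bound the wait
    pool.shutdown(wait=False, cancel_futures=True)         # ignore stragglers
    results = [f.result() for f in done]
    if not results:
        raise TimeoutError("no worker finished before the deadline")
    result, quality = max(results, key=lambda rq: rq[1])   # fourth actor: most favorable quality
    return result

fast_coarse = lambda xs: (sum(xs) // len(xs), 0.5)   # quick, lower-quality estimate
slow_exact  = lambda xs: (sum(xs) / len(xs), 1.0)    # exact mean, higher quality
print(run_speculative([fast_coarse, slow_exact], [1, 2, 3, 4]))
```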

System and method for an asynchronous processor with assisted token

Embodiments are provided for an asynchronous processor using master and assisted tokens. In an embodiment, an apparatus for an asynchronous processor comprises a memory to cache a plurality of instructions, a feedback engine to decode the instructions from the memory, and a plurality of XUs coupled to the feedback engine and arranged in a token ring architecture. Each one of the XUs is configured to receive an instruction of the instructions from the feedback engine, receive a master token associated with a resource, and further receive an assisted token for the master token. Upon determining that the assisted token and the master token are received in an abnormal order, the XU is configured to detect an operation status for the instruction in association with the assisted token and, upon determining a needed action in accordance with the operation status and the assisted token, perform the needed action.
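Purely as a sketch (the token bookkeeping, resource name, and actions are invented, not from the filing), an execution unit might track which token arrives first for a resource and, when the assisted token precedes its master token, check the instruction's status and take a corresponding action:

```python
class ExecutionUnit:
    def __init__(self, name):
        self.name = name
        self.pending = {}   # resource -> first token kind seen ("master" or "assisted")

    def receive(self, resource, token_kind, instruction):
        first = self.pending.pop(resource, None)
        if first is None:
            self.pending[resource] = token_kind
            return
        if first == "assisted" and token_kind == "master":
            # Abnormal order: the assisted token arrived before its master token.
            status = self.operation_status(instruction)
            self.handle(status, instruction)
        else:
            self.execute(instruction, resource)

    def operation_status(self, instruction):
        return "stalled" if instruction.get("stalled") else "ok"

    def handle(self, status, instruction):
        print(f"{self.name}: status {status}, corrective action for {instruction['op']}")

    def execute(self, instruction, resource):
        print(f"{self.name}: executing {instruction['op']} using {resource}")

xu = ExecutionUnit("XU0")
xu.receive("register-file", "assisted", {"op": "add", "stalled": True})
xu.receive("register-file", "master", {"op": "add", "stalled": True})
```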

Dynamically erectable computer system
09772971 · 2017-09-26 ·

A fault-tolerant computer system architecture includes two types of operating domains: a conventional first domain (DID) that processes data and instructions, and a novel second domain (MM domain) which includes mentor processors for mentoring the DID according to “meta information”, which includes but is not limited to data, algorithms, and protective rule sets. The term “mentoring” (as defined herein below) refers to, among other things, applying and using meta information to enforce rule sets and/or dynamically erecting abstractions and virtualizations by which resources in the DID are shuffled around for, inter alia, efficiency and fault correction. Meta Mentor processors create systems and sub-systems by means of fault-tolerant mentor switches that route signals to and from hardware and software entities. The systems and sub-systems created are distinct sub-architectures and unique configurations that may be operated separately or concurrently as defined by the executing processes.

Anti-Congestion Flow Control for Reconfigurable Processors

A compiler is configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node is initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter is configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node is initialized with one or more write credits, and is configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.
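A simplified software analogue of the described credit scheme (the class, buffer depth, and method names are illustrative only): an upstream node may only begin a write while it holds both a write credit and a ready-to-read credit, and tokens returned by the downstream node replenish the counters.

```python
class UpstreamNode:
    def __init__(self, downstream_buffer_depth, write_credits=1):
        self.read_ready_credits = downstream_buffer_depth  # one per downstream buffer slot
        self.write_credits = write_credits

    def can_write(self):
        return self.read_ready_credits > 0 and self.write_credits > 0

    def begin_write(self):
        assert self.can_write(), "would overflow the downstream buffer"
        self.write_credits -= 1        # decremented when the write begins

    def finish_write(self):
        self.read_ready_credits -= 1   # decremented when the buffer data unit is written

    def on_read_ready_token(self):
        self.read_ready_credits += 1   # downstream consumed a buffer data unit

    def on_write_done_token(self):
        self.write_credits += 1        # downstream acknowledged the write

node = UpstreamNode(downstream_buffer_depth=2)
node.begin_write(); node.finish_write(); node.on_write_done_token()
node.begin_write(); node.finish_write(); node.on_write_done_token()
print(node.can_write())   # False: downstream buffer is full
node.on_read_ready_token()
print(node.can_write())   # True again once a read ready token arrives
```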