Patent classifications
G06F15/7878
Reconfigurable Parallel Processing
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
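The arrangement in this abstract can be illustrated with a minimal software sketch. Everything named here (`PE`, `Sequencer`, `run`, the two example configurations) is hypothetical and chosen only for illustration; the sketch assumes a configuration is a function of an operand and the value carried over in the gasket memory, which the abstract does not specify.

```python
from typing import Callable, List

# Assumption: a "configuration" is modeled as a function of
# (operand, value held in gasket memory) -> result.
Config = Callable[[int, int], int]


class PE:
    """Hypothetical processing element with its own configuration buffer."""

    def __init__(self) -> None:
        self.config_buffer: List[Config] = []

    def execute(self, operand: int, gasket_value: int) -> int:
        # Apply the oldest buffered configuration, then discard it.
        config = self.config_buffer.pop(0)
        return config(operand, gasket_value)


class Sequencer:
    """Hypothetical sequencer that copies configurations into every PE buffer."""

    def __init__(self, pes: List[PE]) -> None:
        self.pes = pes

    def distribute(self, configs: List[Config]) -> None:
        for pe in self.pes:
            pe.config_buffer.extend(configs)


def run(num_pes: int, configs: List[Config], operands: List[int]) -> List[int]:
    pes = [PE() for _ in range(num_pes)]
    Sequencer(pes).distribute(configs)
    gasket = [0] * num_pes  # one gasket slot per PE, initially zero
    for _ in configs:
        # Results of this configuration become inputs to the next one.
        gasket = [pe.execute(operands[i], gasket[i]) for i, pe in enumerate(pes)]
    return gasket


def double(operand: int, gasket_value: int) -> int:
    return operand * 2


def add_previous(operand: int, gasket_value: int) -> int:
    return operand + gasket_value


print(run(3, [double, add_previous], [1, 2, 3]))  # [3, 6, 9]
```

The second configuration reads what the first one wrote into the gasket list, which is the role the abstract assigns to the gasket memory.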
Dynamically configurable pipeline
Techniques disclosed herein relate to dynamically configurable multi-stage pipeline processing units. In one embodiment, a circuit includes a plurality of processing engines and a plurality of switches. Each of the plurality of processing engines includes an input port and an output port. Each of the plurality of switches comprises two input ports and two output ports. For each processing engine, the input port of the processing engine is electrically coupled to one of the switches, the output port of the processing engine is electrically coupled to another one of the switches, and the input port of the processing engine is electrically coupled to the output port of each of the processing engines by the switches.
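As a rough software analogy for the switch fabric described above, the sketch below replaces the physical two-input, two-output switches with a routing table that says which engine receives each engine's output. The names `run_pipeline`, `engines`, and `route` are hypothetical, and the routing table is an assumed abstraction rather than the circuit in the abstract.

```python
from typing import Callable, Dict, Optional


def run_pipeline(
    engines: Dict[str, Callable[[int], int]],
    route: Dict[str, str],
    start: str,
    value: int,
) -> int:
    """Pass a value through engines in the order given by the routing table.

    `route` maps an engine name to the engine that consumes its output.
    An engine with no entry in `route` ends the pipeline. The table must
    not contain a cycle, or this loop would not terminate.
    """
    current: Optional[str] = start
    while current is not None:
        value = engines[current](value)
        current = route.get(current)
    return value


engines = {
    "scale": lambda v: v * 2,
    "offset": lambda v: v + 1,
    "square": lambda v: v * v,
}

# Configuration 1: scale -> offset -> square
print(run_pipeline(engines, {"scale": "offset", "offset": "square"}, "scale", 3))  # 49

# Configuration 2: same engines, reordered as square -> scale
print(run_pipeline(engines, {"square": "scale"}, "square", 3))  # 18
```

Changing only the routing table reorders the stages, which mirrors how any engine output can be coupled to any engine input through the switches.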
Shared memory access for reconfigurable parallel processor using a plurality of memory ports each comprising an address calculation unit
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) each having a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a common area in the memory unit.
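One way to picture per-thread address generation into a common area is a base-plus-stride calculation. This is a sketch under assumptions: the abstract does not give the address formula, and `shared_addresses`, `load_shared`, `base`, and `stride` are hypothetical names.

```python
from typing import List


def shared_addresses(base: int, stride: int, num_threads: int) -> List[int]:
    """Assumed formula: thread t accesses address base + t * stride."""
    return [base + t * stride for t in range(num_threads)]


def load_shared(memory: List[int], addresses: List[int]) -> List[int]:
    """All threads read from the same flat memory region."""
    return [memory[a] for a in addresses]


memory = list(range(100, 132))  # 32 words holding the values 100..131
addrs = shared_addresses(base=16, stride=4, num_threads=4)
print(addrs)                       # [16, 20, 24, 28]
print(load_shared(memory, addrs))  # [116, 120, 124, 128]
```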
Private memory access for reconfigurable parallel processor using a plurality of memory ports each comprising an address calculation unit
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each PE may have a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a different memory bank in the memory unit.
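For the private-memory case, a minimal sketch can pair each thread with its own bank. The mapping used here (thread t uses bank t at a common offset) is an assumption for illustration, and `private_addresses` and `load_private` are hypothetical names.

```python
from typing import List, Tuple


def private_addresses(offset: int, num_threads: int, num_banks: int) -> List[Tuple[int, int]]:
    """Assumed mapping: thread t accesses (bank t, offset)."""
    if num_threads > num_banks:
        raise ValueError("each thread needs its own bank in this sketch")
    return [(t, offset) for t in range(num_threads)]


def load_private(banks: List[List[int]], addresses: List[Tuple[int, int]]) -> List[int]:
    """Each thread reads from a different bank, so no two reads collide."""
    return [banks[bank][off] for bank, off in addresses]


banks = [[10, 11], [20, 21], [30, 31]]
addrs = private_addresses(offset=1, num_threads=3, num_banks=3)
print(addrs)                      # [(0, 1), (1, 1), (2, 1)]
print(load_private(banks, addrs)) # [11, 21, 31]
```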
Circular reconfiguration for reconfigurable parallel processor using a plurality of memory ports coupled to a commonly accessible memory unit
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of reconfigurable units that may include a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of reconfigurable units may comprise a configuration buffer and a reconfiguration counter. The processor may further comprise a sequencer coupled to the configuration buffer of each of the plurality of reconfigurable units and configured to distribute a plurality of configurations to the plurality of reconfigurable units for the plurality of PEs and the plurality of MPs to execute a sequence of instructions.
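The combination of a configuration buffer and a reconfiguration counter can be sketched as follows. The sketch assumes the counter counts executions of the current configuration and that the unit wraps back to the first configuration after the last one; the abstract does not state either detail, and `ReconfigurableUnit` and `runs_per_config` are hypothetical names.

```python
from typing import List


class ReconfigurableUnit:
    """Hypothetical unit that cycles through buffered configurations."""

    def __init__(self, configs: List[str], runs_per_config: int) -> None:
        self.config_buffer = list(configs)
        self.runs_per_config = runs_per_config
        self.index = 0    # which buffered configuration is active
        self.counter = 0  # executions of the active configuration so far

    def step(self) -> str:
        """Execute once and return the configuration that was active."""
        active = self.config_buffer[self.index]
        self.counter += 1
        if self.counter == self.runs_per_config:
            # Counter reached its limit: switch to the next configuration,
            # wrapping around to the first one (circular reconfiguration).
            self.counter = 0
            self.index = (self.index + 1) % len(self.config_buffer)
        return active


unit = ReconfigurableUnit(["A", "B"], runs_per_config=2)
print([unit.step() for _ in range(6)])  # ['A', 'A', 'B', 'B', 'A', 'A']
```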
Reconfigurable parallel processing with a temporary data storage coupled to a plurality of processing elements (PEs) to store a PE execution result to be used by a PE during a next PE configuration
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
Pipelining multi-directional reduction
Embodiments are provided for pipelining multi-directional reduction by one or more processors in a computing system. One or more reduce-scatter operations and one or more all-gather operations may be assigned to each of a plurality of independent networks. The reduce-scatter operations and the all-gather operations may be sequentially executed in each of the plurality of independent networks according to a serialized execution order and a defined time period.
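The two collective operations named above can be simulated in a few lines. This is a single-process sketch, not the patented scheduling method: ranks are modeled as list entries, the "independent networks" are modeled as separate dictionary entries processed one after another, and the defined time period is not modeled. The names `reduce_scatter`, `all_gather`, and `networks` are hypothetical.

```python
from typing import Dict, List


def reduce_scatter(data: List[List[int]]) -> List[int]:
    """data[r][c] is chunk c held by rank r. Rank c ends up with the sum of chunk c."""
    num_ranks = len(data)
    return [sum(data[r][c] for r in range(num_ranks)) for c in range(num_ranks)]


def all_gather(chunks: List[int]) -> List[List[int]]:
    """Every rank receives a copy of every rank's chunk."""
    return [list(chunks) for _ in chunks]


# Assumption: each network carries its own slice of the data.
networks: Dict[str, List[List[int]]] = {
    "net0": [[1, 2], [3, 4]],
    "net1": [[10, 20], [30, 40]],
}

results = {}
for name, data in networks.items():
    # Serialized order within a network: reduce-scatter first, then all-gather.
    results[name] = all_gather(reduce_scatter(data))

print(results["net0"])  # [[4, 6], [4, 6]]
print(results["net1"])  # [[40, 60], [40, 60]]
```

Running reduce-scatter followed by all-gather leaves every rank with the full element-wise sum, which is the result of an all-reduce.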
RECONFIGURABLE PARALLEL PROCESSING
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
Static shared memory access with one piece of input data to be reused for successive execution of one instruction in a reconfigurable parallel processor
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise an arithmetic logic unit (ALU), a data buffer associated with the ALU, and an indicator associated with the data buffer to indicate whether a piece of data inside the data buffer is to be reused for repeated execution of a same instruction as a pipeline stage.
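The reuse indicator can be pictured as a flag stored next to each buffered value. In this sketch, which assumes a simple queue-like buffer, a read consumes the value unless the flag is set. `DataBuffer`, `push`, and `read` are hypothetical names, not terms from the abstract.

```python
from collections import deque
from typing import Deque, List, Tuple


class DataBuffer:
    """Hypothetical ALU input buffer whose entries carry a reuse indicator."""

    def __init__(self) -> None:
        self.entries: Deque[Tuple[int, bool]] = deque()

    def push(self, value: int, reuse: bool = False) -> None:
        self.entries.append((value, reuse))

    def read(self) -> int:
        value, reuse = self.entries[0]
        if not reuse:
            # Normal data is consumed once; reusable data stays in place.
            self.entries.popleft()
        return value


weights = DataBuffer()
weights.push(3, reuse=True)  # one piece of data reused on every execution

inputs = DataBuffer()
for x in (1, 2, 4):
    inputs.push(x)

# The same multiply instruction executes three times in succession.
products: List[int] = [weights.read() * inputs.read() for _ in range(3)]
print(products)  # [3, 6, 12]
```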
POLICY HANDLING FOR DATA PIPELINES
Methods, systems, and devices for data processing are described. In some systems, data pipelines may be implemented to handle data processing jobs. To improve data pipeline flexibility, the systems may use separate pipeline and policy declarations. For example, a pipeline server may receive both a pipeline definition defining a first set of data operations to perform and a policy definition including instructions for performing a second set of data operations, where the first set of data operations is a subset of the second set. The server may execute a data pipeline based on a trigger (e.g., a scheduled trigger, a received message, etc.). To execute the pipeline, the server may layer the policy definition into the pipeline definition when creating an execution plan. The server may execute the execution plan by performing a number of jobs using a set of resources and plugins according to the policy definition.
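Layering a policy definition into a pipeline definition can be sketched as a merge step that produces the execution plan. The dictionary layout, the `before`/`after` keys, and the step names below are all assumptions made for illustration; the abstract does not describe a concrete declaration format.

```python
from typing import Dict, List


def build_execution_plan(pipeline: Dict, policy: Dict) -> List[str]:
    """Interleave policy-mandated steps around the steps the pipeline declares."""
    before = policy.get("before", {})
    after = policy.get("after", {})
    plan: List[str] = []
    for step in pipeline["steps"]:
        plan.extend(before.get(step, []))  # policy steps that must run first
        plan.append(step)                  # the step the pipeline asked for
        plan.extend(after.get(step, []))   # policy steps that must follow
    return plan


# Hypothetical declarations, kept separate as the abstract describes.
pipeline = {"steps": ["extract", "load"]}
policy = {
    "before": {"load": ["mask_pii"]},
    "after": {"load": ["audit_log"]},
}

plan = build_execution_plan(pipeline, policy)
print(plan)  # ['extract', 'mask_pii', 'load', 'audit_log']
```

In this sketch the pipeline's own steps are a subset of the resulting plan, and the policy can change without touching the pipeline declaration.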