G06F15/7878

Private memory access for a reconfigurable parallel processor using a plurality of chained memory ports

Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each PE may have a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a different memory bank in the memory unit.
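The per-thread address generation described above can be pictured with a small model. This is a minimal sketch under two assumptions that are not stated in the abstract: the memory unit is word-interleaved (bank = address mod number of banks), and each thread's private copy of a word sits at an adjacent address. `AddressCalculationUnit`, `addresses`, and `bank` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AddressCalculationUnit:
    """Hypothetical model of one memory port's address calculation unit."""
    num_threads: int  # ALUs executing the same instruction in parallel threads
    num_banks: int    # banks in the memory unit (assumed >= num_threads)

    def addresses(self, offset: int) -> List[int]:
        # Assumed private-memory layout: word `offset` of thread t lives at
        # offset * num_threads + t, so the threads' words are contiguous.
        return [offset * self.num_threads + t for t in range(self.num_threads)]

    def bank(self, address: int) -> int:
        # Assumed word-interleaved banking: consecutive words hit consecutive banks.
        return address % self.num_banks


acu = AddressCalculationUnit(num_threads=4, num_banks=4)
addrs = acu.addresses(offset=3)
banks = [acu.bank(a) for a in addrs]
print(addrs)  # [12, 13, 14, 15]
print(banks)  # [0, 1, 2, 3]
```

Because the addresses generated for one instruction are consecutive and the banks are interleaved, no two threads target the same bank as long as `num_banks >= num_threads`.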

PIPELINING MULTI-DIRECTIONAL REDUCTION

Embodiments are provided for pipelining multi-directional reduction by one or more processors in a computing system. One or more reduce scatter operations and one or more all-gather operations may be assigned to each of a plurality of independent networks. The one or more reduce scatter operations and the one or more all-gather operations may be sequentially executed in each of the plurality of independent networks according to a serialized execution order and a defined time period.
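Reduce-scatter followed by all-gather is the standard decomposition of an all-reduce, and the scheduling idea in the abstract can be sketched on top of it. This is an illustration only, assuming that the data is split into segments, that segments are assigned round-robin to the independent networks, and that each operation occupies one fixed time period; `reduce_scatter`, `all_gather`, and `schedule` are hypothetical helpers, not the patent's method.

```python
from typing import List, Tuple

Vector = List[float]


def reduce_scatter(inputs: List[Vector]) -> List[Vector]:
    """Rank r ends up with the element-wise sum of chunk r across all ranks."""
    p, n = len(inputs), len(inputs[0])
    assert n % p == 0, "vector length must divide evenly among ranks"
    c = n // p
    return [[sum(v[r * c + i] for v in inputs) for i in range(c)] for r in range(p)]


def all_gather(chunks: List[Vector]) -> List[Vector]:
    """Every rank receives the concatenation of all ranks' chunks."""
    full = [x for chunk in chunks for x in chunk]
    return [list(full) for _ in chunks]


def schedule(num_networks: int, num_segments: int, period: int) -> List[Tuple[int, int, str, int]]:
    """Return (start_time, network, op, segment) events.

    Assumed policy: segments go round-robin to networks; inside one network the
    ops run in a serialized order, each taking one `period`.
    """
    events = []
    next_free = [0] * num_networks
    for seg in range(num_segments):
        net = seg % num_networks
        for op in ("reduce_scatter", "all_gather"):
            events.append((next_free[net], net, op, seg))
            next_free[net] += period
    return sorted(events)


inputs = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two ranks
print(all_gather(reduce_scatter(inputs)))  # [[11, 22, 33, 44], [11, 22, 33, 44]]
for event in schedule(num_networks=2, num_segments=4, period=10):
    print(event)
```

Within each network the operations run strictly in order, while the networks overlap in time, which is where the pipelining comes from.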

Reconfiguring execution pipelines of out-of-order (OOO) computer processors based on phase training and prediction

Reconfiguring execution pipelines of out-of-order (OOO) computer processors based on phase training and prediction is disclosed. In one aspect, a pipeline reconfiguration circuit is communicatively coupled to an execution pipeline providing multiple selectable pipeline configurations. The pipeline reconfiguration circuit generates a phase identifier (ID) for a phase based on a preceding phase. The phase ID is used as an index into an entry of a pipeline configuration prediction (PCP) table to determine whether training for the phase is ongoing. If so, the pipeline reconfiguration circuit performs multiple training cycles, each employing a pipeline configuration from the selectable pipeline configurations for the execution pipeline, to determine a preferred pipeline configuration for the phase. If training for the phase is complete, the pipeline reconfiguration circuit reconfigures the execution pipeline into the preferred pipeline configuration indicated by the entry before the phase is executed.
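The train-then-predict flow of the PCP table can be sketched as a small state machine. This is a hypothetical model: the entry layout, the hash used for the phase ID, the performance metric, and the rule "train each configuration once, then keep the best" are all assumptions for illustration, and the IPC values in the usage are invented inputs, not measurements.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PCPEntry:
    """One pipeline configuration prediction (PCP) table entry (hypothetical layout)."""
    scores: Dict[str, float] = field(default_factory=dict)  # config -> measured performance
    preferred: Optional[str] = None                          # set when training completes


class PipelineReconfigurationCircuit:
    def __init__(self, configs: List[str], table_size: int = 64) -> None:
        self.configs = configs
        self.table_size = table_size
        self.table: Dict[int, PCPEntry] = {}

    def phase_id(self, preceding_phase_signature: int) -> int:
        # Assumption: a simple multiplicative hash of the preceding phase indexes the table.
        return (preceding_phase_signature * 2654435761) % self.table_size

    def before_phase(self, pid: int) -> str:
        """Pick the configuration to apply before the phase executes."""
        entry = self.table.setdefault(pid, PCPEntry())
        if entry.preferred is not None:  # training complete: use the prediction
            return entry.preferred
        # Training ongoing: try the next configuration that has not been measured yet.
        return next(c for c in self.configs if c not in entry.scores)

    def after_phase(self, pid: int, config: str, performance: float) -> None:
        """Record one training cycle; finish training once every configuration was tried."""
        entry = self.table[pid]
        if entry.preferred is None:
            entry.scores[config] = performance
            if len(entry.scores) == len(self.configs):
                entry.preferred = max(entry.scores, key=lambda c: entry.scores[c])


ipc = {"wide": 1.8, "narrow": 1.2, "low_latency": 2.1}  # illustrative inputs only
circuit = PipelineReconfigurationCircuit(list(ipc))
pid = circuit.phase_id(preceding_phase_signature=42)
chosen = []
for _ in range(4):
    cfg = circuit.before_phase(pid)
    chosen.append(cfg)
    circuit.after_phase(pid, cfg, ipc[cfg])
print(chosen)  # ['wide', 'narrow', 'low_latency', 'low_latency']
```

The first three occurrences of the phase are training cycles; from the fourth onward the entry's preferred configuration is applied before the phase runs.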

PIPELINED COGNITIVE SIGNAL PROCESSOR
20200034331 · 2020-01-30

Techniques for denoising an electromagnetic signal are disclosed. The techniques utilize an antenna, a weight adaptation component, a reservoir computer including a computer interpretable neural network, a delay embedding component, and an output layer computer. The techniques include passively acquiring an electromagnetic signal by the antenna; producing, by the reservoir computer, a plurality of reservoir state values based on the electromagnetic signal; collecting, by the delay embedding component, the plurality of reservoir state values into a historical record; determining, by the weight adaptation component, a plurality of reservoir state value weights based at least in part on the historical record; scaling, by the output layer computer, the plurality of reservoir state values by the plurality of reservoir state value weights to produce a plurality of output values; and outputting the plurality of output values, where the scaling occurs over a plurality of clock cycles of a clock for the system.
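The signal flow in the abstract (reservoir, delay embedding, weight adaptation, output scaling) can be sketched in software. This is a stand-in, not the disclosed design: it swaps in a small echo-state reservoir for the neural network and an LMS update, trained to predict the next input sample, for the weight adaptation component. The reservoir size, step size `mu`, and the class name `ReservoirDenoiser` are assumptions.

```python
import math
import random
from collections import deque
from typing import List, Optional


class ReservoirDenoiser:
    """Hypothetical sketch: echo-state reservoir, delay embedding, LMS-adapted readout."""

    def __init__(self, n: int = 16, delays: int = 4, mu: float = 0.01, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n = n
        self.mu = mu
        self.w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        # Small recurrent weights keep the reservoir's dynamics stable (fading memory).
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n)]
        self.state = [0.0] * n
        # Delay embedding: the last `delays` reservoir state vectors (the historical record).
        self.history = deque([[0.0] * n for _ in range(delays)], maxlen=delays)
        self.weights = [0.0] * (n * delays)  # one weight per embedded state value
        self.prev_features: Optional[List[float]] = None

    def step(self, u: float) -> float:
        """Consume one input sample (one clock cycle) and return one output value."""
        # Reservoir update: x <- tanh(W x + w_in * u), computed from the old state.
        self.state = [
            math.tanh(sum(wij * xj for wij, xj in zip(self.w[i], self.state)) + self.w_in[i] * u)
            for i in range(self.n)
        ]
        self.history.appendleft(list(self.state))
        features = [x for s in self.history for x in s]
        # Weight adaptation (assumed LMS): the previous features should have predicted u.
        if self.prev_features is not None:
            err = u - sum(w * f for w, f in zip(self.weights, self.prev_features))
            self.weights = [w + self.mu * err * f for w, f in zip(self.weights, self.prev_features)]
        self.prev_features = features
        # Output layer: scale the embedded reservoir states by the adapted weights.
        return sum(w * f for w, f in zip(self.weights, features))


rng = random.Random(1)
denoiser = ReservoirDenoiser()
noisy = [math.sin(0.2 * k) + rng.gauss(0.0, 0.3) for k in range(500)]
outputs = [denoiser.step(u) for u in noisy]
```

Predicting the next sample is one common way to denoise with a reservoir: the random noise cannot be predicted, so the adapted output tends to follow the structured part of the signal.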

PIPELINED CONFIGURABLE PROCESSOR
20200026685 · 2020-01-23

A configurable processing circuit capable of handling multiple threads simultaneously, the circuit comprising a thread data store, a plurality of configurable execution units, a configurable routing network for connecting locations in the thread data store to the execution units, a configuration data store for storing configuration instances that each define a configuration of the routing network and a configuration of one or more of the plurality of execution units, and a pipeline formed from the execution units, the routing network and the thread data store that comprises a plurality of pipeline sections configured such that each thread propagates from one pipeline section to the next at each clock cycle, the circuit being configured to: (i) associate each thread with a configuration instance; and (ii) configure each of the plurality of pipeline sections for each clock cycle to be in accordance with the configuration instance associated with the respective thread that will propagate through that pipeline section during the clock cycle.
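Points (i) and (ii) of the claim can be illustrated with a cycle-level simulation in which each thread carries its own configuration instance and every pipeline section is reconfigured each clock cycle for whichever thread currently occupies it. The representation of a configuration instance as one operation per section, and all names below, are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Op = Callable[[Dict[str, int]], None]  # hypothetical: what one pipeline section does to a thread


@dataclass
class Thread:
    tid: int
    config_id: int        # (i) the configuration instance associated with this thread
    data: Dict[str, int]  # this thread's slice of the thread data store


def run_pipeline(threads: List[Thread], config_store: Dict[int, List[Op]], num_sections: int) -> List[str]:
    """Issue one thread per cycle; every thread advances one section per clock cycle."""
    sections: List[Optional[Thread]] = [None] * num_sections
    pending = list(threads)
    trace: List[str] = []
    cycle = 0
    while pending or any(t is not None for t in sections):
        # Clock edge: each thread moves to the next section; a new thread enters section 0.
        sections = [pending.pop(0) if pending else None] + sections[:-1]
        # (ii) Each section is configured this cycle by the instance of the thread it holds.
        for s, t in enumerate(sections):
            if t is not None:
                config_store[t.config_id][s](t.data)
                trace.append(f"cycle {cycle}: section {s} thread {t.tid} config {t.config_id}")
        cycle += 1
    return trace


config_store: Dict[int, List[Op]] = {
    0: [lambda d: d.update(x=d["x"] + 1), lambda d: d.update(x=d["x"] * 2)],
    1: [lambda d: d.update(x=d["x"] - 1), lambda d: d.update(x=d["x"] ** 2)],
}
t0 = Thread(tid=0, config_id=0, data={"x": 3})
t1 = Thread(tid=1, config_id=1, data={"x": 5})
trace = run_pipeline([t0, t1], config_store, num_sections=2)
print(t0.data["x"], t1.data["x"])  # 8 16
```

In cycle 1 the two sections run different configurations at the same time (section 0 for thread 1, section 1 for thread 0), which is the behaviour the claim describes.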

Reconfigurable Parallel Processing
20200004553 · 2020-01-02

Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
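One way to read the role of the gasket memory is as a hand-off buffer that lets a data path longer than the PE array run as several consecutive PE configurations. The sketch below models that reading under stated assumptions: PEs form a simple chain, each PE applies one element-wise operation across all threads, and the sequencer cuts the long path into array-sized pieces. `PE`, `Processor`, `load`, and `run` are hypothetical names.

```python
from typing import Callable, List, Optional

Op = Callable[[int], int]


class PE:
    """Processing element: its ALUs apply one configured op to every thread's value."""

    def __init__(self) -> None:
        self.config_buffer: List[Op] = []

    def execute(self, values: List[int], config_index: int) -> List[int]:
        op = self.config_buffer[config_index]
        return [op(v) for v in values]  # same instruction across parallel threads


class Processor:
    def __init__(self, num_pes: int) -> None:
        self.pes = [PE() for _ in range(num_pes)]
        self.gasket: Optional[List[int]] = None  # result handed to the next PE configuration

    def load(self, virtual_path: List[Op]) -> int:
        """Sequencer: cut a data path longer than the PE array into PE configurations."""
        n = len(self.pes)
        num_configs = -(-len(virtual_path) // n)  # ceiling division
        for i, pe in enumerate(self.pes):
            # PE i gets stage i of every configuration; pad with identity past the end.
            pe.config_buffer = [
                virtual_path[c * n + i] if c * n + i < len(virtual_path) else (lambda v: v)
                for c in range(num_configs)
            ]
        return num_configs

    def run(self, inputs: List[int], num_configs: int) -> List[int]:
        values = list(inputs)
        for c in range(num_configs):
            if c > 0 and self.gasket is not None:
                values = list(self.gasket)  # this configuration starts from the gasket result
            for pe in self.pes:
                values = pe.execute(values, c)
            self.gasket = values  # stored for use during the next configuration
        return values


proc = Processor(num_pes=2)
path: List[Op] = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * 10, lambda v: v + 5]
n_cfg = proc.load(path)
print(n_cfg, proc.run([1, 2, 3], n_cfg))  # 3 [15, 35, 55]
```

Five stages on two PEs become three configurations; the gasket memory carries the intermediate vector between them.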

CUSTOMIZING OPERATOR NODES FOR GRAPHICAL REPRESENTATIONS OF DATA PROCESSING PIPELINES
20190384577 · 2019-12-19

A method may include receiving, from a client, a request to customize an operator node corresponding to a data processing operation. The request may include a first key. The operator node may be selected for inclusion in a graph representative of a data processing pipeline. The operator node may be associated with a first file that includes at least one configuration parameter associated with the operator node. The at least one configuration parameter may be associated with a second key. In response to the first key being determined to match the second key, the operator node may be customized by modifying the at least one configuration parameter. Furthermore, a second file associated with a customized operator node may be generated to store the customizations made to the operator node including the modification of the at least one configuration parameter. Related systems and articles of manufacture are also provided.
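The key check and the second file can be sketched as follows. This is a hypothetical illustration: the node is modelled as a dataclass, the first file as an in-memory dict, and the second file as a JSON string holding only the modified parameters; none of these formats come from the abstract.

```python
import copy
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class OperatorNode:
    name: str
    key: str  # the key stored with the node's configuration file (the "second key")
    config: Dict[str, Any] = field(default_factory=dict)  # contents of the first file


def customize(node: OperatorNode, request_key: str, changes: Dict[str, Any]) -> Tuple[OperatorNode, str]:
    """Customize `node` only if the request's key matches; return the new node and a second file."""
    if request_key != node.key:
        raise PermissionError(f"key mismatch for operator node {node.name!r}")
    new_config = copy.deepcopy(node.config)
    for param, value in changes.items():
        if param not in new_config:
            raise KeyError(f"unknown configuration parameter {param!r}")
        new_config[param] = value
    customized = OperatorNode(node.name, node.key, new_config)
    # Second file: records only the customizations, leaving the original file untouched.
    second_file = json.dumps({"base": node.name, "overrides": changes}, sort_keys=True)
    return customized, second_file


reader = OperatorNode("csv_reader", key="k-123", config={"delimiter": ",", "header": True})
custom, file_text = customize(reader, "k-123", {"delimiter": ";"})
print(custom.config)  # {'delimiter': ';', 'header': True}
print(file_text)      # {"base": "csv_reader", "overrides": {"delimiter": ";"}}
```

Keeping the overrides in a separate file means the original operator definition stays reusable by other pipelines.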

Reconfigurable parallel processing
11971847 · 2024-04-30

Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.

Pipelined configurable processor
10275390 · 2019-04-30

A configurable processing circuit capable of handling multiple threads simultaneously, the circuit comprising a thread data store, a plurality of configurable execution units, a configurable routing network for connecting locations in the thread data store to the execution units, a configuration data store for storing configuration instances that each define a configuration of the routing network and a configuration of one or more of the plurality of execution units, and a pipeline formed from the execution units, the routing network and the thread data store that comprises a plurality of pipeline sections configured such that each thread propagates from one pipeline section to the next at each clock cycle, the circuit being configured to: (i) associate each thread with a configuration instance; and (ii) configure each of the plurality of pipeline sections for each clock cycle to be in accordance with the configuration instance associated with the respective thread that will propagate through that pipeline section during the clock cycle.

INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD
20190026247 · 2019-01-24

An information processing apparatus having a reconfigurable circuit capable of rewriting a logic circuit includes a process determination circuit that determines which of a plurality of processes is to be executed; a standby buffer circuit that holds process data to be used in a process waiting for execution among the processes determined by the process determination circuit; and a rewrite control circuit that, when the amount of process data held in the standby buffer circuit exceeds a first predetermined amount, rewrites the current logic circuit written in the reconfigurable circuit to a logic circuit that executes one of the plurality of processes waiting for execution, using each of a plurality of process data held in the standby buffer circuit.
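The batching behaviour can be sketched as a small controller: data for processes whose circuit is not currently written waits in the standby buffer, and a rewrite is triggered only once the buffered amount exceeds the threshold, so one costly rewrite is amortized over several items. The choice of which waiting process to load (the one with the most buffered data) and all names below are assumptions for the illustration.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Optional


class RewriteController:
    """Hypothetical model of the standby buffer and rewrite control circuits."""

    def __init__(self, circuits: Dict[str, Callable[[int], int]], threshold: int) -> None:
        self.circuits = circuits            # process name -> logic circuit (modelled as a function)
        self.threshold = threshold          # the "first predetermined amount"
        self.current: Optional[str] = None  # logic circuit currently written
        self.standby: Dict[str, Deque[int]] = defaultdict(deque)
        self.rewrites = 0

    def submit(self, process: str, data: int) -> List[int]:
        """Called after process determination; returns any results produced now."""
        if process == self.current:
            return [self.circuits[process](data)]  # circuit already written: run at once
        self.standby[process].append(data)
        if sum(len(q) for q in self.standby.values()) <= self.threshold:
            return []
        # Buffer exceeded the threshold: rewrite to the circuit with the most waiting data
        # (an assumed policy) and run it over every buffered item for that process.
        target = max(self.standby, key=lambda p: len(self.standby[p]))
        self.current = target
        self.rewrites += 1
        return [self.circuits[target](d) for d in self.standby.pop(target)]


ctl = RewriteController({"A": lambda x: x + 1, "B": lambda x: x * 2}, threshold=2)
outs = [ctl.submit(p, d) for p, d in [("A", 1), ("B", 5), ("A", 2), ("A", 10), ("B", 6)]]
print(outs)  # [[], [], [2, 3], [11], []]
```

Five submissions cause a single rewrite; the data for process B stays buffered until the threshold is crossed again.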