Patent classifications
G06F15/825
FLOW MODEL COMPUTATION SYSTEM WITH DISCONNECTED GRAPHS
A computing device determines a node traversal order for computing a computational parameter value for each node of a data model of a system that includes a plurality of disconnected graphs. The data model represents a flow of a computational parameter value through the nodes from a source module to an end module. A flow list defines an order for selecting and iteratively processing each node to compute the computational parameter value in a single iteration through the flow list. Each node from the flow list is selected to compute a driver quantity for each node. Each node is selected from the flow list in a reverse order to compute a driver rate and the computational parameter value for each node. The driver quantity or the computational parameter value is output for each node to predict a performance of the system.
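The two-pass traversal described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the abstract does not give formulas, so the edge fractions, the `source_quantity` and `end_value` inputs, and the way quantity, value, and rate are combined are all assumptions made for the example. The flow list is built with a standard topological sort (Kahn's algorithm), which handles several disconnected graphs in one list.

```python
from collections import deque


def build_flow_list(nodes, edges):
    """Order nodes so every node appears after its upstream nodes.

    Kahn's algorithm; disconnected graphs simply contribute several
    zero-indegree starting nodes to the same list.
    """
    indegree = {n: 0 for n in nodes}
    for _src, dst, _frac in edges:
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    flow_list = []
    while ready:
        node = ready.popleft()
        flow_list.append(node)
        for src, dst, _frac in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    if len(flow_list) != len(nodes):
        raise ValueError("graph contains a cycle; no single-pass order exists")
    return flow_list


def compute(nodes, edges, source_quantity, end_value):
    """Forward pass for driver quantities, reverse pass for rates and values.

    edges: (src, dst, fraction) triples. The formulas are hypothetical.
    """
    flow_list = build_flow_list(nodes, edges)

    # Forward pass: push driver quantity downstream along each edge.
    quantity = {n: source_quantity.get(n, 0.0) for n in nodes}
    for node in flow_list:
        for src, dst, frac in edges:
            if src == node:
                quantity[dst] += quantity[node] * frac

    # Reverse pass: pull values back upstream, then derive a per-unit rate.
    value = {n: end_value.get(n, 0.0) for n in nodes}
    rate = {}
    for node in reversed(flow_list):
        for src, dst, frac in edges:
            if src == node:
                value[node] += value[dst] * frac
        rate[node] = value[node] / quantity[node] if quantity[node] else 0.0

    return flow_list, quantity, rate, value
```

Each node is visited exactly once in each direction, which is what allows the computation to finish in a single iteration through the flow list rather than repeating until convergence.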
Loop execution in a reconfigurable compute fabric using flow controllers for respective synchronous flows
Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
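The hand-off between the two flow controllers can be modelled in software as a toy simulation. This sketch assumes a simple in-memory message queue and a known iteration count; the class and function names are hypothetical and nothing here reflects the actual hardware interface.

```python
from collections import deque


class SecondFlowController:
    """Starts the second portion of each iteration and flags the last one."""

    def __init__(self, total_iterations):
        self.total_iterations = total_iterations
        self.started = []  # (iteration, last_iteration_flag) per thread started

    def on_message(self, iteration):
        # Decide whether this iteration is the final one before starting
        # the thread at the second synchronous flow.
        last = iteration == self.total_iterations - 1
        self.started.append((iteration, last))
        return last


def run_loop(total_iterations):
    """The first controller runs portion one, then notifies the second."""
    messages = deque()  # stands in for the asynchronous messages
    second = SecondFlowController(total_iterations)
    for iteration in range(total_iterations):
        # First synchronous flow: first portion of the iteration runs here,
        # then an asynchronous message is queued for the second controller.
        messages.append(iteration)
    while messages:
        second.on_message(messages.popleft())
    return second.started
```

With a single-iteration loop, `run_loop(1)` starts the only thread with the flag already set, which matches the case the abstract describes where the first iteration is also the last.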
Massively parallel hierarchical control system and method
A system is disclosed for controlling controllable elements of an external component. The system uses a state translator subsystem (STS) which receives a state command from an external subsystem. The STS has at least one module for processing the state command and generating operational commands, in parallel, over a first plurality of channels, to control the elements of the external component. A programmable calibration command translation layer subsystem (PCCTL) uses the operational commands to generate granular level commands for controlling the elements, and to transmit the granular level commands over a second plurality of channels. A subsystem is coupled between the PCCTL and the elements, which receives the commands from the PCCTL and uses the commands to generate final output commands, which are applied in parallel, over a third plurality of channels, to the elements.
Enabling accelerated processing units to perform dataflow execution
Methods and systems are disclosed for performing dataflow execution by an accelerated processing unit (APU). The disclosed techniques include decoding information from one or more dataflow instructions, where the decoded information is associated with dataflow execution of a computational task. The techniques further include configuring dataflow circuitry based on the decoded information and then executing the computational task using the dataflow circuitry.
Methods, systems, and apparatuses to perform a compute operation according to a configuration packet and comparing the result to data in local memory
Methods, apparatuses, and systems for implementing data flows in a processor are described herein. A data flow manager may be configured to generate a configuration packet for a compute operation based on status information regarding multiple processing elements of the processor. Accordingly, multiple processing elements of a processor may concurrently process data flows based on the configuration packet. For example, the multiple processing elements may implement a mapping of processing elements to memory, while also implementing identified paths, through the processor, for the data flows. After executing the compute operation at certain processing elements of the processor, the processing results may be provided. In speech signal processing operations, the processing results may be compared to phonemes to identify such components of human speech in the processing results. Once dynamically identified, the processing elements may continue comparing additional components of human speech to facilitate processing of an audio recording, for example.
SYSTEMS AND METHODS FOR STREAM-DATAFLOW ACCELERATION
According to some embodiments, a dataflow accelerator comprises a control/command core, a scratchpad, and a coarse grain reconfigurable array (CGRA). The scratchpad comprises a write controller to transmit data to an input vector port interface and to receive data from the input vector port interface. The CGRA receives data from the input vector port interface and comprises a plurality of interconnects and a plurality of functional units.
MASSIVELY PARALLEL HIERARCHICAL CONTROL SYSTEM AND METHOD
An electronic control system is disclosed for controlling individually controllable elements of an external component. In one embodiment the system may include a state translator subsystem for receiving a state command from an external subsystem. The state translator subsystem may have at least one module for processing the state command and generating operational commands for controlling the elements to achieve a desired state or condition. A programmable calibration command translation layer (PCCTL) subsystem may be included which receives and uses the operational commands to generate granular level commands for controlling the elements. A feedback control layer subsystem may be included which applies the granular level commands to the elements, and further modifies the granular level commands as needed to control the elements in closed loop fashion.
APPARATUS, METHODS, AND SYSTEMS FOR UNSTRUCTURED DATA FLOW IN A CONFIGURABLE SPATIAL ACCELERATOR
Systems, methods, and apparatuses relating to unstructured data flow in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a data path having a first branch and a second branch, and the data path comprising at least one processing element; a switch circuit comprising a switch control input to receive a first switch control value to couple an input of the switch circuit to the first branch and a second switch control value to couple the input of the switch circuit to the second branch; a pick circuit comprising a pick control input to receive a first pick control value to couple an output of the pick circuit to the first branch and a second pick control value to couple the output of the pick circuit to a third branch of the data path; a predicate propagation processing element to output a first edge predicate value and a second edge predicate value based on (e.g., both of) a switch control value from the switch control input of the switch circuit and a first block predicate value; and a predicate merge processing element to output a pick control value to the pick control input of the pick circuit and a second block predicate value based on both of a third edge predicate value and one of the first edge predicate value or the second edge predicate value.
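The switch and pick circuits correspond to the classic dataflow operators of the same names, which can be illustrated in software. The sketch below is a simplification under stated assumptions: it models only a two-branch conditional, omits the predicate propagation and predicate merge elements entirely, and uses hypothetical function names and branch computations.

```python
def switch(control, value):
    """Steer value onto the first branch if control is true, else the second.

    Returns (first_branch_input, second_branch_input); the branch that is
    not taken receives None.
    """
    return (value, None) if control else (None, value)


def pick(control, first, other):
    """Forward the first branch's result if control is true, else the other."""
    return first if control else other


def conditional(x):
    """Dataflow form of: x * 2 if x >= 0 else -x (branch bodies are made up)."""
    control = x >= 0
    to_first, to_second = switch(control, x)
    # Only the branch that actually received a value does any work.
    first_out = to_first * 2 if to_first is not None else None
    second_out = -to_second if to_second is not None else None
    return pick(control, first_out, second_out)
```

Here the same control value drives both operators. In the abstract, by contrast, the pick control value is produced by the predicate merge element, which is what lets the hardware handle control flow that does not reconverge in a simple if/else shape.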
Dataflow Triggered Tasks for Accelerated Deep Learning
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element receives a particular wavelet comprising a particular virtual channel specifier and a particular data element. Instructions are read from the memory of the compute element based at least in part on the particular virtual channel specifier. The particular data element is used as an input operand to execute at least one of the instructions.
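The dispatch step at the end of this abstract, where the virtual channel specifier of an arriving wavelet selects which instructions run, can be sketched as a lookup followed by execution. This is an illustrative model only: the `Wavelet` and `ComputeElement` classes, the use of Python callables as "instructions", and the single accumulator are all assumptions, and routing between processing elements is not modelled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Wavelet:
    channel: int   # virtual channel specifier
    data: float    # data element carried by the wavelet


@dataclass
class ComputeElement:
    # Hypothetical memory layout: channel -> list of instructions, where
    # each instruction is a callable taking (accumulator, operand).
    memory: dict = field(default_factory=dict)
    accumulator: float = 0.0

    def receive(self, wavelet):
        """Fetch instructions by channel and run them on the data element."""
        instructions = self.memory[wavelet.channel]
        for instruction in instructions:
            self.accumulator = instruction(self.accumulator, wavelet.data)
        return self.accumulator
```

Because the channel alone decides which task runs, the arrival of data is what triggers computation; no separate scheduler has to poll for work.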