Patent classifications
G06F15/7825
Multi-chip processing system and method for adding routing path information into headers of packets
Packet routing within a multi-chip processing system is shown. A first chip has a first interconnect bus, and a first microprocessor coupled to the first interconnect bus. The first interconnect bus has a first routing register. When the first microprocessor operates the first chip as a source node to output a packet to be transferred to a destination node, routing information indicating a routing path from the source node to the destination node is written into the first routing register and then loaded from the first routing register to a header of the packet. While being transferred within the multi-chip processing system from the source node to the destination node, the packet is guided along the routing path indicated in the routing information carried in the header of the packet.
Reconfigurable network-on-chip security architecture
The present disclosure presents an exemplary tier-based reconfigurable security architecture that can adapt to different use-case scenarios by selecting security tiers and configure parameters in each security tier based on system requirements. An exemplary system comprises a security agent that is configured to monitor system characteristics of embedded components on a system-on-chip and communicate a status of the system characteristics to a reconfigurable service engine integrated on the system-on-chip, such that the reconfigurable service engine is configured to activate one of a plurality of tiers of security based at least upon the status of the system characteristics communicated.
LOOP EXECUTION IN A RECONFIGURABLE COMPUTE FABRIC.
Various examples are directed to systems and methods for performing operations in a reconfigurable compute fabric. A dispatch interface may send a first asynchronous message to a first flow controller of a first synchronous flow. The first asynchronous message may instruct the first flow controller to begin execution of a first-level loop. The first synchronous flow may send a second asynchronous message to a second flow controller of a second synchronous flow. The second asynchronous message may instruct the second flow controller to execute a second-level loop. The first flow controller may receive a third asynchronous message indicating that the second-level loop has completed and that a synchronous flow thread is free for executing a next iteration of the first-level loop.
KERNEL MAPPING TO NODES IN COMPUTE FABRIC
A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. Compute kernels can be parsed into directed graphs and mapped to particular node or tile resources for execution. In an example, a branch-and-bound search algorithm can be used to perform the mapping. The algorithm can use a cost function to evaluate the resources based on capability, occupancy, or power consumption of the various node or tile resources.
CONNECTIVITY IN COARSE GRAINED RECONFIGURABLE ARCHITECTURE
A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, the tiles can be arranged in a one-dimensional array and each tile can be coupled to its respective adjacent neighbor tiles using a direct bus coupling. Each tile can be further coupled to at least one non-adjacent neighbor tile that is one tile, or device space, away using a passthrough bus. The passthrough bus can extend through intervening tiles.
MECHANISM TO PROVIDE RELIABLE RECEIPT OF EVENT MESSAGES
Devices and techniques for providing receipts for event messages in a processor are described herein. A system includes multiple memory-compute nodes coupled to one another over a scale fabric; a set of registers; and an event manager hardware circuitry to: receive an event message corresponding to an event, and the event associated with an event mode; track a counter value representing a number of received event messages related to the event, the counter value stored in the set of registers; compare the number of received event messages to a trigger value; and in response to the number of received event messages equaling the trigger value: use an atomic operation to reset the counter value in the set of registers while maintaining the event mode; and alert a thread of the event.
Multi-threaded, self-scheduling reconfigurable computing fabric
Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array. A representative configurable circuit includes a configurable computation circuit and a configuration memory having a first, instruction memory storing a plurality of data path configuration instructions to configure a data path of the configurable computation circuit; and a second, instruction and instruction index memory storing a plurality of spoke instructions and data path configuration instruction indices for selection of a master synchronous input, a current data path configuration instruction, and a next data path configuration instruction for a next configurable computation circuit.
Machine learning model updates to ML accelerators
Examples herein describe a peripheral I/O device with a hybrid gateway that permits the device to have both I/O and coherent domains. As a result, the compute resources in the coherent domain of the peripheral I/O device can communicate with the host in a similar manner as CPU-to-CPU communication in the host. The dual domains in the peripheral I/O device can be leveraged for machine learning (ML) applications. While an I/O device can be used as an ML accelerator, these accelerators previously only used an I/O domain. In the embodiments herein, compute resources can be split between the I/O domain and the coherent domain where a ML engine is in the I/O domain and a ML model is in the coherent domain. An advantage of doing so is that the ML model can be coherently updated using a reference ML model stored in the host.
Conditional branching control for a multi-threaded, self-scheduling reconfigurable computing fabric
Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array. A representative configurable circuit includes a configurable computation circuit and a configuration memory having a first, instruction memory storing a plurality of data path configuration instructions to configure a data path of the configurable computation circuit; and a second, instruction and instruction index memory storing a plurality of spoke instructions and data path configuration instruction indices for selection of a master synchronous input, a current data path configuration instruction, and a next data path configuration instruction for a next configurable computation circuit.
SPIKING NEURAL NETWORK-BASED DATA PROCESSING METHOD, COMPUTING CORE CIRCUIT, AND CHIP
A computing core circuit, including: an encoding module, a route sending module, and a control module, wherein the control module is configured to control the encoding module to perform encoding processing on a pulse sequence determined by pulses to be transmitted of at least one neuron in a current computing core, so as to obtain an encoded pulse sequence, and control the route sending module to determine a corresponding route packet according to the encoded pulse sequence, so as to send the route packet. The present disclosure further provides a data processing method, a chip, a board, an electronic device, and a computer-readable storage medium.