G06F15/17343

CONNECTING PROCESSORS USING TWISTED TORUS CONFIGURATIONS
20220173973 · 2022-06-02 ·

Methods, systems, and apparatus, including computer programs encoded on computer-storage media, for connecting processors using twisted torus configurations. In some implementations, a cluster of processing nodes is coupled using a reconfigurable interconnect fabric. The system determines a number of processing nodes to allocate as a network within the cluster and a topology for the network. The system selects an interconnection scheme for the network, where the interconnection scheme is selected from a group that includes at least a torus interconnection scheme and a twisted torus interconnection scheme. The system allocates the determined number of processing nodes of the cluster in the determined topology, sets the reconfigurable interconnect fabric to provide the selected interconnection scheme for the processing nodes in the network, and provides access to the network for performing a computing task.

Switch for routing data in an array of functional configurable units

A system includes a multidimensional array of homogenous Functional Configurable Units (FCUs), coupled using a multidimensional array of switches, and a parameter store on the device which stores parameters that tag a subarray of FCUs as unusable. Technologies are described which change the pattern of placement of configuration data, in dependence on the tagged subarray, by changing the routing through the array of switches. As a result, a multidimensional array of FCUs having unusable elements can still be used.

Defect repair circuits for a reconfigurable data processor

A device architecture includes a spatially reconfigurable array of processors, such as configurable units of a CGRA, having spare elements, and a parameter store on the device which stores parameters that tag one or more elements as unusable. Technologies are described which change the pattern of placement of configuration data, in dependence on the tagged elements. As a result, a spatially reconfigurable array having unusable elements can be repaired.

NEURAL PROCESSING ACCELERATOR
20230244632 · 2023-08-03 ·

A system for calculating. A scratch memory is connected to a plurality of configurable processing elements by a communication fabric including a plurality of configurable nodes. The scratch memory sends out a plurality of streams of data words. Each data word is either a configuration word used to set the configuration of a node or of a processing element, or a data word carrying an operand or a result of a calculation. Each processing element performs operations according to its current configuration and returns the results to the communication fabric, which conveys them back to the scratch memory.

Defect avoidance in a multidimensional array of functional configurable units

A system includes a multidimensional array of homogenous Functional Configurable Units (FCUs), coupled using a multidimensional array of switches, and a parameter store on the device which stores parameters that tag a subarray of FCUs as unusable. Technologies are described which change the pattern of placement of configuration data, in dependence on the tagged subarray, by changing the routing through the array of switches. As a result, a multidimensional array of FCUs having unusable elements can still be used.

IN-NETWORK COLLECTIVE OPERATIONS
20230359582 · 2023-11-09 ·

Examples described herein relate to a switch comprising circuitry configured to for packet communications associated with a collective operation to train machine learning (ML) models: utilize a reliable transport protocol for communications from at least one worker node of the collective operation to a switch, wherein the utilize a reliable transport protocol for communications from at least one worker node of the collective operation to the switch comprises store packet receipt state for per-packet communications from the at least one worker node of the collective operation to the switch and utilize a non-reliable transport protocol by the switch to a device that is to perform aggregation of results, wherein the reliable transport protocol comprises a different protocol than that of the non-reliable transport protocol.

Memory network processor

A multi-processor system with processing elements, interspersed memory, and primary and secondary interconnection networks optimized for high performance and low power dissipation is disclosed. In the secondary network multiple message routing nodes are arranged in an interspersed fashion with multiple processors. A given message routing node may receive messages from other message nodes, and relay the received messages to destination message routing nodes using relative offsets included in the messages. The relative offset may specify a number of message nodes from the message node that originated a message to a destination message node.

Defect Avoidance in a Multidimensional Array of Functional Configurable Units

A system includes a multidimensional array of homogenous Functional Configurable Units (FCUs), coupled using a multidimensional array of switches, and a parameter store on the device which stores parameters that tag a subarray of FCUs as unusable. Technologies are described which change the pattern of placement of configuration data, in dependence on the tagged subarray, by changing the routing through the array of switches. As a result, a multidimensional array of FCUs having unusable elements can still be used.

Switch for Routing Data in an Array of Functional Configurable Units

A system includes a multidimensional array of homogenous Functional Configurable Units (FCUs), coupled using a multidimensional array of switches, and a parameter store on the device which stores parameters that tag a subarray of FCUs as unusable. Technologies are described which change the pattern of placement of configuration data, in dependence on the tagged subarray, by changing the routing through the array of switches. As a result, a multidimensional array of FCUs having unusable elements can still be used.

NEURAL PROCESSING ACCELERATOR
20220292049 · 2022-09-15 ·

A system for calculating. A scratch memory is connected to a plurality of configurable processing elements by a communication fabric including a plurality of configurable nodes. The scratch memory sends out a plurality of streams of data words. Each data word is either a configuration word used to set the configuration of a node or of a processing element, or a data word carrying an operand or a result of a calculation. Each processing element performs operations according to its current configuration and returns the results to the communication fabric, which conveys them back to the scratch memory.