Patent classifications
G06F15/17343
Memory network processor
A multi-processor system with processing elements, interspersed memory, and primary and secondary interconnection networks optimized for high performance and low power dissipation is disclosed. In the secondary network multiple message routing nodes are arranged in an interspersed fashion with multiple processors. A given message routing node may receive messages from other message nodes, and relay the received messages to destination message routing nodes using relative offsets included in the messages. The relative offset may specify a number of message nodes from the message node that originated a message to a destination message node.
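The relative-offset relaying described in this abstract can be illustrated with a minimal sketch. It assumes a 2D grid of message nodes, an (dx, dy) offset carried in the message, and X-then-Y forwarding; none of these details come from the abstract, and the function name `route` is hypothetical.

```python
def route(src, offset):
    """Return the node coordinates a message visits, starting at src.

    Assumptions (not from the abstract): nodes sit on a 2D grid, the
    message carries a signed (dx, dy) offset, and each relaying node
    moves the message one hop and reduces the remaining offset.
    """
    x, y = src
    dx, dy = offset
    path = [(x, y)]
    # Consume the X offset first, one hop per node.
    while dx != 0:
        step = 1 if dx > 0 else -1
        x += step
        dx -= step
        path.append((x, y))
    # Then consume the Y offset.
    while dy != 0:
        step = 1 if dy > 0 else -1
        y += step
        dy -= step
        path.append((x, y))
    # The message is delivered when the remaining offset is (0, 0).
    return path
```

For example, `route((1, 1), (2, -1))` visits `(1, 1)`, `(2, 1)`, `(3, 1)`, and `(3, 0)`. No node needs a global address table, since each hop only inspects the offset remaining in the message.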
CONNECTING PROCESSORS USING TWISTED TORUS CONFIGURATIONS
Methods, systems, and apparatus, including computer programs encoded on computer-storage media, for connecting processors using twisted torus configurations. In some implementations, a cluster of processing nodes is coupled using a reconfigurable interconnect fabric. The system determines a number of processing nodes to allocate as a network within the cluster and a topology for the network. The system selects an interconnection scheme for the network, where the interconnection scheme is selected from a group that includes at least a torus interconnection scheme and a twisted torus interconnection scheme. The system allocates the determined number of processing nodes of the cluster in the determined topology, sets the reconfigurable interconnect fabric to provide the selected interconnection scheme for the processing nodes in the network, and provides access to the network for performing a computing task.
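To make the difference between the two interconnection schemes concrete, here is a rough sketch of neighbor computation on a 2D torus and on a twisted variant. The abstract does not define the twist; this sketch assumes one common formulation in which the X wraparound link also shifts the Y coordinate by a fixed amount. The function name `neighbors` and the `twist` parameter are hypothetical.

```python
def neighbors(x, y, width, height, twist=0):
    """Return the four neighbors of node (x, y).

    Assumption (not from the abstract): with twist == 0 this is a plain
    2D torus; with twist > 0, crossing the X wraparound link also shifts
    the Y coordinate by `twist` rows.
    """
    def wrap_x(nx, ny):
        if nx >= width:   # crossed the right edge
            return nx - width, (ny + twist) % height
        if nx < 0:        # crossed the left edge
            return nx + width, (ny - twist) % height
        return nx, ny

    return [
        wrap_x(x + 1, y),
        wrap_x(x - 1, y),
        (x, (y + 1) % height),
        (x, (y - 1) % height),
    ]
```

On a 4 x 4 network, `neighbors(3, 0, 4, 4)` wraps to `(0, 0)` in the plain torus, while `neighbors(3, 0, 4, 4, twist=2)` wraps to `(0, 2)`. In a reconfigurable fabric, switching schemes would amount to changing which ports the wraparound links connect.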
High performance interconnect
A device includes a receiver to receive one or more training sequences during a training of a link, where the link connects two devices. The device may include agent logic to determine, from the one or more training sequences, a number of extension devices on the link between the two devices, and determine that the number of extension devices exceeds a threshold number. The device may include a transmitter to send a plurality of clock compensation ordered sets on the link based on determining that the number of extension devices exceeds a threshold number.
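The threshold decision in this abstract might look roughly like the sketch below. How the extension-device count is encoded in a training sequence is not stated, so the `extension_count` field, the default threshold, and the number of ordered sets are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class TrainingSequence:
    # Hypothetical field: the number of extension devices this
    # training sequence reports on the link.
    extension_count: int


def ordered_sets_to_send(sequences, threshold=2, plural_sets=2):
    """Decide how many clock compensation ordered sets to transmit.

    Assumptions (not from the abstract): the largest count seen during
    training is used, and a single ordered set is sent when the count
    does not exceed the threshold.
    """
    count = max((s.extension_count for s in sequences), default=0)
    return plural_sets if count > threshold else 1
```

With the defaults above, a link reporting three extension devices yields two ordered sets, and a link reporting two or fewer yields one.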
Neural processing accelerator
A system for calculating. A scratch memory is connected to a plurality of configurable processing elements by a communication fabric including a plurality of configurable nodes. The scratch memory sends out a plurality of streams of data words. Each data word is either a configuration word used to set the configuration of a node or of a processing element, or a data word carrying an operand or a result of a calculation. Each processing element performs operations according to its current configuration and returns the results to the communication fabric, which conveys them back to the scratch memory.
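The mixed stream of configuration words and data words can be sketched as a small interpreter. The tuple encoding of words, the `OPS` table, and the function name `run_stream` are assumptions made for illustration; the abstract does not specify a word format, and this models only a single processing element.

```python
import operator

# Hypothetical set of operations a processing element can be configured for.
OPS = {"add": operator.add, "mul": operator.mul}


def run_stream(words):
    """Process a stream of ("config", op_name) and ("data", (a, b)) words.

    A configuration word changes the element's current operation; a data
    word carries operands, and the result is collected as if it were
    conveyed back to the scratch memory.
    """
    current_op = None
    results = []
    for kind, payload in words:
        if kind == "config":
            current_op = OPS[payload]
        elif kind == "data":
            if current_op is None:
                raise ValueError("data word arrived before any configuration word")
            a, b = payload
            results.append(current_op(a, b))
        else:
            raise ValueError(f"unknown word kind: {kind!r}")
    return results
```

Feeding the stream `[("config", "add"), ("data", (2, 3)), ("config", "mul"), ("data", (2, 3))]` returns `[5, 6]`: the same operands produce different results because the element was reconfigured in between.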
CPU AND MULTI-CPU SYSTEM MANAGEMENT METHOD
The present disclosure provides a multi-CPU system, where the multi-CPU system includes: at least two Quick-Path Interconnect (QPI) domains, a first node controller (NC) group, and a second node controller (NC) group; according to a CPU route configuration, there is at least one CPU that can access a CPU in another QPI domain by using the first NC group; and there is at least one CPU that can access a CPU in another QPI domain by using the second NC group. With this topology, an NC can be hot-swapped with relatively little impact on the system.
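One way to picture why two NC groups allow hot swap is the sketch below. It assumes the CPU route configuration is a simple mapping from CPU name to NC group and that routes are switched to the other group before an NC is removed; the abstract does not describe the rerouting procedure, and every name here is hypothetical.

```python
def reroute_for_hot_swap(routes, group_to_swap, groups=("NC0", "NC1")):
    """Return a new route table that avoids `group_to_swap`.

    Assumption (not from the abstract): `routes` maps each CPU to the NC
    group it uses for cross-domain access, and traffic is moved to the
    remaining group while the other one is serviced.
    """
    if group_to_swap not in groups:
        raise ValueError(f"unknown NC group: {group_to_swap!r}")
    remaining = [g for g in groups if g != group_to_swap]
    if not remaining:
        raise ValueError("no NC group left to carry cross-domain traffic")
    fallback = remaining[0]
    return {
        cpu: (fallback if group == group_to_swap else group)
        for cpu, group in routes.items()
    }
```

For instance, with routes `{"cpu0": "NC0", "cpu1": "NC1"}`, swapping out `"NC0"` moves `cpu0` onto `"NC1"` and leaves `cpu1` untouched.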
MULTI-PROCESSING UNIT INTERCONNECTED ACCELERATOR SYSTEMS AND CONFIGURATION TECHNIQUES
A compute system providing hierarchical scaling can include one or more sets of parallel processing units. The parallel processing units in a set can be organized into subsets of parallel processing units. Each parallel processing unit can be configurably couplable to two nearest neighbor parallel processing units in a same subset by two communication links, and each parallel processing unit can be configurably couplable to a farthest neighbor parallel processing unit in the same subset by one communication link. Furthermore, each parallel processing unit can be configurably couplable to a corresponding parallel processing unit in the other subset by two communication links. The compute system can be configured by configuring the communication links of a set of parallel processing units into one or more compute clusters including a corresponding number of communication rings based on a specified compute parameter. Input data for computing on a given compute cluster is divided and loaded onto respective parallel processing units of the given compute cluster. A function can be computed on the loaded input data by the given compute cluster using a parallel communication ring algorithm of the function.
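As one example of a parallel communication ring algorithm, the sketch below simulates a ring all-reduce (sum) in plain Python. The abstract does not name a specific function or algorithm, so the choice of all-reduce, the one-number-per-chunk layout, and the name `ring_allreduce` are assumptions.

```python
def ring_allreduce(data):
    """Simulate a sum all-reduce over n workers connected in a ring.

    data[i] is worker i's input, split into n chunks (one number per
    chunk for brevity). Returns every worker's buffer after the
    reduce-scatter and all-gather phases.
    """
    n = len(data)
    if any(len(chunks) != n for chunks in data):
        raise ValueError("each worker must hold exactly n chunks")
    buf = [list(chunks) for chunks in data]

    # Reduce-scatter: in each step, worker i sends one chunk to worker
    # i + 1, which adds it to its own copy. Sends are snapshotted first
    # so that all workers act on the same step's state.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, buf[i][(i - step) % n]) for i in range(n)]
        for i, chunk, value in sends:
            buf[(i + 1) % n][chunk] += value

    # All-gather: each fully reduced chunk travels around the ring and
    # overwrites the partial copies.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, buf[i][(i + 1 - step) % n]) for i in range(n)]
        for i, chunk, value in sends:
            buf[(i + 1) % n][chunk] = value

    return buf
```

Each worker only ever talks to its ring successor, which matches the nearest-neighbor links described above. With inputs `[[1, 2, 3], [10, 20, 30], [100, 200, 300]]`, every worker ends with `[111, 222, 333]`.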
SYSTEM AND METHOD FOR DEFINING MACHINE-TO-MACHINE COMMUNICATING DEVICES AND DEFINING AND DISTRIBUTING COMPUTATIONAL TASKS AMONG SAME
A method for issuing commands to remote devices comprises determining a criterion that forms a rule for a service, the service comprising a service property, a service method, and a service event; distributing the rule to a behavior engine on a programmable device, the behavior engine comprising a set of rules; and evaluating, at the behavior engine, whether a trigger criterion for the rule is met. Upon determining that the trigger criterion is met, the method may further comprise performing an action comprising evaluating, at the behavior engine, whether a condition is met, and, upon determining that the condition is met, issuing a command to perform a first action comprising setting a service property and calling a service method for all devices including the service property within a scope of the action, defining an action scope.
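The trigger, condition, and action flow can be sketched as a tiny rule engine. The class names, the dictionary representation of events and devices, and the restriction to setting a property (without calling a service method) are all simplifying assumptions rather than details from the abstract.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rule:
    trigger: Callable[[dict], bool]    # is the trigger criterion met?
    condition: Callable[[dict], bool]  # checked only after the trigger fires
    prop: str                          # service property to set
    value: Any                         # value to assign


class BehaviorEngine:
    """Hypothetical engine holding the rules distributed to one device."""

    def __init__(self):
        self.rules = []

    def distribute(self, rule):
        self.rules.append(rule)

    def on_event(self, event, devices_in_scope):
        """Apply matching rules to every in-scope device that has the property."""
        commands = []
        for rule in self.rules:
            if rule.trigger(event) and rule.condition(event):
                for name, props in devices_in_scope.items():
                    if rule.prop in props:
                        props[rule.prop] = rule.value
                        commands.append((name, rule.prop, rule.value))
        return commands
```

A rule such as "when motion is detected and the light level is below 10, set `power` to `on`" would then issue a command only to devices in scope that expose a `power` property, leaving other devices untouched.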