Patent classifications
G06F15/17318
Networked computer with multiple embedded rings
A network comprising interconnected first and second processors, each processor comprising one or more of: multiple processing units arranged on a chip configured to execute program code; an on-chip interconnect comprising groups of exchange paths connected to receive data from corresponding groups of the processing units; external interfaces configured to communicate data off-chip as packets, each having a destination address, external interfaces of the first and second processors being connected by an external link; multiple exchange blocks, each connected to groups of the exchange paths; a routing bus configured to route packets between the exchange blocks and the external interfaces. Processing units of the first processor generate off-chip packets such that the group of processing units serviced by the first exchange block on the first processor address off-chip packets to the group of processing units on the second processor serviced by the corresponding first exchange block of the second processor.
Embedding rings on a toroid computer network
A computer comprising a plurality of interconnected processing nodes arranged in a toroid configuration in which multiple layers of interconnected nodes are arranged along an axis; each layer comprising a plurality of processing nodes connected in a ring in a non-axial plane by at least an intralayer respective set of links between each pair of neighbouring processing nodes, the links in each set adapted to operate simultaneously; wherein each of the processing nodes in each layer is connected to a respective corresponding node in each adjacent layer by an interlayer link to form respective rings along the axis; the computer programmed to provide a plurality of embedded one-dimensional logical paths and to transmit data around each of the embedded one-dimensional paths in such a manner that the plurality of embedded one-dimensional logical paths operate simultaneously, each logical path using all processing nodes of the computer in a sequence.
Partitionable Networked Computer
A computer is provided, including a plurality of processing nodes arranged two-dimensional arrays in respective front and rear layers. Each processing node has a set of activatable links. When activated, transmission of data items between the nodes connected via the activated link is enabled. When not activated, transmission of data items between said nodes is prevented. The set of activatable links includes a respective link which connects the processing node to each adjacent node in the array, and to a facing processing node in the other layer. An allocation engine is configured to receive an allocation instruction and connected to the processing nodes to selectively activate the links in a configuration in which: (i) links between adjacent nodes are activated; (ii) links between facing nodes are only activated for edge processing nodes; and (iii) links between processing nodes outside the group and adjacent processing nodes in the group are deactivated.
Autonomous memory architecture
An autonomous memory device in a distributed memory sub-system can receive a database downloaded from a host controller. The autonomous memory device can pass configuration routing information and initiate instructions to disperse portions of the database to neighboring die using an interface that handles inter-die communication. Information is then extracted from the pool of autonomous memory and passed through a host interface to the host controller.
Data replication for accelerator
A direct memory access (DMA) engine can be used to multicast data from system memory to a target memory for loading into an array. The DMA engine may include a controller that is configured to receive a data transfer request, and generate a set of write operations for the output interface. The set of write operations can include, for each of multiple partitions of the target memory, a write operation to write usable data from the multicast data to an address offset in the corresponding partition, and an additional write operation to write filler data from the multicast data to a null device address.
Networked computer
A computer comprising a plurality of processing nodes is provided. Each processing node has at least one processor configured to process input data to generate an array of data items. The processing nodes are arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links. The cliques are inter-connected in rings such that each processing node is a member of a single clique and a single ring. The processing nodes of all cliques are configured to exchange in each exchange step of a machine learning collective via the respective first and second clique links at least two data items with the other processing node(s) in its clique, and all processing nodes are configured to reduce each received data item with the data item in the corresponding position in the array on that processing node.
Distributed AI training topology based on flexible cable connection
A data processing system includes a central processing unit (CPU) and accelerator cards coupled to the CPU over a bus, each of the accelerator cards having a plurality of data processing (DP) accelerators to receive DP tasks from the CPU and to perform the received DP tasks. At least two of the accelerator cards are coupled to each other via an inter-card connection, and at least two of the DP accelerators are coupled to each other via an inter-chip connection. Each of the inter-card connection and the inter-chip connection is capable of being dynamically activated or deactivated, such that in response to a request received from the CPU, any one of the accelerator cards or any one of the DP accelerators within any one of the accelerator cards can be enabled or disabled to process any one of the DP tasks received from the CPU.
Communication in a computer having multiple processors
A computer comprising a plurality of processors, each of which are configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, exchange data with at least one other of the processors during an exchange phase for the computer, wherein of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
Control of data transfer between processors
A data processing system comprising a plurality of processors, wherein each of the processors is configured to perform data transfer operations to transfer outgoing data to one or more others of the processors during a first of the exchange stages; receive incoming data from the one or more others of the processors during the first of the exchange stages; determine further outgoing data in dependence upon at least part of the incoming data; count an amount of at least part the incoming data received during the first of the exchange stages from the one or more others of the processors; and in response to determining that the amount of the at least part of the incoming data received has reached a predefined amount, perform data transfer operations to transfer the further outgoing data to the one or more others of the processors during a second of the exchange stages.
Network computer with two embedded rings
A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layer by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one dimensional paths and to transmit data around each of the two embedded one dimensional paths, each embedded one dimensional path using all processing nodes of the computer in such a manner that the two embedded one dimensional paths operate simultaneously without sharing links.