G06F15/17387

DISTRIBUTED PROCESSING ARCHITECTURE
20210211787 · 2021-07-08 ·

Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising one or more processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. In one embodiment, the switches may be optical network switches. Processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.

High-performance data repartitioning for cloud-scale clusters

Techniques herein partition data using data repartitioning that is store-and-forward, content-based, and phasic. In embodiments, computer(s) maps network elements (NEs) to grid points (GPs) in a multidimensional hyperrectangle. Each NE contains data items (DIs). For each particular dimension (PD) of the hyperrectangle the computers perform, for each particular NE (PNE), various activities including: determining a linear subset (LS) of NEs that are mapped to GPs in the hyperrectangle at a same position as the GP of the PNE along all dimensions of the hyperrectangle except the PD, and data repartitioning that includes, for each DI of the PNE, the following activities. The PNE determines a bit sequence based on the DI. The PNE selects, based on the PD, a bit subset of the bit sequence. The PNE selects, based on the bit subset, a receiving NE of the LS. The PNE sends the DI to the receiving NE.

Partitionable Networked Computer
20200311017 · 2020-10-01 ·

A computer comprising a plurality of processing nodes is provided. Each processing node has at least one processor configured to process input data to generate an array of data items. The processing nodes are arranged in cliques in which each processing node of a clique is connected to each other processing node in the clique by first and second clique links. The cliques are inter-connected in rings such that each processing node is a member of a single clique and a single ring. The processing nodes of all cliques are configured to exchange in each exchange step of a machine learning collective via the respective first and second clique links at least two data items with the other processing node(s) in its clique, and all processing nodes are configured to reduce each received data item with the data item in the corresponding position in the array on that processing node.

Networked Computer With Multiple Embedded Rings
20200311528 · 2020-10-01 ·

A computer comprising a plurality of interconnected processing nodes arranged in multiple stacked layers forming a multi-face prism is provided. Each face of the prism comprises multiple stacked pairs of nodes. Said nodes are connected by at least two intralayer links. Each node is connected to a corresponding node in an adjacent pair by an interlayer link. The corresponding nodes are connected by respective interlayer links to form respective edges. Each pair forms part of a layers, each layer comprising multiple nodes, each node connected to their neighbouring nodes in the layer by at least one of the intralayer links to form a ring. Data is transmitted around paths formed by respective sets of nodes and links, each path having a first portion between a first and second endmost layers, and a second portion provided between the second and first endmost layers and comprising one of the edges.

Networked Computer with Multiple Embedded Rings
20200311529 · 2020-10-01 ·

According to an aspect of the invention, there is provided a computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple stacked layers. Each layer comprises four processing nodes connected by respective links between the processing nodes. In end layers of the stack, the four processing nodes are interconnected in a ring formation by two links between the nodes, the two links adapted to operate simultaneously. Processing nodes in the multiple stacked layers provide four faces, each face comprising multiple layers, each layer comprising a pair of processing nodes. The processing nodes are programmed to operate a configuration to transmit data around embedded one-dimensional rings, each ring formed by processing nodes in two opposing faces.

Digital Processing Connectivity
20200301876 · 2020-09-24 ·

A connectivity has a first network (25) of signal-links interconnecting a large plurality of address-bearing, computing cells (20 and 22). Some of the links are selectable according to addresses hierarchically ordered along a recursive curve. Most of the address-designated links that form the network are switchably operable between cells such that a first selectable set of cells along one segment of the recursive curve form signal-routes to a second selectable set of cells, along a second segment. For receipt of instructions and for synchronisation, some segments have a switchable signal-path from one controlling cell of that segment. A second network (23) has signal-links interconnecting a plurality of processing cells (19 and 21) some of which control the loading of data into cells of the first network. The computing and processing cells have pairwise matching of addresses and are pairwise coterminous, which ensures that control of the connectivity by second network (23) is directed to localisably-selectable segments of first network (25).

Embedding Rings on a Toroid Computer Network
20200293478 · 2020-09-17 ·

A computer comprising a plurality of interconnected processing nodes arranged in a configuration with multiple layers, arranged along an axis, comprising first and second endmost layers and at least one intermediate layer between the first and second endmost layers is provided. Each layer comprises a plurality of processing nodes connected in a ring by an intralayer respective set of links between each pair of neighbouring processing nodes, the links adapted to operate simultaneously. Nodes in each layer are connected to respective corresponding nodes in each adjacent layer by an interlayer link. Each processing node in the first endmost layer is connected to a corresponding node in the second endmost layer. Data is transmitted around a plurality of embedded one-dimensional logical rings with an asymmetric bandwidth utilisation, each logical ring using all processing nodes of the computer in such a manner that the plurality of embedded one-dimensional logical rings operate simultaneously.

I/O ROUTING IN A MULTIDIMENSIONAL TORUS NETWORK
20200228435 · 2020-07-16 ·

A method, system and computer program product are disclosed for routing data packet in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.

MAPPING 2-DIMENSIONAL MESHES ON 3-DIMENSIONAL TORUS
20200220787 · 2020-07-09 ·

In a LMN 3D torus network of computer nodes, a 2-dimensional plane comprising MN torus network of nodes mapped into M/2 meshes2*N torus network of nodes. N can be k*M, k is an integer greater than zero, and M and N are even numbers. Each of M/2 mesh of the 2*N torus is contiguous in the 2D plane. Mapping is performed for each of the L planes of the LMN 3D torus network. The M/2 meshes are combined with a remaining torus network dimension comprising L planes, the combining creating another 2*N pattern, wherein an L*M/22*N communication pattern is created. Application entities are executed according to the mapped L*M/22*N communication pattern.

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND INFORMATION PROCESSING PROGRAM
20200192717 · 2020-06-18 · ·

An information processing apparatus for controlling a plurality of nodes mutually coupled via a plurality of cables, the apparatus includes: a memory; a processor coupled to the memory, the processor being configured to cause a first node to execute first processing to extract coupling relationship between the plurality of nodes, the first node being one of the plurality of nodes, being sequentially allocated from each of the plurality of nodes, the first processing including executing allocation processing that allocates unique coordinate information to the first node and allocates common coordinate information to nodes excluding the first node; executing transmission processing that causes the first node to transmit first information to each of the cables coupled to the first node; and executing identification processing that identifies a node having received the first information as neighboring node coupled to one of the plurality of cables coupled to the first node.