Patent classifications
G06F15/17387
NEURAL NETWORK ACCELERATOR
A computing element array system includes an array of computing elements joined by connections, each connection linking two computing elements. Each computing element has a control circuit, a storage circuit, and an operation circuit. The storage circuit can receive and store, from one of the connections, a data packet comprising a data value and a target-tag. The operation circuit can perform an operation on the data value to form a processed data value; the target-tag specifies which computing element is to perform that operation. The control circuit can identify a computing element from the target-tag, enable the operation circuit to process the data value if the identified computing element matches its own computing element, modify the data packet to comprise the processed data value, and enable output of the modified data packet on one of the connections.
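As a rough illustration of the target-tag dispatch described in this abstract, the sketch below models a packet carrying a data value plus the tag of the element meant to process it; untagged elements simply forward the packet. All names (`Packet`, `ComputingElement`, `my_id`, `op`) are illustrative assumptions, not terms from the patent.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Packet:
    value: int
    target_tag: int  # identifies which element should operate on the value

class ComputingElement:
    def __init__(self, my_id: int, op: Callable[[int], int]):
        self.my_id = my_id
        self.op = op

    def handle(self, pkt: Packet) -> Packet:
        # Control circuit: operate only if the tag names this element;
        # otherwise forward the packet unchanged toward its target.
        if pkt.target_tag == self.my_id:
            return Packet(self.op(pkt.value), pkt.target_tag)
        return pkt

elems = [ComputingElement(0, lambda v: v + 1),
         ComputingElement(1, lambda v: v * 2)]
pkt = Packet(5, target_tag=1)
for e in elems:  # packet traverses the array over its connections
    pkt = e.handle(pkt)
print(pkt.value)  # only element 1 processes the value: 10
```

Element 0 forwards the packet untouched; element 1, whose id matches the target-tag, applies its operation and emits the modified packet.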
Methods and apparatus for signal flow graph pipelining in an array processing unit that reduces storage of temporary variables
A system for pipelining signal flow graphs by a plurality of shared memory processors organized in a 3D physical arrangement, with the memory overlaid on the processor nodes, that reduces storage of temporary variables. A group function is formed by two or more instructions that specify two or more parts of the group function. A first instruction specifies a first part and specifies control information for a second instruction adjacent to the first instruction or at a pre-specified location relative to the first instruction. The first instruction, when executed, transfers the control information to a pending register and produces a result which is transferred to an operand input associated with the second instruction. The second instruction specifies a second part of the group function and, when executed, transfers the control information from the pending register to a second execution unit to adjust the second execution unit's operation on the received operand.
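The two-instruction handoff described above can be caricatured in a few lines: the first instruction computes its part and parks control information in a "pending register"; the second picks up both the forwarded operand and the pending control info. This is a minimal sketch of my reading of the abstract; the function names and the `"double"` control value are invented for illustration.

```python
pending = {}  # stands in for the pending register

def instr_first(a, b, ctrl):
    pending["ctrl"] = ctrl  # control information destined for the second instruction
    return a + b            # first part of the group function

def instr_second(operand):
    ctrl = pending.pop("ctrl")  # control info flows from the pending register
    # ctrl adjusts the second execution unit's operation on the received operand
    return operand * 2 if ctrl == "double" else operand

# The result of the first instruction feeds the operand input of the second,
# so no temporary variable is stored to memory between them.
result = instr_second(instr_first(3, 4, "double"))
print(result)  # (3 + 4) * 2 = 14
```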
I/O ROUTING IN A MULTIDIMENSIONAL TORUS NETWORK
A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and, when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
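The toio decision point can be illustrated with a small sketch: a packet is routed through the compute torus to its destination address, and only on arrival is the toio flag checked to decide whether the packet diverts to an I/O node. The field names, the "specified value", and the compute-node-to-I/O-node mapping are all assumptions made for this example.

```python
from dataclasses import dataclass

TOIO_DIVERT = 1  # the "specified value" that triggers diversion (assumed)

@dataclass
class DataPacket:
    dest: int      # destination compute node address
    toio: int      # divert-to-I/O flag
    ioreturn: int  # used to route the return trip through the compute network

def at_destination(pkt: DataPacket, io_node_for: dict) -> str:
    # Invoked once the packet has reached its assigned compute node.
    if pkt.toio == TOIO_DIVERT:
        return f"io-node-{io_node_for[pkt.dest]}"
    return f"compute-node-{pkt.dest}"

mapping = {7: 2}  # example: compute node 7 is served by I/O node 2
print(at_destination(DataPacket(dest=7, toio=1, ioreturn=0), mapping))  # io-node-2
print(at_destination(DataPacket(dest=7, toio=0, ioreturn=0), mapping))  # compute-node-7
```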
Systems and methods for two layer three dimensional torus switched connectivity datacenter topology
The present invention relates generally to a datacenter environment. Aspects of the present invention include employing a plurality of switches within each datacenter rack. In embodiments of the present invention the switches can be connected in a three-dimensional torus topology. In embodiments of the present invention a spine switch can be connected to the switches, forming a two-layer three-dimensional topology.
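A defining property of the 3D torus layer described above is wrap-around connectivity: a switch at coordinates (x, y, z) links to exactly six neighbors, even at the edges of the grid. The helper below sketches that neighbor relation; the grid dimensions are an arbitrary choice for the example.

```python
def torus_neighbors(x, y, z, dims=(4, 4, 4)):
    # Each switch links to its +/-1 neighbor in each dimension,
    # with coordinates wrapping around (torus, not mesh).
    nx, ny, nz = dims
    return [
        ((x + dx) % nx, (y + dy) % ny, (z + dz) % nz)
        for dx, dy, dz in [(1, 0, 0), (-1, 0, 0),
                           (0, 1, 0), (0, -1, 0),
                           (0, 0, 1), (0, 0, -1)]
    ]

# A corner switch still has six neighbors thanks to the wrap-around links.
print(torus_neighbors(0, 0, 0))
# [(1, 0, 0), (3, 0, 0), (0, 1, 0), (0, 3, 0), (0, 0, 1), (0, 0, 3)]
```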
METHODS AND APPARATUS FOR ADJACENCY NETWORK DELIVERY OF OPERANDS TO INSTRUCTION SPECIFIED DESTINATIONS THAT REDUCES STORAGE OF TEMPORARY VARIABLES
A system for pipelining signal flow graphs by a plurality of shared memory processors organized in a 3D physical arrangement, with the memory overlaid on the processor nodes, that reduces storage of temporary variables. A group function is formed by two or more instructions that specify two or more parts of the group function. A first instruction specifies a first part and specifies control information for a second instruction adjacent to the first instruction or at a pre-specified location relative to the first instruction. The first instruction, when executed, transfers the control information to a pending register and produces a result which is transferred to an operand input associated with the second instruction. The second instruction specifies a second part of the group function and, when executed, transfers the control information from the pending register to a second execution unit to adjust the second execution unit's operation on the received operand.
Multi-petascale highly efficient parallel supercomputer
- Sameh Asaad ,
- Ralph E. Bellofatto ,
- Michael A. Blocksome ,
- Matthias A. Blumrich ,
- Peter Boyle ,
- Jose R. Brunheroto ,
- Dong Chen ,
- Chen-Yong Cher ,
- George L. Chiu ,
- Norman Christ ,
- Paul W. Coteus ,
- Kristan D. Davis ,
- Gabor J. Dozsa ,
- Alexandre E. Eichenberger ,
- Noel A. Eisley ,
- Matthew R. Ellavsky ,
- Kahn C. Evans ,
- Bruce M. Fleischer ,
- Thomas W. Fox ,
- Alan Gara ,
- Mark E. Giampapa ,
- Thomas M. Gooding ,
- Michael K. Gschwind ,
- John A. Gunnels ,
- Shawn A. Hall ,
- Rudolf A. Haring ,
- Philip Heidelberger ,
- Todd A. Inglett ,
- Brant L. Knudson ,
- Gerard V. Kopcsay ,
- Sameer Kumar ,
- Amith R. Mamidala ,
- James A. Marcella ,
- Mark G. Megerian ,
- Douglas R. Miller ,
- Samuel J. Miller ,
- Adam J. Muff ,
- Michael B. Mundy ,
- John K. O'Brien ,
- Kathryn M. O'Brien ,
- Martin Ohmacht ,
- Jeffrey J. Parker ,
- Ruth J. Poole ,
- Joseph D. Ratterman ,
- Valentina Salapura ,
- David L. Satterfield ,
- Robert M. Senger ,
- Burkhard Steinmacher-Burow ,
- William M. Stockdell ,
- Craig B. Stunkel ,
- Krishnan Sugavanam ,
- Yutaka Sugawara ,
- Todd E. Takken ,
- Barry M. Trager ,
- James L. Van Oosten ,
- Charles D. Wait ,
- Robert E. Walkup ,
- Alfred T. Watson ,
- Robert W. Wisniewski ,
- Peng Wu
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100-petaflop scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design is a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that also improves the soft error rate, and supports DMA functionality allowing for parallel-processing message passing.
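To make the five-dimensional torus concrete: the shortest hop count between two nodes is the sum, over the five dimensions, of the shorter way around each ring. The sketch below computes that distance; the torus shape used here is a made-up example, not the machine's actual dimensions.

```python
def torus_distance(a, b, dims):
    # In each dimension the shortest path may go either way around the ring,
    # so take the minimum of the direct and wrap-around distances.
    return sum(min(abs(ai - bi), d - abs(ai - bi))
               for ai, bi, d in zip(a, b, dims))

dims = (4, 4, 4, 8, 2)  # a hypothetical 5D torus shape
print(torus_distance((0, 0, 0, 0, 0), (3, 2, 1, 7, 1), dims))  # 1+2+1+1+1 = 6
```

The wrap-around option is what keeps worst-case hop counts low: node (3, ..., 7, ...) is one hop from the origin in its first and fourth dimensions despite sitting at the far edge of both rings.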
Array of processor core circuits with reversible tiers
Embodiments of the invention relate to an array of processor core circuits with reversible tiers. One embodiment comprises multiple tiers of core circuits and multiple switches for routing packets between the core circuits. Each tier comprises at least one core circuit. Each switch comprises multiple router channels for routing packets in different directions relative to the switch, and at least one routing circuit configured for reversing a logical direction of at least one router channel.
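One way to picture "reversing a logical direction of a router channel": the routing circuit keeps a map from logical vertical directions to physical channels, and flipping the tiers swaps the map's entries. This two-entry map is my simplification of the patent's routing circuit, made up for illustration.

```python
class RoutingCircuit:
    def __init__(self):
        # Logical direction -> physical router channel.
        self.channel_map = {"up": "up", "down": "down"}

    def reverse_tiers(self):
        # Reversing the tier order swaps the logical meaning of the
        # vertical router channels.
        self.channel_map["up"], self.channel_map["down"] = (
            self.channel_map["down"], self.channel_map["up"])

    def route(self, logical_dir: str) -> str:
        return self.channel_map[logical_dir]

rc = RoutingCircuit()
rc.reverse_tiers()
print(rc.route("up"))  # after reversal, logical "up" uses the physical "down" channel
```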
Quasi-optimized interconnection network for, and method of, interconnecting nodes in large-scale, parallel systems
A plurality of data links interconnects a number (N) of nodes of a large-scale, parallel system with minimum data transfer latency. A maximum number (K) of the data links connect each node to the other nodes. The number (N) of the nodes is related to the maximum number (K) of the data links by the expression N = 2^K. An average distance (A) of the shortest distances between all pairs of the nodes, and a diameter (D), which is the largest of the shortest distances, are minimized.
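The relation N = 2^K matches a binary hypercube, where each node's address is a K-bit string and the shortest distance between two nodes is the Hamming distance of their addresses. Under that assumption (the abstract does not name the topology outright), the brute-force check below computes the diameter D and average distance A for K = 4, N = 16.

```python
from itertools import combinations

K = 4
N = 2 ** K  # N = 2^K nodes, addresses 0 .. N-1

# Shortest distance between two hypercube nodes = Hamming distance,
# i.e. the popcount of the XOR of their addresses.
dists = [bin(a ^ b).count("1") for a, b in combinations(range(N), 2)]

D = max(dists)               # diameter: largest of the shortest distances
A = sum(dists) / len(dists)  # average distance over all node pairs
print(D, A)  # D = K = 4; A = K*N / (2*(N-1)) = 32/15 ≈ 2.133
```

In a hypercube both quantities grow only logarithmically in N (D = K = log2 N), which is what the abstract's minimization of A and D is after.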