H04L49/358

System and method for network tapestry multiprotocol integration
11558347 · 2023-01-17

Systems and methods for connecting devices via a virtual global network across network fabrics using a network tapestry are disclosed. The network system comprises a first access point server in communication with a first backbone exchange server, a second access point server in communication with a second backbone exchange server, and a network tapestry comprising a first communication path connecting the first and second access point servers and a second communication path connecting the first and second backbone exchange servers.

Transferring data between solid state drives (SSDs) via a connection between the SSDs

A first solid state drive (SSD) includes a built-in network interface device configured to communicate via a network fabric, and a second SSD includes a built-in network interface device configured to communicate via the network fabric. A connection is opened between the first SSD and the second SSD over the network fabric, where the first SSD is further communicatively coupled to the second SSD over an interconnect associated with a host computer. The first SSD encapsulates a non-volatile memory over fabric (NVMe-oF) command to transfer data between the first SSD and the second SSD in a capsule and sends the capsule to the second SSD over the connection. The second SSD executes the NVMe command to transfer the data between the first SSD and the second SSD over the connection according to an NVMe-oF communication protocol and without transferring any of the data to the host computer.

System and method for providing bandwidth congestion control in a private fabric in a high performance computing environment

Systems and methods for providing bandwidth congestion control in a private fabric in a high performance computing environment. An exemplary method can provide, at one or more microprocessors, a first subnet, the first subnet comprising a plurality of switches, and a plurality of host channel adapters, wherein each of the host channel adapters comprises at least one host channel adapter port, and wherein the plurality of host channel adapters are interconnected via the plurality of switches, and a plurality of end nodes. The method can provide, at a host channel adapter, an end node ingress bandwidth quota associated with an end node attached to the host channel adapter. The method can receive, at the end node of the host channel adapter, ingress bandwidth, the ingress bandwidth exceeding the ingress bandwidth quota of the end node.
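The quota check described above can be illustrated with a minimal sketch. It only detects that measured ingress exceeds an end node's quota; the abstract does not say what action follows, so none is modeled. The class name `EndNodeIngressMeter`, the fixed measurement window, and the byte-per-second units are assumptions made for illustration, not details from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class EndNodeIngressMeter:
    """Hypothetical per-end-node meter kept at a host channel adapter."""

    quota_bytes_per_sec: float          # ingress bandwidth quota (assumed unit)
    window_sec: float = 1.0             # assumed measurement window
    _bytes_in_window: int = field(default=0, init=False)

    def record(self, nbytes: int) -> None:
        """Account for nbytes of ingress traffic received in this window."""
        self._bytes_in_window += nbytes

    def measured_rate(self) -> float:
        """Ingress rate over the current window, in bytes per second."""
        return self._bytes_in_window / self.window_sec

    def exceeds_quota(self) -> bool:
        """True when the measured ingress rate is above the quota."""
        return self.measured_rate() > self.quota_bytes_per_sec

    def reset_window(self) -> None:
        """Start a new measurement window."""
        self._bytes_in_window = 0
```

A caller would invoke `record` for each received packet and poll `exceeds_quota` before deciding how to react; the reaction itself is outside the scope of this sketch.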

System and method for supporting dual-port virtual router in a high performance computing environment

Systems and methods for supporting a dual-port virtual router in a high performance computing environment. In accordance with an embodiment, a dual-port router abstraction can provide a simple way for enabling subnet-to-subnet router functionality to be defined based on a switch hardware implementation. A virtual dual-port router can logically be connected outside a corresponding switch port. This virtual dual-port router can provide an InfiniBand specification-compliant view to a standard management entity, such as a Subnet Manager. In accordance with an embodiment, a dual-ported router model implies that different subnets can be connected in a way where each subnet fully controls the forwarding of packets as well as address mappings in the ingress path to the subnet.

System and method to provide homogeneous fabric attributes to reduce the need for SA access in a high performance computing environment

Systems and methods for InfiniBand fabric optimizations to minimize SA access and startup failover times. A system can comprise one or more microprocessors, a first subnet, the first subnet comprising a plurality of switches, a plurality of host channel adapters, a plurality of hosts, and a subnet manager, the subnet manager running on one of the plurality of switches and the plurality of host channel adapters. The subnet manager can be configured to determine that the plurality of hosts and the plurality of switches support a same set of capabilities. On such determination, the subnet manager can configure an SMA flag, the flag indicating that a condition can be set for each of the host channel adapter ports.
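The homogeneity determination can be sketched as a simple set comparison. This is an illustration only: the function name, the node identifiers, and the capability strings are hypothetical, and the sketch does not model how a real subnet manager discovers capabilities or sets the SMA flag.

```python
from typing import Iterable, Mapping


def fabric_is_homogeneous(capabilities_by_node: Mapping[str, Iterable[str]]) -> bool:
    """Return True when every node reports exactly the same capability set.

    An empty mapping returns False, since there is nothing to compare.
    """
    distinct_sets = {frozenset(caps) for caps in capabilities_by_node.values()}
    return len(distinct_sets) == 1


# Hypothetical capability names, used only to exercise the check.
nodes = {
    "host-1": ["mtu_4096", "rate_edr"],
    "host-2": ["rate_edr", "mtu_4096"],
    "switch-1": ["mtu_4096", "rate_edr"],
}
homogeneous_flag = fabric_is_homogeneous(nodes)
```

In this sketch the resulting boolean stands in for the flag that the subnet manager would configure once the determination succeeds.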

Traffic class arbitration based on priority and bandwidth allocation

This disclosure describes systems, devices, methods and computer readable media for enhanced network communication for use in higher performance applications including storage, high performance computing (HPC) and Ethernet-based fabric interconnects. In some embodiments, a network controller may include a transmitter circuit configured to transmit packets on a plurality of virtual lanes (VLs), the VLs associated with a defined VL priority and an allocated share of network bandwidth. The network controller may also include a bandwidth monitor module configured to measure bandwidth consumed by the packets and an arbiter module configured to adjust the VL priority based on a comparison of the measured bandwidth to the allocated share of network bandwidth. The transmitter circuit may be further configured to transmit the packets based on the adjusted VL priority.
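One way to picture the arbitration described above is the sketch below. It compares each virtual lane's measured share of transmitted bytes with its allocated share and nudges the lane's priority down or up accordingly. The one-step penalty and boost, the tie-break on lane number, and all names are assumptions chosen for illustration; the abstract does not specify the adjustment rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VirtualLane:
    vl_id: int
    base_priority: int        # defined VL priority (higher wins)
    allocated_share: float    # allocated fraction of link bandwidth, 0..1
    bytes_sent: int = 0       # bytes measured by the bandwidth monitor


def adjusted_priorities(lanes: List[VirtualLane],
                        penalty: int = 1,
                        boost: int = 1) -> Dict[int, int]:
    """Adjust each lane's priority by comparing measured and allocated share."""
    total = sum(lane.bytes_sent for lane in lanes)
    if total == 0:
        # Nothing measured yet: keep the defined priorities unchanged.
        return {lane.vl_id: lane.base_priority for lane in lanes}
    result = {}
    for lane in lanes:
        measured_share = lane.bytes_sent / total
        if measured_share > lane.allocated_share:
            result[lane.vl_id] = lane.base_priority - penalty
        elif measured_share < lane.allocated_share:
            result[lane.vl_id] = lane.base_priority + boost
        else:
            result[lane.vl_id] = lane.base_priority
    return result


def pick_next_lane(lanes: List[VirtualLane]) -> int:
    """Return the id of the lane with the highest adjusted priority.

    Ties go to the lower lane number (an arbitrary choice for this sketch).
    """
    priorities = adjusted_priorities(lanes)
    best = max(lanes, key=lambda lane: (priorities[lane.vl_id], -lane.vl_id))
    return best.vl_id
```

With two lanes that each hold a 50% allocation, a lane that has consumed 90% of the measured traffic drops one priority step while the other gains one, so the under-served lane transmits next even if its defined priority was lower.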

System and method for supporting aggressive credit waiting in a high performance computing environment

System and method for aggressive credit waiting in a high performance computing environment. In accordance with an embodiment, systems and methods can provide for an indexed matrix of credit wait policies between ports within a single switch. In addition, systems and methods can provide for an array of credit wait policies at an egress port from a switch, the array being indexed by virtual lane.
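The two lookup structures named above can be sketched as plain nested lists. The abstract does not define what a credit wait policy contains or how the two lookups combine, so this sketch stores each policy as an opaque integer and returns both values side by side. The class name, method names, and the default value are hypothetical.

```python
from typing import List, Tuple


class SwitchCreditWaitPolicies:
    """Hypothetical container for the credit wait policies of one switch."""

    def __init__(self, num_ports: int, num_vls: int, default_policy: int = 0):
        # Matrix of policies indexed by [ingress port][egress port].
        self.port_matrix: List[List[int]] = [
            [default_policy] * num_ports for _ in range(num_ports)
        ]
        # One array per egress port, indexed by virtual lane.
        self.egress_vl: List[List[int]] = [
            [default_policy] * num_vls for _ in range(num_ports)
        ]

    def set_port_pair(self, ingress: int, egress: int, policy: int) -> None:
        self.port_matrix[ingress][egress] = policy

    def set_egress_vl(self, egress: int, vl: int, policy: int) -> None:
        self.egress_vl[egress][vl] = policy

    def lookup(self, ingress: int, egress: int, vl: int) -> Tuple[int, int]:
        """Return (port-pair policy, egress-port per-VL policy)."""
        return self.port_matrix[ingress][egress], self.egress_vl[egress][vl]
```

Keeping the per-VL array separate from the port-pair matrix mirrors the wording of the abstract, which describes them as two distinct structures.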

Network monitoring tool for allocating nodes of supercomputers
20230037170 · 2023-02-02

Disclosed herein are embodiments of a network monitoring device for a supercomputer system having a plurality of supercomputer nodes. The network monitoring device may utilize plug-in software modules to provide network monitoring capabilities related to discovering the network topologies of the supercomputer system, determining network and computing resources that are available for new applications in the supercomputer system, collecting network and computing resources that are being used by running software applications in the supercomputer system, and monitoring running software applications on the supercomputer system.

Network monitoring tool for supercomputers

Disclosed herein are embodiments of a network monitoring device for a supercomputer system having a plurality of supercomputer nodes. The network monitoring device may utilize plug-in software modules to provide network monitoring capabilities related to discovering the network topologies of the supercomputer system, determining network and computing resources that are available for new applications in the supercomputer system, collecting network and computing resources that are being used by running software applications in the supercomputer system, and monitoring running software applications on the supercomputer system.

System and method for efficient network isolation and load balancing in a multi-tenant cluster environment

A system and method for supporting load balancing in a multi-tenant cluster environment, in accordance with an embodiment. One or more tenants can be supported, each associated with a partition, and each partition is in turn associated with one or more end nodes. The method can provide a plurality of switches, the plurality of switches comprising a plurality of leaf switches and at least one switch at another level, wherein each of the plurality of switches comprises at least one port. The method can assign each node a weight parameter, and based upon this parameter, the method can route the plurality of end nodes within the multi-tenant cluster environment, wherein the routing attempts to preserve partition isolation.