H04L49/358

System and method for a single logical IP subnet across multiple independent layer 2 (L2) subnets in a high performance computing environment

Systems and methods for supporting a single logical IP subnet across multiple independent layer 2 subnets in a high performance computing environment. A method can provide, at a computer including one or more microprocessors, a logical device, the logical device being addressed by a layer 3 address, wherein the logical device comprises a plurality of network adapters, each of the network adapters comprising a physical port, and a plurality of switches. The method can arrange the plurality of switches into a plurality of discrete layer 2 subnets. The method can provide a mapping table at the logical device.
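The mapping-table idea can be illustrated with a minimal sketch. It assumes the table translates a peer's layer 3 address into three things: the discrete L2 subnet to use, the local physical port attached to that subnet, and the peer's L2 address there. The names `L2Route`, `LogicalDevice`, `learn` and `resolve` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class L2Route:
    subnet_id: int     # which discrete L2 subnet the peer is reachable on
    local_port: int    # physical port of the logical device attached to that subnet
    l2_address: str    # peer's address within that L2 subnet (e.g. a LID or MAC)


@dataclass
class LogicalDevice:
    ip_address: str                                       # the single layer 3 address
    mapping: Dict[str, L2Route] = field(default_factory=dict)

    def learn(self, peer_ip: str, route: L2Route) -> None:
        """Record (or replace) the L2 route used to reach a peer IP."""
        self.mapping[peer_ip] = route

    def resolve(self, peer_ip: str) -> L2Route:
        """Translate a peer IP into the subnet, local port and L2 address to use."""
        try:
            return self.mapping[peer_ip]
        except KeyError:
            raise LookupError(f"no layer 2 route known for {peer_ip}") from None
```

Peers reached through different L2 subnets still share one IP subnet from the application's point of view. Only the lookup result decides which physical port and L2 address carry the traffic.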

SYSTEM AND METHOD FOR SUPPORTING SCALABLE BIT MAP BASED P_KEY TABLE IN A HIGH PERFORMANCE COMPUTING ENVIRONMENT
20220174025 · 2022-06-02

System and method for supporting a scalable bitmap based P_Key table in a high performance computing environment. A method can provide at least one subnet comprising one or more switches, a plurality of host channel adapters, and a plurality of end nodes. The method can associate the plurality of end nodes with at least one of a plurality of partitions, wherein each of the plurality of partitions is associated with a P_Key value. The method can associate each of the one or more switches with a bitmap based P_Key table of a plurality of bitmap based P_Key tables. The method can associate each of the host channel adapters with a bitmap based P_Key table of the plurality of bitmap based P_Key tables.
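The sketch below shows one way a bitmap based P_Key table could work. It uses the InfiniBand P_Key layout, where the low 15 bits hold the partition number and the high bit marks full (1) or limited (0) membership. A packet matches when the partition numbers are equal and at least one side is a full member. Keeping two 4 KiB bitmaps, one for "partition present" and one for "full member", is an assumption made for illustration. `BitmapPKeyTable` and its methods are hypothetical names.

```python
class BitmapPKeyTable:
    """One bit per 15-bit partition number instead of an indexed list of P_Keys."""

    PARTITION_MASK = 0x7FFF   # low 15 bits of a P_Key: partition number
    FULL_MEMBER = 0x8000      # high bit: 1 = full member, 0 = limited member

    def __init__(self) -> None:
        nbytes = (self.PARTITION_MASK + 1) // 8   # 32768 bits -> 4096 bytes
        self._valid = bytearray(nbytes)           # partition present at all
        self._full = bytearray(nbytes)            # partition held as full member

    @staticmethod
    def _locate(partition: int) -> tuple:
        return partition >> 3, 1 << (partition & 7)   # (byte index, bit mask)

    def add(self, pkey: int) -> None:
        byte, bit = self._locate(pkey & self.PARTITION_MASK)
        self._valid[byte] |= bit
        if pkey & self.FULL_MEMBER:
            self._full[byte] |= bit

    def remove(self, pkey: int) -> None:
        byte, bit = self._locate(pkey & self.PARTITION_MASK)
        self._valid[byte] &= ~bit & 0xFF
        self._full[byte] &= ~bit & 0xFF

    def accepts(self, packet_pkey: int) -> bool:
        """Constant-time check: same partition, and at least one side is a full member."""
        byte, bit = self._locate(packet_pkey & self.PARTITION_MASK)
        if not self._valid[byte] & bit:
            return False
        return bool(packet_pkey & self.FULL_MEMBER) or bool(self._full[byte] & bit)
```

The table size is fixed at 2 × 4096 bytes no matter how many partitions are configured, and a lookup is a single index operation rather than a scan over table entries.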

Inter-processor communications fault handling in high performance computing networks

A computer-implemented method and system for inter-processor communications fault handling in high performance computing networks. The method includes detecting that an InfiniBand (IB) queue pair has transitioned into an error state based on an unsuccessful completion status that relates to unsuccessful delivery of a message from an initiator endpoint at a first server device to at least one target endpoint at a second server device. The initiator and target endpoints are associated with at least one application under execution. An embodiment includes inferring, when the unsuccessful completion status is indicated as flushed, that the message was in a send queue of the IB queue pair when the IB queue pair transitioned into the error state. An embodiment includes establishing an IB Direct Connect queue pair connection between the target and initiator endpoints. An embodiment includes re-queueing the message in the IB queue pair for dispatch to the target endpoint.
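The decision flow in this abstract can be modeled in a few lines. This is a pure-Python simulation, not libibverbs code. `CompletionStatus`, `QueuePairModel`, `establish_dc` and `handle_completion` are hypothetical stand-ins, and setting up the Direct Connect connection is reduced to recording the target.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Deque, Set


class CompletionStatus(Enum):
    SUCCESS = auto()
    FLUSHED = auto()          # work request flushed when the QP entered error state
    RETRY_EXCEEDED = auto()   # example of a non-flush failure


@dataclass
class Message:
    msg_id: int
    target: str               # target endpoint, e.g. "server2:rank7"


@dataclass
class QueuePairModel:
    in_error: bool = False
    send_queue: Deque[Message] = field(default_factory=deque)
    dc_connections: Set[str] = field(default_factory=set)

    def establish_dc(self, target: str) -> None:
        """Stand-in for setting up a Direct Connect (DC) connection to the target."""
        self.dc_connections.add(target)


def handle_completion(qp: QueuePairModel, msg: Message, status: CompletionStatus) -> str:
    if status is CompletionStatus.SUCCESS:
        return "delivered"
    # Treat an unsuccessful completion as the QP having entered the error state.
    qp.in_error = True
    if status is CompletionStatus.FLUSHED:
        # Infer the message was still in the send queue when the QP went into error,
        # so re-dispatch it over a DC connection instead of failing the application.
        qp.establish_dc(msg.target)
        qp.send_queue.append(msg)
        return "requeued"
    return "reported"         # other failures are surfaced to the caller
```

Only the flushed case triggers recovery. Any other failure status is passed back to the caller unchanged.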

System and method for supporting heterogeneous and asymmetric dual rail fabric configurations in a high performance computing environment

Systems and methods for supporting heterogeneous and asymmetric dual rail fabric configurations in a high performance computing environment. A method can provide, at one or more computers each including one or more microprocessors, a plurality of hosts, each of the plurality of hosts comprising at least one dual port adapter; a private fabric comprising two or more switches; and a public fabric comprising a cloud fabric. A workload can be provisioned at a host of the plurality of hosts. A placement policy can be assigned to the provisioned workload. Network traffic between peer nodes of the provisioned workload can then be assigned to the private fabric, the public fabric, or both, in accordance with the placement policy.
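Assigning traffic by placement policy amounts to a small decision function. The abstract does not name any concrete policies. The four values below (`PRIVATE_ONLY`, `PUBLIC_ONLY`, `PREFER_PRIVATE`, `BOTH`) are hypothetical examples, as are the function and enum names.

```python
from enum import Enum
from typing import List


class Fabric(Enum):
    PRIVATE = "private"   # the switch-based private fabric
    PUBLIC = "public"     # the cloud (public) fabric


class PlacementPolicy(Enum):
    PRIVATE_ONLY = "private_only"
    PUBLIC_ONLY = "public_only"
    PREFER_PRIVATE = "prefer_private"   # fall back to public if private is unavailable
    BOTH = "both"                       # spread peer traffic over both rails


def assign_fabrics(policy: PlacementPolicy, private_available: bool) -> List[Fabric]:
    """Return the fabrics that peer-to-peer traffic of a workload may use."""
    if policy is PlacementPolicy.PUBLIC_ONLY:
        return [Fabric.PUBLIC]
    if policy is PlacementPolicy.PRIVATE_ONLY:
        return [Fabric.PRIVATE] if private_available else []
    if policy is PlacementPolicy.PREFER_PRIVATE:
        return [Fabric.PRIVATE] if private_available else [Fabric.PUBLIC]
    # BOTH: use both rails when the private one is up, otherwise public only
    return [Fabric.PRIVATE, Fabric.PUBLIC] if private_available else [Fabric.PUBLIC]
```

An empty result means the policy cannot be satisfied. For `PRIVATE_ONLY`, this sketch signals that condition rather than silently falling back.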

SYSTEM AND METHOD FOR PROVIDING BANDWIDTH CONGESTION CONTROL IN A PRIVATE FABRIC IN A HIGH PERFORMANCE COMPUTING ENVIRONMENT

Systems and methods for providing bandwidth congestion control in a private fabric in a high performance computing environment. An exemplary method can provide, at one or more microprocessors, a first subnet, the first subnet comprising a plurality of switches, and a plurality of host channel adapters, wherein each of the host channel adapters comprises at least one host channel adapter port, and wherein the plurality of host channel adapters are interconnected via the plurality of switches, and a plurality of end nodes. The method can provide, at a host channel adapter, an end node ingress bandwidth quota associated with an end node attached to the host channel adapter. The method can receive, at the end node of the host channel adapter, ingress bandwidth that exceeds the ingress bandwidth quota of that end node.
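To detect that an end node's ingress has exceeded its quota, the HCA needs some form of per-node accounting. The sketch below assumes simple fixed-interval byte counters. It is not taken from the patent, and `IngressQuotaMonitor` is a hypothetical name. Nodes without a configured quota are treated as unlimited.

```python
from typing import Dict, List


class IngressQuotaMonitor:
    """Per-end-node ingress accounting at one HCA over fixed measurement intervals."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self.quota_bps: Dict[str, float] = {}   # end node -> quota in bytes/second
        self.received: Dict[str, int] = {}      # end node -> bytes this interval

    def set_quota(self, node: str, bytes_per_s: float) -> None:
        self.quota_bps[node] = bytes_per_s

    def on_receive(self, node: str, nbytes: int) -> None:
        self.received[node] = self.received.get(node, 0) + nbytes

    def end_interval(self) -> List[str]:
        """Return nodes whose ingress exceeded their quota, then reset the counters."""
        over = [
            node for node, nbytes in self.received.items()
            if node in self.quota_bps and nbytes > self.quota_bps[node] * self.interval_s
        ]
        self.received.clear()
        return sorted(over)
```

What happens to the flagged nodes is outside this sketch. For example, the HCA might throttle them or signal congestion toward the senders.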

System and method for supporting scalable representation of switch port status in a high performance computing environment

System and method for supporting scalable representation of switch port status in a high performance computing environment. In accordance with an embodiment, a scalable representation of switch port status can be provided at each switch, both physical and virtual. Instead of retrieving every switch port change individually, the status of many ports can be combined into a single representation that scales because it uses only a few bits of information for each port's status.
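The "few bits per port" idea can be shown with a packed integer. This sketch assumes 2 bits per port, which is enough for the four InfiniBand logical port states (Down, Init, Armed, Active). The 0 to 3 encoding is hypothetical and differs from the PortInfo attribute encoding. At 2 bits per port, a 64-port switch fits in 128 bits, or 16 bytes.

```python
from typing import List, Sequence

BITS_PER_PORT = 2
PORT_MASK = (1 << BITS_PER_PORT) - 1
STATE_NAMES = ("DOWN", "INIT", "ARMED", "ACTIVE")   # hypothetical 2-bit encoding


def pack(states: Sequence[int]) -> int:
    """Pack per-port state codes (0..3) into one integer, port 0 in the lowest bits."""
    word = 0
    for port, state in enumerate(states):
        word |= (state & PORT_MASK) << (port * BITS_PER_PORT)
    return word


def unpack(word: int, num_ports: int) -> List[int]:
    return [(word >> (p * BITS_PER_PORT)) & PORT_MASK for p in range(num_ports)]


def changed_ports(old: int, new: int, num_ports: int) -> List[int]:
    """One XOR over the packed words reveals every port whose state changed."""
    diff = old ^ new
    return [p for p in range(num_ports) if (diff >> (p * BITS_PER_PORT)) & PORT_MASK]
```

A manager that caches the last packed word can fetch one new word and diff it, rather than processing a separate notification for each port change.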

System and method for supporting fast hybrid reconfiguration in a high performance computing environment

A hybrid reconfiguration scheme can allow for fast partial network reconfiguration with different routing algorithms of choice in different subparts of the network. Partial reconfigurations can be orders of magnitude faster than the initial full configuration, thus making it possible to consider performance-driven reconfigurations in lossless networks.
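The scope difference between an initial full configuration and a partial reconfiguration can be sketched as follows. Here the network is split into named subparts, each with its own routing algorithm. The routing "engines" are toy placeholders that only tag switches with a label. The labels `updn`, `minhop` and `dfsssp` borrow OpenSM routing engine names purely as stand-ins. `HybridReconfigurator` and `toy_engine` are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Placeholder type: a routing engine takes the switches of one subpart and
# returns a forwarding description for them (real engines emit forwarding tables).
RoutingFn = Callable[[List[str]], Dict[str, str]]


class HybridReconfigurator:
    def __init__(self) -> None:
        self.regions: Dict[str, List[str]] = {}      # subpart -> its switches
        self.engines: Dict[str, RoutingFn] = {}      # subpart -> routing algorithm
        self.tables: Dict[str, Dict[str, str]] = {}  # subpart -> computed tables

    def add_region(self, name: str, switches: List[str], engine: RoutingFn) -> None:
        self.regions[name] = switches
        self.engines[name] = engine

    def full_configure(self) -> List[str]:
        """Initial configuration: route every subpart."""
        return [self.reconfigure(name) for name in self.regions]

    def reconfigure(self, name: str, engine: Optional[RoutingFn] = None) -> str:
        """Partial reconfiguration: recompute only one subpart, optionally switching
        it to a different routing algorithm. Other subparts keep their tables."""
        if engine is not None:
            self.engines[name] = engine
        self.tables[name] = self.engines[name](self.regions[name])
        return name


def toy_engine(label: str) -> RoutingFn:
    """Hypothetical stand-in that just tags each switch with the engine name."""
    return lambda switches: {sw: label for sw in switches}
```

The speedup claimed in the abstract comes from this scoping. A partial reconfiguration does work proportional to one subpart, while the initial configuration must route the whole network.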

SYSTEM AND METHOD FOR EFFICIENT NETWORK ISOLATION AND LOAD BALANCING IN A MULTI-TENANT CLUSTER ENVIRONMENT

A system and method for supporting load balancing in a multi-tenant cluster environment, in accordance with an embodiment. One or more tenants can be supported, each associated with a partition, and each partition is in turn associated with one or more end nodes. The method can provide a plurality of switches, comprising a plurality of leaf switches and at least one switch at another level, wherein each of the plurality of switches comprises at least one port. The method can assign each end node a weight parameter and, based upon these parameters, route the plurality of end nodes within the multi-tenant cluster environment, wherein the routing attempts to preserve partition isolation.
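The sketch below combines node weights with a best-effort isolation constraint using a simple greedy heuristic. It is a swapped-in illustration, not the patented routing algorithm. Heavier end nodes are placed first on the least-loaded upper-level switch, restricted where possible to switches that carry no other partition. `route_end_nodes` and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

EndNode = Tuple[str, str, float]   # (name, partition, weight)


def route_end_nodes(nodes: List[EndNode], spines: List[str]) -> Dict[str, str]:
    """Assign each end node's traffic to one upper-level switch.

    Heavier nodes are placed first on the least-loaded switch, restricted where
    possible to switches that carry no other partition (isolation). If every
    switch is already taken by other partitions, isolation is given up and the
    least-loaded switch overall is used.
    """
    load = {s: 0.0 for s in spines}
    owner: Dict[str, str] = {}   # switch -> first partition routed through it
    assignment: Dict[str, str] = {}
    for name, partition, weight in sorted(nodes, key=lambda n: (-n[2], n[0])):
        candidates = [s for s in spines if owner.get(s, partition) == partition]
        if not candidates:
            candidates = list(spines)
        best = min(candidates, key=lambda s: (load[s], s))
        owner.setdefault(best, partition)
        load[best] += weight
        assignment[name] = best
    return assignment
```

With two partitions and two upper-level switches, each partition ends up on its own switch. A third partition must share a switch, and it goes to whichever switch carries the least accumulated weight.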

System and method for supporting configurable legacy P_Key table abstraction using a bitmap based hardware implementation in a high performance computing environment

System and method for supporting configurable legacy P_Key table abstraction using a bitmap based hardware implementation in a high performance computing environment. A mapping table in DRAM can be provided through the use of a software-based subnet management agent (SMA) that implements the mapping table. With this mapping table, it is possible to provide a legacy-compliant view of a bitmap based P_Key table. Such a legacy-compliant view can be called a virtual P_Key table, or a configurable legacy P_Key table abstraction.
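A software mapping table can present index-addressed legacy P_Key entries while keeping a bitmap in sync underneath. The sketch below shows one way this could be done. For simplicity, the bitmap records only partition presence and not membership type. It also treats a P_Key with partition number 0 (e.g. 0x0000) as an empty slot. `VirtualPKeyTable` and its methods are hypothetical names.

```python
from typing import List

PARTITION_MASK = 0x7FFF   # low 15 bits of a P_Key identify the partition


class VirtualPKeyTable:
    """Legacy, index-addressed P_Key table view kept in software over a bitmap."""

    def __init__(self, num_entries: int) -> None:
        self._entries: List[int] = [0x0000] * num_entries  # 0x0000: empty/invalid
        self._bitmap = 0   # bit i set => partition i is enabled in the bitmap

    def _in_use(self, partition: int) -> bool:
        return any((e & PARTITION_MASK) == partition for e in self._entries)

    def set_entry(self, index: int, pkey: int) -> None:
        """Legacy-style write of one table slot; the bitmap is updated to match."""
        old = self._entries[index]
        self._entries[index] = pkey
        old_part = old & PARTITION_MASK
        # Clear the old bit only if no other slot still refers to that partition.
        if old_part and not self._in_use(old_part):
            self._bitmap &= ~(1 << old_part)
        if pkey & PARTITION_MASK:
            self._bitmap |= 1 << (pkey & PARTITION_MASK)

    def get_entry(self, index: int) -> int:
        """Legacy-style read: the P_Key a client expects at this index."""
        return self._entries[index]

    def has_partition(self, pkey: int) -> bool:
        return bool((self._bitmap >> (pkey & PARTITION_MASK)) & 1)
```

Legacy clients keep reading and writing by index. Meanwhile, packet processing in hardware only has to consult the bitmap.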

System and method for allowing multiple global identifier (GID) subnet prefix values concurrently for incoming packet processing in a high performance computing environment

System and method for using multiple global identifier (GID) subnet prefix values in a network switch environment in a high performance computing environment. A packet is received from a network fabric by a first Host Channel Adapter (HCA). The packet has a header portion including a destination subnet prefix identifying a destination subnet of the network fabric. The first HCA is allowed to receive the packet from a port of the first HCA by selectively determining a logical state of a flag and, in accordance with a predetermined logical state of the flag, ignoring the destination subnet prefix that identifies the destination subnet of the network fabric.
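The flag-controlled receive check can be sketched using the standard 128-bit GID layout: a 64-bit subnet prefix followed by a 64-bit interface ID. In this sketch the interface ID must still match, and only the prefix comparison is skipped when the flag is set. Both behaviors are assumptions for illustration. `HcaPort` and `ignore_subnet_prefix` are hypothetical names. The value 0xFE80000000000000 is the default link-local subnet prefix.

```python
from dataclasses import dataclass

U64 = (1 << 64) - 1


@dataclass(frozen=True)
class Gid:
    subnet_prefix: int   # upper 64 bits of the 128-bit GID
    interface_id: int    # lower 64 bits (the port GUID)

    @classmethod
    def from_int(cls, value: int) -> "Gid":
        return cls(value >> 64, value & U64)


@dataclass
class HcaPort:
    gid: Gid
    ignore_subnet_prefix: bool = False   # hypothetical name for the flag

    def accepts(self, dgid: Gid) -> bool:
        """Decide whether an incoming packet addressed to dgid is for this port."""
        if dgid.interface_id != self.gid.interface_id:
            return False      # assumption: the GUID part must still match
        if self.ignore_subnet_prefix:
            return True       # flag set: any destination subnet prefix is accepted
        return dgid.subnet_prefix == self.gid.subnet_prefix
```

With the flag cleared, a packet addressed using a different subnet prefix is rejected. With the flag set, the same packet is accepted, which lets several prefix values be valid at the same time.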