H04L49/254

Queue-to-port allocation

Examples described herein relate to an apparatus including at least one memory and at least one processor communicatively coupled to the at least one memory, the at least one processor to: allocate a scheduler to an egress port and, based on unavailability of the egress port, allocate the scheduler to a second egress port to cause any packet allocated to a transmit queue associated with the scheduler to be transmitted using the second egress port. In some examples, a system receives a packet at a port on a network interface, associates a port group with the packet, determines a receive queue for the packet, and copies the packet to the determined receive queue. The port group can be adjusted to remove the port or to add a second port.
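The reallocation described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `Scheduler`, `PortAllocator`, and `mark_unavailable`, and the policy of falling back to the first available port, are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    name: str
    # Transmit queue identifiers served by this scheduler (hypothetical field).
    queues: list = field(default_factory=list)


class PortAllocator:
    """Tracks which egress port each scheduler is currently allocated to."""

    def __init__(self, ports):
        self.available = {port: True for port in ports}
        self.assignment = {}  # scheduler name -> egress port

    def allocate(self, scheduler, port):
        if not self.available.get(port, False):
            raise ValueError(f"port {port!r} is not available")
        self.assignment[scheduler.name] = port

    def mark_unavailable(self, port):
        """Mark a port as down and move its schedulers to another port."""
        self.available[port] = False
        for name, assigned in list(self.assignment.items()):
            if assigned != port:
                continue
            # Assumed policy: pick the first port that is still available.
            fallback = next((p for p, ok in self.available.items() if ok), None)
            if fallback is None:
                raise RuntimeError("no egress port available")
            self.assignment[name] = fallback


allocator = PortAllocator(["p0", "p1"])
sched = Scheduler("s0", queues=["txq0"])
allocator.allocate(sched, "p0")
allocator.mark_unavailable("p0")
print(allocator.assignment["s0"])
```

Because packets are bound to transmit queues and queues to the scheduler, moving the scheduler is enough to redirect all of its queued traffic without touching the queues themselves.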

Resource fairness enforcement in shared IO interfaces
11593136 · 2023-02-28

Described are platforms, systems, and methods for resource fairness enforcement. In one aspect, a programmable input output (IO) device comprises a memory unit, the memory unit having instructions stored thereon which, when executed by the programmable IO device, cause the programmable IO device to perform operations comprising: receiving an input from a logical interface (LIF); determining, by at least one meter, a metric regarding at least one resource used during a processing of the input through a programmable pipeline; and regulating additional input received from the LIF based on the metric and a threshold for the at least one resource.
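The meter-and-threshold idea can be illustrated with a small sketch. The class name `LifMeter`, the notion of an abstract per-input "cost", and the explicit `reset` at the end of a measurement window are assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict


class LifMeter:
    """Accumulates a resource metric per LIF and gates further input."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.usage = defaultdict(int)  # LIF id -> resource units consumed

    def record(self, lif, cost):
        """Record resource units used while processing one input from `lif`."""
        self.usage[lif] += cost

    def admit(self, lif):
        """Admit additional input only while the LIF is under its threshold."""
        return self.usage[lif] < self.threshold

    def reset(self):
        """Start a new measurement window (assumed to be driven by a timer)."""
        self.usage.clear()


meter = LifMeter(threshold=10)
meter.record("lif0", 12)   # lif0 consumed more than its share
print(meter.admit("lif0"), meter.admit("lif1"))
```

Keeping the metric per LIF means one heavy interface is throttled without penalizing the others that share the pipeline.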

Methods and apparatus for flow control associated with a switch fabric

In some embodiments, an apparatus includes a flow control module configured to receive a first data packet from an output queue of a stage of a multi-stage switch at a first rate when an available capacity of the output queue crosses a first threshold. The flow control module is configured to receive a second data packet from the output queue of the stage of the multi-stage switch at a second rate when the available capacity of the output queue crosses a second threshold. The flow control module is configured to send a flow control signal to an edge device of the multi-stage switch from which the first data packet or the second data packet entered the multi-stage switch.

Load-Balanced Fine-Grained Adaptive Routing in High-Performance System Interconnect
20230014645 · 2023-01-19

A switch is provided for load-balanced fine-grained adaptive routing in a high-performance interconnection network. The switch includes a plurality of egress ports to transmit packets, and one or more ingress ports to receive packets. The switch also includes a network capacity circuit for obtaining network capacity for transmitting packets via the plurality of egress ports. The switch also includes a port sequence generation circuit configured to generate a port sequence that defines a pseudo-randomly interleaved sequence of a plurality of path options via the plurality of egress ports, based on the network capacity. The switch also includes a routing circuit for routing one or more packets, received from the one or more ingress ports, towards a destination, based on the port sequence.
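One way to picture a capacity-based, pseudo-randomly interleaved port sequence is the sketch below. It assumes integer capacity weights per egress port and a seeded shuffle; both are illustrative choices, and the function names `port_sequence` and `route_packets` are hypothetical.

```python
import random


def port_sequence(capacity, seed=0):
    """Build a sequence in which each port appears `capacity[port]` times,
    then interleave the entries pseudo-randomly with a seeded generator."""
    sequence = [port for port, cap in sorted(capacity.items()) for _ in range(cap)]
    random.Random(seed).shuffle(sequence)
    return sequence


def route_packets(packets, sequence):
    """Assign packets to egress ports by walking the sequence cyclically."""
    return [(pkt, sequence[i % len(sequence)]) for i, pkt in enumerate(packets)]


seq = port_sequence({"e0": 3, "e1": 1}, seed=1)
print(route_packets(["pkt0", "pkt1", "pkt2", "pkt3"], seq))
```

Over one full pass of the sequence, each port carries a share of packets proportional to its weight, while the shuffle avoids long bursts to a single port.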

System and method for supporting dual-port virtual router in a high performance computing environment

Systems and methods for supporting dual-port virtual router in a high performance computing environment. In accordance with an embodiment, a dual port router abstraction can provide a simple way for enabling subnet-to-subnet router functionality to be defined based on a switch hardware implementation. A virtual dual-port router can logically be connected outside a corresponding switch port. This virtual dual-port router can provide an InfiniBand specification compliant view to a standard management entity, such as a Subnet Manager. In accordance with an embodiment, a dual-ported router model implies that different subnets can be connected in a way where each subnet fully controls the forwarding of packets as well as address mappings in the ingress path to the subnet.

Fair arbitration between multiple sources targeting a destination

A hardware module comprises at least a first ingress buffer and a second ingress buffer, where the second ingress buffer holds data packets from a plurality of source components. To ensure fairness between one or more sources providing data to the first ingress buffer and the plurality of sources providing data to the second ingress buffer, processing circuitry examines source identifiers in packets held in the second ingress buffer and selects between the buffers so as to arbitrate between the sources. In some embodiments, the examination of the source identifiers provides statistics for a weighted round robin between the ingress buffers. In other embodiments, the source identifier of whichever packet is currently at the head of the second ingress buffer is used to perform a simple round robin between the sources.
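The simple round-robin variant, driven by the source identifier at the head of the second buffer, might look like the following sketch. The packet representation `(source_id, payload)`, the fixed `sources` order, and the assumption that the two buffers carry disjoint source identifiers are all introduced here for illustration.

```python
from collections import deque


def arbitrate(first, second, sources):
    """Drain two ingress buffers, serving sources in round-robin order.

    `first` and `second` are deques of (source_id, payload) tuples.
    Only the packet at the head of each buffer is eligible at each step.
    """
    served = []
    pointer = 0  # index in `sources` of the next source to favor
    while first or second:
        heads = {}
        if first:
            heads[first[0][0]] = first
        if second:
            heads[second[0][0]] = second
        # Scan sources from the pointer and serve the first one at a head.
        for step in range(len(sources)):
            index = (pointer + step) % len(sources)
            if sources[index] in heads:
                served.append(heads[sources[index]].popleft())
                pointer = (index + 1) % len(sources)
                break
        else:
            raise ValueError("head packet carries an unknown source id")
    return served


first = deque([("A", 1), ("A", 2), ("A", 3)])
second = deque([("B", 1), ("C", 1), ("B", 2)])
print(arbitrate(first, second, ["A", "B", "C"]))
```

With a plain alternation between the two buffers, source A would receive half of the bandwidth; arbitrating on source identifiers instead gives A, B, and C one turn each while all three have traffic.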

Filter with Engineered Damping for Load-Balanced Fine-Grained Adaptive Routing in High-Performance System Interconnect
20220417155 · 2022-12-29

A switch is provided for routing packets in an interconnection network. The switch includes a plurality of egress ports to transmit packets. The switch also includes one or more ingress ports to receive packets. The switch also includes a port and bandwidth capacity circuit configured to obtain (i) port capacity for a plurality of egress ports of the switch, and (ii) bandwidth capacity for transmitting packets to a destination. The switch also includes a network capacity circuit configured to compute network capacity, for transmitting packets to the destination, via the plurality of egress ports, based on a function of the port capacity and the bandwidth capacity. The switch also includes a routing circuit configured to route one or more packets received via one or more ingress ports of the switch, to the destination, via the plurality of egress ports, based on the network capacity.

Telemetry-Based Load-Balanced Fine-Grained Adaptive Routing in High-Performance System Interconnect
20220417163 · 2022-12-29

A switch is provided for routing packets in an interconnection network. The switch includes egress ports to transmit packets, and ingress ports to receive packets. The switch also includes a buffer capacity circuit configured to obtain local buffer capacity for buffers configured to buffer packets transmitted via the switch. The switch also includes a telemetry circuit configured to receive telemetry flow control units from next switches coupled to the switch. Each telemetry flow control unit corresponds to buffer capacity at a respective next switch. The switch also includes a network capacity circuit configured to compute network capacity for transmitting packets to a destination based on the telemetry flow control units and the local buffer capacity. The switch also includes a routing circuit configured to receive packets via the ingress ports, and route the packets to the destination, via the egress ports, with bandwidth proportional to the network capacity.
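A rough sketch of routing "with bandwidth proportional to the network capacity" follows. The abstract does not say how local buffer capacity and telemetry are combined; taking the minimum of the two per egress port is an assumption made here, as are the function name `bandwidth_shares` and the dictionary inputs.

```python
def bandwidth_shares(local_capacity, telemetry):
    """Return the fraction of traffic to send through each egress port.

    local_capacity: egress port -> free local buffer space
    telemetry:      egress port -> buffer capacity reported by the next switch
    """
    # Assumed combination rule: a path is only as good as its tighter buffer.
    network = {
        port: min(local_capacity[port], telemetry[port])
        for port in local_capacity
    }
    total = sum(network.values())
    if total == 0:
        return {port: 0.0 for port in network}
    return {port: cap / total for port, cap in network.items()}


shares = bandwidth_shares({"e0": 8, "e1": 8}, {"e0": 2, "e1": 6})
print(shares)
```

In this example both ports have the same local buffer space, so the downstream telemetry alone decides the split.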

Efficient Parallelized Computation of a BENES Network Configuration

A routing controller (30) includes an interface (68) and multiple processors (60). The interface is configured to receive a permutation (76) defining requested interconnections between N input ports and N output ports of a Benes network (24). The Benes network includes multiple 2-by-2 switches (42), and is reducible into a plurality of nested subnetworks associated with respective nesting levels, down to irreducible subnetworks including a single 2-by-2 switch. The multiple processors are configured to collectively determine a setting of the 2-by-2 switches that implements the received permutation, including determining sub-settings for two or more subnetworks of a given nesting level in parallel, and to configure the multiple 2-by-2 switches of the Benes network in accordance with the determined setting.
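For orientation, the nested structure can be illustrated with the classic sequential looping algorithm, which is not the parallel method of this abstract. The sketch assumes N is a power of two, and the function names and the nested-dictionary result format are hypothetical. The two recursive calls are independent of each other, which is where sub-settings of one nesting level could be computed in parallel.

```python
def benes_configure(perm):
    """Compute switch settings so that input i reaches output perm[i]."""
    n = len(perm)
    if n == 2:
        return {"cross": perm[0] == 1}  # irreducible: a single 2-by-2 switch
    inv = [0] * n
    for i, o in enumerate(perm):
        inv[o] = i
    side = [None] * n  # per input: 0 = upper subnetwork, 1 = lower
    for start in range(n):
        i = start
        while side[i] is None:
            side[i] = 0
            side[i ^ 1] = 1  # the other input of the same switch goes lower
            # The output sharing a switch with that input's output must be
            # fed from the upper subnetwork, so follow it back to its input.
            i = inv[perm[i ^ 1] ^ 1]
    half = n // 2
    upper, lower = [0] * half, [0] * half
    for i in range(n):
        (lower if side[i] else upper)[i // 2] = perm[i] // 2
    return {
        "in": [side[2 * k] == 1 for k in range(half)],
        "out": [side[inv[2 * k]] == 1 for k in range(half)],
        "upper": benes_configure(upper),  # independent of the call below
        "lower": benes_configure(lower),
    }


def benes_apply(cfg, n):
    """Simulate a configured network and return the permutation it realizes."""
    if n == 2:
        return [1, 0] if cfg["cross"] else [0, 1]
    up = benes_apply(cfg["upper"], n // 2)
    lo = benes_apply(cfg["lower"], n // 2)
    result = [0] * n
    for i in range(n):
        k = i // 2
        goes_lower = (i % 2 == 1) != cfg["in"][k]
        sub_out = (lo if goes_lower else up)[k]
        odd = goes_lower != cfg["out"][sub_out]
        result[i] = 2 * sub_out + (1 if odd else 0)
    return result


perm = [3, 0, 2, 1, 6, 7, 4, 5]
print(benes_apply(benes_configure(perm), len(perm)) == perm)
```

The simulation function is included only to check that a computed setting realizes the requested permutation.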

Distributed artificial intelligence extension modules for network switches
11516149 · 2022-11-29

Distributed machine learning systems and other distributed computing systems are improved by compute logic embedded in extension modules coupled directly to network switches. The compute logic performs collective actions, such as reduction operations, on gradients or other compute data processed by the nodes of the system. The reduction operations may include, for instance, summation, averaging, bitwise operations, and so forth. In this manner, the extension modules may take over some or all of the processing of the distributed system during the collective phase. An inline version of the module sits between a switch and the network. Data units carrying compute data are intercepted and processed using the compute logic, while other data units pass through the module transparently to or from the switch. Multiple modules may be connected to the switch, each coupled to a different group of nodes, and sharing intermediate results. A sidecar version is also described.
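The inline behavior (intercept compute data, pass everything else through) can be sketched as below. The data-unit layout (a dictionary with `kind`, `chunk`, `node`, and `data` keys), the class name `ReductionModule`, and the choice of summation as the reduction are assumptions made for illustration only.

```python
class ReductionModule:
    """Sums gradient chunks from a group of nodes; forwards other traffic."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.pending = {}  # chunk id -> (partial sum, contributing node ids)

    def handle(self, unit):
        """Process one data unit; return what should be forwarded, if anything."""
        if unit.get("kind") != "compute":
            return unit  # non-compute data passes through transparently
        chunk = unit["chunk"]
        total, seen = self.pending.get(chunk, ([0.0] * len(unit["data"]), set()))
        if unit["node"] in seen:
            return None  # assumed policy: ignore duplicate contributions
        total = [a + b for a, b in zip(total, unit["data"])]
        seen = seen | {unit["node"]}
        if len(seen) == self.num_nodes:
            self.pending.pop(chunk, None)
            return {"kind": "result", "chunk": chunk, "data": total}
        self.pending[chunk] = (total, seen)
        return None  # still waiting for the remaining nodes


module = ReductionModule(num_nodes=2)
module.handle({"kind": "compute", "chunk": 0, "node": "n0", "data": [1.0, 2.0]})
print(module.handle({"kind": "compute", "chunk": 0, "node": "n1", "data": [3.0, 4.0]}))
```

Only one reduced result per chunk leaves the module, so the nodes exchange less data during the collective phase than they would with an all-to-all exchange.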