H04L49/1507

Multicast network and memory transfer optimizations for neural network hardware acceleration
11704548 · 2023-07-18

In one embodiment, a system to deterministically transfer partitions of contiguous computer readable data in constant time includes a computer readable memory and a modulo address generator. The computer readable memory is organized into D banks, to contain contiguous data including a plurality of data elements of size M which are constituent data elements of a vector with N data elements, the data elements to start at an offset address O. The modulo address generator is to generate the addresses of the data elements of a vector with N data elements stored in the computer readable memory, the modulo address generator including at least one forward permutation to permute data elements with addresses of the form O+M*i where 0<=i<N. Other embodiments are described and claimed.
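The address pattern in the abstract can be sketched directly. The following is an illustrative interpretation, not the claim's exact mapping: element addresses take the form O + M*i, and a modulo scheme assigns each address to one of D banks so that consecutive elements fall in distinct banks.

```python
# Sketch (assumed interpretation): element i of the vector lives at O + M*i;
# a modulo banking scheme spreads consecutive elements across the D banks.
def element_addresses(O, M, N):
    """Addresses of the form O + M*i for 0 <= i < N."""
    return [O + M * i for i in range(N)]

def bank_of(addr, M, D):
    """Bank index under a simple modulo banking scheme (illustrative)."""
    return (addr // M) % D

O, M, N, D = 128, 4, 8, 4
addrs = element_addresses(O, M, N)          # [128, 132, ..., 156]
banks = [bank_of(a, M, D) for a in addrs]   # consecutive elements hit distinct banks
```

With one element per bank reachable in the same cycle, a partition of the vector can be fetched in constant time regardless of N, which is the property the abstract claims.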

Server, server system, and method of increasing network bandwidth of server

A server includes a normal NIC, which is an NIC having no expansion function, and a virtual patch panel, implemented by software, having a transfer function of transferring packets between the normal NIC and an accelerator utilization type NIC. The server is configured such that, when a packet is transferred between the normal NIC and the accelerator utilization type NIC via the virtual patch panel, the transfer function transfers the packet to and from the APLs.

Network interconnect as a switch
11509538 · 2022-11-22

An interconnect as a switch module (“ICAS” module) comprising n port groups, each port group comprising n-1 interfaces, and an interconnecting network implementing a full mesh topology in which each interface of a port group connects to an interface of one of the other port groups, respectively. The ICAS module may be optically or electrically implemented. According to the embodiments, the ICAS module may be used to construct a stackable switching device and a multi-unit switching device, to replace a data center fabric switch, and to build a new, highly efficient, and cost-effective data center.
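The full mesh topology described above can be enumerated in a few lines. This is a sketch under an assumed numbering convention: each pair of the n port groups is joined by exactly one link, so each group uses its n-1 interfaces to reach every other group.

```python
def full_mesh_links(n):
    """Enumerate the point-to-point links of an n-port-group full mesh.

    Each port group has n-1 interfaces, one per peer group, so every pair
    of groups is joined by exactly one link. (Numbering is an assumption.)
    """
    return [(a, b) for a in range(n) for b in range(a + 1, n)]

links = full_mesh_links(4)
# 4 port groups, each with 3 interfaces, yield C(4, 2) = 6 links
```

Because the interconnect is pure wiring, the link count n*(n-1)/2 is the entire "switch": there is no crossbar silicon in the ICAS module itself.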

NETWORK INTERCONNECT AS A SWITCH
20230052529 · 2023-02-16

An interconnect as a switch module (“ICAS” module) comprising n port groups, each port group comprising n-1 interfaces, and an interconnecting network implementing a full mesh topology in which each interface of a port group connects to an interface of one of the other port groups, respectively. The ICAS module may be optically or electrically implemented. According to the embodiments, the ICAS module may be used to construct a stackable switching device and a multi-unit switching device, to replace a data center fabric switch, and to build a new, highly efficient, and cost-effective data center.

Network interconnect as a switch
11671330 · 2023-06-06

An interconnect as a switch module (“ICAS” module) comprising n port groups, each port group comprising n-1 interfaces, and an interconnecting network implementing a full mesh topology in which each interface of a port group connects to an interface of one of the other port groups, respectively. The ICAS module may be optically or electrically implemented. According to the embodiments, the ICAS module may be used to construct a stackable switching device and a multi-unit switching device, to replace a data center fabric switch, and to build a new, highly efficient, and cost-effective data center.

MULTICAST NETWORK AND MEMORY TRANSFER OPTIMIZATIONS FOR NEURAL NETWORK HARDWARE ACCELERATION
20170337468 · 2017-11-23

Neural network specific hardware acceleration optimizations are disclosed, including an optimized multicast network and an optimized DRAM transfer unit that perform in constant or linear time. The multicast network is a set of switch nodes organized into layers and configured to operate as a Beneš network. Configuration data may be accessed by all switch nodes in the network. Each layer is configured to perform a Beneš network transformation of the previous layer within a computer instruction. Since the computer instructions are pipelined, the entire network of switch nodes may be configured in constant or linear time. Similarly, a DRAM transfer unit configured to access memory in strides organizes memory into banks indexed by prime or relatively prime number amounts. The index value is selected so as not to cause memory address collisions. Upon receiving a memory specification, the DRAM transfer unit may calculate the strides, thereby accessing an entire tile of a tensor in constant or linear time.
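The collision-avoidance property behind the prime/relatively-prime bank indexing can be demonstrated numerically. A minimal sketch, assuming a plain modulo mapping: when the bank count is relatively prime to the access stride, consecutive strided accesses cycle through every bank before repeating; when it is not, accesses pile onto a few banks.

```python
def bank_sequence(base, stride, count, num_banks):
    """Bank indices touched when reading `count` elements at a fixed stride."""
    return [(base + stride * i) % num_banks for i in range(count)]

# gcd(4, 7) == 1: seven strided accesses land in seven distinct banks.
hits = bank_sequence(0, 4, 7, 7)
# gcd(4, 8) == 4: the same stride against 8 banks keeps revisiting 2 banks.
clash = bank_sequence(0, 4, 8, 8)
```

This is why the abstract ties the bank count to prime or relatively prime values: a whole tile of a tensor can then be streamed with one access per bank per cycle, giving the constant- or linear-time transfer.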

Speculative resource allocation for routing on interconnect fabrics
11245643 · 2022-02-08

Methods and systems related to speculative resource allocation for routing on an interconnect fabric are disclosed herein. One disclosed method includes speculatively allocating a collection of resources to support a set of paths through an interconnect fabric. The method also includes aggregating a set of responses from the set of paths at a branch node on the set of paths. If a resource contention is detected, the set of responses will include an indicator of a resource contention. The method will then further include transmitting, from the branch node and in response to the indicator of the resource contention, a deallocate message downstream and the indicator of the resource contention upstream, and reallocating resources for the multicast after a hold period.
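The branch-node behavior described above reduces to a small aggregation rule. A hedged sketch follows; the names `Response` and `aggregate_at_branch` are illustrative, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Response:
    """Reply from one downstream path; `contended` flags a resource conflict."""
    path_id: int
    contended: bool

def aggregate_at_branch(responses):
    """Aggregate downstream responses at a branch node.

    Returns (report_contention_upstream, send_deallocate_downstream): if any
    path reports contention, the node deallocates downstream and forwards the
    contention indicator upstream; reallocation then retries after a hold period.
    """
    contended = any(r.contended for r in responses)
    return contended, contended

up, down = aggregate_at_branch([Response(0, False), Response(1, True)])
```

The speculative part is that resources are claimed before the responses arrive; the aggregation rule only decides whether that speculation must be unwound.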

Switch and setting method
09742698 · 2017-08-22

A disclosed switch includes: plural ports, each of which is connected to another apparatus; a determination unit that determines, for each of the plural ports, whether the port is connected to one of plural logically integrated switches; and a setting unit that sets, for each of the plural ports, a port type or propriety of use based on a result of the determination by the determination unit.

Automated multi-fabric link aggregation system

An automated multi-fabric link aggregation system includes leaf switch devices that have leaf switch device downlink ports, that are included in a first network fabric, and that are aggregated to provide a first aggregation fabric. Each leaf switch device generates discovery communications including a first network fabric identifier for the first network fabric and a first aggregation fabric identifier for the first aggregation fabric. The leaf switch devices then transmit the discovery communications via the leaf switch device downlink ports. I/O modules that have I/O module uplink ports are included in a second network fabric and are aggregated to provide a second aggregation fabric. The I/O modules receive the discovery communications via each of the I/O module uplink ports, determine that each received discovery communication includes the first network fabric identifier and the first aggregation fabric identifier and, in response, automatically configure the I/O module uplink ports in a LAG.
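The auto-configuration decision described above is essentially an agreement check over received discovery messages. A minimal sketch, with assumed field names: an I/O module bundles its uplink ports into a LAG only when every port saw the same (network fabric, aggregation fabric) identifier pair.

```python
def ports_to_lag(discoveries):
    """Decide which uplink ports to bundle into a LAG.

    discoveries maps port -> (network_fabric_id, aggregation_fabric_id) taken
    from the discovery message received on that port. Ports are aggregated
    only when all identifier pairs agree; otherwise no LAG is formed.
    """
    if not discoveries:
        return []
    ids = set(discoveries.values())
    return sorted(discoveries) if len(ids) == 1 else []

lag = ports_to_lag({1: ("fab-A", "agg-1"), 2: ("fab-A", "agg-1")})
```

Requiring agreement on both identifiers prevents ports wired to different fabrics, or to differently aggregated leaf switches, from being silently bundled together.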

MULTICAST NETWORK AND MEMORY TRANSFER OPTIMIZATIONS FOR NEURAL NETWORK HARDWARE ACCELERATION
20210374512 · 2021-12-02

In one embodiment, a system to deterministically transfer partitions of contiguous computer readable data in constant time includes a computer readable memory and a modulo address generator. The computer readable memory is organized into D banks, to contain contiguous data including a plurality of data elements of size M which are constituent data elements of a vector with N data elements, the data elements to start at an offset address O. The modulo address generator is to generate the addresses of the data elements of a vector with N data elements stored in the computer readable memory, the modulo address generator including at least one forward permutation to permute data elements with addresses of the form O+M*i where 0<=i<N. Other embodiments are described and claimed.