H04L49/3018

Control wavelet for accelerated deep learning

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A compute element receives a wavelet. If a control specifier of the wavelet is a first value, then instructions are read from the memory of the compute element in accordance with an index specifier of the wavelet. If the control specifier is a second value, then instructions are read from the memory of the compute element in accordance with a virtual channel specifier of the wavelet. The compute element then initiates execution of the instructions.
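
The dispatch rule in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the constants `CONTROL_USE_INDEX` and `CONTROL_USE_CHANNEL`, the `Wavelet` and `ComputeElement` classes, and the two lookup tables standing in for instruction memory are all hypothetical names chosen for illustration, since the abstract only speaks of "a first value" and "a second value".

```python
from dataclasses import dataclass

# Hypothetical encodings for the "first value" and "second value".
CONTROL_USE_INDEX = 0
CONTROL_USE_CHANNEL = 1


@dataclass
class Wavelet:
    control: int          # control specifier
    index: int            # index specifier
    virtual_channel: int  # virtual channel specifier
    payload: int = 0


class ComputeElement:
    def __init__(self, by_index, by_channel):
        # Two tables standing in for the compute element's instruction memory
        # (an assumption; the abstract does not describe the memory layout).
        self.by_index = by_index
        self.by_channel = by_channel

    def receive(self, wavelet):
        # Choose where to read instructions from, based on the control specifier.
        if wavelet.control == CONTROL_USE_INDEX:
            instructions = self.by_index[wavelet.index]
        elif wavelet.control == CONTROL_USE_CHANNEL:
            instructions = self.by_channel[wavelet.virtual_channel]
        else:
            raise ValueError(f"unknown control specifier: {wavelet.control}")
        # "Initiate execution": here, apply each instruction to the payload.
        return [op(wavelet.payload) for op in instructions]


ce = ComputeElement(
    by_index={3: [lambda x: x + 1]},
    by_channel={7: [lambda x: x * 2]},
)
result_by_index = ce.receive(Wavelet(CONTROL_USE_INDEX, 3, 7, payload=10))
result_by_channel = ce.receive(Wavelet(CONTROL_USE_CHANNEL, 3, 7, payload=10))
```

The same wavelet fields select different instructions depending only on the control specifier, which is the behavior the abstract describes.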

Packet filtering using binary search trees
11770463 · 2023-09-26

A packet filtering system uses linked zero-based binary search trees to filter received packets. The binary search trees may be generated from filter conditions defining filter parameters for filtering packets.

Fabric vectors for deep learning acceleration

Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes various attributes of the fabric vector: length, microthreading eligibility, number of data elements to receive, transmit, and/or process in parallel, virtual channel and task identification information, whether to terminate upon receiving a control wavelet, and whether to mark an outgoing wavelet a control wavelet.
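
As a rough illustration, the descriptor attributes listed in this abstract could be modeled as a record. The field names, defaults, and the `resolve_operand` helper below are assumptions made for this sketch; the abstract names the attributes but not their encoding.

```python
from dataclasses import dataclass
from enum import Enum


class OperandKind(Enum):
    FABRIC_VECTOR = "fabric"
    MEMORY_VECTOR = "memory"


@dataclass(frozen=True)
class DataStructureDescriptor:
    # Field names and defaults are hypothetical; only the attribute list
    # comes from the abstract.
    kind: OperandKind
    length: int
    microthreading_eligible: bool = False
    parallel_elements: int = 1            # elements to receive/transmit/process in parallel
    virtual_channel: int = 0
    task_id: int = 0
    terminate_on_control_wavelet: bool = False
    mark_outgoing_as_control: bool = False


def resolve_operand(dsr_file, operand_specifier):
    """Look up the descriptor held in the data structure register that the
    operand specifier names (dsr_file is a plain dict in this sketch)."""
    return dsr_file[operand_specifier]


dsr_file = {
    2: DataStructureDescriptor(OperandKind.FABRIC_VECTOR, length=16, virtual_channel=4),
    5: DataStructureDescriptor(OperandKind.MEMORY_VECTOR, length=64),
}
descriptor = resolve_operand(dsr_file, 2)
```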

Converged network interface card, message coding method and message transmission method thereof

The invention provides a converged network interface card, a message coding method and a message transmission method thereof. The converged network interface card comprises a PCIE host interface processing module, a high speed network card core logic, a crossbar switch XBAR, an Ethernet network card core logic, an Ethernet message dicing/slicing module, a physical layer, a high speed network/Ethernet message conversion module EoH, and a high speed network/Ethernet configurable network port. The invention supports a customized high speed interconnection interface and a standard Ethernet interface on one set of network hardware, and supports three working modes on one set of physical hardware (high speed network mode, Ethernet mode, and EoH mode, which transmits Ethernet messages over the high speed network). It provides seamless compatibility between the high speed network and Ethernet, and flexibly supports multimode applications such as scientific computing and cloud computing.

Priority-based arbitration for parallel multicast routing with self-directed data packets
11184290 · 2021-11-23

A parallel multicast star topology data network includes a plurality of input buffers, a first arbitration mechanism coupled to the plurality of input buffers, a plurality of output buffers coupled to the first arbitration mechanism and a plurality of interconnect exits coupled to the plurality of output buffers. When packet contents of a multicast message are ready for release from the first arbitration mechanism then all of the packet contents are substantially simultaneously released to the plurality of output buffers and then substantially simultaneously to the plurality of interconnect exits.

Real-time data processing and storage apparatus
11178077 · 2021-11-16

A stream processor is disclosed. The stream processor includes a first-in-first-out (FIFO) memory, a calculation unit, and a cache. The FIFO receives current stream information, where the current stream information carries a target stream number and target data; when the FIFO receives a read valid signal, the FIFO sends the target stream number and the target data to the calculation unit, and sends the target stream number to the cache; the cache obtains, based on the target stream number, old data that corresponds to the target stream number, and sends the old data to the calculation unit; and the calculation unit performs, based on the target data, calculation on the old data to obtain new data, and sends the new data to the cache.
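
The FIFO, cache, and calculation unit interaction can be sketched in a few lines of Python. This is a software analogy under stated assumptions, not the disclosed hardware: the `StreamProcessor` class, the `combine` callback standing in for the calculation unit, and the default old value of 0 for an unseen stream are all hypothetical.

```python
from collections import deque


class StreamProcessor:
    def __init__(self, combine):
        self.fifo = deque()     # holds (stream_number, target_data) pairs
        self.cache = {}         # stream_number -> latest data for that stream
        self.combine = combine  # stands in for the calculation unit

    def receive(self, stream_number, target_data):
        # The FIFO receives current stream information.
        self.fifo.append((stream_number, target_data))

    def step(self):
        """Model one 'read valid' signal: pop one entry and update the cache."""
        if not self.fifo:
            return None
        stream_number, target_data = self.fifo.popleft()
        # The cache supplies the old data for this stream (0 if unseen: an assumption).
        old_data = self.cache.get(stream_number, 0)
        # The calculation unit derives new data from the old data and target data.
        new_data = self.combine(old_data, target_data)
        # The new data is written back to the cache.
        self.cache[stream_number] = new_data
        return stream_number, new_data


sp = StreamProcessor(combine=lambda old, target: old + target)
sp.receive(5, 10)
sp.receive(5, 7)
sp.receive(2, 1)
results = [sp.step() for _ in range(3)]
```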

Data scheduling method and switching device

Embodiments of this application provide a method that includes: receiving a first data flow that includes a plurality of data units; inputting N₁ data units of the plurality of data units and a first source marking unit into a first source queue; inputting M₁ data units of the plurality of data units and a first target marking unit into a first target queue, wherein the N₁ data units and the M₁ data units are different data units; scheduling the N₁ data units and the first source marking unit based on the first source marking unit and the first target marking unit; and scheduling the first target marking unit and the M₁ data units, wherein the first target marking unit and the M₁ data units are scheduled later than the N₁ data units and the first source marking unit.
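
The ordering guarantee in this abstract can be sketched as follows. The sketch assumes, beyond what the abstract states, that the source marking unit sits behind the N₁ data units in the source queue and that the target marking unit sits ahead of the M₁ data units in the target queue; `SOURCE_MARK`, `TARGET_MARK`, and `schedule` are hypothetical names.

```python
from collections import deque

SOURCE_MARK = "SRC_MARK"  # hypothetical source marking unit
TARGET_MARK = "TGT_MARK"  # hypothetical target marking unit


def schedule(source_queue, target_queue):
    """Drain the source queue up to and including its marking unit,
    then release the target queue (marking unit first, then data)."""
    order = []
    while source_queue:
        unit = source_queue.popleft()
        order.append(unit)
        if unit == SOURCE_MARK:
            break
    # Only after the source marking unit has been scheduled is the target
    # queue, headed by its own marking unit, allowed to drain.
    while target_queue:
        order.append(target_queue.popleft())
    return order


source_queue = deque(["d0", "d1", SOURCE_MARK])  # N1 = 2 data units
target_queue = deque([TARGET_MARK, "d2", "d3"])  # M1 = 2 data units
order = schedule(source_queue, target_queue)
```

Pairing the two marking units this way keeps the units of one flow in order even though they were split across two queues.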

System and method for facilitating efficient management of non-idempotent operations in a network interface controller (NIC)

A network interface controller (NIC) capable of efficient management of non-idempotent operations is provided. The NIC can be equipped with a network interface, a storage management logic block, and an operation management logic block. During operation, the network interface can receive a request for an operation from a remote device. The storage management logic block can store, in a local data structure, outcomes of operations executed by the NIC. The operation management logic block can determine whether the NIC has previously executed the operation. If the NIC has previously executed the operation, the operation management logic block can obtain an outcome of the operation from the data structure and generate a response comprising the obtained outcome for responding to the request.
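
A minimal sketch of the replay behavior described here, assuming each request carries an identifier and using a fetch-and-add counter as the example non-idempotent operation. The `Nic` class, the `request_id` key, and the choice of operation are illustrative assumptions; the abstract does not specify how a repeated request is recognized.

```python
class Nic:
    def __init__(self):
        self.outcomes = {}  # local data structure: request_id -> stored outcome
        self.counter = 0    # state changed by the non-idempotent operation

    def handle(self, request_id, amount):
        # If this operation was already executed, replay its stored outcome
        # instead of executing it a second time.
        if request_id in self.outcomes:
            return self.outcomes[request_id]
        # Otherwise execute the operation (here: add and return the new total)
        # and record the outcome for any later retransmission.
        self.counter += amount
        outcome = self.counter
        self.outcomes[request_id] = outcome
        return outcome


nic = Nic()
first = nic.handle("req-1", 5)
retry = nic.handle("req-1", 5)   # retransmitted request
second = nic.handle("req-2", 3)
```

Without the stored outcome, the retransmitted `req-1` would add 5 twice and corrupt the counter.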

Algorithms for use of load information from neighboring nodes in adaptive routing

Systems and methods are provided for passing data amongst a plurality of switches having a plurality of links attached between the plurality of switches. At a switch, a plurality of load signals are received from a plurality of neighboring switches. Each of the plurality of load signals is made up of a set of values indicative of a load at the neighboring switch providing the load signal. Each value within the set of values indicates, for one of the links attached to that neighboring switch, whether the link is busy or quiet. Based upon the plurality of load signals, an output link for routing a received packet is selected, and the received packet is routed via the selected output link.
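
One way to picture the selection step is the sketch below. The selection policy (pick the output link whose neighbor reports the fewest busy links, breaking ties by link name) is an assumption for illustration; the abstract says only that the choice is based on the load signals. `select_output_link` and the dictionary shapes are hypothetical.

```python
def select_output_link(candidates, load_signals):
    """Pick an output link using per-link busy/quiet flags from neighbors.

    candidates:   dict mapping output link name -> neighboring switch id
    load_signals: dict mapping neighboring switch id -> list of booleans,
                  one per link attached to that neighbor (True = busy)
    """
    def busy_count(link):
        # Summing booleans counts the busy links at the neighbor behind `link`.
        return sum(load_signals[candidates[link]])

    # Sorting first makes tie-breaking deterministic (lowest link name wins).
    return min(sorted(candidates), key=busy_count)


candidates = {"east": "B", "north": "C"}
load_signals = {
    "B": [True, True, False],   # two busy links at switch B
    "C": [False, True, False],  # one busy link at switch C
}
chosen = select_output_link(candidates, load_signals)
```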

Use of stashing buffers to improve the efficiency of crossbar switches

A switch architecture enables ports to stash packets in unused buffers on other ports, exploiting excess internal bandwidth that may exist, for example, in a tiled switch. This architecture leverages unused port buffer memory to improve features such as congestion handling and error recovery.
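
The stashing idea can be sketched as follows, assuming a simple policy in which a port with a full buffer places the packet on whichever other port has the most free space. The `Port` class, the `enqueue` function, and that policy are hypothetical; the abstract does not say how a stash location is chosen.

```python
class Port:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.buffer = []

    def free(self):
        return self.capacity - len(self.buffer)


def enqueue(packet, port, all_ports):
    """Buffer the packet on its own port, or stash it on another port.

    Returns the name of the port that holds the packet, or None if no
    port has room.
    """
    if port.free() > 0:
        port.buffer.append(packet)
        return port.name
    # Own buffer is full: look for unused buffer space on other ports.
    others = [p for p in all_ports if p is not port and p.free() > 0]
    if not others:
        return None
    host = max(others, key=lambda p: p.free())
    host.buffer.append(packet)
    return host.name


a, b, c = Port("a", 1), Port("b", 4), Port("c", 2)
ports = [a, b, c]
placements = [enqueue(pkt, a, ports) for pkt in ("p1", "p2", "p3")]
```

A real switch would also need to track stashed packets so they can be retrieved later; that bookkeeping is omitted here.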