Patent classifications
H04L49/3045
Backpressure for Accelerated Deep Learning
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element comprises a respective compute element and a respective routing element. Each compute element comprises virtual input queues. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. Each router comprises data queues. The virtual input queues of the compute element and the data queues of the router are managed in accordance with the virtual channels. Backpressure information, per each of the virtual channels, is generated, communicated, and used to prevent overrun of the virtual input queues and the data queues.
Queue management method and apparatus
A queue management method and apparatus are disclosed. The queue management method includes: storing a first packet to a first buffer cell included in a first macrocell, where the first macrocell is enqueued to a first entity queue, the first macrocell includes N consecutive buffer cells, and the first buffer cell belongs to the N buffer cells; correcting, based on a packet length of the first packet, an average packet length in the first macrocell that is obtained before the first packet is stored, to obtain a current average packet length in the first macrocell; and generating, based on the first macrocell and the first entity queue, queue information corresponding to the first macrocell of the first macrocell in the first entity queue, a head pointer in the first macrocell, a tail pointer in the first macrocell, and the current average packet length in the first macrocell.
PACKET PROCESSING METHOD AND RELATED DEVICE
A packet processing method and device are provided, to save CPU resources consumed by parsing a packet. The method includes: parsing, by an intelligent network interface card, a received first packet to obtain an identifier of the first packet; updating, by the intelligent network interface card, a control field of a first memory buffer based on the identifier of the first packet; storing, by the intelligent network interface card, a payload of the first packet or a packet header and a payload of the first packet into the first address space through DMA based on an aggregation position of the first packet; aggregating, by a host, the first address information and at least one piece of second address information based on an updated control field in the first mbuf; and reading, by a virtual machine, address information, to obtain data in an address space indicated by the address information.
Network traffic routing in distributed computing systems
Distributed computing systems, devices, and associated methods of packet routing are disclosed herein. In one embodiment, a method includes receiving, from a computing network, a packet at a packet processor of a server. The method also includes matching the received packet with a flow in a flow table contained in the packet processor and determining whether the action indicates that the received packet is to be forwarded to a NIC buffer in the outbound processing path of the packet processor instead of the NIC. The method further includes in response to determining that the action indicates that the received packet is to be forwarded to the NIC buffer, forwarding the received packet to the NIC buffer and processing the packet in the NIC buffer to forward the packet to the computer network without exposing the packet to the main processor.
MAINTAINING BANDWIDTH UTILIZATION IN THE PRESENCE OF PACKET DROPS
Examples describe a manner of scheduling packet segment fetches at a rate that is based on one or more of: a packet drop indication, packet drop rate, incast level, operation of queues in SAF or VCT mode, or fabric congestion level. Headers of packets can be fetched faster than payload or body portions of packets and processed prior to queueing of all body portions. In the event a header is identified as droppable, fetching of the associated body portions can be halted and any body portion that is queued can be discarded. Fetch overspeed can be applied for packet headers or body portions associated with packet headers that are approved for egress.
METHODS, SYSTEMS AND APPRATUSES FOR OPTIMIZING TIME-TRIGGERED ETHERNET (TTE) NETWORK SCHEDULING BY USING A DIRECTIONAL SEARCH FOR BIN SELECTION
Methods, systems and apparatuses for scheduling a plurality of Virtual Links (VLs) in a Time-Triggered Ethernet (TTE) network by a network scheduling and configuration tool (NST) by establishing a collection of bins that corresponds to the smallest harmonic period allowing full network traversal of a time-triggered traffic packet in the network for determining available bin sets for sending the VL data by the NST; processing by a scheduling algorithm the VLs to be sent in accordance with a strict order comprising scheduling all the highest rate VLs prior to scheduling lower rate VLs; and scheduling reservations for the VLs in bins by tracking the available time available in each bin and optionally spreading the VL data across available bin sets by sorting a list of available bins by ascending bin utilization and by specifying a left-to-right or right-to-left sort order when searching for available bins based on a position in the timeline between the transmitter and receiver end stations.
Managing virtual output queues
A first node of a packet switched network transmits at least one flow of protocol data units of a network to at least one output context of one of a plurality of second nodes of the network. The first node includes X virtual output queues (VOQs). The first node receives, from at least one of the second nodes, at least one fair rate record. Each fair rate record corresponds to a particular second node output context and describes a recommended rate of flow to the particular output context. The first node allocates up to X of the VOQs among flows corresponding to i) currently allocated VOQs, and ii) the flows corresponding to the received fair rate records. The first node operates each allocated VOQ according to the corresponding recommended rate of flow until a deallocation condition obtains for the each allocated VOQ.
Methods, systems and apparatuses for optimizing time-triggered ethernet (TTE) network scheduling by using a directional search for bin selection
Methods, systems and apparatuses for scheduling a plurality of Virtual Links (VLs) in a Time-Triggered Ethernet (TTE) network by a network scheduling and configuration tool (NST) by establishing a collection of bins that corresponds to the smallest harmonic period allowing full network traversal of a time-triggered traffic packet in the network for determining available bin sets for sending the VL data by the NST; processing by a scheduling algorithm the VLs to be sent in accordance with a strict order comprising scheduling all the highest rate VLs prior to scheduling lower rate VLs; and scheduling reservations for the VLs in bins by tracking the available time available in each bin and optionally spreading the VL data across available bin sets by sorting a list of available bins by ascending bin utilization and by specifying a left-to-right or right-to-left sort order when searching for available bins based on a position in the timeline between the transmitter and receiver end stations.
Allocation of virtual queues of a network forwarding element
In a method for allocating physical queues of a network forwarding element, a request is received at the network forwarding element, the network forwarding element including a plurality of physical queues, where each physical queue of the plurality of physical queues has a fixed bandwidth, the request identifying an allocation of a plurality of virtual queues at the network forwarding element. Based at least in part on the request, a configuration of the plurality of physical queues to the plurality of virtual queues is determined. The plurality of physical queues is configured according to the configuration, wherein the configuring includes allocating at least two physical queues to a virtual queue.
Task activating for accelerated deep learning
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by virtual channel specifiers in each wavelet and routing configuration information in each router. Execution of an activate instruction or completion of a fabric vector operation activates one of the virtual channels. A virtual channel is selected from a pool comprising previously activated virtual channels and virtual channels associated with previously received wavelets. A task corresponding to the selected virtual channel is activated by executing instructions corresponding to the selected virtual channel.