Patent classifications
G06F13/28
SYSTEMS, DEVICES AND METHODS WITH OFFLOAD PROCESSING DEVICES
A method can include receiving network packets, including forwarding plane packets; evaluating header information of the network packets to map the network packets to any of a plurality of destinations on a module, each destination corresponding to any of a plurality of services executed by offload processors of the module; configuring operations of the offload processors; and, in response to the forwarding plane packets, executing operations on the forwarding plane packets; wherein the receiving, evaluating, and processing of the forwarding plane packets are performed independently of a host processor. Corresponding systems and methods are also disclosed.
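The header-to-destination mapping described in this abstract can be pictured with a small lookup sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Header` fields, the `SERVICE_TABLE` contents, the service names, and the choice to return `None` for packets that are not forwarding plane packets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Header:
    """Hypothetical parsed header; a real classifier would inspect more fields."""
    protocol: str
    dst_port: int
    forwarding_plane: bool = True


# Hypothetical mapping from header fields to services run by offload processors.
SERVICE_TABLE = {
    ("tcp", 80): "http_filter",
    ("udp", 53): "dns_cache",
}


def map_packet(header: Header) -> Optional[str]:
    """Return the offload destination for a packet.

    Assumption: packets outside the forwarding plane are not handled on the
    module, which is signalled here by returning None.
    """
    if not header.forwarding_plane:
        return None
    # Unmatched forwarding plane packets fall back to a default service.
    return SERVICE_TABLE.get((header.protocol, header.dst_port), "default_forward")
```

In this sketch the lookup never consults host state, which mirrors the abstract's point that classification and processing happen independently of the host processor.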
COMPUTATIONAL MEMORY
An example device includes a plurality of computational memory banks. Each computational memory bank of the plurality of computational memory banks includes an array of memory units and a plurality of processing elements connected to the array of memory units. The device further includes a plurality of single instruction, multiple data (SIMD) controllers. Each SIMD controller of the plurality of SIMD controllers is contained within at least one computational memory bank of the plurality of computational memory banks. Each SIMD controller is to provide instructions to the at least one computational memory bank.
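As a rough software analogy for the arrangement in this abstract, the sketch below models a SIMD controller that issues one instruction to every processing element in the banks it controls. The class names, the one-value-per-processing-element memory model, and the `broadcast` method are all hypothetical; the patent describes hardware, not this API.

```python
import operator


class Bank:
    """Hypothetical computational memory bank: one stored value per processing element."""

    def __init__(self, values):
        self.values = list(values)

    def execute(self, op, operand):
        # Every processing element applies the same operation to its own data.
        self.values = [op(v, operand) for v in self.values]


class SimdController:
    """Hypothetical controller that provides instructions to its bank(s)."""

    def __init__(self, banks):
        self.banks = list(banks)

    def broadcast(self, op, operand):
        # Single instruction, multiple data: one op, applied across all elements.
        for bank in self.banks:
            bank.execute(op, operand)


bank = Bank([1, 2, 3])
SimdController([bank]).broadcast(operator.add, 10)
```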
IOMMU-BASED DIRECT MEMORY ACCESS (DMA) TRACKING FOR ENABLING LIVE MIGRATION OF VIRTUAL MACHINES (VMS) USING PASSTHROUGH PHYSICAL DEVICES
Techniques for implementing IOMMU-based DMA tracking for enabling live migration of VMs that use passthrough physical devices are provided. In one set of embodiments, these techniques leverage an IOMMU feature known as dirty bit tracking, which is available in most, if not all, modern IOMMU implementations. Using this feature allows passthrough DMA to be tracked in a manner that is device-, vendor-, and driver-agnostic, resulting in a solution that is applicable to all passthrough physical devices.
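The role of dirty bit tracking in live migration can be illustrated with a toy simulation of one pre-copy round. This is only a sketch: `SimulatedIommu`, its methods, and `precopy_round` are hypothetical names, and a real hypervisor would read and clear dirty bits in hardware-defined IOMMU page table entries rather than in a Python list.

```python
class SimulatedIommu:
    """Toy model of per-page dirty bits set when a device writes via DMA."""

    def __init__(self, num_pages):
        self.dirty = [False] * num_pages

    def device_dma_write(self, page):
        # In hardware the IOMMU sets the dirty bit itself during translation,
        # so no device- or driver-specific cooperation is needed.
        self.dirty[page] = True

    def collect_and_clear_dirty(self):
        pages = [i for i, is_dirty in enumerate(self.dirty) if is_dirty]
        self.dirty = [False] * len(self.dirty)
        return pages


def precopy_round(iommu, source_memory, dest_memory):
    """Copy only the pages dirtied by DMA since the previous round."""
    pages = iommu.collect_and_clear_dirty()
    for page in pages:
        dest_memory[page] = source_memory[page]
    return pages
```

A migration loop would repeat such rounds until the set of dirty pages is small enough to copy during a brief pause of the VM.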
Transposing Memory Layout of Weights in Deep Neural Networks (DNNs)
A compute block includes a DMA engine that reads data from an external memory and writes the data into a local memory of the compute block. A MAC array in the compute block may use the data to perform convolutions. The external memory may store weights of one or more filters in a memory layout that comprises a sequence of sections for each filter. Each section may correspond to a channel of the filter and may store all the weights in the channel. The DMA engine may convert the memory layout to a different memory layout, which includes a sequence of new sections for each filter. Each new section may include a weight vector that includes a sequence of weights, each of which is from a different channel. The DMA engine may also compress the weights, e.g., by removing zero-valued weights, before the conversion of the memory layout.
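The layout conversion for a single filter can be sketched as a transpose over a flat list. The function name and the flat-list representation are assumptions made for illustration; the compression step mentioned in the abstract is left out of this sketch.

```python
def transpose_filter_layout(weights, num_channels):
    """Convert one filter from channel-major to position-major layout.

    Input:  [all weights of channel 0][all weights of channel 1]...
    Output: a sequence of weight vectors, each holding one weight per channel
            taken from the same position within every channel.
    """
    if num_channels <= 0 or len(weights) % num_channels != 0:
        raise ValueError("weights must divide evenly into channels")
    per_channel = len(weights) // num_channels
    return [
        weights[channel * per_channel + position]
        for position in range(per_channel)
        for channel in range(num_channels)
    ]
```

For example, two channels holding `[1, 2, 3]` and `[4, 5, 6]` become the weight vectors `[1, 4]`, `[2, 5]`, `[3, 6]`, stored back to back.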
Data output method, data acquisition method, device, and electronic apparatus
A data output method, a data acquisition method, a device, and an electronic apparatus are provided. A specific technical solution is: reading a first data sub-block, and splicing the first data sub-block into a continuous data stream, wherein the first data sub-block is a data sub-block in transferred data in a neural network; compressing the continuous data stream to acquire a second data sub-block; determining, according to a length of the first data sub-block and a length of the second data sub-block, whether there is a gain in compression of the continuous data stream; and outputting the second data sub-block if there is a gain in the compression of the continuous data stream.
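The gain check described here can be sketched as a length comparison before and after compression. The abstract does not name a compression algorithm, so `zlib` is a stand-in; the function name, the list-of-rows input, and the fallback of emitting the uncompressed stream when there is no gain are likewise assumptions.

```python
import zlib


def output_sub_block(rows):
    """Splice rows into one stream, compress it, and keep the result only on a gain.

    Returns (data, compressed_flag).
    """
    stream = b"".join(rows)              # splice into a continuous data stream
    compressed = zlib.compress(stream)   # stand-in for the unspecified compressor
    if len(compressed) < len(stream):    # gain: the compressed form is shorter
        return compressed, True
    # Assumption: with no gain, the original stream is output instead.
    return stream, False
```

A receiver would need the flag (or an equivalent length field) to know whether to decompress, which is presumably what the acquisition side of the method handles.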
Method and system for processing network packets
The packet processing system, according to an example embodiment, comprises a Network Interface Controller (NIC) to receive and transmit network packets; a memory unit for storing network packets; a processor for processing network packets stored in the memory unit; a cache unit through which the processor accesses data from the memory unit; and an application process running on the processor. The NIC includes a packet processing means to process the network packets received by the NIC. The packet processing means includes a Contiguous Header Mapping (CHM) header-data splitter to split said network packets into a header portion and a payload portion; a table or equivalent to store the contiguous header-data split configuration data; and a packet Direct Memory Access (DMA) unit to DMA copy said header portion and said payload portion into separate memory areas and contiguously map said header portion of the network packets in the memory unit.
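The header-data split can be illustrated in software, although the patent places it in NIC hardware. In this sketch the fixed `header_len` stands in for the split configuration table, and the function name and return shape are hypothetical.

```python
def split_packets(packets, header_len):
    """Split each packet and pack all headers back to back in one region.

    Returns (header_region, payloads): headers are contiguous so a processor
    scanning them touches fewer cache lines; payloads are stored separately.
    """
    header_region = bytearray()
    payloads = []
    for packet in packets:
        header_region += packet[:header_len]   # contiguous header area
        payloads.append(packet[header_len:])   # separate payload area
    return bytes(header_region), payloads
```

With this layout, header `i` starts at offset `i * header_len` in the header region, so an application that only inspects headers never has to load payload bytes.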