Patent classifications
G06F15/17331
METHOD FOR DATA SYNCHRONIZATION BETWEEN HOST SIDE AND FPGA ACCELERATOR
Disclosed are a method for data synchronization between a host side and a Field Programmable Gate Array (FPGA) accelerator, a Bidirectional Memory Synchronize Engine (DMSE), an FPGA accelerator, and a data synchronization system. The method includes: in response to detection of data migration from a host side to a preset memory space, generating second state information according to first state information in a first address space, and writing the second state information to a second address space (S201); and in response to detection of the second state information in the second address space, calling Direct Memory Access (DMA) to migrate data in the preset memory space to a memory space of an FPGA accelerator, and copying the second state information to the first address space, so as to implement synchronization (S202).
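The two steps S201 and S202 can be pictured as a small state handshake. The following is a minimal sketch under stated assumptions, not the patented engine: the state information is modeled as an integer counter, the second state is derived by incrementing the first, and a byte copy stands in for the DMA transfer. The class and method names are hypothetical.

```python
class SyncEngineSketch:
    """Toy model of the host/FPGA handshake; all names are illustrative."""

    def __init__(self):
        self.first_state = 0      # state held in the first address space
        self.second_state = 0     # state held in the second address space
        self.preset_memory = b""  # staging buffer written by the host side
        self.fpga_memory = b""    # memory space of the accelerator

    def host_write(self, data):
        """S201: host data arrives, so derive a new state and publish it."""
        self.preset_memory = bytes(data)
        # Assumption: the second state is the first state plus one.
        self.second_state = self.first_state + 1

    def poll(self):
        """S202: on seeing a new second state, migrate data and sync states."""
        if self.second_state == self.first_state:
            return False  # nothing new to migrate
        self.fpga_memory = bytes(self.preset_memory)  # stands in for DMA
        self.first_state = self.second_state          # copy state back
        return True
```

Once `poll()` has copied the state back, both address spaces hold the same value, which is how this sketch represents the synchronized condition.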
Zero copy host interface in a scalable input/output (I/O) virtualization (S-IOV) architecture
Examples may include a computing platform having a host driver to get a packet descriptor of a received packet stored in a receive queue and to modify the packet descriptor from a first format to a second format. The computing platform also includes a guest virtual machine including a guest driver coupled to the host driver, the guest driver to receive the modified packet descriptor and to read a packet buffer stored in the receive queue using the modified packet descriptor, the packet buffer corresponding to the packet descriptor.
Method, apparatus, and data processing system including controller to manage storage nodes and host operations
A data processing system and method, and a corresponding apparatus, where the data processing system includes a controller and at least two storage nodes. The controller is configured to receive, using a first coupling between the controller and a host, an operation request received from the host, where the operation request includes an identity of target data and an operation type, determine at least one target storage node from the at least two storage nodes according to the identity of the target data, and send an instruction message to the at least one target storage node using a second coupling to the at least one target storage node, where the at least one target storage node is configured to send the target data to the host or obtain the target data from the host according to the instruction message.
Parameter server and method for sharing distributed deep learning parameter using the same
Disclosed herein are a parameter server and a method for sharing distributed deep-learning parameters using the parameter server. The method for sharing distributed deep-learning parameters using the parameter server includes initializing a global weight parameter in response to an initialization request by a master process; performing an update by receiving a learned local gradient parameter from the worker process, which performs deep-learning training after updating a local weight parameter using the global weight parameter; accumulating the gradient parameters in response to a request by the master process; and performing an update by receiving the global weight parameter from the master process that calculates the global weight parameter using the accumulated gradient parameters of the one or more worker processes.
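The exchange between the master process, the worker processes, and the parameter server can be sketched as follows. This is a simplified illustration, assuming weights and gradients are plain Python lists and that the master applies an averaged gradient-descent step; the abstract does not specify the update rule, and every name below is hypothetical.

```python
class ParameterServerSketch:
    """Holds the global weight and the latest gradient from each worker."""

    def __init__(self):
        self.global_weight = None
        self.local_gradients = {}
        self.accumulated = None

    def initialize(self, weight):
        """Master request: set the initial global weight."""
        self.global_weight = list(weight)

    def update_gradient(self, worker_id, gradient):
        """Worker push: store the gradient learned in this training step."""
        self.local_gradients[worker_id] = list(gradient)

    def accumulate(self):
        """Master request: sum the stored gradients element-wise."""
        self.accumulated = [sum(col) for col in zip(*self.local_gradients.values())]
        return self.accumulated

    def update_global_weight(self, weight):
        """Master push: store the newly calculated global weight."""
        self.global_weight = list(weight)


def master_step(server, learning_rate):
    """Assumed update rule: subtract the averaged gradient times the rate."""
    total = server.accumulate()
    n = len(server.local_gradients)
    new_weight = [w - learning_rate * g / n
                  for w, g in zip(server.global_weight, total)]
    server.update_global_weight(new_weight)
    return new_weight
```

A worker in this sketch would read `server.global_weight` into its local weight before training and then call `update_gradient` with its result.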
DATA TRAFFIC PRIORITIZATION BASED ON CONTENT
Described are techniques including a computer-implemented method that comprises defining a respective priority classification for each of a plurality of sockets used for communicating between an initiator computational system and a target computational system. The method further comprises automatically assigning a respective priority classification to each of a plurality of Input/Output (IO) requests based on a type of data associated with each IO request. The method further comprises sending the plurality of IO requests to respective sockets of the plurality of sockets with a matching priority classification.
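A rough sketch of content-based prioritization might look like the following. The mapping from data type to priority and the request format are assumptions made for illustration, and plain lists stand in for the sockets between initiator and target.

```python
# Assumed mapping from data type to priority class (not from the source).
PRIORITY_BY_DATA_TYPE = {
    "metadata": "high",
    "journal": "high",
    "user_data": "medium",
    "backup": "low",
}


def classify(io_request):
    """Assign a priority class from the type of data the request carries."""
    return PRIORITY_BY_DATA_TYPE.get(io_request["data_type"], "low")


def dispatch(io_requests, sockets_by_priority):
    """Send each request to the socket whose priority class matches."""
    for request in io_requests:
        sockets_by_priority[classify(request)].append(request)
```

Unknown data types fall back to the lowest class here; that default is a design choice of the sketch, not something the abstract states.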
Devices, Methods, and System for Reducing Latency in Remote Direct Memory Access System
A sending device is configured to generate a first message that includes a first indication of a first operation type; transmit the first message to a receiving device over the communications interface; generate a second message that includes a second indication of a second operation type; determine whether the second operation type is associated with the first operation type; determine, in response to determining that the second operation type is associated with the first operation type, that the local pacing timer has exceeded a timer duration since transmitting the first message; and transmit, in response to determining that the local pacing timer has exceeded the timer duration since transmitting the first message, the second message to the receiving device over the communications interface.
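The pacing rule can be illustrated with a small sender that delays a message only when its operation type is associated with the previous one. This is a sketch under assumptions: association is modeled as a set of (previous, next) operation-type pairs, and the clock and sleep functions are injectable so the behavior can be checked without real delays. None of these names come from the source.

```python
import time


class PacedSenderSketch:
    """Delays a message when it is associated with the one sent before it."""

    def __init__(self, transmit, timer_duration_s, associated_pairs,
                 clock=time.monotonic, sleep=time.sleep):
        self.transmit = transmit
        self.timer_duration_s = timer_duration_s
        self.associated_pairs = set(associated_pairs)
        self.clock = clock
        self.sleep = sleep
        self.last_op = None
        self.last_sent_at = None

    def send(self, op_type, payload):
        # Pace only when this operation is associated with the previous one.
        if (self.last_op, op_type) in self.associated_pairs:
            elapsed = self.clock() - self.last_sent_at
            if elapsed < self.timer_duration_s:
                self.sleep(self.timer_duration_s - elapsed)
        self.transmit(op_type, payload)
        self.last_op = op_type
        self.last_sent_at = self.clock()
```

Unassociated operation types are transmitted immediately, so the timer adds delay only to the pairs that need it.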
Distributed data store with persistent memory
A method to build a persistent memory (PM)-based data storage system without involving a processor (CPU) at storage nodes is disclosed. The method includes: storing data in one or more storage nodes that include only PM and no CPUs, with the data stored in PM in the form of linked lists; accessing the data stored in the one or more storage nodes' PM directly from remote compute nodes through a network; maintaining metadata associated with the data by one or more global controllers (metadata servers); upon a request by a user to read or write data, the compute nodes contacting the one or more metadata servers to obtain the location of the data of interest in the form of pointers (shortcuts); and the compute nodes sending network requests directly to the one or more storage nodes' PM to locate the latest version of the data by tracing the linked list from the associated shortcut to the corresponding tail.
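Tracing from a shortcut to the tail can be sketched as a simple pointer walk. The example below assumes persistent memory is modeled as a dictionary keyed by address, with each record holding a value and the address of the next version; in the described system each hop would be a network request to a storage node, which is not modeled here. The record layout and function name are hypothetical.

```python
def read_latest(pm, shortcut):
    """Follow 'next' pointers from the shortcut until the tail is reached.

    Returns the newest value and the number of hops taken to find it.
    """
    addr = shortcut
    hops = 0
    while pm[addr]["next"] is not None:
        addr = pm[addr]["next"]
        hops += 1
    return pm[addr]["value"], hops


# Three versions of one item, oldest first (addresses are made up).
pm = {
    0x10: {"value": "v1", "next": 0x20},
    0x20: {"value": "v2", "next": 0x30},
    0x30: {"value": "v3", "next": None},
}
```

A fresher shortcut shortens the walk: starting at `0x20` reaches the same tail in one hop instead of two.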
NVMEoF Flow Control from Initiator Based on Transaction Latency
A storage array that uses NVMEoF to interconnect compute nodes with NVME SSDs via a fabric and NVME offload engines implements flow control based on transaction latency. Transaction latency is the elapsed time between the send side completion message and receive side completion message for a single transaction. Counts of total transactions and over-latency-limit transactions are accumulated over a time interval. If the over limit rate exceeds a threshold, then the maximum allowed number of enqueued pending transactions is reduced. The maximum allowed number of enqueued pending transactions is periodically restored to a default value.
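The interval-based control loop described above can be sketched in a few lines. All numeric defaults below (queue depth, latency limit, threshold, reduction factor) are assumptions chosen for illustration, as are the class and method names; the abstract does not state how far the limit is reduced.

```python
from dataclasses import dataclass


@dataclass
class LatencyFlowControlSketch:
    default_max_pending: int = 64        # assumed default queue depth
    latency_limit_s: float = 0.005       # assumed per-transaction limit
    over_limit_threshold: float = 0.10   # assumed tolerated over-limit rate
    reduction_factor: float = 0.5        # assumed size of each reduction
    min_pending: int = 1

    def __post_init__(self):
        self.max_pending = self.default_max_pending
        self.total = 0
        self.over_limit = 0

    def record(self, send_done_s, recv_done_s):
        """Count one transaction; latency is the gap between completions."""
        self.total += 1
        if recv_done_s - send_done_s > self.latency_limit_s:
            self.over_limit += 1

    def end_interval(self):
        """Close the interval and shrink the limit if too many were slow."""
        if self.total and self.over_limit / self.total > self.over_limit_threshold:
            self.max_pending = max(self.min_pending,
                                   int(self.max_pending * self.reduction_factor))
        self.total = 0
        self.over_limit = 0
        return self.max_pending

    def restore(self):
        """Periodic reset of the limit to its default value."""
        self.max_pending = self.default_max_pending
```

An initiator would call `record` on each completed transaction, `end_interval` on a short timer, and `restore` on a longer one.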
DATA PROCESSING ENGINE ARRANGEMENT IN A DEVICE
A device includes a data processing engine array having a plurality of data processing engines organized in a grid having a plurality of rows and a plurality of columns. Each data processing engine includes a core, a memory module including a memory and a direct memory access engine. Each data processing engine includes a stream switch connected to the core, the direct memory access engine, and the stream switch of one or more adjacent data processing engines. Each memory module includes a first memory interface directly coupled to the core in the same data processing engine and one or more second memory interfaces directly coupled to the core of each of the one or more adjacent data processing engines.
EFFICIENT USAGE OF ONE-SIDED RDMA FOR LINEAR PROBING
Systems and methods for reducing latency of probing operations of remotely located linear hash tables are described herein. In an embodiment, a system receives a request to perform a probing operation on a remotely located linear hash table based on a key value. Prior to performing the probing operation, the system dynamically predicts a number of slots for a single read of the linear hash table to minimize total cost for an average probing operation. The system determines a hash value based on the key value and determines a slot of the linear hash table to which the hash value corresponds. After predicting the number of slots, the system issues an RDMA request to perform a read of the predicted number of slots from the linear hash table starting at the slot to which the hash value corresponds.
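The benefit of reading several slots per request can be shown with a local stand-in for the remote table. This sketch assumes the number of slots per read is supplied by the caller rather than predicted dynamically, and each call to `read` models one RDMA read; the class, its fields, and the `hash_fn` parameter are hypothetical.

```python
EMPTY = None


class RemoteTableSketch:
    """Stand-in for a remote linear hash table; read() counts round trips."""

    def __init__(self, num_slots):
        self.slots = [EMPTY] * num_slots
        self.reads = 0

    def insert(self, key, value, hash_fn=hash):
        n = len(self.slots)
        i = hash_fn(key) % n
        for _ in range(n):
            if self.slots[i] is EMPTY or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % n
        raise RuntimeError("table full")

    def read(self, start, count):
        """Return `count` consecutive slots, wrapping at the end of the table."""
        self.reads += 1
        n = len(self.slots)
        return [self.slots[(start + j) % n] for j in range(count)]


def probe(table, key, slots_per_read, hash_fn=hash):
    """Look up `key`, fetching `slots_per_read` slots in each read."""
    n = len(table.slots)
    start = hash_fn(key) % n
    scanned = 0
    while scanned < n:
        count = min(slots_per_read, n - scanned)
        for slot in table.read((start + scanned) % n, count):
            if slot is EMPTY:
                return None       # an empty slot ends the probe sequence
            if slot[0] == key:
                return slot[1]
        scanned += count
    return None
```

Larger reads save round trips when collisions are common but transfer more unused slots when they are rare, which is the trade-off the prediction step is meant to balance.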