H04L49/104

RDMA Data Transmission System, RDMA Data Transmission Method, and Network Device
20240275740 · 2024-08-15 ·

A remote direct memory access (RDMA) data transmission system includes a first network device in a first host and a second network device in a second host. The first network device may create a shared send queue (SSQ) used by a plurality of processes run by the first host, obtain an RDMA data transmission message of a first process from the SSQ, and encapsulate a first identifier corresponding to the first process into a first packet in which the RDMA data transmission message is encapsulated. The second network device is configured to encapsulate the first identifier into a second packet in which a feedback message is encapsulated.

TECHNOLOGIES FOR OFFLOADING I/O INTENSIVE OPERATIONS TO A DATA STORAGE SLED

Technologies for offloading I/O intensive workload phases to a data storage sled include a compute sled. The compute sled is to execute a workload that includes multiple phases. Each phase is indicative of a different resource utilization over a time period. Additionally, the compute sled is to identify an I/O intensive phase of the workload, wherein the amount of data to be communicated through a network path between the compute sled and the data storage sled to execute the I/O intensive phase satisfies a predefined threshold. The compute sled is also to migrate the workload to the data storage sled to execute the I/O intensive phase locally on the data storage sled. Other embodiments as also described and claimed.

TECHNOLOGIES FOR DATA DEDUPLICATION IN DISAGGREGATED ARCHITECTURES

Technologies for providing data deduplication in a disaggregated architecture include a network device. The network device is to receive, from a compute sled, a request to write a data block to one or more data storage sleds and determine, for each of one or more data sub-blocks within the data block and from deduplication data indicative of physical addresses of data sub-blocks, whether each data sub-block is already stored in a data storage device of a data storage sled. Additionally, the network device is to write, in the deduplication data and in response to a determination that a data sub-block is already stored in a data storage device, a pointer to a physical address of the already-stored data sub-block in association with a logical address of the data sub-block.

TECHNOLOGIES FOR LIFECYCLE MANAGEMENT WITH REMOTE FIRMWARE
20180150293 · 2018-05-31 ·

Technologies for lifecycle management include multiple computing devices in communication with a lifecycle management server. On boot, a computing device loads a lightweight firmware boot environment. The lightweight firmware boot environment connects to the lifecycle management server and downloads one or more firmware images for controllers of the computing device. The controllers may include baseboard management controllers, network interface controllers, solid-state drive controllers, or other controllers. The lifecycle management server may select firmware images and/or versions of firmware images based on the controllers or the computing device. The computing device installs each firmware image to a controller memory device coupled to a controller, and in use, each controller accesses the firmware image in the controller memory device. The controller memory device may be a DRAM device or a high-performance byte-addressable non-volatile memory. Other embodiments are described and claimed.

TECHNOLOGIES FOR OFFLOADING ACCELERATION TASK SCHEDULING OPERATIONS TO ACCELERATOR SLEDS

Technologies for offloading acceleration task scheduling operations to accelerator sleds include a compute device to receive a request from a compute sled to accelerate the execution of a job, which includes a set of tasks. The compute device is also to analyze the request to generate metadata indicative of the tasks within the job, a type of acceleration associated with each task, and a data dependency between the tasks. Additionally the compute device is to send an availability request, including the metadata, to one or more micro-orchestrators of one or more accelerator sleds communicatively coupled to the compute device. The compute device is further to receive availability data from the one or more micro-orchestrators, indicative of which of the tasks the micro-orchestrator has accepted for acceleration on the associated accelerator sled. Additionally, the compute device is to assign the tasks to the one or more micro-orchestrators as a function of the availability data.

TECHNOLOGIES FOR DIVIDING WORK ACROSS ACCELERATOR DEVICES

Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.

TECHNOLOGIES FOR COORDINATING DISAGGREGATED ACCELERATOR DEVICE RESOURCES

A compute device to manage workflow to disaggregated computing resources is provided. The compute device comprises a compute engine receive a workload processing request, the workload processing request defined by at least one request parameter, determine at least one accelerator device capable of processing a workload in accordance with the at least one request parameter, transmit a workload to the at least one accelerator device, receive a work product produced by the at least one accelerator device from the workload, and provide the work product to an application.

TECHNOLOGIES FOR PROVIDING ACCELERATED FUNCTIONS AS A SERVICE IN A DISAGGREGATED ARCHITECTURE

Technologies for providing accelerated functions as a service in a disaggregated architecture include a compute device that is to receive a request for an accelerated task. The task is associated with a kernel usable by an accelerator sled communicatively coupled to the compute device to execute the task. The compute device is further to determine, in response to the request and with a database indicative of kernels and associated accelerator sleds, an accelerator sled that includes an accelerator device configured with the kernel associated with the request. Additionally, the compute device is to assign the task to the determined accelerator sled for execution. Other embodiments are also described and claimed

TECHNOLOGIES FOR DYNAMICALLY MANAGING THE RELIABILITY OF DISAGGREGATED RESOURCES IN A MANAGED NODE

Technologies for dynamically managing the reliability of disaggregated resources in a managed node include a resource manager server. The resource manager server includes communication circuit to receive resource data from a set of disaggregated resources that indicates reliability of each disaggregated resource of the set of disaggregated resources and a node request to compose a managed node. The resource manager server further includes a compute engine to determine node parameters from the node request indicative of a target reliability of one or more disaggregated resources of the set of disaggregated resources to be included in the managed node, compose a managed node from the set of disaggregated resources that satisfies the node parameters by configuring the compute sled to utilize the disaggregated resources of the managed node for the execution of a workload, and monitor the disaggregated resources of the managed node for a failure.

TECHNOLOGIES FOR PROVIDING MANIFEST-BASED ASSET REPRESENTATION

Technologies for generating manifest data for a sled include a sled to generate manifest data indicative of one or more characteristics of the sled (e.g., hardware resources, firmware resources, a configuration of the sled, or a health of sled components). The sled is also to associate an identifier with the manifest data. The identifier uniquely identifies the sled from other sleds. Additionally, the sled is to send the manifest data and the associated identifier to a server. The sled may also detect a change in the hardware resources, firmware resources, the configuration, or component health of the sled. The sled may also generate an update of the manifest data based on the detected change, where the update specifies the detected change in the hardware resources, firmware resources, the configuration, or component health of the sled. The sled may also send the update of the manifest data to the server.