Patent classifications
H03K19/1731
Technologies for monitoring node cluster health
Technologies for monitoring node cluster health include a plurality of managed nodes of a node cluster communicatively coupled across a data network to a resource manager server. The resource manager server is configured to receive health data, via an out-of-band network, from each of the managed nodes of the node cluster. The resource manager server is further configured to identify whether a managed node of the plurality of managed nodes has indicated a failure, determine a cause of the failure, and classify the failure as being one of a soft failure or a hard failure as a function of the received health data and the cause of the failure. Additionally, the resource manager server is configured to transmit a health state change event to each of the other managed nodes of the plurality of managed nodes of the node cluster. Other embodiments are described herein.
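The classification and notification steps described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which causes count as hard failures or what the health data contains, so the `HARD_CAUSES` set, the `responsive` field, and every name below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class FailureClass(Enum):
    SOFT = "soft"
    HARD = "hard"


# Hypothetical causes treated as unrecoverable; the abstract lists none.
HARD_CAUSES = {"power_loss", "hardware_fault"}


@dataclass
class HealthReport:
    node_id: str
    failed: bool
    cause: str = ""          # hypothetical cause label
    responsive: bool = True  # hypothetical health datum from the out-of-band network


def classify_failure(report: HealthReport) -> FailureClass:
    """Classify a failure as a function of the health data and the cause."""
    if report.cause in HARD_CAUSES or not report.responsive:
        return FailureClass.HARD
    return FailureClass.SOFT


def health_state_events(reports):
    """Return one (recipient, failed_node, classification) tuple per
    failure for each of the other managed nodes in the cluster."""
    node_ids = [r.node_id for r in reports]
    events = []
    for report in reports:
        if not report.failed:
            continue
        failure_class = classify_failure(report)
        for other in node_ids:
            if other != report.node_id:
                events.append((other, report.node_id, failure_class))
    return events
```

In this sketch a failed node never receives its own event, which mirrors the abstract's "each of the other managed nodes".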
Technologies for dividing work across accelerator devices
Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
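One way to picture the division step is the following Python sketch, which splits a job's work items into contiguous ranges sized by each accelerator's relative throughput and runs the ranges in parallel. The throughput figure stands in for the "configuration" and the item count for the "job analysis"; both are assumptions, as are all function names, and threads stand in for real accelerator devices.

```python
from concurrent.futures import ThreadPoolExecutor


def divide_job(num_items, accelerators):
    """Split num_items into contiguous ranges, one per accelerator.

    accelerators maps a (hypothetical) device name to a positive
    relative throughput; faster devices receive larger ranges.
    """
    total = sum(accelerators.values())
    names = list(accelerators)
    tasks = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = num_items  # last device takes the remainder
        else:
            end = start + int(num_items * accelerators[name] / total)
        tasks[name] = range(start, end)
        start = end
    return tasks


def run_job(data, accelerators, kernel):
    """Schedule each range on a worker thread and join the outputs in order.

    Assumes at least one accelerator is configured.
    """
    tasks = divide_job(len(data), accelerators)
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [
            pool.submit(lambda r=r: [kernel(data[i]) for i in r])
            for r in tasks.values()
        ]
        output = []
        for future in futures:
            output.extend(future.result())
    return output
```

Collecting the futures in submission order keeps the job output in the same order as the input, regardless of which task finishes first.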
Cloud-based scale-up system composition
Technologies for composing a managed node with multiple processors on multiple compute sleds to cooperatively execute a workload include a memory, one or more processors connected to the memory, and an accelerator. The accelerator further includes a coherence logic unit that is configured to receive a node configuration request to execute a workload. The node configuration request identifies the compute sled and a second compute sled to be included in a managed node. The coherence logic unit is further configured to modify a portion of local working data associated with the workload on the compute sled in the memory with the one or more processors of the compute sled, determine coherence data indicative of the modification made by the one or more processors of the compute sled to the local working data in the memory, and send the coherence data to the second compute sled of the managed node.
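The coherence flow in this abstract (modify local working data, derive coherence data describing the change, send it to the second compute sled) can be sketched in Python as below. The dictionary-based memory, the shape of the coherence record, and all names are hypothetical; the abstract describes a hardware coherence logic unit, not software.

```python
class ComputeSled:
    """Toy model of a compute sled with local working data."""

    def __init__(self, name):
        self.name = name
        self.memory = {}

    def modify(self, key, value):
        """Modify local working data and return coherence data
        indicative of the modification."""
        self.memory[key] = value
        return {"origin": self.name, "key": key, "value": value}

    def apply_coherence(self, coherence):
        """Apply coherence data received from another sled."""
        self.memory[coherence["key"]] = coherence["value"]


def write_and_propagate(local, remote, key, value):
    """Perform a local write, then send the coherence data to the
    second compute sled of the managed node."""
    coherence = local.modify(key, value)
    remote.apply_coherence(coherence)
    return coherence
```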
Technologies for providing accelerated functions as a service in a disaggregated architecture
Technologies for providing accelerated functions as a service in a disaggregated architecture include a compute device that is to receive a request for an accelerated task. The task is associated with a kernel usable by an accelerator sled communicatively coupled to the compute device to execute the task. The compute device is further to determine, in response to the request and with a database indicative of kernels and associated accelerator sleds, an accelerator sled that includes an accelerator device configured with the kernel associated with the request. Additionally, the compute device is to assign the task to the determined accelerator sled for execution. Other embodiments are also described and claimed.
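The lookup described here, from a requested kernel to an accelerator sled already configured with it, can be sketched in Python as follows. The database layout, the kernel and sled names, and the policy of picking the first matching sled are all assumptions made for illustration.

```python
# Hypothetical database: kernel name -> sleds with a device configured for it.
KERNEL_DB = {
    "aes256": ["sled-a", "sled-c"],
    "fft": ["sled-b"],
}


def find_accelerator_sled(kernel, db):
    """Return a sled that has the kernel configured, or None."""
    sleds = db.get(kernel)
    if not sleds:
        return None
    return sleds[0]  # assumed policy: first match


def assign_task(request, db, assignments):
    """Assign the requested task to the determined sled.

    request is a dict with hypothetical 'task' and 'kernel' keys;
    assignments maps sled name -> list of assigned task names.
    """
    sled = find_accelerator_sled(request["kernel"], db)
    if sled is None:
        raise LookupError(f"no sled configured with kernel {request['kernel']!r}")
    assignments.setdefault(sled, []).append(request["task"])
    return sled
```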
Technologies for coordinating disaggregated accelerator device resources
A compute device to manage workflow to disaggregated computing resources is provided. The compute device comprises a compute engine to receive a workload processing request, the workload processing request defined by at least one request parameter, determine at least one accelerator device capable of processing a workload in accordance with the at least one request parameter, transmit a workload to the at least one accelerator device, receive a work product produced by the at least one accelerator device from the workload, and provide the work product to an application.
Technologies for offloading acceleration task scheduling operations to accelerator sleds
Technologies for offloading acceleration task scheduling operations to accelerator sleds include a compute device to receive a request from a compute sled to accelerate the execution of a job, which includes a set of tasks. The compute device is also to analyze the request to generate metadata indicative of the tasks within the job, a type of acceleration associated with each task, and a data dependency between the tasks. Additionally, the compute device is to send an availability request, including the metadata, to one or more micro-orchestrators of one or more accelerator sleds communicatively coupled to the compute device. The compute device is further to receive availability data from the one or more micro-orchestrators, indicative of which of the tasks the micro-orchestrator has accepted for acceleration on the associated accelerator sled. Additionally, the compute device is to assign the tasks to the one or more micro-orchestrators as a function of the availability data.
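The request, availability, and assignment exchange in this abstract can be sketched in Python as below. The metadata fields, the acceptance rule (supported acceleration type, up to a fixed capacity), and the first-acceptor assignment policy are assumptions; the abstract only states that assignment is a function of the availability data.

```python
def generate_metadata(job):
    """Describe each task: its name, acceleration type, and data dependencies."""
    return [
        {
            "task": t["name"],
            "accel_type": t["accel_type"],
            "depends_on": list(t.get("depends_on", [])),
        }
        for t in job["tasks"]
    ]


class MicroOrchestrator:
    """Toy micro-orchestrator for one accelerator sled (hypothetical)."""

    def __init__(self, sled_id, supported_types, capacity):
        self.sled_id = sled_id
        self.supported_types = set(supported_types)
        self.capacity = capacity

    def availability(self, metadata):
        """Return the names of the tasks this sled accepts."""
        accepted = [m["task"] for m in metadata
                    if m["accel_type"] in self.supported_types]
        return accepted[: self.capacity]


def assign_tasks(metadata, orchestrators):
    """Assign each task to the first micro-orchestrator that accepted it."""
    availability = {o.sled_id: o.availability(metadata) for o in orchestrators}
    assignment = {}
    for m in metadata:
        for sled_id, accepted in availability.items():
            if m["task"] in accepted:
                assignment[m["task"]] = sled_id
                break
    return assignment
```

A task that no micro-orchestrator accepts is simply left out of the returned assignment in this sketch; a real scheduler would need a fallback.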
Technologies for lifecycle management with remote firmware
Technologies for lifecycle management include multiple computing devices in communication with a lifecycle management server. On boot, a computing device loads a lightweight firmware boot environment. The lightweight firmware boot environment connects to the lifecycle management server and downloads one or more firmware images for controllers of the computing device. The controllers may include baseboard management controllers, network interface controllers, solid-state drive controllers, or other controllers. The lifecycle management server may select firmware images and/or versions of firmware images based on the controllers or the computing device. The computing device installs each firmware image to a controller memory device coupled to a controller, and in use, each controller accesses the firmware image in the controller memory device. The controller memory device may be a DRAM device or a high-performance byte-addressable non-volatile memory. Other embodiments are described and claimed.
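The boot-time flow described here (select an image per controller based on the controller or the computing device, then install it to that controller's memory device) can be sketched in Python as follows. The catalog layout, the image file names, and the fallback from a model-specific image to a generic one are hypothetical; no download or network step is modeled.

```python
# Hypothetical server-side catalog keyed by (controller type, device model).
# A model of None marks the generic image for that controller type.
FIRMWARE_CATALOG = {
    ("bmc", "model-x"): "bmc-2.1.img",
    ("bmc", None): "bmc-2.0.img",
    ("nic", None): "nic-1.4.img",
}


def select_image(controller_type, device_model, catalog):
    """Prefer a model-specific image, then fall back to the generic one."""
    specific = catalog.get((controller_type, device_model))
    if specific is not None:
        return specific
    return catalog.get((controller_type, None))


def boot(device_model, controllers, catalog):
    """Return the contents of each controller memory device after boot,
    as a mapping of controller type -> installed image."""
    controller_memory = {}
    for controller in controllers:
        image = select_image(controller, device_model, catalog)
        if image is not None:
            controller_memory[controller] = image
    return controller_memory
```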
Technologies for providing shared memory for accelerator sleds
Technologies for providing shared memory for accelerator sleds include an accelerator sled to receive, with a memory controller, a memory access request from an accelerator device to access a region of memory. The request is to identify the region of memory with a logical address. Additionally, the accelerator sled is to determine, from a map of logical addresses and associated physical addresses, the physical address associated with the region of memory. In addition, the accelerator sled is to route the memory access request to a memory device associated with the determined physical address.
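The logical-to-physical translation and routing step can be illustrated with a small Python sketch. It assumes the map is a list of non-overlapping regions, each with a logical base, a size, a target memory device, and a physical base; that layout and all names are hypothetical, not taken from the abstract.

```python
from bisect import bisect_right


class MemoryController:
    """Toy memory controller that maps logical addresses to devices."""

    def __init__(self, regions):
        # Each region: (logical_base, size, device, physical_base).
        # Regions are assumed not to overlap.
        self.regions = sorted(regions)
        self.bases = [region[0] for region in self.regions]

    def translate(self, logical):
        """Return (device, physical_address) for a logical address."""
        index = bisect_right(self.bases, logical) - 1
        if index < 0:
            raise KeyError(f"unmapped logical address {logical:#x}")
        base, size, device, physical_base = self.regions[index]
        if logical >= base + size:
            raise KeyError(f"unmapped logical address {logical:#x}")
        return device, physical_base + (logical - base)
```

Keeping the region bases sorted lets the lookup use binary search, so the cost of routing a request grows logarithmically with the number of mapped regions.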
Technologies for dynamically managing the reliability of disaggregated resources in a managed node
Technologies for dynamically managing the reliability of disaggregated resources in a managed node include a resource manager server. The resource manager server includes a communication circuit to receive resource data from a set of disaggregated resources that indicates reliability of each disaggregated resource of the set of disaggregated resources and a node request to compose a managed node. The resource manager server further includes a compute engine to determine node parameters from the node request indicative of a target reliability of one or more disaggregated resources of the set of disaggregated resources to be included in the managed node, compose a managed node from the set of disaggregated resources that satisfies the node parameters by configuring the compute sled to utilize the disaggregated resources of the managed node for the execution of a workload, and monitor the disaggregated resources of the managed node for a failure.
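Composing a managed node against target reliabilities can be sketched in Python as below. The sketch assumes reliability is reported as a single number per resource and that node parameters give one target per resource type; the selection policy (the least reliable resource that still meets the target, so better ones stay free) and all names are hypothetical.

```python
def compose_node(resources, node_params, failed=()):
    """Pick one resource per requested type that meets its target reliability.

    resources: list of dicts with 'id', 'type', and 'reliability' keys.
    node_params: dict mapping resource type -> target reliability.
    failed: ids of resources to exclude, e.g. after a monitored failure.
    """
    selected = {}
    for rtype, target in node_params.items():
        candidates = [
            r for r in resources
            if r["type"] == rtype
            and r["reliability"] >= target
            and r["id"] not in failed
        ]
        if not candidates:
            raise LookupError(f"no {rtype} resource meets reliability {target}")
        # Assumed policy: take the least reliable resource that qualifies.
        selected[rtype] = min(candidates, key=lambda r: r["reliability"])["id"]
    return selected
```

Calling the function again with the failed resource's id in `failed` gives a simple model of recomposing the node after monitoring detects a failure.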
Data connector with movable cover
A data connector to interface with a sled of a data center includes a main body, a plurality of guide shafts, and a cover. The main body includes electrical contacts. The guide shafts are associated with the main body, and each guide shaft extends along a corresponding longitudinal axis. The cover is coupled to the guide shafts such that the cover is slidable along the guide shafts in a direction defined by the longitudinal axes. The cover includes a movable door to provide protection to the electrical contacts of the main body when not in use.