Patent classifications
H04L49/104
Technologies for providing shared memory for accelerator sleds
Technologies for providing shared memory for accelerator sleds includes an accelerator sled to receive, with a memory controller, a memory access request from an accelerator device to access a region of memory. The request is to identify the region of memory with a logical address. Additionally, the accelerator sled is to determine from a map of logical addresses and associated physical address, the physical address associated with the region of memory. In addition, the accelerator sled is to route the memory access request to a memory device associated with the determined physical address.
Technologies for providing shared memory for accelerator sleds
Technologies for providing shared memory for accelerator sleds includes an accelerator sled to receive, with a memory controller, a memory access request from an accelerator device to access a region of memory. The request is to identify the region of memory with a logical address. Additionally, the accelerator sled is to determine from a map of logical addresses and associated physical address, the physical address associated with the region of memory. In addition, the accelerator sled is to route the memory access request to a memory device associated with the determined physical address.
Queue protection using a shared global memory reserve
The subject technology relates to the management of a shared buffer memory in a network switch. Systems, methods, and machine readable media are provided for receiving a data packet at a first network queue from among a plurality of network queues, determining if a fill level of a queue in a shared buffer of the network switch exceeds a dynamic queue threshold, and in an event that the fill level of the shared buffer exceeds the dynamic queue threshold, determining if a fill level of the first network queue is less than a static queue minimum threshold.
PROVIDING FRAMES AT A NETWORK PORT
Examples disclosed herein relate generally to providing frames at a network port. Some examples relate to an apparatus. The apparatus may include a network port to directly couple the apparatus to a switch. The apparatus may also include a transmission logic to provide a frame at the network port at a time slot of a switch-associated cycle time. The transmission logic may provide the frame at the time slot at least partially responsive to an offset value and a port number. Related devices, systems and methods are also disclosed.
PROVIDING FRAMES AT A NETWORK PORT
Examples disclosed herein relate generally to providing frames at a network port. Some examples relate to an apparatus. The apparatus may include a network port to directly couple the apparatus to a switch. The apparatus may also include a transmission logic to provide a frame at the network port at a time slot of a switch-associated cycle time. The transmission logic may provide the frame at the time slot at least partially responsive to an offset value and a port number. Related devices, systems and methods are also disclosed.
FPGA-efficient directional two-dimensional router
A configurable directional 2D router for Networks on Chips (NOCs) is disclosed. The router, which may be bufferless, is designed for implementation in programmable logic in FPGAs, and achieves theoretical lower bounds on FPGA resource consumption for various applications. The router employs an FPGA router switch design that consumes only one 6-LUT or 8-input ALM logic cell per router per bit of router link width. A NOC comprising a plurality of routers may be configured as a directional 2D torus, or in diverse ways, network sizes and topologies, data widths, routing functions, performance-energy tradeoffs, and other options. The router and NOC enable feasible FPGA implementation of large integrated systems on chips, interconnecting hundreds of client cores over high bandwidth links, including compute and accelerator cores, industry standard IP cores, DRAM/HBM/HMC channels, PCI Express channels, and 10G/25G/40G/100G/400G networks.
CLOUD-BASED SCALE-UP SYSTEM COMPOSITION
Technologies for composing a managed node with multiple processors on multiple compute sleds to cooperatively execute a workload include a memory, one or more processors connected to the memory, and an accelerator. The accelerator further includes a coherence logic unit that is configured to receive a node configuration request to execute a workload. The node configuration request identifies the compute sled and a second compute sled to be included in a managed node. The coherence logic unit is further configured to modify a portion of local working data associated with the workload on the compute sled in the memory with the one or more processors of the compute sled, determine coherence data indicative of the modification made by the one or more processors of the compute sled to the local working data in the memory, and send the coherence data to the second compute sled of the managed node.
Technologies for dividing work across accelerator devices
Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
Technologies for dividing work across accelerator devices
Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
Technolgies for millimeter wave rack interconnects
Racks and rack pods to support a plurality of sleds are disclosed herein. Switches for use in the rack pods are also disclosed herein. A rack comprises a plurality of sleds and a plurality of electromagnetic waveguides. The plurality of sleds are vertically spaced from one another. The plurality of electromagnetic waveguides communicate data signals between the plurality of sleds.