H03K19/1731

Technologies for providing shared memory for accelerator sleds

Technologies for providing shared memory for accelerator sleds includes an accelerator sled to receive, with a memory controller, a memory access request from an accelerator device to access a region of memory. The request is to identify the region of memory with a logical address. Additionally, the accelerator sled is to determine from a map of logical addresses and associated physical address, the physical address associated with the region of memory. In addition, the accelerator sled is to route the memory access request to a memory device associated with the determined physical address.

Cloud-based scale-up system composition

Technologies for composing a managed node with multiple processors on multiple compute sleds to cooperatively execute a workload include a memory, one or more processors connected to the memory, and an accelerator. The accelerator further includes a coherence logic unit that is configured to receive a node configuration request to execute a workload. The node configuration request identifies the compute sled and a second compute sled to be included in a managed node. The coherence logic unit is further configured to modify a portion of local working data associated with the workload on the compute sled in the memory with the one or more processors of the compute sled, determine coherence data indicative of the modification made by the one or more processors of the compute sled to the local working data in the memory, and send the coherence data to the second compute sled of the managed node.

CLOUD-BASED SCALE-UP SYSTEM COMPOSITION

Technologies for composing a managed node with multiple processors on multiple compute sleds to cooperatively execute a workload include a memory, one or more processors connected to the memory, and an accelerator. The accelerator further includes a coherence logic unit that is configured to receive a node configuration request to execute a workload. The node configuration request identifies the compute sled and a second compute sled to be included in a managed node. The coherence logic unit is further configured to modify a portion of local working data associated with the workload on the compute sled in the memory with the one or more processors of the compute sled, determine coherence data indicative of the modification made by the one or more processors of the compute sled to the local working data in the memory, and send the coherence data to the second compute sled of the managed node.

Application-specific integrated circuit (ASIC) synthesis based on lookup table (LUT) mapping and optimization

A logic network for an integrated circuit is synthesized as follows. The logic network is mapped to a network of lookup tables (LUTs). The LUT mapping is based at least in part on estimated areas of the LUTs. The individual LUTs in the network are improved (LUT optimization), for example using various Boolean optimization techniques. The network of improved LUTs is then reduced to a gate-level netlist of standard cells.

OVERLAY ARCHITECTURE FOR PROGRAMMING FPGAS

An overlay architecture and an associated method that uses datapath merging to provide minimal-overhead support for multiple source netlists, and optionally provides an adjustable amount of flexibility through a secondary interconnect network is disclosed.

Apparatus for automatically configured interface and associated methods
09780789 · 2017-10-03 · ·

An integrated circuit (IC) includes a first circuit implemented using programmable circuitry of the IC, and a second circuit implemented using hardened circuitry of the IC. The IC further includes a configurable interface circuit to couple the first circuit to the second circuit using ready/valid signaling with a configurable ready-latency value.

Technologies for dividing work across accelerator devices

Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.

Technolgies for millimeter wave rack interconnects

Racks and rack pods to support a plurality of sleds are disclosed herein. Switches for use in the rack pods are also disclosed herein. A rack comprises a plurality of sleds and a plurality of electromagnetic waveguides. The plurality of sleds are vertically spaced from one another. The plurality of electromagnetic waveguides communicate data signals between the plurality of sleds.

TECHNOLOGIES FOR COORDINATING DISAGGREGATED ACCELERATOR DEVICE RESOURCES

A compute device to manage workflow to disaggregated computing resources is provided. The compute device comprises a compute engine receive a workload processing request, the workload processing request defined by at least one request parameter, determine at least one accelerator device capable of processing a workload in accordance with the at least one request parameter, transmit a workload to the at least one accelerator device, receive a work product produced by the at least one accelerator device from the workload, and provide the work product to an application.

TECHNOLOGIES FOR COORDINATING DISAGGREGATED ACCELERATOR DEVICE RESOURCES

A compute device to manage workflow to disaggregated computing resources is provided. The compute device comprises a compute engine receive a workload processing request, the workload processing request defined by at least one request parameter, determine at least one accelerator device capable of processing a workload in accordance with the at least one request parameter, transmit a workload to the at least one accelerator device, receive a work product produced by the at least one accelerator device from the workload, and provide the work product to an application.