G06F13/1652

TECHNOLOGIES FOR COORDINATING DISAGGREGATED ACCELERATOR DEVICE RESOURCES

A compute device to manage workflow to disaggregated computing resources is provided. The compute device comprises a compute engine receive a workload processing request, the workload processing request defined by at least one request parameter, determine at least one accelerator device capable of processing a workload in accordance with the at least one request parameter, transmit a workload to the at least one accelerator device, receive a work product produced by the at least one accelerator device from the workload, and provide the work product to an application.

Technologies for accelerator interface

Technologies for an accelerator interface over Ethernet are disclosed. In the illustrative embodiment, a network interface controller of a compute device may receive a data packet. If the network interface controller determines that the data packet should be pre-processed (e.g., decrypted) with a remote accelerator device, the network interface controller may encapsulate the data packet in an encapsulating network packet and send the encapsulating network packet to a remote accelerator device on a remote compute device. The remote accelerator device may pre-process the data packet (e.g., decrypt the data packet) and send it back to the network interface controller. The network interface controller may then send the pre-processed packet to a processor of the compute device.

System, Apparatus And Method For Synchronizing Multiple Virtual Link States Over A Package Interconnect
20230026906 · 2023-01-26 ·

In one embodiment, an apparatus includes an arbitration circuit with virtual link state machines to virtualize link states associated with multiple communication protocol stacks. The apparatus further includes a physical circuit coupled to the arbitration circuit and to interface with a physical link, where the physical circuit, in response to a retraining of the physical link, is to cause a plurality of the virtual link state machines to synchronize with corresponding virtual link state machines associated with a second side of the physical link, and where at least one of the communication protocol stacks is to remain in a low power state during the retraining and the synchronization. Other embodiments are described and claimed.

Mechanism to accelerate graphics workloads in a multi-core computing architecture
11798123 · 2023-10-24 · ·

A processing apparatus is described. The apparatus includes a plurality of processing cores, including a first processing core and a second processing core a first field programmable gate array (FPGA) coupled to the first processing core to accelerate execution of graphics workloads processed at the first processing core and a second FPGA coupled to the second processing core to accelerate execution of workloads processed at the second processing core.

Data transfer scheduling for hardware accelerator

A computing device, including a processor configured to perform data transfer scheduling for a hardware accelerator including a plurality of processing areas. Performing data transfer scheduling may include receiving a plurality of data transfer instructions that encode requests to transfer data to respective processing areas. Performing data transfer scheduling may further include identifying a plurality of transfer path conflicts between the data transfer instructions. Performing data transfer scheduling may further include sorting the data transfer instructions into a plurality of transfer instruction subsets. Within each transfer instruction subset, none of the data transfer instructions have transfer path conflicts. For each transfer instruction subset, performing data transfer scheduling may further include conveying the data transfer instructions included in that transfer instruction subset to the hardware accelerator. The data transfer instructions may be conveyed in a plurality of sequential data transfer phases that correspond to the transfer instruction subsets.

Graphics processing device

Disclosed is a graphics processing device including a main SoC, a performance-enhancing SoC, and an external circuit that is set outside any of the two SoCs. The main SoC includes: a first graphics processing unit (GPU) dividing to-be-processed data into a first input part and a second input part, and processing the first output part to generate first output data; and a first transceiver circuit forwarding the second input part to the performance-enhancing SoC via the external circuit, and then receiving second output data via the external circuit and forwarding it. The performance-enhancing SoC includes: a second transceiver circuit receiving the second input part via the external circuit and outputting the second output data to the main SoC via the external circuit; and a second GPU receiving the second input part from the second transceiver circuit and processing this part to provide the second output data for the second transceiver.

Techniques for an efficient fabric attached memory

Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as a graphics processing unit(s) (GPU)(s), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.

Host connected computer network
11520726 · 2022-12-06 · ·

A processor comprises a plurality of processing units on an integrated circuit interconnected by an exchange. The exchange has a group of exchange paths extending between first and second portions of the integrated circuit. Each group has a first exchange block in the first portion and a second exchange block in the second portion. The processor has a first external interface in the first portion a second external interface in the second portion and a routing bus which routes packets between the external interfaces and the exchange blocks. The first external interface exchanges packets between the integrated circuit and a host. The second interface exchanges packets between the integrated circuit and another integrated circuit. Errors may be trapped when packets are wrongly addressed. A network of such processors is also provided.

TECHNOLOGIES FOR PROVIDING MANIFEST-BASED ASSET REPRESENTATION

Technologies for generating manifest data for a sled include a sled to generate manifest data indicative of one or more characteristics of the sled (e.g., hardware resources, firmware resources, a configuration of the sled, or a health of sled components). The sled is also to associate an identifier with the manifest data. The identifier uniquely identifies the sled from other sleds. Additionally, the sled is to send the manifest data and the associated identifier to a server. The sled may also detect a change in the hardware resources, firmware resources, the configuration, or component health of the sled. The sled may also generate an update of the manifest data based on the detected change, where the update specifies the detected change in the hardware resources, firmware resources, the configuration, or component health of the sled. The sled may also send the update of the manifest data to the server.

Mechanism to accelerate graphics workloads in a multi-core computing architecture
11443405 · 2022-09-13 · ·

A processing apparatus is described. The apparatus includes a plurality of processing cores, including a first processing core and a second processing core a first field programmable gate array (FPGA) coupled to the first processing core to accelerate execution of graphics workloads processed at the first processing core and a second FPGA coupled to the second processing core to accelerate execution of workloads processed at the second processing core.