G06F13/1652

Technologies for providing shared memory for accelerator sleds

Technologies for providing shared memory for accelerator sleds include an accelerator sled to receive, with a memory controller, a memory access request from an accelerator device to access a region of memory. The request is to identify the region of memory with a logical address. Additionally, the accelerator sled is to determine, from a map of logical addresses and associated physical addresses, the physical address associated with the region of memory. In addition, the accelerator sled is to route the memory access request to a memory device associated with the determined physical address.
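
A minimal sketch of the logical-to-physical routing the abstract describes, in Python; the page size, map contents, and device names are invented for illustration:

    # Hypothetical address map; a real sled would populate it from its
    # memory-pool configuration.
    PAGE = 4096

    address_map = {              # logical page -> (memory device, physical page)
        0x0000: ("dimm0", 0x0100),
        0x0001: ("dimm1", 0x0042),
    }

    def route_access(logical_addr: int) -> tuple[str, int]:
        """Translate a logical address and pick the backing memory device."""
        page, offset = divmod(logical_addr, PAGE)
        device, phys_page = address_map[page]
        return device, phys_page * PAGE + offset

    print(route_access(0x1010))  # -> ('dimm1', 270352), i.e. 0x42010 on dimm1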

DATA PROCESSING SYSTEM HAVING MASTERS THAT ADAPT TO AGENTS WITH DIFFERING RETRY BEHAVIORS

A data processing system includes a plurality of snoopers, a processing unit including a master, and a system fabric communicatively coupling the master and the plurality of snoopers. The master sets a retry operating mode for an interconnect operation in one of alternative first and second operating modes. The first operating mode is associated with a first type of snooper, and the second operating mode is associated with a different second type of snooper. The master issues a memory access request of the interconnect operation on the system fabric of the data processing system. Based on receipt of a combined response representing a systemwide coherence response to the request, the master delays an interval having a duration dependent on the retry operating mode and thereafter reissues the memory access request on the system fabric.
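
A hedged sketch of the mode-dependent retry interval, assuming illustrative mode names and delay values (the abstract does not specify them):

    import time

    # Assumed mode names and delay values for the two snooper types.
    RETRY_DELAY = {"mode_first": 0.001, "mode_second": 0.010}

    def issue_with_retry(issue_request, retry_mode, max_tries=5):
        """Issue a request on the fabric; on a Retry combined response, wait
        an interval whose duration depends on the retry operating mode."""
        for _ in range(max_tries):
            combined_response = issue_request()
            if combined_response != "retry":
                return combined_response
            time.sleep(RETRY_DELAY[retry_mode])
        raise TimeoutError("request not accepted after retries")

    # Example fabric that accepts the request on the third attempt.
    responses = iter(["retry", "retry", "ack"])
    print(issue_with_retry(lambda: next(responses), "mode_second"))  # -> ack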

Data transfer scheduling for hardware accelerator

A computing device includes a processor configured to perform data transfer scheduling for a hardware accelerator including a plurality of processing areas. Performing data transfer scheduling may include receiving a plurality of data transfer instructions that encode requests to transfer data to respective processing areas. Performing data transfer scheduling may further include identifying a plurality of transfer path conflicts between the data transfer instructions. Performing data transfer scheduling may further include sorting the data transfer instructions into a plurality of transfer instruction subsets. Within each transfer instruction subset, none of the data transfer instructions have transfer path conflicts. For each transfer instruction subset, performing data transfer scheduling may further include conveying the data transfer instructions included in that transfer instruction subset to the hardware accelerator. The data transfer instructions may be conveyed in a plurality of sequential data transfer phases that correspond to the transfer instruction subsets.
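
The sorting step behaves like greedy graph coloring over a conflict graph. A sketch under the assumption that two transfers conflict when they target the same processing area (the abstract's transfer-path model is more general):

    # Transfer = (source buffer id, destination processing area); both
    # the encoding and the conflict rule below are illustrative.
    def conflicts(a, b):
        return a[1] == b[1]      # assumed rule: same destination area

    def schedule_phases(transfers):
        """Greedily place each transfer into the first phase where it has
        no transfer path conflict; each phase is conveyed sequentially."""
        phases = []
        for t in transfers:
            for phase in phases:
                if not any(conflicts(t, other) for other in phase):
                    phase.append(t)
                    break
            else:
                phases.append([t])
        return phases

    print(schedule_phases([(0, 1), (1, 1), (2, 2), (3, 2)]))
    # -> [[(0, 1), (2, 2)], [(1, 1), (3, 2)]]  (two conflict-free phases)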

SYSTEMS, DEVICES AND METHODS WITH OFFLOAD PROCESSING DEVICES
20230231811 · 2023-07-20

A method can include receiving network packets including forwarding plane packets; evaluating header information of the network packets to map the network packets to any of a plurality of destinations on the module, each destination corresponding to any of a plurality of services executed by offload processors of the module; configuring operations of the offload processors; and in response to forwarding plane packets, executing operations on the forwarding plane packets; wherein the receiving, evaluating, and processing of the forwarding plane packets are performed independently of the host processor. Corresponding systems and methods are also disclosed.
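
An illustrative sketch of the header-to-service mapping; the header fields and service names are assumptions, not taken from the disclosure:

    # Hypothetical steering table: header tuple -> offload service.
    SERVICE_TABLE = {
        (4789, "udp"): "vxlan_decap",
        (443, "tcp"): "tls_proxy",
    }

    def steer(packet):
        """Map a forwarding-plane packet to an offload-processor service
        without involving the host processor."""
        key = (packet["dst_port"], packet["proto"])
        return SERVICE_TABLE.get(key, "default_forward")

    print(steer({"dst_port": 4789, "proto": "udp"}))  # -> vxlan_decap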

Host controller interface using multiple circular queues, and operating method thereof

A host controller interface configured to provide interfacing between a host device and a storage device includes processing circuitry; a doorbell register configured to store a head pointer and a tail pointer of one or more first queues; and an entry buffer configured to store a first command from one of the one or more first queues, wherein the processing circuitry is configured to determine an order in which the commands of the one or more first queues are to be processed, route the first command to be stored in the entry buffer according to the determined order, and route a first response to be stored in one of one or more second queues.
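
A minimal model of one circular queue managed through head and tail pointers, as a doorbell register might expose them; the depth and API names are illustrative:

    class CircularQueue:
        def __init__(self, depth=8):
            self.slots = [None] * depth
            self.head = 0            # next entry to consume
            self.tail = 0            # next free slot

        def submit(self, cmd):       # host rings the doorbell: tail advances
            nxt = (self.tail + 1) % len(self.slots)
            if nxt == self.head:     # one slot kept empty to signal "full"
                raise BufferError("queue full")
            self.slots[self.tail] = cmd
            self.tail = nxt

        def fetch(self):             # controller consumes at head
            if self.head == self.tail:
                return None          # queue empty
            cmd = self.slots[self.head]
            self.head = (self.head + 1) % len(self.slots)
            return cmd

    q = CircularQueue()
    q.submit("READ lba=0")
    print(q.fetch())                 # -> READ lba=0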

Lookahead priority collection to support priority elevation

A queuing requester for access to a memory system is provided. Transaction requests are received from two or more requesters for access to the memory system. Each transaction request includes an associated priority value. A request queue of the received transaction requests is formed in the queuing requester. A highest priority value of all pending transaction requests within the request queue is determined. An elevated priority value is selected when the highest priority value is higher than the priority value of the oldest transaction request in the request queue; otherwise the priority value of the oldest transaction request is selected. The oldest transaction request in the request queue is then provided to the memory system with the selected priority value. An arbitration contest with other requesters for access to the memory system is performed using the selected priority value.
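
A sketch of the elevation rule: the oldest request is issued with the highest priority found anywhere in the queue, if that is higher than its own. Request names and priority values are illustrative:

    from collections import deque

    def next_arbitration_entry(queue):
        """Return (request, selected priority) for the oldest pending request."""
        oldest_req, oldest_prio = queue[0]
        highest = max(prio for _, prio in queue)   # lookahead over all pending
        selected = highest if highest > oldest_prio else oldest_prio
        return oldest_req, selected

    pending = deque([("req0", 2), ("req1", 7), ("req2", 4)])
    print(next_arbitration_entry(pending))  # -> ('req0', 7): elevated by req1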

Non-posted write transactions for a computer bus

Systems and devices can include a controller and a command queue to buffer incoming write requests into the device. The controller can receive, from a client across a link, a non-posted write request (e.g., a deferred memory write (DMWr) request) in a transaction layer packet (TLP) to the command queue; determine that the command queue can accept the DMWr request; identify, from the TLP, a successful completion (SC) message indicating that the DMWr request was accepted into the command queue; and transmit the SC message to the client across the link. The controller can receive a second DMWr request in a second TLP; determine that the command queue is full; and transmit a memory request retry status (MRS) message to the client in response to the command queue being full.
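
A hedged sketch of the SC/MRS handshake for non-posted DMWr requests; the queue depth and message encoding are simplified assumptions:

    class DeviceController:
        def __init__(self, depth=2):
            self.command_queue = []
            self.depth = depth

        def receive_dmwr(self, tlp):
            """Accept a DMWr TLP if the command queue has room; otherwise
            ask the client to retry later."""
            if len(self.command_queue) < self.depth:
                self.command_queue.append(tlp)
                return "SC"    # successful completion: request accepted
            return "MRS"       # memory request retry status: queue full

    ctrl = DeviceController(depth=1)
    print(ctrl.receive_dmwr({"addr": 0x1000}))  # -> SC
    print(ctrl.receive_dmwr({"addr": 0x2000}))  # -> MRS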

MECHANISM TO ACCELERATE GRAPHICS WORKLOADS IN A MULTI-CORE COMPUTING ARCHITECTURE
20230119093 · 2023-04-20

A processing apparatus is described. The apparatus includes a plurality of processing cores, including a first processing core and a second processing core, a first field programmable gate array (FPGA) coupled to the first processing core to accelerate execution of graphics workloads processed at the first processing core, and a second FPGA coupled to the second processing core to accelerate execution of workloads processed at the second processing core.
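
An illustrative pairing of cores with their attached FPGAs; the core and FPGA names are invented for the example:

    # Hypothetical core-to-FPGA coupling table.
    CORE_TO_FPGA = {"core0": "fpga0", "core1": "fpga1"}

    def dispatch(workload, core):
        """Offload a graphics workload to the FPGA coupled to the given core."""
        return f"{workload} accelerated on {CORE_TO_FPGA[core]} for {core}"

    print(dispatch("shader_pass", "core1"))
    # -> shader_pass accelerated on fpga1 for core1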