G06F2212/621

SYSTEMS AND METHODS FOR TRANSFORMING DATA IN-LINE WITH READS AND WRITES TO COHERENT HOST-MANAGED DEVICE MEMORY
20220327052 · 2022-10-13 ·

The disclosed computer-implemented method may include (1) receiving, from an external host processor via a cache-coherent interconnect, a request to access a host address of a coherent memory space of the external host processor, (2) when the request is to read data from the host address, (a) performing an in-line transformation on the data to generate second data and (b) writing the second data to the physical address of the device-attached physical memory mapped to the host address, and (3) when the request is to read data from the host address, (a) reading the data from the physical address of the device-attached physical memory mapped to the host address, (b) performing a reversing in-line transformation on the data to generate second data, and (c) returning the second data to the external host processor via the cache-coherent interconnect. Various other methods, systems, and computer-readable media are also disclosed.

Bias-based coherency in an interconnect fabric

A fabric controller to provide a coherent accelerator fabric, including: a host interconnect to communicatively couple to a host device; a memory interconnect to communicatively couple to an accelerator memory; an accelerator interconnect to communicatively couple to an accelerator having a last-level cache (LLC); and an LLC controller configured to provide a bias check for memory access operations.

DATA PROCESSING METHOD AND APPARATUS AND HETEROGENEOUS SYSTEM

A data processing method and apparatus, and a heterogeneous system, pertaining to the field of computer technologies are provided. The heterogeneous system includes a processor connected to an accelerator. A secondary memory is connected to the accelerator. The processor is configured to write to-be-processed data into the secondary memory and trigger the accelerator to access and process the to-be-processed data stored in the secondary memory according to a processing instruction. The accelerator is configured to write a processing result of the to-be-processed data into the secondary memory and to trigger the processor to read the processing result. Processing efficiency is enhanced by reducing the number of times of interaction between the processor and the accelerator and simplifying the procedure for data processing.

Techniques for efficiently transferring data to a processor

A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

SYSTEMS AND METHODS FOR PRE-PROCESSING AND POST-PROCESSING COHERENT HOST-MANAGED DEVICE MEMORY
20220334972 · 2022-10-20 ·

The disclosed computer-implemented method may include receiving, from a host via a cache-coherent interconnect, a request to access an address of a coherent memory space of the host. When the request is to write data, the computer-implemented method may include (1) performing, after receiving the data, a post-processing operation on the data to generate post-processed data and (2) writing the post-processed data to a physical address of a device-attached physical memory mapped to the address. When the request is to read data, the computer-implemented method may include (1) reading the data from the physical address of a device-attached physical memory mapped to the address, (2) performing, before responding to the request, a pre-processing operation on the data to generate pre-processed data, and (3) returning the pre-processed data to the external host via the cache-coherent interconnect. Various other methods, systems, and computer-readable media are also disclosed.

Cache coherency engine

A method for operating a database and a cache of at least a portion of the database may include receiving a plurality of read requests to read a data entity from the database and counting respective quantities of the requests serviced from the database and from the cache. The method may further include receiving a write request to alter the data entity in the database and determining whether to update the cache to reflect the alteration to the data entity in the write request according to the quantity of the requests serviced from the database and the quantity of the requests serviced from the cache. In an embodiment, the method further includes causing the cache to be updated when a ratio of the quantity of the requests serviced from the database to the quantity of the requests serviced from the cache exceeds a predetermined threshold.

System and Method for Shared Memory Ownership Using Context
20170371570 · 2017-12-28 ·

It is possible to reduce the latency attributable to memory protection in shared memory systems by performing access protection at a central Data Ownership Manager (DOM), rather than at distributed memory management units in the central processing unit (CPU) elements (CEs) responsible for parallel thread processing. In particular, the DOM may monitor read requests communicated over a data plane between the CEs and a memory controller, and perform access protection verification in parallel with the memory controller's generation of the data response. The DOM may be separate and distinct from both the CEs and the memory controller, and therefore may generally be able to make the access determination without interfering with data plane processing/generation of the read requests and data responses exchanged between the memory controller and the CEs.

PROCESSING COMMANDS IN A DIRECTORY-BASED COMPUTER MEMORY MANAGEMENT SYSTEM

A method for processing commands in a directory-based computer memory management system includes receiving a command to perform an operation on data stored in a set of one or more computer memory locations associated with an entry in a directory of a computer memory, the entry is associated with an indicator for indicating whether the set of one or more computer memory locations is busy, a head tag, and a tail tag. The command is associated with a command tag and a predecessor tag, and checking the indicator to determine whether the set of one or more computer memory locations is busy.

TECHNIQUES FOR MAINTAINING CONSISTENCY BETWEEN ADDRESS TRANSLATIONS IN A DATA PROCESSING SYSTEM

A technique for operating a memory management unit (MMU) of a processor includes the MMU detecting that one or more address translation invalidation requests are indicated for an accelerator unit (AU). In response to detecting that the invalidation requests are indicated, the MMU issues a raise barrier request for the AU. In response to detecting a raise barrier response from the AU to the raise barrier request the MMU issues the invalidation requests to the AU. In response to detecting an address translation invalidation response from the AU to each of the invalidation requests, the MMU issues a lower barrier request to the AU. In response to detecting a lower barrier response from the AU to the lower barrier request, the MMU resumes handling address translation check-in and check-out requests received from the AU.

Network cache injection for coherent GPUs

Methods, devices, and systems for GPU cache injection. A GPU compute node includes a network interface controller (NIC) which includes NIC receiver circuitry which can receive data for processing on the GPU, NIC transmitter circuitry which can send the data to a main memory of the GPU compute node and which can send coherence information to a coherence directory of the GPU compute node based on the data. The GPU compute node also includes a GPU which includes GPU receiver circuitry which can receive the coherence information; GPU processing circuitry which can determine, based on the coherence information, whether the data satisfies a heuristic; and GPU loading circuitry which can load the data into a cache of the GPU from the main memory if on the data satisfies the heuristic.