Patent classifications
G06F12/0835
CONFIDENTIAL COMPUTE ARCHITECTURE INTEGRATED WITH DIRECT SWAP CACHING
Systems and methods for a confidential compute architecture integrated with direct swap caching are described. An example method for managing a near memory and a far memory includes, in response to determining that the far memory contains an encrypted version of a first block of data, retrieving from the far memory the encrypted version of the first block of data, decrypting the first block of data using a first key for exclusive use by a first virtual machine associated with the system, and providing a decrypted version of the first block of data to a requestor. The method further includes swapping out a second block of data having an address conflict with the first block of data from the near memory to the far memory, where the second block of data is encrypted using a second key for exclusive use by a second virtual machine associated with the system.
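The swap described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class name `DirectSwapMemory`, the modulo slot mapping, and in particular `xor_cipher`, which is a toy stand-in for a real cipher and offers no security. The abstract does not specify any of these details.

```python
from dataclasses import dataclass


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy placeholder for a real cipher (symmetric, NOT secure)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class Line:
    addr: int
    vm: str
    data: bytes  # plaintext while resident in near memory


class DirectSwapMemory:
    """Hypothetical model: each near-memory slot is shared by conflicting addresses."""

    def __init__(self, num_slots, vm_keys):
        self.num_slots = num_slots
        self.vm_keys = vm_keys  # VM name -> key used only for that VM
        self.near = {}          # slot index -> Line
        self.far = {}           # address -> (VM name, ciphertext)

    def store_far(self, addr, vm, plaintext):
        # Blocks in far memory are always encrypted with the owning VM's key.
        self.far[addr] = (vm, xor_cipher(plaintext, self.vm_keys[vm]))

    def read(self, addr):
        slot = addr % self.num_slots  # assumed conflict mapping
        line = self.near.get(slot)
        if line is not None and line.addr == addr:
            return line.data  # near-memory hit
        # Miss: fetch the encrypted block from far memory and decrypt it.
        vm, ciphertext = self.far.pop(addr)
        plaintext = xor_cipher(ciphertext, self.vm_keys[vm])
        if line is not None:
            # Swap out the conflicting block, encrypted with its own VM's key.
            self.store_far(line.addr, line.vm, line.data)
        self.near[slot] = Line(addr, vm, plaintext)
        return plaintext
```

In this sketch a read of an address that is in neither memory raises `KeyError`; a fuller model would need an explicit policy for that case.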
Low latency host processor to coherent device interaction
In a computer system, a processor and an I/O device controller communicate with each other via a coherence interconnect and according to a cache coherence protocol. Registers of the I/O device controller are mapped to the cache coherent memory space to allow the processor to treat the registers as cacheable memory. As a result, the latency of processor commands executed by the I/O device controller is decreased, and the size of data stored in the I/O device controller that can be accessed by the processor is increased from the size of a single register to the size of an entire cache line.
Managing least recently used cache using reduced memory footprint sequence container
Techniques are provided for managing a least recently used cache using a linked list with a reduced memory footprint. A cache manager receives an I/O request comprising a target address, wherein the cache manager manages a cache memory having a maximum allocated number of cache entries, and a linked list having a maximum allocated number of list elements which is less than the maximum allocated number of cache entries. If the target address corresponds to a cache entry, the cache manager accesses the cache entry to obtain the cache data from cache memory, removes the list element that corresponds to the accessed cache entry from the linked list, selects an existing cache entry which currently does not have a corresponding list element in the linked list, and adds a list element corresponding to the selected cache entry to the head position of the linked list.
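The cache-hit path described above can be sketched as follows. This is an illustrative model only: the class name `ReducedListLRU` is hypothetical, a `collections.deque` stands in for the linked list, and the selection policy (take the first untracked entry other than the one just accessed) is an assumption, since the abstract does not say how the entry is chosen. The miss path and the case where the accessed entry has no list element are not described in the abstract, so the sketch leaves the list unchanged in the latter case.

```python
from collections import deque


class ReducedListLRU:
    """Hypothetical model: the list tracks fewer elements than the cache has entries."""

    def __init__(self, entries, list_size):
        if list_size >= len(entries):
            raise ValueError("list must be smaller than the cache")
        self.cache = dict(entries)  # address -> cached data
        # Only a subset of entries has a list element; index 0 is the head.
        self.lru = deque(list(self.cache)[:list_size])

    def on_hit(self, addr):
        data = self.cache[addr]  # access the cache entry
        if addr in self.lru:
            # Remove the list element for the accessed entry.
            self.lru.remove(addr)
            # Assumed policy: pick the first entry that has no list element
            # (skipping the entry just accessed) and add it at the head.
            for candidate in self.cache:
                if candidate != addr and candidate not in self.lru:
                    self.lru.appendleft(candidate)
                    break
        return data
```

The list length stays constant across hits, which is what keeps its memory footprint below one element per cache entry.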
Systems and methods for pre-processing and post-processing coherent host-managed device memory
The disclosed computer-implemented method may include receiving, from a host via a cache-coherent interconnect, a request to access an address of a coherent memory space of the host. When the request is to write data, the computer-implemented method may include (1) performing, after receiving the data, a post-processing operation on the data to generate post-processed data and (2) writing the post-processed data to a physical address of a device-attached physical memory mapped to the address. When the request is to read data, the computer-implemented method may include (1) reading the data from the physical address of the device-attached physical memory mapped to the address, (2) performing, before responding to the request, a pre-processing operation on the data to generate pre-processed data, and (3) returning the pre-processed data to the host via the cache-coherent interconnect. Various other methods, systems, and computer-readable media are also disclosed.
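A minimal sketch of the write and read paths follows. The choice of compression as the post-processing operation and decompression as the pre-processing operation is an assumption made for illustration (the abstract does not name the operations), and `ProcessedDeviceMemory` and its address map are hypothetical names.

```python
import zlib


class ProcessedDeviceMemory:
    """Hypothetical device model that transforms data on its way in and out."""

    def __init__(self, address_map):
        self.address_map = address_map  # host address -> device physical address
        self.physical = {}              # device physical address -> stored bytes

    def write(self, host_addr, data):
        # Post-processing after the data is received (assumed: compression).
        post_processed = zlib.compress(data)
        self.physical[self.address_map[host_addr]] = post_processed

    def read(self, host_addr):
        stored = self.physical[self.address_map[host_addr]]
        # Pre-processing before responding to the host (assumed: decompression).
        return zlib.decompress(stored)
```

The host sees the same bytes it wrote, while the device-attached memory holds only the transformed form.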
CACHE-COHERENT INTERCONNECT BASED NEAR-DATA-PROCESSING ACCELERATOR
A memory system is disclosed. The memory system may include a first cache-coherent interconnect memory module and a second cache-coherent interconnect memory module. A cache-coherent interconnect switch may connect the first cache-coherent interconnect memory module, the second cache-coherent interconnect memory module, and a processor. A processing element may process data stored on at least one of the first cache-coherent interconnect memory module and the second cache-coherent interconnect memory module.
INTEGRATED CIRCUIT AND CONFIGURATION METHOD THEREOF
An integrated circuit and a configuration method thereof are disclosed. The integrated circuit, applied to a neural network model calculation, includes a first operator engine, a second operator engine, a random access memory (RAM) and a direct memory access (DMA) engine. The first operator engine is configured to perform a first calculation operation. The second operator engine is configured to perform a second calculation operation. The DMA engine performs an access operation on the RAM according to a first memory management unit (MMU) table when the first operator engine performs the first calculation operation, and performs an access operation on the RAM according to a second MMU table when the second operator engine performs the second calculation operation.
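The per-engine table selection described above can be sketched as follows. The page size, the table layout (virtual page number to physical page number), and the names `DmaEngine`, `"first"`, and `"second"` are assumptions for illustration; accesses that cross a page boundary are not handled.

```python
PAGE_SIZE = 16  # deliberately tiny, for illustration only


class DmaEngine:
    """Hypothetical DMA engine that picks an MMU table per operator engine."""

    def __init__(self, ram, mmu_tables):
        self.ram = ram                # bytearray standing in for the RAM
        self.mmu_tables = mmu_tables  # engine name -> {virtual page: physical page}

    def translate(self, engine, vaddr):
        # Use the table that belongs to the engine currently performing its operation.
        page, offset = divmod(vaddr, PAGE_SIZE)
        return self.mmu_tables[engine][page] * PAGE_SIZE + offset

    def read(self, engine, vaddr, length):
        start = self.translate(engine, vaddr)
        return bytes(self.ram[start:start + length])

    def write(self, engine, vaddr, data):
        start = self.translate(engine, vaddr)
        self.ram[start:start + len(data)] = data
```

With separate tables, the same virtual address resolves to different RAM regions depending on which operator engine is active.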
Memory Access Tracking Using a Peripheral Device
A compute node includes a memory, a processor and a peripheral device. The memory is to store memory pages. The processor is to run software that accesses the memory, and to identify one or more first memory pages that are accessed by the software in the memory. The peripheral device is to directly access one or more second memory pages in the memory of the compute node using Direct Memory Access (DMA), and to notify the processor of the second memory pages that are accessed using DMA. The processor is further to maintain a data structure that tracks both (i) the first memory pages as identified by the processor and (ii) the second memory pages as notified by the peripheral device.
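One way to picture the combined data structure is a map from page number to the sources that touched it. The sketch below is an assumption about its shape, not the patented design; `PageTracker` and its method names are hypothetical.

```python
class PageTracker:
    """Hypothetical tracker merging CPU-identified and DMA-notified page accesses."""

    def __init__(self):
        self.sources = {}  # page number -> set of sources ("cpu", "dma")

    def record_cpu_access(self, page):
        # First memory pages: identified by the processor itself.
        self.sources.setdefault(page, set()).add("cpu")

    def notify_dma_access(self, pages):
        # Second memory pages: reported by the peripheral device after DMA.
        for page in pages:
            self.sources.setdefault(page, set()).add("dma")

    def accessed_pages(self):
        return sorted(self.sources)
```

Keeping both kinds of access in one structure lets later consumers (for example, a dirty-page scan) avoid missing pages that only the peripheral touched.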
Interleaved cache prefetching
A method includes receiving, at a direct memory access (DMA) controller of a memory device, a first command from a first cache controller coupled to the memory device to prefetch first data from the memory device and sending the prefetched first data, in response to receiving the first command, to a second cache controller coupled to the memory device. The method can further include receiving a second command from a second cache controller coupled to the memory device to prefetch second data from the memory device, and sending the prefetched second data, in response to receiving the second command, to a third cache controller coupled to the memory device.
HARDWARE MANAGEMENT OF DIRECT MEMORY ACCESS COMMANDS
A method for hardware management of DMA transfer commands includes accessing, by a first DMA engine, a DMA transfer command and determining a first portion of a data transfer requested by the DMA transfer command. Transfer of the first portion by the first DMA engine is initiated based at least in part on the DMA transfer command. Similarly, transfer of a second portion of the data transfer by a second DMA engine is initiated based at least in part on the DMA transfer command. After the first portion and the second portion have been transferred, an indication is generated that signals completion of the data transfer requested by the DMA transfer command.
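The split-and-complete flow above can be modeled in software as follows. This is only an analogy: worker threads stand in for DMA engines, the even split into contiguous portions is an assumption, and `copy_portion` and `run_transfer` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_portion(src, dst, start, end):
    """One 'engine' copies its portion and reports how many bytes it moved."""
    dst[start:end] = src[start:end]
    return end - start


def run_transfer(src, dst, num_engines=2):
    """Split one transfer across engines; return True once every portion is done."""
    size = len(src)
    if size == 0:
        return True
    chunk = -(-size // num_engines)  # ceiling division
    portions = [(s, min(s + chunk, size)) for s in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=num_engines) as pool:
        futures = [pool.submit(copy_portion, src, dst, s, e) for s, e in portions]
        moved = sum(f.result() for f in futures)
    # The completion indication is generated only after all portions finish.
    return moved == size
```

The destination buffer must be a `bytearray` at least as long as the source.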
Methods and apparatus for accelerating virtual machine migration
A server having a host processor coupled to a programmable coprocessor is provided. One or more virtual machines may run on the host processor. The coprocessor may be coupled to an auxiliary memory that stores virtual machine (VM) states. During live migration, the coprocessor may determine when to move the VM states from the auxiliary memory to a remote server node. The coprocessor may include a coherent protocol home agent and state tracking circuitry configured to track data modification at a cache line granularity. Whenever a particular cache line has been modified, only the data associated with that cache line will be moved to the remote server without having to copy over the entire page, thereby substantially reducing the amount of data that needs to be transferred during migration events.
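The saving from tracking modifications at cache line granularity can be illustrated with a small sketch. The 64-byte line and 4096-byte page sizes are common values assumed here rather than taken from the abstract, and `DirtyLineTracker` is a hypothetical name.

```python
LINE_SIZE = 64    # assumed cache line size in bytes
PAGE_SIZE = 4096  # assumed page size in bytes


class DirtyLineTracker:
    """Hypothetical tracker recording which cache lines were modified."""

    def __init__(self):
        self.dirty = set()  # indices of modified cache lines

    def on_write(self, addr, length=1):
        first = addr // LINE_SIZE
        last = (addr + length - 1) // LINE_SIZE
        self.dirty.update(range(first, last + 1))

    def drain(self, memory):
        """Return (address, data) for each dirty line and reset the tracker."""
        lines = sorted(self.dirty)
        self.dirty.clear()
        return [
            (line * LINE_SIZE, bytes(memory[line * LINE_SIZE:(line + 1) * LINE_SIZE]))
            for line in lines
        ]
```

In this model, a 4-byte write inside one page yields a single 64-byte update to send, rather than a full 4096-byte page copy.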