Patent classifications
G06F2212/655
APPARATUS AND METHOD FOR HANDLING ACCESS REQUESTS
An apparatus and method are provided for handling access requests. The apparatus has processing circuitry for processing a plurality of program threads to perform data processing operations on data, where the operations identify the data using virtual addresses, and the virtual addresses are mapped to physical addresses within a memory system. The apparatus also has cache storage with a plurality of cache entries to store data. An aliasing condition exists when multiple virtual addresses map to the same physical address, and allocation of data into the cache storage is constrained to prevent multiple cache entries simultaneously storing data for the same physical address. Cache access circuitry is responsive to an access request specifying a virtual address: it uses a cache index at least partially determined from the specified virtual address to identify at least one cache entry within the cache storage, and detects whether a hit is present by comparing a physical address portion associated with that cache entry with a tag portion of the physical address corresponding to the specified virtual address. Whilst a first program thread is performing an exclusive operation using a first virtual address to identify a specified physical address whose data is stored in the cache storage, remap handling circuitry detects a remap condition when a second program thread issues an access request of at least one type specifying a second virtual address that exhibits the aliasing condition with the first virtual address. In the presence of the remap condition, the remap handling circuitry remaps the cache index at least partially determined from the second virtual address, so that the remapped cache index used by the cache access circuitry matches the cache index at least partially determined from the first virtual address. This provides an effective mechanism for avoiding potential live-lock scenarios that could otherwise arise.
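To make the remapping concrete, here is a minimal C sketch, assuming a VIPT-style cache whose set index borrows one virtual bit (bit 12) above a 4 KiB page offset, so two virtual aliases of one physical line can select different sets; all names and sizes are illustrative, not taken from the patent:

```c
/* Minimal sketch of the index-remap idea; parameters are assumptions. */
#include <stdint.h>
#include <stdio.h>

#define LINE_SHIFT 6              /* 64-byte cache lines        */
#define INDEX_BITS 7              /* 128 sets: index = VA[12:6] */

static unsigned cache_index(uint64_t va)
{
    return (unsigned)((va >> LINE_SHIFT) & ((1u << INDEX_BITS) - 1));
}

/* On a detected remap condition, reuse the first thread's index so the
 * second access selects the same set and cannot evict the exclusive line. */
static unsigned remap_index(uint64_t second_va, uint64_t first_va)
{
    (void)second_va;              /* index is taken from the first alias */
    return cache_index(first_va);
}

int main(void)
{
    uint64_t first_va  = 0x1040;  /* thread 1's alias of the physical line  */
    uint64_t second_va = 0x0040;  /* thread 2's alias: differs in VA bit 12 */
    printf("first=%u second=%u remapped=%u\n",
           cache_index(first_va), cache_index(second_va),
           remap_index(second_va, first_va));
    return 0;
}
```

With these assumed parameters the two aliases would index sets 65 and 1; after remapping, the second thread's access is steered to set 65, so it cannot displace the line the first thread holds for its exclusive operation.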
Systems and methods for direct data access in multi-level cache memory hierarchies
Methods and systems for direct data access in, e.g., multi-level cache memory systems are described. A cache memory system includes a cache location buffer configured to store cache location entries, wherein each cache location entry includes an address tag and a cache location table, which are associated with a respective cacheline stored in a cache memory. The system also includes a first cache memory configured to store cachelines, each cacheline having data and an identity of a corresponding cache location entry in the cache location buffer, and a second cache memory configured to store cachelines, each cacheline likewise having data and an identity of a corresponding cache location entry in the cache location buffer. Responsive to a memory access request for a cacheline, the cache location buffer generates access information using one of the cache location tables, which enables access to the cacheline without performing a tag comparison at the one of the first and second cache memories.
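A hypothetical sketch of the cache location buffer (CLB) lookup, assuming 16-line regions and a two-level hierarchy; the struct layout, field widths, and names are invented for illustration:

```c
/* Each CLB entry pairs an address tag with a cache location table (CLT)
 * giving, per cacheline of a region, the level and way where that line
 * resides, so the caches themselves need no tag comparison. */
#include <stdint.h>
#include <stdio.h>

#define LINES_PER_REGION 16

struct location { uint8_t level; uint8_t way; };   /* level 0 = not cached */

struct clb_entry {
    uint64_t tag;                                  /* region address tag   */
    struct location clt[LINES_PER_REGION];         /* cache location table */
};

/* Return direct access info for addr, or a zero location on a CLB miss. */
static struct location clb_lookup(const struct clb_entry *clb, int n,
                                  uint64_t addr)
{
    uint64_t tag  = addr >> 10;                    /* 16 lines x 64 B region */
    unsigned line = (unsigned)((addr >> 6) & (LINES_PER_REGION - 1));
    for (int i = 0; i < n; i++)
        if (clb[i].tag == tag)
            return clb[i].clt[line];               /* no cache tag compare */
    return (struct location){0, 0};
}

int main(void)
{
    struct clb_entry clb[1] = {{ .tag = 0x12345 }};
    clb[0].clt[3] = (struct location){2, 5};       /* line 3 is in L2, way 5 */
    uint64_t addr = (0x12345ull << 10) | (3u << 6) | 0x8;
    struct location loc = clb_lookup(clb, 1, addr);
    printf("level=%d way=%d\n", loc.level, loc.way);
    return 0;
}
```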
Executing load-store operations without address translation hardware per load-store unit port
Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. An example method includes receiving, by a load-store unit (LSU) of the processing unit, an OoO window of instructions including a plurality of instructions to be executed OoO, and issuing, by the LSU, instructions from the OoO window. The issuing includes selecting an instruction from the OoO window, the instruction using an effective address. Further, in response to the instruction being a load instruction, it is determined whether the effective address is present in an effective address directory (EAD). In response to the effective address being present in the EAD, the load instruction is issued using the effective address. Further, in response to the instruction being a store instruction, a real address mapped to the effective address is determined from an effective-real translation (ERT) table, and the store instruction is issued using the real address.
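The issue decision might be modeled as follows; the flat EAD and ERT arrays, their sizes, and the addresses are assumptions made for the sake of a runnable example:

```c
/* Loads may issue with only the effective address (EA) when it is present
 * in the EAD; stores first map EA -> real address (RA) via the ERT table. */
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define TABLE_SIZE 8

struct ert_entry { uint64_t ea, ra; bool valid; };

static bool ead_contains(const uint64_t *ead, int n, uint64_t ea)
{
    for (int i = 0; i < n; i++)
        if (ead[i] == ea) return true;
    return false;
}

static bool ert_translate(const struct ert_entry *ert, uint64_t ea, uint64_t *ra)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (ert[i].valid && ert[i].ea == ea) { *ra = ert[i].ra; return true; }
    return false;
}

int main(void)
{
    uint64_t ead[2] = { 0x7000, 0x8000 };
    struct ert_entry ert[TABLE_SIZE] = {{ .ea = 0x8000, .ra = 0x94000, .valid = true }};

    uint64_t ea = 0x8000, ra;
    if (ead_contains(ead, 2, ea))        /* load path: no per-port translation */
        printf("load: issue with EA 0x%llx\n", (unsigned long long)ea);
    if (ert_translate(ert, ea, &ra))     /* store path: translate first */
        printf("store: issue with RA 0x%llx\n", (unsigned long long)ra);
    return 0;
}
```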
HOST PERFORMANCE BOOSTER L2P HANDOFF
Methods that may be performed by a host controller of a computing device for synchronizing logical-to-physical (L2P) tables before entering a hibernate mode are disclosed. Embodiment methods may include determining whether a first L2P table stored in a dynamic random-access memory (DRAM) communicatively connected to the host controller is out of synchronization with a second L2P table stored in a static random-access memory (SRAM) of a universal flash storage (UFS) device communicatively connected to the host controller via a link. If the first and second L2P tables are out of sync, the host controller may retrieve at least one modified L2P map entry from the second L2P table when the UFS device is configured to enter a hibernate mode, and update the first L2P table with the at least one modified L2P map entry before the link and the UFS device enter the hibernate mode.
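A toy model of the handoff, assuming the host mirrors the device's table entry-for-entry and the device tracks which entries it modified; table sizes, values, and names are invented:

```c
/* Before the link hibernates, any entries the device marked dirty in its
 * SRAM copy are pulled into the host's DRAM copy. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define ENTRIES 8

int main(void)
{
    uint32_t dram_l2p[ENTRIES] = {0};       /* host-side copy (DRAM)   */
    uint32_t sram_l2p[ENTRIES] = {0};       /* device-side copy (SRAM) */
    bool     dirty[ENTRIES]    = {false};

    /* Device remaps logical page 3 after a write; tables drift apart. */
    sram_l2p[3] = 0xBEEF;
    dirty[3]    = true;

    bool out_of_sync = memcmp(dram_l2p, sram_l2p, sizeof dram_l2p) != 0;
    if (out_of_sync) {                      /* sync before hibernate */
        for (int i = 0; i < ENTRIES; i++)
            if (dirty[i]) { dram_l2p[i] = sram_l2p[i]; dirty[i] = false; }
    }
    printf("entry 3 now 0x%X; safe to enter hibernate\n", dram_l2p[3]);
    return 0;
}
```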
TECHNOLOGIES FOR PROVIDING EDGE DEDUPLICATION
Technologies for providing deduplication of data in an edge network include a compute device having circuitry to obtain a request to write a data set. The circuitry is also to apply, to the data set, an approximation function to produce an approximated data set. Additionally, the circuitry is to determine whether the approximated data set is already present in a shared memory and write, to a translation table and in response to a determination that the approximated data set is already present in the shared memory, an association between a local memory address and the location, in the shared memory, where the approximated data set is already present. Additionally, the circuitry is to increase a reference count associated with that location in the shared memory.
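A sketch of the write path under stated assumptions: the approximation function is stood in for by dropping the low bits of each byte, the shared memory by a small linearly searched pool, and the translation table by an array mapping local addresses to pool slots with reference counts:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK 8
#define SLOTS 4

static uint8_t pool[SLOTS][BLOCK];          /* shared memory            */
static int     refcnt[SLOTS];               /* per-slot reference count */
static int     xlate[16];                   /* local addr -> pool slot  */

/* Stand-in approximation function: quantize each byte (lossy). */
static void approximate(const uint8_t *in, uint8_t *out)
{
    for (int i = 0; i < BLOCK; i++) out[i] = in[i] & 0xF0;
}

static int dedup_write(int local_addr, const uint8_t *data)
{
    uint8_t approx[BLOCK];
    approximate(data, approx);
    for (int s = 0; s < SLOTS; s++)
        if (refcnt[s] && memcmp(pool[s], approx, BLOCK) == 0) {
            xlate[local_addr] = s; refcnt[s]++;  /* already present: share */
            return s;
        }
    for (int s = 0; s < SLOTS; s++)
        if (refcnt[s] == 0) {                    /* first copy: store it */
            memcpy(pool[s], approx, BLOCK);
            xlate[local_addr] = s; refcnt[s] = 1;
            return s;
        }
    return -1;                                   /* pool full */
}

int main(void)
{
    uint8_t a[BLOCK] = {0x11,0x22,0x33,0x44,0x55,0x66,0x77,0x88};
    uint8_t b[BLOCK] = {0x12,0x21,0x34,0x43,0x56,0x65,0x78,0x87}; /* ~= a */
    printf("a -> slot %d\n", dedup_write(0, a));
    int s = dedup_write(1, b);
    printf("b -> slot %d (deduplicated, refcnt=%d)\n", s, refcnt[s]);
    return 0;
}
```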
RESERVATION ARCHITECTURE FOR OVERCOMMITTED MEMORY
Various systems and methods for computer memory overcommitment management are described herein. A system for computer memory management includes a memory device to store data and a mapping table, and memory overcommitment circuitry to: receive a signal to move data in a first block from a memory reduction area in the memory device to a non-memory reduction area in the memory device, the memory reduction area storing data using a memory reduction technique, and the non-memory reduction area storing data without any memory reduction technique; allocate a second block in the non-memory reduction area; copy the data in the first block to the second block; and update the mapping table to revise a pointer to point to the second block, the mapping table being used to store pointers to locations in both the memory reduction area and the non-memory reduction area of the memory device.
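A minimal sketch of the move-out path, with the mapping table reduced to an array of pointers and heap allocation standing in for the non-memory reduction area; everything here is illustrative rather than the patented circuitry:

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define BLOCK 16

int main(void)
{
    char reduced[BLOCK]    = "compressed-data";  /* memory reduction area  */
    char *mapping_table[1] = { reduced };        /* pointer into that area */

    /* Signal received: move block 0 out of the reduction area. */
    char *normal = malloc(BLOCK);                /* non-reduction area     */
    if (!normal) return 1;
    memcpy(normal, mapping_table[0], BLOCK);     /* copy first ...          */
    mapping_table[0] = normal;                   /* ... then revise pointer */

    printf("block 0 now at %p: %s\n", (void *)mapping_table[0], mapping_table[0]);
    free(normal);
    return 0;
}
```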
MEMORY MANAGEMENT UNIT PERFORMANCE THROUGH CACHE OPTIMIZATIONS FOR PARTIALLY LINEAR PAGE TABLES OF FRAGMENTED MEMORY
An MMU may read page descriptors (using virtual addresses as an index) in a burst mode from page tables in a system memory. The page descriptors may include intermediate physical addresses (IPAs, in stage 1) and corresponding physical addresses (PAs, in stage 2). The virtual address, in conjunction with a page table base address register, is used to index the page descriptors in main memory. The MMU may identify a first group of contiguous IPAs beginning at a base IPA and a second group of contiguous IPAs beginning at an offset from the base IPA. The first and second groups may be separated by at least one IPA not contiguous with either the first or second group. The MMU may read a first PA from the page tables that corresponds to the base IPA. The MMU may store an entry in a buffer that includes the first PA and a first linearity tag.
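The linearity idea can be sketched as follows, assuming 4 KiB pages and stage-2 mappings that are linear within each run: a burst of per-page (IPA, PA) descriptors collapses into buffer entries of {base IPA, base PA, run length}, so translations inside a run can be computed by offset instead of walked individually:

```c
#include <stdint.h>
#include <stdio.h>

struct run { uint64_t base_ipa, base_pa; unsigned len; };  /* linearity tag */

/* Collapse a burst of per-page (IPA, PA) pairs into linear runs. */
static int find_runs(const uint64_t *ipa, const uint64_t *pa, int n,
                     struct run *out)
{
    int r = -1;
    for (int i = 0; i < n; i++) {
        if (r >= 0 &&
            ipa[i] == out[r].base_ipa + (uint64_t)out[r].len * 0x1000 &&
            pa[i]  == out[r].base_pa  + (uint64_t)out[r].len * 0x1000) {
            out[r].len++;                       /* extends the current run */
        } else {
            out[++r] = (struct run){ ipa[i], pa[i], 1 };
        }
    }
    return r + 1;
}

int main(void)
{
    /* Two contiguous groups separated by one non-contiguous page. */
    uint64_t ipa[] = {0x10000, 0x11000, 0x12000, 0x50000, 0x20000, 0x21000};
    uint64_t pa[]  = {0x90000, 0x91000, 0x92000, 0xC0000, 0xA0000, 0xA1000};
    struct run runs[6];
    int n = find_runs(ipa, pa, 6, runs);
    for (int i = 0; i < n; i++)
        printf("run: ipa=0x%llx pa=0x%llx pages=%u\n",
               (unsigned long long)runs[i].base_ipa,
               (unsigned long long)runs[i].base_pa, runs[i].len);
    return 0;   /* 3 buffer entries instead of 6 individual walks */
}
```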
HYPERVISOR DEDUPLICATION PAGE COMPARISON SPEEDUP
A hypervisor deduplication system includes a memory, a processor in communication with the memory, and a hypervisor executing on the processor. The hypervisor is configured to scan a first page, detect that the first page is an unchanged page, check a first free page hint, and insert the unchanged page into a tree. Responsive to inserting the unchanged page into the tree, the hypervisor compares the unchanged page to other pages in the tree and determines a status of the unchanged page as matching one of the other pages or mismatching the other pages in the tree. Responsive to determining the status of the page as matching another page, the hypervisor deduplicates the unchanged page. Additionally, the hypervisor is configured to scan a second page of the memory, check a second free page hint, and deduplicate the second page if the free page hint indicates the page is unused.
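A toy version of the scan loop, with the stable-page tree stood in for by a flat array keyed on page contents and the free page hint short-circuiting the insert-and-compare step; sizes and names are invented:

```c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define PAGE 16
#define MAX_TREE 8

static char tree[MAX_TREE][PAGE];
static int  tree_n;

static bool dedup_scan(const char *page, bool unchanged, bool free_hint)
{
    if (free_hint) return true;         /* unused page: dedup immediately   */
    if (!unchanged) return false;       /* still changing: skip this pass   */
    for (int i = 0; i < tree_n; i++)
        if (memcmp(tree[i], page, PAGE) == 0)
            return true;                /* matches a tree page: deduplicate */
    memcpy(tree[tree_n++], page, PAGE); /* mismatch: keep it for next pass  */
    return false;
}

int main(void)
{
    char p1[PAGE] = "zeroed page....";
    char p2[PAGE] = "zeroed page....";   /* identical content */
    char p3[PAGE] = "scratch page...";
    printf("p1 dedup=%d\n", dedup_scan(p1, true,  false)); /* 0: inserted     */
    printf("p2 dedup=%d\n", dedup_scan(p2, true,  false)); /* 1: matches p1   */
    printf("p3 dedup=%d\n", dedup_scan(p3, false, true));  /* 1: free hint    */
    return 0;
}
```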
PROCESS ISOLATION FOR OUT OF PROCESS PAGE FAULT HANDLING
A system and method relate to detecting a hardware event; determining a first virtual memory address associated with the hardware event, wherein the first virtual memory address is associated with a first processing thread; identifying, using the first virtual memory address, an entry of a logical address table, the entry comprising a file descriptor and a file offset associated with a file; identifying a memory address table associated with the file descriptor; translating, using the memory address table, the file offset into a second virtual memory address associated with a second processing thread; and transmitting, to the second processing thread, a notification comprising the second virtual memory address.
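The address hand-off might look like the following, assuming both tables are simple arrays; the file descriptor, offsets, and base addresses are made-up values for illustration:

```c
/* A faulting VA in thread 1 resolves to {fd, offset} via the logical
 * address table; the fd's memory address table maps that offset to a VA
 * in the handler thread 2, which receives the notification. */
#include <stdint.h>
#include <stdio.h>

struct lat_entry { uint64_t va; int fd; uint64_t offset; }; /* logical addr table */
struct mat_entry { int fd; uint64_t base_va2; };            /* per-fd addr table  */

int main(void)
{
    struct lat_entry lat[] = {{ 0x7f0000001000, 5, 0x3000 }};
    struct mat_entry mat[] = {{ 5, 0x7f1000000000 }};
    int nlat = 1, nmat = 1;

    uint64_t fault_va = 0x7f0000001000;          /* hardware event in thread 1 */
    for (int i = 0; i < nlat; i++) {
        if (lat[i].va != fault_va) continue;
        for (int j = 0; j < nmat; j++) {
            if (mat[j].fd != lat[i].fd) continue;
            uint64_t va2 = mat[j].base_va2 + lat[i].offset;
            printf("notify thread 2: handle fault at VA 0x%llx\n",
                   (unsigned long long)va2);     /* thread 1 stays isolated */
        }
    }
    return 0;
}
```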
Predicting page migration granularity for heterogeneous memory systems
Systems, apparatuses, and methods for predicting page migration granularities for phases of an application executing on a non-uniform memory access (NUMA) system architecture are disclosed herein. A system with a plurality of processing units and memory devices executes a software application. The system identifies a plurality of phases of the application based on one or more characteristics (e.g., memory access pattern) of the application. The system predicts which page migration granularity will maximize performance for each phase of the application. The system performs a page migration at a first page migration granularity during a first phase of the application based on a first prediction. The system performs a page migration at a second page migration granularity during a second phase of the application based on a second prediction, wherein the second page migration granularity is different from the first page migration granularity.
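A rough sketch of per-phase prediction, assuming a phase is characterized only by how sequential its recent page accesses were: streaming phases are predicted to benefit from migrating whole 2 MiB regions, sparse ones from single 4 KiB pages. The threshold, heuristic, and names are invented for illustration and are not the patent's prediction mechanism:

```c
#include <stdio.h>

#define PAGE_4K (4u << 10)
#define PAGE_2M (2u << 20)

/* Fraction of accesses that touched the page right after the previous one. */
static double sequentiality(const unsigned *pages, int n)
{
    int seq = 0;
    for (int i = 1; i < n; i++)
        if (pages[i] == pages[i - 1] + 1) seq++;
    return n > 1 ? (double)seq / (n - 1) : 0.0;
}

static unsigned predict_granularity(const unsigned *pages, int n)
{
    return sequentiality(pages, n) > 0.5 ? PAGE_2M : PAGE_4K;
}

int main(void)
{
    unsigned phase1[] = {10, 11, 12, 13, 14, 15};   /* streaming phase */
    unsigned phase2[] = {10, 90, 4, 77, 23, 50};    /* random phase    */
    printf("phase 1: migrate at %u-byte granularity\n",
           predict_granularity(phase1, 6));
    printf("phase 2: migrate at %u-byte granularity\n",
           predict_granularity(phase2, 6));
    return 0;
}
```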