Patent classifications
G06F2212/681
Hardware for split data translation lookaside buffers
Systems, methods, and apparatuses relating to hardware for split data translation lookaside buffers. In one embodiment, a processor includes a decode circuit to decode instructions into decoded instructions, an execution circuit to execute the decoded instructions, and a memory circuit comprising a load data translation lookaside buffer circuit and a store data translation lookaside buffer circuit separate and distinct from the load data translation lookaside buffer circuit, wherein the memory circuit sends a memory access request of the instructions to the load data translation lookaside buffer circuit when the memory access request is a load data request and to the store data translation lookaside buffer circuit when the memory access request is a store data request to determine a physical address for a virtual address of the memory access request.
ZERO LATENCY PREFETCHING IN CACHES
This invention involves a cache system in a digital data processing apparatus including: a central processing unit core; a level one instruction cache; and a level two cache. The cache lines in the second level cache are twice the size of the cache lines in the first level instruction cache. The central processing unit core requests additional program instructions when needed via a request address. Upon a miss in the level one instruction cache that causes a hit in the upper half of a level two cache line, the level two cache supplies the upper half level cache line to the level one instruction cache. On a following level two cache memory cycle, the level two cache supplies the lower half of the cache line to the level one instruction cache. This cache technique thus prefetches the lower half level two cache line employing fewer resources than an ordinary prefetch.
Memory management device capable of managing memory address translation table using heterogeneous memories and method of managing memory address thereby
Provided are a GPU and a method of managing a memory address thereby. TLBs are configured using an SRAM and an STT-MRAM so that a storage capacity is significantly improved compared to a case where a TLB is configured using an SRAM. Accordingly, the page hit rate of a TLB can be increased, thereby improving the throughput of a device. Furthermore, after a PTE is first written in the SRAM, a PTE having high frequency of use is selected and moved to the STT-MRAM. Accordingly, an increase in TLB update time which may occur due to the use of the STT-MRAM having a low write speed can be prevented. Furthermore, a read time and read energy consumption can be reduced.
Poison Mechanisms for Deferred Invalidates
An apparatus includes multiple processors including respective cache memories, the cache memories configured to cache cache-entries for use by the processors. At least a processor among the processors includes cache management logic that is configured to (i) receive, from one or more of the other processors, cache-invalidation commands that request invalidation of specified cache-entries in the cache memory of the processor (ii) mark the specified cache-entries as intended for invalidation but defer actual invalidation of the specified cache-entries, and (iii) upon detecting a synchronization event associated with the cache-invalidation commands, invalidate the cache-entries that were marked as intended for invalidation.
LEVEL-AWARE CACHE REPLACEMENT
An electronic device includes one or more processors and a cache that stores data entries. The electronic device transmits a request for translation of a first address to the cache. In accordance with a determination that the request is not satisfied by the data entries in the cache, the electronic device transmits the request to memory that is distinct from the cache, and receives data including a second address corresponding to the first address. In accordance with a determination that the data does not satisfy cache promotion criteria, the electronic device replaces an entry at a first priority level in the cache with the data. In accordance with a determination that the data satisfies the cache promotion criteria, the electronic device replaces an entry at a second priority level that is a higher priority level than the first priority level in the cache with the data including the second address.
Gathering translation entry invalidation requests in a data processing system
An arbiter gathers translation invalidation requests assigned to state machines of a lower-level cache into a set for joint handling in a processor core. The gathering includes selection of one of the set of gathered translation invalidation requests as an end-of-sequence (EOS) request. The arbiter issues to the processor core a sequence of the gathered translation invalidation requests terminating with the EOS request. Based on receipt of each of the gathered requests, the processor core invalidates any translation entries providing translation for the addresses specified by the translation invalidation requests and marks memory-referent requests dependent on the invalidated translation entries. Based on receipt of the EOS request and in response to all of the marked memory-referent requests draining from the processor core, the processor core issues a completion request to the lower-level cache indicating completion of servicing by the processor core of the set of gathered translation invalidation requests.
Address translation in a data processing apparatus
Apparatus for data processing and a method of data processing are provided. Address translation storage stores address translations between first set addresses and second set addresses, and responds to a request comprising a first set address to return a response comprising a second set address if the required address translation is currently stored therein. If it is not the request is forwarded towards memory in a memory hierarchy. A pending request storage stores entries for received requests and in response to reception of the request, if a stored entry for a previous request indicates that the previous request has been forwarded towards the memory and an expected response to the previous request will provide the address translation, intercepts the request to delay its reception by the address translation storage. Bandwidth pressure on the address translation storage is thus relieved.
VIRTUALIZED SYSTEM AND METHOD OF PREVENTING MEMORY CRASH OF SAME
A virtualized system is provided. The virtualized system includes: a memory device; a processor configured to provide a virtualization environment; a direct memory access device configured to perform a function of direct memory access to the memory device; and a memory management circuit configured to manage a core access of the processor to the memory device and a direct access of the direct memory access device to the memory device. The processor is further configured to provide: a plurality of guest operating systems that run independently from each other on a plurality of virtual machines of the virtualization environment; and a hypervisor configured to control the plurality of virtual machines in the virtualization environment and control the memory management circuit to block the direct access when a target guest operating system controlling the direct memory access device, among the plurality of guest operating systems is rebooted.
MEMORY DEVICE AND METHOD FOR ACCESSING MEMORY DEVICE
The invention provides a memory device including a memory array, an internal memory, and a processor. The memory array stores node mapping tables for access data in the memory array. The internal memory includes a cached mapping table area and has a root mapping table. The processor determines whether a first node mapping table of the node mapping tables is temporarily stored in the cached mapping table area according to the root mapping table. In response to the first node mapping table is temporarily stored in the cached mapping table area, the processor accesses data according to the first node mapping table in the cached mapping table area, marks the modified first node mapping table through an asynchronous index identifier, and writes back the modified first node mapping table from the cached mapping table area to the memory array.
HOT PAGE DETECTION BY SAMPLING TLB RESIDENCY
The disclosed technology provides for an improved memory tiering arrangement. In one aspect, an apparatus includes a sampling register and logic, responsive to sequential read requests, to read page data entries stored in successive locations in a TLB and provide page data from the page data entries as sequential outputs of the sampling register. In another aspect, a method includes generating a page residency list based on scanning, via a sampling register, page data entries stored in successive locations in a TLB, determining, for each page, whether the respective page is a hot page or a cold page based on the page residency list, and assigning hot pages to a first memory tier and cold pages to a second memory tier. Scanning page data entries stored in the TLB can include issuing a sequence of read requests to the sampling register sufficient to read all entries in the TLB.