Patent classifications
G06F2212/681
Cache arbitration for address translation requests
Techniques are disclosed relating to caching for address translation. In some embodiments, address translation circuitry is configured to process requests to translate addresses in a first address space to addresses in a second address space. The translation circuitry may include cache circuitry configured to store translation information, arbitration circuitry configured to arbitrate among ready requests for access to entries of the cache, and hazard circuitry. The hazard circuitry may assign a first request a ready status for the arbitration circuitry based on detection of an absence of hazards for a first address of the first request, and may add a second request to a queue of requests for the arbitration circuitry based on detection of a hazard for a second address of the second request. Independent arbitration for requests without hazards may improve performance in various aspects, relative to traditional techniques.
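As a rough software illustration of the hazard routing described in this abstract, the sketch below marks a request ready when no hazard is detected and queues it otherwise. It is a minimal model under stated assumptions, not the claimed circuitry: a hazard is assumed to mean "another request to the same page is still outstanding", the page size and the FIFO grant policy are assumptions, and the names `HazardRouter`, `submit`, `arbitrate`, and `complete` are hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class HazardRouter:
    """Toy model: hazard-free requests become ready, others wait in a queue."""

    def __init__(self):
        self.in_flight = set()  # pages with an outstanding request
        self.ready = []         # requests eligible for arbitration
        self.waiting = []       # requests held back by a hazard

    def submit(self, req_id, addr):
        page = addr >> PAGE_SHIFT
        if page in self.in_flight:
            # Hazard detected (assumed: same page already outstanding).
            self.waiting.append((req_id, page))
        else:
            self.in_flight.add(page)
            self.ready.append((req_id, page))

    def arbitrate(self):
        """Grant the oldest ready request (FIFO policy is an assumption)."""
        return self.ready.pop(0) if self.ready else None

    def complete(self, page):
        """Clear the hazard on a page and promote its oldest waiter, if any."""
        self.in_flight.discard(page)
        for i, (req_id, p) in enumerate(self.waiting):
            if p == page:
                del self.waiting[i]
                self.in_flight.add(page)
                self.ready.append((req_id, p))
                break


router = HazardRouter()
router.submit("a", 0x1000)
router.submit("b", 0x1008)  # same page as "a": queued behind the hazard
router.submit("c", 0x5000)  # different page: ready immediately
```

In this model request "c" can be granted without waiting for "b", which is the kind of independent arbitration the abstract credits with the performance benefit.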
Access frequency caching hardware structure
An access frequency caching hardware structure has entries each storing an access frequency counter indicative of a frequency of accesses to a corresponding page of a memory address space. Access frequency tracking circuitry is responsive to a given memory access request requesting access to a target page, to determine whether the access frequency caching hardware structure already includes a corresponding entry which is valid and corresponds to the target page. When the structure includes the corresponding entry, a corresponding access frequency counter specified by the corresponding entry is incremented. In response to a counter writeback event associated with a selected access frequency counter corresponding to a selected page, an update is made to a global access frequency counter corresponding to the selected page within a global access frequency tracking data structure stored in the memory system.
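The counting and writeback flow in this abstract can be sketched in software as follows. The abstract only states that a hit increments the cached counter and that a writeback event updates the global counter; the allocation on a miss, the eviction of the oldest-allocated entry as the writeback trigger, and the names `AccessFrequencyCache`, `access`, and `writeback` are assumptions made for illustration.

```python
class AccessFrequencyCache:
    """Toy model of a small counter cache backed by a global counter table."""

    def __init__(self, capacity, global_counters):
        self.capacity = capacity
        self.entries = {}  # page -> cached access counter
        # A dict stands in for the global tracking structure held in memory.
        self.global_counters = global_counters

    def access(self, page):
        if page in self.entries:
            self.entries[page] += 1  # valid matching entry: increment
            return
        if len(self.entries) >= self.capacity:
            # Assumption: eviction of the oldest-allocated entry is the
            # counter writeback event.
            victim = next(iter(self.entries))
            self.writeback(victim)
        self.entries[page] = 1

    def writeback(self, page):
        """Fold the cached count into the global counter for that page."""
        count = self.entries.pop(page)
        self.global_counters[page] = self.global_counters.get(page, 0) + count


global_table = {}
cache = AccessFrequencyCache(capacity=2, global_counters=global_table)
for page in (1, 1, 2, 3):
    cache.access(page)
```

After these four accesses, page 1 has been evicted and its count of 2 folded into `global_table`, while pages 2 and 3 remain cached with a count of 1 each.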
Translation bandwidth optimized prefetching strategy through multiple translation lookaside buffers
A computer system includes a processor and a prefetch engine. The processor is configured to generate a demand access stream. The prefetch engine is configured to generate a first prefetch request and a second prefetch request based on the demand access stream, to output the first prefetch request to a first translation lookaside buffer (TLB), and to output the second prefetch request to a second TLB that is different from the first TLB. The processor performs a first TLB lookup in the first TLB based on one of the demand access stream or the first prefetch request, and performs a second TLB lookup in the second TLB based on the second prefetch request.
HARDWARE TRANSLATION REQUEST RETRY MECHANISM
A processing system includes a hardware translation lookaside buffer (TLB) retry loop that retries virtual memory address to physical memory address translation requests from a software client independent of a command from the software client. In response to a retry response notification at the TLB, a controller of the TLB waits for a programmable delay period and then retries the request without involvement from the software client. After a retry results in a hit at the TLB, the controller notifies the software client of the hit. Alternatively, if a retry results in an error at the TLB, the controller notifies the software client of the error and the software client initiates error handling.
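The retry loop in this abstract might be modelled as below. This is a sketch under assumptions: `lookup` is a hypothetical callable standing in for the TLB and returns a `(status, physical_address)` pair, and the `max_retries` bound is added so the example terminates; the abstract does not state a retry limit.

```python
import time

HIT, RETRY, ERROR = "hit", "retry", "error"


def translate_with_retry(lookup, vaddr, delay_s=0.0, max_retries=8):
    """Retry a translation without involving the client until hit or error."""
    for _ in range(max_retries + 1):
        status, paddr = lookup(vaddr)
        if status == HIT:
            return HIT, paddr    # client is notified of the hit
        if status == ERROR:
            return ERROR, None   # client is notified and handles the error
        time.sleep(delay_s)      # programmable delay before the next retry
    return ERROR, None           # assumption: give up after max_retries


# Hypothetical TLB that answers "retry" twice, then hits.
responses = iter([(RETRY, None), (RETRY, None), (HIT, 0x8000)])
result = translate_with_retry(lambda vaddr: next(responses), 0x1000)
```

The client only sees the final outcome, which mirrors the abstract's point that intermediate retries happen without a command from the software client.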
STLB prefetching for a multi-dimension engine
A multi-dimension engine, connected to a system TLB, generates sequences of addresses to request page address translation prefetch requests in advance of predictable accesses to elements within data arrays. Prefetch requests are filtered to avoid redundant requests of translations to the same page. Prefetch requests run ahead of data accesses but are tethered to within a reasonable range. The number of pending prefetches is limited. A system TLB stores a number of translations, the number being relative to the dimensions of the range of elements accessed from within the data array.
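The filtering and the pending limit in this abstract can be illustrated with a small model. The page size, the limit value, the choice to compare against the last issued page and the pending set, and the names `PrefetchFilter`, `request`, and `complete` are all assumptions; the abstract does not say how the filter is built.

```python
from collections import deque


class PrefetchFilter:
    """Toy model: drop same-page prefetches and cap the number pending."""

    def __init__(self, page_size=4096, max_pending=4):
        self.page_size = page_size
        self.max_pending = max_pending
        self.pending = deque()  # pages requested but not yet answered
        self.last_page = None   # most recently issued page

    def request(self, addr):
        """Return the page to prefetch, or None if filtered or throttled."""
        page = addr // self.page_size
        if page == self.last_page or page in self.pending:
            return None         # redundant: translation already requested
        if len(self.pending) >= self.max_pending:
            return None         # throttled: too many prefetches pending
        self.pending.append(page)
        self.last_page = page
        return page

    def complete(self):
        """Retire the oldest pending prefetch and return its page."""
        return self.pending.popleft() if self.pending else None


pf = PrefetchFilter(page_size=4096, max_pending=2)
issued = [pf.request(a) for a in (0, 64, 4096, 8192, 4100)]
```

Here the requests for addresses 64 and 4100 are filtered as same-page duplicates, and the request for 8192 is held back until a pending prefetch completes.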
OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING SIMULTANEOUS TWO-TARGET LOADS AND STORES
Operation of a multi-slice processor that includes a plurality of execution slices and a load/store superslice, where the load/store superslice includes a set predict array, a first load/store slice, and a second load/store slice. Operation of such a multi-slice processor includes: receiving a two-target load instruction directed to the first load/store slice and a store instruction directed to the second load/store slice; determining a first subset of ports of the set predict array as inputs for an effective address for the two-target load instruction; determining a second subset of ports of the set predict array as inputs for an effective address for the store instruction; and generating, in dependence upon logic corresponding to the set predict array that is less than logic implementing an entire load/store slice, output for performing the two-target load instruction in parallel with generating output for performing the store instruction.
TLB ACCESS MONITORING
An apparatus includes circuitry couplable to a host system and a memory device. The circuitry is configured to determine whether a page table maintained on the circuitry includes a physical address of the memory device corresponding to a virtual address associated with a TLB fill request from the host system. Responsive to determining that the page table includes the physical address, the circuitry provides signaling indicative of a completion to the TLB fill request to the host system, prefetches a page of data at the physical address from the memory device using the physical address from the page table, and provides signaling indicative of the page of data to the host system.
Using Multiple Memory Elements in an Input-Output Memory Management Unit for Performing Virtual Address to Physical Address Translations
The described embodiments include an input-output memory management unit (IOMMU) with two or more memory elements and a controller. The controller is configured to select, based on one or more factors, one or more selected memory elements from among the two or more memory elements for performing virtual address to physical address translations in the IOMMU. The controller then performs the virtual address to physical address translations using the one or more selected memory elements.
Management method of virtual-to-physical address translation system using part of bits of virtual address as index
A management method of a virtual-to-physical address translation system includes the following steps: providing a first storage space, wherein the first storage space includes a plurality of buffer entries; providing a second storage space, wherein the second storage space includes a plurality of translation entries, and the translation entries correspond to a plurality of translation indices; and when receiving a write instruction to write a first virtual-to-physical address translation into a specific buffer entry of the buffer entries, storing the first virtual-to-physical address translation in a write translation entry of the translation entries according to a first part of bits of a first virtual address corresponding to the first virtual-to-physical address translation, and storing the first virtual address and a write translation index corresponding to the write translation entry in the specific buffer entry.
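The write step of this method can be sketched as follows. Which bits of the virtual address form the index is not stated in the abstract, so the sketch assumes the low bits of the virtual page number; the page size, the index width, and the names `TranslationStore`, `write`, and `read` are likewise hypothetical, and index collisions are ignored.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages
INDEX_BITS = 4   # assumption: 16 translation entries


class TranslationStore:
    """Toy model of the two storage spaces described in the abstract."""

    def __init__(self, num_buffer_entries):
        # First storage space: buffer entries holding (vaddr, index).
        self.buffer = [None] * num_buffer_entries
        # Second storage space: translation entries selected by index.
        self.translations = [None] * (1 << INDEX_BITS)

    def write(self, buffer_slot, vaddr, paddr):
        # Index derived from part of the virtual address bits (assumed:
        # the low INDEX_BITS bits of the virtual page number).
        index = (vaddr >> PAGE_SHIFT) & ((1 << INDEX_BITS) - 1)
        self.translations[index] = paddr
        self.buffer[buffer_slot] = (vaddr, index)

    def read(self, buffer_slot):
        """Follow the stored index from a buffer entry to its translation."""
        vaddr, index = self.buffer[buffer_slot]
        return vaddr, self.translations[index]


store = TranslationStore(num_buffer_entries=2)
store.write(0, 0x12345000, 0xABC000)
```

With these assumed widths, virtual address 0x12345000 has virtual page number 0x12345, so its translation lands in translation entry 5 and buffer entry 0 records that index alongside the virtual address.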
CACHE MEMORY DEVICE AND DATA CACHE METHOD
A cache memory device is provided in the disclosure. The cache memory device includes a first AGC, a compression circuit, a second AGC, a virtual tag array, and a comparator circuit. The first AGC generates a virtual address based on a load instruction. The compression circuit obtains the higher part of the virtual address and generates a target hash value based on the higher part of the virtual address. The second AGC generates the lower part of the virtual address based on the load instruction. The virtual tag array obtains the lower part and selects a set of memory units. The comparator circuit compares the target hash value to a hash value stored in each memory unit of the set of memory units. When the comparator circuit generates a virtual tag miss signal, the comparator circuit transmits the virtual tag miss signal to the reservation station.
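The hashed virtual tag lookup in this abstract might look like the following in software. The bit widths, the XOR-folding hash, and the names `compress`, `VirtualTagArray`, `fill`, and `lookup` are assumptions for illustration; the abstract does not specify the hash function or the array geometry.

```python
OFFSET_BITS = 6  # assumption: 64-byte lines
SET_BITS = 6     # assumption: 64 sets
HASH_BITS = 8    # assumption: 8-bit compressed tag


def compress(upper):
    """XOR-fold the higher address part into HASH_BITS bits (assumed hash)."""
    h = 0
    while upper:
        h ^= upper & ((1 << HASH_BITS) - 1)
        upper >>= HASH_BITS
    return h


class VirtualTagArray:
    """Toy model: the lower part picks a set, hashes are compared per way."""

    def __init__(self, ways=4):
        self.sets = [[None] * ways for _ in range(1 << SET_BITS)]

    def _index_and_hash(self, vaddr):
        set_index = (vaddr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
        target = compress(vaddr >> (OFFSET_BITS + SET_BITS))
        return set_index, target

    def fill(self, vaddr, way):
        set_index, target = self._index_and_hash(vaddr)
        self.sets[set_index][way] = target

    def lookup(self, vaddr):
        """Return the matching way, or None to model a virtual tag miss."""
        set_index, target = self._index_and_hash(vaddr)
        for way, stored in enumerate(self.sets[set_index]):
            if stored == target:
                return way
        return None


tags = VirtualTagArray(ways=2)
first = tags.lookup(0x12345678)   # nothing stored yet: miss
tags.fill(0x12345678, way=1)
second = tags.lookup(0x12345678)  # same address now matches way 1
```

Because the stored value is a hash rather than the full tag, two different addresses can match the same entry; a real design would need a later full-tag check, which this sketch omits.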