G06F2212/681

ADDRESS TRANSLATION CACHE AND SYSTEM INCLUDING THE SAME
20230169013 · 2023-06-01

An address translation cache (ATC) is configured to store translation entries indicating mapping information between a virtual address and a physical address of a memory device. The ATC includes a plurality of flexible page group caches, a shared cache and a cache manager. Each flexible page group cache stores translation entries corresponding to a page size allocated to the flexible page group cache. The shared cache stores, regardless of page sizes, translation entries that are not stored in the plurality of flexible page group caches. The cache manager allocates a page size to each flexible page group cache, manages cache page information on the page sizes allocated to the plurality of flexible page group caches, and controls the plurality of flexible page group caches and the shared cache based on the cache page information.
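The division of labor between the flexible page group caches and the shared cache can be illustrated with a minimal Python sketch. The class name, the method names, the dictionary-based storage and the first-in-first-out eviction are all assumptions made for illustration; the abstract does not specify them.

```python
class AddressTranslationCache:
    """Toy model: per-page-size group caches plus a shared cache (all names hypothetical)."""

    def __init__(self, num_groups, group_capacity, shared_capacity):
        self.page_size_of = [None] * num_groups      # the "cache page information"
        self.groups = [{} for _ in range(num_groups)]
        self.shared = {}                             # keyed by (page_size, vpn)
        self.group_capacity = group_capacity
        self.shared_capacity = shared_capacity

    def allocate(self, index, page_size):
        """Cache manager: assign a page size to one group cache and flush it."""
        self.page_size_of[index] = page_size
        self.groups[index].clear()

    @staticmethod
    def _put(store, key, value, capacity):
        # Assumed policy: evict the oldest entry when the store is full.
        if key not in store and len(store) >= capacity:
            del store[next(iter(store))]
        store[key] = value

    def insert(self, vaddr, paddr, page_size):
        vpn, base = vaddr // page_size, paddr - paddr % page_size
        for index, size in enumerate(self.page_size_of):
            if size == page_size:
                self._put(self.groups[index], vpn, base, self.group_capacity)
                return "group"
        # No group cache holds this page size, so the entry goes to the shared cache.
        self._put(self.shared, (page_size, vpn), base, self.shared_capacity)
        return "shared"

    def lookup(self, vaddr):
        for index, size in enumerate(self.page_size_of):
            if size is not None and vaddr // size in self.groups[index]:
                return self.groups[index][vaddr // size] + vaddr % size
        for (size, vpn), base in self.shared.items():
            if vaddr // size == vpn:
                return base + vaddr % size
        return None  # miss
```

In this sketch a 4 KiB translation lands in a group cache once that page size has been allocated, while a 2 MiB translation with no matching group falls back to the shared cache.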

FAULTING ADDRESS PREDICTION FOR PREFETCH TARGET ADDRESS

An apparatus comprises memory management circuitry to perform a translation table walk for a target address of a memory access request and to signal a fault in response to the translation table walk identifying a fault condition for the target address; prefetch circuitry to generate a prefetch request to request prefetching of information associated with a prefetch target address to a cache; and faulting address prediction circuitry to predict whether the memory management circuitry would identify the fault condition for the prefetch target address if the translation table walk were performed by the memory management circuitry for the prefetch target address. In response to a prediction that the fault condition would be identified for the prefetch target address, the prefetch circuitry suppresses the prefetch request and the memory management circuitry prevents the translation table walk being performed for the prefetch target address of the prefetch request.
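A minimal Python sketch of the suppression behavior follows. The abstract does not say how the prediction is formed; remembering pages that faulted on an earlier walk is an assumption made here, as are the class names, the 4 KiB page size and the dictionary page table.

```python
PAGE_SIZE = 4096  # assumed page size


class Mmu:
    """Walks a toy page table (dict: virtual page number -> physical page number)."""

    def __init__(self, page_table):
        self.page_table = page_table
        self.walks = 0  # counts translation table walks actually performed

    def walk(self, addr):
        self.walks += 1
        return self.page_table.get(addr // PAGE_SIZE)  # None models a fault


class FaultingAddressPredictor:
    """Assumed policy: predict a fault for any page that faulted before."""

    def __init__(self):
        self.faulting_pages = set()

    def record_fault(self, addr):
        self.faulting_pages.add(addr // PAGE_SIZE)

    def predicts_fault(self, addr):
        return addr // PAGE_SIZE in self.faulting_pages


def prefetch(addr, mmu, predictor):
    """Issue a prefetch unless a fault is predicted for its target address."""
    if predictor.predicts_fault(addr):
        return "suppressed"          # no prefetch request and no table walk
    if mmu.walk(addr) is None:
        predictor.record_fault(addr)
        return "faulted"
    return "prefetched"
```

The point of the sketch is that the second prefetch to a faulting page returns early, so the walk counter does not increase.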

TRANSLATION LOOKASIDE BUFFER SWITCH BANK

Example devices are disclosed. For example, a device may include a processor, a plurality of translation lookaside buffers, a plurality of switches, and a memory management unit. Each of the translation lookaside buffers may be assigned to a different process of the processor, each of the plurality of switches may include a register for storing a different process identifier, and each of the plurality of switches may be associated with a different one of the translation lookaside buffers. The memory management unit may be for receiving a virtual memory address and a process identifier from the processor and forwarding the process identifier to the plurality of switches. Each of the plurality of switches may be for connecting the memory management unit to a translation lookaside buffer associated with the switch when there is a match between the process identifier and the different process identifier stored by the register of the switch.
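The switch-bank selection can be sketched in Python as below. The `Switch` class, the `translate` function, the dictionary TLBs and the 4 KiB page size are hypothetical names and values chosen for illustration; in hardware all switches would compare the process identifier in parallel rather than in a loop.

```python
class Switch:
    """One switch: a process-identifier register plus its associated TLB."""

    def __init__(self, pid, tlb):
        self.pid_register = pid
        self.tlb = tlb  # dict: virtual page number -> physical page number

    def connect(self, pid):
        # The switch closes only when the forwarded PID matches its register.
        return self.tlb if pid == self.pid_register else None


def translate(switches, pid, vaddr, page_size=4096):
    """MMU side: forward the PID to every switch and use the TLB that connects."""
    for switch in switches:
        tlb = switch.connect(pid)
        if tlb is not None:
            ppn = tlb.get(vaddr // page_size)
            if ppn is None:
                return None  # TLB miss for this process
            return ppn * page_size + vaddr % page_size
    return None  # no switch holds this process identifier
```

Because each process has its own buffer in this model, the same virtual address resolves differently depending on which process identifier accompanies it.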

Graphics surface addressing

Techniques are disclosed relating to memory allocation for graphics surfaces. In some embodiments, graphics processing circuitry is configured to access a graphics surface based on an address in a surface space assigned to the graphics surface. In some embodiments, first translation circuitry is configured to translate address information for the surface space to address information in the virtual space based on one or more of the translation entries. In some embodiments, the graphics processing circuitry is configured to provide an address for the access to the graphics surface based on translation by the first translation circuitry and second translation circuitry configured to translate the address in the virtual space to an address in a physical space of a memory configured to store the graphics surface. The disclosed techniques may allow sparse allocation of large graphics surfaces, in various embodiments.

External memory based translation lookaside buffer
11243891 · 2022-02-08

Methods, devices, and systems for virtual address translation. A memory management unit (MMU) receives a request to translate a virtual memory address to a physical memory address and searches a translation lookaside buffer (TLB) for a translation to the physical memory address based on the virtual memory address. If the translation is not found in the TLB, the MMU searches an external memory translation lookaside buffer (EMTLB) for the physical memory address and performs a page table walk, using a page table walker (PTW), to retrieve the translation. If the translation is found in the EMTLB, the MMU aborts the page table walk and returns the physical memory address. If the translation is not found in the TLB and not found in the EMTLB, the MMU returns the physical memory address based on the page table walk.
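The lookup order described above can be sketched in Python as follows. The sketch runs sequentially, so checking the EMTLB before consulting the page table stands in for aborting a walk that hardware would have started concurrently. The function name, the dictionary structures, the returned source tag, and the refill of the TLB and EMTLB after a miss are assumptions not stated in the abstract.

```python
def translate(vaddr, tlb, emtlb, page_table, page_size=4096):
    """Return (physical address, source), or (None, "fault") if unmapped."""
    vpn, offset = divmod(vaddr, page_size)

    if vpn in tlb:
        return tlb[vpn] * page_size + offset, "tlb"

    if vpn in emtlb:
        # EMTLB hit: the page table walk is not needed (models the abort).
        ppn, source = emtlb[vpn], "emtlb"
    else:
        ppn, source = page_table.get(vpn), "walk"
        if ppn is None:
            return None, "fault"
        emtlb[vpn] = ppn  # assumed refill policy

    tlb[vpn] = ppn  # assumed refill policy
    return ppn * page_size + offset, source
```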

Cache replacement based on traversal tracking
11429535 · 2022-08-30

Techniques are disclosed relating to controlling cache replacement. In some embodiments, search control circuitry is configured to perform multiple searches of a data structure (e.g., page table walks) where searches traverse multiple links between elements of the data structure. In some embodiments, a traversal cache caches traversal information that is usable by searches to skip one or more links traversed by one or more prior searches. In some embodiments, tracking control circuitry stores tracking information in a first entry, where the tracking information indicates a location in the traversal cache at which prior traversal information for a first search is stored. In some embodiments, replacement control circuitry selects, based on the tracking information in the first entry of the tracking control circuitry, an entry in the traversal cache for new traversal information generated by the first search (which may include selecting the first entry to override a default replacement policy).
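One way to picture the tracking-based replacement is the Python sketch below. It assumes a round-robin default policy and a per-search tracking table keyed by a search identifier; both are illustrative choices, as are all names, and the abstract does not prescribe them.

```python
class TraversalCache:
    """Fixed number of slots with an assumed round-robin default replacement policy."""

    def __init__(self, size):
        self.slots = [None] * size
        self.next_victim = 0

    def insert(self, info, tracked_slot=None):
        if tracked_slot is not None:
            slot = tracked_slot            # override the default policy
        else:
            slot = self.next_victim
            self.next_victim = (slot + 1) % len(self.slots)
        self.slots[slot] = info
        return slot


def record(cache, tracking, search_id, info):
    """Store new traversal info for a search, reusing the slot that holds its prior info."""
    slot = cache.insert(info, tracking.get(search_id))
    tracking[search_id] = slot             # tracking entry remembers the location
    return slot
```

In this model, when a search that already has traversal information cached produces newer information (for example, for a deeper page table level), the newer information overwrites the older entry instead of evicting an unrelated one.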

TRANSLATION BANDWIDTH OPTIMIZED PREFETCHING STRATEGY THROUGH MULTIPLE TRANSLATION LOOKASIDE BUFFERS

A computer system includes a processor and a prefetch engine. The processor is configured to generate a demand access stream. The prefetch engine is configured to generate a first prefetch request and a second prefetch request based on the demand access stream, to output the first prefetch request to a first translation lookaside buffer (TLB), and to output the second prefetch request to a second TLB that is different from the first TLB. The processor performs a first TLB lookup in the first TLB based on one of the demand access stream or the first prefetch request, and performs a second TLB lookup in the second TLB based on the second prefetch request.

POWER OPTIMIZED PREFETCHING IN SET-ASSOCIATIVE TRANSLATION LOOKASIDE BUFFER STRUCTURE

A computer system includes a processor and a prefetch engine. The processor is configured to generate a demand access stream. The prefetch engine is configured to initiate a first prefetch request based on the demand access stream and perform a first prefetch that includes performing a translation lookaside buffer (TLB) lookup on a TLB structure in response to the first prefetch request. The processor determines a TLB entry in response to performing the TLB lookup and performs at least one second prefetch based on the TLB entry without performing a subsequent TLB lookup on the TLB structure.
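The lookup reuse can be sketched in Python as below: one TLB lookup supplies the physical page, and later prefetches in the run derive their addresses from that entry. The function name, the dictionary TLB, the fixed stride, and the rule of stopping at the page boundary are assumptions for illustration.

```python
def prefetch_run(vaddr, count, stride, tlb, page_size=4096):
    """Return (physical prefetch addresses, number of TLB lookups performed)."""
    vpn, offset = divmod(vaddr, page_size)
    ppn = tlb[vpn]          # the single TLB lookup for the first prefetch
    lookups = 1

    addresses = []
    for i in range(count):
        page_offset = offset + i * stride
        if page_offset >= page_size:
            break           # assumed: a new page would need a new translation
        # Later prefetches reuse the cached entry; no further TLB lookup.
        addresses.append(ppn * page_size + page_offset)
    return addresses, lookups
```

For a run that starts 128 bytes before the end of a page with a 64-byte stride, the sketch yields two physical addresses from a single lookup and then stops.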

Memory system with first cache for storing uncompressed look-up table segments and second cache for storing compressed look-up table segments

A memory system is connectable to a host. The memory system includes a nonvolatile first memory, a second memory in which a plurality of pieces of first information each correlating a logical address indicating a location in a logical address space of the memory system with a physical address indicating a location in the first memory are stored, a volatile third memory including a first cache and a second cache, a compressor configured to perform compression on the plurality of pieces of first information, and a memory controller. The memory controller stores the first information not compressed by the compressor in the first cache, stores the first information compressed by the compressor in the second cache, and controls a ratio between a first capacity, which is a capacity of the first cache, and a second capacity, which is a capacity of the second cache.
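A rough Python sketch of the two-cache arrangement is given below. The use of `zlib` as the compressor, byte-based capacities, oldest-first eviction and every name in the block are assumptions; the abstract does not say how the ratio is chosen, so the sketch only shows the ratio being applied.

```python
import zlib


class LookupTableCaches:
    """Toy model: uncompressed segments in a first cache, compressed ones in a second."""

    def __init__(self, total_bytes, first_share):
        self.total_bytes = total_bytes
        self.first = {}    # segment id -> raw bytes
        self.second = {}   # segment id -> compressed bytes
        self.set_ratio(first_share)

    def set_ratio(self, first_share):
        """Controller: split the volatile memory between the two caches."""
        self.first_capacity = int(self.total_bytes * first_share)
        self.second_capacity = self.total_bytes - self.first_capacity
        self._trim(self.first, self.first_capacity)
        self._trim(self.second, self.second_capacity)

    @staticmethod
    def _trim(cache, capacity):
        # Assumed policy: drop the oldest segments until the cache fits.
        while cache and sum(len(v) for v in cache.values()) > capacity:
            del cache[next(iter(cache))]

    def store(self, segment_id, raw, compress):
        if compress:
            self.second[segment_id] = zlib.compress(raw)
            self._trim(self.second, self.second_capacity)
        else:
            self.first[segment_id] = raw
            self._trim(self.first, self.first_capacity)

    def load(self, segment_id):
        if segment_id in self.first:
            return self.first[segment_id]
        if segment_id in self.second:
            return zlib.decompress(self.second[segment_id])
        return None  # would be fetched from the second memory
```

Shrinking the first cache's share evicts uncompressed segments that no longer fit, while compressed segments keep occupying far less space per segment.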

Facilitating efficient prefetching for scatter/gather operations

The disclosed embodiments relate to a computing system that facilitates performing prefetching for scatter/gather operations. During operation, the system receives a scatter/gather prefetch instruction at a processor core, wherein the scatter/gather prefetch instruction specifies a virtual base address and a plurality of offsets. Next, the system performs a lookup in a translation-lookaside buffer (TLB) using the virtual base address to obtain a physical base address that identifies a physical page for the base address. The system then sends the physical base address and the plurality of offsets to a cache. This enables the cache to perform prefetching operations for the scatter/gather instruction by adding the physical base address to the plurality of offsets to produce a plurality of physical addresses, and then prefetching cache lines for the plurality of physical addresses into the cache.
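The address arithmetic can be sketched in Python as follows. The function names, the dictionary TLB, the set that stands in for the cache, the 4 KiB page and the 64-byte line are assumptions; the sketch also assumes every offset stays within the physical page of the base address.

```python
def cache_prefetch(cache_lines, physical_base, offsets, line_size=64):
    """Cache side: add each offset to the physical base and fetch the containing line."""
    for offset in offsets:
        paddr = physical_base + offset
        cache_lines.add(paddr - paddr % line_size)  # line-aligned address


def scatter_gather_prefetch(virtual_base, offsets, tlb, cache_lines,
                            page_size=4096, line_size=64):
    """Core side: one TLB lookup for the base, then hand the offsets to the cache."""
    vpn, page_offset = divmod(virtual_base, page_size)
    physical_base = tlb[vpn] * page_size + page_offset
    cache_prefetch(cache_lines, physical_base, offsets, line_size)
    return physical_base
```

Only the base address is translated; the per-element addresses are produced inside the cache, so two offsets that fall in the same line result in a single prefetched line.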