G06F2212/683

DSB Operation with Excluded Region

Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor included in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even if one or more load/store operations directed to addresses within the exclusion region are still outstanding when the first processor sends that response.
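The completion check described above can be sketched in a few lines. This is a minimal illustrative model, not the disclosed hardware: the names `MemOp`, `in_region`, and `barrier_can_complete`, and the representation of the exclusion region as a base address plus size, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    """One outstanding load/store (hypothetical model)."""
    addr: int
    done: bool = False


def in_region(addr: int, base: int, size: int) -> bool:
    """True if addr falls inside [base, base + size)."""
    return base <= addr < base + size


def barrier_can_complete(outstanding, excl_base: int, excl_size: int) -> bool:
    """The barrier may be acknowledged once every operation that targets an
    address OUTSIDE the exclusion region is done. Operations inside the
    region are ignored, so they may still be in flight."""
    return all(
        op.done
        for op in outstanding
        if not in_region(op.addr, excl_base, excl_size)
    )


# Assumed exclusion region: 0x8000_0000 .. 0x8000_0FFF
ops = [MemOp(0x1000, done=True), MemOp(0x8000_0040, done=False)]
can_ack = barrier_can_complete(ops, 0x8000_0000, 0x1000)
```

Here `can_ack` is true even though the second operation has not finished, because its address lies inside the assumed exclusion region.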

Dynamic translation lookaside buffer (TLB) invalidation using virtually tagged cache for load/store operations

Translation lookaside buffer (TLB) invalidation using virtual addresses is provided. In response to a TLB invalidate instruction that specifies an input virtual address, a cache is searched for a matching virtual address. Based on a matching virtual address in the cache, the corresponding cache entry is invalidated. The load/store queue is searched for a set and a way corresponding to the set and the way of the invalidated cache entry. Based on an entry in the load/store queue matching the set and the way of the invalidated cache entry, the entry in the load/store queue is marked as pending. Indicating a completion of the TLB invalidate instruction is delayed until all pending entries in the load/store queue are complete.
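The sequence above (invalidate matching cache entries, mark load/store queue entries with the same set and way as pending, delay completion) can be modeled as follows. This is a sketch under stated assumptions: `CacheLine`, `LsqEntry`, `Core`, and the use of a page-granular virtual tag are hypothetical and chosen only to make the flow concrete.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    vtag: int            # virtual tag (assumed page-granular here)
    valid: bool = True


@dataclass
class LsqEntry:
    set_idx: int
    way: int
    complete: bool = False
    pending: bool = False


class Core:
    def __init__(self, num_sets: int, num_ways: int):
        self.cache = [[None] * num_ways for _ in range(num_sets)]
        self.lsq = []

    def tlb_invalidate(self, vtag: int) -> None:
        """Invalidate every cache line whose virtual tag matches, then mark
        load/store queue entries that use the same set and way as pending."""
        for s, ways in enumerate(self.cache):
            for w, line in enumerate(ways):
                if line is not None and line.valid and line.vtag == vtag:
                    line.valid = False
                    for entry in self.lsq:
                        if (entry.set_idx, entry.way) == (s, w):
                            entry.pending = True

    def tlbi_complete(self) -> bool:
        """Completion is signaled only after all pending entries finish."""
        return all(e.complete for e in self.lsq if e.pending)
```

A caller would poll `tlbi_complete()` after `tlb_invalidate()`; it stays false while any marked queue entry is still in flight.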

Reducing Translation Lookaside Buffer Searches for Splintered Pages
20220075734 · 2022-03-10

Systems, apparatuses, and methods for performing efficient translation lookaside buffer (TLB) invalidation operations for splintered pages are described. When a TLB receives an invalidation request for a specified translation context, and the invalidation request maps to an entry with a relatively large page size, the TLB does not know whether it also holds multiple translation entries for smaller splintered pages of that large page. The TLB therefore tracks, with a sticky bit per translation context, whether splintered pages have been installed for that context. If a TLB invalidate (TLBI) request is received and splintered pages have not been installed, no searches are needed for splintered pages. To refresh the sticky bits, whenever a full TLB search is performed, the TLB rescans for splintered pages for other translation contexts. If no splintered pages are found for a context, its sticky bit can be cleared, which reduces the number of full TLBI searches.
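A small software model may help show how the sticky bits avoid full searches. It is only a sketch: the class `SplinterTlb`, the flat entry list, and the convention that page numbers are expressed in small-page units are assumptions, not details from the abstract.

```python
class SplinterTlb:
    def __init__(self):
        self.entries = []        # (context, vpage, is_splintered)
        self.sticky = set()      # contexts whose sticky bit is set
        self.full_searches = 0   # counts expensive full walks

    def install(self, ctx: int, vpage: int, splintered: bool = False) -> None:
        self.entries.append((ctx, vpage, splintered))
        if splintered:
            self.sticky.add(ctx)  # remember that this context has splinters

    def invalidate(self, ctx: int, base: int, span: int) -> None:
        """Invalidate the large page covering [base, base + span)."""
        if ctx not in self.sticky:
            # Fast path: no splintered pages were ever installed for ctx,
            # so only the large-page entry itself can be present.
            self.entries = [e for e in self.entries if e[:2] != (ctx, base)]
            return
        # Slow path: walk the whole TLB and drop every covered entry.
        self.full_searches += 1
        self.entries = [
            e for e in self.entries
            if not (e[0] == ctx and base <= e[1] < base + span)
        ]
        # The walk already visits every entry, so refresh all sticky bits.
        self.sticky = {c for (c, _, is_spl) in self.entries if is_spl}
```

In this model the refresh happens as a side effect of the full walk, which is one plausible reading of "whenever a full TLB search is performed".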

Translation Lookaside Buffer Striping for Efficient Invalidation Operations
20220066947 · 2022-03-03

Systems, apparatuses, and methods for implementing translation lookaside buffer (TLB) striping to enable efficient invalidation operations are described. TLB sizes are growing in width (more features in a given page table entry) and depth (to cover larger memory footprints). A striping scheme is proposed to enable an efficient and high performance method for performing TLB maintenance operations in the face of this growth. Accordingly, a TLB stores first attribute data in a striped manner across a plurality of arrays. The striped manner allows different entries to be searched simultaneously in response to receiving an invalidation request which identifies a particular attribute of a group to be invalidated. Upon receiving an invalidation request, the TLB generates a plurality of indices with an offset between each index and walks through the plurality of arrays by incrementing each index and simultaneously checking the first attribute data in corresponding entries.
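The walk described above can be approximated in software. The abstract does not give the array geometry or how the offsets are chosen, so everything here is an assumption: `striped_invalidate` is a hypothetical name, each array is a list of `[attribute, valid]` pairs, and the starting offset is taken as the depth divided by the number of arrays. The inner loop stands in for compares that hardware would perform simultaneously, one per array.

```python
def striped_invalidate(arrays, attr):
    """Invalidate every valid entry whose attribute equals attr.

    arrays: list of N equally deep lists; each item is [attribute, valid].
    Returns the number of walk steps, which equals the array depth rather
    than depth * N, because one entry per array is checked in each step.
    """
    n = len(arrays)
    depth = len(arrays[0])
    stride = max(1, depth // n)                      # assumed offset
    indices = [(k * stride) % depth for k in range(n)]
    steps = 0
    for _ in range(depth):
        for k in range(n):                           # parallel in hardware
            entry = arrays[k][indices[k]]
            if entry[1] and entry[0] == attr:
                entry[1] = False
        indices = [(i + 1) % depth for i in indices]  # advance every index
        steps += 1
    return steps
```

Because each index wraps around the array depth, every entry in every array is visited exactly once regardless of the starting offsets.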

Techniques to Improve Translation Lookaside Buffer Reach by Leveraging Idle Resources

Techniques are disclosed for processing address translations. The techniques include: detecting a first miss for a first address translation request for a first address translation in a first translation lookaside buffer; in response to the first miss, fetching the first address translation into the first translation lookaside buffer and evicting a second address translation from the first translation lookaside buffer into an instruction cache or local data share memory; detecting a second miss, in the first translation lookaside buffer, for a second address translation request referencing the second address translation; and, in response to the second miss, fetching the second address translation from the instruction cache or the local data share memory.
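The evict-then-refetch flow above resembles a victim store for translations. The sketch below is an assumption-laden model: `SpillingTlb` is a hypothetical name, a plain dictionary stands in for the idle instruction cache or local data share memory, and LRU replacement is assumed because the abstract does not name a policy.

```python
from collections import OrderedDict


class SpillingTlb:
    def __init__(self, capacity: int, page_table: dict):
        self.tlb = OrderedDict()      # vpage -> ppage, kept in LRU order
        self.victims = {}             # stand-in for idle icache / LDS space
        self.capacity = capacity
        self.page_table = page_table
        self.walks = 0                # counts full page-table walks

    def translate(self, vpage: int) -> int:
        if vpage in self.tlb:                     # TLB hit
            self.tlb.move_to_end(vpage)
            return self.tlb[vpage]
        if vpage in self.victims:                 # miss, but spilled earlier
            ppage = self.victims.pop(vpage)
        else:                                     # miss everywhere: walk
            self.walks += 1
            ppage = self.page_table[vpage]
        if len(self.tlb) >= self.capacity:
            # Evict the least recently used entry into the victim store
            # instead of discarding it.
            old_vpage, old_ppage = self.tlb.popitem(last=False)
            self.victims[old_vpage] = old_ppage
        self.tlb[vpage] = ppage
        return ppage
```

A translation that was evicted and later requested again is served from `victims`, so `walks` does not increase for it.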

Storage Array Invalidation Maintenance

Techniques are disclosed relating to managing storage array invalidations. A computer system may comprise a processor core configured to operate in an idle state and operate in a run state in which the processor core executes instructions. The computer system may further comprise a power management circuit that is configured to receive, while the processor core is in the idle state, a set of invalidation requests directed to the processor core to invalidate a set of entries of a storage array of the processor core. The power management circuit may store invalidation information indicative of the set of invalidation requests. The power management circuit may determine that the processor core has received a request to transition to the run state. Prior to the processor core operating in the run state, the power management circuit may invalidate the set of entries of the storage array based on the invalidation information.
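The deferral described above can be sketched as a small state machine. The names `PowerManager`, `request_invalidate`, and `wake`, and the modeling of the storage array as a list of valid bits, are assumptions for illustration only.

```python
class PowerManager:
    """Hypothetical model: buffers invalidations while the core is idle."""

    def __init__(self, valid_bits: list):
        self.valid_bits = valid_bits   # storage array modeled as valid flags
        self.state = "idle"
        self.deferred = set()          # stored invalidation information

    def request_invalidate(self, index: int) -> None:
        if self.state == "idle":
            # Core is idle: record the request, leave the array untouched.
            self.deferred.add(index)
        else:
            self.valid_bits[index] = False

    def wake(self) -> None:
        """Apply all recorded invalidations BEFORE entering the run state."""
        for index in self.deferred:
            self.valid_bits[index] = False
        self.deferred.clear()
        self.state = "run"
```

The ordering inside `wake` is the point of the sketch: the stale entries are cleared before the state changes, so the core never runs with them visible.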

Microarchitectural mechanisms for the prevention of side-channel attacks

Systems, methods, and apparatuses relating to microarchitectural mechanisms for the prevention of side-channel attacks are disclosed herein. In one embodiment, a processor includes a core having a plurality of physical contexts to execute a plurality of threads, a plurality of structures shared by the plurality of threads, a context mapping structure to map context signatures to respective physical contexts of the plurality of physical contexts, each physical context to identify and differentiate state of the plurality of structures, and a context manager circuit to, when one or more of a plurality of fields that comprise a context signature is changed, search the context mapping structure for a match to another context signature, and when the match is found, a physical context associated with the match is set as an active physical context for the core.
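The lookup performed by the context manager can be illustrated with a dictionary keyed by signature. This is a sketch, not the claimed circuit: `ContextManager` and `on_signature_change` are hypothetical names, signatures are modeled as tuples of fields, and the behavior on a miss (allocating an unused physical context) is an assumption, since the abstract only describes the match case.

```python
class ContextManager:
    def __init__(self, num_physical: int):
        self.mapping = {}              # context signature -> physical context
        self.num_physical = num_physical
        self.active = None

    def on_signature_change(self, signature: tuple) -> int:
        """Called when any field of the context signature changes."""
        if signature in self.mapping:
            # Match found: reuse the associated physical context.
            self.active = self.mapping[signature]
        elif len(self.mapping) < self.num_physical:
            # Assumption: on a miss, allocate the next unused context.
            new_id = len(self.mapping)
            self.mapping[signature] = new_id
            self.active = new_id
        else:
            raise RuntimeError("no free physical context (policy not modeled)")
        return self.active
```

Two different signatures end up on different physical contexts, which is what keeps their state in the shared structures distinguishable.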

System and method for broadcast cache invalidation

One embodiment includes a system comprising a repository configured to store objects, an object cache configured to cache objects retrieved from the repository by a node, a memory configured to store a broadcast cache invalidation queue accessible by a plurality of nodes and an invalidation status, a processor coupled to the memory and a computer readable medium storing computer-executable instructions. The computer-executable instructions can be executable to store cache invalidations in the invalidation queue, the cache invalidations identifying objects affected by operations, access the invalidation status to determine a last processed invalidation from the invalidation queue, determine a set of unprocessed invalidations from the cache invalidation queue, the unprocessed invalidations subsequent to the last processed invalidation, clear cached objects from the object cache based on the set of unprocessed invalidations and update the invalidation status based on a last invalidation from the set of unprocessed invalidations.
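The queue-plus-status mechanism above can be sketched with a shared list and a per-node cursor. The names `Node` and `process_invalidations`, the use of object identifiers as queue items, and the single-threaded setting are all assumptions made to keep the example short.

```python
class Node:
    def __init__(self, queue: list):
        self.queue = queue                 # shared broadcast invalidation queue
        self.cache = {}                    # local object cache: id -> object
        self.last_processed = len(queue)   # invalidation status (a cursor)

    def record_invalidation(self, obj_id: str) -> None:
        """Store an invalidation identifying an object affected by an operation."""
        self.queue.append(obj_id)

    def process_invalidations(self) -> list:
        """Clear cached objects named by invalidations after the cursor,
        then advance the cursor past them."""
        unprocessed = self.queue[self.last_processed:]
        for obj_id in unprocessed:
            self.cache.pop(obj_id, None)
        self.last_processed += len(unprocessed)
        return unprocessed
```

Each node keeps its own cursor, so a node that was offline simply has a longer unprocessed tail the next time it checks the queue.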

Faster access of virtual machine memory backed by a host computing device's virtual memory

To increase the speed with which the hierarchical levels of a Second Layer Address Table (SLAT) are traversed as part of a memory access where the guest physical memory of a virtual machine environment is backed by virtual memory assigned to one or more processes executing on a host computing device, one or more hierarchical levels of tables within the SLAT can be skipped or otherwise not referenced. While the SLAT can be populated with memory correlations at hierarchically higher-levels of tables, the page table of the host computing device, supporting the host computing device's provision of virtual memory, can maintain a corresponding contiguous set of memory correlations at the hierarchically lowest table level, thereby enabling the host computing device to page out, or otherwise manipulate, smaller chunks of memory. If such manipulation occurs, the SLAT can be repopulated with memory correlations at the hierarchically lowest table level.

Realm identifier comparison for translation cache lookup

An apparatus has a translation cache (100) comprising a number of entries for specifying address translation data. Each entry (260) also specifies a translation context identifier (254) associated with the address translation data and a realm identifier (270) identifying one of a number of realms. Each realm corresponds to at least a portion of at least one software process executed by processing circuitry (8). In response to a memory access a lookup of the translation cache (100) is triggered. When the lookup misses in the cache (100), control circuitry (280) prevents allocation of address translation data to the cache when the current realm is excluded from accessing the target memory region by an owner realm specified for the target memory region. In the lookup, whether a given entry (260) matches the memory access depends on both a translation context identifier comparison and a realm identifier comparison.
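The two-part match and the allocation check above can be sketched as follows. The names `Entry`, `lookup`, and `fill_on_miss` are hypothetical, and the owner realm's exclusion decision is reduced to a boolean argument because the abstract does not describe how it is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    vpage: int
    context_id: int   # translation context identifier
    realm_id: int     # realm identifier
    ppage: int        # address translation data (simplified)


def lookup(cache, vpage: int, context_id: int, realm_id: int):
    """An entry hits only if BOTH the translation context identifier and
    the realm identifier match, in addition to the address."""
    for e in cache:
        if (e.vpage == vpage
                and e.context_id == context_id
                and e.realm_id == realm_id):
            return e.ppage
    return None


def fill_on_miss(cache, entry: Entry, excluded_by_owner: bool) -> bool:
    """Allocate after a miss unless the owner realm of the target region
    excludes the current realm (decision supplied by the caller here)."""
    if excluded_by_owner:
        return False
    cache.append(entry)
    return True
```

With this shape, an entry installed by one realm never satisfies a lookup from another realm, even when the address and translation context are identical.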