Patent classifications
G06F12/0855
Sector cache for compression
In an example, an apparatus comprises a plurality of execution units, and a cache memory communicatively coupled to the plurality of execution units, wherein the cache memory is structured into a plurality of sectors, wherein each sector in the plurality of sectors comprises at least two cache lines. Other embodiments are also disclosed and claimed.
Vector prefetching for computing systems
Described is a computing system for vector prefetching which includes a hierarchical memory including multiple caches, a missing address storage unit (MASU) associated with each cache which stores prefetch requests suffering a cache miss, a prefetcher which sends prefetch requests towards the hierarchical memory, and a vector prefetch unit. The vector prefetch unit determines existence of at least one of a relationship between a cache block associated with the prefetch request and cache blocks associated with one or more entries in a MASU, or a relationship between cache blocks associated with different entries in a MASU, and sends a vector prefetch request based on related prefetch requests including indicators indicating a starting cache block and a number of related cache blocks to a higher memory level to obtain data associated with each cache block. The hierarchical memory stores the data received in at least one response message from the higher memory level if available.
PHYSICAL ADDRESS PROXY REUSE MANAGEMENT
Each load/store queue entry holds a load/store physical address proxy (PAP) for use as a proxy for a load/store physical memory line address (PMLA). The load/store PAP comprises a set index and a way that uniquely identifies an L2 cache entry holding a memory line at the load/store PMLA when an L1 cache provides the load/store PAP during the load/store instruction execution. The microprocessor removes a line at a removal PMLA from an L2 entry, forms a removal PAP as a proxy for the removal PMLA that comprises a set index and a way, snoops the load/store queue with the removal PAP to determine whether the removal PAP is being used as a proxy for the removal PMLA, fills the removed entry with a line at a fill PMLA, and prevents the removal PAP from being used as a proxy for the removal PMLA and the fill PMLA concurrently.
MICROPROCESSOR THAT PREVENTS SAME ADDRESS LOAD-LOAD ORDERING VIOLATIONS USING PHYSICAL ADDRESS PROXIES
A L2 set associative cache that is inclusive of an L1 cache. Each entry of a load queue holds a load physical address proxy (PAP) for a load physical memory line address (PMLA) rather than the load PMLA itself. The load PAP comprises the set index and the way that uniquely identifies the L2 entry that holds a memory line specified by the load PMLA. Each load queue entry indicates whether the load instruction has completed execution. The microprocessor removes a memory line at a removal PMLA from an L2 entry and forms a removal PAP as a proxy for the removal PMLA. The removal PAP comprises a set index and a way that uniquely identifies the removed entry. The microprocessor snoops the load queue with the removal PAP to determine whether the removal PAP matches one or more load PAPs in one or more load queue entries associated with one or more load instructions that have completed execution and, if so, signals an abort request.
UNFORWARDABLE LOAD INSTRUCTION RE-EXECUTION ELIGIBILITY BASED ON CACHE UPDATE BY IDENTIFIED STORE INSTRUCTION
A microprocessor includes a cache memory, a store queue, and a load/store unit. Each entry of the store queue holds store data associated with a store instruction. The load/store unit, during execution of a load instruction, makes a determination that an entry of the store queue holds store data that includes some but not all bytes of load data requested by the load instruction, cancels execution of the load instruction in response to the determination, and writes to an entry of a structure from which the load instruction is subsequently issuable for re-execution an identifier of a store instruction that is older in program order than the load instruction and an indication that the load instruction is not eligible to re-execute until the identified older store instruction updates the cache memory with store data.
STORE-TO-LOAD FORWARDING CORRECTNESS CHECKS USING PHYSICAL ADDRESS PROXIES STORED IN LOAD QUEUE ENTRIES
A microprocessor includes a load/store unit that performs store-to-load forwarding, a PIPT L2 set-associative cache, a store queue having store entries, and a load queue having load entries. Each L2 entry is uniquely identified by a set index and a way. Each store/load entry holds, for an associated store/load instruction, a store/load physical address proxy (PAP) for a store/load physical memory line address (PMLA). The store/load PAP specifies the set index and the way of the L2 entry into which a cache line specified by the store/load PMLA is allocated. Each load entry also holds associated load instruction store-to-load forwarding information. The load/store unit compares the store PAP with the load PAP of each valid load entry whose associated load instruction is younger in program order than the store instruction and uses the comparison and associated forwarding information to check store-to-load forwarding correctness with respect to each younger load instruction.
USING PHYSICAL ADDRESS PROXIES TO HANDLE SYNONYMS WHEN WRITING STORE DATA TO A VIRTUALLY-INDEXED CACHE
A microprocessor includes a virtually-indexed L1 data cache that has an allocation policy that permits multiple synonyms to be co-resident. Each L2 entry is uniquely identified by a set index and a way number. A store unit, during a store instruction execution, receives a store physical address proxy (PAP) for a store physical memory line address (PMLA) from an L1 entry hit upon by a store virtual address, and writes the store PAP to a store queue entry. The store PAP comprises the set index and the way number of an L2 entry that holds a line specified by the store PMLA. The store unit, during the store commit, reads the store PAP from the store queue, looks up the store PAP in the L1 to detect synonyms, writes the store data to one or more of the detected synonyms, and evicts the non-written detected synonyms.
MICROPROCESSOR THAT PREVENTS SAME ADDRESS LOAD-LOAD ORDERING VIOLATIONS
A microprocessor prevents same address load-load ordering violations. Each load queue entry holds a load physical memory line address (PMLA) and an indication of whether a load instruction has completed execution. The microprocessor fills a line specified by a fill PMLA into a cache entry and snoops the load queue with the fill PMLA, either before the fill or in an atomic manner with the fill with respect to ability of the filled entry to be hit upon by any load instruction, to determine whether the fill PMLA matches load PMLAs in load queue entries associated with load instructions that have completed execution and there are other load instructions in the load queue that have not completed execution. The microprocessor, if the condition is true, flushes at least the other load instructions in the load queue that have not completed execution.
Thwarting Store-to-Load Forwarding Side Channel Attacks by Pre-Forwarding Matching of Physical Address Proxies and/or Permission Checking
A method and system for mitigating against side channel attacks (SCA) that exploit speculative store-to-load forwarding is described. The method comprises ensuring that the physical load and store addresses match and/or that permissions are present before speculatively store-to-load forwarding. Various improvements maintain a short load-store pipeline, including usage of a virtual level-one data cache (DL1), usage of an inclusive physical level-two data cache (DL2), storage and lookup of physical data address equivalents in the DL1, and using a memory dependence predictor (MDP) to speed up or replace store queue camming of load data addresses against store data addresses.
METHODS AND APPARATUS FOR ALLOCATION IN A VICTIM CACHE SYSTEM
Methods, apparatus, systems and articles of manufacture are disclosed for allocation in a victim cache system. An example apparatus includes a first cache storage, a second cache storage, a cache controller coupled to the first cache storage and the second cache storage and operable to receive a memory operation that specifies an address, determine, based on the address, that the memory operation evicts a first set of data from the first cache storage, determine that the first set of data is unmodified relative to an extended memory, and cause the first set of data to be stored in the second cache storage.