G06F12/1063

PHYSICAL ADDRESS PROXIES TO ACCOMPLISH PENALTY-LESS PROCESSING OF LOAD/STORE INSTRUCTIONS WHOSE DATA STRADDLES CACHE LINE ADDRESS BOUNDARIES
20220358045 · 2022-11-10 ·

A microprocessor includes a physically-indexed physically-tagged second-level set-associative cache. A set index and a way uniquely identifies each entry. A load/store unit, during store/load instruction execution: detects that a first and second portions of store/load data are to be written/read to/from different first and second lines of memory specified by first and second store physical memory line addresses, writes to a store/load queue entry first and second store physical address proxies (PAPs) for first and second store physical memory line addresses (and all the store data in store execution case). The first and second store PAPs comprise respective set indexes and ways that uniquely identifies respective entries of the second-level cache that holds respective copies of the respective first and second lines of memory. The entries of the store queue are absent storage for holding the first and second store physical memory line addresses.

STORE-TO-LOAD FORWARDING CORRECTNESS CHECKS USING PHYSICAL ADDRESS PROXIES STORED IN LOAD QUEUE ENTRIES
20220358044 · 2022-11-10 ·

A microprocessor includes a load/store unit that performs store-to-load forwarding, a PIPT L2 set-associative cache, a store queue having store entries, and a load queue having load entries. Each L2 entry is uniquely identified by a set index and a way. Each store/load entry holds, for an associated store/load instruction, a store/load physical address proxy (PAP) for a store/load physical memory line address (PMLA). The store/load PAP specifies the set index and the way of the L2 entry into which a cache line specified by the store/load PMLA is allocated. Each load entry also holds associated load instruction store-to-load forwarding information. The load/store unit compares the store PAP with the load PAP of each valid load entry whose associated load instruction is younger in program order than the store instruction and uses the comparison and associated forwarding information to check store-to-load forwarding correctness with respect to each younger load instruction.

USING PHYSICAL ADDRESS PROXIES TO HANDLE SYNONYMS WHEN WRITING STORE DATA TO A VIRTUALLY-INDEXED CACHE

A microprocessor includes a virtually-indexed L1 data cache that has an allocation policy that permits multiple synonyms to be co-resident. Each L2 entry is uniquely identified by a set index and a way number. A store unit, during a store instruction execution, receives a store physical address proxy (PAP) for a store physical memory line address (PMLA) from an L1 entry hit upon by a store virtual address, and writes the store PAP to a store queue entry. The store PAP comprises the set index and the way number of an L2 entry that holds a line specified by the store PMLA. The store unit, during the store commit, reads the store PAP from the store queue, looks up the store PAP in the L1 to detect synonyms, writes the store data to one or more of the detected synonyms, and evicts the non-written detected synonyms.

MICROPROCESSOR THAT PREVENTS SAME ADDRESS LOAD-LOAD ORDERING VIOLATIONS
20220358047 · 2022-11-10 ·

A microprocessor prevents same address load-load ordering violations. Each load queue entry holds a load physical memory line address (PMLA) and an indication of whether a load instruction has completed execution. The microprocessor fills a line specified by a fill PMLA into a cache entry and snoops the load queue with the fill PMLA, either before the fill or in an atomic manner with the fill with respect to ability of the filled entry to be hit upon by any load instruction, to determine whether the fill PMLA matches load PMLAs in load queue entries associated with load instructions that have completed execution and there are other load instructions in the load queue that have not completed execution. The microprocessor, if the condition is true, flushes at least the other load instructions in the load queue that have not completed execution.

Thwarting Store-to-Load Forwarding Side Channel Attacks by Pre-Forwarding Matching of Physical Address Proxies and/or Permission Checking
20220358209 · 2022-11-10 ·

A method and system for mitigating against side channel attacks (SCA) that exploit speculative store-to-load forwarding is described. The method comprises ensuring that the physical load and store addresses match and/or that permissions are present before speculatively store-to-load forwarding. Various improvements maintain a short load-store pipeline, including usage of a virtual level-one data cache (DL1), usage of an inclusive physical level-two data cache (DL2), storage and lookup of physical data address equivalents in the DL1, and using a memory dependence predictor (MDP) to speed up or replace store queue camming of load data addresses against store data addresses.

TRANSLATION HINTS
20230102006 · 2023-03-30 ·

A hinter data processing apparatus is provided with processing circuitry that determines that an execution context to be executed on a hintee data processing apparatus will require a virtual-to-physical address translation. Hint circuitry transmits a hint to a hintee data processing apparatus to prefetch a virtual-to-physical address translation in respect of an execution context of the further data processing apparatus. A hintee data processing apparatus is also provided with receiving circuitry that receives a hint from a hinter data processing apparatus to prefetch a virtual-to-physical address translation in respect of an execution context of the further data processing apparatus. Processing circuitry determines whether to follow the hint and, in response to determining that the hint is to be followed, causes the virtual-to-physical address translation to be prefetched for the execution context of the data processing apparatus. In both cases, the hint comprises an identifier of the execution context.

Systems and methods for implementing coherent memory in a multiprocessor system

Data units are stored in private caches in nodes of a multiprocessor system, each node containing at least one processor (CPU), at least one cache private to the node and at least one cache location buffer (CLB) private to the node. In each CLB location information values are stored, each location information value indicating a location associated with a respective data unit, wherein each location information value stored in a given CLB indicates the location to be either a location within the private cache disposed in the same node as the given CLB, to be a location in one of the other nodes, or to be a location in a main memory. Coherence of values of the data units is maintained using a cache coherence protocol. The location information values stored in the CLBs are updated by the cache coherence protocol in accordance with movements of their respective data units.

DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING ADDRESS TRANSLATION

There is provided a data processing apparatus and method of data processing. The data processing apparatus comprises storage circuitry to store a hierarchy of page tables comprising an intermediate level page table. Each entry of the intermediate level page table comprises base address information of a next level page table and control information indicating whether an addressing function has been applied to reorder physical storage locations of entries of the next level page table. Address translation circuitry is provided to perform address translations in response to receipt of a virtual address by performing a lookup in a next level page table dependent on the base address information and a page table index from the virtual address. When the control information indicates that the addressing function has been applied, the lookup is performed at a modified storage location generated by applying the addressing function to the page table index.

NETWORK ENTITIES AND METHODS PERFORMED THEREIN FOR HANDLING CACHE COHERENCY

A method performed by a coordinating entity in a disaggregated data center architecture wherein computing resources are separated in discrete resource pools and associated together to represent a functional server. The coordinating entity obtains a setup of processor cores that are coupled logically as the functional server, and determines an index indicating an identity of a cache coherency domain based on the obtained setup of processor cores. The coordinating entity further configures one or more communicating entities associated with the obtained setup of processor cores, to use the determined index when handling updated cache related data.

Hashing with Soft Memory Folding

In an embodiment, a system may support programmable hashing of address bits at a plurality of levels of granularity to map memory addresses to memory controllers and ultimately at least to memory devices. The hashing may be programmed to distribute pages of memory across the memory controllers, and consecutive blocks of the page may be mapped to physically distant memory controllers. In an embodiment, address bits may be dropped from each level of granularity, forming a compacted pipe address to save power within the memory controller. In an embodiment, a memory folding scheme may be employed to reduce the number of active memory devices and/or memory controllers in the system when the full complement of memory is not needed.