Patent classifications
G06F12/1063
Memory management unit with address translation cache
The present disclosure advantageously provides a memory management unit and methods for invalidating cache lines in address translation caches. The memory management unit has one or more address translation caches, and each address translation cache has a plurality of cache lines. The memory management unit receives transactions from a source of transactions. The transactions include, inter alia, memory transactions and a set-aside translation transaction. The memory transactions include at least a first memory transaction and a last memory transaction, and each memory transaction includes the same virtual memory address and the same translation context identifier. The set-aside translation transaction also includes the same virtual memory address and the same translation context identifier. In response to receiving the set-aside translation transaction, the memory management unit deallocates each cache line that stores an address translation for the same virtual memory address and the same translation context identifier.
TECHNOLOGIES FOR DYNAMIC ACCELERATOR SELECTION
Technologies for dynamic accelerator selection include a compute sled. The compute sled includes a network interface controller to communicate with a remote accelerator of an accelerator sled over a network, where the network interface controller includes a local accelerator and a compute engine. The compute engine is to obtain network telemetry data indicative of a level of bandwidth saturation of the network. The compute engine is also to determine whether to accelerate a function managed by the compute sled. The compute engine is further to determine, in response to a determination to accelerate the function, whether to offload the function to the remote accelerator of the accelerator sled based on the telemetry data. Also the compute engine is to assign, in response a determination not to offload the function to the remote accelerator, the function to the local accelerator of the network interface controller.
METHOD AND SYSTEM FOR FACILITATING LOSSY DROPPING AND ECN MARKING
Methods and systems are provided for performing lossy dropping and ECN marking in a flow-based network. The system can maintain state information of individual packet flows, which can be set up or released dynamically based on injected data. Each flow can be provided with a flow-specific input queue upon arriving at a switch. Packets of a respective flow are acknowledged after reaching the egress point of the network, and the acknowledgement packets are sent back to the ingress point of the flow along the same data path. As a result, each switch can obtain state information of each flow and perform per-flow packet dropping and ECN marking.
Multi-tile memory management mechanism
Graphics processors for implementing multi-tile memory management are disclosed. In one embodiment, a graphics processor includes a first graphics device having a local memory, a second graphics device having a local memory, and a graphics driver to provide a single virtual allocation with a common virtual address range to mirror a resource to each local memory of the first and second graphics devices.
GENERATIONAL PHYSICAL ADDRESS PROXIES
Each PIPT L2 cache entry is uniquely identified by a set index and a way and holds a generational identifier (GENID). The L2 detects a miss of a physical memory line address (PMLA). An L2 set index is obtained from the PMLA. The L2 picks a way for replacement, increments the GENID held in the entry in the picked way of the selected set, and forms a physical address proxy (PAP) for the PMLA with the obtained set index and the picked way. The PAP uniquely identifies the picked L2 entry. The L2 forms a generational PAP (GPAP) for the PMLA with the PAP and the incremented GENID. A load/store unit makes available the GPAP as a proxy of the PMLA for comparisons with GPAPs of other PMLAs, rather than making comparisons of the PMLA itself with the other PMLAs, to determine whether the PMLA matches the other PMLAs.
VIRTUALLY-INDEXED CACHE COHERENCY USING PHYSICAL ADDRESS PROXIES
A cache memory subsystem includes virtually-indexed L1 and PIPT L2 set-associative caches having an inclusive allocation policy such that: when a first copy of a memory line specified by a physical memory line address (PMLA) is allocated into an L1 entry, a second copy of the line is also allocated into an L2 entry; when the second copy is evicted, the first copy is also evicted. For each value of the PMLA, the second copy can be allocated into only one L2 set, and an associated physical address proxy (PAP) for the PMLA includes a set index and way number that uniquely identifies the entry. For each value of the PMLA there exist two or more different L1 sets into which the first copy can be allocated, and when the L2 evicts the second copy, the L1 uses the PAP of the PMLA to evict the first copy.
PHYSICAL ADDRESS PROXY REUSE MANAGEMENT
Each load/store queue entry holds a load/store physical address proxy (PAP) for use as a proxy for a load/store physical memory line address (PMLA). The load/store PAP comprises a set index and a way that uniquely identifies an L2 cache entry holding a memory line at the load/store PMLA when an L1 cache provides the load/store PAP during the load/store instruction execution. The microprocessor removes a line at a removal PMLA from an L2 entry, forms a removal PAP as a proxy for the removal PMLA that comprises a set index and a way, snoops the load/store queue with the removal PAP to determine whether the removal PAP is being used as a proxy for the removal PMLA, fills the removed entry with a line at a fill PMLA, and prevents the removal PAP from being used as a proxy for the removal PMLA and the fill PMLA concurrently.
MICROPROCESSOR THAT PREVENTS SAME ADDRESS LOAD-LOAD ORDERING VIOLATIONS USING PHYSICAL ADDRESS PROXIES
A L2 set associative cache that is inclusive of an L1 cache. Each entry of a load queue holds a load physical address proxy (PAP) for a load physical memory line address (PMLA) rather than the load PMLA itself. The load PAP comprises the set index and the way that uniquely identifies the L2 entry that holds a memory line specified by the load PMLA. Each load queue entry indicates whether the load instruction has completed execution. The microprocessor removes a memory line at a removal PMLA from an L2 entry and forms a removal PAP as a proxy for the removal PMLA. The removal PAP comprises a set index and a way that uniquely identifies the removed entry. The microprocessor snoops the load queue with the removal PAP to determine whether the removal PAP matches one or more load PAPs in one or more load queue entries associated with one or more load instructions that have completed execution and, if so, signals an abort request.
PHYSICAL ADDRESS PROXY (PAP) RESIDENCY DETERMINATION FOR REDUCTION OF PAP REUSE
A L2 cache is set associative, has N ways, and is inclusive of a virtual L1 cache such that when the virtual address misses in the L1: a portion of the virtual address is translated into a physical memory line address (PMLA), the PMLA is allocated into an L2 entry, and a physical address proxy (PAP) for the PMLA is allocated into an L1 entry. The PAP for the PMLA includes a set and a way that uniquely identify the L2 entry. The L2 receives a physical memory line address for allocation, uses a set index portion of the PMLA, and for each L2 way, forms a PAP corresponding to the way. The L1, for each PAP, generates a corresponding indicator of whether the PAP is L1 resident. The L2 selects, for replacement, a way whose indicator indicates the PAP is not resident in the L1.
UNFORWARDABLE LOAD INSTRUCTION RE-EXECUTION ELIGIBILITY BASED ON CACHE UPDATE BY IDENTIFIED STORE INSTRUCTION
A microprocessor includes a cache memory, a store queue, and a load/store unit. Each entry of the store queue holds store data associated with a store instruction. The load/store unit, during execution of a load instruction, makes a determination that an entry of the store queue holds store data that includes some but not all bytes of load data requested by the load instruction, cancels execution of the load instruction in response to the determination, and writes to an entry of a structure from which the load instruction is subsequently issuable for re-execution an identifier of a store instruction that is older in program order than the load instruction and an indication that the load instruction is not eligible to re-execute until the identified older store instruction updates the cache memory with store data.