G06F12/0826

Mechanism for mitigating information leak via cache side channels during speculative execution
11231931 · 2022-01-25 · ·

A processor includes a first core and a second core to execute computer instructions. Each of the cores includes its own private memory cache and speculative load queue. The speculative load queue stores cachelines for the computer instructions and data when the core is operating in a speculative state with respect to a process or thread. The processor includes a state tracking buffer having a state field to store a speculative exclusive ownership state for each cacheline in the speculative load queue when present therein.

Home agent based cache transfer acceleration scheme

Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.

REGION BASED SPLIT-DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES

Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

Circuitry and method

Circuitry comprises a set of data handling nodes comprising: two or more master nodes each having respective storage circuitry to hold copies of data items from a main memory, each copy of a data item being associated with indicator information to indicate a coherency state of the respective copy, the indicator information being configured to indicate at least whether that copy has been updated more recently than the data item held by the main memory; a home node to serialise data access operations and to control coherency amongst data items held by the set of data handling nodes so that data written to a memory address is consistent with data read from that memory address in response to a subsequent access request; and one or more slave nodes including the main memory; in which: a requesting node of the set of data handling nodes is configured to communicate a conditional request to a target node of the set of data handling nodes in respect of a copy of a given data item at a given memory address, the conditional request being associated with an execution condition and being a request that the copy of the given data item is written to a destination node of the data handling nodes; and the target node is configured, in response to the conditional request: (i) when the outcome of the execution condition is successful, to write the data item to the destination node and to communicate a completion-success indicator to the requesting node; and (ii) when the outcome of the execution condition is a failure, to communicate a completion-failure indicator to the requesting node.

MAINTAINING DOMAIN COHERENCE STATES INCLUDING DOMAIN STATE NO-OWNED (DSN) IN PROCESSOR-BASED DEVICES

Maintaining domain coherence states including Domain State No-Owned (DSN) in processor-based devices is disclosed. In this regard, a processor-based device provides multiple processing elements (PEs) organized into multiple domains, each containing one or more PEs and a local ordering point circuit (LOP). The processor-based device supports domain coherence states for coherence granules cached by the PEs within a given domain. The domain coherence states include a DSN domain coherence state, which indicates that a coherence granule is not cached within a shared modified state within any domain. In some embodiments, upon receiving a request for a read access to a coherence granule, a system ordering point circuit (SOP) determines that the coherence granule is cached in the DSN domain coherence state within a domain of the plurality of domains, and can safely read the coherence granule from the system memory to satisfy the read access if necessary.

SCALABLE REGION-BASED DIRECTORY
20220100672 · 2022-03-31 ·

Disclosed is a cache directory including one or more cache directories configurable to interchange within each cache directory entry at least one bit between a first field and a second field to change the size of the region of memory represented and the number of cache lines tracked in the cache subsystem.

ALLOCATION OF MEMORY WITHIN A DATA TYPE-SPECIFIC MEMORY HEAP
20220075712 · 2022-03-10 ·

One embodiment provides for a non-transitory machine-readable medium storing instructions to cause one or more processors to perform operations comprising receiving an instruction to dynamically allocate memory for an object of a data type and dynamically allocating memory for the object from a heap instance that is specific to the data type for the object, the heap instance including a memory allocator for the data type, the memory allocator generated at compile time for the instruction based on a specification of the data type for the heap instance.

Programmable cache coherent node controller

A computer system includes a first group of CPU modules operatively coupled to at least one first Programmable ASIC Node Controller configured to execute transactions directly or through a first interconnect switch to at least one second Programmable ASIC Node Controller connected to a second group of CPU modules running a single instance of an operating system.

Set table of contents (TOC) register instruction

A Set Table of Contents (TOC) Register instruction. An instruction to provide a pointer to a reference data structure, such as a TOC, is obtained by a processor and executed. The executing includes determining a value for the pointer to the reference data structure, and storing the value in a location (e.g., a register) specified by the instruction.

Allocation of memory within a data type-specific memory heap
11182283 · 2021-11-23 · ·

One embodiment provides for a non-transitory machine-readable medium storing instructions to cause one or more processors to perform operations comprising receiving an instruction to dynamically allocate memory for an object of a data type and dynamically allocating memory for the object from a heap instance that is specific to the data type for the object, the heap instance including a memory allocator for the data type, the memory allocator generated at compile time for the instruction based on a specification of the data type for the heap instance.