G06F12/0828

Pipeline arbitration

A method includes receiving, by a first stage in a pipeline, a first transaction from a previous stage in pipeline; in response to first transaction comprising a high priority transaction, processing high priority transaction by sending high priority transaction to a buffer; receiving a second transaction from previous stage; in response to second transaction comprising a low priority transaction, processing low priority transaction by monitoring a full signal from buffer while sending low priority transaction to buffer; in response to full signal asserted and no high priority transaction being available from previous stage, pausing processing of low priority transaction; in response to full signal asserted and a high priority transaction being available from previous stage, stopping processing of low priority transaction and processing high priority transaction; and in response to full signal being de-asserted, processing low priority transaction by sending low priority transaction to buffer.

CONFIGURABLE CACHE FOR MULTI-ENDPOINT HETEROGENEOUS COHERENT SYSTEM
20220229779 · 2022-07-21 ·

A device includes a memory bank. The memory bank includes data portions of a first way group. The data portions of the first way group include a data portion of a first way of the first way group and a data portion of a second way of the first way group. The memory bank further includes data portions of a second way group. The device further includes a configuration register and a controller configured to individually allocate, based on one or more settings in the configuration register, the first way and the second way to one of an addressable memory space and a data cache.

TAG UPDATE BUS FOR UPDATED COHERENCE STATE

An apparatus includes a CPU core and a L1 cache subsystem including a L1 main cache, a L1 victim cache, and a L1 controller. The apparatus includes a L2 cache subsystem coupled to the L1 cache subsystem by a transaction bus and a tag update bus. The L2 cache subsystem includes a L2 main cache, a shadow L1 main cache, a shadow L1 victim cache, and a L2 controller. The L2 controller receives a message from the L1 controller over the tag update bus, including a valid signal, an address, and a coherence state. In response to the valid signal being asserted, the L2 controller identifies an entry in the shadow L1 main cache or the shadow L1 victim cache having an address corresponding to the address of the message and updates a coherence state of the identified entry to be the coherence state of the message.

GLOBAL COHERENCE OPERATIONS

A method includes receiving, by a L2 controller, a request to perform a global operation on a L2 cache and preventing new blocking transactions from entering a pipeline coupled to the L2 cache while permitting new non-blocking transactions to enter the pipeline. Blocking transactions include read transactions and non-victim write transactions. Non-blocking transactions include response transactions, snoop transactions, and victim transactions. The method further includes, in response to an indication that the pipeline does not contain any pending blocking transactions, preventing new snoop transactions from entering the pipeline while permitting new response transactions and victim transactions to enter the pipeline; in response to an indication that the pipeline does not contain any pending snoop transactions, preventing, all new transactions from entering the pipeline; and, in response to an indication that the pipeline does not contain any pending transactions, performing the global operation on the L2 cache.

CACHE SIZE CHANGE

A method includes determining, by a level one (L1) controller, to change a size of a L1 main cache; servicing, by the L1 controller, pending read requests and pending write requests from a central processing unit (CPU) core; stalling, by the L1 controller, new read requests and new write requests from the CPU core; writing back and invalidating, by the L1 controller, the L1 main cache. The method also includes receiving, by a level two (L2) controller, an indication that the L1 main cache has been invalidated and, in response, flushing a pipeline of the L2 controller; in response to the pipeline being flushed, stalling, by the L2 controller, requests received from any master; reinitializing, by the L2 controller, a shadow L1 main cache. Reinitializing includes clearing previous contents of the shadow L1 main cache and changing the size of the shadow L1 main cache.

System and methods for reducing global coherence unit snoop filter lookup via local memories

In a multi-node system, each node includes tiles. Each tile includes a cache controller, a local cache, and a snoop filter cache (SFC). The cache controller responsive to a memory access request by the tile checks the local cache to determine whether the data associated with the request has been cached by the local cache of the tile. The cached data from the local cache is returned responsive to a cache-hit. The SFC is checked to determine whether any other tile of a remote node has cached the data associated with the memory access request. If it is determined that the data has been cached by another tile of a remote node and if there is a cache-miss by the local cache, then the memory access request is transmitted to the global coherency unit (GCU) and the snoop filter to fetch the cached data. Otherwise an interconnected memory is accessed.

Server recovery from a change in storage control chip

A method comprises configuring an address-to-SC unit (A2SU) of each of a plurality of CPU chips based on a number of valid SC chips in the computer system. Each of the plurality of CPU chips is coupled to each of the SC chips in a leaf-spine topology. The A2SU is configured to correlate each of a plurality of memory addresses with a respective one of the valid SC chips. The method further comprises, in response to detecting a change in the number of valid SC chips, pausing operation of the computer system including operation of a cache of each of the plurality of CPU chips; while operation of the computer system is paused, reconfiguring the A2SU in each of the plurality of CPU chips based on the change in the number of valid SC chips; and in response to reconfiguring the A2SU, resuming operation of the computer system.

Circuitry and method

Circuitry comprises a set of data handling nodes comprising: two or more master nodes each having respective storage circuitry to hold copies of data items from a main memory, each copy of a data item being associated with indicator information to indicate a coherency state of the respective copy, the indicator information being configured to indicate at least whether that copy has been updated more recently than the data item held by the main memory; a home node to serialise data access operations and to control coherency amongst data items held by the set of data handling nodes so that data written to a memory address is consistent with data read from that memory address in response to a subsequent access request; and one or more slave nodes including the main memory; in which: a requesting node of the set of data handling nodes is configured to communicate a conditional request to a target node of the set of data handling nodes in respect of a copy of a given data item at a given memory address, the conditional request being associated with an execution condition and being a request that the copy of the given data item is written to a destination node of the data handling nodes; and the target node is configured, in response to the conditional request: (i) when the outcome of the execution condition is successful, to write the data item to the destination node and to communicate a completion-success indicator to the requesting node; and (ii) when the outcome of the execution condition is a failure, to communicate a completion-failure indicator to the requesting node.

MULTICORE SHARED CACHE OPERATION ENGINE

Techniques including receiving configuration information for a trigger control channel of the one or more trigger control channels, the configuration information defining a first one or more triggering events, receiving a first memory management command, store the first memory management command, detecting a first one or more triggering events, and triggering the stored first memory management command based on the detected first one or more triggering events.

DELAYED SNOOP FOR IMPROVED MULTI-PROCESS FALSE SHARING PARALLEL THREAD PERFORMANCE
20220156193 · 2022-05-19 ·

Techniques for maintaining cache coherency comprising storing data blocks associated with a main process in a cache line of a main cache memory, storing a first local copy of the data blocks in a first local cache memory of a first processor, storing a second local copy of the set of data blocks in a second local cache memory of a second processor executing a first child process of the main process to generate first output data, writing the first output data to the first data block of the first local copy as a write through, writing the first output data to the first data block of the main cache memory as a part of the write through, transmitting an invalidate request to the second local cache memory, marking the second local copy of the set of data blocks as delayed, and transmitting an acknowledgment to the invalidate request.