Patent classifications
G06F12/0824
REGION BASED SPLIT-DIRECTORY SCHEME TO ADAPT TO LARGE CACHE SIZES
Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
Neural network processor with direct memory access and hardware acceleration circuits
A dynamically adaptive neural network processing system includes memory to store instructions representing a neural network in contiguous blocks, hardware acceleration (HA) circuitry to execute the neural network, direct memory access (DMA) circuitry to transfer the instructions from the contiguous blocks of the memory to the HA circuitry, and a central processing unit (CPU) to dynamically modify a linked list representing the neural network during execution of the neural network by the HA circuitry to perform machine learning, and to generate the instructions in the contiguous blocks of the memory based on the linked list.
PIPELINE ARBITRATION
A method includes receiving, by a first stage in a pipeline, a first transaction from a previous stage in pipeline; in response to first transaction comprising a high priority transaction, processing high priority transaction by sending high priority transaction to a buffer; receiving a second transaction from previous stage; in response to second transaction comprising a low priority transaction, processing low priority transaction by monitoring a full signal from buffer while sending low priority transaction to buffer; in response to full signal asserted and no high priority transaction being available from previous stage, pausing processing of low priority transaction; in response to full signal asserted and a high priority transaction being available from previous stage, stopping processing of low priority transaction and processing high priority transaction; and in response to full signal being de-asserted, processing low priority transaction by sending low priority transaction to buffer.
CONTROLLER WITH CACHING AND NON-CACHING MODES
An apparatus includes a CPU core, a first cache subsystem coupled to the CPU core, and a second memory coupled to the cache subsystem. The first cache subsystem includes a configuration register, a first memory, and a controller. The controller is configured to: receive a request directed to an address in the second memory and, in response to the configuration register having a first value, operate in a non-caching mode. In the non-caching mode, the controller is configured to provide the request to the second memory without caching data returned by the request in the first memory. In response to the configuration register having a second value, the controller is configured to operate in a caching mode. In the caching mode the controller is configured to provide the request to the second memory and cache data returned by the request in the first memory.
MERGING DATA FOR WRITE ALLOCATE
A method includes receiving, by a level two (L2) controller, a write request for an address that is not allocated as a cache line in a L2 cache. The write request specifies write data. The method also includes generating, by the L2 controller, a read request for the address; reserving, by the L2 controller, an entry in a register file for read data returned in response to the read request; updating, by the L2 controller, a data field of the entry with the write data; updating, by the L2 controller, an enable field of the entry associated with the write data; and receiving, by the L2 controller, the read data and merging the read data into the data field of the entry.
GLOBAL COHERENCE OPERATIONS
A method includes receiving, by a L2 controller, a request to perform a global operation on a L2 cache and preventing new blocking transactions from entering a pipeline coupled to the L2 cache while permitting new non-blocking transactions to enter the pipeline. Blocking transactions include read transactions and non-victim write transactions. Non-blocking transactions include response transactions, snoop transactions, and victim transactions. The method further includes, in response to an indication that the pipeline does not contain any pending blocking transactions, preventing new snoop transactions from entering the pipeline while permitting new response transactions and victim transactions to enter the pipeline; in response to an indication that the pipeline does not contain any pending snoop transactions, preventing, all new transactions from entering the pipeline; and, in response to an indication that the pipeline does not contain any pending transactions, performing the global operation on the L2 cache.
PSEUDO-RANDOM WAY SELECTION
A method includes receiving a first request to allocate a line in an N-way set associative cache and, in response to a cache coherence state of a way indicating that a cache line stored in the way is invalid, allocating the way for the first request. The method also includes, in response to no ways in the set having a cache coherence state indicating that the cache line stored in the way is invalid, randomly selecting one of the ways in the set. The method also includes, in response to a cache coherence state of the selected way indicating that another request is not pending for the selected way, allocating the selected way for the first request.
SYSTEMS AND METHODS FOR IMPLEMENTING COHERENT MEMORY IN A MULTIPROCESSOR SYSTEM
Data units are stored in private caches in nodes of a multiprocessor system, each node containing at least one processor (CPU), at least one cache private to the node and at least one cache locations buffer {CLB} private to the node. In each CLB location information values are stored, each location information value indicating a location associated with a respective data unit, wherein each location information value stored in a given CLB indicates the location to be either a location within the private cache disposed in the same node as the given CLB, to be a location in one of the other nodes, or to be a location in a main memory. Coherence of values of the data units is maintained using a cache coherence protocol The location information values stored in the CLBs are updated by the cache coherence protocol in accordance with movements of their respective data units.
TERNARY CONTENT ADDRESSABLE MEMORY-ENHANCED CACHE COHERENCY ACCELERATION
A system and method for cache coherency within multiprocessor environments is provided. Each node controller of a plurality of nodes within a multiprocessor system receives a cache coherency protocol request from local processor sockets and other node controller(s). A ternary content addressable memory (TCAM) accelerator in the node controller determines if the cache coherency protocol request comprises a snoop request and, if it is determined to be a snoop request, searching the TCAM based on an address within the cache coherency protocol request. In response to detecting only one match between an entry of the TCAM and the received snoop request, sending a response to the requesting local processor a response without having to access a coherency directory.
Tracking transactions using extended memory features
An approach is disclosed that tracks memory transactions by a node. The approach establishes a transaction processing state corresponding to common virtual addresses accessed by a processing threads. Transactions are executed by the threads. A selected transaction is allowed to complete. In response to detecting a conflict in the transaction processing state, completion of a non-selected transaction is inhibited.