Patent classifications
G06F2212/622
Apparatus and method for buffered interconnect
There is provided an interconnect for transferring requests between ports, in which the ports include both source ports and destination ports. The interconnect includes storage circuitry for storing the requests. Input circuitry receives the requests from the plurality of source ports, selects at least one selected source port from an allowed set of said plurality of source ports, and transfers a presented request from the at least one selected source port to the storage circuitry. Output circuitry causes a request in said storage circuitry to be output at one of said plurality of destination ports. Counter circuitry maintains counter values for a plurality of tracked ports from amongst said ports, each counter value indicating the number of requests in said storage circuitry associated with a corresponding tracked port that are waiting to be output by said output circuitry. Filter circuitry determines whether or not a given source port is in said allowed set in dependence on said counter circuitry.
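A minimal Python sketch of how filter circuitry might derive the allowed set from per-port counters of requests still waiting in storage; the quota-based policy and all names are assumptions made for illustration, not details from the abstract:

    # Hypothetical model: each tracked port has a counter of requests still
    # buffered in storage; a source port is allowed to present a new request
    # only while its counter stays below a per-port quota.
    class BufferedInterconnect:
        def __init__(self, tracked_ports, quota):
            self.quota = quota
            self.waiting = {p: 0 for p in tracked_ports}  # counter circuitry

        def allowed_set(self):
            # filter circuitry: ports whose buffered requests are under quota
            return {p for p, n in self.waiting.items() if n < self.quota}

        def accept(self, source_port):
            # input circuitry: only a port in the allowed set may transfer
            if source_port not in self.allowed_set():
                return False
            self.waiting[source_port] += 1  # request now stored and waiting
            return True

        def drain(self, source_port):
            # output circuitry: a stored request left via a destination port
            self.waiting[source_port] -= 1

    ic = BufferedInterconnect(tracked_ports=["p0", "p1"], quota=2)
    assert ic.accept("p0") and ic.accept("p0") and not ic.accept("p0")
    ic.drain("p0")
    assert ic.accept("p0")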
Region based split-directory scheme to adapt to large cache sizes
Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
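A rough Python sketch of the node-based, region-granular directory with a per-entry reference count; the region size, field names, and eviction behavior are assumptions, not claims from the patent:

    REGION_SIZE = 4096        # assumed region size spanning many cache lines

    class NodeRegionDirectory:
        """Tracks regions with at least one line cached anywhere in this node."""
        def __init__(self):
            self.entries = {}  # region base address -> aggregate reference count

        def region_of(self, line_addr):
            return line_addr & ~(REGION_SIZE - 1)

        def line_cached(self, line_addr):
            # a cache subsystem in the node filled this line: bump the
            # per-region reference count, allocating an entry if the region is new
            region = self.region_of(line_addr)
            self.entries[region] = self.entries.get(region, 0) + 1

        def line_evicted(self, line_addr):
            # when the last cached line of a region goes away the entry is
            # dropped, which keeps the directory smaller than a per-line one
            region = self.region_of(line_addr)
            self.entries[region] -= 1
            if self.entries[region] == 0:
                del self.entries[region]

    d = NodeRegionDirectory()
    d.line_cached(0x1000)
    d.line_cached(0x1040)          # same 4 KiB region, count becomes 2
    assert len(d.entries) == 1 and d.entries[0x1000] == 2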
FINAL CACHE DIRECTORY STATE INDICATION
A method for managing designated authority status in a cache line includes identifying an initial designated authority (DA) member cache for a cache line, transferring DA status from the initial DA member cache to a new DA member cache, determining whether the new DA member cache is active, indicating a final state of the initial DA member cache responsive to determining that the new DA member cache is active, and overriding a DA state in a cache control structure in a directory. A method for managing cache accesses during a designated authority transfer includes receiving a designated authority (DA) status transfer request, receiving an indication that a first cache will invalidate its copy of the cache line, allowing a second cache to assume DA status for the cache line, and denying access to the first cache’s copy of the cache line until invalidation by the first cache is complete.
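One way to read the access-blocking behavior during a DA transfer, sketched in Python; the class and field names are illustrative only:

    class Line:
        def __init__(self):
            self.da_owner = "cache_A"        # initial designated authority
            self.pending_invalidation = None

    def transfer_da(line, new_owner, old_owner_will_invalidate):
        # DA status moves to the new member cache; if the old DA cache still
        # holds a copy, access to that copy is denied until it invalidates
        old_owner = line.da_owner
        line.da_owner = new_owner
        if old_owner_will_invalidate:
            line.pending_invalidation = old_owner

    def may_access_copy(line, cache):
        # deny access to the old DA cache's copy while invalidation is pending
        return line.pending_invalidation != cache

    def invalidation_done(line, cache):
        if line.pending_invalidation == cache:
            line.pending_invalidation = None

    line = Line()
    transfer_da(line, "cache_B", old_owner_will_invalidate=True)
    assert not may_access_copy(line, "cache_A")   # blocked until invalidation
    invalidation_done(line, "cache_A")
    assert may_access_copy(line, "cache_A")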
NEURAL PROCESSING DEVICE
A neural processing device is provided. The neural processing device comprises: a processing unit configured to perform calculations, an L0 memory configured to receive data from the processing unit and provide data to the processing unit, and an LSU (Load/Store Unit) configured to perform load and store operations of the data, wherein the LSU comprises: a neural core load unit configured to issue a load instruction of the data, a neural core store unit configured to issue a store instruction for transmitting and storing the data, and a sync ID logic configured to provide a sync ID to the neural core load unit and the neural core store unit to thereby cause a synchronization signal to be generated for each sync ID.
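A small Python sketch of per-sync-ID completion tracking consistent with the abstract; the counter-based bookkeeping is an assumption about one plausible implementation:

    from collections import defaultdict

    class LSU:
        """Load/store unit that tags each operation with a sync ID and raises a
        synchronization signal per ID once all operations carrying it complete."""
        def __init__(self):
            self.outstanding = defaultdict(int)   # sync ID -> in-flight operations
            self.synced = set()                   # sync IDs whose signal has fired

        def issue(self, sync_id):
            # neural core load/store unit issues an operation under this sync ID
            self.outstanding[sync_id] += 1
            self.synced.discard(sync_id)

        def complete(self, sync_id):
            self.outstanding[sync_id] -= 1
            if self.outstanding[sync_id] == 0:
                self.synced.add(sync_id)          # synchronization signal for this ID

    lsu = LSU()
    lsu.issue(sync_id=0); lsu.issue(sync_id=0); lsu.issue(sync_id=1)
    lsu.complete(0); lsu.complete(1)
    assert lsu.synced == {1}                      # ID 0 still has one op in flight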
IMPLIED DIRECTORY STATE UPDATES
A request is received over a link that requests a particular line in memory. A directory state record is identified in memory that identifies a particular directory state of the particular line. A type of the request is identified from the request. It is determined that the directory state of the particular line is to change from the particular state to a new state based on the directory state of the particular line and the type of the request. The directory state record is changed, in response to receipt of the request, to reflect the new state. A copy of the particular line is sent in response to the request.
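The implied update can be pictured as a small transition table keyed by the current directory state and the request type; the states and request types below are illustrative, not taken from the patent:

    # Hypothetical directory states and request types
    TRANSITIONS = {
        ("Invalid", "read"):      "Shared",
        ("Shared",  "read"):      "Shared",
        ("Shared",  "ownership"): "Modified",
        ("Invalid", "ownership"): "Modified",
    }

    def handle_request(directory, line_addr, request_type):
        # the new state is implied by the current record plus the request type,
        # so the record is updated on receipt of the request itself
        current = directory.get(line_addr, "Invalid")
        directory[line_addr] = TRANSITIONS[(current, request_type)]
        return "copy of line"                     # line data sent to the requester

    directory = {}
    handle_request(directory, 0x80, "read")
    handle_request(directory, 0x80, "ownership")
    assert directory[0x80] == "Modified"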
On-demand memory allocation
Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
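A hedged Python sketch of on-demand mapping of private addresses to virtual addresses; the page size, table layout, and names are assumptions made for illustration:

    PAGE_SIZE = 4096

    class PrivateMemoryAllocator:
        """Translates private addresses to virtual addresses, mapping a page
        lazily the first time any address within it is touched."""
        def __init__(self, heap_base):
            self.page_table = {}          # private page number -> virtual page base
            self.next_free = heap_base    # next unmapped virtual page

        def translate(self, private_addr):
            page, offset = divmod(private_addr, PAGE_SIZE)
            if page not in self.page_table:
                # page table information not set up yet: allocate and map now
                self.page_table[page] = self.next_free
                self.next_free += PAGE_SIZE
            # an MMU would further translate this virtual address to a physical one
            return self.page_table[page] + offset

    alloc = PrivateMemoryAllocator(heap_base=0x10_0000)
    a = alloc.translate(0x0004)
    b = alloc.translate(0x0008)           # same page, no new mapping
    c = alloc.translate(0x1004)           # new page mapped on demand
    assert b - a == 4 and len(alloc.page_table) == 2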
COHERENCE-BASED CACHE-LINE COPY-ON-WRITE
A method of performing a copy-on-write on a shared memory page is carried out by a device communicating with a processor via a coherence interconnect. The method includes: adding a page table entry so that a request to read a first cache line of the shared memory page includes a cache-line address of the shared memory page, and a request to write to a second cache line of the shared memory page includes a cache-line address of a new memory page; in response to the request to write to the second cache line, storing new data of the second cache line in a second memory and associating the cache-line address of the new memory page with the new data stored in the second memory; and in response to a request to read the second cache line, reading the new data of the second cache line from the second memory.
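A simplified Python model of the per-cache-line copy-on-write flow; the two-memory split is paraphrased from the abstract and all names are hypothetical:

    class CacheLineCOW:
        def __init__(self, shared_page):
            self.shared = dict(shared_page)   # first memory: original shared page
            self.overlay = {}                 # second memory: lines written after COW

        def write(self, line_addr, data):
            # store the new data in the second memory and remember the association
            self.overlay[line_addr] = data

        def read(self, line_addr):
            # written lines are served from the second memory; untouched lines
            # still come from the shared page
            if line_addr in self.overlay:
                return self.overlay[line_addr]
            return self.shared[line_addr]

    page = CacheLineCOW({0x00: b"old0", 0x40: b"old1"})
    page.write(0x40, b"new1")
    assert page.read(0x00) == b"old0" and page.read(0x40) == b"new1"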
Multiple cache framework for managing data for scenario planning
The embodiments disclosed herein relate to computing a transportation plan for transporting goods from one place to another across a number of shipments that satisfy multiple shipment orders. The transportation plan may specify a transportation channel that includes one or more segments selected from service provider rate offerings, which may include a means of transportation, starting location, destination location, and cost of the segment. An actionable transportation plan may be computed based on current transportation planning data. Alternative plans may be computed for a variety of scenarios in which hypothetical changes are introduced to the transportation planning data. Any combination of an actionable transportation plan and alternative plans may be computed concurrently, with the computations sharing a common cache of production data.
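A loose Python sketch of concurrent scenario computations sharing one cache of production data while layering hypothetical changes per scenario; the structure shown is an illustration, not the claimed framework:

    class ScenarioView:
        """Read-through view that overlays a scenario's hypothetical changes on a
        cache of production planning data shared by all scenario computations."""
        def __init__(self, shared_cache, overrides=None):
            self.shared_cache = shared_cache      # common cache of production data
            self.overrides = overrides or {}      # hypothetical changes for this scenario

        def get(self, key):
            if key in self.overrides:
                return self.overrides[key]
            return self.shared_cache[key]

    production = {("rate", "truck", "A->B"): 1200, ("rate", "rail", "A->B"): 900}

    actionable = ScenarioView(production)                          # current data only
    what_if = ScenarioView(production, {("rate", "rail", "A->B"): 1500})

    assert actionable.get(("rate", "rail", "A->B")) == 900
    assert what_if.get(("rate", "rail", "A->B")) == 1500
    assert what_if.get(("rate", "truck", "A->B")) == 1200          # shared, unchanged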