Patent classifications
G06F12/0817
Lock-free work-stealing thread scheduler
Systems and methods are provided for lock-free thread scheduling. Threads may be placed in a ring buffer shared by all computer processing units (CPUs), e.g., in a node. A thread assigned to a CPU may be placed in the CPU's local run queue. However, when a CPU's local run queue is cleared, that CPU checks the shared ring buffer to determine if any threads are waiting to run on that CPU, and if so, the CPU pulls a batch of threads related to that ready-to-run thread to execute. If not, an idle CPU randomly selects another CPU to steal threads from, and the idle CPU attempts to dequeue a thread batch associated with the CPU from the shared ring buffer. Polling may be handled through the use of a shared poller array to dynamically distribute polling across multiple CPUs.
Method for PRP/SGL handling for out-of-order NVME controllers
Read latency for a read operation to a host implementing a PRP/SGL buffer is reduced by generating an address table representing the linked-list structure defining the PRP/SGL buffer. The address table may be generated concurrently with reading of data referenced by the read command from a NAND storage device. A block table for tracking status of LBAs referenced by IO commands may include a reference to the address table which is used to transfer LBAs to host memory as soon as the address table is complete and a block of data referenced by an LBA has been read from the NAND storage device.
Victim cache that supports draining write-miss entries
A caching system including a first sub-cache and a second sub-cache in parallel with the first sub-cache, wherein the second sub-cache includes a set of cache lines, line type bits configured to store an indication that a corresponding cache line of the set of cache lines is configured to store write-miss data, and an eviction controller configured to flush stored write-miss data based on the line type bits.
Variable protection window extension for a target address of a store-conditional request
A processing unit includes a processor core and an associated cache memory. The cache memory establishes a reservation of a hardware thread of the processor core for a store target address and services a store-conditional request of the processor core by conditionally updating the shared memory with store data based on the whether the hardware thread has a reservation for the store target address. The cache memory receives a hint associated with the store-conditional request indicating an intent of the store-conditional request. The cache memory protects the store target address against access by any conflicting memory access request during a protection window extension following servicing of the store-conditional request. The cache memory establishes a first duration for the protection window extension based on the hint having a first value and establishes a different second duration for the protection window extension based on the hint having a different second value.
Variable protection window extension for a target address of a store-conditional request
A processing unit includes a processor core and an associated cache memory. The cache memory establishes a reservation of a hardware thread of the processor core for a store target address and services a store-conditional request of the processor core by conditionally updating the shared memory with store data based on the whether the hardware thread has a reservation for the store target address. The cache memory receives a hint associated with the store-conditional request indicating an intent of the store-conditional request. The cache memory protects the store target address against access by any conflicting memory access request during a protection window extension following servicing of the store-conditional request. The cache memory establishes a first duration for the protection window extension based on the hint having a first value and establishes a different second duration for the protection window extension based on the hint having a different second value.
MEMORY INCLUSIVITY MANAGEMENT IN COMPUTING SYSTEMS
Techniques of memory inclusivity management are disclosed herein. One example technique includes receiving a request from a core of the CPU to write a block of data corresponding to a first cacheline to a swap buffer at a memory. In response to the request, the method can include retrieving metadata corresponding to the first cacheline that includes a bit encoding a status value indicating whether the memory block at the memory currently contains data of the first cacheline or data corresponding to a second cacheline. The first and second cachelines alternately sharing the swap buffer at the memory. When the decoded status value indicates that the memory block at the first memory currently contains the data corresponding to the first cacheline, an instruction is transmitted to the memory controller to directly write the block of data to the memory block at the first memory.
METHOD AND SYSTEM FOR TRACKING STATE OF CACHE LINES
The state of cache lines transferred into an out of caches of processing hardware is tracked by monitoring hardware. The method of tracking includes monitoring the processing hardware for cache coherence events on a coherence interconnect between the processing hardware and monitoring hardware, determining that the state of a cache line has changed, and updating a hierarchical data structure to indicate the change in the state of said cache line. The hierarchical data structure includes a first level data structure including first bits, and a second level data structure including second bits, each of the first bits associated with a group of second bits. The step of updating includes setting one of the first bits and one of the second bits in the group corresponding to the first bit that is being set, according to an address of said cache line.
Marking in-flight requests affected by translation entry invalidation in a data processing system
A memory-referent instruction is executed to calculate a target effective address (EA) of a corresponding memory-referent request. An array entry in an upper level cache is allocated, and the EA is specified in a corresponding EA directory entry. While in-flight, the memory-referent request is buffered in a queue in association with a pointer to the entry in the EA directory. Based on receiving a translation invalidation request requesting invalidation of an address translation in a translation structure, the processor core walks the EA directory, determines the EA in the entry matches an address range specified by the translation invalidation request, and, based on the match, precisely marks the memory-referent request using the pointer to the EA directory entry. Based on the marking, the translation invalidation request is permitted to complete with reference to the processor core only after the memory-referent request has drained from the processing unit.
Marking in-flight requests affected by translation entry invalidation in a data processing system
A memory-referent instruction is executed to calculate a target effective address (EA) of a corresponding memory-referent request. An array entry in an upper level cache is allocated, and the EA is specified in a corresponding EA directory entry. While in-flight, the memory-referent request is buffered in a queue in association with a pointer to the entry in the EA directory. Based on receiving a translation invalidation request requesting invalidation of an address translation in a translation structure, the processor core walks the EA directory, determines the EA in the entry matches an address range specified by the translation invalidation request, and, based on the match, precisely marks the memory-referent request using the pointer to the EA directory entry. Based on the marking, the translation invalidation request is permitted to complete with reference to the processor core only after the memory-referent request has drained from the processing unit.
VARIABLE PROTECTION WINDOW EXTENSION FOR A TARGET ADDRESS OF A STORE-CONDITIONAL REQUEST
A processing unit includes a processor core and an associated cache memory. The cache memory establishes a reservation of a hardware thread of the processor core for a store target address and services a store-conditional request of the processor core by conditionally updating the shared memory with store data based on the whether the hardware thread has a reservation for the store target address. The cache memory receives a hint associated with the store-conditional request indicating an intent of the store-conditional request. The cache memory protects the store target address against access by any conflicting memory access request during a protection window extension following servicing of the store-conditional request. The cache memory establishes a first duration for the protection window extension based on the hint having a first value and establishes a different second duration for the protection window extension based on the hint having a different second value.