G06F12/0833

Method and apparatus for smart store operations with conditional ownership requests

Method and apparatus implementing smart store operations with conditional ownership requests. One aspect includes a method implemented in a multi-core processor, the method comprises: receiving a conditional read for ownership (CondRFO) from a requester in response to an execution of an instruction to modify a target cache line (CL) with a new value, the CondRFO identifying the target CL and the new value; determining from a local cache a local CL corresponding to the target CL; determining a local value from the local CL; comparing the local value with the new value; setting a coherency state of the local CL to (S)hared when the local value is same as the new value; setting the coherency state of the local CL to (I)nvalid when the local value is different than the new value; and sending a response and a copy of the local CL to the requester. Other embodiments include an apparatus configured to perform the actions of the methods.

Computer Memory Expansion Device and Method of Operation

A memory expansion device operable with a host computer system (host) comprises a non-volatile memory (NVM) subsystem, cache memory, and control logic configurable to receive a submission from the host including a read command and specifying a payload in the NVM subsystem and demand data in the payload. The control logic is configured to request ownership of a set of cache lines corresponding to the payload, to indicate completion of the submission after acquiring ownership of the cache lines, and to load the payload to the cache memory. The set of cache lines correspond to a set of cache lines in a coherent destination memory space accessible by the host. The control logic is further configured to, after indicating completion of the submission and in response to a request from the host to read demand data in the payload, return the demand data after determining that the demand data is in the cache memory.

CACHE MEMORY SYSTEM AND CACHE MEMORY CONTROL METHOD
20230214328 · 2023-07-06 · ·

According to one embodiment, a cache memory system includes a cache memory and a cache controller. The cache memory can store first data to be read or written by a processor. The cache controller is configured to execute a refresh. The refresh includes reading the first data stored in the cache memory and writing the read first data to the cache memory. When executing the refresh, the cache controller is configured to exchange the first data stored in a first area of the cache memory for second data stored in a second area of the cache memory.

Variable protection window extension for a target address of a store-conditional request

A processing unit includes a processor core and an associated cache memory. The cache memory establishes a reservation of a hardware thread of the processor core for a store target address and services a store-conditional request of the processor core by conditionally updating the shared memory with store data based on the whether the hardware thread has a reservation for the store target address. The cache memory receives a hint associated with the store-conditional request indicating an intent of the store-conditional request. The cache memory protects the store target address against access by any conflicting memory access request during a protection window extension following servicing of the store-conditional request. The cache memory establishes a first duration for the protection window extension based on the hint having a first value and establishes a different second duration for the protection window extension based on the hint having a different second value.

MEMORY INCLUSIVITY MANAGEMENT IN COMPUTING SYSTEMS

Techniques of memory inclusivity management are disclosed herein. One example technique includes receiving a request from a core of the CPU to write a block of data corresponding to a first cacheline to a swap buffer at a memory. In response to the request, the method can include retrieving metadata corresponding to the first cacheline that includes a bit encoding a status value indicating whether the memory block at the memory currently contains data of the first cacheline or data corresponding to a second cacheline. The first and second cachelines alternately sharing the swap buffer at the memory. When the decoded status value indicates that the memory block at the first memory currently contains the data corresponding to the first cacheline, an instruction is transmitted to the memory controller to directly write the block of data to the memory block at the first memory.

METHOD AND SYSTEM FOR TRACKING STATE OF CACHE LINES

The state of cache lines transferred into an out of caches of processing hardware is tracked by monitoring hardware. The method of tracking includes monitoring the processing hardware for cache coherence events on a coherence interconnect between the processing hardware and monitoring hardware, determining that the state of a cache line has changed, and updating a hierarchical data structure to indicate the change in the state of said cache line. The hierarchical data structure includes a first level data structure including first bits, and a second level data structure including second bits, each of the first bits associated with a group of second bits. The step of updating includes setting one of the first bits and one of the second bits in the group corresponding to the first bit that is being set, according to an address of said cache line.

VARIABLE PROTECTION WINDOW EXTENSION FOR A TARGET ADDRESS OF A STORE-CONDITIONAL REQUEST

A processing unit includes a processor core and an associated cache memory. The cache memory establishes a reservation of a hardware thread of the processor core for a store target address and services a store-conditional request of the processor core by conditionally updating the shared memory with store data based on the whether the hardware thread has a reservation for the store target address. The cache memory receives a hint associated with the store-conditional request indicating an intent of the store-conditional request. The cache memory protects the store target address against access by any conflicting memory access request during a protection window extension following servicing of the store-conditional request. The cache memory establishes a first duration for the protection window extension based on the hint having a first value and establishes a different second duration for the protection window extension based on the hint having a different second value.

OPERATING SYSTEM DEACTIVATION OF STORAGE BLOCK WRITE PROTECTION ABSENT QUIESCING OF PROCESSORS

Operating system deactivation of write protection for a storage block is provided absent quiescing of processors in a multi-processor computing environment. The process includes receiving an address translation protection exception interrupt resulting from an attempted write access by a processor to a storage block, and determining by the operating system whether write protection for the storage block is active. Based on write protection for the storage block not being active, the operating system issues an instruction to clear or modify translation lookaside buffer entries of the processor associated with the storage block, absent waiting for an action by another processor of multiple processors of the computing environment, to facilitate write access to the storage block proceeding at the processor.

Processor and method implementing a cacheline demote machine instruction

Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Arbitration scheme for coherent and non-coherent memory requests

A processor in a system is responsive to a coherent memory request buffer having a plurality of entries to store coherent memory requests from a client module and a non-coherent memory request buffer having a plurality of entries to store non-coherent memory requests from the client module. The client module buffers coherent and non-coherent memory requests and releases the memory requests based on one or more conditions of the processor or one of its caches. The memory requests are released to a central data fabric and into the system based on a first watermark associated with the coherent memory buffer and a second watermark associated with the non-coherent memory buffer.