G06F2212/6046

MULTI-CORE COMMUNICATION ACCELERATION USING HARDWARE QUEUE DEVICE

Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (LLC), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.

COPY SOURCE TO TARGET MANAGEMENT IN A DATA STORAGE SYSTEM

Copy source to target operations may be selectively and preemptively undertaken in advance of source destage operations. In another aspect, logic detects sequential writes including large block writes to point-in-time copy sources. In response, destage tasks on the associated point-in-time copy targets are started which include in one embodiment, stride-aligned copy source to target operations which copy unmodified data from the point-in-time copy sources to the point-in-time copy targets in alignment with the strides of the target. As a result, when write data of write operations is destaged to the point-in-time copy sources, such source destages do not need to wait for copy source to target operations since they have already been performed. In addition, the copy source to target operations may be stride-aligned with respect to the stride boundaries of the point-in-time copy targets. Other features and aspects may be realized, depending upon the particular application.

Copy source to target management in a data storage system

Copy source to target operations may be selectively and preemptively undertaken in advance of source destage operations. In another aspect, logic detects sequential writes including large block writes to point-in-time copy sources. In response, destage tasks on the associated point-in-time copy targets are started which include in one embodiment, stride-aligned copy source to target operations which copy unmodified data from the point-in-time copy sources to the point-in-time copy targets in alignment with the strides of the target. As a result, when write data of write operations is destaged to the point-in-time copy sources, such source destages do not need to wait for copy source to target operations since they have already been performed. In addition, the copy source to target operations may be stride-aligned with respect to the stride boundaries of the point-in-time copy targets. Other features and aspects may be realized, depending upon the particular application.

Methods and apparatus for memory tier page cache coloring hints

Methods and apparatus for providing for a cache replacement policy for page caches for storage having a first memory tier having regions and virtual memory having mmaps of ones of the regions in the first memory tier. In an embodiment, the cache replacement policy includes setting a color hint to a first one of the cached pages, wherein the color hint includes a value indicating hotness of the first one of the cached pages.

System cache optimizations for deep learning compute engines
11914525 · 2024-02-27 · ·

In an example, an apparatus comprises a plurality of compute engines; and logic, at least partially including hardware logic, to detect a cache line conflict in a last-level cache (LLC) communicatively coupled to the plurality of compute engines; and implement context-based eviction policy to determine a cache way in the cache to evict in order to resolve the cache line conflict. Other embodiments are also disclosed and claimed.

System, Apparatus And Method For Selective Enabling Of Locality-Based Instruction Handling

In an embodiment, a processor includes a sparse access buffer having a plurality of entries each to store for a memory access instruction to a particular address, address information and count information; and a memory controller to issue read requests to a memory, the memory controller including a locality controller to receive a memory access instruction having a no-locality hint and override the no-locality hint based at least in part on the count information stored in an entry of the sparse access buffer. Other embodiments are described and claimed.

Data processing system for a home node to authorize a master to bypass the home node to directly send data to a slave

A data processing system comprises a master node to initiate data transmissions; one or more slave nodes to receive the data transmissions; and a home node to control coherency amongst data stored by the data processing system; in which at least one data transmission from the master node to one of the one or more slave nodes bypasses the home node.

Block caching in a shared storage environment

A storage system comprises a shared storage environment that includes a storage array having at least one storage volume shared between first and second host devices. The storage system further comprises a server associated with the storage array, at least first and second clients associated with the respective first and second host devices, and a first block cache arranged between the first client and the storage array. The server is configured to coordinate operations of the first and second clients relating to the storage volume shared between the first and second host devices in a manner that ensures coherency of data stored in the first block cache. The server may comprise a storage block mapping protocol (SBMP) server and the first and second clients may comprise respective SBMP clients. The block cache is illustratively implemented using a VFCache or other type of server flash cache.

Method, system, and apparatus for reducing processor latency

Disclosed is a method, apparatus, and/or computer program product for reducing latency in a processor with regard to the execution of noncacheable operations that includes receiving noncacheable operations from one or both of the level 2 cache and a level 3 cache, sending the noncacheable operations to a noncacheable unit (NCU) associated with a core of the processor, executing the noncacheable operations by the NCU, and sending results of the executed noncacheable operations to a host bridge for output to an input/out device. The noncacheable operations bypass the core of the processor.

Systems and methods for distributing cache space

The disclosed computer-implemented method for distributing cache space may include (i) identifying workloads that make input/output requests to a storage system that comprises a cache that stores a copy of data recently written to the storage system, (ii) calculating a proportion of the cache that is occupied by data written to the cache by a workload, (iii) determining that the proportion of the cache that is occupied by the data written to the cache by the workload is disproportionate, and (iv) limiting the volume of input/output requests from workload that will be accepted by the storage system in response to determining that the proportion of the cache that is occupied by the data written to the cache by the workload is disproportionate. Various other methods, systems, and computer-readable media are also disclosed.