G06F2212/6046

Memory Allocation in a Hierarchical Memory System

A memory allocator in a computer system comprising a plurality of CPU cores (5101-5104) and a first (530) and a second (5120) memory unit having different data access times and wherein each one of the first and the second memory units is divided into memory portions wherein each memory portion (SLICE 0-3) in the second memory unit is associated with at least one memory portion (A-G) in the first memory unit, and wherein each memory portion in the second memory unit is associated with a CPU core. If at least a predetermined number of memory portions in the first memory unit being part of the available requested memory is associated with the memory portion in the second memory unit that is associated with the CPU core on which the requesting application is running, the requested available memory is allocated to the requesting application.

System, apparatus and method for selective enabling of locality-based instruction handling

In an embodiment, a processor includes a sparse access buffer having a plurality of entries each to store for a memory access instruction to a particular address, address information and count information; and a memory controller to issue read requests to a memory, the memory controller including a locality controller to receive a memory access instruction having a no-locality hint and override the no-locality hint based at least in part on the count information stored in an entry of the sparse access buffer. Other embodiments are described and claimed.

System cache optimizations for deep learning compute engines
11003592 · 2021-05-11 · ·

In an example, an apparatus comprises a plurality of compute engines; and logic, at least partially including hardware logic, to detect a cache line conflict in a last-level cache (LLC) communicatively coupled to the plurality of compute engines; and implement context-based eviction policy to determine a cache way in the cache to evict in order to resolve the cache line conflict. Other embodiments are also disclosed and claimed.

Copy source to target management in a data storage system

Copy source to target operations may be selectively and preemptively undertaken in advance of source destage operations. In another aspect, logic detects sequential writes including large block writes to point-in-time copy sources. In response, destage tasks on the associated point-in-time copy targets are started which include in one embodiment, stride-aligned copy source to target operations which copy unmodified data from the point-in-time copy sources to the point-in-time copy targets in alignment with the strides of the target. As a result, when write data of write operations is destaged to the point-in-time copy sources, such source destages do not need to wait for copy source to target operations since they have already been performed. In addition, the copy source to target operations may be stride-aligned with respect to the stride boundaries of the point-in-time copy targets. Other features and aspects may be realized, depending upon the particular application.

METHOD AND SYSTEM FOR PERFORMING DATA MOVEMENT OPERATIONS WITH READ SNAPSHOT AND IN PLACE WRITE UPDATE

Method and system for performing data movement operations is described herein. One embodiment of a method includes: storing data for a first memory address in a cache line of a memory of a first processing unit, the cache line associated with a coherency state indicating that the memory has sole ownership of the cache line; decoding an instruction for execution by a second processing unit, the instruction comprising a source data operand specifying the first memory address and a destination operand specifying a memory location in the second processing unit; and responsive to executing the decoded instruction, copying data from the cache line of the memory of the first processing unit as identified by the first memory address, to the memory location of the second processing unit, wherein responsive to the copy, the cache line is to remain in the memory and the coherency state is to remain unchanged.

Multi-core communication acceleration using hardware queue device

Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (LLC), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.

OLDEST OPERATION WAIT TIME INDICATION INPUT INTO SET-DUELING
20210073126 · 2021-03-11 ·

Systems, apparatuses, and methods for dynamically adjusting cache policies to reduce execution core wait time are disclosed. A processor includes a cache subsystem. The cache subsystem includes one or more cache levels and one or more cache controllers. A cache controller partitions a cache level into two test portions and a remainder portion. The cache controller applies a first policy to the first test portion and applies a second policy to the second test portion. The cache controller determines the amount of time the execution core spends waiting on accesses to the first and second test portions. If the measured wait time is less for the first test portion than for the second test portion, then the cache controller applies the first policy to the remainder portion. Otherwise, the cache controller applies the second policy to the remainder portion.

CACHE MANAGEMENT METHOD USING OBJECT-ORIENTED MANNER AND ASSOCIATED MICROCONTROLLER
20210056032 · 2021-02-25 ·

The present invention provides a microcontroller, wherein the microcontroller includes a processor, a first memory and a cache controller. The first memory includes at least a working space. The cache controller is coupled to the first memory, and is arranged for managing the working space of the first memory, and dynamically loading at least one object from a second memory to the working space of the first memory in an object-oriented manner.

Memory allocation in multi-core processors

A system for allocating memory (e.g., heap) in multi-core processors is provided. In some implementations, the system performs operations comprising receiving, at a shared cache having a plurality of segments, a first data allocation including a plurality of data blocks, and allocating at least a first and second data block from the first allocation. First and second segments in the shared cache can each comprise a plurality of data slots (e.g., of equal length). Allocating the first and second data blocks can include storing the first data block in a data slot of the first segment and storing the second data block in a data slot of the second segment. The plurality of data slots which do not contain data may contain padding, and/or the data slots to which the first and second data blocks are allocated are not adjacent. Related systems, methods, and articles of manufacture are also described.

Systems and methods for tag-less buffer implementation

A data management method for a computer system including at least one processor and at least a first cache, a second cache, a victim buffer (VB), and a memory allocated to the at least one processor, includes selecting a victim cache line to be evicted from the first cache; finding a VB location corresponding to the victim cache line from a set of the VB; copying data of the victim cache line to a data field of the VB location; copying a backward pointer (BP) associated with the victim cache line to a BP field of the VB location; and reclaiming victim space of the first cache using the VB.