G06F2212/271

APPARATUS, METHOD, AND SYSTEM FOR ENHANCED DATA PREFETCHING BASED ON NON-UNIFORM MEMORY ACCESS (NUMA) CHARACTERISTICS

Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristic (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance to their respective NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetcher parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.

Apparatus, method, and system for enhanced data prefetching based on non-uniform memory access (NUMA) characteristics

Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristic (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance to their respective NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetcher parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.

Increased bandwidth of ordered stores in a non-uniform memory subsystem

A method, computer program product, and system for maintaining a proper ordering of a data steam that includes two or more sequentially ordered stores, the data stream being moved to a destination memory device, the two or more sequentially ordered stores including at least a first store and a second store, wherein the first store is rejected by the destination memory device. A computer-implemented method includes sending the first store to the destination memory device. A conditional request is sent to the destination memory device for approval to send the second store to the destination memory device, the conditional request dependent upon successful completion of the first store. The second store is cancelled responsive to receiving a reject response corresponding to the first store.

APPARATUS, METHOD, AND SYSTEM FOR ENHANCED DATA PREFETCHING BASED ON NON-UNIFORM MEMORY ACCESS (NUMA) CHARACTERISTICS

Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristic (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters that are set in accordance to their respective NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetcher parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.

CHASSIS SERVICING AND MIGRATION IN A SCALE-UP NUMA SYSTEM

One aspect of the application can provide a system and method for replacing a failing node with a spare node in a non-uniform memory access (NUMA) system. During operation, in response to determining that a node-migration condition is met, the system can initialize a node controller of the spare node such that accesses to a memory local to the spare node are to be processed by the node controller, quiesce the failing node and the spare node to allow state information of processors on the failing node to be migrated to processors on the spare node, and subsequent to unquiescing the failing node and the spare node, migrate data from the failing node to the spare node while maintaining cache coherence in the NUMA system and while the NUMA system remains in operation, thereby facilitating continuous execution of processes previously executed on the failing node.

SYSTEMS AND METHODS FOR EFFICIENT CACHELINE HANDLING BASED ON PREDICTIONS

A data management method for a processor to which a first cache, a second cache, and a behavior history table are allocated, includes tracking reuse information learning cache lines stored in at least one of the first cache and the second cache; recording the reuse information in the behavior history table; and determining a placement policy with respect to future operations that are to be performed on a plurality of cache lines stored in the first cache and the second cache, based on the reuse information in the behavior history table.

ADDRESS SPACE ACCESS CONTROL

There is provided an apparatus for receiving a request from a master to access an input address. Coarse grain access circuitry stores and provides a reference to an area of an output address space in dependence on the input address. One or more fine grain access circuits, each store and provide a reference to a sub-area in the area of the output address space in dependence on the input address. The apparatus forwards the request from the coarse grain access circuitry to one of the one fine grain access circuits in dependence on the input address.

Increased bandwidth of ordered stores in a non-uniform memory subsystem

A method, computer program product, and system for maintaining a proper ordering of a data steam that includes two or more sequentially ordered stores, the data stream being moved to a destination memory device, the two or more sequentially ordered stores including at least a first store and a second store, wherein the first store is rejected by the destination memory device. A computer-implemented method includes sending the first store to the destination memory device. A conditional request is sent to the destination memory device for approval to send the second store to the destination memory device, the conditional request dependent upon successful completion of the first store. The second store is cancelled responsive to receiving a reject response corresponding to the first store.

Cache memory system and processor system
10025719 · 2018-07-17 · ·

A cache memory system includes cache memories of at least one layer, at least one of the cache memories having a data cache to store data and a tag to store an address of data stored in the data cache, and a first address conversion information storage to store entry information that includes address conversion information for virtual addresses issued by a processor to physical addresses and cache presence information that indicates whether data corresponding to the converted physical address is stored in a specific cache memory of at least one layer among the cache memories.

Changing cache ownership in clustered multiprocessor
09940238 · 2018-04-10 · ·

A chip multiprocessor may include a first cluster and a second cluster, each having multiple cores of a processor, multiple co-located cache slices, and a memory controller. The processor stores directory information in a memory to indicate cluster cache ownership of a first address space to the first cluster. In response to a request to change the cluster cache ownership of the first address space to a second address space of the second cluster, the processor provides a quiesce period during which to block new read or write requests to the first cluster and the second cluster; drain read or write requests issued on the first cluster and the second cluster; and remove the block on new read or write requests. The processor may also update the directory information to change the cluster cache ownership of the first address space to the second address space of the second cluster.