
Memory request throttling to constrain memory bandwidth utilization

A processing system includes an interconnect fabric coupleable to a local memory and at least one compute cluster coupled to the interconnect fabric. The compute cluster includes a processor core and a cache hierarchy. The cache hierarchy has a plurality of caches and a throttle controller configured to throttle a rate of memory requests issuable by the processor core based on at least one of an access latency metric and a prefetch accuracy metric. The access latency metric represents an average access latency for memory requests for the processor core and the prefetch accuracy metric represents an accuracy of a prefetcher of a cache of the cache hierarchy.
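The throttle controller's decision described above can be sketched as a simple policy over the two metrics. This is an illustrative assumption, not the patented design: the thresholds, the metric names, and the three-level output are all invented for the sketch.

```python
# Hypothetical sketch of a throttle decision based on the two metrics named
# in the abstract: average access latency and prefetch accuracy.
# All thresholds and level names are invented for illustration.

def throttle_level(avg_access_latency_ns: float,
                   prefetch_accuracy: float,
                   latency_threshold_ns: float = 200.0,
                   accuracy_threshold: float = 0.5) -> str:
    """Return a coarse throttle level for the processor core's memory requests."""
    high_latency = avg_access_latency_ns > latency_threshold_ns
    poor_prefetch = prefetch_accuracy < accuracy_threshold
    if high_latency and poor_prefetch:
        return "aggressive"   # cut both demand-request and prefetch rates
    if high_latency or poor_prefetch:
        return "moderate"     # cut only the prefetch request rate
    return "none"             # no throttling needed
```

In hardware the "rate" would be enforced by stalling request issue slots; the sketch only captures the decision, not the enforcement.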

RECORDING A CACHE COHERENCY PROTOCOL TRACE FOR USE WITH A SEPARATE MEMORY VALUE TRACE
20230176971 · 2023-06-08 ·

A processor that performs cache-based tracing by recording one or more cache coherency protocol (CCP) messages into a first trace. Based on detecting a memory access to a target memory address, the processor logs into the first trace information usable to obtain a memory value corresponding to the target memory address from the memory snapshot(s) stored within a separate second trace. This includes logging the target memory address, as well as CCP message(s) indicating at least one of: (i) that none of a plurality of processing units possessed a first cache line within the cache that overlaps with the target memory address; (ii) that a first processing unit initiated a cache miss for the target memory address; or (iii) that the first processing unit obtained, from a second processing unit, a second cache line within the cache that overlaps with the target memory address.
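The three CCP cases above can be modeled as a toy classifier over which processing units currently hold an overlapping cache line. This is not the patented mechanism: the 64-byte line granularity, the owner map, and the mapping from owner count to message type are simplifying assumptions made for the sketch.

```python
# Toy model of choosing which CCP message to log for a memory access, based
# on which processing units hold an overlapping cache line. The structure and
# the owner-count rule are illustrative assumptions, not the patented design.

def ccp_message(target_addr: int, line_owners: dict) -> str:
    """line_owners maps cache-line base address -> set of unit ids holding it."""
    line = target_addr & ~0x3F  # assume 64-byte cache lines
    owners = line_owners.get(line, set())
    if not owners:
        return "no_unit_possessed_line"      # case (i)
    if len(owners) == 1:
        return "unit_initiated_cache_miss"   # case (ii), simplified
    return "obtained_from_other_unit"        # case (iii), simplified
```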

APPARATUSES, METHODS, AND SYSTEMS TO PRECISELY MONITOR MEMORY STORE ACCESSES
20230176870 · 2023-06-08 ·

Systems, methods, and apparatuses relating to circuitry to precisely monitor memory store accesses are described. In one embodiment, a system includes a memory, a hardware processor core comprising a decoder to decode an instruction into a decoded instruction, an execution circuit to execute the decoded instruction to produce a resultant, a store buffer, and a retirement circuit to retire the instruction when a store request for the resultant from the execution circuit is queued into the store buffer for storage into the memory, and a performance monitoring circuit to mark the retired instruction for monitoring of post-retirement performance information between being queued in the store buffer and being stored in the memory, enable a store fence after the retired instruction to be inserted that causes previous store requests to complete within the memory, and on detection of completion of the store request for the instruction in the memory, store the post-retirement performance information in storage of the performance monitoring circuit.

THROTTLING SCHEMES IN MULTICORE MICROPROCESSORS
20230176977 · 2023-06-08 ·

An electronic device includes a cache, a processing cluster having one or more processors, and prefetch throttling circuitry that determines a congestion level of the processing cluster based on an extent to which data retrieval requests sent from the processors to the cache are not satisfied by the cache. Congestion criteria require that the congestion level of the cluster is above a cluster congestion threshold. In accordance with a determination that the congestion level of the cluster satisfies the congestion criteria, the prefetch throttling circuitry causes one of the processors to limit prefetch requests to the cache to prefetch requests of at least a threshold quality. In accordance with a determination that the congestion level of the cluster does not satisfy the congestion criteria, the prefetch throttling circuitry forgoes causing the processors to limit prefetch requests to the cache to prefetch requests of at least the threshold quality.
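The two-step logic (compute a congestion level, then gate prefetches by quality only while congested) can be sketched as follows. The congestion metric as a miss fraction and the numeric thresholds are assumptions for illustration; the abstract does not define them.

```python
# Sketch of congestion-gated prefetch throttling. The miss-fraction metric
# and both thresholds are illustrative assumptions.

def should_limit_prefetches(unsatisfied_requests: int,
                            total_requests: int,
                            cluster_congestion_threshold: float = 0.25) -> bool:
    """Congestion level = fraction of requests not satisfied by the cache."""
    if total_requests == 0:
        return False
    congestion = unsatisfied_requests / total_requests
    return congestion > cluster_congestion_threshold

def admit_prefetch(quality: float, limited: bool,
                   threshold_quality: float = 0.7) -> bool:
    """While limited, only prefetches of at least the threshold quality issue."""
    return quality >= threshold_quality if limited else True
```

"Quality" here stands in for whatever confidence score the prefetcher assigns a candidate request.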

OMITTING PROCESSOR-BASED LOGGING OF SEPARATELY OBTAINABLE MEMORY VALUES DURING PROGRAM TRACING
20230169010 · 2023-06-01 ·

Reducing overheads of recording a replayable execution trace of a program's execution at a computer processor by omitting logging of accesses to memory addresses whose values can be reconstructed or predicted. A computer system determines that memory values corresponding to a range of memory addresses within a memory space for a process can be obtained separately from the process' execution, and configures a data structure for instructing a processor to omit logging of memory accesses when the processor accesses an address within this range while executing the process. Correspondingly, upon detecting a memory access while executing the process, the processor determines if it has been instructed to omit logging of the access by checking the data structure. When the data structure instructs the processor to omit logging of the access, the processor omits logging the memory access while it uses a cache to process the memory access.
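The data structure the processor consults on each access can be sketched as a sorted set of non-overlapping address ranges with a binary-search membership test. The representation (sorted `(start, end)` pairs) is an assumption; the patent leaves the structure's layout open.

```python
import bisect

# Illustrative sketch of the "omit logging" data structure: a sorted list of
# non-overlapping [start, end) address ranges whose memory values can be
# obtained separately from the process's execution. The layout is assumed.

class OmitLoggingRanges:
    def __init__(self, ranges):
        self.ranges = sorted(ranges)              # list of (start, end) pairs
        self.starts = [r[0] for r in self.ranges]

    def should_omit(self, addr: int) -> bool:
        """True if the processor has been instructed to skip logging addr."""
        i = bisect.bisect_right(self.starts, addr) - 1
        return i >= 0 and self.ranges[i][0] <= addr < self.ranges[i][1]
```

The processor would still service the access through the cache as usual; only the trace-logging side effect is skipped when `should_omit` is true.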

Resource Coordination Method, Apparatus, and System for Database Cluster
20170308567 · 2017-10-26 ·

A resource coordination method, apparatus, and system for a database cluster, in which an active coordinator node obtains status information corresponding to each of multiple processing nodes, where the status information indicates an operating load status of that processing node; determines, according to the status information for each processing node, whether the active coordinator node has an idle resource whose capacity is a preset threshold X; and, if the active coordinator node has the idle resource whose capacity is the preset threshold X, instructs each processing node to upload subsequently generated clean page data to the active coordinator node.
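The coordinator's decision reduces to an idle-capacity check against the preset threshold X, followed by an instruction broadcast to the processing nodes. Measuring capacity in abstract units and the action names are assumptions for the sketch.

```python
# Minimal sketch of the coordinator decision, assuming capacity is measured
# in abstract units and X is the preset threshold from the abstract.

def plan_clean_page_upload(total_capacity: int, used: int, x: int,
                           node_ids: list) -> dict:
    """If idle capacity reaches threshold X, instruct every processing node
    to upload subsequently generated clean page data to the coordinator."""
    idle = total_capacity - used
    action = "upload_clean_pages" if idle >= x else "keep_local"
    return {node: action for node in node_ids}
```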

Method and apparatus for correcting cache profiling information in multi-pass simulator

A provided method includes storing a first cache snapshot including cache profiling information regarding a cache when a first process being executed by a cycle-accurate simulator is terminated; storing a second cache snapshot including the cache profiling information regarding the cache when a second process is executed in the cycle-accurate simulator; comparing the second cache snapshot of the second process with the first cache snapshot of the first process to readjust either a cache hit value or a cache miss value present in the second cache snapshot of the second process; and correcting the cache profiling information stored in the first cache snapshot of the first process by reflecting the readjusted cache hit value or cache miss value present in the second cache snapshot of the second process.
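The readjust-and-reflect step can be sketched with toy snapshots holding only hit and miss counters. The reclassification rule here (move a given number of counts from hits to misses in the second snapshot and credit them back to the first) is a placeholder assumption, not the patented heuristic.

```python
# Toy version of the snapshot-correction step. The count being moved stands
# in for hits the second process received only because the first process's
# lines were still resident; the rule itself is an illustrative assumption.

def correct_snapshots(first: dict, second: dict, reclassified_hits: int):
    first, second = dict(first), dict(second)
    # readjust the second snapshot: these "hits" were inherited warm lines
    second["hits"] -= reclassified_hits
    second["misses"] += reclassified_hits
    # reflect the readjusted value back into the first snapshot's profile
    first["inherited_hits"] = first.get("inherited_hits", 0) + reclassified_hits
    return first, second
```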

Method to efficiently track I/O access history using efficient memory data structures

An embodiment is described in which a memory device stores a record of I/O accesses to data blocks, where each access record indicates which data block was accessed and during which time period the access occurred. A memory-efficient data structure (MEDS) may be generated and stored in a cache or storage device, and the access data moved from the memory device into the MEDS. The MEDS represents blocks that were accessed during a particular time period. When a second data block is accessed, a query function is applied to the second block's identifier to return a value based on data stored in the MEDS. The return value from the query function indicates whether the second data block was accessed during the particular time period associated with the MEDS. A storage management action is performed based on whether the second data block was accessed during the particular time period.

Distributed caching systems and methods

Example distributed caching systems and methods are described. In one implementation, a system has multiple host systems, each of which includes a cache resource that is accessed by one or more consumers. A management server is coupled to the multiple host systems and presents available cache resources and resources associated with available host systems to a user. The management server receives a user selection of at least one available cache resource and at least one host system. The selected host system is then configured to share the selected cache resource.

Predictive data orchestration in multi-tier memory systems

A computing system having memory components of different tiers. The computing system further includes a controller, operatively coupled between a processing device and the memory components, to: receive from the processing device first data access requests that cause first data movements across the tiers in the memory components; service the first data access requests after the first data movements; predict, by applying data usage information received from the processing device in a prediction model trained via machine learning, second data movements across the tiers in the memory components; and perform the second data movements before receiving second data access requests, where the second data movements reduce third data movements across the tiers caused by the second data access requests.
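The controller's flow (predict cross-tier movements from data-usage information, perform them ahead of the requests they would otherwise trigger) can be sketched with a trivial stand-in predictor. The "next page after each recent access" rule below replaces the machine-learned model in the abstract and is purely illustrative, as are the tier representations.

```python
# Sketch of predictive data orchestration across two tiers. The predictor is
# a trivial stand-in for the abstract's machine-learned model.

def predict_promotions(recent_accesses: list, fast_tier: set) -> list:
    """Predict pages to move into the fast tier before they are requested."""
    predicted = {page + 1 for page in recent_accesses}
    return sorted(predicted - fast_tier)

def service_request(page: int, fast_tier: set, slow_tier: set) -> str:
    """Serve a request; promote on demand if prediction missed the page."""
    if page in fast_tier:
        return "fast_hit"
    slow_tier.discard(page)
    fast_tier.add(page)
    return "promoted_then_served"
```

Promoting predicted pages ahead of time turns would-be `promoted_then_served` requests into `fast_hit`s, which is the third-data-movement reduction the abstract describes.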