G06F12/0811

BOUNDING BOX PREFETCHER
20230222064 · 2023-07-13 ·

In one embodiment, a prefetching method implemented in a microprocessor, the prefetching method comprising: issuing all prefetches remaining for a memory block as L3 prefetches based on a set of conditions; and issuing L2 prefetches for cache lines corresponding to the L3 prefetches upon reaching the end of the memory block.

PERFORMANCE EVALUATION OF AN APPLICATION BASED ON DETECTING DEGRADATION CAUSED BY OTHER COMPUTING PROCESSES

Performance degradation of an application that is caused by another computing process that shares infrastructure with the application is detected. The application and the other computing device may execute via different virtual machines hosted on the same computing device. To detect the performance degradation that is attributable to the other computing process, certain storage segments of a data storage (e.g., a cache) shared by the virtual machines is written with data. A pattern of read operations are then performed on the segments to determine whether an increase in read access time has occurred. Such a performance degradation is attributable to another computing process. After detecting the degradation, a metric that quantifies the detected degradation attributable to the other computing process is provided to an ML model, which determines the actual performance of the application absent the degradation attributable to the other computing process.

PERFORMANCE EVALUATION OF AN APPLICATION BASED ON DETECTING DEGRADATION CAUSED BY OTHER COMPUTING PROCESSES

Performance degradation of an application that is caused by another computing process that shares infrastructure with the application is detected. The application and the other computing device may execute via different virtual machines hosted on the same computing device. To detect the performance degradation that is attributable to the other computing process, certain storage segments of a data storage (e.g., a cache) shared by the virtual machines is written with data. A pattern of read operations are then performed on the segments to determine whether an increase in read access time has occurred. Such a performance degradation is attributable to another computing process. After detecting the degradation, a metric that quantifies the detected degradation attributable to the other computing process is provided to an ML model, which determines the actual performance of the application absent the degradation attributable to the other computing process.

Data processing system having masters that adapt to agents with differing retry behaviors

A data processing system includes a plurality of snoopers, a processing unit including master, and a system fabric communicatively coupling the master and the plurality of snoopers. The master sets a retry operating mode for an interconnect operation in one of alternative first and second operating modes. The first operating mode is associated with a first type of snooper, and the second operating mode is associated with a different second type of snooper. The master issues a memory access request of the interconnect operation on the system fabric of the data processing system. Based on receipt of a combined response representing a systemwide coherence response to the request, the master delays an interval having a duration dependent on the retry operating mode and thereafter reissues the memory access request on the system fabric.

Three tiered hierarchical memory systems

Systems, apparatuses, and methods related to three tiered hierarchical memory systems are described herein. A three tiered hierarchical memory system can leverage persistent memory to store data that is generally stored in a non-persistent memory, thereby increasing an amount of storage space allocated to a computing system at a lower cost than approaches that rely solely on non-persistent memory. An example apparatus may include a persistent memory, and one or more non-persistent memories configured to map an address associated with an input/output (I/O) device to an address in logic circuitry prior to the apparatus receiving a request from the I/O device to access data stored in the persistent memory, and map the address associated with the I/O device to an address in a non-persistent memory subsequent to the apparatus receiving the request and accessing the data.

Three tiered hierarchical memory systems

Systems, apparatuses, and methods related to three tiered hierarchical memory systems are described herein. A three tiered hierarchical memory system can leverage persistent memory to store data that is generally stored in a non-persistent memory, thereby increasing an amount of storage space allocated to a computing system at a lower cost than approaches that rely solely on non-persistent memory. An example apparatus may include a persistent memory, and one or more non-persistent memories configured to map an address associated with an input/output (I/O) device to an address in logic circuitry prior to the apparatus receiving a request from the I/O device to access data stored in the persistent memory, and map the address associated with the I/O device to an address in a non-persistent memory subsequent to the apparatus receiving the request and accessing the data.

Lock-free work-stealing thread scheduler

Systems and methods are provided for lock-free thread scheduling. Threads may be placed in a ring buffer shared by all computer processing units (CPUs), e.g., in a node. A thread assigned to a CPU may be placed in the CPU's local run queue. However, when a CPU's local run queue is cleared, that CPU checks the shared ring buffer to determine if any threads are waiting to run on that CPU, and if so, the CPU pulls a batch of threads related to that ready-to-run thread to execute. If not, an idle CPU randomly selects another CPU to steal threads from, and the idle CPU attempts to dequeue a thread batch associated with the CPU from the shared ring buffer. Polling may be handled through the use of a shared poller array to dynamically distribute polling across multiple CPUs.

Lock-free work-stealing thread scheduler

Systems and methods are provided for lock-free thread scheduling. Threads may be placed in a ring buffer shared by all computer processing units (CPUs), e.g., in a node. A thread assigned to a CPU may be placed in the CPU's local run queue. However, when a CPU's local run queue is cleared, that CPU checks the shared ring buffer to determine if any threads are waiting to run on that CPU, and if so, the CPU pulls a batch of threads related to that ready-to-run thread to execute. If not, an idle CPU randomly selects another CPU to steal threads from, and the idle CPU attempts to dequeue a thread batch associated with the CPU from the shared ring buffer. Polling may be handled through the use of a shared poller array to dynamically distribute polling across multiple CPUs.

ARITHMETIC PROCESSOR AND METHOD FOR OPERATING ARITHMETIC PROCESSOR
20230010353 · 2023-01-12 · ·

An arithmetic processor including a plurality of core groups each including a plurality of cores and a cache unit, a plurality of home agents each including a tag directory and a store command queue and a store command queue. The store command queue enters the received store request to the entry queue in order of reception, the cache unit stores the data of the store request in a data RAM. The store command queue sets a data ownership acquisition flag of the store request to valid when obtaining a data ownership of the store request and issues a top-of-queue notification to the cache control unit when the flag of the top-of-queue entry is valid. In response to the top-of-queue notification, the cache unit update a cache tag to modified state and issue a store request completion notification.

TECHNOLOGIES FOR CROSS-DEVICE SHARED WEB RESOURCE CACHE
20230214438 · 2023-07-06 ·

Technologies for cross-device shared web resource caching include a client device and a shared cache device. The client device scans for a shared cache device in local proximity to the client device and, in response to the scan, registers with the shared cache device. After registering, the client device requests a cached web resource from the shared cache device. The shared cache device determines whether a cached web resource that matches the request is installed in a shared cache. The shared cache device may determine whether an origin of the request matches the origin of the cached web resource. If installed, the shared cache device sends a found response and the cached web resource to the client device. If not installed, the shared cache device sends a not-found response and the client device may request the web resource from a remote web server. Other embodiments are described and claimed.