Patent classification: G06F12/0837
PROGRESSIVE CACHING OF FILTER RULES
Techniques are disclosed relating to filtering messages. A computer system may detect an occurrence of an event of a particular type. The computer system may determine whether to enqueue, in a message queue, a message that identifies a set of tasks to be performed in relation to the event. The determination may be based on a response received from a cache that stores a subset of filter rules of a filter rules table. Based on the response indicating a cache miss, the computer system may enqueue the message in the message queue. A process that processes the message may be operable to resolve the cache miss by 1) accessing a filter rule from the filter rules table that indicates whether messages for events of the particular type should be enqueued in the message queue and 2) updating the cache to store the filter rule.
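The following is a minimal Python sketch of the enqueue-on-miss flow the abstract describes, under assumed names (`rules_table`, `on_event`, `process_message` are illustrative, not from the patent): the producer consults the cache, enqueues on a miss, and the consumer resolves the miss from the authoritative filter rules table.

```python
from queue import Queue

rules_table = {"user_login": True, "heartbeat": False}  # event type -> enqueue?

cache: dict[str, bool] = {}          # subset of rules_table (the filter-rule cache)
message_queue: Queue = Queue()

def on_event(event_type: str, payload: dict) -> None:
    """Producer: consult the cache; on a miss, enqueue anyway so the
    consumer can resolve the rule from the full table."""
    rule = cache.get(event_type)     # None signals a cache miss
    if rule is None or rule:         # miss, or rule says "enqueue"
        message_queue.put({"type": event_type, "payload": payload})

def process_message(msg: dict) -> None:
    """Consumer: resolve any cache miss by reading the authoritative
    filter rules table and populating the cache for future events."""
    event_type = msg["type"]
    if event_type not in cache:
        cache[event_type] = rules_table.get(event_type, True)
    if cache[event_type]:
        pass  # perform the set of tasks associated with the event
```

Note the asymmetry: a miss costs one extra enqueue, but every later event of that type is filtered at the cache without touching the full rules table.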
Thread group scheduling for graphics processing
Embodiments are generally directed to thread group scheduling for graphics processing. An embodiment of an apparatus includes a plurality of processors including a plurality of graphics processors to process data; a memory; and one or more caches for storage of data for the plurality of graphics processors. The plurality of processors are to schedule a plurality of groups of threads for processing by the plurality of graphics processors, the scheduling including applying a bias that favors assignments according to cache locality for the one or more caches.
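A hedged sketch of what a cache-locality bias in a thread-group scheduler could look like; the scoring heuristic, `LOCALITY_BIAS` weight, and all names are assumptions for illustration, not the patented method.

```python
from collections import defaultdict

LOCALITY_BIAS = 4  # weight favoring a processor whose cache holds the group's data

cache_contents = defaultdict(set)    # processor id -> set of resident data blocks

def schedule(groups, processors):
    """Assign each thread group to the processor with the best
    locality-biased score (most resident blocks, fewest queued groups)."""
    load = {p: 0 for p in processors}
    assignment = {}
    for group_id, working_set in groups.items():
        def score(p):
            overlap = len(working_set & cache_contents[p])
            return LOCALITY_BIAS * overlap - load[p]
        best = max(processors, key=score)
        assignment[group_id] = best
        load[best] += 1
        cache_contents[best] |= working_set   # model warming of that cache
    return assignment

# Groups sharing data blocks tend to land on the same processor:
groups = {"g0": {1, 2}, "g1": {2, 3}, "g2": {9}}
print(schedule(groups, ["gpu0", "gpu1"]))  # g0 and g1 co-locate on gpu0
```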
Graphics systems and methods for accelerating synchronization using fine grain dependency check and scheduling optimizations based on available shared memory space
Accelerated synchronization operations using a fine-grain dependency check are disclosed. A graphics multiprocessor includes a plurality of execution units and synchronization circuitry that is configured to determine the availability of at least one execution unit. When at least one execution unit is available, the synchronization circuitry performs a fine-grain dependency check to determine whether dependent data or operands are available in shared local memory or cache.
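A small sketch of the gating condition the abstract describes, with assumed names (`shared_local_memory`, `try_dispatch`): a waiting task is dispatched only when an execution unit is free and every dependent operand is already resident.

```python
shared_local_memory = {"A", "B"}          # operands currently resident

def try_dispatch(task_deps, free_units):
    """Return an execution unit if one is free and every dependent
    operand is resident in shared local memory; otherwise None."""
    if not free_units:
        return None
    if all(dep in shared_local_memory for dep in task_deps):
        return free_units.pop()
    return None

print(try_dispatch({"A", "B"}, ["eu0"]))  # 'eu0' -- operands resident
print(try_dispatch({"A", "C"}, ["eu0"]))  # None -- operand C not yet resident
```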
COMPOSABLE INFRASTRUCTURE ENABLED BY HETEROGENEOUS ARCHITECTURE, DELIVERED BY CXL BASED CACHED SWITCH SOC
Described herein are systems, methods, and products utilizing a cache coherent switch on chip. The cache coherent switch on chip may utilize the Compute Express Link (CXL) open interconnect standard and allow for multi-host access and the sharing of resources. The cache coherent switch on chip provides for resource sharing between components independent of a system processor, removing the system processor as a bottleneck. The cache coherent switch on chip may further allow for cache coherency between various different components. Thus, for example, memories, accelerators, and/or other components within the disclosed systems may each maintain caches, and the systems and techniques described herein allow for cache coherency between the different components of the system with minimal latency.
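To make the multi-host coherency role concrete, here is a minimal directory-style model of how a switch might track which hosts share a cache line and invalidate sharers on a write. This illustrates generic directory-based coherence under assumed names (`CoherentSwitch`, `read`, `write`), not the CXL protocol itself.

```python
class CoherentSwitch:
    def __init__(self):
        self.directory = {}   # address -> set of host ids holding the line

    def read(self, host, addr):
        """A read adds the host as a sharer of the cache line."""
        self.directory.setdefault(addr, set()).add(host)

    def write(self, host, addr):
        """A write invalidates every other sharer before granting
        exclusive ownership to the writing host."""
        for sharer in self.directory.get(addr, set()) - {host}:
            self.invalidate(sharer, addr)
        self.directory[addr] = {host}

    def invalidate(self, host, addr):
        print(f"invalidate {addr:#x} in {host}")

switch = CoherentSwitch()
switch.read("host0", 0x1000)
switch.read("host1", 0x1000)
switch.write("host0", 0x1000)   # prints: invalidate 0x1000 in host1
```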
SYSTEMS, METHODS, AND DEVICES FOR BIAS MODE MANAGEMENT IN MEMORY SYSTEMS
A method for managing a memory system may include monitoring one or more accesses of a page of memory, determining, based on the monitoring, an access pattern of the page of memory, and selecting, based on the access pattern, a coherency bias for the page of memory. The monitoring may include maintaining an indication of the one or more accesses. The determining may include comparing the indication to a threshold. Maintaining the indication may include changing the indication in a first manner based on an access of the page of memory by a first apparatus. Maintaining the indication may include changing the indication in a second manner based on an access of the page of memory by a second apparatus. The first manner may counteract the second manner.
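The counteracting-counter scheme lends itself to a short sketch: device accesses move a per-page counter one way, host accesses move it the other (the "first manner" counteracting the "second"), and crossing a threshold flips the page's coherency bias. The threshold value and all names here are illustrative assumptions.

```python
THRESHOLD = 4

class PageBiasTracker:
    def __init__(self):
        self.counter = 0
        self.bias = "host"           # current coherency bias for the page

    def on_device_access(self):
        self.counter += 1            # first manner: increment
        self._update_bias()

    def on_host_access(self):
        self.counter -= 1            # second manner counteracts the first
        self._update_bias()

    def _update_bias(self):
        if self.counter >= THRESHOLD:
            self.bias = "device"
        elif self.counter <= -THRESHOLD:
            self.bias = "host"

page = PageBiasTracker()
for _ in range(4):
    page.on_device_access()
print(page.bias)                     # 'device' after sustained device access
```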
Cache for Storing Coherent and Non-Coherent Data
The present disclosure advantageously provides a system cache and a method for storing coherent data and non-coherent data in a system cache. A transaction is received from a source in a system, the transaction including at least a memory address, the source having a location in a coherent domain or a non-coherent domain of the system, the coherent domain including shareable data and the non-coherent domain including non-shareable data. It is determined whether the memory address is stored in a cache line; when it is not, a cache line is allocated to the transaction, which includes setting a state bit of the allocated cache line based on the source location to indicate whether shareable or non-shareable data is stored in that line, and the transaction is then processed.
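A minimal sketch of the allocate-on-miss path with a per-line state bit, under assumed names (`CacheLine`, `SystemCache`, `handle_transaction`): the bit is set once at allocation from the source's domain and travels with the line.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    address: int
    shareable: bool      # state bit set at allocation from the source domain

class SystemCache:
    def __init__(self):
        self.lines = {}  # address -> CacheLine

    def handle_transaction(self, address, source_coherent):
        """Allocate on miss, setting the state bit from the source's
        domain; then return the line so the transaction can proceed."""
        line = self.lines.get(address)
        if line is None:
            line = CacheLine(address, shareable=source_coherent)
            self.lines[address] = line
        return line

cache = SystemCache()
print(cache.handle_transaction(0x40, source_coherent=True).shareable)   # True
print(cache.handle_transaction(0x80, source_coherent=False).shareable)  # False
```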
Active bridge chiplet with integrated cache
A chiplet system includes a central processing unit (CPU) communicably coupled to a first GPU chiplet of a GPU chiplet array. The GPU chiplet array includes the first GPU chiplet communicably coupled to the CPU via a bus and a second GPU chiplet communicably coupled to the first GPU chiplet via an active bridge chiplet. The active bridge chiplet is an active silicon die that bridges GPU chiplets and allows partitioning of systems-on-a-chip (SoC) functionality into smaller functional chiplet groupings.
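A rough topology sketch of the arrangement described: the CPU addresses only the first GPU chiplet over the bus, and that chiplet forwards work to peers over the active bridge. The round-robin routing and all names are illustrative assumptions, not the patented design.

```python
class GpuChiplet:
    def __init__(self, name):
        self.name = name
        self.bridge_peers = []       # chiplets reachable via the active bridge
        self._next = 0

    def submit(self, work):
        """Distribute work round-robin across this chiplet and its
        bridge-connected peers; only this chiplet is CPU-visible."""
        targets = [self] + self.bridge_peers
        target = targets[self._next % len(targets)]
        self._next += 1
        print(f"{target.name} executes {work!r}")

primary = GpuChiplet("chiplet0")     # the only chiplet on the CPU bus
primary.bridge_peers.append(GpuChiplet("chiplet1"))
primary.submit("draw_call_17")       # CPU-visible entry point
```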
Hybrid hardware-software coherent framework
Examples herein describe an accelerator device that shares the same coherent domain as hardware elements in a host computing device. The embodiments herein describe a mix of hardware and software coherency which reduces the overhead of managing data when large chunks of data are moved from the host into the accelerator device. In one embodiment, an accelerator application executing on the host identifies a data set it wishes to transfer to the accelerator device to be processed. The accelerator application transfers ownership from a home agent in the host to the accelerator device. A slave agent can then take ownership of the data. As a result, any memory operation requests received from a requesting agent in the accelerator device can gain access to the data set in local memory via the slave agent without the slave agent obtaining permission from the home agent in the host.
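A hedged sketch of the software-managed handoff described above: ownership of a data set moves from the host's home agent to the device's slave agent, after which requesting agents on the accelerator access local memory without a per-request round trip to the host. The agent classes and method names are assumptions for illustration.

```python
class HomeAgent:
    def __init__(self):
        self.owned = {"dataset0"}

    def release(self, data_set):
        self.owned.discard(data_set)   # software relinquishes ownership here

class SlaveAgent:
    def __init__(self):
        self.owned = set()

    def acquire(self, data_set):
        self.owned.add(data_set)

    def access(self, requester, data_set):
        # Hardware coherency stays local: no round trip to the home agent.
        if data_set in self.owned:
            return f"{requester} reads {data_set} from local memory"
        raise PermissionError("ownership not yet transferred")

home, slave = HomeAgent(), SlaveAgent()
home.release("dataset0")               # accelerator app initiates the handoff
slave.acquire("dataset0")
print(slave.access("requesting_agent", "dataset0"))
```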