Patent classifications
G06F2212/6028
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT
Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster. The first processing cluster includes a floating-point unit to perform floating-point operations, and the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format, with a multiplier that multiplies the second and third source operands while an accumulator adds the first source operand to the output of the multiplier.
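A minimal software sketch of the dot-product-accumulate semantics, assuming BF16 inputs and an FP32 accumulator (the abstract does not state the accumulator width; "hybrid" is read here as BF16 multiply, FP32 add). The helper names to_bf16, to_fp32 and dot_product_accumulate are hypothetical, and NaN handling is omitted.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a float to bfloat16: keep the upper 16 bits of its IEEE-754
    binary32 encoding, rounding to nearest even. NaN handling is omitted."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                      # LSB of the half we keep
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000   # round-to-nearest-even, then truncate
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_fp32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_product_accumulate(src1: float, src2: list, src3: list) -> float:
    """acc = src1 + sum(bf16(src2[i]) * bf16(src3[i])), accumulated in FP32
    (assumed accumulator precision)."""
    acc = to_fp32(src1)
    for a, b in zip(src2, src3):
        # A BF16 x BF16 product has at most 16 significant bits, so it is exact in FP32.
        prod = to_bf16(a) * to_bf16(b)
        acc = to_fp32(acc + prod)
    return acc
```

For example, dot_product_accumulate(1.0, [1.0, 2.0], [3.0, 4.0]) adds 1.0 + 3.0 + 8.0. Because BF16 keeps only 7 explicit mantissa bits, inputs such as 1.00390625 round to 1.0 before they reach the multiplier.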
THROTTLING SCHEMES IN MULTICORE MICROPROCESSORS
An electronic device includes a cache, a processing cluster having one or more processors, and prefetch throttling circuitry that determines a congestion level of the processing cluster based on the extent to which data retrieval requests sent from the processors to the cache are not satisfied by the cache. Congestion criteria require that the congestion level of the cluster be above a cluster congestion threshold. In accordance with a determination that the congestion level of the cluster satisfies the congestion criteria, the prefetch throttling circuitry causes one of the processors to limit its prefetch requests to the cache to prefetch requests of at least a threshold quality. In accordance with a determination that the congestion level of the cluster does not satisfy the congestion criteria, the prefetch throttling circuitry forgoes causing the processors to limit prefetch requests to the cache to prefetch requests of at least the threshold quality.
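The throttling decision can be sketched in a few lines. This is a minimal model only: the congestion level is taken to be the fraction of the cluster's cache requests that missed, and each prefetch carries a hypothetical quality score in [0, 1]. The abstract defines neither metric precisely, and the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PrefetchRequest:
    address: int
    quality: float  # hypothetical prefetcher confidence in [0, 1]


class ClusterThrottler:
    """Cluster-level prefetch filter (illustrative model, not the patented circuit)."""

    def __init__(self, congestion_threshold: float, quality_threshold: float):
        self.congestion_threshold = congestion_threshold
        self.quality_threshold = quality_threshold
        self.requests = 0
        self.misses = 0

    def record_request(self, hit: bool) -> None:
        """Count one data retrieval request from any processor in the cluster."""
        self.requests += 1
        if not hit:
            self.misses += 1

    def congestion_level(self) -> float:
        """Assumed metric: share of requests the cache did not satisfy."""
        return self.misses / self.requests if self.requests else 0.0

    def is_congested(self) -> bool:
        return self.congestion_level() > self.congestion_threshold

    def filter(self, prefetches: list) -> list:
        """Pass everything when uncongested; otherwise keep only high-quality prefetches."""
        if not self.is_congested():
            return list(prefetches)
        return [p for p in prefetches if p.quality >= self.quality_threshold]
```

Keeping the miss counters per cluster rather than per core mirrors the abstract's point that the criterion is the cluster's congestion level, while the limit is applied to an individual processor's prefetch stream.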
Data prefetching method and apparatus
This application discloses a data prefetching method, including: receiving, by a home node, a write request sent by a first cache node after the first cache node processes received data; determining, by the home node, whether a second cache node needs to perform a data prefetching operation on the to-be-written data; and, when it is determined that the second cache node needs to perform a data prefetching operation on the to-be-written data, sending, by the home node, the to-be-written data to the second cache node. Embodiments of this application help improve the accuracy and certainty of the data prefetching time point and reduce data prefetching delay.
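A minimal sketch of the home-node push. The abstract does not say how the home node decides that the second cache node needs the data, so this model assumes a hypothetical consumer table (filled by expect_consumer) that maps an address to the nodes expected to read it next. All class and method names are illustrative.

```python
class CacheNode:
    def __init__(self, name: str):
        self.name = name
        self.cache = {}  # address -> data

    def receive_prefetch(self, addr: int, data: bytes) -> None:
        self.cache[addr] = data


class HomeNode:
    def __init__(self):
        self.memory = {}
        self.nodes = {}
        # Hypothetical decision input: address -> names of nodes expected to consume it.
        self.consumers = {}

    def register(self, node: CacheNode) -> None:
        self.nodes[node.name] = node

    def expect_consumer(self, addr: int, node_name: str) -> None:
        self.consumers.setdefault(addr, set()).add(node_name)

    def handle_write(self, writer: CacheNode, addr: int, data: bytes) -> list:
        """Commit the write, then push the data to every node that needs it.
        Returns the names of the nodes that received a prefetch."""
        self.memory[addr] = data
        pushed = []
        for name in sorted(self.consumers.get(addr, ())):
            if name != writer.name:  # the writer already holds the data
                self.nodes[name].receive_prefetch(addr, data)
                pushed.append(name)
        return pushed
```

Because the push happens at write time, the consumer's prefetch point is fixed by the producer's write rather than guessed from the consumer's access history. That is the timing certainty the abstract refers to.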
TECHNIQUES FOR DYNAMIC SEQUENTIAL INSTRUCTION PREFETCHING
A technique for operating a processor includes allocating an entry in a prefetch filter queue (PFQ) for a cache line address (CLA) in response to the CLA missing in an upper level instruction cache. In response to the CLA subsequently hitting in the upper level instruction cache, an associated prefetch value for the entry in the PFQ is updated. In response to the entry being aged-out of the PFQ, an entry in a backing array for the CLA and the associated prefetch value is allocated. In response to subsequently determining that prefetching is required for the CLA, the backing array is accessed to determine the associated prefetch value for the CLA. A cache line at the CLA and a number of sequential cache lines specified by the associated prefetch value in the backing array are then prefetched into the upper level instruction cache.
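The PFQ and backing-array flow can be modelled as follows. This is a sketch under stated assumptions: a 64-byte line size, a FIFO PFQ whose oldest entry ages out when it is full, and a prefetch value that is simply incremented (saturating) on each subsequent hit. The abstract does not specify the update rule or any of these sizes, and the names are hypothetical.

```python
from collections import OrderedDict

LINE_SIZE = 64  # assumed cache line size in bytes


class SequentialPrefetcher:
    def __init__(self, pfq_capacity: int = 4, max_depth: int = 7):
        self.pfq_capacity = pfq_capacity
        self.max_depth = max_depth
        self.pfq = OrderedDict()  # CLA -> prefetch value, oldest entry first
        self.backing = {}         # aged-out CLA -> prefetch value

    def on_miss(self, cla: int) -> None:
        """Allocate a PFQ entry when the CLA misses in the instruction cache."""
        if cla in self.pfq:
            return
        if len(self.pfq) >= self.pfq_capacity:
            old_cla, value = self.pfq.popitem(last=False)  # age out the oldest entry
            self.backing[old_cla] = value                  # keep its learned value
        self.pfq[cla] = 0

    def on_hit(self, cla: int) -> None:
        """Update the prefetch value when the CLA later hits (assumed: +1, saturating)."""
        if cla in self.pfq:
            self.pfq[cla] = min(self.pfq[cla] + 1, self.max_depth)

    def lines_to_prefetch(self, cla: int) -> list:
        """Return the line at the CLA plus N sequential lines from the backing array."""
        depth = self.backing.get(cla, 0)
        return [cla + i * LINE_SIZE for i in range(depth + 1)]
```

With a two-entry PFQ, two hits on 0x1000 followed by misses on 0x2000 and 0x3000 age 0x1000 out with a value of 2. A later prefetch for 0x1000 therefore fetches 0x1000, 0x1040 and 0x1080.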
COMPRESSION AWARE PREFETCH
Methods, devices, and systems for prefetching data. First data is loaded from a first memory location. The first data is cached in a cache memory. Other data is prefetched to the cache memory based on a compression of the first data and a compression of the other data. In some implementations, the compression of the first data and the compression of the other data are determined based on metadata associated with the first data and metadata associated with the other data. In some implementations, the other data is prefetched to the cache memory based on a total of a compressed size of the first data and a compressed size of the other data being less than a threshold size. In some implementations, the other data is not prefetched to the cache memory based on the other data being uncompressed.
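The size test from the abstract reduces to a small predicate. This sketch assumes per-block metadata holding a compressed flag and the stored size in bytes. BlockMetadata and should_prefetch are hypothetical names, and the check is applied pairwise (first block plus one candidate), as the abstract describes.

```python
from dataclasses import dataclass


@dataclass
class BlockMetadata:
    compressed: bool
    size: int  # stored size in bytes (the compressed size when compressed)


def should_prefetch(first: BlockMetadata, other: BlockMetadata, threshold: int) -> bool:
    """Prefetch the other block only if it is compressed and the two compressed
    sizes together stay below the threshold (e.g. one fetch granule)."""
    if not other.compressed:
        return False
    return first.size + other.size < threshold
```

For instance, a 24-byte compressed block and a 30-byte compressed neighbour total 54 bytes, which fits under a 64-byte threshold, so the neighbour comes along for free. A 48-byte neighbour (72 bytes in total) or any uncompressed neighbour does not.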
MANAGING PREFETCHING OPERATIONS FOR DRIVES IN DISTRIBUTED STORAGE SYSTEMS
Systems and methods are provided for managing prefetching operations for read requests for drives in a distributed storage system. For example, a system can determine that a first drive of a plurality of drives is powered on. Prior to receiving a read request for reading a first set of data from the first drive, the system can enable a prefetching operation for prefetching the first set of data from the first drive to be written to a cache. The system may power off the first drive. The system may receive a read request for reading the first set of data from the first drive of a plurality of drives. In response to receiving the read request, the system may read the first set of data from the cache.
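The power-aware flow (prefetch while the drive is on, power it off, serve the later read from cache) can be sketched as follows. The classes are hypothetical, and spinning the drive back up on a cache miss is an assumption that the abstract does not cover.

```python
class Drive:
    def __init__(self, data: dict):
        self.data = data
        self.powered = True

    def read(self, key: str) -> bytes:
        if not self.powered:
            raise RuntimeError("drive is powered off")
        return self.data[key]


class StorageNode:
    def __init__(self, drives: dict):
        self.drives = drives  # drive id -> Drive
        self.cache = {}       # (drive id, key) -> data

    def prefetch_if_powered(self, drive_id: str, keys: list) -> None:
        """Before any read request arrives, copy expected data into the cache."""
        drive = self.drives[drive_id]
        if drive.powered:
            for key in keys:
                self.cache[(drive_id, key)] = drive.read(key)

    def power_off(self, drive_id: str) -> None:
        self.drives[drive_id].powered = False

    def read(self, drive_id: str, key: str) -> bytes:
        """Serve from cache when possible, so a powered-off drive stays off."""
        if (drive_id, key) in self.cache:
            return self.cache[(drive_id, key)]
        drive = self.drives[drive_id]
        drive.powered = True  # assumption: power the drive back on for a cache miss
        return drive.read(key)
```

The saving comes from the cache hit path: a read of prefetched data never touches the drive, so the drive can remain powered off.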
Cache aware searching based on one or more files in one or more buckets in remote storage
Embodiments are disclosed for performing cache-aware searching. In response to a search query, a first bucket and a second bucket in remote storage are identified for processing the search query. A determination is made that a first file in the first bucket is present in a cache when the search query is received. In response to the search query, a search is performed using the first file, based on the determination that the first file is present in the cache when the search query is received, and the search is performed using a second file from the second bucket once the second file is stored in the cache.
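A minimal sketch of the ordering: files already in the cache are searched immediately, and uncached files are fetched into the cache and searched afterwards. The bucket layout, the substring match, and the fetch callable are assumptions made for illustration. A real system would overlap the fetches with the cached searches instead of running them sequentially.

```python
from typing import Callable, Dict, List, Tuple


def cache_aware_search(
    query: str,
    buckets: Dict[str, List[str]],             # bucket name -> file ids in remote storage
    cache: Dict[str, List[str]],               # file id -> cached lines
    fetch: Callable[[str], List[str]],         # hypothetical loader for remote files
) -> List[Tuple[str, str]]:
    """Return (file id, matching line) pairs, searching cached files first."""
    results = []
    pending = []
    for files in buckets.values():
        for file_id in files:
            if file_id in cache:
                results.extend((file_id, line) for line in cache[file_id] if query in line)
            else:
                pending.append(file_id)
    for file_id in pending:
        cache[file_id] = fetch(file_id)        # store in the cache, then search it
        results.extend((file_id, line) for line in cache[file_id] if query in line)
    return results
```

Even when an uncached file sits in an earlier bucket, results from cached files come first, so the query can start returning matches before any remote fetch completes.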
Modifying machine learning models to improve locality
Methods, systems, and apparatus for updating machine learning models to improve locality are described. In one aspect, a method includes receiving data of a machine learning model. The data represents operations of the machine learning model and data dependencies between the operations. Data specifying characteristics of a memory hierarchy for a machine learning processor on which the machine learning model is going to be deployed is received. The memory hierarchy includes multiple memories at multiple memory levels for storing machine learning data used by the machine learning processor when performing machine learning computations using the machine learning model. An updated machine learning model is generated by modifying the operations and control dependencies of the machine learning model to account for the characteristics of the memory hierarchy. Machine learning computations are performed using the updated machine learning model.
PREDICTIVE MEMORY CACHING
Metadata history is collected for operations performed by an application as directed by a user. In a subsequent interaction by the user with the application, interaction metadata for the interaction is matched to a pattern in the metadata history. An operation identified in the pattern is processed as a background process and results from processing the operation are pre-staged in cache of the device being operated by the user. When the user requests the operation during the subsequent interaction with the application, the pre-staged results from the cache are provided to the user.
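One simple way to model the pattern matching is to learn which operation most often follows each operation and pre-stage that operation's result. This is a sketch: the first-order "next operation" pattern is an assumed stand-in for the abstract's unspecified matching, and the pre-staging runs synchronously here, whereas the abstract describes a background process.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Optional, Tuple


class PredictiveCache:
    def __init__(self, execute: Callable[[str], str]):
        self.execute = execute
        self.next_ops: Dict[str, Counter] = defaultdict(Counter)  # op -> following ops
        self.cache: Dict[str, str] = {}                           # pre-staged results
        self.last_op: Optional[str] = None

    def _record(self, op: str) -> None:
        """Append to the metadata history as op -> next-op transition counts."""
        if self.last_op is not None:
            self.next_ops[self.last_op][op] += 1
        self.last_op = op

    def request(self, op: str) -> Tuple[str, bool]:
        """Serve op (from cache if pre-staged), then pre-stage the predicted next op.
        Returns (result, served_from_cache)."""
        if op in self.cache:
            result, hit = self.cache.pop(op), True
        else:
            result, hit = self.execute(op), False
        self._record(op)
        follow = self.next_ops.get(op)
        if follow:
            predicted, _ = follow.most_common(1)[0]
            # In practice this would run as a background task.
            self.cache[predicted] = self.execute(predicted)
        return result, hit
```

After a user has opened a view and then run a report once, the next "open" pre-stages the report, so the following "report" request is served from the cache.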
PREFETCH MANAGEMENT IN A HIERARCHICAL CACHE SYSTEM
An apparatus includes a CPU core, a first memory cache with a first line size, and a second memory cache having a second line size larger than the first line size. Each line of the second memory cache includes an upper half and a lower half. A memory controller subsystem is coupled to the CPU core and to the first and second memory caches. Upon a miss in the first memory cache for a first target address, the memory controller subsystem determines that the first target address resulting in the miss maps to the lower half of a line in the second memory cache, retrieves the entire line from the second memory cache, and returns the entire line from the second memory cache to the first memory cache.
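The half-line decision can be written as a small address calculation. This sketch assumes 64-byte first-level lines and 128-byte second-level lines. The abstract only covers the lower-half case, so returning just the single requested line on an upper-half miss is an assumption, and lines_to_fill is a hypothetical name.

```python
L1_LINE = 64   # assumed first-level line size in bytes
L2_LINE = 128  # assumed second-level line size (two L1 lines: lower and upper half)


def lines_to_fill(target_addr: int) -> list:
    """Return the L1 line base addresses to fill after an L1 miss on target_addr."""
    l2_base = target_addr - target_addr % L2_LINE
    if target_addr - l2_base < L2_LINE // 2:
        # Miss in the lower half: return the whole L2 line, i.e. both halves.
        return [l2_base, l2_base + L1_LINE]
    # Upper half (assumed behaviour): fill only the requested L1 line.
    return [target_addr - target_addr % L1_LINE]
```

A miss at 0x1010 falls in the lower half of the L2 line at 0x1000, so both 0x1000 and 0x1040 are filled. In effect, the upper half is prefetched on the assumption that accesses run sequentially.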