G06F2212/1021

ANALYSIS FOR MODELING DATA CACHE UTILIZATION

Aspects include modeling data cache utilization for each loop in a loop nest; estimating total data cache lines fetched in one iteration of the loop; and determining the possibility of data cache reuse across loop iterations using data cache lines fetched and associativity constraints. Aspects also include estimating, for memory reference pairs, reuse by one reference of data cache line fetched by another; estimating total number of cache misses for all iterations of the loop; and estimating total number of cache misses of a reference for iterations of a next outer loop as equal to total cache misses for an entire inner loop. Aspects further include estimating memory cost of a loop unroll and jam transformation, without performing the transformation; and extending a data cache model to estimate best unroll-and-jam factors for the loop nest, capable of minimizing total cache misses incurred by the memory references in the loop body.

Allocation of a Buffer Located in System Memory into a Cache Memory

A non-transitory computer-readable medium is disclosed, the medium having instructions stored thereon that are executable by a computer system to perform operations that may include allocating a plurality of storage locations in a system memory of the computer system to a buffer. The operations may further include selecting a particular order for allocating the plurality of storage locations into a cache memory circuit. This particular order may increase a uniformity of cache miss rates in comparison to a linear order. The operations may also include caching subsets of the plurality of storage locations of the buffer using the particular order.

Analysis for modeling data cache utilization

Aspects include modeling data cache utilization for each loop in a loop nest; estimating total data cache lines fetched in one iteration of the loop; and determining the possibility of data cache reuse across loop iterations using data cache lines fetched and associativity constraints. Aspects also include estimating, for memory reference pairs, reuse by one reference of data cache line fetched by another; estimating total number of cache misses for all iterations of the loop; and estimating total number of cache misses of a reference for iterations of a next outer loop as equal to total cache misses for an entire inner loop. Aspects further include estimating memory cost of a loop unroll and jam transformation, without performing the transformation; and extending a data cache model to estimate best unroll-and-jam factors for the loop nest, capable of minimizing total cache misses incurred by the memory references in the loop body.

Non-stalling, non-blocking translation lookaside buffer invalidation
11663141 · 2023-05-30 · ·

A method includes receiving, by a MMU for a processor core, an address translation request from the processor core and providing the address translation request to a TLB of the MMU; generating, by matching logic of the TLB, an address transaction that indicates whether a virtual address specified by the address translation request hits the TLB; providing the address transaction to a general purpose transaction buffer; and receiving, by the MMU, an address invalidation request from the processor core and providing the address invalidation request to the TLB. The method also includes, responsive to a virtual address specified by the address invalidation request hitting the TLB, generating, by the matching logic, an invalidation match transaction and providing the invalidation match transaction to one of the general purpose transaction buffer or a dedicated invalidation buffer.

CACHE ACCESS METHOD AND ASSOCIATED GRAPH NEURAL NETWORK SYSTEM
20230161707 · 2023-05-25 ·

The present application discloses a cache access method and an associated graph neural network system. The graph neural network processor is used for performing computation upon a graph neural network. The graph neural network is stored in the memory in compressed sparse row format. The method includes: receiving an address corresponding to a node of the graph neural network and a type of the address; in response to the type is one of a first type or a second type, performing lookup by comparing the address with a tag field of a degree lookup table to at least obtain a degree of the node; determining whether the degree is greater than a predetermined value to obtain a determination result; and determining whether to perform lookup on a region of the cache corresponding to the type according to the determination result.

COMPUTER-READABLE RECORDING MEDIUM STORING DIVISION PROGRAM AND DIVISION METHOD
20230161706 · 2023-05-25 · ·

A recording medium stores a division program for causing a computer to execute processing including: acquiring, for each application of a plurality of applications capable of sharing a cache memory, memory access information that enables specification of memory addresses accessed when each of the applications is operated in time series; calculating, for each of the applications, a frequency distribution of access intervals to the same memory address based on the acquired memory access information; specifying, for each of the applications, a correspondence relationship between a cache capacity and the number of cache hits to be allocated to each of the applications based on the calculated frequency distribution of the access intervals; and distributing, on based on the correspondence relationship specified for each of the applications, the cache memory to the plurality of applications such that a total number of cache hits of each of the applications is maximized.

Criticality-Informed Caching Policies

A cache may store critical cache lines and non-critical cache lines, and may attempt to retain critical cache lines in the cache by, for example, favoring the critical cache lines in replacement data updates, retaining the critical cache lines with a certain probability when victim cache blocks are being selected, etc. Criticality values may be retained at various levels of the cache hierarchy. Additionally, accelerated eviction may be employed if the threads previously accessing the critical cache blocks are viewed as dead.

PROCESSOR AND ARITHMETIC PROCESSING METHOD

A processor includes request issuing units issuing an access request to a storage, a data array including banks holding sub data divided from data read from the storage based on the access request, a switch to transfer the access request to one of the banks, and first and second determination units. The first determination unit determines a cache hit when a tag address included in the access address matches a tag address held therein in correspondence with an index address included in the access address. The second determination unit determines a cache hit when identification information corresponding to a first tag address included in the access address and a second tag address included in the access address, match identification information and second tag address held therein. A cache controller makes access to the data array or storage, based on a determination result of the first or second determination unit.

METHODS AND SYSTEMS FOR MAINTAINING CACHE COHERENCY BETWEEN NODES IN A CLUSTERED ENVIRONMENT BY PERFORMING A BITMAP LOOKUP IN RESPONSE TO A READ REQUEST FROM ONE OF THE NODES

Disclosed herein are methods, systems, and processes to provide coherency across disjoint caches in clustered environments. It is determined whether a data object is owned by an owner node, where the owner node is one of multiple nodes of a cluster. If the owner node for the data object is identified by the determining, a request is sent to the owner node for the data object. However, if the owner node for the data object is not identified by the determining, selects a node in the cluster is selected as the owner node, and the request for the data object is sent to the owner node.

CACHE MISS PREDICTOR
20230108964 · 2023-04-06 · ·

Methods, devices, and systems for retrieving information based on cache miss prediction. A prediction that a cache lookup for the information will miss a cache is made based on a history table. The cache lookup for the information is performed based on the request. A main memory fetch for the information is begun before the cache lookup completes, based on the prediction that the cache lookup for the information will miss the cache. In some implementations, the prediction includes comparing a first set of bits stored in the history table with a second set of bits stored in the history table. In some implementations, the prediction includes comparing at least a portion of an address of the request for the information with a set of bits in the history table.