G06F12/0844

Concurrent cache lookups using partial identifiers
11294817 · 2022-04-05

To perform a lookup for a group of plural portions of data in a cache together, a first part of an identifier for a first one of the portions of data in the group is compared with corresponding first parts of the identifiers for cache lines in the cache, the first part of the identifier for the first one of the portions of data in the group is compared with the corresponding first parts of the identifiers for the remaining portions of data in the group of plural portions of data, and a remaining part of the identifier for each portion of data is compared with the corresponding remaining parts of identifiers for cache lines in the cache. It is then determined whether a cache line for any of the portions of data in the group is present in the cache, based on the results of the comparisons.
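The three comparisons described in this abstract can be sketched in a few lines. This is an illustrative model only, not the patented implementation: the 8-bit split point (`LOW_BITS`), the function names, and the choice to report a group member as unresolved (`None`) when its first part differs from that of the first member are all assumptions made for the example.

```python
LOW_BITS = 8  # assumed width of the "remaining" part of an identifier


def split(tag):
    """Split an identifier into (first part, remaining part)."""
    return tag >> LOW_BITS, tag & ((1 << LOW_BITS) - 1)


def group_lookup(line_tags, group_tags):
    """Look up every identifier in group_tags against line_tags together.

    Returns one entry per group member: the index of the matching cache
    line, or None when the shared comparisons do not establish a hit.
    """
    first_hi, _ = split(group_tags[0])
    lines = [split(t) for t in line_tags]
    # Comparison 1: first part of the first identifier vs. every cache line.
    line_hi_match = [hi == first_hi for hi, _ in lines]
    results = []
    for tag in group_tags:
        hi, lo = split(tag)
        # Comparison 2: first part of the first identifier vs. this member.
        shares_hi = hi == first_hi
        hit = None
        for idx, (_, line_lo) in enumerate(lines):
            # Comparison 3: remaining part of this member vs. each cache line.
            if shares_hi and line_hi_match[idx] and lo == line_lo:
                hit = idx
                break
        results.append(hit)
    return results
```

The wide first-part comparison is done once per cache line, and only the narrow remaining-part comparison is repeated per group member. In this sketch a member whose first part differs from the first member's is never matched, even if its line is cached, so it would need a separate lookup.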

Concurrent Cache Lookups Using Partial Identifiers
20220075730 · 2022-03-10

To perform a lookup for a group of plural portions of data in a cache together, a first part of an identifier for a first one of the portions of data in the group is compared with corresponding first parts of the identifiers for cache lines in the cache, the first part of the identifier for the first one of the portions of data in the group is compared with the corresponding first parts of the identifiers for the remaining portions of data in the group of plural portions of data, and a remaining part of the identifier for each portion of data is compared with the corresponding remaining parts of identifiers for cache lines in the cache. It is then determined whether a cache line for any of the portions of data in the group is present in the cache, based on the results of the comparisons.

Cache snooping mode extending coherence protection for certain requests

A cache memory includes a data array, a directory of contents of the data array that specifies coherence state information, and snoop logic that processes operations snooped from a system fabric by reference to the data array and the directory. The snoop logic, responsive to snooping on the system fabric a request of a first flush/clean memory access operation that specifies a target address, determines whether or not the cache memory has coherence ownership of the target address. Based on determining the cache memory has coherence ownership of the target address, the snoop logic services the request and thereafter enters a referee mode. While in the referee mode, the snoop logic protects a memory block identified by the target address against conflicting memory access requests by the plurality of processor cores until conclusion of a second flush/clean memory access operation that specifies the target address.
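The referee mode described in this abstract can be read as a small state machine. The sketch below is a simplified model under stated assumptions: the class name, the operation names (`"flush_clean"`, `"write"`), and the response strings are hypothetical, servicing is assumed to give up ownership, and the conclusion of the second flush/clean operation is collapsed into the moment its request is snooped.

```python
class SnoopLogic:
    """Toy model of snoop logic with a per-address referee mode."""

    def __init__(self, owned):
        self.owned = set(owned)   # addresses this cache has coherence ownership of
        self.referee = set()      # addresses currently protected in referee mode

    def snoop(self, op, addr):
        """Return a response string for an operation snooped on the fabric."""
        if addr in self.referee:
            if op == "flush_clean":
                # Second flush/clean for this address: leave referee mode.
                self.referee.discard(addr)
                return "ack"
            # Any other access conflicts with the protected block.
            return "retry"
        if op == "flush_clean" and addr in self.owned:
            # First flush/clean: service it, then start protecting the block.
            self.owned.discard(addr)  # assumption: servicing gives up ownership
            self.referee.add(addr)
            return "serviced"
        return "ignore"
```

For example, after a first `flush_clean` on an owned address, a `write` to the same address is answered with `"retry"` until a second `flush_clean` for that address has been seen.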

Microprocessor architecture having alternative memory access paths

The present invention is directed to a system and method which employ two memory access paths: 1) a cache-access path in which block data is fetched from main memory for loading to a cache, and 2) a direct-access path in which individually-addressed data is fetched from main memory. The system may comprise one or more processor cores that utilize the cache-access path for accessing data. The system may further comprise at least one heterogeneous functional unit that is operable to utilize the direct-access path for accessing data. In certain embodiments, the one or more processor cores, cache, and the at least one heterogeneous functional unit may be included on a common semiconductor die (e.g., as part of an integrated circuit). Embodiments of the present invention enable improved system performance by selectively employing the cache-access path for certain instructions while selectively employing the direct-access path for other instructions.

System and method of a highly concurrent cache replacement algorithm

Disclosed are a method and system for managing multi-threaded concurrent access to a cache data structure. The cache data structure includes a hash table and three queues. The hash table includes a list of elements for each hash bucket with each hash bucket containing a mutex object and elements in each of the queues containing lock objects. Multiple threads can each lock a different hash bucket to have access to the list, and multiple threads can each lock a different element in the queues. The locks permit highly concurrent access to the cache data structure without conflict. Also, atomic operations are used to obtain pointers to elements in the queues so that a thread can safely advance each pointer. Race conditions that are encountered with locking an element in the queues or entering an element into the hash table are detected, and the operation encountering the race condition is retried.
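The per-bucket locking idea can be illustrated with a minimal sketch. It covers only the hash table part: the three queues, their per-element locks, and the atomic pointer updates are left out. The class and method names are hypothetical, and the race handling shown (an insert that finds the key already present adopts the existing value) is a simplification of the detect-and-retry behavior the abstract describes.

```python
import threading


class BucketedCache:
    """Hash table in which each bucket is guarded by its own mutex."""

    def __init__(self, num_buckets=16):
        self.buckets = [dict() for _ in range(num_buckets)]
        self.locks = [threading.Lock() for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def get(self, key):
        i = self._index(key)
        with self.locks[i]:  # only this bucket is locked
            return self.buckets[i].get(key)

    def put_if_absent(self, key, value):
        """Insert value unless another thread inserted the key first.

        Returns the value that ends up stored, so a caller that lost the
        race can detect it and use the winner's entry.
        """
        i = self._index(key)
        with self.locks[i]:
            return self.buckets[i].setdefault(key, value)
```

Threads that touch different buckets never contend on the same lock, which is where the concurrency comes from. A single global lock would be simpler but would serialize every access.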

Apparatus, systems, and methods for providing computational imaging pipeline

The present application relates generally to a parallel processing device. The parallel processing device can include a plurality of processing elements, a memory subsystem, and an interconnect system. The memory subsystem can include a plurality of memory slices, at least one of which is associated with one of the plurality of processing elements and comprises a plurality of random access memory (RAM) tiles, each tile having individual read and write ports. The interconnect system is configured to couple the plurality of processing elements and the memory subsystem. The interconnect system includes a local interconnect and a global interconnect.

Cache efficient reading of result values in a column store database
11119742 · 2021-09-14

A system for cache efficient reading of column values in a database is provided. In some aspects, the system performs operations including pre-fetching, asynchronously and in response to a request for data in a column store database system, a plurality of first values associated with the requested data. The request may identify a row of the column store database system associated with the requested data. The plurality of first values may be located in the row. The operations may further include storing the plurality of first values in a cache memory. The operations may further include pre-fetching, asynchronously and based on the plurality of first values, a plurality of second values. The operations may further include storing the plurality of second values in the cache memory. The operations may further include reading, in response to storing the plurality of second values, the requested data from the cache memory.
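The two-stage prefetch can be mimicked in software to show how the steps are ordered. Everything concrete here is an assumption for illustration: the abstract does not say what the first and second values are, so the sketch treats the first values as per-column value IDs found in the requested row and the second values as the dictionary entries those IDs point to. A thread pool and a plain `dict` stand in for asynchronous prefetching and the cache memory.

```python
from concurrent.futures import ThreadPoolExecutor


def read_row(row, index_vectors, dictionaries, cache, pool):
    """Read one row of a dictionary-encoded column store via two prefetch stages."""
    cols = list(index_vectors)
    # Stage 1: asynchronously fetch the first values (value IDs in the row).
    stage1 = {c: pool.submit(lambda c=c: index_vectors[c][row]) for c in cols}
    for c, fut in stage1.items():
        cache[("id", c, row)] = fut.result()
    # Stage 2: asynchronously fetch the second values, which depend on stage 1.
    stage2 = {
        c: pool.submit(lambda c=c: dictionaries[c][cache[("id", c, row)]])
        for c in cols
    }
    for c, fut in stage2.items():
        cache[("val", c, row)] = fut.result()
    # Only now is the requested data read, and it is read from the cache.
    return {c: cache[("val", c, row)] for c in cols}
```

The second stage cannot start until the first has filled the cache, because each dictionary lookup needs the value ID obtained in stage 1. Within each stage, however, the fetches for different columns are independent and can overlap.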