Patent classifications
G06F12/0848
APPARATUS AND METHOD WITH CACHE CONTROL
A computing apparatus is provided. The computing apparatus is configured to receive control information from a host device to control a cache area, generate a cache configuration based on the received control information, determine a first cache area and a second cache area in a memory in the computing apparatus based on the generated cache configuration, cache one or more instructions to the first cache area and cache data to the second cache area, and process a thread based on the one or more cached instructions and the cached data.
Limiting allocation of ways in a cache based on cache maximum associativity value
An apparatus has processing circuitry to perform data processing, at least one architectural register to store at least one partition identifier selection value which is programmable by software processed by the processing circuitry; a set-associative cache comprising a plurality of sets each comprising a plurality of ways; and partition identifier selecting circuitry to select, based on the at least one partition identifier selection value stored in the at least one architectural register, a selected partition identifier to be specified by a cache access request for accessing the set-associative cache. The set-associative cache comprises: selecting circuitry responsive to the cache access request to select, based on the selected partition identifier, a selected cache maximum associativity value; and allocation control circuitry to limit a number of ways allocated in a same set for information associated with the selected partition identifier to a maximum number of ways determined based on the selected cache maximum associativity value.
Way partitioning for a system-level cache
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for a system-level cache to allocate cache resources by a way-partitioning process. One of the methods includes maintaining a mapping between partitions and priority levels and allocating primary ways to respective enabled partitions in an order corresponding to the respective priority levels assigned to the enabled partitions.
METHODS AND SYSTEMS FOR MEMORY BANDWIDTH CONTROL
Resources of an electronic device are partitioned into a plurality of resource portions to be utilized by a plurality of clients. Each resource portion is assigned to a respective client, has a respective partition identifier (ID), and corresponds to a plurality of memory bandwidth usage states tracked for a plurality of memory blocks. For each resource portion, each of the memory bandwidth usage states is associated with a respective memory block and indicates at least how much of a memory access bandwidth assigned to the respective partition ID to access the respective memory block is used. A usage level is determined for each resource partition based on the memory bandwidth usage states, and applied to adjust a credit count. When the credit count is adjusted beyond a request issue threshold, a next data access request is issued from a memory access request queue for the respective partition ID.
SCALED SET DUELING FOR CACHE REPLACEMENT POLICIES
A processing system includes a cache that includes a cache lines that are partitioned into a first subset of the cache lines and a second subsets of the cache lines. The processing system also includes one or more counters that are associated with the second subsets of the cache lines. The processing system further includes a processor configured to modify the one or more counters in response to a cache hit or a cache miss associated with the second subsets. The one or more counters are modified by an amount determined by one or more characteristics of a memory access request that generated the cache hit or the cache miss.
VARIABLE DISPATCH WALK FOR SUCCESSIVE CACHE ACCESSES
A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.
Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality register file segments are coupled to the partitionable engines for providing data storage.
Selective allocation of CPU cache slices to database objects
A central processing unit (CPU) forming part of a computing device, initiates execution of code associated with each of a plurality of objects used by a worker thread. The CPU has an associated cache that is split into a plurality of slices. It is determined, by a cache slice allocation algorithm for each object, whether any of the slices will be exclusive to or shared by the object. Thereafter, for each object, any slices determined to be exclusive to the object are activated such that the object exclusively uses such slices and any slices determined to be shared by the object are activated such that the object shares or is configured to share such slices.
MEMORY PARTITIONS FOR PROCESSING ENTITIES
In some examples, a system partitions a shared memory address space of a shared memory among a plurality of processing entities into a plurality of memory partitions, where a respective memory partition is associated with a respective processing entity. A first processing entity forwards, to a second processing entity, a first data operation, based on a determination by the first processing entity that the first data operation is to be applied to data for a memory partition associated with the second processing entity. The second processing entity applies the first data operation that includes writing data of the first data operation to the memory partition associated with the second processing entity using a non-atomic operation.
ACCUMULATORS CORRESPONDING TO BINS IN MEMORY
In some examples, a system includes a processing entity and a memory to store data arranged in a plurality of bins associated with respective key values of a key. The system includes a cache to store cached data elements for respective accumulators that are updatable to represent occurrences of the respective key values of the key, where each accumulator corresponds to a different bin of the plurality of bins, and each cached data element has a range that is less than a range of a corresponding bin of the plurality of bins. Responsive to a value of a given cached data element as updated by a given accumulator satisfying a criterion, the processing entity is to cause an aggregation of the value of the given cached data element with a bin value in a respective bin.