G06F2212/608

Data tiering in heterogeneous memory system

A heterogeneous memory system includes a memory device including first and second memories and a controller including a cache. The controller identifies memory access addresses among addresses for memory regions of the memory device; tracks, for a set period, the number of memory accesses for each memory access address; classifies each memory access address as either a frequently accessed address or a normal accessed address based on the number of memory accesses in the set period; and allocates the first memory for frequently accessed data associated with the frequently accessed address and the second memory for normal data associated with the normal accessed address.
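The track, classify, and allocate steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how the classification boundary is chosen, so the fixed `hot_threshold`, the function names, and the tier labels `first_memory` and `second_memory` are hypothetical.

```python
from collections import Counter


def classify_addresses(access_log, hot_threshold):
    """Count accesses per address for one tracking period and split the
    addresses into frequently accessed and normal sets.

    hot_threshold is an assumed cutoff; the abstract does not specify one.
    """
    counts = Counter(access_log)
    hot = {addr for addr, n in counts.items() if n >= hot_threshold}
    normal = set(counts) - hot
    return hot, normal


def allocate_tiers(access_log, hot_threshold):
    """Map each observed address to the memory that should hold its data."""
    hot, normal = classify_addresses(access_log, hot_threshold)
    placement = {addr: "first_memory" for addr in hot}
    placement.update({addr: "second_memory" for addr in normal})
    return placement


# One tracking period of observed memory access addresses (made-up values).
log = [0x1000, 0x2000, 0x1000, 0x3000, 0x1000, 0x2000]
placement = allocate_tiers(log, hot_threshold=3)
```

A real controller would reset the counters at the end of each period and migrate data between the two memories when an address changes class; neither step is modeled here.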

PRIORITY-BASED STORAGE AND ACCESS OF COMPRESSED MEMORY LINES IN MEMORY IN A PROCESSOR-BASED SYSTEM

In an aspect, high priority lines are stored starting at an address aligned to a cache line size, for instance 64 bytes, and low priority lines are stored in the memory space left by the compression of high priority lines. The space left by the high priority lines, and hence the low priority lines themselves, are managed through pointers also stored in memory. In this manner, the contents of low priority lines can be moved to different memory locations as needed. The efficiency of higher priority compressed memory accesses is improved by removing the need for the indirection otherwise required to find and access compressed memory lines; this is especially advantageous for immutable compressed contents. The use of pointers for low priority lines is advantageous due to the full flexibility of placement, especially for mutable compressed contents that may need to move within memory, for instance as they change in size over time.
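A rough Python sketch of this layout follows. It assumes a first-fit search over the holes and a dictionary as the pointer table; both are illustrative choices, not details taken from the abstract, and the names `layout` and `LINE_SIZE` are hypothetical.

```python
LINE_SIZE = 64  # cache line size used for alignment, as in the abstract's example


def layout(high_lines, low_lines):
    """Place compressed high priority lines at aligned slots and pack low
    priority lines into the space they leave behind.

    high_lines: list of bytes objects, each at most LINE_SIZE long.
    low_lines: dict mapping a line id to its compressed bytes.
    Returns (memory, pointers), where pointers maps a low priority line id
    to (offset, length). High priority line i needs no pointer: it always
    starts at i * LINE_SIZE.
    """
    memory = bytearray(LINE_SIZE * len(high_lines))
    holes = []  # [offset, free_bytes] left after each high priority line
    for i, blob in enumerate(high_lines):
        if len(blob) > LINE_SIZE:
            raise ValueError("high priority line does not fit in one slot")
        base = i * LINE_SIZE
        memory[base:base + len(blob)] = blob
        holes.append([base + len(blob), LINE_SIZE - len(blob)])

    pointers = {}
    for line_id, blob in low_lines.items():
        for hole in holes:  # first-fit placement (an assumption)
            if hole[1] >= len(blob):
                offset = hole[0]
                memory[offset:offset + len(blob)] = blob
                pointers[line_id] = (offset, len(blob))
                hole[0] += len(blob)
                hole[1] -= len(blob)
                break
        else:
            raise ValueError(f"no hole large enough for line {line_id!r}")
    return memory, pointers


memory, pointers = layout(
    [b"A" * 40, b"B" * 60],
    {"x": b"c" * 20, "y": b"d" * 4},
)
```

Because low priority lines are reached only through `pointers`, one of them can be relocated by copying its bytes and updating a single table entry, which is the flexibility the abstract attributes to the pointer scheme.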

CONTENDED LOCK REQUEST ELISION SCHEME

A system and method for network traffic management between multiple nodes are described. A computing system includes multiple nodes connected to one another. When a home node determines a number of nodes requesting read access for a given data block assigned to the home node exceeds a threshold and a copy of the given data block is already stored at a first node of the multiple nodes in the system, the home node sends a command to the first node. The command directs the first node to forward a copy of the given data block to the home node. The home node then maintains a copy of the given data block and forwards copies of the given data block to other requesting nodes until the home node detects a write request or a lock release request for the given data block.
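The home node's behavior can be modeled as a small state machine. The sketch below is a simplified Python model, not the claimed protocol: the class and method names are hypothetical, `fetch_from_owner` stands in for the network command sent to the first node, and redirecting readers to the owner while the count is at or below the threshold is an assumption about what happens in that case.

```python
class HomeNode:
    def __init__(self, threshold):
        self.threshold = threshold
        self.readers = {}     # block -> set of node ids that requested read access
        self.owner = {}       # block -> node id that already holds a copy
        self.local_copy = {}  # block -> data currently kept at the home node

    def request_read(self, block, node, fetch_from_owner):
        """Return (source, data) for a read request from `node`."""
        readers = self.readers.setdefault(block, set())
        readers.add(node)
        if block in self.local_copy:
            # The home node already holds a copy and serves it directly.
            return "home", self.local_copy[block]
        if len(readers) > self.threshold and block in self.owner:
            # Contention detected: pull a copy from the first node, keep it.
            self.local_copy[block] = fetch_from_owner(self.owner[block], block)
            return "home", self.local_copy[block]
        # Assumed fallback: let the current owner supply the data.
        return "owner", fetch_from_owner(self.owner.get(block), block)

    def on_write_or_lock_release(self, block):
        """Stop serving the block from the home node."""
        self.local_copy.pop(block, None)
        self.readers.pop(block, None)


home = HomeNode(threshold=2)
home.owner["B"] = "n1"
fetch = lambda owner, block: f"data-of-{block}"
sources = [home.request_read("B", n, fetch)[0] for n in ("n2", "n3", "n4", "n5")]
```

In this run the first two readers are redirected to the owner, the third pushes the reader count past the threshold, and from then on the home node answers from its own copy until `on_write_or_lock_release` is called.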

Multi-Level System Memory With Near Memory Scrubbing Based On Predicted Far Memory Idle Time
20170371795 · 2017-12-28

An apparatus is described that includes a memory controller to interface to a multi-level system memory. The memory controller includes least recently used (LRU) circuitry to keep track of least recently used cache lines kept in a higher level of the multi-level system memory. The memory controller also includes idle time predictor circuitry to predict idle times of a lower level of the multi-level system memory. The memory controller is to write one or more lesser used cache lines from the higher level of the multi-level system memory to the lower level of the multi-level system memory in response to the idle time predictor circuitry indicating that an observed idle time of the lower level of the multi-level system memory is expected to be long enough to accommodate the write of the one or more lesser used cache lines from the higher level of the multi-level system memory to the lower level of the multi-level system memory.
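To make the interaction between the LRU tracking and the idle time prediction concrete, here is a small Python model. Several things in it are assumptions rather than details from the abstract: the predictor is an exponential moving average, only lines marked dirty are written back, and every write to far memory costs the same `write_time`. All names are hypothetical.

```python
from collections import OrderedDict


class IdlePredictor:
    """Predicts the next far memory idle time from past observations
    using an exponential moving average (an assumed policy)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.predicted = 0.0

    def observe(self, idle_time):
        self.predicted = self.alpha * idle_time + (1 - self.alpha) * self.predicted


class NearMemory:
    def __init__(self):
        # addr -> (data, dirty); least recently used entries come first.
        self.lines = OrderedDict()

    def access(self, addr, data, dirty):
        self.lines[addr] = (data, dirty)
        self.lines.move_to_end(addr)

    def scrub(self, predictor, write_time, far_memory):
        """Write back as many least recently used dirty lines as fit in
        the predicted idle window. Returns the addresses written."""
        budget = int(predictor.predicted // write_time)
        written = []
        for addr, (data, dirty) in list(self.lines.items()):
            if budget == 0:
                break
            if dirty:
                far_memory[addr] = data
                self.lines[addr] = (data, False)  # now clean; LRU order unchanged
                written.append(addr)
                budget -= 1
        return written


predictor = IdlePredictor(alpha=0.5)
predictor.observe(100)
predictor.observe(100)

near = NearMemory()
for addr, dirty in (("a", True), ("b", False), ("c", True), ("d", True)):
    near.access(addr, f"data-{addr}", dirty)

far = {}
written = near.scrub(predictor, write_time=30, far_memory=far)
```

With a predicted idle time of 75 and a write time of 30, the budget is two writes, so the two least recently used dirty lines are scrubbed and the third is left for a later idle window.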

Memory circuit and cache circuit configuration

A memory circuit includes a stack of first dies including multiple sets of memory cells of a first type, a second die including multiple sets of memory cells of a second type, a third die, and an interposer carrying the first, second, and third dies. The second die includes a first set of input/output (I/O) terminals on a top surface of the second die and a second set of I/O terminals on a bottom surface of the second die. The stack of first dies is coupled to the second die through the first set of I/O terminals. The interposer is coupled to the second die through the second set of I/O terminals. The third die is positioned beside the second die and in communication with the second die through the interposer.

GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.
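The arithmetic described here (multiply, add, then a rectified linear unit) can be emulated in scalar Python. This is a software sketch only, not the GPU instruction: it processes one element at a time rather than SIMD lanes, it converts to BF16 by truncating the float32 encoding (hardware may round to nearest instead), and it accumulates in a Python float. The function names are hypothetical.

```python
import struct


def to_bf16(x):
    """Emulate BF16 by keeping the top 16 bits of the float32 encoding:
    1 sign bit, 8 exponent bits, 7 mantissa bits. Truncation is an
    assumed rounding mode."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def dot_accumulate_relu(a, b, acc):
    """Multiply BF16 operands pairwise, add the products to acc, and
    apply ReLU to the final sum."""
    total = acc
    for x, y in zip(a, b):
        total += to_bf16(x) * to_bf16(y)
    return max(total, 0.0)


positive = dot_accumulate_relu([1.0, 2.0], [3.0, -4.0], acc=10.0)
clamped = dot_accumulate_relu([1.0, 2.0], [3.0, -4.0], acc=0.5)
```

Because BF16 keeps the eight-bit exponent of float32, the conversion preserves range and gives up only mantissa precision; for example, `to_bf16(3.14159)` yields 3.140625.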

Selective allocation of CPU cache slices to database objects
09842052 · 2017-12-12

A central processing unit (CPU) forming part of a computing device initiates execution of code associated with each of a plurality of objects used by a worker thread. The CPU has an associated cache that is split into a plurality of slices. It is determined, by a cache slice allocation algorithm for each object, whether any of the slices will be exclusive to or shared by the object. Thereafter, for each object, any slices determined to be exclusive to the object are activated such that the object exclusively uses such slices and any slices determined to be shared by the object are activated such that the object shares or is configured to share such slices.

METHODS AND SYSTEMS FOR DISTRIBUTING MEMORY REQUESTS

A memory request, including an address, is accessed. The memory request also specifies a type of an operation (e.g., a read or write) associated with an instance (e.g., a block) of data. A group of caches is selected using a bit or bits in the address. A first hash of the address is performed to select a cache in the group. A second hash of the address is performed to select a set of cache lines in the cache. Unless the operation results in a cache miss, the memory request is processed at the selected cache. When there is a cache miss, a third hash of the address is performed to select a memory controller, and a fourth hash of the address is performed to select a bank group and a bank in memory.
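The routing sequence can be sketched as follows. The abstract does not define the hash functions or the sizes involved, so `mix` is a stand-in 64-bit mixer, the seeds 1 to 4 merely make the four hashes differ, and the group bit position and all counts are hypothetical parameters. `is_hit` is a caller-supplied callback that models the cache lookup.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def mix(addr, seed):
    """Stand-in 64-bit mixing hash; the real hash functions are unspecified."""
    x = (addr ^ seed) & MASK64
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    return x


def route(addr, is_hit, caches_per_group=4, sets=64,
          controllers=4, bank_groups=4, banks=4):
    """Return where a memory request for `addr` is processed."""
    group = (addr >> 6) & 1                  # one address bit selects the cache group
    cache = mix(addr, 1) % caches_per_group  # first hash: cache within the group
    cache_set = mix(addr, 2) % sets          # second hash: set within the cache
    target = {"group": group, "cache": cache, "set": cache_set}
    if is_hit(group, cache, cache_set):
        return target                        # hit: processed at the selected cache

    # Miss: third and fourth hashes pick the memory controller and the bank.
    target["controller"] = mix(addr, 3) % controllers
    h4 = mix(addr, 4)
    target["bank_group"] = h4 % bank_groups
    target["bank"] = (h4 // bank_groups) % banks
    return target


hit = route(0x1240, lambda group, cache, cache_set: True)
miss = route(0x1240, lambda group, cache, cache_set: False)
```

Using independent hashes at each level is what spreads requests evenly: two addresses that collide on the cache choice are unlikely to also collide on the set, the controller, and the bank.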

Cache hashing
09836395 · 2017-12-05

Cache logic generates a cache address from an input memory address that includes a first binary string and a second binary string. The cache logic includes a hashing engine configured to generate a third binary string from the first binary string and to form each bit of the third binary string by combining a respective subset of bits of the first binary string by a first bitwise operation, wherein the subsets of bits of the first binary string are defined at the hashing engine such that each subset is unique and comprises approximately half of the bits of the first binary string; and a combination unit arranged to combine the third binary string with the second binary string by a reversible operation so as to form a binary output string for use as at least part of a cache address in a cache memory.
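A minimal Python sketch of this scheme is shown below, assuming XOR as both the first bitwise operation and the reversible combining operation. The abstract names neither, and the split of the address into an 8-bit first string and a 4-bit second string, the `MASKS` values, and the function names are all illustrative.

```python
# One mask per output bit. Each mask is unique and selects 4 of the 8 bits
# of the first binary string, i.e. about half of them (example values).
MASKS = [0b00001111, 0b00110011, 0b01010101, 0b10010110]


def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def hash_bits(first, masks):
    """Form the third binary string: bit i is the XOR of the bits of
    `first` selected by masks[i]."""
    out = 0
    for i, mask in enumerate(masks):
        out |= parity(first & mask) << i
    return out


def cache_index(address, index_bits, masks):
    """Split the address into first (upper) and second (lower) strings and
    combine the hashed first string with the second string by XOR."""
    second = address & ((1 << index_bits) - 1)
    first = address >> index_bits
    return second ^ hash_bits(first, masks)


index = cache_index(0xB65, index_bits=4, masks=MASKS)
```

Because XOR is reversible, two addresses that share the same first string but differ in the second string always map to different indices, so sequential accesses within a block still spread across the cache while strided accesses that differ only in the upper bits are scattered by the hash.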

CACHE MEMORY, MEMORY SYSTEM INCLUDING THE SAME AND OPERATING METHOD THEREOF
20220374364 · 2022-11-24

A cache memory includes a first cache area corresponding to even addresses, and a second cache area corresponding to odd addresses, wherein each of the first and second cache areas includes a plurality of cache sets, and each cache set includes a data set field suitable for storing data corresponding to an address among the even and odd addresses, and a pair field suitable for storing information on a location where data corresponding to an adjacent address which is adjacent to an address corresponding to the stored data is stored.