G06F12/0879

Coherence-based cache-line Copy-on-Write

A method of performing a copy-on-write on a shared memory page is carried out by a device communicating with a processor via a coherence interconnect. The method includes: adding a page table entry so that a request to read a first cache line of the shared memory page includes a cache-line address of the shared memory page and a request to write to a second cache line of the shared memory page includes a cache-line address of a new memory page; in response to the request to write to the second cache line, storing new data of the second cache line in a second memory and associating the second cache-line address with the new data stored in the second memory; and in response to a request to read the second cache line, reading the new data of the second cache line from the second memory.

DYNAMICALLY COALESCING ATOMIC MEMORY OPERATIONS FOR MEMORY-LOCAL COMPUTING
20220414013 · 2022-12-29 ·

Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.

Destaging multiple cache slots in a single back-end track in a RAID subsystem

A data service layer running on a storage director node generates a request to destage host data from a plurality of cache slots in a single back-end track. The destage request includes pointers to addresses of the cache slots and indicates an order in which the host application data in the cache slots is to be included in the back-end track. A back-end redundant array of independent drives (RAID) subsystem running on a drive adapter is responsive to the request to calculate parity information using the host application data in the cache slots. The back-end RAID subsystem assembles the single back-end track comprising the host application data from the plurality of cache slots of the request, and destages the single back-end track to a non-volatile drive in a single back-end input-output (IO) operation.

DYNAMICALLY FORMATTED STORAGE ALLOCATION RECORD
20220391094 · 2022-12-08 ·

One or more non-transitory computer-readable media can store program instructions that, when executed by one or more processors, cause the one or more processors to perform steps of organizing storage as a set of storage regions, each storage region having a fixed size; and for each storage region, storing a storage allocation structure of the storage region formatted in a first format selected from a format set including at least two formats, determining a change of an allocation feature of the storage region, based on the allocation feature of the storage region, selecting, from the format set, a second format of the storage allocation structure, and reformatting the storage allocation structure in the second format.

Bandwidth boosted stacked memory

A high bandwidth memory system. In some embodiments, the system includes: a memory stack having a plurality of memory dies and eight 128-bit channels; and a logic die, the memory dies being stacked on, and connected to, the logic die; wherein the logic die may be configured to operate a first channel of the 128-bit channels in: a first mode, in which a first 64 bits operate in pseudo-channel mode, and a second 64 bits operate as two 32-bit fine-grain channels, or a second mode, in which the first 64 bits operate as two 32-bit fine-grain channels, and the second 64 bits operate as two 32-bit fine-grain channels.

Bandwidth boosted stacked memory

A high bandwidth memory system. In some embodiments, the system includes: a memory stack having a plurality of memory dies and eight 128-bit channels; and a logic die, the memory dies being stacked on, and connected to, the logic die; wherein the logic die may be configured to operate a first channel of the 128-bit channels in: a first mode, in which a first 64 bits operate in pseudo-channel mode, and a second 64 bits operate as two 32-bit fine-grain channels, or a second mode, in which the first 64 bits operate as two 32-bit fine-grain channels, and the second 64 bits operate as two 32-bit fine-grain channels.

Multi-block Cache Fetch Techniques
20220374359 · 2022-11-24 ·

Techniques are disclosed relating to multi-block fetches for cache misses. In some embodiments, cache tag circuitry maintains a tag value that is shared by multiple cache blocks. In response to a miss, the cache may initiate a fetch request to a next level cache or memory. Aggregation circuitry may aggregate multiple fetch requests for cache blocks that share the tag value and fetch circuitry may initiate a single multi-block fetch operation to the next level cache or memory that returns cache blocks for the aggregated multiple fetch requests. In various embodiments, disclosed techniques may improve performance (e.g., by reducing fetch bus transactions), reduce power consumption, or both, relative to traditional techniques.

Memory Cache with Partial Cache Line Valid States
20220365881 · 2022-11-17 ·

An apparatus includes a cache memory circuit configured to store a cache lines, and a cache controller circuit. The cache controller circuit is configured to receive a read request to an address associated with a portion of a cache line. In response to an indication that the portion of the cache line currently has at least a first sub-portion that is invalid and at least a second sub-portion that is modified relative to a version in a memory, the cache controller circuit is further configured to fetch values corresponding to the address from the memory, to generate an updated version of the portion of the cache line by using the fetched values to update the first sub-portion, but not the second sub-portion, of the portion of the cache line, and to generate a response to the read request that includes the updated version of the portion of the cache line.

Memory cache with partial cache line valid states
11586552 · 2023-02-21 · ·

An apparatus includes a cache memory circuit configured to store a cache lines, and a cache controller circuit. The cache controller circuit is configured to receive a read request to an address associated with a portion of a cache line. In response to an indication that the portion of the cache line currently has at least a first sub-portion that is invalid and at least a second sub-portion that is modified relative to a version in a memory, the cache controller circuit is further configured to fetch values corresponding to the address from the memory, to generate an updated version of the portion of the cache line by using the fetched values to update the first sub-portion, but not the second sub-portion, of the portion of the cache line, and to generate a response to the read request that includes the updated version of the portion of the cache line.

EFFICIENT COMPRESSED VERBATIM COPY

Compressed verbatim copy can enable more efficient copying of compressed data. In one example, a compressed verbatim copy method involves receiving a command to copy compressed data from a source address of the memory device to a destination address. In response to the receipt of the command, the method involves copying the compressed data in a compressed format from the source address to the destination address without first decompressing the data. A second source address and a second destination address of metadata for the compressed data is determined, and the metadata is copied from the second source address to the second destination address.