G06F12/0884

Spatial cache
11687465 · 2023-06-27 · ·

A cache includes a p-by-q array of memory units; a row addressing unit; and a column addressing unit. Each memory unit has an m-by-n array of memory cells. The column addressing unit has, for each memory unit, m n-to-one multiplexers, one associated with each of the m rows of the memory unit, wherein each n-to-one multiplexer has an input coupled to each of the n memory cells associated with the row associated with that multiplexer. The row addressing unit has, for each memory unit, n m-to-one multiplexers, one associated with each of the n columns of the memory unit, wherein each m-to-one multiplexer has an input coupled to each of the m memory cells associated with the column associated with that multiplexer. The row addressing unit and column addressing unit support reading and/or writing of the array of memory units, e.g. using virtual or physical addresses.

Methods and apparatus to facilitate an atomic operation and/or a histogram operation in cache pipeline

Methods, apparatus, systems and articles of manufacture to facilitate an atomic operation and/or a histogram operation in cache pipeline are disclosed. An example system includes a cache storage coupled to an arithmetic component; and a cache controller coupled to the cache storage, wherein the cache controller is operable to: receive a memory operation that specifies a set of data; retrieve the set of data from the cache storage; utilize the arithmetic component to determine a set of counts of respective values in the set of data; generate a vector representing the set of counts; and provide the vector.

Methods and apparatus to facilitate an atomic operation and/or a histogram operation in cache pipeline

Methods, apparatus, systems and articles of manufacture to facilitate an atomic operation and/or a histogram operation in cache pipeline are disclosed. An example system includes a cache storage coupled to an arithmetic component; and a cache controller coupled to the cache storage, wherein the cache controller is operable to: receive a memory operation that specifies a set of data; retrieve the set of data from the cache storage; utilize the arithmetic component to determine a set of counts of respective values in the set of data; generate a vector representing the set of counts; and provide the vector.

Coherent accelerator fabric controller

A fabric controller is provided for a coherent accelerator fabric. The coherent accelerator fabric includes a host interconnect, a memory interconnect, and an accelerator interconnect. The host interconnect communicatively couples to a host device. The memory interconnect communicatively couples to an accelerator memory. The accelerator interconnect communicatively couples to an accelerator having a last-level cache (LLC). An LLC controller is provided that is configured to provide a bias check for memory access operations on the fabric.

Coherent accelerator fabric controller

A fabric controller is provided for a coherent accelerator fabric. The coherent accelerator fabric includes a host interconnect, a memory interconnect, and an accelerator interconnect. The host interconnect communicatively couples to a host device. The memory interconnect communicatively couples to an accelerator memory. The accelerator interconnect communicatively couples to an accelerator having a last-level cache (LLC). An LLC controller is provided that is configured to provide a bias check for memory access operations on the fabric.

Non-volatile memory controller cache architecture with support for separation of data streams

A system according to one embodiment includes non-volatile memory, and a non-volatile memory controller having a cache. An architecture of the cache supports separation of data streams, and the cache architecture supports parallel writes to different non-volatile memory channels. Additionally, the cache architecture supports pipelining of the parallel writes to different non-volatile memory planes. Furthermore, the non-volatile memory controller is configured to perform a direct memory lookup in the cache based on a physical block address. Other systems, methods, and computer program products are described in additional embodiments.

Non-volatile memory controller cache architecture with support for separation of data streams

A system according to one embodiment includes non-volatile memory, and a non-volatile memory controller having a cache. An architecture of the cache supports separation of data streams, and the cache architecture supports parallel writes to different non-volatile memory channels. Additionally, the cache architecture supports pipelining of the parallel writes to different non-volatile memory planes. Furthermore, the non-volatile memory controller is configured to perform a direct memory lookup in the cache based on a physical block address. Other systems, methods, and computer program products are described in additional embodiments.

SPATIAL CACHE
20210406197 · 2021-12-30 · ·

A cache includes a p-by-q array of memory units; a row addressing unit; and a column addressing unit. Each memory unit has an m-by-n array of memory cells. The column addressing unit has, for each memory unit, m n-to-one multiplexers, one associated with each of the m rows of the memory unit, wherein each n-to-one multiplexer has an input coupled to each of the n memory cells associated with the row associated with that multiplexer. The row addressing unit has, for each memory unit, n m-to-one multiplexers, one associated with each of the n columns of the memory unit, wherein each m-to-one multiplexer has an input coupled to each of the m memory cells associated with the column associated with that multiplexer. The row addressing unit and column addressing unit support reading and/or writing of the array of memory units, e.g. using virtual or physical addresses.

SPATIAL CACHE
20210406197 · 2021-12-30 · ·

A cache includes a p-by-q array of memory units; a row addressing unit; and a column addressing unit. Each memory unit has an m-by-n array of memory cells. The column addressing unit has, for each memory unit, m n-to-one multiplexers, one associated with each of the m rows of the memory unit, wherein each n-to-one multiplexer has an input coupled to each of the n memory cells associated with the row associated with that multiplexer. The row addressing unit has, for each memory unit, n m-to-one multiplexers, one associated with each of the n columns of the memory unit, wherein each m-to-one multiplexer has an input coupled to each of the m memory cells associated with the column associated with that multiplexer. The row addressing unit and column addressing unit support reading and/or writing of the array of memory units, e.g. using virtual or physical addresses.

Increased parallelization efficiency in tiering environments

A computer-implemented method, according to one embodiment, includes: identifying block addresses which are associated with a given object, and combining the block addresses to a first set in response to determining that at least one token is currently issued on one or more of the identified block addresses. A first portion of the block addresses is transitioned to a second set, where the first portion includes ones of the block addresses determined as having a token currently issued thereon. Moreover, a second portion of the block addresses is divided into equal chunks, where the second portion includes the block addresses remaining in the first set. The chunks in the first set are allocated across two or more parallelization units. Furthermore, the block addresses in the second set are divided into equal chunks, and the chunks in the second set are allocated to at least one dedicated parallelization unit.