G06F12/0851

IN-MEMORY DATABASE (IMDB) ACCELERATION THROUGH NEAR DATA PROCESSING
20230027648 · 2023-01-26

An accelerator is disclosed. The accelerator may include an on-chip memory to store data from a database. The on-chip memory may include a first memory bank and a second memory bank. The first memory bank may store the data, which may include a first value and a second value. A computational engine may execute, in parallel, a command on the first value in the data and the command on the second value in the data in the on-chip memory. The on-chip memory may be configured to load second data from the database into the second memory bank in parallel with the computational engine executing the command on the first value and the second value in the data.
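The bank-level overlap described here is essentially double buffering. The sketch below is a minimal illustration under assumed details not taken from the abstract (a toy bank size, a threshold-filter "command", and a load_chunk() stand-in for the database load): while the engine works on one bank, the next chunk is loaded into the other.

```c
/* Double-buffering sketch: compute on one bank while loading the other.
 * BANK_SIZE, the filter command, and load_chunk() are assumptions. */
#include <stdio.h>
#include <stddef.h>

#define BANK_SIZE 4

/* Stand-in for the transfer that pulls the next chunk from the database. */
static size_t load_chunk(const int *db, size_t db_len, size_t pos,
                         int bank[BANK_SIZE])
{
    size_t n = 0;
    while (n < BANK_SIZE && pos + n < db_len) {
        bank[n] = db[pos + n];
        n++;
    }
    return n; /* number of values actually loaded */
}

/* Stand-in for the computational engine: apply the same command to every
 * value in the bank (here: count values above a threshold).  On the
 * accelerator these per-value operations run in parallel. */
static int execute_command(const int bank[BANK_SIZE], size_t n, int threshold)
{
    int hits = 0;
    for (size_t i = 0; i < n; i++)
        if (bank[i] > threshold)
            hits++;
    return hits;
}

int main(void)
{
    const int database[] = {3, 9, 1, 7, 8, 2, 6, 4, 10, 5};
    const size_t db_len = sizeof database / sizeof database[0];
    int banks[2][BANK_SIZE];
    int cur = 0, hits = 0;
    size_t pos = 0;

    size_t n = load_chunk(database, db_len, pos, banks[cur]);
    pos += n;
    while (n > 0) {
        /* In hardware these two steps overlap: the engine works on
         * banks[cur] while the memory loads banks[cur ^ 1]. */
        size_t next_n = load_chunk(database, db_len, pos, banks[cur ^ 1]);
        hits += execute_command(banks[cur], n, 5);
        pos += next_n;
        n = next_n;
        cur ^= 1; /* swap the roles of the two banks */
    }
    printf("values > 5: %d\n", hits);
    return 0;
}
```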

Method and apparatus for using a storage system as main memory
11556469 · 2023-01-17

A data access system includes a processor, multiple cache modules serving as main memory, and a storage drive. Each cache module includes a final level cache (FLC) controller and a main memory cache. The processor sends read/write requests (with a physical address) to the cache module. The cache module includes two or more stages, each stage including an FLC controller and DRAM (with an associated controller). If the first stage FLC module does not contain the physical address, the request is forwarded to a second stage FLC module. If the second stage FLC module does not contain the physical address, the request is forwarded to the storage drive, specifically to a partition reserved for main memory. The first stage FLC module provides high-speed, lower-power operation, while the second stage FLC module is a low-cost implementation. Multiple FLC modules may connect to the processor in parallel.
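The lookup path is a simple fall-through chain. A minimal sketch of it, assuming tiny direct-mapped stage tables and dummy storage contents (the abstract specifies neither):

```c
/* Two-stage FLC lookup falling through to the storage drive's
 * main-memory partition.  Table geometry is an assumption. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define FLC_LINES 8

typedef struct {
    bool     valid[FLC_LINES];
    uint64_t tag[FLC_LINES];
    uint32_t data[FLC_LINES];
} flc_stage;

/* Try one FLC stage; returns true on hit. */
static bool flc_lookup(const flc_stage *s, uint64_t paddr, uint32_t *out)
{
    size_t idx = paddr % FLC_LINES;
    if (s->valid[idx] && s->tag[idx] == paddr) {
        *out = s->data[idx];
        return true;
    }
    return false;
}

/* Stand-in for reading the reserved main-memory partition. */
static uint32_t storage_partition_read(uint64_t paddr)
{
    return (uint32_t)(paddr * 2); /* dummy contents */
}

static uint32_t read_request(flc_stage *s1, flc_stage *s2, uint64_t paddr)
{
    uint32_t v;
    if (flc_lookup(s1, paddr, &v)) return v;  /* fast, low power     */
    if (flc_lookup(s2, paddr, &v)) return v;  /* larger, low cost    */
    v = storage_partition_read(paddr);        /* final backstop      */
    /* A real FLC controller would also allocate the line on the way up. */
    return v;
}

int main(void)
{
    flc_stage s1 = {0}, s2 = {0};
    s1.valid[3] = true; s1.tag[3] = 3;  s1.data[3] = 111;
    s2.valid[5] = true; s2.tag[5] = 13; s2.data[5] = 222;

    printf("%u %u %u\n",
           read_request(&s1, &s2, 3),    /* stage-1 hit  -> 111 */
           read_request(&s1, &s2, 13),   /* stage-2 hit  -> 222 */
           read_request(&s1, &s2, 40));  /* storage read -> 80  */
    return 0;
}
```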

Transparent interleaving of compressed cache lines

Low latency in a non-uniform cache access ("NUCA") cache in a computing environment is provided. A first compressed cache line is interleaved with a second compressed cache line in a single cache line of the NUCA cache: data of the first compressed cache line is stored in one or more even sectors of the single cache line, spilling into zero or more odd sectors once the even sectors are full, while data of the second compressed cache line is stored in one or more odd sectors, spilling into zero or more even sectors once the odd sectors are full.
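The fill-then-spill placement is easy to see in miniature. A sketch assuming an eight-sector line and per-sector ownership tracking (both assumptions, not from the abstract):

```c
/* Even/odd sector interleaving of two compressed lines in one
 * physical cache line.  SECTORS is an assumed geometry. */
#include <stdio.h>

#define SECTORS 8   /* sectors per physical NUCA cache line */

/* Place a compressed line of `n` sectors, preferring the given parity
 * (0 = even-first, 1 = odd-first) and spilling to the other parity.
 * owner[] records which logical line occupies each sector. */
static void place(char owner[SECTORS], int n, int parity, char id)
{
    for (int pass = 0; pass < 2 && n > 0; pass++) {
        for (int s = parity; s < SECTORS && n > 0; s += 2) {
            if (owner[s] == '.') { owner[s] = id; n--; }
        }
        parity ^= 1; /* preferred sectors exhausted: spill over */
    }
}

int main(void)
{
    char owner[SECTORS + 1] = "........";
    place(owner, 5, 0, 'A');  /* line A: even sectors first */
    place(owner, 3, 1, 'B');  /* line B: odd sectors first  */
    /* A takes sectors 0,2,4,6 and spills into 1; B gets 3,5,7. */
    printf("sector map: %s\n", owner);  /* prints AAABABAB */
    return 0;
}
```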

LAST-LEVEL CACHE TOPOLOGY FOR VIRTUAL MACHINES
20230036017 · 2023-02-02

An example method of determining the size of a virtual last-level cache (LLC) exposed to a virtual machine (VM) supported by a hypervisor executing on a host computer includes: obtaining, by the hypervisor, a host topology of the host computer, the host topology including the number of LLCs in a central processing unit (CPU) of the host computer and a host LLC size, which is the size of each of the LLCs in the CPU; obtaining, by the hypervisor, a virtual socket size for a virtual socket presented to the VM by the hypervisor and a virtual non-uniform memory access (NUMA) node size presented to the VM by the hypervisor; determining, by the hypervisor, a virtual LLC size for the VM based on the host topology, the virtual socket size, the virtual NUMA node size, and a plurality of constraints; and presenting, to the VM, the virtual LLC size in processor feature discovery information.
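The abstract leaves the plurality of constraints unspecified. Purely for illustration, the sketch below assumes one plausible set: the virtual LLC, counted in vCPUs, must not exceed the host's cores-per-LLC, must evenly divide the virtual NUMA node, and must fit within the virtual socket.

```c
/* Illustrative virtual-LLC sizing under assumed constraints;
 * the patent's actual constraint set is not given in the abstract. */
#include <stdio.h>

static int virtual_llc_size(int host_cores_per_llc,
                            int vsocket_size, int vnuma_size)
{
    /* Start from the largest candidate and shrink until every
     * constraint holds; 1 vCPU per LLC always satisfies them. */
    for (int cand = host_cores_per_llc; cand > 1; cand--) {
        if (vnuma_size % cand == 0 && cand <= vsocket_size)
            return cand;
    }
    return 1;
}

int main(void)
{
    /* Host: 8 cores share each LLC; VM: 12-vCPU socket, 6-vCPU vNUMA. */
    int vllc = virtual_llc_size(8, 12, 6);
    printf("virtual LLC spans %d vCPUs\n", vllc); /* prints 6 */
    return 0;
}
```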

LOOK-UP TABLE CONTAINING PROCESSOR-IN-MEMORY CLUSTER FOR DATA-INTENSIVE APPLICATIONS

A processing element includes a processor-in-memory (PIM) cluster configured to read data from and write data to an adjacent DRAM subarray. The PIM cluster has a plurality of processing cores, each containing a look-up table, and a router connected to each processing core and configured to communicate data among the cores. A controller unit is configured to communicate with the router and contains an executable program of operational decomposition algorithms. The look-up tables can be programmable. Also disclosed is a DRAM chip including a plurality of DRAM banks, each bank having a plurality of interleaved DRAM subarrays and a plurality of the PIM clusters, each configured to read data from and write data to an adjacent DRAM subarray.
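To make the pairing of programmable look-up tables with operational decomposition concrete, here is a minimal sketch under assumed parameters (a 4-bit x 4-bit multiply table and a nibble-wise decomposition; the abstract names neither): a wide operation is broken into LUT-width lookups.

```c
/* One core's programmable LUT plus a decomposition of an 8x8-bit
 * multiply into four 4-bit lookups.  Table width is an assumption. */
#include <stdio.h>
#include <stdint.h>

static uint8_t lut[16][16]; /* programmable look-up table */

static void program_multiply_lut(void)
{
    for (int a = 0; a < 16; a++)
        for (int b = 0; b < 16; b++)
            lut[a][b] = (uint8_t)(a * b);
}

/* Decompose an 8x8-bit multiply into nibble lookups:
 * x*y = (xh*yh)<<8 + (xh*yl + xl*yh)<<4 + xl*yl. */
static uint32_t lut_mul8(uint8_t x, uint8_t y)
{
    uint8_t xl = x & 0xF, xh = x >> 4;
    uint8_t yl = y & 0xF, yh = y >> 4;
    return ((uint32_t)lut[xh][yh] << 8)
         + (((uint32_t)lut[xh][yl] + lut[xl][yh]) << 4)
         + lut[xl][yl];
}

int main(void)
{
    program_multiply_lut();
    printf("%u\n", lut_mul8(200, 123)); /* prints 24600 */
    return 0;
}
```

Reprogramming the table (e.g., with addition or a nonlinear function) changes what the core computes without changing the datapath, which is the appeal of LUT-based PIM cores.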

MEMORY SYSTEM EXECUTING BACKGROUND OPERATION USING EXTERNAL DEVICE AND OPERATION METHOD THEREOF
20230152996 · 2023-05-18

Embodiments of the present disclosure relate to a memory system and an operation method thereof. According to embodiments of the present disclosure, the memory system may include i) a memory device including a plurality of memory blocks, wherein each of the plurality of memory blocks includes a plurality of pages; and ii) a memory controller configured to determine a first super memory block among a plurality of super memory blocks, wherein each of the plurality of super memory blocks includes one or more of the plurality of memory blocks, set a lock to prevent a background operation from being executed on the first super memory block, and transmit data stored in the first super memory block to an external device.
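The lock-then-transfer flow can be sketched in a few lines; the structures and names below are assumptions, not the patent's design:

```c
/* Lock a super block so background work skips it, stream its data
 * to the external device, then release the lock. */
#include <stdio.h>
#include <stdbool.h>

#define SUPER_BLOCKS 4

typedef struct {
    bool locked;
    int  valid_pages;
} super_block;

/* Background operation (e.g., garbage collection) honors the lock. */
static void background_gc(super_block sb[SUPER_BLOCKS])
{
    for (int i = 0; i < SUPER_BLOCKS; i++) {
        if (sb[i].locked) {
            printf("GC: skipping locked super block %d\n", i);
            continue;
        }
        /* ... relocate valid pages, erase blocks, etc. ... */
    }
}

static void offload_to_external_device(super_block sb[SUPER_BLOCKS], int i)
{
    sb[i].locked = true;            /* block background operations   */
    printf("sending %d pages of super block %d to external device\n",
           sb[i].valid_pages, i);   /* stand-in for the transfer     */
    background_gc(sb);              /* GC running meanwhile skips it */
    sb[i].locked = false;           /* transfer done: release lock   */
}

int main(void)
{
    super_block sb[SUPER_BLOCKS] = {{false, 10}, {false, 3},
                                    {false, 7},  {false, 0}};
    offload_to_external_device(sb, 1);
    return 0;
}
```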

Virtual network pre-arbitration for deadlock avoidance and enhanced performance

A device includes a data path, a first interface configured to receive a first memory access request from a first peripheral device, and a second interface configured to receive a second memory access request from a second peripheral device. The device further includes an arbiter circuit configured to, in a first clock cycle, select a pre-arbitration winner between the first memory access request and the second memory access request based on a first number of credits allocated to a first destination device and a second number of credits allocated to a second destination device. The arbiter circuit is further configured to, in a second clock cycle, select a final arbitration winner from among the pre-arbitration winner and a subsequent memory access request based on a comparison of a priority of the pre-arbitration winner and a priority of the subsequent memory access request.
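A minimal sketch of the two-cycle decision, with assumed fields and tie-breaking policy (the abstract fixes neither): cycle one picks by destination credits, cycle two compares priorities against a subsequent request.

```c
/* Two-cycle arbitration: credit-based pre-arbitration, then a
 * priority comparison with a later-arriving request. */
#include <stdio.h>

typedef struct {
    const char *name;
    int dest_credits;  /* credits available at the destination */
    int priority;      /* larger value = higher priority       */
} mem_request;

/* Cycle 1: credit-based pre-arbitration between two requests. */
static mem_request pre_arbitrate(mem_request a, mem_request b)
{
    return (a.dest_credits >= b.dest_credits) ? a : b;
}

/* Cycle 2: final arbitration against a subsequent request. */
static mem_request final_arbitrate(mem_request winner, mem_request later)
{
    return (later.priority > winner.priority) ? later : winner;
}

int main(void)
{
    mem_request r1 = {"periph-1 read",  4, 1};
    mem_request r2 = {"periph-2 write", 2, 1};
    mem_request r3 = {"urgent read",    3, 9};

    mem_request pre = pre_arbitrate(r1, r2);      /* r1: more credits */
    mem_request fin = final_arbitrate(pre, r3);   /* r3: higher prio  */
    printf("granted: %s\n", fin.name);
    return 0;
}
```

Checking credits a cycle ahead of the priority decision is what allows a request to win arbitration only when its destination can actually absorb it, which is how the scheme avoids deadlock across virtual networks.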

Create page locality in cache controller cache allocation

Integrated circuits are provided which create page locality in cache controllers that allocate entries to a set-associative cache, which includes data storage for a plurality of Sets of Ways. A plurality of cache controllers may be interleaved with a processor and device(s) and may allocate to any page in the cache. A cache controller may select a Way from a Set to which to allocate new entries in the set-associative cache and may bias selection of the Way according to a plurality of upper address bits (or another function). These bits may be identical at the cache controller during sequential memory transactions. A processor may determine the bias centrally and inform the cache controllers of the selected Set and Way. Other functions, algorithms, or approaches may be chosen to influence the bias of Way selection, such as analysis of metadata belonging to the cache controllers that is used for making Way allocation selections.
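A sketch of the upper-address-bit bias, with an assumed hash and cache geometry (neither comes from the abstract): because sequential transactions share their upper bits, every interleaved controller derives the same preferred Way, and page locality follows.

```c
/* Way-selection bias from upper address bits.  SET_BITS, the page
 * shift, and the folding hash are illustrative assumptions. */
#include <stdio.h>
#include <stdint.h>

#define WAYS      8
#define SET_BITS  6   /* 64 sets */

static unsigned set_index(uint64_t paddr)
{
    return (paddr >> 12) & ((1u << SET_BITS) - 1); /* per-page sets */
}

/* Bias: fold the upper address bits into a preferred Way.  All
 * controllers compute the same value for addresses in one region. */
static unsigned biased_way(uint64_t paddr)
{
    uint64_t upper = paddr >> (12 + SET_BITS);
    return (unsigned)((upper ^ (upper >> 7)) % WAYS);
}

int main(void)
{
    uint64_t base = 0x4A000000;
    for (int i = 0; i < 4; i++) {
        uint64_t a = base + (uint64_t)i * 4096; /* sequential pages */
        printf("addr %#llx -> set %u, preferred way %u\n",
               (unsigned long long)a, set_index(a), biased_way(a));
    }
    return 0;
}
```

All four sequential pages land in different Sets but share one preferred Way, which is the page locality the controllers are trying to create.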

Techniques for storing data and tags in different memory arrays
11681632 · 2023-06-20

A memory controller includes logic circuitry to generate a first data address identifying a location in a first external memory array for storing first data, a first tag address identifying a location in a second external memory array for storing a first tag, a second data address identifying a location in the second external memory array for storing second data, and a second tag address identifying a location in the first external memory array for storing a second tag. The memory controller includes an interface that transfers the first data address and the first tag address for a first set of memory operations in the first and the second external memory arrays. The interface transfers the second data address and the second tag address for a second set of memory operations in the first and the second external memory arrays.
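The criss-cross placement can be sketched as a small address-generation function; the layout constants (line and tag sizes, tag-region offset) are assumptions, not from the abstract. For one set of operations the data lives in array 0 and its tag in array 1; for the other set the roles swap, so every access touches both arrays in parallel.

```c
/* Data/tag addresses swapped across two external memory arrays. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    int      data_array;  /* external array holding the data  */
    uint64_t data_addr;
    int      tag_array;   /* the other array holds the tag    */
    uint64_t tag_addr;
} placement;

#define TAG_REGION 0x100000 /* assumed offset of tag storage */

static placement map_line(uint64_t line, int set)
{
    placement p;
    p.data_array = set;          /* set 0 -> array 0, set 1 -> array 1 */
    p.tag_array  = set ^ 1;      /* tag always goes to the other array */
    p.data_addr  = line * 64;               /* 64-byte lines (assumed) */
    p.tag_addr   = TAG_REGION + line * 8;   /* 8-byte tags   (assumed) */
    return p;
}

int main(void)
{
    placement a = map_line(7, 0);  /* first set of memory operations  */
    placement b = map_line(9, 1);  /* second set: arrays swap roles   */
    printf("line 7: data in array %d @%#llx, tag in array %d @%#llx\n",
           a.data_array, (unsigned long long)a.data_addr,
           a.tag_array,  (unsigned long long)a.tag_addr);
    printf("line 9: data in array %d @%#llx, tag in array %d @%#llx\n",
           b.data_array, (unsigned long long)b.data_addr,
           b.tag_array,  (unsigned long long)b.tag_addr);
    return 0;
}
```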

Dynamic memory address encoding
11681622 · 2023-06-20

Described herein is a memory architecture that is configured to dynamically determine an address encoding to use to encode multi-dimensional data such as multi-coordinate data in a manner that provides a coordinate bias corresponding to a current memory access pattern. The address encoding may be dynamically generated in response to receiving a memory access request or may be selected from a set of preconfigured address encodings. The dynamically generated or selected address encoding may apply an interleaving technique to bit representations of coordinate values to obtain an encoded memory address. The interleaving technique may interleave a greater number of bits from the bit representation corresponding to the coordinate direction in which a coordinate bias is desired than from bit representations corresponding to other coordinate directions.
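Biased interleaving is a variation on Morton (Z-order) encoding. A minimal sketch assuming 8-bit coordinates and a 2:1 bias ratio (both assumptions): when the access pattern favors the x direction, two x bits are interleaved for every y bit, so x-adjacent elements stay close in the encoded address space.

```c
/* Bit interleaving with a coordinate bias: `xbits` x bits per y bit. */
#include <stdio.h>
#include <stdint.h>

/* Interleave 8-bit x and y, taking `xbits` bits of x per y bit. */
static uint32_t encode_biased(uint8_t x, uint8_t y, int xbits)
{
    uint32_t addr = 0;
    int out = 0, xi = 0, yi = 0;
    while (xi < 8 || yi < 8) {
        for (int k = 0; k < xbits && xi < 8; k++)     /* x bits    */
            addr |= (uint32_t)((x >> xi++) & 1) << out++;
        if (yi < 8)                                   /* one y bit */
            addr |= (uint32_t)((y >> yi++) & 1) << out++;
    }
    return addr;
}

int main(void)
{
    /* With the x bias, stepping in x perturbs only low address bits,
     * so x-direction accesses stay dense; stepping in y jumps more. */
    printf("x-biased (3,1): %#x\n", encode_biased(3, 1, 2));
    printf("x-biased (4,1): %#x\n", encode_biased(4, 1, 2));
    printf("balanced (3,1): %#x\n", encode_biased(3, 1, 1));
    return 0;
}
```

Selecting among a few preconfigured ratios (1:1, 2:1, 4:1, and their y-biased mirrors) at request time would realize the dynamic selection the abstract describes.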