G06F2212/283

STORAGE SYSTEM, METHOD, AND APPARATUS FOR FAST IO ON PCIE DEVICES
20170344510 · 2017-11-30

Embodiments of systems and methods for fast input/output (IO) on PCIe devices are described. Such methods include: receiving an IO request from a user or application, the IO request comprising instructions for communicating data with a host system, the host system comprising a processing device and a memory device; analyzing information from the IO request in an IO block analyzer to select one of a plurality of communication paths for communicating the data with the host system; defining a routing instruction in a transfer routing information transmitter in response to the selected communication path; communicating the routing instruction in a Transaction Layer Packet (TLP) to an integrated IO (IIO) module of the host system; and routing the data from the peripheral device to either the processing device or the memory device, according to the routing instruction, with a data transfer router.
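As a rough illustration of the flow this abstract describes, the sketch below (Python) models an IO block analyzer choosing between two paths, a routing hint carried in a TLP-like structure, and a router steering the payload to the processing device or the memory device. The size threshold, field names, and two-path model are illustrative assumptions, not details from the patent.

    # Illustrative sketch of the path-selection and routing flow.
    SMALL_IO_THRESHOLD = 4096  # bytes; assumed cutoff for cache-friendly transfers

    def analyze_io_request(request):
        """IO block analyzer: pick a communication path from request attributes."""
        if request["length"] <= SMALL_IO_THRESHOLD and request["latency_sensitive"]:
            return "processing_device"   # push latency-sensitive data toward the CPU
        return "memory_device"           # bulk data goes straight to memory

    def build_tlp(request, path):
        """Transfer routing information transmitter: embed the hint in a TLP-like dict."""
        return {"payload": request["data"], "routing_hint": path}

    def data_transfer_router(tlp, cpu_sink, mem_sink):
        """IIO-side router: steer the payload according to the TLP's routing hint."""
        sink = cpu_sink if tlp["routing_hint"] == "processing_device" else mem_sink
        sink.append(tlp["payload"])

    cpu_sink, mem_sink = [], []
    req = {"data": b"hdr", "length": 64, "latency_sensitive": True}
    data_transfer_router(build_tlp(req, analyze_io_request(req)), cpu_sink, mem_sink)
    print(cpu_sink, mem_sink)  # payload lands at the processing device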

INTERLEAVED CACHE CONTROLLERS WITH SHARED METADATA AND RELATED DEVICES AND SYSTEMS
20170329711 · 2017-11-16

Interleaved cache controllers with shared metadata are disclosed and described. A memory system may comprise a plurality of cache controllers and a metadata store interconnected by a metadata store fabric. The metadata store receives information from at least one of the plurality of cache controllers, a portion of which is stored as shared distributed metadata. The metadata store provides the plurality of cache controllers with shared access to the hosted shared distributed metadata.
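A minimal sketch, in Python, of the arrangement described: several address-interleaved cache controllers publishing metadata into one shared store that any of them can read. The interleaving granularity, metadata fields, and class names are assumptions for illustration.

    class MetadataStore:
        """Holds shared distributed metadata on behalf of all controllers."""
        def __init__(self):
            self._meta = {}
        def publish(self, line_addr, info):
            self._meta[line_addr] = info      # portion stored as shared metadata
        def lookup(self, line_addr):
            return self._meta.get(line_addr)  # shared access for any controller

    class CacheController:
        def __init__(self, cid, store):
            self.cid, self.store = cid, store
        def fill(self, line_addr):
            self.store.publish(line_addr, {"owner": self.cid, "state": "valid"})

    NUM_CONTROLLERS, LINE = 4, 64
    store = MetadataStore()
    controllers = [CacheController(i, store) for i in range(NUM_CONTROLLERS)]

    def controller_for(addr):
        """Interleave: consecutive cache lines map to consecutive controllers."""
        return controllers[(addr // LINE) % NUM_CONTROLLERS]

    controller_for(0x1000).fill(0x1000)
    print(store.lookup(0x1000))  # metadata visible to every controller via the store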

DATA STORAGE SYSTEM, PROCESS AND COMPUTER PROGRAM FOR SUCH DATA STORAGE SYSTEM FOR REDUCING READ AND WRITE AMPLIFICATIONS

The present disclosure relates to a data storage system, and to processes and computer programs for such a data storage system, for example including processing of: managing one or more metadata tree structures for storing data to one or more storage devices of the data storage system in units of blocks, each metadata tree structure including a root node pointing directly and/or indirectly to blocks, a leaf tree level having one or more direct nodes pointing to blocks, and optionally one or more intermediate tree levels having one or more indirect nodes pointing to indirect nodes and/or direct nodes of the respective metadata tree structure; maintaining the root node and/or the nodes of at least one tree level of each of at least one metadata tree structure in a cache memory; and managing I/O access to data based on the one or more metadata tree structures.
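The tree walk implied by the abstract can be sketched as follows; keeping the root (and optionally other upper levels) pinned in a cache avoids device reads for those levels on every lookup, which is where the reduction in read amplification comes from. Fan-out, node layout, and names here are assumed for illustration.

    FANOUT = 4          # assumed pointers per node
    node_cache = {}     # upper tree levels maintained in cache memory

    def read_node(node_id, backing):
        """Return a node, from cache when pinned, else from the storage device."""
        if node_id in node_cache:
            return node_cache[node_id]
        return backing[node_id]          # simulated device read

    def lookup_block(root_id, block_index, levels, backing):
        """Walk root -> indirect* -> direct node to find a block pointer."""
        node_id = root_id
        for level in range(levels, 0, -1):
            node = read_node(node_id, backing)
            slot = (block_index // FANOUT ** (level - 1)) % FANOUT
            node_id = node[slot]
        return node_id                   # block pointer held by the direct node

    # Tiny two-level tree: root -> direct nodes -> block pointers.
    backing = {"root": ["d0", "d1", "d2", "d3"],
               "d0": [100, 101, 102, 103], "d1": [104, 105, 106, 107],
               "d2": [108, 109, 110, 111], "d3": [112, 113, 114, 115]}
    node_cache["root"] = backing["root"]         # pin the root node in cache
    print(lookup_block("root", 5, 2, backing))   # -> 105, one device read only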

Hardware accelerators and access methods thereof

A processing system includes a cache, a host memory, a CPU and a hardware accelerator. The CPU accesses the cache and the host memory and generates at least one instruction. The hardware accelerator operates in a non-temporal access mode or a temporal access mode according to the access behavior of the instruction. The hardware accelerator accesses the host memory through an accelerator interface when the hardware accelerator operates in the non-temporal access mode, and accesses the cache through the accelerator interface when the hardware accelerator operates in the temporal access mode.
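A small sketch of the mode switch described above, assuming the access behavior arrives as a per-instruction temporal/non-temporal hint; class and field names are illustrative, not from the patent.

    class AcceleratorInterface:
        def __init__(self, cache, host_memory):
            self.cache, self.host_memory = cache, host_memory

        def access(self, instr):
            """Route the access by the instruction's temporal-locality hint."""
            if instr["temporal"]:
                # temporal access mode: data likely reused soon, go via the cache
                self.cache[instr["addr"]] = instr["value"]
            else:
                # non-temporal access mode: go to host memory, avoid cache pollution
                self.host_memory[instr["addr"]] = instr["value"]

    cache, host_memory = {}, {}
    iface = AcceleratorInterface(cache, host_memory)
    iface.access({"addr": 0x10, "value": 1, "temporal": True})
    iface.access({"addr": 0x20, "value": 2, "temporal": False})
    print(cache, host_memory)   # {16: 1} {32: 2}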

Replacement policies for a hybrid hierarchical cache

In some embodiments, a hybrid hierarchical cache is implemented at the same level in the access pipeline, to combine the faster access behavior of a smaller cache with the higher hit rate, at lower power, of a larger cache. A split cache at the same level in the access pipeline includes two caches that work together. In the hybrid, split, low-level cache (e.g., L1), evictions are coordinated locally between the two L1 portions, and on a miss to both L1 portions, a line is allocated from a larger L2 cache into the smaller L1 portion.
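As a toy model of this split arrangement: a small fast portion and a larger portion sit at the same level; a miss in both allocates into the small portion, whose victim migrates to the large portion instead of being dropped. Capacities and the FIFO eviction order below are assumptions.

    from collections import OrderedDict

    class SplitL1:
        def __init__(self, small_cap=2, large_cap=8):
            self.small, self.large = OrderedDict(), OrderedDict()
            self.small_cap, self.large_cap = small_cap, large_cap

        def lookup(self, addr, l2):
            if addr in self.small or addr in self.large:
                return "hit"
            # miss in both portions: allocate from L2 into the smaller portion
            self.small[addr] = l2[addr]
            if len(self.small) > self.small_cap:
                victim, line = self.small.popitem(last=False)
                self.large[victim] = line           # coordinated local eviction
                if len(self.large) > self.large_cap:
                    self.large.popitem(last=False)  # finally falls out of L1
            return "miss"

    l2 = {a: f"line{a}" for a in range(16)}
    l1 = SplitL1()
    print([l1.lookup(a, l2) for a in (0, 1, 2, 0)])  # ['miss','miss','miss','hit']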

Achieving guaranteed application performance using transactional I/O scheduling for SSD storage by interleaving and splitting read/write I/Os with required latency configurations
11263089 · 2022-03-01

Embodiments are described for prioritizing input/output (I/O) operations dispatched to a solid-state device (SSD) cache in a network, by defining a maximum write I/O operation size for writing data to the SSD cache, splitting large write I/O operations into smaller write I/O operations, each with a size less than the maximum write I/O operation size, interleaving cache read I/O operations in between the smaller write I/O operations, and performing the cache read I/O operations and the smaller write I/O operations in an order created by the interleaving. The network may comprise a deduplication backup system storing data to storage media including the SSD cache.
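The split-and-interleave scheduling reads naturally as a queue transformation, sketched below in Python. The maximum write size value and the one-read-per-chunk round robin are illustrative assumptions; the abstract only requires that each chunk stay under the configured maximum.

    MAX_WRITE_IO = 64 * 1024   # assumed maximum write I/O size, in bytes

    def split_write(write):
        """Split one large write into chunks of at most MAX_WRITE_IO bytes."""
        off, size = write
        return [(off + i, min(MAX_WRITE_IO, size - i))
                for i in range(0, size, MAX_WRITE_IO)]

    def interleave(reads, writes):
        """Alternate one read between consecutive small write chunks."""
        chunks = [c for w in writes for c in split_write(w)]
        order, r = [], iter(reads)
        for chunk in chunks:
            order.append(("write", chunk))
            nxt = next(r, None)
            if nxt is not None:
                order.append(("read", nxt))
        order.extend(("read", rd) for rd in r)   # drain any remaining reads
        return order

    # One 256 KiB write is split into four 64 KiB chunks with reads slotted between.
    for op in interleave(["r0", "r1"], [(0, 256 * 1024)]):
        print(op)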

Apparatus and method for considering spatial locality in loading data elements for execution

In one embodiment of the invention, a processor comprises an upper level cache and at least one processor core. The at least one processor core includes one or more registers and a plurality of instruction processing stages: a decode unit to decode an instruction requiring an input of a plurality of data elements, wherein a size of each of the plurality of data elements is less than a cache line size of the processor; and an execution unit to load the plurality of data elements into the one or more registers of the processor, without loading into the upper level cache either the plurality of data elements or the data elements spatially adjacent to them.
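In software terms, the behavior resembles a gather that fills registers while leaving the cache untouched. The sketch below is only an analogy of that effect (the patent describes a hardware execution unit), with line size, element size, and structures assumed.

    CACHE_LINE = 64
    ELEM_SIZE = 8

    memory = {addr: addr // ELEM_SIZE for addr in range(0, 1024, ELEM_SIZE)}
    upper_cache = set()    # set of cached line base addresses
    registers = []

    def gather_no_fill(addrs):
        """Load each element directly to a register; do not fill the cache."""
        for a in addrs:
            registers.append(memory[a])
            # a regular load would insert a // CACHE_LINE * CACHE_LINE into
            # upper_cache, dragging in the neighbors sharing that line

    gather_no_fill([0, 128, 512])
    print(registers, upper_cache)   # [0, 16, 64] set() -- cache left untouched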

DRAINING A WRITE QUEUE BASED ON INFORMATION FROM A READ QUEUE
20170315914 · 2017-11-02 ·

A method to access a memory chip having memory banks includes processing read requests in a read queue, and when a write queue is filled beyond a high watermark, stopping the processing of the read requests in the read queue and draining the write queue until the write queue is under a low watermark. Draining the write queue includes issuing write requests in an order based on information in the read queue. When the write queue is under the low watermark, the method includes stopping the draining of the write queue and again processing the read requests in the read queue.
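The watermark policy translates almost directly into code. In the sketch below, the high/low watermark depths and the bank-avoidance heuristic (preferring writes to banks with no pending reads, as one plausible way of "using information in the read queue") are illustrative assumptions; the bank set is precomputed from the read queue for brevity.

    from collections import deque

    HIGH_WM, LOW_WM = 8, 2   # assumed high/low watermarks (queue depths)

    def service(read_q, write_q, banks_in_read_q):
        issued = []
        while read_q or write_q:
            if len(write_q) > HIGH_WM:
                # stop reads; drain writes until under the low watermark,
                # preferring banks not targeted by pending reads
                while len(write_q) > LOW_WM:
                    pick = next((w for w in write_q
                                 if w[0] not in banks_in_read_q), write_q[0])
                    write_q.remove(pick)
                    issued.append(("write", pick))
            elif read_q:
                issued.append(("read", read_q.popleft()))
            else:
                issued.append(("write", write_q.popleft()))
        return issued

    reads = deque([("b0", i) for i in range(3)])
    writes = deque([("b%d" % (i % 4), i) for i in range(10)])
    for op in service(reads, writes, {"b0"}):
        print(op)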

Grouping flash storage blocks based on robustness for cache program operations and regular program operations

Systems, apparatus and methods are provided for performing program operations in a non-volatile storage system. In one embodiment, there is provided a method that may comprise categorizing active storage blocks of a non-volatile storage device into a robust group and a less-robust group based on a number of factors including page error count, program time and number of Program/Erase (P/E) cycles; determining that a cache program operation needs to be performed; selecting a first storage block from the robust group to perform the cache program operation; determining that a regular program operation needs to be performed; and selecting a second storage block from the less-robust group to perform the regular program operation.
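A minimal sketch of the grouping policy, assuming a weighted score over the three factors named in the abstract and a fixed threshold; the weights, threshold, and field names are illustrative.

    def robustness_score(block):
        """Lower is better: few page errors, fast programming, few P/E cycles."""
        return (block["page_errors"] * 10
                + block["program_time_us"] * 0.1
                + block["pe_cycles"] * 0.01)

    def group_blocks(blocks, threshold=50.0):
        robust = [b for b in blocks if robustness_score(b) <= threshold]
        less_robust = [b for b in blocks if robustness_score(b) > threshold]
        return robust, less_robust

    blocks = [
        {"id": 0, "page_errors": 1, "program_time_us": 200, "pe_cycles": 500},
        {"id": 1, "page_errors": 9, "program_time_us": 400, "pe_cycles": 2900},
    ]
    robust, less_robust = group_blocks(blocks)
    cache_target = robust[0]         # cache program operation -> robust group
    regular_target = less_robust[0]  # regular program operation -> less-robust group
    print(cache_target["id"], regular_target["id"])   # 0 1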

ARITHMETIC PROCESSING DEVICE AND CONTROL METHOD THEREOF
20170308483 · 2017-10-26

An arithmetic processing device includes a processing core and a first control circuit that controls memory requests issued by the processing core. The first control circuit includes a miss access control unit with input entries, which assigns an input entry to a memory request to control processing of the memory request, and a control pipeline circuit that performs a cache hit determination and issues a memory request to the miss access control unit in the case of a cache miss. The control pipeline circuit includes a speculative request control unit that issues a speculative memory request to the miss access control unit before the cache hit determination is performed, cancels the issued speculative memory request in the case of a cache hit, and increasingly suppresses issuing of speculative memory requests as the number of input entries assigned to canceled speculative memory requests increases.
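As a toy model of the throttling described: speculation is skipped once recent cancellations have consumed too many entries, and productive (miss) requests let the budget recover. The exact suppression rule below is an assumption; the abstract only says issuing is suppressed more as the canceled-entry count increases.

    import random

    class SpeculativeIssuer:
        def __init__(self, cancel_budget=4):
            self.cancelled_entries = 0          # entries wasted on cancellations
            self.cancel_budget = cancel_budget

        def handle(self, addr, is_hit):
            speculate = self.cancelled_entries < self.cancel_budget
            if speculate:
                entry = f"entry-for-{addr}"      # assign a miss-access entry early
            if is_hit:
                if speculate:
                    self.cancelled_entries += 1  # speculation wasted; suppress more
                return "served from cache"
            if not speculate:
                entry = f"entry-for-{addr}"      # late, non-speculative issue
            self.cancelled_entries = max(0, self.cancelled_entries - 1)
            return f"memory request via {entry}"

    issuer = SpeculativeIssuer()
    random.seed(0)
    for i in range(8):
        print(issuer.handle(i, is_hit=random.random() < 0.7))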