Patent classifications: G06F9/3816
Variable-length instruction buffer management
A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.
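A minimal sketch of the buffer-threshold idea, assuming a byte-oriented fetch buffer whose refill threshold tracks the longest pending variable-length (delay line) instruction; the names here (InstrBuffer, noteInstructionLength) are illustrative assumptions, not taken from the patent:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical fetch buffer: refill is triggered whenever fewer bytes remain
// than the longest instruction currently expected, so decode never stalls
// waiting on a partially buffered instruction.
class InstrBuffer {
public:
    explicit InstrBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Raise the refill threshold to cover a newly seen instruction length.
    void noteInstructionLength(std::size_t bytes) {
        threshold_ = std::max(threshold_, bytes);
    }

    bool needsRefill() const { return buf_.size() < threshold_; }

    void push(uint8_t b) { if (buf_.size() < capacity_) buf_.push_back(b); }

    // Consume one decoded instruction of `len` bytes.
    void consume(std::size_t len) {
        buf_.erase(buf_.begin(), buf_.begin() + std::min(len, buf_.size()));
    }

private:
    std::size_t capacity_;
    std::size_t threshold_ = 1;   // at least one byte must be buffered
    std::deque<uint8_t> buf_;
};
```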
PMEM cache RDMA security
Techniques are described for providing one or more clients with direct access to cached data blocks within a persistent memory cache on a storage server. In an embodiment, a storage server maintains a persistent memory cache comprising a plurality of cache lines, each of which represents an allocation unit of block-based storage. The storage server maintains an RDMA table that includes a plurality of table entries, each of which maps a respective client to one or more cache lines and a remote access key. An RDMA access request to access a particular cache line is received from a storage server client. The storage server identifies access credentials for the client and determines whether the client has permission to perform the RDMA access on the particular cache line. Upon determining that the client has permission, the cache line is read from the persistent memory cache and sent to the storage server client.
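A minimal sketch of the permission check, assuming the RDMA table maps a client ID to a remote access key and the set of cache lines that client may touch; the identifiers (RdmaEntry, rdmaRead) are assumptions for illustration:

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct RdmaEntry {
    uint64_t remoteKey;                        // key the client must present
    std::unordered_set<uint64_t> cacheLines;   // lines this client may access
};

class PmemCacheServer {
public:
    // Read path only: validate credentials, then return the cached line.
    std::optional<std::vector<uint8_t>>
    rdmaRead(uint64_t clientId, uint64_t key, uint64_t lineId) const {
        auto e = rdmaTable_.find(clientId);
        if (e == rdmaTable_.end() || e->second.remoteKey != key)
            return std::nullopt;               // unknown client or bad key
        if (e->second.cacheLines.count(lineId) == 0)
            return std::nullopt;               // no permission on this line
        auto d = lines_.find(lineId);
        if (d == lines_.end())
            return std::nullopt;               // line not resident
        return d->second;                      // send line back to the client
    }

private:
    std::unordered_map<uint64_t, RdmaEntry> rdmaTable_;
    std::unordered_map<uint64_t, std::vector<uint8_t>> lines_;
};
```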
Method and apparatus for using a storage system as main memory
A data access system including a processor, multiple cache modules that function as main memory, and a storage drive. Each cache module includes two or more stages, each stage including an FLC (final level cache) controller and DRAM with an associated controller. The processor sends read/write requests (with a physical address) to the cache module. If the first-stage FLC module does not contain the physical address, the request is forwarded to a second-stage FLC module. If the second-stage FLC module does not contain the physical address either, the request is forwarded to a partition of the storage drive reserved for main memory. The first-stage FLC module offers high-speed, low-power operation, while the second-stage FLC module is a low-cost implementation. Multiple FLC modules may connect to the processor in parallel.
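The lookup cascade can be sketched as follows, with each FLC stage modeled as an address-to-data map and the storage drive as the final backstop; the structure and names are assumptions for illustration, not the patent's design:

```cpp
#include <cstdint>
#include <unordered_map>

enum class Source { FLC1, FLC2, Drive };

struct FlcStage {
    std::unordered_map<uint64_t, uint64_t> lines;  // physical address -> data
    bool lookup(uint64_t pa, uint64_t& out) const {
        auto it = lines.find(pa);
        if (it == lines.end()) return false;
        out = it->second;
        return true;
    }
};

// Forward a read through the stages, falling back to the reserved
// main-memory partition of the storage drive on a second-stage miss.
Source read(const FlcStage& flc1, const FlcStage& flc2,
            uint64_t pa, uint64_t& out) {
    if (flc1.lookup(pa, out)) return Source::FLC1;   // fast, low-power hit
    if (flc2.lookup(pa, out)) return Source::FLC2;   // slower, low-cost hit
    out = 0;  // placeholder for a drive read from the reserved partition
    return Source::Drive;
}
```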
Apparatus and method for store pairing with reduced hardware requirements
An apparatus and method for pairing store operations. For example, one embodiment of a processor comprises: a grouping eligibility checker to evaluate a plurality of store instructions based on a set of grouping rules to determine whether two or more of the plurality of store instructions are eligible for grouping; and a dispatcher to simultaneously dispatch a first group of store instructions of the plurality of store instructions determined to be eligible for grouping by the grouping eligibility checker.
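One plausible grouping rule, adjacent stores that fall within the same cache line, can be sketched like this; the rule set and names are assumptions, not the patent's actual eligibility checks:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Store {
    uint64_t addr;
    uint32_t size;   // bytes
};

// Example grouping rule: two stores are pair-eligible if they are adjacent
// and both land within one 64-byte cache line.
bool eligible(const Store& a, const Store& b) {
    constexpr uint64_t kLine = 64;
    bool adjacent = a.addr + a.size == b.addr;
    bool sameLine = (a.addr / kLine) == ((b.addr + b.size - 1) / kLine);
    return adjacent && sameLine;
}

// Walk the store queue and emit pairs that can dispatch together.
std::vector<std::pair<Store, Store>> pairStores(const std::vector<Store>& q) {
    std::vector<std::pair<Store, Store>> pairs;
    for (std::size_t i = 0; i + 1 < q.size(); ++i) {
        if (eligible(q[i], q[i + 1])) {
            pairs.emplace_back(q[i], q[i + 1]);  // dispatch simultaneously
            ++i;                                 // skip the paired partner
        }
    }
    return pairs;
}
```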
Prefetch-Adaptive Intelligent Cache Replacement Policy for High Performance
The invention discloses a prefetch-adaptive intelligent cache replacement policy for high performance. In the presence of hardware prefetching, prefetch requests and demand requests are distinguished: a prefetch predictor based on an ISVM (Integer Support Vector Machine) performs re-reference interval prediction on cache lines loaded by prefetch accesses, and a separate ISVM-based demand predictor performs re-reference interval prediction on cache lines loaded by demand accesses. The PC of the current load instruction and the PCs of past load instructions in an access history record are taken as input, and distinct ISVM predictors are designed for prefetch and demand requests, so reuse prediction for a loaded cache line is performed at request-type granularity. This improves the accuracy of cache line reuse prediction in the presence of prefetching and better combines the performance benefits of hardware prefetching and cache replacement.
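An integer SVM over hashed PC features can be sketched as a per-request-type weight table; the table size, threshold, hashing, and training rule below are illustrative assumptions, not the paper's parameters:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical ISVM: one signed saturating weight per hashed PC feature;
// the score is the sum of weights for the current PC and the PC history.
class IsvmPredictor {
public:
    // Predict whether the just-loaded line will be re-referenced soon.
    bool predictReuse(uint64_t pc, const std::deque<uint64_t>& history) const {
        int score = weights_[hash(pc)];
        for (uint64_t h : history) score += weights_[hash(h)];
        return score >= kThreshold;
    }

    // Train toward the observed outcome (reused or not) with +/-1 updates.
    void train(uint64_t pc, const std::deque<uint64_t>& history, bool reused) {
        int delta = reused ? 1 : -1;
        bump(pc, delta);
        for (uint64_t h : history) bump(h, delta);
    }

private:
    static constexpr int kThreshold = 0;
    static constexpr int kMax = 31, kMin = -32;   // saturating integer weights
    std::array<int, 2048> weights_{};

    static std::size_t hash(uint64_t pc) { return (pc >> 2) & 2047; }
    void bump(uint64_t pc, int d) {
        int& w = weights_[hash(pc)];
        if ((d > 0 && w < kMax) || (d < 0 && w > kMin)) w += d;
    }
};

// Separate instances give request-type granularity, as the abstract describes.
struct ReplacementPredictors {
    IsvmPredictor demand;
    IsvmPredictor prefetch;
};
```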
Widening memory access to an aligned address for unaligned memory operations
Unaligned atomic memory operations on a processor using a load-store instruction set architecture (ISA) that requires aligned accesses are performed by widening the memory access to an aligned address of the next larger power of two (e.g., a 4-byte access is widened to 8 bytes, and an 8-byte access is widened to 16 bytes). Data processing operations supported by the load-store ISA, including shift, rotate, and bitfield manipulation, are utilized to modify only the bytes at the original unaligned address, so that the atomic memory operations are aligned to the widened access address. The aligned atomic memory operations using the widened accesses avoid the faulting exceptions associated with unaligned access for most 4-byte and 8-byte accesses. Exception handling is performed in cases in which a memory access spans a 16-byte boundary.
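A software analogue of the widening trick, assuming little-endian byte order and a 4-byte store that does not cross an 8-byte boundary; this is a sketch of the idea in C++, not the patented instruction sequence:

```cpp
#include <atomic>
#include <cstdint>

// Perform an "unaligned" 4-byte atomic store by CASing the enclosing
// naturally aligned 8-byte word, rewriting only the four target bytes.
// Little-endian layout is assumed; a boundary-spanning access would need
// the exception-handler path the abstract mentions. (The reinterpret_cast
// is for illustration; hardware would issue an aligned wide access.)
bool store4Widened(void* p, uint32_t value) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t base = addr & ~std::uintptr_t{7};  // align down to 8 bytes
    unsigned shift = unsigned(addr - base) * 8;       // bit offset of bytes
    if (shift + 32 > 64) return false;                // spans the widened word

    auto* word = reinterpret_cast<std::atomic<uint64_t>*>(base);
    uint64_t mask = uint64_t{0xFFFFFFFF} << shift;
    uint64_t oldv = word->load(std::memory_order_relaxed);
    uint64_t newv;
    do {  // rewrite only the masked bytes; all others are preserved
        newv = (oldv & ~mask) | (uint64_t{value} << shift);
    } while (!word->compare_exchange_weak(oldv, newv,
                                          std::memory_order_seq_cst,
                                          std::memory_order_relaxed));
    return true;
}
```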
Pipelined read-modify-write operations in cache memory
In described examples, a processor system includes a processor core that generates memory write requests, a cache memory, and a memory pipeline of the cache memory. The memory pipeline has a holding buffer, an anchor stage, and an RMW pipeline. The anchor stage determines whether a data payload of a write request corresponds to a partial write. If so, the data payload is written to the holding buffer and conforming data is read from a corresponding cache memory address to merge with the data payload. The RMW pipeline has a merge stage and a syndrome generation stage. The merge stage merges the data payload in the holding buffer with the conforming data to make merged data. The syndrome generation stage generates an ECC syndrome using the merged data. The memory pipeline writes the data payload and ECC syndrome to the cache memory.
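The merge and syndrome-generation stages can be sketched with a byte-enable mask and a simple parity fold standing in for the real ECC code; the helper names and the parity scheme are assumptions:

```cpp
#include <cstdint>

// Merge stage: combine the partial write payload with conforming data read
// from the cache, keeping payload bytes where the byte-enable bit is set.
uint64_t merge(uint64_t payload, uint64_t conforming, uint8_t byteEnable) {
    uint64_t mask = 0;
    for (int i = 0; i < 8; ++i)
        if (byteEnable & (1u << i))
            mask |= uint64_t{0xFF} << (8 * i);
    return (payload & mask) | (conforming & ~mask);
}

// Syndrome stage: stand-in "ECC" that folds the merged word into one byte.
// A real design would use a SECDED code here, not plain parity.
uint8_t eccSyndrome(uint64_t merged) {
    uint8_t s = 0;
    for (int i = 0; i < 8; ++i)
        s ^= uint8_t(merged >> (8 * i));
    return s;
}
```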
REGION AWARE DELTA PREFETCHER
An apparatus includes memory circuitry including a first data structure and prefetch circuitry that is coupled to the memory circuitry. The prefetch circuitry is to store, in the first data structure, a first subregion entry corresponding to a first subregion of a memory region allocated to a program. The first subregion entry is to include a plurality of delta values. A first delta value of the plurality of delta values represents a first distance between two cache lines associated with consecutive memory accesses within a second subregion of the memory region. The prefetch circuitry is further to detect a first memory access of a first cache line in the first subregion, identify prefetch candidates based on the first cache line and the plurality of delta values, and issue at least one prefetch request based on at least two of the prefetch candidates to be prefetched into a cache.
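A sketch of candidate generation from a subregion entry's delta list, assuming 64-byte cache lines and line-granular deltas; the entry layout and names are illustrative only:

```cpp
#include <cstdint>
#include <vector>

constexpr int64_t kLineBytes = 64;

struct SubregionEntry {
    std::vector<int64_t> deltas;  // distances (in lines) between consecutive
                                  // accesses observed in another subregion
};

// On an access to `lineAddr` in the tracked subregion, propose one prefetch
// candidate per recorded delta; the prefetcher would then issue requests
// based on at least two of these candidates, per the abstract.
std::vector<uint64_t> prefetchCandidates(uint64_t lineAddr,
                                         const SubregionEntry& e) {
    std::vector<uint64_t> out;
    for (int64_t d : e.deltas)
        out.push_back(uint64_t(int64_t(lineAddr) + d * kLineBytes));
    return out;
}
```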
Way predictor and enable logic for instruction tightly-coupled memory and instruction cache
Disclosed herein are systems and methods for instruction tightly-coupled memory (iTIM) and instruction cache (iCache) access prediction. A processor may use a predictor to enable access to the iTIM or the iCache and a particular way (a memory structure) based on a location state and program counter value. The predictor may determine whether to stay in an enabled memory structure, move to and enable a different memory structure, or move to and enable both memory structures. Stay and move predictions may be based on whether a memory structure boundary crossing has occurred due to sequential instruction processing, branch or jump instruction processing, branch resolution, or cache miss processing. The program counter and a location state indicator may use feedback and be updated each instruction-fetch cycle to determine which memory structure or structures need to be enabled for the next instruction fetch.
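The stay/move/enable-both decision can be sketched as follows, assuming the iTIM occupies a fixed address range and that both structures are enabled when the next PC is uncertain; the boundary test and policy details are assumptions, not the disclosed predictor:

```cpp
#include <cstdint>

enum class Structure { ITIM, ICACHE, BOTH };

struct PredictorState {
    Structure current;              // which structure is enabled now
    uint64_t itimBase, itimLimit;   // iTIM address range
};

// Decide what to enable for the next fetch: stay in the current structure
// unless the next PC crosses its boundary; enable both when unsure (e.g.,
// after an unresolved branch).
Structure predictNext(const PredictorState& s, uint64_t nextPc,
                      bool unresolvedBranch) {
    if (unresolvedBranch) return Structure::BOTH;   // hedge: enable both
    bool inItim = nextPc >= s.itimBase && nextPc < s.itimLimit;
    Structure target = inItim ? Structure::ITIM : Structure::ICACHE;
    // target == s.current means "stay"; otherwise a boundary crossing
    // occurred and the predictor moves to (and enables) the other structure.
    return target;
}
```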
CONTROL FLOW PREDICTION
A data processing apparatus is provided that includes bimodal control flow prediction circuitry for predicting whether a conditional control flow instruction will be taken. Storage circuitry stores, in association with the control flow instruction, a stored state of the data processing apparatus, and reversal circuitry reverses the prediction when the stored state corresponds with the current state of the data processing apparatus at the time the control flow instruction is to be executed.
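A minimal sketch of bimodal prediction with state-based reversal, assuming a 2-bit saturating counter per branch and a machine-state snapshot captured at update time; what constitutes the "state" and when it is stored are assumptions for illustration:

```cpp
#include <cstdint>
#include <unordered_map>

// Bimodal predictor: 2-bit saturating counter per branch PC, plus a stored
// machine-state snapshot used to reverse the prediction when that state
// recurs at prediction time.
class ReversingPredictor {
public:
    bool predict(uint64_t pc, uint64_t currentState) const {
        auto it = table_.find(pc);
        if (it == table_.end()) return false;       // default: not taken
        bool taken = it->second.counter >= 2;       // bimodal decision
        if (it->second.hasState && it->second.state == currentState)
            taken = !taken;                         // reversal circuitry
        return taken;
    }

    void update(uint64_t pc, uint64_t observedState, bool wasTaken) {
        Entry& e = table_[pc];
        if (wasTaken) { if (e.counter < 3) ++e.counter; }
        else          { if (e.counter > 0) --e.counter; }
        e.state = observedState;    // remember the state for future reversal
        e.hasState = true;
    }

private:
    struct Entry {
        uint8_t counter = 1;        // weakly not-taken
        uint64_t state = 0;
        bool hasState = false;
    };
    std::unordered_map<uint64_t, Entry> table_;
};
```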