Patent classifications
G06F9/3816
Sequence alignment method of vector processor
A sequence alignment method that may be performed by a vector processor is may include loading a sequence that is an instance of vector data including a plurality of elements, dividing the sequence into two groups, aligning respective elements of the groups to generate a sequence of sorted elements according to a single instruction multiple data mode, and iteratively performing an alignment operation based on a determination that each group in the sequence of sorted elements includes more than one element of the plurality of elements. Each iteration may include dividing each group to form new groups and aligning respective elements of each pair of adjacent new groups to generate a new sequence of sorted elements. The new sequence of a current iteration of the alignment operation may be transmitted as a data output, based on a determination that each new group does not include more than one element.
PMEM CACHE RDMA SECURITY
Techniques are described for providing one or more clients with direct access to cached data blocks within a persistent memory cache on a storage server. In an embodiment, a storage server maintains a persistent memory cache comprising a plurality of cache lines, each of which represent an allocation unit of block-based storage. The storage server maintains an RDMA table that include a plurality of table entries, each of which maps a respective client to one or more cache lines and a remote access key. An RDMA access request to access a particular cache line is received from a storage server client. The storage server identifies access credentials for the client and determines whether the client has permission to perform the RDMA access on the particular cache line. Upon determining that the client has permissions, the cache line is accessed from the persistent memory cache and sent to the storage server client.
Compressed instruction format
A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
METHOD AND APPARATUS FOR USING A STORAGE SYSTEM AS MAIN MEMORY
A data access system including a processor, multiple cache modules for the main memory, and a storage drive. The cache modules include a FLC controller and a main memory cache. The multiple cache modules function as main memory. The processor sends read/write requests (with physical address) to the cache module. The cache module includes two or more stages with each stage including a FLC controller and DRAM (with associated controller). If the first stage FLC module does not include the physical address, the request is forwarded to a second stage FLC module. If the second stage FLC module does not include the physical address, the request is forwarded to the storage drive, a partition reserved for main memory. The first stage FLC module has high speed, lower power operation while the second stage FLC is a low-cost implementation. Multiple FLC modules may connect to the processor in parallel.
Quality of service dirty line tracking
Systems, apparatuses, and methods for generating a measurement of write memory bandwidth are disclosed. A control unit monitors writes to a cache hierarchy. If a write to a cache line is a first time that the cache line is being modified since entering the cache hierarchy, then the control unit increments a write memory bandwidth counter. Otherwise, if the write is to a cache line that has already been modified since entering the cache hierarchy, then the write memory bandwidth counter is not incremented. The first write to a cache line is a proxy for write memory bandwidth since this will eventually cause a write to memory. The control unit uses the value of the write memory bandwidth counter to generate a measurement of the write memory bandwidth. Also, the control unit can maintain multiple counters for different thread classes to calculate the write memory bandwidth per thread class.
Way predictor and enable logic for instruction tightly-coupled memory and instruction cache
Disclosed herein are systems and method for instruction tightly-coupled memory (iTIM) and instruction cache (iCache) access prediction. A processor may use a predictor to enable access to the iTIM or the iCache and a particular way (a memory structure) based on a location state and program counter value. The predictor may determine whether to stay in an enabled memory structure, move to and enable a different memory structure, or move to and enable both memory structures. Stay and move predictions may be based on whether a memory structure boundary crossing has occurred due to sequential instruction processing, branch or jump instruction processing, branch resolution, and cache miss processing. The program counter and a location state indicator may use feedback and be updated each instruction-fetch cycle to determine which memory structure(s) needs to be enabled for the next instruction fetch.
WRITING ZERO DATA
Apparatuses, methods of operating apparatuses, interconnects for connecting apparatuses to one another, and methods of operating the interconnects are disclosed. A master apparatus can issue an individual all-zero-data write transaction specifying a data storage location to the interconnect, which conveys the individual all-zero-data write transaction to a target device which writes all-zero-data at the data storage location. No write data is conveyed with the individual all-zero-data write transaction, so that the individual all-zero-data write transaction may be used to clear the data storage location without adding to congestion of a write data channel in the interconnect.
Apparatus and Method for Store Pairing with Reduced Hardware Requirements
An apparatus and method for pairing store operations. For example, one embodiment of a processor comprises: a grouping eligibility checker to evaluate a plurality of store instructions based on a set of grouping rules to determine whether two or more of the plurality of store instructions are eligible for grouping; and a dispatcher to simultaneously dispatch a first group of store instructions of the plurality of store instructions determined to be eligible for grouping by the grouping eligibility checker.
Direct swap caching with zero line optimizations
Systems and methods related to direct swap caching with zero line optimizations are described. A method for managing a system having a near memory and a far memory comprises receiving a request from a requestor to read a block of data that is either stored in the near memory or the far memory. The method includes analyzing a metadata portion associated with the block of data, the metadata portion comprising: both (1) information concerning whether the near memory contains the block of data or whether the far memory contains the block of data and (2) information concerning whether a data portion associated with the block of data is all zeros. The method further includes instead of retrieving the data portion from the far memory, synthesizing the data portion corresponding to the block of data to generate a synthesized data portion and transmitting the synthesized data portion to the requestor.
Apparatus and method for arbitrating access to a set of resources
An apparatus and method are provided for arbitrating access to a set of resources that are to be accessed in response to requests received at an interface. Arbitration circuitry arbitrates amongst the requests received at the interface in order to select, in each arbitration cycle, at least one next request to be processed. Each request identifies an access operation to be performed in response to the request, the access operation being selected from a group of access operations. Further, each access operation in the group has an associated scheduling pattern identifying timing of access to the resources in the set when performing that access operation. In response to a given request being selected by the arbitration circuitry, access control circuitry controls access to the set of resources in accordance with the associated scheduling pattern for the access operation identified by that request. Ordering circuitry determines, for different combinations of requests, an associated ordering of those requests, where the associated ordering is determined taking into account the associated scheduling pattern of each access operation in the group. The arbitration circuitry may employ a bandwidth aware arbitration scheme where a multi-arbitration operation is iteratively performed. Each performance of the multi-arbitration operation comprises sampling the requests currently received at the interface, and employing the ordering circuitry to determine, based on the sampled requests, an order in which the sampled requests should be selected. Then, during each arbitration cycle, at least one next request to be processed is selected from amongst the sampled requests based on the determined order, until each sampled request has been selected.