Patent classifications
G06F2212/62
Generation and use of memory access instruction order encodings
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instructions in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask, which can be generated when decoding the instruction block or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
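For illustration only, here is a minimal C sketch of the masking step, assuming a 32-bit block with hypothetical slot assignments: the store vector records which instructions have executed, the store mask selects the store slots, and a load may issue once every older store it is encoded to depend on appears in the masked vector.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch only; field names and widths are assumptions, not the patent's. */
typedef struct {
    uint32_t store_mask;   /* bit i set => slot i is a store (from decode or header) */
    uint32_t store_vector; /* bit i set => slot i has executed */
} block_state;

/* dep_mask: bits for the older stores this load is encoded to depend on. */
static int load_can_issue(const block_state *b, uint32_t dep_mask)
{
    uint32_t executed_stores = b->store_vector & b->store_mask;
    return (executed_stores & dep_mask) == dep_mask;
}

int main(void)
{
    block_state b = { .store_mask = 0x05, .store_vector = 0 };
    uint32_t dep = 0x05;                      /* load depends on stores in slots 0 and 2 */
    printf("%d\n", load_can_issue(&b, dep));  /* 0: stores not yet executed */
    b.store_vector |= 0x01;                   /* store in slot 0 executes */
    b.store_vector |= 0x04;                   /* store in slot 2 executes */
    printf("%d\n", load_can_issue(&b, dep));  /* 1: dependencies satisfied */
    return 0;
}
```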
Cache affinity and processor utilization technique
A cache affinity and processor utilization technique efficiently load balances work in a storage input/output (I/O) stack among a plurality of processors and associated processor cores of a node. The storage I/O stack employs one or more non-blocking messaging kernel (MK) threads that execute non-blocking message handlers (i.e., non-blocking services). The technique load balances work between the processor cores sharing a last level cache (LLC) (i.e., intra-LLC processor load balancing), and load balances work between the processors having separate LLCs (i.e., inter-LLC processor load balancing). The technique may allocate a predetermined number of logical processors for use by an MK scheduler to schedule the non-blocking services within the storage I/O stack, as well as allocate a remaining number of logical processors for use by blocking services, e.g., scheduled by an operating system kernel scheduler.
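As a rough illustration (the policy details here are assumed, not taken from the patent), a C sketch of the two balancing levels: work spills to another LLC domain only when that domain is less loaded overall, and within the chosen domain it goes to the least-loaded core.

```c
#include <stdio.h>

#define NPROC 2   /* processors, one LLC each (sizes hypothetical) */
#define NCORE 4   /* cores per LLC domain */

static int load[NPROC][NCORE] = { {3, 1, 4, 2}, {0, 1, 0, 1} };

static int domain_load(int p) {
    int sum = 0;
    for (int c = 0; c < NCORE; c++) sum += load[p][c];
    return sum;
}

/* Pick a core for new work: stay in the caller's LLC domain unless another
 * domain is strictly less loaded (inter-LLC), then take the least-loaded
 * core within the chosen domain (intra-LLC). */
static void pick_core(int home, int *p_out, int *c_out) {
    int p = home;
    for (int q = 0; q < NPROC; q++)
        if (domain_load(q) < domain_load(p)) p = q;   /* inter-LLC */
    int best = 0;
    for (int c = 1; c < NCORE; c++)
        if (load[p][c] < load[p][best]) best = c;     /* intra-LLC */
    *p_out = p; *c_out = best;
}

int main(void) {
    int p, c;
    pick_core(0, &p, &c);
    printf("schedule on processor %d, core %d\n", p, c); /* processor 1, core 0 */
    return 0;
}
```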
CACHING FOR UNIQUE COMBINATION READS IN A DISPERSED STORAGE NETWORK
A method includes receiving, from a first requesting module, a first access request that indicates a first data object stored as encoded slices in a plurality of storage units. A first desired slice set, which includes a first subset of the encoded slices of the first data object, is selected based on the requesting module. Absent slice data, indicating an encoded slice not present in a local cache, is generated by searching the local cache. A read request is transmitted to read the encoded slice indicated by the absent slice data from one of the storage units. The encoded slice indicated by the absent slice data is received from the storage unit, and the local cache is updated to include the encoded slice. The first data object is regenerated for transmission to the first requesting module by decoding the first subset of encoded slices in the first desired slice set.
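A schematic C sketch of the read path, with hypothetical slice counts and a stand-in for the network read: the local cache is searched for the desired slice set, only the absent slices are requested from storage units, and the cache is updated before decoding.

```c
#include <stdio.h>

/* Assumed parameters: 5 slices encode the object, any 3 suffice to decode. */
#define NSLICES   5
#define THRESHOLD 3

static int cache[NSLICES] = { 1, 0, 1, 0, 0 };  /* 1 = slice present in local cache */

static void fetch_from_storage(int i) {
    printf("read slice %d from a storage unit\n", i);  /* stand-in for a network read */
}

int main(void) {
    int desired[THRESHOLD] = { 0, 1, 3 };       /* first desired slice set */
    for (int k = 0; k < THRESHOLD; k++) {
        int s = desired[k];
        if (!cache[s]) {                        /* absent slice data */
            fetch_from_storage(s);
            cache[s] = 1;                       /* update local cache */
        }
    }
    printf("decoding %d cached slices -> data object regenerated\n", THRESHOLD);
    return 0;
}
```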
DE-PRIORITIZING SPECULATIVE CODE LINES IN ON-CHIP CACHES
Methods and apparatus relating to de-prioritizing speculative code lines in on-chip caches are described. In an embodiment, logic circuitry determines whether a storage structure includes a reference to a code miss request prior to transmission of the code miss request to a shared cache. The logic circuitry causes de-prioritization of a code line, corresponding to the code miss request, in the shared cache in response to an absence of the reference in the storage structure. Other embodiments are also disclosed and claimed.
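A toy C sketch of the fill-time decision, with an assumed table size and print statements in place of real cache actions: a code line whose miss request has no reference in the storage structure is treated as speculative and inserted at low replacement priority.

```c
#include <stdio.h>

#define TABLE 8   /* capacity of the storage structure (assumed) */

static unsigned table[TABLE];  /* references to demanded code miss requests */
static int table_n = 0;

static int table_contains(unsigned addr) {
    for (int i = 0; i < table_n; i++)
        if (table[i] == addr) return 1;
    return 0;
}

static void fill_shared_cache(unsigned addr) {
    if (table_contains(addr))
        printf("line %#x: referenced miss, insert at normal priority\n", addr);
    else
        printf("line %#x: no reference, de-prioritize (insert near LRU)\n", addr);
}

int main(void) {
    table[table_n++] = 0x1000;   /* code miss request recorded before transmission */
    fill_shared_cache(0x1000);   /* normal priority */
    fill_shared_cache(0x2000);   /* speculative -> de-prioritized */
    return 0;
}
```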
Synchronizing updates of page table status indicators and performing bulk operations
A synchronization capability synchronizes updates to page tables by forcing updates in cached entries to be made visible in memory (i.e., in in-memory page table entries). A synchronization instruction is used that ensures, after the instruction has completed, that updates to the cached entries that occurred prior to the synchronization instruction are made visible in memory. Synchronization may be used to facilitate memory management operations, such as bulk operations used to change a large section of memory to read-only, operations to manage a free list of memory pages, and/or operations associated with terminating processes.
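The following C simulation (no real MMU; names are hypothetical) shows the ordering the synchronization instruction provides: pending status-bit updates in cached entries are forced back into the in-memory entries before a bulk operation reads or rewrites them.

```c
#include <stdio.h>

#define NPTE 4

static unsigned mem_pte[NPTE];      /* in-memory page table entries */
static unsigned cached_pte[NPTE];   /* cached copies with pending updates */

#define PTE_DIRTY 0x1
#define PTE_RO    0x2

/* Stand-in for the synchronization instruction: make every cached update
 * that happened before this point visible in memory. */
static void sync_page_tables(void) {
    for (int i = 0; i < NPTE; i++)
        mem_pte[i] = cached_pte[i];
}

int main(void) {
    cached_pte[2] |= PTE_DIRTY;      /* hardware sets a status bit in a cached entry */
    sync_page_tables();              /* update now visible in memory */
    for (int i = 0; i < NPTE; i++) { /* bulk operation: mark a section read-only */
        mem_pte[i] |= PTE_RO;
        cached_pte[i] = mem_pte[i];
    }
    printf("pte[2] = %#x (dirty bit preserved, now read-only)\n", mem_pte[2]);
    return 0;
}
```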
METHOD AND APPARATUS FOR REORDERING IN A NON-UNIFORM COMPUTE DEVICE
A data processing apparatus includes a multi-level memory system, one or more first processing units coupled to the memory system at a first level, and one or more second processing units each coupled to the memory system at a second level. A first reorder buffer maintains data order during execution of instructions by the first and second processing units, and a second reorder buffer maintains data order during execution of the instructions by an associated second processing unit. An entry in the first reorder buffer is configured, dependent upon an indicator bit, as an entry for a single instruction or as a pointer to an entry in the second reorder buffer. An entry in the second reorder buffer includes instruction block start and end addresses and indicators of input and output registers. Instructions are released to a processing unit when all inputs, as indicated by the reorder buffers, are available.
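A C sketch of the two-level entry layout described above, with hypothetical field names: the indicator bit selects whether a first-level entry holds a single instruction or a pointer into the second-level buffer, whose entries carry block bounds and register indicators.

```c
#include <stdio.h>

/* Field names are assumptions for illustration. */
typedef struct {
    unsigned start_addr, end_addr;  /* instruction block bounds */
    unsigned in_regs, out_regs;     /* input/output register indicators */
} rob2_entry;

typedef struct {
    int is_pointer;                 /* the indicator bit */
    union {
        unsigned insn_addr;         /* entry for a single instruction */
        int      rob2_index;        /* pointer to a second reorder buffer entry */
    } u;
} rob1_entry;

static rob2_entry rob2[4];

static void describe(const rob1_entry *e) {
    if (e->is_pointer) {
        const rob2_entry *b = &rob2[e->u.rob2_index];
        printf("block [%#x..%#x] in=%#x out=%#x\n",
               b->start_addr, b->end_addr, b->in_regs, b->out_regs);
    } else {
        printf("single instruction at %#x\n", e->u.insn_addr);
    }
}

int main(void) {
    rob2[0] = (rob2_entry){ 0x400, 0x440, 0x3, 0x4 };
    rob1_entry a = { .is_pointer = 0, .u.insn_addr = 0x100 };
    rob1_entry b = { .is_pointer = 1, .u.rob2_index = 0 };
    describe(&a);
    describe(&b);
    return 0;
}
```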
METHOD AND APPARATUS FOR SCHEDULING IN A NON-UNIFORM COMPUTE DEVICE
A data processing apparatus, and method of operation thereof, for executing instructions. The apparatus includes one or more host processors, each having a first processing unit, and a multi-level memory system. One or more levels of the memory system are tightly coupled to a corresponding second processing unit. At least one of the host processors includes an instruction scheduler that routes instructions selectively to at least one of the first and second processing units, dependent upon the availability of the processing units and the location, within the memory system, of data to be used when executing the instructions.
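As an assumed (not patent-specified) routing heuristic in C: instructions whose data is already close to the host execute on the first processing unit, while instructions touching data deeper in the memory system are routed to an available second processing unit.

```c
#include <stdio.h>

enum unit { HOST_UNIT, NEAR_MEM_UNIT };

/* Thresholds and levels are hypothetical: level 0/1 = near the host,
 * higher levels = deeper in the multi-level memory system. */
static enum unit route(int data_level, int near_unit_free) {
    if (data_level <= 1)        /* data already close: host processing unit wins */
        return HOST_UNIT;
    if (near_unit_free)         /* data is far away and a second unit is available */
        return NEAR_MEM_UNIT;
    return HOST_UNIT;           /* fall back to the host processor */
}

int main(void) {
    printf("%d\n", route(1, 1)); /* 0: HOST_UNIT */
    printf("%d\n", route(3, 1)); /* 1: NEAR_MEM_UNIT */
    printf("%d\n", route(3, 0)); /* 0: HOST_UNIT (near-memory unit busy) */
    return 0;
}
```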
Methods and apparatus to facilitate an atomic operation and/or a histogram operation in cache pipeline
Methods, apparatus, systems, and articles of manufacture to facilitate an atomic operation and/or a histogram operation in a cache pipeline are disclosed. An example system includes a cache storage coupled to an arithmetic component, and a cache controller coupled to the cache storage, wherein the cache controller is operable to: receive a memory operation that specifies a set of data; retrieve the set of data from the cache storage; utilize the arithmetic component to determine a set of counts of respective values in the set of data; generate a vector representing the set of counts; and provide the vector.
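A minimal C version of the histogram step, with assumed sizes: the values retrieved from cache storage are counted per bin, and the resulting count vector is what the controller provides.

```c
#include <stdio.h>

#define NVALS 8   /* size of the specified set of data (assumed) */
#define NBINS 4   /* number of distinct values / histogram bins (assumed) */

int main(void) {
    unsigned data[NVALS] = { 1, 3, 1, 0, 2, 1, 3, 3 }; /* retrieved from cache storage */
    unsigned counts[NBINS] = { 0 };
    for (int i = 0; i < NVALS; i++)
        counts[data[i]]++;   /* in hardware this would be an atomic per-bin update */
    for (int b = 0; b < NBINS; b++)                    /* the provided vector */
        printf("value %d: count %u\n", b, counts[b]);
    return 0;
}
```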
Interconnect delivery process
A method for enforcing data integrity in an RDMA data storage system includes flushing data write requests to a data storage device before sending an acknowledgment that the data write requests have been executed. An RDMA data storage system includes a node configured to flush data write requests to a data storage device before sending an acknowledgment that a data write request has been executed.
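The enforced ordering reduces to two steps, sketched below in C with stand-in functions rather than a real RDMA API: flush first, acknowledge second, so that acknowledged writes are always durable.

```c
#include <stdio.h>

/* Stand-ins for the storage device and the RDMA interconnect. */
static void flush_to_storage(const char *buf) {
    printf("flush: \"%s\" durable on the data storage device\n", buf);
}

static void send_ack(int req_id) {
    printf("ack: write request %d has been executed\n", req_id);
}

/* The node never acknowledges before the flush completes, so a crash after
 * the ack cannot lose acknowledged data. */
static void handle_rdma_write(int req_id, const char *buf) {
    flush_to_storage(buf);  /* step 1: make the write durable */
    send_ack(req_id);       /* step 2: only then acknowledge */
}

int main(void) {
    handle_rdma_write(42, "payload");
    return 0;
}
```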
Multi-plane switching of non-volatile memory
A method includes transferring data out of a first buffer coupled to a first plane of a plurality of planes of a memory component, where the data was previously transferred from the first plane to the first buffer responsive to an access request to sense data stored in the plurality of planes of the memory component. The method further includes transferring, subsequent to transferring the data out of the first buffer and independently of a command from a processing device, data out of a second buffer coupled to a second plane of the plurality of planes of the memory component, where the data transferred out of the second buffer was previously transferred from the second plane to the second buffer responsive to the access request.
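A small C simulation of the sequencing (device behavior modeled, not a real flash interface): one access request senses every plane into its buffer, and after the first buffer is transferred out the device switches to the next plane's buffer without any further command.

```c
#include <stdio.h>

#define NPLANES 2

static int buffer[NPLANES];   /* one data buffer per plane */

/* One access request senses data from every plane into its buffer. */
static void sense_all_planes(void) {
    for (int p = 0; p < NPLANES; p++)
        buffer[p] = 100 + p;  /* stand-in for a sense operation on plane p */
}

int main(void) {
    sense_all_planes();
    /* After plane 0's buffer drains, the device moves on to plane 1's
     * buffer on its own -- no second command from the processing device. */
    for (int p = 0; p < NPLANES; p++)
        printf("transfer out of plane %d buffer: %d\n", p, buffer[p]);
    return 0;
}
```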