Patent classifications
G06F9/3858
Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality register file segments are coupled to the partitionable engines for providing data storage.
Generation and use of memory access instruction order encodings
Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.
THREAD SWITCHING IN MICROPROCESSOR WITHOUT FULL SAVE AND RESTORE OF REGISTER FILE
Certain embodiments of the present disclosure support a method and apparatus for efficient multithreading on a single core microprocessor. Thread switching in the single core microprocessor presented herein is based on a reserved space in a memory allocated to each thread for storing and restoring of registers in a register file. The thread switching is achieved without full save and restore of the register file, and only those registers referenced in the memory are saved and restored during thread switching.
Draining operation to cause store data to be written to persistent memory
An apparatus comprises a write buffer to buffer store requests issued by the processing circuitry, prior to the store data being written to at least one cache. Draining circuitry detects a draining trigger event having potential to cause loss of state stored in the at least one cache. In response to the draining trigger event, the draining circuitry performs a draining operation to identify whether the write buffer buffers any committed store requests requiring persistence, and when the write buffer buffers at least one committed store request requiring persistence, to cause the store data associated with the at least one committed store request to be written to persistent memory. This helps to eliminate barrier instructions from software, simplifying persistent programming and improving performance.
USER MODE EVENT HANDLING
A method includes asserting a field of an event flag mask register configured to inhibit an event handler. The method also includes, responsive to an event that corresponds to the field of the event flag mask register being triggered: asserting a field of an event flag register associated with the event; and based the field in the event flag register being asserted, taking an action by a task being executed by the data processor core.
METHOD AND APPARATUS FOR REORDERING IN A NON-UNIFORM COMPUTE DEVICE
A data processing apparatus includes a multi-level memory system, one or more first processing unit coupled to the memory system at a first level and one or more second processing units each coupled to the memory system at a second level. A first reorder buffer maintains data order during execution of instructions by the first and second processing units and a second reorder buffer maintains data order during execution of the instructions by an associated second processing unit. An entry in the first reorder buffer is configured, dependent upon an indicator bit, as an entry for a single instruction or a pointer to an entry in the second reorder buffer. An entry in the second reorder buffer includes instruction block start and end addresses and indicators of input and output register. Instructions are released to a processing unit when all inputs, as indicated by the reorder buffers, are available.
METHOD AND APPARATUS FOR SCHEDULING IN A NON-UNIFORM COMPUTE DEVICE
A data processing apparatus, and method of operation thereof, for executing instructions. The apparatus includes one or more host processors, each having a first processing unit, and a multi-level memory system. One or more levels of the memory system are tightly coupled to a corresponding second processing unit. At least one of the host processors includes an instruction scheduler that routes instructions selectively to at least one of the first and second processing units, dependent upon the availability of the processing units and the location, within the memory system, of data to be used when executing the instructions.
Systems, methods, and apparatuses to control CPU speculation for the prevention of side-channel attacks
Embodiments of instructions are detailed herein including one or more of 1) a branch fence instruction, prefix, or variants (BFENCE); 2) a predictor fence instruction, prefix, or variants (PFENCE); 3) an exception fence instruction, prefix, or variants (EFENCE); 4) an address computation fence instruction, prefix, or variants (AFENCE); 5) a register fence instruction, prefix, or variants (RFENCE); and, additionally, modes that apply the above semantics to some or all ordinary instructions.
MANAGING AN EFFECTIVE ADDRESS TABLE IN A MULTI-SLICE PROCESSOR
Methods and apparatus for managing an effective address table (EAT) in a multi-slice processor including receiving, from an instruction sequence unit, a next-to-complete instruction tag (ITAG); obtaining, from the EAT, a first ITAG from a tail-plus-one EAT row, wherein the EAT comprises a tail EAT row that precedes the tail-plus-one EAT row; determining, based on a comparison of the next-to-complete ITAG and the first ITAG, that the tail EAT row has completed; and retiring the tail EAT row based on the determination.
DIRECT REGISTER RESTORE MECHANISM FOR DISTRIBUTED HISTORY BUFFERS
Techniques are disclosed for restoring register data in a processor. In one embodiment, a method includes receiving an instruction to flush one or more general purpose registers (GPRs) in a processor. The method also includes determining history buffer entries of a history buffer to be restored to the one or more GPRs. The method includes creating a mask vector that indicates which history buffer entries will be restored to the one or more GPRs. The method further includes restoring the indicated history buffer entries to the one or more GPRs. As each indicated history buffer entry is restored, the method includes updating the mask vector to indicate which history buffer entries have been restored.