Patent classifications
G06F9/3842
SINGLE-THREAD SPECULATIVE MULTI-THREADING
A processor includes a pipeline and control circuitry. The pipeline is configured to process instructions of program code and includes one or more fetch units. The control circuitry is configured to predict at run-time one or more future flow-control traces to be traversed in the program code, to define, based on the predicted flow-control traces, two or more regions of the program code from which instructions are to be fetched, wherein the number of regions is greater than the number of fetch units, and to instruct the pipeline to fetch instructions alternately from the two or more regions of the program code using the one or more fetch units, and to process the fetched instructions.
Indirect control flow instructions and inhibiting data value speculation
There is provided an apparatus that includes input circuitry to receive input data and output circuitry to output a sequence of instructions to be executed by data processing circuitry. Generation circuitry performs a generation process to generate the sequence of instructions using the input data. The sequence of instructions comprises an indirect control flow instruction having a field that indicates where a target of the indirect control flow instruction is stored. The generation process causes at least one of the instructions in the sequence of instructions to store a state of control flow speculation after execution of the indirect control flow instruction. The at least one of the instructions in the sequence of instructions that stores the state of control flow speculation is inhibited from being subject to data value speculation by the data processing circuitry.
Optimizing performance for context-dependent instructions
A processor includes a queue for storing instructions processed within the context of a current value of a register field, where for some embodiments the instruction is undefined or defined, depending upon the register field at time of processing. After a write instruction (an instruction that writes to the register field) executes, the queue is searched for any entries that contain instructions that depend upon the executed write instruction. Each such entry stores the value of the register field at the time the instruction in the entry was processed. If such an entry is found in the queue and its stored value of the register field does not match the value that the write instruction wrote to the register field, then the processor flushes the pipeline and restarts at a state so as to correctly execute the instruction.
Systems, apparatuses, and methods for data speculation execution
Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address and an operand to store a stride value, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address.
CONTROLLING USE OF DATA DETERMINED BY A RESOLVE-PENDING SPECULATIVE OPERATION
A data processing apparatus is provided which controls the use of data in respect of a further operation. The data processing apparatus identifies whether data is trusted or untrusted by identifying whether or not the data was determined by a speculatively executed resolve-pending operation. A permission control unit is also provided to control how the data can be used in respect of a further operation according to a security policy while the speculatively executed operation is still resolve-pending.
INDEXING ENTRIES OF A STORAGE STRUCTURE SHARED BETWEEN MULTIPLE THREADS
An apparatus has processing circuitry for processing instructions from multiple threads. A storage structure is shared between the threads and has a number of entries. Indexing circuitry generates a target index value identifying an entry of the storage structure to be accessed in response to a request from the processing circuitry specifying a requested index value corresponding to information to be accessed from the storage structure. The indexing circuitry generates the target index value as a function of the requested index value and a key value selected depending on which of the threads trigger the request. The key value for at least one of the threads is updated from time to time.
Efficient load value prediction
Certain aspects of the present disclosure provide techniques for training load value predictors, comprising: determining if a prediction has been made by one or more of a plurality of load value predictors; determining a misprediction has been made by one or more load value predictors of the plurality of load value predictors; training each of the one or more load value predictors that made the misprediction; and resetting a confidence value associated with each of the one or more load value predictors that made the misprediction.
Managing potentially invalid results during runahead
Embodiments related to managing potentially invalid results generated/obtained by a microprocessor during runahead are provided. In one example, a method for operating a microprocessor includes causing the microprocessor to enter runahead upon detection of a runahead event. The example method also includes, during runahead, determining that an operation associated with an instruction referencing a storage location would produce a potentially invalid result based on a value of an architectural poison bit associated with the storage location and performing a different operation in response.
SYSTEM-ON-CHIP FOR SPECULATIVE EXECUTION EVENT COUNTER CHECKPOINTING AND RESTORING
An example system for speculative execution event counter checkpointing and restoring may include a plurality of symmetric cores, at least one of the symmetric cores to simultaneously process a plurality of threads and to perform out-of-order instruction processing for the plurality of threads; at least one shared cache circuit to be shared among two or more the of symmetric cores. The system may further include a memory controller to couple the symmetric cores to a system memory and a data communication interface to couple one or more of the cores to input/output devices. The system may further include event counter circuitry comprising: a plurality of event counters including programmable event counters and fixed event counters and one or more configuration registers to store configuration data to specify an event type to be counted by the programmable event counters, wherein at least one of the one or more configuration registers is to store configuration data for a plurality of the programmable event counters. The system may further include transactional memory circuitry to process transactional memory operations including load operations and store operations, the transactional memory circuitry to process a transaction begin instruction to indicate a start of a transactional execution region of a program, a transaction end instruction to indicate an end of the transactional execution region, and a transaction abort instruction to abort processing of the transactional execution region. The system may further include transaction checkpoint circuitry to store a processor state at the start of the transactional execution region of the program, the processor state including values of one or more of the event counters. The system may further include lock elision circuitry to cause critical sections of the program to execute as transactions on multiple threads without acquiring a lock, the lock elision circuitry to cause the critical sections to be re-executed non-speculatively using one or more locks in response to detecting a transaction failure.
CACHE SYSTEMS AND CIRCUITS FOR SYNCING CACHES OR CACHE SETS
A cache system, having a first cache, a second cache, and a logic circuit coupled to control the first cache and the second cache according to an execution type of a processor. When an execution type of a processor is a first type indicating non-speculative execution of instructions and the first cache is configured to service commands from a command bus for accessing a memory system, the logic circuit is configured to copy a portion of content cached in the first cache to the second cache. The cache system can include a configurable data bit. The logic circuit can be coupled to control the caches according to the bit. Alternatively, the caches can include cache sets. The caches can also include registers associated with the cache sets respectively. The logic circuit can be coupled to control the cache sets according to the registers.