G06F9/38585

Hybrid tracking of transaction read and write sets

Embodiments of the invention relate to tracking processor transactional read and write sets, thereby eliminating speculative mispredictions. Both non-speculative read set and write set indications are maintained for a transaction. The indications are stored in cache. In addition, load and write queues of addresses are maintained. The load queue of addresses relates to speculative members of a read set, and the write queue of addresses relates to speculative members of a write set. For a received read request, a transaction resolution process takes place, and a resolution is performed if an address match in the write queue is detected. Similarly, for a received write request, the transaction interference check additionally examines the load queue and the non-speculative read set for the pending address.
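
The checks described in this abstract can be sketched as a small software model. This is only an illustration under stated assumptions, not the patented hardware: the class name `TxTracker`, its field names, and the use of Python sets in place of cache bits and hardware queues are all hypothetical, and including the non-speculative write set in the read-request check is an assumption the abstract does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class TxTracker:
    """Hypothetical model of one transaction's tracked addresses."""
    read_set: set = field(default_factory=set)     # non-speculative reads (cache indications)
    write_set: set = field(default_factory=set)    # non-speculative writes (cache indications)
    load_queue: set = field(default_factory=set)   # speculative members of the read set
    write_queue: set = field(default_factory=set)  # speculative members of the write set

    def conflicts_with_remote_read(self, addr):
        # A remote read interferes only with addresses this transaction writes.
        # Checking write_set here is an assumption; the abstract names the write queue.
        return addr in self.write_queue or addr in self.write_set

    def conflicts_with_remote_write(self, addr):
        # A remote write additionally interferes with anything this transaction read,
        # speculatively (load queue) or non-speculatively (read set).
        return (self.conflicts_with_remote_read(addr)
                or addr in self.load_queue
                or addr in self.read_set)
```

For example, an address that is present only in the load queue would not conflict with an incoming read request but would conflict with an incoming write request.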

Storing a processing state based on confidence in a predicted branch outcome and a number of recent state changes

A data processing apparatus is provided. It includes processing circuitry for speculatively executing a plurality of instructions. Storage circuitry stores a current state of the processing circuitry and a plurality of previous states of the processing circuitry. Execution of the plurality of instructions changes the current state of the processing circuitry. Flush circuitry replaces, in response to a misprediction, the current state of the processing circuitry with a replacement one of the plurality of previous states of the processing circuitry.
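
The storage and flush behavior in this abstract can be sketched as follows. This is a minimal software model under stated assumptions: the class `StateStore`, its method names, and the representation of processor state as a dictionary of register values are hypothetical. The policy for deciding when to store a state (which the title ties to branch confidence and recent state changes) is not modeled, because the abstract does not detail it.

```python
class StateStore:
    """Hypothetical model: one current state plus a list of previous states."""

    def __init__(self, initial):
        self.current = dict(initial)
        self.previous = []  # saved copies, oldest first

    def checkpoint(self):
        # Save a copy of the current state and return its index.
        self.previous.append(dict(self.current))
        return len(self.previous) - 1

    def execute(self, register, value):
        # Speculative execution changes the current state.
        self.current[register] = value

    def flush(self, checkpoint_id):
        # On a misprediction, replace the current state with a saved one
        # and discard any checkpoints taken after it.
        self.current = dict(self.previous[checkpoint_id])
        del self.previous[checkpoint_id + 1:]
```

In this sketch a caller would take a checkpoint before a predicted branch, keep executing, and call `flush` with that checkpoint's index if the prediction turns out to be wrong.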

Graphics engine reset and recovery in a multiple graphics context execution environment

Methods, systems and apparatuses may provide for technology that triggers an idle state in a first command streamer in response to a request to reset a second command streamer that shares graphics hardware with the first command streamer. The technology may also determine an event type associated with the request and conduct the request based on the event type.

ARITHMETIC PROCESSING DEVICE AND SEMICONDUCTOR DEVICE
20210382715 · 2021-12-09

An arithmetic processing device includes a memory and a processor coupled to the memory, the processor configured to: execute arithmetic processing which executes a plurality of instructions issued out of order; execute control processing which commits, in order, the plurality of instructions for which execution has been completed; identify, for each instruction included in the plurality of instructions, a count value which indicates a number of cycles from when execution of the instruction has been completed; and identify, among a plurality of uncommitted instructions, the instruction with the count value which matches the number of cycles in which an error is detected in the arithmetic processing as a specific instruction to be retried.
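
The selection of the instruction to retry can be illustrated with a short sketch. It assumes that "the number of cycles in which an error is detected" means the fixed latency between an instruction completing and its error being reported; that reading, the function name `find_retry_instruction`, and the dictionary representation are all hypothetical.

```python
def find_retry_instruction(uncommitted, detect_latency):
    """Return the id of the uncommitted instruction whose cycles-since-completion
    counter equals the error-detection latency, or None if there is no match.

    uncommitted: dict mapping instruction id -> cycles since execution completed
    detect_latency: assumed number of cycles from completion to error detection
    """
    for instr_id, cycles_since_done in uncommitted.items():
        if cycles_since_done == detect_latency:
            return instr_id
    return None
```

With counters `{"i0": 5, "i1": 3, "i2": 1}` and an assumed detection latency of 3 cycles, the sketch picks `"i1"` as the instruction to retry.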

ALTERNATE PATH DECODE FOR HARD-TO-PREDICT BRANCH

An embodiment of an integrated circuit may comprise a core, a front end unit coupled to the core to decode one or more instructions, wherein the front end unit includes a first decode path, a second decode path, and circuitry to: predict a taken branch of a conditional branch instruction of the one or more instructions, decode a predicted path of the taken branch on the first decode path, determine if the conditional branch instruction corresponds to a hard-to-predict conditional branch instruction and if the second decode path is available and, if so determined, decode an alternate path of a not-taken branch of the hard-to-predict conditional branch instruction on the second decode path. Other embodiments are disclosed and claimed.
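
The decision described in this abstract reduces to a small piece of control logic, sketched below. The function name, parameter names, and the dictionary it returns are hypothetical; how a branch is classified as hard to predict is outside the abstract and is passed in as a flag.

```python
def assign_decode_paths(taken_target, fall_through,
                        is_hard_to_predict, second_path_available):
    """Return which address each decode path should start decoding from.

    The predicted (taken) path always goes to the first decode path. The
    not-taken path is decoded on the second path only when the branch is
    hard to predict and the second path is free.
    """
    paths = {"first": taken_target, "second": None}
    if is_hard_to_predict and second_path_available:
        paths["second"] = fall_through
    return paths
```

If the branch is easy to predict, or the second path is busy, only the predicted path is decoded in this sketch.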

HARDWARE MITIGATION FOR SPECTRE AND MELTDOWN-LIKE ATTACKS

Aspects include circuitry that includes a first global generation counter (GGC) that is increased upon decoding of a branch instruction and a second GGC that is increased upon a completion of the branch instruction. Upon a triggered rollback, the first GGC is reset. The circuitry also includes a generation tag memory associated with a register that receives loads during a side-channel attack, which is set to the first GGC upon a first load, and a determination unit to determine, for a second load from an address depending on the register of the first load, a generation tag value associated with the register of the second load as a function of the first GGC, the second GGC, and the generation tag value associated with the register of the first load. A wait queue is configured to block the second load if the generation tag is larger than the second GGC.
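
A rough software model of the two counters and the blocking rule is sketched below. Two parts are assumptions that the abstract leaves open: the rollback is modeled as resetting the first GGC to the value of the second GGC, and the tag of the dependent load is modeled as simply inheriting the tag of the source register. The class and method names are hypothetical.

```python
class GenerationTracker:
    """Hypothetical model of the two global generation counters (GGCs)."""

    def __init__(self):
        self.decoded = 0     # first GGC: increased when a branch is decoded
        self.completed = 0   # second GGC: increased when a branch completes
        self.tags = {}       # generation tag per destination register

    def decode_branch(self):
        self.decoded += 1

    def complete_branch(self):
        self.completed += 1

    def rollback(self):
        # Assumption: "reset" means falling back to the completed count.
        self.decoded = self.completed

    def first_load(self, dest):
        # The tag of the loaded register is set to the first GGC.
        self.tags[dest] = self.decoded

    def dependent_load_blocked(self, src, dest):
        # Assumption: the dependent load inherits the source register's tag.
        tag = self.tags.get(src, 0)
        self.tags[dest] = tag
        # The wait queue blocks the load while its tag exceeds the second GGC.
        return tag > self.completed
```

In this model a load that depends on data fetched under an unresolved branch stays blocked until that branch completes.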

METHOD AND APPARATUS FOR COMPARING PREDICTED LOAD VALUE WITH MASKED LOAD VALUE

A digital processor, method, and a non-transitory computer readable storage medium are described, and include a load pipeline operative to access a data content and convert the data content into a load result. The digital processor also includes a value prediction check circuit that is operative to access a speculative content, determine a predicted value from the speculative content, and determine a masked value by masking the data content with a data mask. The masked value is compared to the predicted value, and an action associated with the load result is commanded based upon the comparing of the masked value and the predicted value.
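
The comparison step can be sketched in a few lines. The function name and the action labels `"keep"` and `"replay"` are hypothetical; the abstract says only that an action associated with the load result is commanded based on the comparison.

```python
def check_value_prediction(data_content, data_mask, predicted_value):
    """Mask the loaded data and compare it with the predicted value.

    Returns a hypothetical action label: "keep" when the masked data equals
    the prediction, "replay" otherwise.
    """
    masked_value = data_content & data_mask
    return "keep" if masked_value == predicted_value else "replay"
```

For instance, with data `0x12345678` and mask `0xFF`, a predicted value of `0x78` matches the masked data, while a predicted value of `0x79` does not.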

Slice-based allocation history buffer

A multi-slice processor comprises a high-level structure and a history buffer. Write backs are no longer associated with the history buffer and the history buffer comprises slices determined by logical register allocation. The history buffer receives a register pointer entry and either releases or restores the entry with functional units comprised in the history buffer.

Content-addressable memory filtering based on microarchitectural state

Techniques are disclosed relating to filtering access to a content-addressable memory (CAM). In some embodiments, a processor monitors for certain microarchitectural states and filters access to the CAM in states where there cannot be a match in the CAM or where matching entries will not be used even if there is a match. In some embodiments, toggle control circuitry prevents toggling of input lines when filtering CAM access, which may reduce dynamic power consumption. In some example embodiments, the CAM is used to access a load queue to validate that out-of-order execution for a set of instructions matches in-order execution, and situations where ordering should be checked are relatively rare.
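
The filtering idea can be illustrated with a toy model that counts how often the CAM is actually searched. The class `FilteredCAM`, the `result_needed` flag, and the choice of "the CAM is empty" as the state in which no match is possible are hypothetical stand-ins for the microarchitectural states the abstract mentions.

```python
class FilteredCAM:
    """Toy content-addressable memory that skips searches that cannot matter."""

    def __init__(self):
        self.entries = set()
        self.searches = 0  # number of real searches (a proxy for input-line toggling)

    def lookup(self, key, result_needed=True):
        # Filter the access when there cannot be a match (no entries) or when
        # a match would not be used anyway.
        if not self.entries or not result_needed:
            return False
        self.searches += 1
        return key in self.entries
```

Counting only the unfiltered searches gives a rough sense of how much switching activity the filter avoids in this model.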

SYSTEMS, METHODS, AND APPARATUSES FOR HETEROGENEOUS COMPUTING

Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.