G06F9/3842

Managing load and store instructions for memory barrier handling

A front-end portion of a pipeline includes a stage that speculatively issues at least some instructions out-of-order. A back-end portion of the pipeline includes one or more stages that access a processor memory system. Front-end control circuitry manages execution of instructions based on information available in the front-end portion, and back-end control circuitry manages execution of instructions based on information available in the back-end portion. Managing execution of a first memory barrier instruction includes preventing speculative out-of-order issuance of store instructions. The back-end control circuitry provides information accessible to the front-end control circuitry indicating that one or more particular memory instructions have completed handling by the processor memory system. The front-end control circuitry identifies one or more load instructions that were issued before the first memory barrier instruction was issued and are ordered after the first memory barrier instruction, and causes at least one of the identified load instructions to be reissued after the first memory barrier instruction has been issued.
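The load-identification step in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Instr` record, its field names, and the cycle-based timing are hypothetical and are not taken from the patent, which describes hardware control circuitry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    seq: int                    # program-order position (hypothetical field)
    kind: str                   # "load", "store", or "barrier"
    issue_cycle: Optional[int]  # cycle at which it issued, None if not issued


def loads_to_reissue(window, barrier):
    """Return loads ordered after the barrier that issued before it did."""
    if barrier.issue_cycle is None:
        return []  # the barrier has not issued yet, so nothing can be decided
    return [
        i for i in window
        if i.kind == "load"
        and i.seq > barrier.seq                   # ordered after the barrier
        and i.issue_cycle is not None
        and i.issue_cycle < barrier.issue_cycle   # but issued before it
    ]
```

In this model, a load that is younger than the barrier in program order yet issued earlier in time is a candidate for reissue once the barrier has issued; stores are excluded because the abstract says they are not issued speculatively out of order around the barrier.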

USING METADATA PRESENCE INFORMATION TO DETERMINE WHEN TO ACCESS A HIGHER-LEVEL METADATA TABLE

Embodiments are provided for using metadata presence information to determine when to access a higher-level metadata table. It is determined that an incomplete hit occurred for a line of metadata in a lower-level structure of a processor, the lower-level structure being coupled to a higher-level structure in a hierarchy. It is determined that metadata presence information in a metadata presence table is a match to the line of metadata from the lower-level structure. Responsive to determining the match, it is determined to avoid accessing the higher-level structure of the processor.
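One possible reading of this decision is sketched below. It assumes that a "match" means the presence table shows the lower-level line already holds every metadata entry known for that line; the class, its fields, and the string results are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataHierarchy:
    """Toy two-level metadata hierarchy (all names are hypothetical)."""
    lower: dict = field(default_factory=dict)     # line -> entry ids in lower level
    higher: dict = field(default_factory=dict)    # line -> entry ids in higher level
    presence: dict = field(default_factory=dict)  # line -> entry ids known to exist
    higher_accesses: int = 0

    def lookup(self, line, entry):
        held = self.lower.get(line, set())
        if entry in held:
            return "hit"
        if held and self.presence.get(line) == held:
            # Incomplete hit, and the presence information matches the
            # lower-level line: the higher level has nothing more to offer.
            return "miss (higher level skipped)"
        self.higher_accesses += 1
        if entry in self.higher.get(line, set()):
            self.lower.setdefault(line, set()).add(entry)
            return "filled from higher level"
        return "miss"
```

The point of the sketch is the second branch: the higher-level structure is consulted only when the presence information does not already account for everything in the lower-level line.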

ISSUE, EXECUTION, AND BACKEND DRIVEN FRONTEND TRANSLATION CONTROL FOR PERFORMANT AND SECURE DATA-SPACE GUIDED MICRO-SEQUENCING

Methods and apparatus relating to issue, execution, and backend driven frontend translation control for performant and secure data-space guided micro-sequencing are described. In an embodiment, Data-space Translation Logic (DTL) circuitry receives a static input and a dynamic input, and generates one or more outputs based at least in part on the static input and the dynamic input. The DTL circuitry generates the one or more outputs prior to commencement of speculation operations in a processor. Other embodiments are also disclosed and claimed.

Link stack based instruction prefetch augmentation

A computer-implemented method of performing a link stack based prefetch augmentation using sequential prefetching includes observing a call instruction in a program being executed, and pushing a return address onto a link stack for processing the next instruction. A stream of instructions is prefetched starting from a cached line address of the next instruction and is stored in an instruction cache.
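The call-time behavior described here might be modeled as follows. The line size, instruction size, prefetch depth, and the function name `on_call` are all assumptions made for this sketch; the abstract does not specify them.

```python
LINE_SIZE = 64      # bytes per instruction cache line (assumed)
CALL_SIZE = 4       # bytes per call instruction (assumed fixed-width ISA)
PREFETCH_DEPTH = 3  # number of sequential lines to prefetch (assumed)


def on_call(call_pc, link_stack, icache):
    """Push the return address, then prefetch sequential lines from it."""
    return_addr = call_pc + CALL_SIZE
    link_stack.append(return_addr)
    # Align the return address down to the start of its cache line.
    first_line = return_addr - (return_addr % LINE_SIZE)
    lines = [first_line + i * LINE_SIZE for i in range(PREFETCH_DEPTH)]
    icache.update(lines)  # icache is modeled as a set of line addresses
    return lines
```

For example, a call at `0x103C` pushes `0x1040` onto the link stack and prefetches the lines at `0x1040`, `0x1080`, and `0x10C0`, so the instructions executed after the eventual return are more likely to be resident already.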

Thwarting Store-to-Load Forwarding Side Channel Attacks by Pre-Forwarding Matching of Physical Address Proxies and/or Permission Checking
20220358209 · 2022-11-10 ·

A method and system for mitigating side channel attacks (SCA) that exploit speculative store-to-load forwarding is described. The method comprises ensuring that the physical load and store addresses match and/or that permissions are present before speculatively store-to-load forwarding. Various improvements maintain a short load-store pipeline, including usage of a virtual level-one data cache (DL1), usage of an inclusive physical level-two data cache (DL2), storage and lookup of physical data address equivalents in the DL1, and using a memory dependence predictor (MDP) to speed up or replace store queue camming of load data addresses against store data addresses.
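The pre-forwarding check can be sketched as a guard in front of a store queue search. This is a simplified model, assuming both conditions are applied together (the abstract says "and/or"); `StoreEntry`, `pa_proxy`, and `try_forward` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    pa_proxy: int  # stand-in for the store's physical address (hypothetical)
    data: int


def try_forward(load_pa_proxy, load_permitted, store_queue):
    """Return forwarded store data, or None when forwarding is not allowed."""
    if not load_permitted:
        return None  # permission check fails: never forward speculatively
    for store in reversed(store_queue):  # youngest matching store wins
        if store.pa_proxy == load_pa_proxy:
            return store.data
    return None  # no physical-address match: the load must go to the cache
```

Because the comparison uses a physical address proxy rather than a virtual address, two different virtual addresses that merely look alike cannot trigger forwarding in this model.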

Computer Architecture with Register Name Addressing and Dynamic Load Size Adjustment
20230089349 · 2023-03-23 ·

A computer architecture allows load instructions to fetch from cache memory “fat” loads having more data than necessary to satisfy execution of the load instruction, for example, loading a full cache line instead of a required word. The fat load allows load instructions having spatiotemporal locality to share the data of the fat load, avoiding cache accesses. Rapid access to local data structures is provided by using base register names to directly access those structures as a proxy for the actual load base register address.
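The benefit of a fat load can be shown with a toy buffer that keeps the most recently fetched line. The 64-byte line size, the single-line buffer, and the `FatLoadBuffer` name are assumptions for illustration, and the sketch does not model the register-name addressing described in the last sentence.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)


class FatLoadBuffer:
    """Keeps one full line so nearby loads can skip the cache (toy model)."""

    def __init__(self, memory):
        self.memory = memory      # bytes object standing in for the cache
        self.line_base = None     # base address of the buffered line
        self.line = b""
        self.cache_accesses = 0

    def load_word(self, addr, size=4):
        # Assumes the word does not straddle a line boundary.
        base = addr - (addr % LINE_SIZE)
        if base != self.line_base:
            # Fat load: fetch the whole line, not just the requested word.
            self.cache_accesses += 1
            self.line_base = base
            self.line = self.memory[base:base + LINE_SIZE]
        offset = addr - base
        return int.from_bytes(self.line[offset:offset + size], "little")
```

Two loads to the same line cost one cache access in this model; only a load to a different line triggers another.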

TECHNIQUES FOR PARALLEL EXECUTION
20220342673 · 2022-10-27 ·

Apparatuses, systems, and techniques to identify instructions for advanced execution. In at least one embodiment, a processor performs one or more instructions that have been identified by a compiler to be speculatively performed in parallel.

RESCHEDULING A LOAD INSTRUCTION BASED ON PAST REPLAYS
20220342672 · 2022-10-27 ·

Rescheduling a load instruction based on past replays is disclosed. A load replay predictor of a processor device determines, at a first time, that a load instruction is scheduled to be executed by a load store unit to load data from a memory location. The load replay predictor accesses load replay data associated with a previous replay of the load instruction and, based on the load replay data, causes the load instruction to be rescheduled.
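A minimal model of such a predictor is sketched below. It assumes the load replay data is a per-instruction delay, keyed by the load's program counter, which is one possible form; the class and method names are hypothetical.

```python
class LoadReplayPredictor:
    """Delays a load by the amount its previous replay needed (toy model)."""

    def __init__(self):
        self.replay_data = {}  # load PC -> extra cycles needed last time

    def record_replay(self, pc, delay_cycles):
        # Called when a load had to be replayed after executing too early.
        self.replay_data[pc] = delay_cycles

    def schedule(self, pc, scheduled_cycle):
        # Reschedule the load later if it was replayed in the past.
        return scheduled_cycle + self.replay_data.get(pc, 0)
```

A load with no replay history keeps its original slot, while a load that previously replayed is pushed back so that it is less likely to execute before its data is ready.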

CHECKER AND CHECKING METHOD FOR PROCESSOR CIRCUIT
20230078985 · 2023-03-16 ·

The present disclosure provides a checker and a checking method for a processor circuit. The checking method includes: determining whether a data cache sends a data refill request under a branch prediction executing status for obtaining a first result; determining whether data requested by the data refill request is written into a register and calculated under the branch prediction executing status for obtaining a second result; and determining whether the processor circuit has a vulnerability according to the first result and the second result.
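The two-step check could be sketched over a simulation trace as shown below. The abstract does not say how the two results are combined, so this sketch assumes the circuit is flagged only when both are true; the trace format and event names are likewise hypothetical.

```python
def check_for_vulnerability(trace):
    """trace: list of (event_name, under_branch_prediction) tuples."""
    speculative_events = {event for event, is_spec in trace if is_spec}
    # First result: did the data cache send a refill request speculatively?
    first_result = "dcache_refill_request" in speculative_events
    # Second result: was the refilled data written to a register and used
    # in a calculation while still under branch prediction?
    second_result = "refill_data_written_and_used" in speculative_events
    # Assumption: a vulnerability is reported only when both results hold.
    return first_result and second_result
```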

PROGRAM FLOW PREDICTION FOR LOOPS
20230130323 · 2023-04-27 ·

Instruction processing circuitry comprises fetch circuitry to fetch instructions for execution; instruction decoder circuitry to decode fetched instructions; execution circuitry to execute decoded instructions; and program flow prediction circuitry to predict a next instruction to be fetched; in which the instruction decoder circuitry is configured to decode a loop control instruction in respect of a given program loop and to derive information from the loop control instruction for use by the program flow prediction circuitry to predict program flow for one or more iterations of the given program loop.
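As an illustration, suppose the information derived from the loop control instruction is the iteration count. The sketch below then predicts the fetch address that follows the loop's last instruction on each iteration. The iteration count, the 4-byte instruction size, and the function name are assumptions, since the abstract does not state what information is derived.

```python
INSTR_SIZE = 4  # bytes per instruction (assumed fixed-width ISA)


def predict_loop_flow(loop_start, loop_end, iterations):
    """Predict the next fetch address after loop_end for each iteration."""
    predictions = []
    for i in range(iterations):
        if i < iterations - 1:
            predictions.append(loop_start)             # branch back to the top
        else:
            predictions.append(loop_end + INSTR_SIZE)  # fall out of the loop
    return predictions
```

With a known iteration count, the final exit is predicted correctly, whereas a predictor that only learns from branch history would typically mispredict it.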