G06F9/3834

DECOUPLED ACCESS-EXECUTE PROCESSING
20220391214 · 2022-12-08

An apparatus comprises first instruction execution circuitry, second instruction execution circuitry, and a decoupled access buffer. Instructions of an ordered sequence of instructions are issued to one of the first and second instruction execution circuitry for execution in dependence on whether the instruction has a first type label or a second type label. An instruction with the first type label is an access-related instruction which determines at least one characteristic of a load operation to retrieve a data value from a memory address. Instruction execution by the first instruction execution circuitry of instructions having the first type label is prioritised over instruction execution by the second instruction execution circuitry of instructions having the second type label. Data values retrieved from memory as a result of execution of the first type instructions are stored in the decoupled access buffer.
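The labeled two-path issue scheme can be modeled roughly as follows. This is a minimal sketch with assumed structure and names, not the patented design: access-labeled instructions go to one path and run first, and the data they load is parked in a decoupled access buffer for the execute path to consume later.

```python
# Illustrative model: route instructions by type label, prioritize the
# access path, and hand loaded values over through a decoupled buffer.
from collections import deque

ACCESS, EXECUTE = "access", "execute"

def issue(instructions):
    access_q, execute_q = deque(), deque()
    for insn in instructions:
        (access_q if insn["label"] == ACCESS else execute_q).append(insn)

    decoupled_access_buffer = deque()
    # Access-labeled instructions run first so their loads complete early.
    while access_q:
        insn = access_q.popleft()
        decoupled_access_buffer.append(("loaded", insn["addr"]))

    results = []
    while execute_q:
        insn = execute_q.popleft()
        # Execute-side instructions pick up operands from the buffer.
        results.append((insn["op"], decoupled_access_buffer.popleft()))
    return results
```

Prioritizing the access path this way lets load latency overlap with the execute path's backlog instead of stalling it.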

MICROPROCESSOR AND METHOD FOR ISSUING LOAD/STORE INSTRUCTION
20220382547 · 2022-12-01

A microprocessor and a method for issuing a load/store instruction are introduced. The microprocessor includes a decode/issue unit, a load/store queue, a scoreboard, and a load/store unit. The scoreboard includes a plurality of scoreboard entries, each of which includes an unknown bit value and a count value that are set when instructions are issued. The decode/issue unit checks for WAR, WAW, and RAW data dependencies against the scoreboard and dispatches the load/store instructions to the load/store queue together with the recorded scoreboard values. The load/store queue is configured to resolve the data dependencies and dispatch the load/store instructions to the load/store unit for execution.
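The scoreboard check can be sketched as below. This is a hypothetical model with illustrative field names: each register's entry carries an "unknown" bit and a pending-write count set at issue, and a pending-read count is added here so WAR can be modeled alongside RAW and WAW.

```python
# Hypothetical scoreboard: one entry per architectural register.

def hazards(scoreboard, src_regs, dst_reg):
    """Return the set of data hazards a new load/store would incur."""
    found = set()
    for r in src_regs:                      # RAW: a source has pending writes
        if scoreboard[r]["unknown"] or scoreboard[r]["count"] > 0:
            found.add("RAW")
    dst = scoreboard[dst_reg]
    if dst["unknown"] or dst["count"] > 0:  # WAW: destination has pending writes
        found.add("WAW")
    if dst["readers"] > 0:                  # WAR: destination has pending reads
        found.add("WAR")
    return found

sb = {r: {"unknown": False, "count": 0, "readers": 0} for r in range(8)}
sb[1]["count"] = 1   # r1 has one in-flight producer
```

An instruction reading r1 now sees a RAW hazard (`hazards(sb, [1, 2], 3)` returns `{"RAW"}`), and in this model the load/store queue would hold it until the count drains to zero.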

Implementation of load acquire/store release instructions using load/store operation with DMB operation

A system and method are provided for simplifying load acquire and store release semantics that are used in reduced instruction set computing (RISC). Translating the semantics into micro-operations, or low-level instructions used to implement complex machine instructions, can avoid having to implement complicated new memory operations. Using one or more data memory barrier operations in conjunction with load and store operations can provide sufficient ordering as a data memory barrier ensures that prior instructions are performed and completed before subsequent instructions are executed.
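The translation described above can be sketched as a simple micro-op expansion. The micro-op encoding here is invented for illustration: acquire/release accesses become ordinary load/store micro-ops fenced by a data memory barrier (DMB).

```python
# Sketch: expand acquire/release semantics into load/store + DMB micro-ops.

def expand(insn):
    op, addr = insn
    if op == "load_acquire":
        # Barrier after the load: later accesses cannot be hoisted above it.
        return [("load", addr), ("dmb", None)]
    if op == "store_release":
        # Barrier before the store: earlier accesses complete before it.
        return [("dmb", None), ("store", addr)]
    return [insn]   # plain loads/stores pass through unchanged
```

Placing the barrier after the acquire and before the release is what gives the sufficient ordering the abstract describes, at the cost of the DMB fencing more than strictly necessary.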

APPARATUS AND METHOD FOR IDENTIFYING AND PRIORITIZING CERTAIN INSTRUCTIONS IN A MICROPROCESSOR INSTRUCTION PIPELINE
20220374237 · 2022-11-24

A microprocessor improves Memory Level Parallelism (MLP) with minimal added complexity and without requiring segregated storage or management of instructions, by marking memory instructions and related instructions as urgent, and dispatching marked and unmarked instructions into common queuing circuitry for scheduled execution within scheduling circuitry that is configured to prioritize the execution of marked instructions. Instruction marking may be limited to the span of the renaming stage or may be extended to the span of the reorder buffer for additional gains in MLP.
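The common-queue-with-priority idea can be modeled as below. This is an illustrative sketch, not the patented circuit: marked and unmarked instructions share one queue, and the scheduler simply prefers marked ("urgent") entries, falling back to program order; a real scheduler would of course pick only among ready instructions.

```python
# Illustrative model: one shared queue, urgency-first selection.
import heapq

def schedule(instructions):
    # Lower tuple sorts first: urgent (0) before non-urgent (1), then by age.
    heap = [(0 if insn["urgent"] else 1, seq, insn["op"])
            for seq, insn in enumerate(instructions)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

program = [{"op": "add",  "urgent": False},
           {"op": "load", "urgent": True},   # memory instruction, marked
           {"op": "mul",  "urgent": False}]
```

Because marking is just a bit carried through common queuing circuitry, no segregated storage is needed, which is the complexity saving the abstract claims.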

ON THE FLY CONFIGURATION OF A PROCESSING CIRCUIT
20220374246 · 2022-11-24

A method for on-the-fly updating of a processing circuit includes monitoring, by multiple coroutines and during a monitoring period, a progress of multiple suspend-update-resume sequences executed by the processing circuit, wherein at least some of the multiple suspend-update-resume sequences partially overlap and are not mutually synchronized, and wherein each suspend-update-resume sequence comprises on-the-fly updates; and determining, by a merged coroutine, timings of the multiple suspend-update-resume sequences, wherein the determining comprises performing multiple calculation iterations, wherein a calculation iteration of the multiple calculation iterations comprises calculating, in an iterative manner, a timing of a next suspend-update-resume sequence to be executed out of the multiple suspend-update-resume sequences, and wherein the calculating is responsive to timing offsets between different suspend-update-resume sequences.

Managing load and store instructions for memory barrier handling

A front-end portion of a pipeline includes a stage that speculatively issues at least some instructions out-of-order. A back-end portion of the pipeline includes one or more stages that access a processor memory system. In each of the front-end and the back-end, execution of instructions is managed based on information available in that portion of the pipeline. Managing execution of a first memory barrier instruction includes preventing speculative out-of-order issuance of store instructions. The back-end control circuitry provides information accessible to the front-end control circuitry indicating that one or more particular memory instructions have completed handling by the processor memory system. The front-end control circuitry identifies one or more load instructions that were issued before the first memory barrier instruction was issued and are ordered after the first memory barrier instruction, and causes at least one of the identified load instructions to be reissued after the first memory barrier instruction has been issued.
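The front-end fix-up step can be sketched as a filter over the instruction window. This is an assumed model, not the patented circuitry: it finds loads that issued speculatively before the barrier issued but are ordered after it in program order, so they can be reissued. Integer timestamps stand in for issue time, which is a simplification.

```python
# Illustrative filter: which loads must be reissued after the barrier issues?

def loads_to_reissue(window, barrier_issue_time, barrier_seq):
    return [i["seq"] for i in window
            if i["kind"] == "load"
            and i["seq"] > barrier_seq                # ordered after barrier
            and i["issued_at"] < barrier_issue_time]  # but issued before it

window = [{"kind": "load",  "seq": 12, "issued_at": 9},
          {"kind": "load",  "seq": 14, "issued_at": 13},
          {"kind": "store", "seq": 15, "issued_at": 15}]
```

With a barrier at sequence 11 that issued at time 10, only the load at sequence 12 qualifies: it jumped ahead of the barrier, so its speculative result may violate the ordering the barrier enforces.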

PROCESSOR OVERRIDING OF A FALSE LOAD-HIT-STORE DETECTION
20230056077 · 2023-02-23

A method for operation of a processor core is provided. A rejected first load instruction is received that has been rejected due to a false load-hit-store detection against a first store instruction. A warning label is generated on a basis of the false load-hit-store detection. The warning label is added to the received first load instruction to create a labeled first load instruction. The labeled first load instruction is issued such that the warning label causes the labeled first load instruction to bypass the first store instruction in the store reorder queue and thereby avoid another false load-hit-store detection against the first store instruction. A computer system and a processor core configured to operate according to the method are also disclosed herein.
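The reject/label/reissue flow can be sketched as follows, with invented structures and a deliberately conservative partial-address compare standing in for whatever produces the false detection: once rejected, the load carries a warning label that lets it bypass that one store in the store reorder queue on reissue.

```python
# Illustrative model of false load-hit-store rejection and labeled bypass.

def partial_match(load, store):
    # Conservative partial-address compare that can falsely report a hit.
    return (load["addr"] & 0xFFF) == (store["addr"] & 0xFFF)

def execute_load(load, store_reorder_queue):
    for store in store_reorder_queue:
        if store["id"] == load.get("bypass"):  # warning label: skip this store
            continue
        if partial_match(load, store):
            return ("rejected", store["id"])
    return ("done", None)

srq = [{"id": 7, "addr": 0x1100}]
load = {"addr": 0x2100}                        # different line, same low bits
status, sid = execute_load(load, srq)          # false load-hit-store
if status == "rejected":
    load["bypass"] = sid                       # attach the warning label
status, _ = execute_load(load, srq)            # reissue bypasses store 7
```

The label is scoped to the one store that triggered the false hit, so real load-hit-store cases against other stores are still detected.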

Arithmetic processing apparatus and control method using ordering property
11500639 · 2022-11-15

An arithmetic processing apparatus includes a memory, a first processor coupled to the memory, and a second processor coupled to the memory. The first processor is configured to consecutively issue a plurality of load instructions for reading respective data with respect to the memory. The first processor is configured to determine whether an ordering property is guaranteed, based on values included in the data loaded from the memory. The second processor is configured to issue a store instruction during an execution of the plurality of load instructions with respect to the memory.

PHYSICAL ADDRESS PROXY REUSE MANAGEMENT

Each load/store queue entry holds a load/store physical address proxy (PAP) for use as a proxy for a load/store physical memory line address (PMLA). The load/store PAP comprises a set index and a way that uniquely identifies an L2 cache entry holding a memory line at the load/store PMLA when an L1 cache provides the load/store PAP during the load/store instruction execution. The microprocessor removes a line at a removal PMLA from an L2 entry, forms a removal PAP as a proxy for the removal PMLA that comprises a set index and a way, snoops the load/store queue with the removal PAP to determine whether the removal PAP is being used as a proxy for the removal PMLA, fills the removed entry with a line at a fill PMLA, and prevents the removal PAP from being used as a proxy for the removal PMLA and the fill PMLA concurrently.
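PAP formation and the queue snoop can be sketched as below. The L2 geometry (64-byte lines, 1024 sets, 8 ways) is assumed purely for illustration: the set index plus way uniquely name the L2 entry holding the line, so the pair can stand in for the full physical memory line address.

```python
# Illustrative PAP arithmetic under an assumed L2 geometry.
LINE_BITS, SET_BITS = 6, 10   # 64 B lines, 1024 sets

def make_pap(pmla, way):
    # Set index + way uniquely identify one L2 entry.
    set_index = (pmla >> LINE_BITS) & ((1 << SET_BITS) - 1)
    return (set_index, way)

def snoop(load_store_queue, removal_pap):
    """True if any queue entry still uses this PAP as a proxy."""
    return any(entry["pap"] == removal_pap for entry in load_store_queue)

pap = make_pap(pmla=0x43F80, way=3)
lsq = [{"pap": pap}]
# Before the evicted L2 entry is refilled, the queue is snooped: while this
# returns True, the same PAP must not also stand for the fill line.
in_use = snoop(lsq, pap)
```

The snoop is what enforces the abstract's final condition: a (set, way) pair is never allowed to be a proxy for the removal line and the fill line at the same time.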

UNFORWARDABLE LOAD INSTRUCTION RE-EXECUTION ELIGIBILITY BASED ON CACHE UPDATE BY IDENTIFIED STORE INSTRUCTION
20220358040 · 2022-11-10

A microprocessor includes a cache memory, a store queue, and a load/store unit. Each entry of the store queue holds store data associated with a store instruction. During execution of a load instruction, the load/store unit determines that an entry of the store queue holds store data that includes some but not all bytes of the load data requested by the load instruction, and cancels execution of the load instruction in response to the determination. It then writes, to an entry of a structure from which the load instruction is subsequently issuable for re-execution, an identifier of a store instruction that is older in program order than the load instruction and an indication that the load instruction is not eligible to re-execute until the identified older store instruction updates the cache memory with its store data.
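The partial-overlap check at the heart of this can be sketched as byte-range arithmetic (illustrative, not the patented logic): a store queue entry that supplies some but not all requested bytes makes the load unforwardable, so it is cancelled and held until that store updates the cache.

```python
# Illustrative forwarding decision against the store queue.

def covered_bytes(load, store):
    lo = max(load["addr"], store["addr"])
    hi = min(load["addr"] + load["size"], store["addr"] + store["size"])
    return max(0, hi - lo)

def try_forward(load, store_queue):
    for store in store_queue:          # assumed: nearest older store first
        c = covered_bytes(load, store)
        if c == load["size"]:
            return ("forwarded", store["id"])
        if c > 0:                      # partial overlap: cannot forward
            return ("wait_for_cache_update", store["id"])
    return ("from_cache", None)

sq = [{"id": 5, "addr": 0x100, "size": 4}]
```

A 4-byte load at `0x102` overlaps only two of the four bytes of store 5, so it is parked on store 5; a load at `0x100` is fully covered and forwards; a load at `0x200` misses the queue and reads the cache.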