G06F9/3838

Fast perfect issue of dependent instructions in a distributed issue queue system

Embodiments for fast perfect issue of dependent instructions in a distributed issue queue system. Producer information of a producer instruction is inserted in a lookup entry in a lookup table, the lookup entry being allocated to a register. It is determined that the register corresponding to the lookup entry is a source for a dependent instruction. Responsive to storing the dependent instruction in an issue queue, the producer information is stored in a back-to-back entry of a back-to-back wakeup table, the back-to-back entry corresponding to the dependent instruction. The producer instruction is issued which causes the producer information of the producer instruction to be sent to the back-to-back wakeup table. It is determined that there is a match between the producer information and the back-to-back entry for the dependent instruction, and the dependent instruction is caused to issue based on the match.

Deterministic execution replay for multicore systems

Techniques are disclosed for interposing on nondeterministic events during multicore virtual machine (VM) execution to capture information that allows for deterministically recreating the nondeterministic events during execution replay of the VM. A method may include reading, by a virtual processor running within a multicore VM instance, an instruction to execute, and, responsive to a determination that the instruction is a nondeterministic instruction, interposing on the nondeterministic instruction execution so as to allow deterministic execution of the nondeterministic instruction during replay execution of the multicore VM instance. Interposing on the nondeterministic instruction execution may include recording a partial barrier event and/or a full barrier event. The nondeterministic instruction may be a read memory access instruction or a write memory access instruction.

Inferring dependencies, requirements, and productions from spreadsheet-based loosely-coupled decision tables
11526782 · 2022-12-13 · ·

A method includes receiving a spreadsheet file representing a plurality of decision tables, wherein the spreadsheet file does not indicate dependencies between non-labeled inputs and non-labeled outputs of the plurality of decision tables. The method further includes, for a first decision table of the plurality of decision tables, identifying, in view of an identifier of a second decision table of the plurality of decision tables, a dependent input that comprises an output of the second decision table of the plurality of decision tables. The method further includes determining, by a processing device, in view of an ordering of columns in the spreadsheet file, remaining inputs and outputs of the first decision table.

DECOUPLED ACCESS-EXECUTE PROCESSING
20220391214 · 2022-12-08 ·

An apparatus comprises first instruction execution circuitry, second instruction execution circuitry, and a decoupled access buffer. Instructions of an ordered sequence of instructions are issued to one of the first and second instruction execution circuitry for execution in dependence on whether the instruction has a first type label or a second type label. An instruction with the first type label is an access-related instruction which determines at least one characteristic of a load operation to retrieve a data value from a memory address. Instruction execution by the first instruction execution circuitry of instructions having the first type label is prioritised over instruction execution by the second instruction execution circuitry of instructions having the second type label. Data values retrieved from memory as a result of execution of the first type instructions are stored in the decoupled access buffer.

MICROPROCESSOR AND METHOD FOR ISSUING LOAD/STORE INSTRUCTION
20220382547 · 2022-12-01 · ·

A microprocessor and a method for issuing a load/store instruction is introduced. The microprocessor includes a decode/issue unit, a load/store queue, a scoreboard, and a load/store unit. The scoreboard includes a plurality of scoreboard entries, in which each scoreboard entry includes an unknown bit value and a count value, wherein the unknown bit value or the count value is set when the instructions are issued. The decode/issue unit checks for WAR, WAW, and RAW data dependencies from the scoreboard, dispatches the load/store instructions to the load/store queue with the recorded scoreboard values. The load/store queue is configured to resolve the data dependencies and dispatches the load/store instructions to the load/store unit for execution.

APPARATUS AND METHOD FOR IMPLEMENTING VECTOR MASK IN VECTOR PROCESSING UNIT
20220382546 · 2022-12-01 · ·

The mask data corresponding to each data element of the issued instruction may be handled by a mask queue, where only the valid mask data are stored to the mask queue. The mask data of multiple vector instructions may be stored in the mask queue. The corresponding mask data may be accessed from the mask queue when the vector instruction(s) is dispatched from the execution queue to the functional unit for execution. In the case of 512-bit wide mask data is needed, the issuing of the vector instruction from the decode/issue unit to the execution queue may be stalled until the mask queue is available. In some embodiments, one mask queue may be dedicated to one execution queue. Alternatively, one mask queue may be shared between two different execution queues. In the disclosure, resources are conserved without dedicating additional storage space for handling mask data of the vector instruction.

DATA PROCESSING SYSTEMS

A data processing system is disclosed that includes one or more processors that can perform producer processes to produce work and consumer processes that can consume work produced by a producer process. The system includes a pool of plural communication resources that may be used for communications between a producer process and a consumer process. The system tracks the usage of communication resources of the pool of communication resources, and allocates a communication resource from the pool of communication resources based on the tracking.

Accelerated operation of a graph streaming processor

Methods, systems and apparatuses for graph processing are disclosed. One graph streaming processor includes a thread manager, wherein the thread manager is operative to dispatch operation of the plurality of threads of a plurality of thread processors before dependencies of the dependent threads have been resolved, maintain a scorecard of operation of the plurality of threads of the plurality of thread processors, and provide an indication to at least one of the plurality of thread processors when a dependency between the at least one of the plurality of threads that a request has or has not been satisfied. Further, a producer thread provides a response to the dependency when the dependency has been satisfied, and each of the plurality of thread processors is operative to provide processing updates to the thread manager, and provide queries to the thread manager upon reaching a dependency.

Processor, device, and method for executing instructions
11593115 · 2023-02-28 · ·

The present disclosure discloses an instruction execution device, a processor including the instruction execution device, a system on chip, and a method for executing a data storage instruction in the processor. The method includes: splitting the data storage instruction into a first split instruction and a second split instruction, wherein the first split instruction is associated with an address operand of the data storage instruction, and the second split instruction is associated with a data operand of the data storage instruction; executing the first split instruction to determine a data storage address corresponding to the address operand; executing the second split instruction to acquire data content corresponding to the data operand; and storing the acquired data content to the determined data storage address in a data storage region. The present disclosure further discloses a corresponding instruction execution device, a processor including the execution device and a system on chip.

PROCESSOR OVERRIDING OF A FALSE LOAD-HIT-STORE DETECTION
20230056077 · 2023-02-23 ·

A method for operation of a processor core is provided. A rejected first load instruction is received that has been rejected due to a false load-hit-store detection against a first store instruction. A warning label is generated on a basis of the false load-hit-store detection. The warning label is added to the received first load instruction to create a labeled first load instruction. The labeled first load instruction is issued such that the warning label causes the labeled first load instruction to bypass the first store instruction in the store reorder queue and thereby avoid another false load-hit-store detection against the first store instruction. A computer system and a processor core configured to operate according to the method are also disclosed herein.