G06F9/3856

PARTITION AND ISOLATION OF A PROCESSING-IN-MEMORY (PIM) DEVICE

An apparatus that manages multi-process execution in a processing-in-memory (“PIM”) device includes a gatekeeper configured to: receive an identification of one or more registered PIM processes; receive, from a process, a memory request that includes a PIM command; if the requesting process is a registered PIM process and another registered PIM process is active on the PIM device, perform a context switch of PIM state between the registered PIM processes; and issue the PIM command of the requesting process to the PIM device.

Multi-Cycle Scheduler with Speculative Picking of Micro-Operations
20230195517 · 2023-06-22 ·

A multi-cycle scheduler for a processor includes early wake circuitry, late wake circuitry, and picker circuitry. In a first cycle of a clock, the early wake circuitry speculatively identifies child micro-operations as ready whose dependencies are satisfied by a set of ready parent micro-operations. In a second cycle of the clock, the picker circuitry picks at least one of the child micro-operations identified as ready for issue to execution circuitry. In addition, the late wake circuitry blocks from issue at least one picked child micro-operation speculatively identified as ready upon determining that a respective parent micro-operation did not issue to execution circuitry.

Generation and use of memory access instruction order encodings

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.

Scheduling resuming of ready to run virtual processors in a distributed system

Dynamic scheduling is disclosed. A plurality of physical nodes is included in a computer system. Each node includes a plurality of processors. Each processor includes a plurality of hyperthreads. In response to receiving an indication of an event occurring, a search is performed for a queue in a set of queues on which to place a virtual processor that had been waiting on the event. Queues in the set of queues correspond to hyperthreads in a physical node in the plurality of physical nodes. The queues in the set of queues are visited according to a predetermined traversal order.

Instruction and Logic for Total Store Elimination
20170351516 · 2017-12-07 ·

A processor includes a front end including circuitry to decode instructions from an instruction stream, a data cache unit including circuitry to cache data for the processor, and a binary translator. The binary translator includes circuitry to identify a redundant store in the instruction stream, mark the start and end of a region of the instruction stream with the redundant store, remove the redundant store, and store an amended instruction stream with the redundant store removed.

OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING DATAPATH STEERING

Operation of a multi-slice processor implementing datapath steering, where the multi-slice processor includes a plurality of execution slices. Operation of such a multi-slice processor includes: identifying, from a set of instructions, a second instruction that is dependent upon a first instruction in the set of instructions; and responsive to the second instruction being dependent upon the first instruction in the set of instructions, issuing each of the instructions in the set of instructions to a particular set of execution slices configured with bypass logic between execution slices that reduces execution latencies between dependent instructions.

ECC SCRUBBING IN A MULTI-SLICE MICROPROCESSOR

Techniques for error correction in a processor include detecting an error in first data stored in a register. The method also includes generating an instruction to read the first data stored in the register, where the register is both a source register and a destination register of the instruction. The method further includes transmitting the first data and error correcting code data to an execution unit, where the first data and error correcting code data bypasses an issue queue. The method also includes decoding the instruction and correcting the error to generate corrected data and writing the corrected data to the destination register.

APPARATUS AND METHOD WITH PREDICTION FOR LOAD OPERATION
20230185573 · 2023-06-15 ·

An apparatus has processing circuitry, load tracking circuitry and load prediction circuitry. It is determined whether tracking information indicates that there is a risk of target data, corresponding to an address of a speculatively-issued load operation which is speculatively issued (bypassing an older operation) based on a prediction determined by the load prediction circuitry, having changed between the target data being loaded for the speculatively-issued load operation and data being loaded for a given older load operation bypassed by the speculatively-issued load operation. If so, independent of whether the addresses of the speculatively-issued load operation and the given older load operation correspond, at least the speculatively-issued load operation is reissued, even when the prediction is correct. This protects against ordering violations.

METHOD AND APPARATUS FOR REORDERING IN A NON-UNIFORM COMPUTE DEVICE
20170344367 · 2017-11-30 · ·

A data processing apparatus includes a multi-level memory system, one or more first processing unit coupled to the memory system at a first level and one or more second processing units each coupled to the memory system at a second level. A first reorder buffer maintains data order during execution of instructions by the first and second processing units and a second reorder buffer maintains data order during execution of the instructions by an associated second processing unit. An entry in the first reorder buffer is configured, dependent upon an indicator bit, as an entry for a single instruction or a pointer to an entry in the second reorder buffer. An entry in the second reorder buffer includes instruction block start and end addresses and indicators of input and output register. Instructions are released to a processing unit when all inputs, as indicated by the reorder buffers, are available.

PROCESSOR WITH EFFICIENT REORDER BUFFER (ROB) MANAGEMENT
20170344374 · 2017-11-30 ·

A method includes, in a pipeline of a processor, writing instructions of a single software thread that are pending for execution into a reorder buffer (ROB) in accordance with a single write position, and incrementing the single write position to point to a location in the ROB for a next instruction to be written. The instructions, which were written in accordance with the single write position, are removed from first and second different locations in the ROB, and the first and second locations are incremented.