G06F9/3863

Determining a restart point in out-of-order execution

There is provided a data processing apparatus comprising decode circuitry responsive to receipt of a block of instructions to generate control signals indicative of each of the block of instructions, and to analyse the block of instructions to detect a potential hazard instruction. The data processing apparatus is provided with decode circuitry to encode information indicative of a clean restart point into the control signals associated with the potential hazard instruction. The data processing apparatus is provided with data processing circuity to perform out-of-order execution of at least some of the block of instructions, and control circuitry responsive to a determination, at execution of the potential hazard instruction, that data values used as operands for the potential hazard instruction have been modified by out-of-order execution of a subsequent instruction, to restart execution from the clean restart point and to flush held data values from the data processing circuitry.

System and Method for Instruction Unwinding in an Out-of-Order Processor
20230153113 · 2023-05-18 ·

A system and corresponding method unwind instructions in an out-of-order (OoO) processor. The system comprises a mapper. In response to a restart event causing at least one instruction to be unwound, the mapper restores a present integer mapper state and present floating-point (FP) mapper state, used for mapping instructions, to a former integer mapper state and former FP mapper state, respectively. The mapper stores integer snapshots and FP snapshots of the present integer and FP mapper state, respectively, to expedite restoration to the former integer and FP mapper state, respectively. Access to the FP snapshots is blocked, intermittently, as a function of at least one FP present indicator used by the mapper to record presence of FP registers used as destinations in the instructions. Blocking the access, intermittently, improves power efficiency of the OoO processor.

SHADOW CACHE FOR SECURING CONDITIONAL SPECULATIVE INSTRUCTION EXECUTION
20220391212 · 2022-12-08 ·

A computing device, having: a processor; memory; a first cache coupled between the memory and the processor; and a second cache coupled between the memory and the processor. During speculative execution of one or more instructions, effects of the speculative execution are contained within the second cache.

PROGRAMMABLE EVENT TESTING
20230022869 · 2023-01-26 ·

A method includes executing software code comprising a plurality of execute packets; responsive to an execute packet of the software code being executed by a data processor core, advancing a value of a test counter register; and responsive to the value of the test counter register being equal to a terminal value, triggering an event to be handled by the software code.

SYSTEMS AND METHODS FOR POLICY EXECUTION PROCESSING

A system and method of processing instructions may comprise an application processing domain (APD) and a metadata processing domain (MTD). The APD may comprise an application processor executing instructions and providing related information to the MTD. The MTD may comprise a tag processing unit (TPU) having a cache of policy-based rules enforced by the MTD. The TPU may determine, based on policies being enforced and metadata tags and operands associated with the instructions, that the instructions are allowed to execute (i.e., are valid). The TPU may write, if the instructions are valid, the metadata tags to a queue. The queue may (i) receive operation output information from the application processing domain, (ii) receive, from the TPU, the metadata tags, (iii) output, responsive to receiving the metadata tags, resulting information indicative of the operation output information and the metadata tags; and (iv) permit the resulting information to be written to memory.

COMPUTE OPTIMIZATIONS FOR LOW PRECISION MACHINE LEARNING OPERATIONS

One embodiment provides a multi-chip module accelerator usable to execute tensor data processing operations a multi-chip module. The multi-chip module may include a memory stack including multiple memory dies and parallel processor circuitry communicatively coupled to the memory stack. The parallel processor circuitry may include multiprocessor cores to execute matrix multiplication and accumulate operations. The matrix multiplication and accumulate operations may include floating-point operations that are configurable to include two-dimensional matrix multiply and accumulate operations involving inputs that have differing floating-point precisions. The floating-point operations may include a first operation at a first precision and a second operation at a second precision. The first operation may include a multiply having at least one 16-bit floating-point input and the second operation may include an accumulate having a 32-bit floating-point input.

Evicting and restoring information using a single port of a logical register mapper and history buffer in a microprocessor comprising multiple main register file entries mapped to one accumulator register file entry

A computer system, processor, programming instructions and/or method of processing data that includes a main register file having a plurality of entries for storing data; an accumulator register file having a plurality of entries for storing data wherein multiple main register file entries are mapped to one accumulator register file entry in the at least one accumulator register file; a logical register mapper to track and map logical registers to main register file entries, and a history buffer. Processing wide data width instructions includes evicting and restoring information from a single primary entry in the logical register mapper through a single read or write port in the logical register mapper without evicting or restoring the remaining other multiple main register file entries mapped in the accumulator register.

Compute optimizations for low precision machine learning operations

Embodiments described herein provide a graphics processor that can perform a variety of mixed and multiple precision instructions and operations. One embodiment provides a streaming multiprocessor that can concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for each of multiple instructions. The streaming multiprocessor can perform concurrent integer and floating-point operations and includes a mixed precision core to perform operations at multiple or mixed precisions and dynamic ranges.

POWER EFFICIENT MULTI-BIT STORAGE SYSTEM

Disclosed herein are embodiments related to a power efficient multi-bit storage system. In one configuration, the multi-bit storage system includes a first storage circuit, a second storage circuit, a prediction circuit, and a clock gating circuit. In one aspect, the first storage circuit updates a first output bit according to a first input bit, in response to a trigger signal, and the second storage circuit updates a second output bit according to a second input bit, in response to the trigger signal. In one aspect, the prediction circuit generates a trigger enable signal indicating whether at least one of the first output bit or the second output bit is predicted to change a state. In one aspect, the clock gating circuit generates the trigger signal based on the trigger enable signal.

Shadow cache for securing conditional speculative instruction execution
11422820 · 2022-08-23 · ·

A computing device, having: a processor; memory; a first cache coupled between the memory and the processor; and a second cache coupled between the memory and the processor. During speculative execution of one or more instructions, effects of the speculative execution are contained within the second cache.