G06F9/3863

COMPUTE OPTIMIZATIONS FOR LOW PRECISION MACHINE LEARNING OPERATIONS

Embodiments described herein provide a graphics processor that can perform a variety of mixed and multiple precision instructions and operations. One embodiment provides a streaming multiprocessor that can concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for each of multiple instructions. The streaming multiprocessor can perform concurrent integer and floating-point operations and includes a mixed precision core to perform operations at multiple or mixed precisions and dynamic ranges.

Systems, methods, and apparatuses for heterogeneous computing

Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.

Steering a history buffer entry to a specific recovery port during speculative flush recovery lookup in a processor

A computer system, processor, and method for processing information is disclosed that includes reading out a plurality of entries in a history buffer prior to initiating a flush recovery process; initiating the flush recovery process; determining which of the history buffer entries read out of the history buffer should be recovered; and sending information associated with the history buffer entries to be recovered to one or more history buffer recovery ports. In one or more embodiments, the history buffer entries are continually read out in response to a processor and history buffer entries read out from the history buffer are directed to a specific history buffer recovery port associated with a mapper of a specific logical register.

Storing a result of a first instruction of an execute packet in a holding register prior to completion of a second instruction of the execute packet

A method includes receiving an execute packet that includes a first instruction and a second instruction and executing the first instruction and the second instruction using a pipeline. Executing the first and second instructions includes storing a result of the first instruction in a holding register; determining whether an event that interrupts execution of the execute packet occurs prior to completion of the executing of the second instruction; and based on the event not occurring, committing the result of the first instruction after completion of the executing of the second instruction.

Programmable event testing

A method includes executing software code comprising a plurality of execute packets; responsive to an execute packet of the software code being executed by a data processor core, advancing a value of a test counter register; and responsive to the value of the test counter register being equal to a terminal value, triggering an event to be handled by the software code.

Power efficient multi-bit storage system

Disclosed herein are embodiments related to a power efficient multi-bit storage system. In one configuration, the multi-bit storage system includes a first storage circuit, a second storage circuit, a prediction circuit, and a clock gating circuit. In one aspect, the first storage circuit updates a first output bit according to a first input bit, in response to a trigger signal, and the second storage circuit updates a second output bit according to a second input bit, in response to the trigger signal. In one aspect, the prediction circuit generates a trigger enable signal indicating whether at least one of the first output bit or the second output bit is predicted to change a state. In one aspect, the clock gating circuit generates the trigger signal based on the trigger enable signal.

RESTORING SPECULATIVE HISTORY USED FOR MAKING SPECULATIVE PREDICTIONS FOR INSTRUCTIONS PROCESSED IN A PROCESSOR EMPLOYING CONTROL INDEPENDENCE TECHNIQUES

Restoring speculative history used for making speculative predictions for instructions processed in a processor. The processor can be configured to speculatively predict an outcome of a condition or predicate of a conditional control instruction before its condition is fully evaluated in execution. Predictions are made by the processor based on a history that is updated based on outcomes of past predictions. If a conditional control instruction is mispredicted in execution, the processor can perform a misprediction recovery by stalling the instruction pipeline, flushing younger instructions in the instruction pipeline back to the mispredicted conditional control instruction, and then re-fetching instructions in the correct instruction flow path for execution. The processor can be configured to restore entries of the speculative history associated with younger control independent (CI) conditional control instructions, so that younger fetched instructions that follow non-re-fetched CI instructions in misprediction recovery will use a more accurate speculative history.

Apparatus and method for controlling allocation of information into a cache storage
11294828 · 2022-04-05 · ·

An apparatus and method are provided for controlling allocation of information into a cache storage. The apparatus has processing circuitry for executing instructions, and for allowing speculative execution of one or more of those instructions. A cache storage is also provided having a plurality of entries to store information for reference by the processing circuitry, and cache control circuitry is used to control the cache storage, the cache control circuitry comprising a speculative allocation tracker having a plurality of tracking entries. The cache control circuitry is responsive to a speculative request associated with the speculative execution, requiring identified information to be allocated into a given entry of the cache storage, to allocate a tracking entry in the speculative allocation tracker for the speculative request before allowing the identified information to be allocated into the given entry of the cache storage. The allocated tracking entry is employed to maintain restore information sufficient to enable the given entry to be restored to an original state that existed prior to the identified information being allocated into the given entry. The cache control circuitry is further responsive to a mis-speculation condition being detected in respect of the speculative request, to employ the restore information maintained in the allocated tracking entry for that speculative request in order to restore the given entry in the cache storage to the original state. Such an approach can provide robust protection against speculation-based cache timing side-channel attacks whilst alleviating the performance and/or power consumption issues associated with known techniques.

Instruction streaming using state migration

A method, system, and/or processor for processing data is disclosed that includes processing a parent stream, detecting a branch instruction in the parent stream, activating an additional child stream, copying the content of a parent mapper copy of the parent stream to an additional child mapper copy, dispatching instructions for the parent stream and the additional child stream, and executing the parent stream and the additional child stream on different execution slices. In an aspect, a first parent mapper copy is associated and used in connection with executing the parent stream and a second different child mapper copy is associated and used in connection with executing the additional child stream. The method in an aspect includes processing one or more streams and/or one or more threads of execution on one or more execution slices.

Coprocessor Context Priority

A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.