G06F9/3861

SYSTEMS, METHODS, AND APPARATUSES FOR TILE LOAD

Embodiments detailed herein relate to matrix operations, in particular to the loading of a matrix (tile) from memory. For example, support for a load instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand.
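
The strided load described above can be illustrated with a small software model. This is a minimal sketch, not the patented circuitry: memory is modeled as a flat Python list, the stride is counted in elements rather than bytes, and the names `TileConfig` and `tile_load` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TileConfig:
    rows: int  # number of configured rows in the destination tile
    cols: int  # data elements loaded into each row


def tile_load(memory: List[int], base: int, stride: int, cfg: TileConfig) -> List[List[int]]:
    """Load cfg.rows groups of cfg.cols consecutive elements; group r starts at
    base + r * stride (stride counted in elements, an assumption of this sketch)."""
    tile: List[List[int]] = []
    for r in range(cfg.rows):
        start = base + r * stride
        end = start + cfg.cols
        if end > len(memory):
            raise IndexError(f"row {r} would read past the end of memory")
        tile.append(memory[start:end])  # one strided group fills one tile row
    return tile


memory = list(range(20))
print(tile_load(memory, base=2, stride=5, cfg=TileConfig(rows=3, cols=4)))
# [[2, 3, 4, 5], [7, 8, 9, 10], [12, 13, 14, 15]]
```

Each row starts `stride` elements after the previous one, so the elements between groups are skipped.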

METHOD OF NOTIFYING A PROCESS OR PROGRAMMABLE ATOMIC OPERATION TRAPS
20230004524 · 2023-01-05

Disclosed in some examples are methods, systems, programmable atomic units, and machine-readable media that report an exception to the calling processor as a response. That is, when a programmable atomic operation traps, the programmable atomic unit sends a response to the calling processor, which recognizes that the exception has been raised and handles it. Because the calling processor knows which process triggered the exception, the calling processor (e.g., the Operating System running on it) can take appropriate action, such as terminating the calling process. The calling processor may be the same processor as the one executing the programmable atomic transaction, or a different processor (e.g., on a different chiplet).
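
The exception-as-response flow can be sketched in software. This is an illustrative model under stated assumptions: the response format, the `Status` values, and the `CallingProcessor` policy of terminating the offending process are hypothetical, and a Python exception stands in for a hardware trap.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Set


class Status(Enum):
    OK = "ok"
    EXCEPTION = "exception"


@dataclass
class Response:
    status: Status
    value: Optional[int] = None
    reason: str = ""


def programmable_atomic_unit(memory: Dict[int, int], addr: int,
                             op: Callable[[int], int]) -> Response:
    """Apply op to memory[addr]; a trap is returned to the caller as an
    exception response instead of being handled locally."""
    try:
        old = memory[addr]
        memory[addr] = op(old)  # op runs first, so a trap leaves memory unchanged
        return Response(Status.OK, value=old)
    except Exception as exc:  # stand-in for a trap inside the atomic operation
        return Response(Status.EXCEPTION, reason=type(exc).__name__)


class CallingProcessor:
    def __init__(self) -> None:
        self.terminated: Set[int] = set()

    def issue(self, pid: int, memory: Dict[int, int], addr: int,
              op: Callable[[int], int]) -> Optional[int]:
        resp = programmable_atomic_unit(memory, addr, op)
        if resp.status is Status.EXCEPTION:
            # The caller knows which process issued the request, so it can act on it.
            self.terminated.add(pid)
            return None
        return resp.value
```

A successful call returns the old value; a trapping call (for example, a division by zero inside `op`) marks the issuing process as terminated.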

TRACKING EXACT CONVERGENCE TO GUIDE THE RECOVERY PROCESS IN RESPONSE TO A MISPREDICTED BRANCH
20230004397 · 2023-01-05

Processors and methods related to tracking exact convergence to guide the recovery process in response to a mispredicted branch are provided. An example processor includes a pipeline having a frontend and a backend. The processor further includes a state table for maintaining information related to at least a subset of branches corresponding to instructions being processed by the processor. The processor further includes state logic configured to access the state table and track locations of any exact convergence points associated with branches corresponding to the instructions being processed by the processor. The state logic is further configured to identify a first recovery method for recovering from a misprediction associated with a branch if a location of an exact convergence point associated with the branch is determined to be in the frontend of the pipeline, else identify a second recovery method for recovering from the misprediction associated with the branch.
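
The selection logic can be modeled as a table lookup. This sketch assumes a branch is identified by an integer tag and models only the frontend/backend location of the exact convergence point. The two recovery methods are opaque labels, because the abstract does not say what each one does. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Stage(Enum):
    FRONTEND = "frontend"
    BACKEND = "backend"


class Recovery(Enum):
    FIRST = "first"    # selected when the convergence point is in the frontend
    SECOND = "second"  # selected otherwise


@dataclass
class BranchEntry:
    convergence_pc: Optional[int] = None     # exact convergence point, if known
    convergence_stage: Optional[Stage] = None


class StateLogic:
    def __init__(self) -> None:
        self.table: Dict[int, BranchEntry] = {}  # state table keyed by branch tag

    def track(self, tag: int, pc: int, stage: Stage) -> None:
        self.table[tag] = BranchEntry(pc, stage)

    def move_to_backend(self, tag: int) -> None:
        """Record that the convergence point has moved on into the backend."""
        entry = self.table.get(tag)
        if entry is not None:
            entry.convergence_stage = Stage.BACKEND

    def choose_recovery(self, tag: int) -> Recovery:
        entry = self.table.get(tag)
        if entry is not None and entry.convergence_stage is Stage.FRONTEND:
            return Recovery.FIRST
        return Recovery.SECOND
```

Untracked branches fall back to the second method, which matches the abstract's "else" case.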

Inferring future value for speculative branch resolution

Aspects of the invention include determining a first instruction in a processing pipeline, wherein the first instruction includes a compare instruction; determining a second instruction in the processing pipeline, wherein the second instruction includes a conditional branch instruction that relies on the compare instruction; determining a predicted result of the compare instruction; and completing the conditional branch instruction using the predicted result prior to executing the compare instruction.
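
A software sketch of resolving the branch with a predicted compare result follows. The abstract does not specify a predictor, so this sketch assumes a simple last-outcome predictor keyed by the compare's address. It also assumes the compare tests `a < b`. `ComparePredictor` and `resolve_branch_early` are hypothetical names.

```python
from typing import Dict, Tuple


class ComparePredictor:
    """Hypothetical last-outcome predictor for compare results, keyed by compare PC."""

    def __init__(self) -> None:
        self.last: Dict[int, bool] = {}

    def predict(self, cmp_pc: int) -> bool:
        return self.last.get(cmp_pc, False)  # default: predict "false"

    def update(self, cmp_pc: int, actual: bool) -> None:
        self.last[cmp_pc] = actual


def resolve_branch_early(pred: ComparePredictor, cmp_pc: int, a: int, b: int) -> Tuple[bool, bool]:
    """Complete the dependent branch using the predicted compare result, then
    execute the compare. Returns (branch direction used, prediction correct)."""
    predicted = pred.predict(cmp_pc)  # branch completes with this before the compare runs
    actual = a < b                    # compare executes later (assumed "less than")
    pred.update(cmp_pc, actual)
    return predicted, predicted == actual
```

A `False` in the second position is where a real pipeline would have to recover from the early resolution.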

STATEFUL MICROCODE BRANCHING

Stateful microbranch instructions are disclosed, including: generating, based on an instruction, a first set of one or more microinstructions that includes a stateful microbranch instruction; and executing the first set of one or more microinstructions. The stateful microbranch instruction includes an address of the next instruction after the instruction, a branch target address, and one or more microcode attributes.
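
The fields carried by a stateful microbranch can be shown as a small data structure. This is a hypothetical model. The decoder output, the attribute name `"mode"`, and the execution rule (returning the microcode target together with the resume address) are illustrative assumptions, not details from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class MicroOp:
    name: str


@dataclass
class StatefulMicrobranch:
    next_instr_addr: int  # address of the instruction after the originating instruction
    target_addr: int      # microcode branch target address
    attributes: Dict[str, int] = field(default_factory=dict)  # hypothetical attribute names


Uop = Union[MicroOp, StatefulMicrobranch]


def generate(instr_addr: int, instr_len: int, target: int) -> List[Uop]:
    """Hypothetical decoder: expand one instruction into micro-ops ending in a
    stateful microbranch that records where the instruction stream resumes."""
    return [
        MicroOp("setup"),
        StatefulMicrobranch(instr_addr + instr_len, target, {"mode": 0}),
    ]


def execute(uops: List[Uop], log: List[str]) -> Tuple[Optional[int], Optional[int]]:
    """Run micro-ops in order. On reaching a stateful microbranch, return
    (microcode target, resume address) so a sequencer could jump to the target
    while retaining the next-instruction address."""
    for uop in uops:
        if isinstance(uop, StatefulMicrobranch):
            return uop.target_addr, uop.next_instr_addr
        log.append(uop.name)
    return None, None
```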

Processing of data

A method and an associated apparatus are disclosed for processing data by means of an error code whose H-matrix has n columns and m rows. The columns of the H-matrix are pairwise different; the component-by-component XOR sums of adjacent columns of the H-matrix are different from one another and from all columns of the H-matrix; and the component-by-component XOR sums of nonadjacent columns of the H-matrix are different from all columns of the H-matrix and from all component-by-component XOR sums of adjacent columns of the H-matrix.
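
The three conditions on the H-matrix can be checked mechanically. In this sketch, each column of the m-row H-matrix is encoded as an integer whose bit k holds row k, so the component-by-component XOR of two columns is Python's `^`. The function name is hypothetical. Note that the conditions do not require the nonadjacent sums to differ from one another, and the check reflects that.

```python
from itertools import combinations
from typing import List


def satisfies_conditions(cols: List[int]) -> bool:
    """Check the stated column conditions for an H-matrix given as integer-encoded columns."""
    n = len(cols)
    col_set = set(cols)
    if len(col_set) != n:  # all columns must be pairwise different
        return False
    adjacent = [cols[i] ^ cols[i + 1] for i in range(n - 1)]
    adj_set = set(adjacent)
    # Adjacent-column sums: distinct from one another and from every column.
    if len(adj_set) != len(adjacent) or adj_set & col_set:
        return False
    # Nonadjacent-column sums: distinct from every column and every adjacent sum.
    for i, j in combinations(range(n), 2):
        if j - i > 1:
            s = cols[i] ^ cols[j]
            if s in col_set or s in adj_set:
                return False
    return True


print(satisfies_conditions([1, 2, 4, 8]))      # True
print(satisfies_conditions([1, 2, 4, 8, 13]))  # False: 1 ^ 4 == 8 ^ 13 == 5
```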

REMOTE FRONT-DROP FOR RECOVERY AFTER PIPELINE STALL

This disclosure describes techniques for performing a remote front-drop of data for recovery after a pipeline stall. The techniques use a receiver-side dropping strategy that is driven from the sender side. Components of a pipeline determine whether the pipeline is operating within specified latency constraints (e.g., whether it is experiencing a stall). Upon detecting a pipeline stall, the components notify the sending device, which can then determine what action(s) to perform to address the stall. For example, the sending device may instruct one or more components of the pipeline to discard already sent data that has not yet been processed. This allows older data to be dropped from the stalled pipeline while the more recently sent data is kept.
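
The sender-driven, receiver-side drop can be sketched with a queue and a callback. This sketch makes several assumptions. The latency constraint is modeled as a maximum backlog of queued items. The sender's policy is to keep only the `keep_latest` newest items. The class and method names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class Receiver:
    """Pipeline stage holding sent-but-unprocessed items."""

    def __init__(self, max_backlog: int) -> None:
        self.max_backlog = max_backlog  # hypothetical stand-in for the latency constraint
        self.queue: Deque[Tuple[int, str]] = deque()
        self.notify_sender: Optional[Callable[[], None]] = None

    def enqueue(self, seq: int, data: str) -> None:
        self.queue.append((seq, data))
        if len(self.queue) > self.max_backlog and self.notify_sender is not None:
            self.notify_sender()  # stall detected: tell the sending device

    def front_drop(self, keep_from_seq: int) -> int:
        """Discard queued items older than keep_from_seq; return how many were dropped."""
        dropped = 0
        while self.queue and self.queue[0][0] < keep_from_seq:
            self.queue.popleft()
            dropped += 1
        return dropped


class Sender:
    def __init__(self, receiver: Receiver, keep_latest: int) -> None:
        self.receiver = receiver
        self.keep_latest = keep_latest
        self.next_seq = 0
        receiver.notify_sender = self.on_stall

    def send(self, data: str) -> None:
        seq = self.next_seq
        self.next_seq += 1
        self.receiver.enqueue(seq, data)

    def on_stall(self) -> None:
        # Sender-side decision: drop everything except the keep_latest newest items.
        self.receiver.front_drop(self.next_seq - self.keep_latest)
```

With `max_backlog=3` and `keep_latest=2`, sending four items triggers a stall on the fourth, and only sequence numbers 2 and 3 remain queued.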

Queues for inter-pipeline data hazard avoidance

Methods and parallel processing units are described for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard, the primary instruction and the secondary instruction(s) of that hazard are identified as such and are linked by a counter that is used to track the hazard. When a primary instruction is output by the instruction decoder for execution, the value of its associated counter is adjusted to indicate that there is a hazard related to the primary instruction; when the primary instruction has been resolved by one of multiple parallel processing pipelines, the value of that counter is adjusted to indicate that the hazard has been resolved. When a secondary instruction is output by the decoder for execution, it is stalled in a queue associated with the appropriate instruction pipeline if at least one counter associated with the primary instructions on which it depends indicates that there is a hazard related to that primary instruction.
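
The counter-and-queue mechanism can be sketched as follows. This sketch assumes that "adjusting" a counter means incrementing it on issue of a primary instruction and decrementing it on resolution, so zero means "no outstanding hazard". It also assumes that each pipeline's queue drains in FIFO order. `HazardTracker` and its methods are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Tuple


class HazardTracker:
    def __init__(self) -> None:
        self.counters: DefaultDict[int, int] = defaultdict(int)  # counter id -> outstanding primaries
        self.queues: DefaultDict[str, Deque[Tuple[str, List[int]]]] = defaultdict(deque)
        self.issued: List[str] = []  # instructions sent on to their pipelines

    def issue_primary(self, name: str, counter: int) -> None:
        self.counters[counter] += 1  # hazard now outstanding
        self.issued.append(name)

    def resolve_primary(self, counter: int) -> None:
        self.counters[counter] -= 1  # hazard resolved by a pipeline
        self._drain()

    def issue_secondary(self, name: str, pipeline: str, deps: List[int]) -> None:
        self.queues[pipeline].append((name, deps))  # stalls here while any dep is nonzero
        self._drain()

    def _drain(self) -> None:
        # Release queued secondaries, in order, whose counters all show no hazard.
        for q in self.queues.values():
            while q and all(self.counters[c] == 0 for c in q[0][1]):
                self.issued.append(q.popleft()[0])
```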

Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, a second memory location of a second source matrix, and a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and to store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate the amount of progress, completed prior to the interruption, in multiplying the first and second source matrices and storing the corresponding result data to the third memory location.
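
A restartable multiplication with a progress indicator can be sketched in software. This sketch assumes that progress is tracked at the granularity of completed result rows and that an interruption is simulated by a per-call row budget. The function name and its parameters are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def matmul_resumable(a: Matrix, b: Matrix, c: Matrix, progress: int, budget: int) -> int:
    """Compute rows of c = a x b starting at row `progress`, stopping after
    `budget` rows (a simulated interruption). Returns the updated completion
    progress indicator: the number of result rows already stored in c."""
    n, k, m = len(a), len(b), len(b[0])
    row = progress
    while row < n and row - progress < budget:
        c[row] = [sum(a[row][t] * b[t][j] for t in range(k)) for j in range(m)]
        row += 1
    return row
```

Calling it again with the returned indicator resumes where the previous call stopped, so no completed work is repeated.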

Restoring speculative history used for making speculative predictions for instructions processed in a processor employing control independence techniques

Restoring speculative history used for making speculative predictions for instructions processed in a processor. The processor can be configured to speculatively predict an outcome of a condition or predicate of a conditional control instruction before its condition is fully evaluated in execution. Predictions are made by the processor based on a history that is updated based on outcomes of past predictions. If a conditional control instruction is mispredicted in execution, the processor can perform a misprediction recovery by stalling the instruction pipeline, flushing younger instructions in the instruction pipeline back to the mispredicted conditional control instruction, and then re-fetching instructions in the correct instruction flow path for execution. The processor can be configured to restore entries of the speculative history associated with younger control independent (CI) conditional control instructions, so that younger fetched instructions that follow non-re-fetched CI instructions in misprediction recovery will use a more accurate speculative history.
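
One way to picture the history restoration is with a list-based model. This is a simplified sketch. It assumes the speculative history is a list of `(pc, outcome, is_ci)` entries in fetch order, and that the outcomes of younger CI branches stay valid after recovery. Control-dependent entries after the mispredicted branch are discarded and replaced by those on the correct path. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[int, bool, bool]  # (branch pc, predicted/actual outcome, is control independent)


def recover_history(history: List[Entry], mispredict_idx: int, actual: bool,
                    correct_path: Sequence[Entry]) -> List[Entry]:
    """Rebuild the speculative history after a misprediction: keep entries up to
    the mispredicted branch (with its corrected outcome), add the re-fetched
    correct-path branches, then restore the younger CI entries that were not
    re-fetched so later predictions see them."""
    pc, _, is_ci = history[mispredict_idx]
    base = history[:mispredict_idx] + [(pc, actual, is_ci)]
    restored_ci = [e for e in history[mispredict_idx + 1:] if e[2]]
    return base + list(correct_path) + restored_ci
```

In the usage below, the wrong-path branch at `0x30` is discarded while the CI branch at `0x40` keeps its history entry.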