Patent classifications
G06F9/3863
Method and apparatus for renaming source operands of instructions
A renaming unit configured to rename source operands of instructions in a group. A renaming register maintains architectural to physical register mappings. Architectural to physical register mappings propagate from the renaming register through a chain of update units (U) over bus lines denoted with the architectural registers 0 to L. Update units (U) sequentially, in program order, insert physical register identifiers PR(i) allocated to instructions I(i) with destination operands DOP(i) on bus lines denoted with the destination operands DOP(i). Source operands of an instruction I(i) may be renamed to physical register identifiers after physical register identifiers allocated to instructions older than I(i) are sequentially, in program order, inserted on the bus lines, but before physical register identifiers allocated to I(i) and younger instructions are inserted on the bus lines. A source operand SOP(i) is renamed to a physical register identifier that propagates on a bus line denoted with SOP(i).
EVICTING AND RESTORING INFORMATION USING A SINGLE PORT OF A LOGICAL REGISTER MAPPER AND HISTORY BUFFER IN A MICROPROCESSOR COMPRISING MULTIPLE MAIN REGISTER FILE ENTRIES MAPPED TO ONE ACCUMULATOR REGISTER FILE ENTRY
A computer system, processor, programming instructions and/or method of processing data that includes a main register file having a plurality of entries for storing data; an accumulator register file having a plurality of entries for storing data wherein multiple main register file entries are mapped to one accumulator register file entry in the at least one accumulator register file; a logical register mapper to track and map logical registers to main register file entries, and a history buffer. Processing wide data width instructions includes evicting and restoring information from a single primary entry in the logical register mapper through a single read or write port in the logical register mapper without evicting or restoring the remaining other multiple main register file entries mapped in the accumulator register.
System and method for instruction unwinding in an out-of-order processor
A system and corresponding method unwind instructions in an out-of-order (OoO) processor. The system comprises a mapper. In response to a restart event causing at least one instruction to be unwound, the mapper restores a present integer mapper state and present floating-point (FP) mapper state, used for mapping instructions, to a former integer mapper state and former FP mapper state, respectively. The mapper stores integer snapshots and FP snapshots of the present integer and FP mapper state, respectively, to expedite restoration to the former integer and FP mapper state, respectively. Access to the FP snapshots is blocked, intermittently, as a function of at least one FP present indicator used by the mapper to record presence of FP registers used as destinations in the instructions. Blocking the access, intermittently, improves power efficiency of the OoO processor.
User mode event handling
A method includes asserting a field of an event flag mask register configured to inhibit an event handler. The method also includes, responsive to an event that corresponds to the field of the event flag mask register being triggered: asserting a field of an event flag register associated with the event; and based the field in the event flag register being asserted, taking an action by a task being executed by the data processor core.
TECHNIQUES FOR COMBINING OPERATIONS
Apparatuses, systems, and techniques to combine operations. In at least one embodiment, a processor causes two or more dependent reduction operations to be combined into a software kernel.
TECHNIQUES FOR RECOVERING FROM ERRORS WHEN EXECUTING SOFTWARE APPLICATIONS ON PARALLEL PROCESSORS
In various embodiments, a software program uses hardware features of a parallel processor to checkpoint a context associated with an execution of a software application on the parallel processor. The software program uses a preemption feature of the parallel processor to cause the parallel processor to stop executing instructions in accordance with the context. The software program then causes the parallel processor to collect state data associated with the context. After generating a checkpoint based on the state data, the software program causes the parallel processor to resume executing instructions in accordance with the context.
Methods and systems for utilizing a master-shadow physical register file based on verified activation
A processor in a data processing system includes a master-shadow physical register file and a renaming unit. The master-shadow physical register file has a master storage coupled to shadow storage. The renaming unit is coupled to the master-shadow physical register file. Based on an occurrence of shadow transfer activation conditions verified by the renaming unit, data in the master storage is transferred from the master storage to the shadow storage for storage. Data is transferred from the shadow storage back to the master storage based on the occurrence of a shadow-to-master transfer event, which includes, for example, a flush of the master storage by the processor.
COMPUTE OPTIMIZATIONS FOR LOW PRECISION MACHINE LEARNING OPERATIONS
One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.
Method and system for instruction block to execution unit grouping
A method for emulating a guest centralized flag architecture by using a native distributed flag architecture. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprise two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and using a distributed flag architecture to emulate a centralized flag architecture for the emulation of guest instruction execution.
Unified register file for supporting speculative architectural states
A method for supporting architecture speculation in an out of order processor is disclosed. The method comprises fetching two threads into the processor, wherein a first thread executes in a speculative state and a second thread executes in a non-speculative state. The method also comprises enabling a speculative scope for an execution of the first thread and a non-speculative scope for an execution of the second thread in an architecture of the processor, wherein the speculative scope and the non-speculative scope can both be fetched into the architecture and be present concurrently.