Patent classifications
G06F9/3863
Slice-based allocation history buffer
A multi-slice processor comprising a high-level structure and history buffer. Write backs are no longer associated with the history buffer and the history buffer comprises slices determined by logical register allocation. The history buffer receives a register pointer entry and either releases or restores the entry with functional units comprised in the history buffer.
SYSTEMS, METHODS, AND APPARATUSES FOR HETEROGENEOUS COMPUTING
- Rajesh M. Sankaran ,
- Gilbert Neiger ,
- Narayan Ranganathan ,
- Stephen R. Van Doren ,
- Joseph Nuzman ,
- Niall D. McDonnell ,
- Michael A. O'Hanlon ,
- Lokpraveen B. Mosur ,
- Tracy Garrett Drysdale ,
- Eriko Nurvitadhi ,
- Asit K. Mishra ,
- Ganesh Venkatesh ,
- Deborah T. Marr ,
- Nicholas P. Carter ,
- Jonathan D. Pearce ,
- Edward T. Grochowski ,
- Richard J. Greco ,
- Robert Valentine ,
- Jesus Corbal ,
- Thomas D. Fletcher ,
- Dennis R. Bradford ,
- Dwight P. Manley ,
- Mark J. Charney ,
- Jeffrey J. Cook ,
- Paul Caprioli ,
- Koichi Yamada ,
- Kent D. Glossop ,
- David B. Sheffield
Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.
REDUCING SILENT DATA ERRORS USING A HARDWARE MICRO-LOCKSTEP TECHNIQUE
In one embodiment, an apparatus includes: an instruction fetch circuit to fetch instructions; a decode circuit coupled to the instruction fetch circuit to decode the fetched instructions into micro-operations (pops); a scheduler coupled to the decode circuit to schedule the pops for execution; and an execution circuit coupled to the scheduler, the execution circuit comprising a plurality of execution ports to execute the pops. The scheduler may be configured to: schedule at least some pops of a first type for redundant execution on symmetric execution ports of the plurality of execution ports; and schedule pops of a second type for non-redundant execution on a single execution port of the plurality of execution ports. Other embodiments are described and claimed.
Pluggable trust architecture
A pluggable trust architecture addresses the problem of establishing trust in hardware. The architecture has low impact on system performance and comprises a simple, user-supplied, and pluggable hardware element. The hardware element physically separates the untrusted components of a system from peripheral components that communicate with the external world. The invention only allows results of correct execution of software to be communicated externally.
Compute optimizations for low precision machine learning operations
Embodiments described herein provide a graphics processor that can perform a variety of mixed and multiple precision instructions and operations. One embodiment provides a streaming multiprocessor that can concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for each of multiple instructions. The streaming multiprocessor can perform concurrent integer and floating-point operations and includes a mixed precision core to perform operations at multiple precisions.
Coprocessor context priority
A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.
Replay of partially executed instruction blocks in a processor-based system employing a block-atomic execution model
Replay of partially executed instruction blocks in a processor-based system employing a block-atomic execution model is disclosed. In one aspect, a partial replay controller is provided in a processor(s) of a central processing unit (CPU). If an instruction is detected in the instruction block associated with a potential architectural state modification, or an exception occurs during execution of instructions, the instruction block is re-executed. During re-execution of the instruction block, the partial replay controller is configured to record produced results from load/store instructions. Thus, if an exception occurs during re-execution of the instruction block, previously recorded produced results for the executed load/store instructions before the exception occurred are replayed during re-execution of the instruction block after the exception is resolved. Thus, execution of instructions leading up to side-effect operations in the instruction block can be deterministically repeated with previously produced results, without repeating the side-effects.
Arithmetic processing unit and control method for arithmetic processing unit
An arithmetic processing unit includes an instruction decoder, first to fourth reservation stations, first and second computing units, first and second load-store units, and an allocation unit. The allocation unit, when the execution instruction is a first instruction that is executable in first and second computing units but not executable in first and second load-store units, allocates the first instruction to first or second reservation station based on a first allocation table, and when the execution instruction is a second instruction that is executable in the first and second load-store units but not executable in the first and second computing units, allocates the second instruction to third or fourth reservation station based on a second allocation table.
METHODS AND SYSTEMS FOR UTILIZING A MASTER-SHADOW PHYSICAL REGISTER FILE
A processor in a data processing system includes a master-shadow physical register file and a renaming unit. The master-shadow physical register file has a master storage coupled to shadow storage. The renaming unit is coupled to the master-shadow physical register file. Based on an occurrence of shadow transfer activation conditions verified by the renaming unit, data in the master storage is transferred from the master storage to the shadow storage for storage. Data is transferred from the shadow storage back to the master storage based on the occurrence of a shadow-to-master transfer event, which includes, for example, a flush of the master storage by the processor.
Side channel attack prevention by maintaining architectural state consistency
The present disclosure is directed to systems and methods that maintain consistency between a system architectural state and a microarchitectural state in the system cache circuitry to prevent a side-channel attack from accessing secret information. Speculative execution of one or more instructions by the processor circuitry causes memory management circuitry to transition the cache circuitry from a first microarchitectural state to a second microarchitectural state. The memory management circuitry maintains the cache circuitry in the second microarchitectural state in response to a successful completion and/or retirement of the speculatively executed instruction. The memory management circuitry reverts the cache circuitry from the second microarchitectural state to the first microarchitectural state in response to an unsuccessful completion, flushing, and/or retirement of the speculatively executed instruction.