Patent classifications
G06F9/3863
Optimizing performance for context-dependent instructions
A processor includes a queue for storing instructions processed within the context of a current value of a register field, where for some embodiments the instruction is undefined or defined, depending upon the register field at time of processing. After a write instruction (an instruction that writes to the register field) executes, the queue is searched for any entries that contain instructions that depend upon the executed write instruction. Each such entry stores the value of the register field at the time the instruction in the entry was processed. If such an entry is found in the queue and its stored value of the register field does not match the value that the write instruction wrote to the register field, then the processor flushes the pipeline and restarts at a state so as to correctly execute the instruction.
TECHNOLOGIES FOR EFFICIENT LZ77-BASED DATA DECOMPRESSION
Technologies for data decompression include a computing device that reads a symbol tag byte from an input stream. The computing device determines whether the symbol can be decoded using a fast-path routine, and if not, executes a slow-path routine to decompress the symbol. The slow-path routine may include data-dependent branch instructions that may be unpredictable using branch prediction hardware. For the fast-path routine, the computing device determines a next symbol increment value, a literal increment value, a data length, and an offset based on the tag byte, without executing an unpredictable branch instruction. The computing device sets a source pointer to either literal data or reference data as a function of the tag byte, without executing an unpredictable branch instruction. The computing device may set the source pointer using a conditional move instruction. The computing device copies the data and processes remaining symbols. Other embodiments are described and claimed.
Computer processor employing byte-addressable dedicated memory for operand storage
A computer processor including a first memory structure that operates over multiple cycles to temporarily store operands referenced by at least one instruction. A plurality of functional units performs operations that produce and access operands stored in the first memory structure. A second memory structure is provided, separate from the first memory structure. The second memory structure is configured as a dedicated memory for storage of operands copied from the first memory structure. The second memory structure is organized with a byte-addressable memory space and each operand stored in the second memory structure is accessed by a given byte address into the byte-addressable memory space.
Distributed history buffer flush and restore handling in a parallel slice design
An approach is provided in which a computing system captures content included in a history buffer entry that corresponds to a flush ITAG. The computing system, in turn, uses an execution unit to transmit the content over a results bus to multiple registers and restore at least one of the registers accordingly.
SYSTEM-ON-CHIP FOR SPECULATIVE EXECUTION EVENT COUNTER CHECKPOINTING AND RESTORING
An example system for speculative execution event counter checkpointing and restoring may include a plurality of symmetric cores, at least one of the symmetric cores to simultaneously process a plurality of threads and to perform out-of-order instruction processing for the plurality of threads; at least one shared cache circuit to be shared among two or more the of symmetric cores. The system may further include a memory controller to couple the symmetric cores to a system memory and a data communication interface to couple one or more of the cores to input/output devices. The system may further include event counter circuitry comprising: a plurality of event counters including programmable event counters and fixed event counters and one or more configuration registers to store configuration data to specify an event type to be counted by the programmable event counters, wherein at least one of the one or more configuration registers is to store configuration data for a plurality of the programmable event counters. The system may further include transactional memory circuitry to process transactional memory operations including load operations and store operations, the transactional memory circuitry to process a transaction begin instruction to indicate a start of a transactional execution region of a program, a transaction end instruction to indicate an end of the transactional execution region, and a transaction abort instruction to abort processing of the transactional execution region. The system may further include transaction checkpoint circuitry to store a processor state at the start of the transactional execution region of the program, the processor state including values of one or more of the event counters. The system may further include lock elision circuitry to cause critical sections of the program to execute as transactions on multiple threads without acquiring a lock, the lock elision circuitry to cause the critical sections to be re-executed non-speculatively using one or more locks in response to detecting a transaction failure.
TECHNIQUES FOR RESTORING PREVIOUS VALUES TO REGISTERS OF A PROCESSOR REGISTER FILE
A technique for operating a processor includes receiving, by a history buffer, a flush tag associated with an oldest instruction to be flushed from a processor pipeline. In response to the flush tag being older than a first instruction tag that identifies a first instruction associated with a current value stored in a register of the register file and younger than a second instruction tag that identifies a second instruction associated with a previous value that was stored in the register of the register file, the history buffer transfers the previous value for the register to the register file. In response to the flush tag not being older than the first instruction tag and younger than the second instruction tag, the history buffer does not transfer the previous value for the register to the register file (as such, the register maintains the current value following a pipeline flush).
MECHANISM FOR USING A RESERVATION STATION AS A SCRATCH REGISTER
A processor core includes an instruction-sequencing unit (ISU). The ISU includes a general register file (GRF) composed of multiple hardware general purpose registers (GPRs), an exception register (XER), and a reservation station (RS). The execution unit(s) load an address of data in a data GPR, and load a first portion of the data in a first data GPR and a second portion of the data in a second data GPR in the GRF, where loading the portions of the data generate intermediate data condition codes that are loaded in the XER. The execution unit(s) generate a cumulative data condition code, which is loaded into a history buffer within the ISU. The intermediate data condition codes are loaded into a reservation station (RS) within the ISU. Upon flushing the GRF and the XER, the ISU repopulates the GRF from a history buffer and the XER from the RS.
APPARATUS WITH SHARED TRANSACTIONAL PROCESSING RESOURCE, AND DATA PROCESSING METHOD
An apparatus (2) with multiple processing elements (4, 6, 8) has shared transactional processing resources (10, 50, 75) for supporting processing of transactions, which comprise operations performed speculatively following a transaction start event whose results are committed following a transaction end event. The transactional processing resources may have a significant overhead and sharing these between the processing elements helps reduce energy consumption and circuit area.
READ AND WRITE SETS FOR RANGES OF INSTRUCTIONS OF TRANSACTIONS
Transactional memory accesses are tracked using read and write sets based on actual program flow. A read and write set is associated with a range of instructions of a transaction. When execution follows a predicted branch, loads and stores are marked as being of selected read and write sets. Then, when a misprediction is processed, and execution is rewound, speculatively added read and write set indications are removed from the read and write sets.
System and method for validating program execution at run-time
A pipelined processor comprising a cache memory system, fetching instructions for execution from a portion of said cache memory system, an instruction commencing processing before a digital signature of the cache line that contained the instruction is verified against a reference signature of the cache line, the verification being done at the point of decoding, dispatching, or committing execution of the instruction, the reference signature being stored in an encrypted form in the processor's memory, and the key for decrypting the said reference signature being stored in a secure storage location. The instruction processing proceeds when the two signatures exactly match and, where further instruction processing is suspended or processing modified on a mismatch of the two said signatures.