Patent classifications
G06F9/3834
HAZARD AVOIDANCE IN A MULTI-SLICE PROCESSOR
Hazard avoidance in a multi-slice processor including adding, to a hazard table, an entry for an effective address, wherein the entry comprises an instruction tag (ITAG) offset for the effective address; fetching, by an instruction fetch unit, a processor instruction from a memory location using the effective address; determining that the hazard table includes the entry for the effective address; retrieving, from the hazard table, the ITAG offset for the effective address; identifying a prior internal operation (TOP) using the ITAG offset; and decoding the processor instruction into a load IOP with a dependency on the prior IOP.
Exception register delay
A processor includes: memory; an execution pipeline having a plurality of pipeline stages configured to process data provided to the execution pipeline and to store a result of the processing into the memory; a receive pipeline having a plurality of pipeline stages configured to handle incoming data to the processor and storing the incoming data into memory; context status storage configured to hold an exception indicator of an exception encountered by the execution pipeline while the execution pipeline processes data; wherein the receive pipeline is configured to determine that an exception has been committed to the context status storage by the execution pipeline, to suppress a write to memory of any incoming data to be handled by the receive pipeline and to commit a corresponding exception indicator to the context status storage at a final one of its pipeline stages.
HYBRID BLOCK-BASED PROCESSOR AND CUSTOM FUNCTION BLOCKS
Apparatus and methods are disclosed for implementing block-based processors having custom function blocks, including field-programmable gate array (FPGA) implementations. In some examples of the disclosed technology, a dynamically configurable scheduler is configured to issue at least one block-based processor instruction. A custom function block is configured to receive input operands for the instruction and generate ready state data indicating completion of a computation performed for the instruction by the respective custom function block.
IMPLEMENTING BARRIERS TO EFFICIENTLY SUPPORT CUMULATIVITY IN A WEAKLY-ORDERED MEMORY SYSTEM
A technique for operating a lower level cache memory of a data processing system includes receiving, by a store queue controller, an operation that is associated with a first thread. The store queue controller uses level one (L1) cache memory miss information for the operation to limit dependencies in a dependency data structure of a store queue of the lower level cache memory that are set and to remove dependencies that are otherwise unnecessary.
Methods, apparatus, instructions and logic to provide permute controls with leading zero count functionality
Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations.
Apparatus and method to preclude X86 special bus cycle load replays in an out-of-order processor
An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory, where the specified load instruction comprises a load instruction resulting from execution of an x86 special bus cycle. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand.
Post-retire scheme for tracking tentative accesses during transactional execution
A method and apparatus for post-retire transaction access tracking is herein described. Load and store buffers are capable of storing senior entries. In the load buffer a first access is scheduled based on a load buffer entry. Tracking information associated with the load is stored in a filter field in the load buffer entry. Upon retirement, the load buffer entry is marked as a senior load entry. A scheduler schedules a post-retire access to update transaction tracking information, if the filter field does not represent that the tracking information has already been updated during a pendency of the transaction. Before evicting a line in a cache, the load buffer is snooped to ensure no load accessed the line to be evicted.
ARITHMETIC PROCESSING DEVICE, METHOD, AND SYSTEM
An arithmetic processing device includes: an instruction control circuit; primary cache circuit that includes a primary cache memory and a first buffer; and a secondary cache memory. The primary cache circuit is configured to, when a first instruction for executing processing to register data of a cache line in the secondary cache memory without the occurrence of an access to the main memory, is issued from the instruction control circuit and when data corresponding to a first address designated as an access target in the first instruction is not stored in the primary cache memory, store the first address in the first buffer and issue the first instruction to the secondary cache memory.
PREVENTING HAZARD FLUSHES IN AN INSTRUCTION SEQUENCING UNIT OF A MULTI-SLICE PROCESSOR
Preventing hazard flushes in an instruction sequencing unit of a multi-slice processor including receiving a load instruction in a load reorder queue, wherein the load instruction is an instruction to load data from a memory location; subsequent to receiving the load instruction, receiving a store instruction in a store reorder queue, wherein the store instruction is an instruction to store data in the memory location; determining that the store instruction causes a hazard against the load instruction; preventing a flush of the load reorder queue based on a state of the load instruction; and re-executing the load instruction.
Saving/restoring selected registers in transactional processing
A TRANSACTION BEGIN instruction begins execution of a transaction and includes a general register save mask having bits, that when set, indicate registers to be saved in the event the transaction is aborted. At the beginning of the transaction, contents of the registers are saved in memory not accessible to the program, and if the transaction is aborted, the saved contents are copied to the registers.