Patent classifications
G06F9/3832
Coprocessor Prefetcher
A prefetcher for a coprocessor is disclosed. An apparatus includes a processor and a coprocessor that are configured to execute processor and coprocessor instructions, respectively. The processor and coprocessor instructions appear together in code sequences fetched by the processor, with the coprocessor instructions being provided to the coprocessor by the processor. The apparatus further includes a coprocessor prefetcher configured to monitor a code sequence fetched by the processor and, in response to identifying a presence of coprocessor instructions in the code sequence, capture the memory addresses, generated by the processor, of operand data for coprocessor instructions. The coprocessor is further configured to issue, for a cache memory accessible to the coprocessor, prefetches for data associated with the memory addresses prior to execution of the coprocessor instructions by the coprocessor.
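The monitoring behavior described above can be illustrated with a minimal sketch. It is not the patented design: the `Instr` and `CoprocPrefetcher` names, the 64-byte line size, and the set-based cache model are all assumptions made for illustration, and the sketch folds address capture and prefetch issue into one object for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    opcode: str
    is_coproc: bool = False
    operand_addrs: tuple = ()  # operand addresses generated by the processor

@dataclass
class CoprocPrefetcher:
    line_size: int = 64  # assumed cache line size in bytes
    cache_lines: set = field(default_factory=set)  # lines already in the coprocessor-visible cache
    issued: list = field(default_factory=list)     # line-aligned prefetch addresses, in issue order

    def observe(self, instr: Instr) -> None:
        """Watch one fetched instruction; prefetch operand lines for coprocessor instructions."""
        if not instr.is_coproc:
            return
        for addr in instr.operand_addrs:
            line = addr // self.line_size
            if line not in self.cache_lines:  # skip lines that are already present
                self.cache_lines.add(line)
                self.issued.append(line * self.line_size)

pf = CoprocPrefetcher()
stream = [Instr("add"), Instr("cp.mul", True, (0x1000, 0x1008, 0x2040))]
for ins in stream:
    pf.observe(ins)
```

In this toy model the two operands at 0x1000 and 0x1008 share a line, so only one prefetch is issued for them.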
Computer Architecture with Register Name Addressing and Dynamic Load Size Adjustment
A computer architecture allows load instructions to fetch “fat” loads from cache memory, that is, loads having more data than necessary to satisfy execution of the load instruction, for example, a full cache line instead of a required word. The fat load allows load instructions having spatiotemporal locality to share the data of the fat load, avoiding cache accesses. Rapid access to local data structures is provided by using base register names to directly access those structures as a proxy for the actual load base register address,
RESCHEDULING A LOAD INSTRUCTION BASED ON PAST REPLAYS
Rescheduling a load instruction based on past replays is disclosed. A load replay predictor of a processor device determines, at a first time, that a load instruction is scheduled to be executed by a load store unit to load data from a memory location. The load replay predictor accesses load replay data associated with a previous replay of the load instruction and, based on the load replay data, causes the load instruction to be rescheduled.
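As a rough illustration of the idea, the sketch below keeps per-load replay history and uses it to delay a later scheduling of the same load. The abstract does not say what the load replay data contains or how rescheduling is done; keying the table by load PC and storing a delay in cycles are assumptions made here.

```python
class LoadReplayPredictor:
    """Toy predictor: remembers how long a load had to wait when it was last replayed."""

    def __init__(self):
        self.replay_delay = {}  # load PC -> extra cycles observed at the previous replay (assumed format)

    def record_replay(self, pc: int, delay: int) -> None:
        """Store load replay data after a load at `pc` was replayed."""
        self.replay_delay[pc] = delay

    def schedule(self, pc: int, cycle: int) -> int:
        """Return the cycle at which the load should issue, pushed back if it replayed before."""
        return cycle + self.replay_delay.get(pc, 0)

predictor = LoadReplayPredictor()
first = predictor.schedule(0x40, cycle=10)   # no history yet, so the load issues as planned
predictor.record_replay(0x40, delay=3)       # the load replayed; remember the delay
second = predictor.schedule(0x40, cycle=10)  # same load is now rescheduled later
```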
Complex I/O Value Prediction for Multiple Values with Physical or Virtual Addresses
An apparatus, and corresponding method, for input/output (I/O) value determination, generates an I/O instruction for an I/O device, the I/O device including a state machine with state transition logic. The apparatus comprises a controller that includes a simplified state machine with a reduced version of the state transition logic of the state machine of the I/O device. The controller is configured to improve instruction execution performance of a processor core by employing the simplified state machine to predict at least one state value of at least one I/O device true state value to be affected by the I/O instruction at the I/O device.
APPARATUS AND METHOD FOR PIPELINE CONTROL
An apparatus and a method for pipeline control are provided. The apparatus includes a preload predictor, an arithmetic logic unit (ALU) and a data buffer. The preload predictor is configured to determine whether a load instruction conforms to at least one specific condition, to generate a preload determination result. The ALU is configured to perform arithmetic logic operations, and the data buffer is configured to provide data for being used by the ALU. When the preload determination result indicates that the load instruction conforms to the at least one specific condition, the data buffer fetches preload data from a cache memory according to information carried by the load instruction and stores the preload data in the data buffer, where the preload data is data requested by a subsequent load instruction.
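The interaction between the preload predictor and the data buffer might look like the following sketch. The abstract leaves the "specific condition" open; here it is assumed, purely for illustration, to be a load that carries a post-increment amount, so the next address is known in advance. `DataBuffer`, `handle_load`, and the dictionary-based load encoding are hypothetical.

```python
class DataBuffer:
    """Small buffer in front of the cache that can hold preloaded data."""

    def __init__(self, cache: dict):
        self.cache = cache
        self.entries = {}  # address -> preloaded value

    def preload(self, addr: int) -> None:
        if addr in self.cache:
            self.entries[addr] = self.cache[addr]

    def read(self, addr: int):
        """Return (value, served_from_buffer)."""
        if addr in self.entries:
            return self.entries.pop(addr), True
        return self.cache[addr], False

def handle_load(load: dict, buf: DataBuffer):
    value, from_buffer = buf.read(load["addr"])
    # Assumed condition: the load carries a post-increment, so a subsequent
    # load is expected at addr + post_inc and its data can be fetched now.
    if load.get("post_inc"):
        buf.preload(load["addr"] + load["post_inc"])
    return value, from_buffer

buf = DataBuffer(cache={0: 10, 4: 20, 8: 30})
r1 = handle_load({"addr": 0, "post_inc": 4}, buf)
r2 = handle_load({"addr": 4, "post_inc": 4}, buf)
r3 = handle_load({"addr": 8}, buf)
```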
METHODS FOR DYNAMIC INSTRUCTION SIMPLIFICATION BASED ON REGISTER VALUE LOCALITY
Methods and devices are provided for dynamically simplifying processor instructions. A method includes receiving, at a computing device, processor instructions and determining, by the computing device, whether instruction simplification is enabled for an instruction being processed. The method further includes determining, by the computing device, from an instruction simplification table whether the instruction is capable of being simplified and scheduling, by the computing device, a simplified instruction based on the determination from the instruction simplification table. A device includes a processor, and a non-transient computer readable memory having stored thereon instructions which when executed by the processor configure the device to execute the methods disclosed herein.
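A table-driven simplification step could be sketched as below. The table contents, the tuple instruction encoding, and the `li`/`mov` opcodes are assumptions for illustration; the abstract does not specify what the instruction simplification table holds.

```python
# Hypothetical table: (opcode, known value of the second source) -> builder of a cheaper instruction.
SIMPLIFICATION_TABLE = {
    ("mul", 0): lambda dst, src: ("li", dst, 0),     # x * 0 -> load immediate 0
    ("mul", 1): lambda dst, src: ("mov", dst, src),  # x * 1 -> register move
    ("add", 0): lambda dst, src: ("mov", dst, src),  # x + 0 -> register move
}

def schedule(instr, reg_values, enabled=True):
    """Return the instruction to schedule: a simplified form if the table has one, else the original."""
    op, dst, src_a, src_b = instr
    if enabled:
        rule = SIMPLIFICATION_TABLE.get((op, reg_values.get(src_b)))
        if rule is not None:
            return rule(dst, src_a)
    return instr

mul = ("mul", "r1", "r2", "r3")
simplified = schedule(mul, {"r3": 1})
unchanged = schedule(mul, {"r3": 5})
```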
Reusing an operand received from a first-in-first-out (FIFO) buffer according to an operand specifier value specified in a predefined field of an instruction
Various embodiments are provided for reusing an operand in an instruction set architecture (ISA) by one or more processors in a computing system. An instruction may specify that an operand register for a selected operand retain operand data used by a previous instruction. The operand data in the operand register may be reused by the instruction.
DEVICE, METHOD AND SYSTEM TO PROVIDE A PREDICTED VALUE WITH A SEQUENCE OF MICRO-OPERATIONS
Techniques and mechanisms for efficiently making value prediction information available for use in a processor are described. In an embodiment, execution of an instruction is to include a loading of some data to a first location (e.g., a first register). A decoder of the processor accesses reference information which indicates that the execution is to comprise multiple micro-operations (μops) including a LoadCheck μop and a Move μop. The LoadCheck μop loads a first value to the first location, and checks whether the loaded first value is the same as a previously-determined second value which represents a prediction of what the first value would be. The Move μop moves the second value to the first location. In another embodiment, the Move μop is scheduled for execution out-of-order with respect to the LoadCheck μop, resulting in an early availability of the second value for access in a register file by another μop.
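The split into a Move μop and a LoadCheck μop can be modeled in a few lines. This is only a behavioral sketch: the dictionary register file and memory, and the function names `move_uop` and `load_check_uop`, are assumptions, and recovery after a misprediction is not modeled.

```python
def move_uop(regs: dict, dst: str, predicted: int) -> None:
    """Move uop: make the predicted value available in the destination register early."""
    regs[dst] = predicted

def load_check_uop(regs: dict, memory: dict, dst: str, addr: int, predicted: int) -> bool:
    """LoadCheck uop: load the real value and report whether the prediction was correct."""
    actual = memory[addr]
    regs[dst] = actual
    return actual == predicted

memory = {0x10: 7}

# Correct prediction: dependents can read r1 right after the Move uop.
regs = {}
move_uop(regs, "r1", predicted=7)
early_value = regs["r1"]
hit = load_check_uop(regs, memory, "r1", 0x10, predicted=7)

# Wrong prediction: the LoadCheck uop detects the mismatch and leaves the real value in r1.
regs2 = {}
move_uop(regs2, "r1", predicted=9)
miss = load_check_uop(regs2, memory, "r1", 0x10, predicted=9)
```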
OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING SIMULTANEOUS TWO-TARGET LOADS AND STORES
Operation of a multi-slice processor that includes a plurality of execution slices and a load/store superslice, where the load/store superslice includes a set predict array, a first load/store slice, and a second load/store slice. Operation of such a multi-slice processor includes: receiving a two-target load instruction directed to the first load/store slice and a store instruction directed to the second load/store slice; determining a first subset of ports of the set predict array as inputs for an effective address for the two-target load instruction; determining a second subset of ports of the set predict array as inputs for an effective address for the store instruction; and generating, in dependence upon logic corresponding to the set predict array that is less than logic implementing an entire load/store slice, output for performing the two-target load instruction in parallel with generating output for performing the store instruction.
PREDICTING UPCOMING CONTROL FLOW
An apparatus has a fetch queue to identify a sequence of instructions to be fetched for execution and prediction circuitry to predict upcoming control flow and to control which instructions are identified in the fetch queue in dependence on the prediction. The prediction circuitry predicts multi-taken sequences which are sequences of instructions in which control flow is diverted by a first control flow changing instruction to a series of instructions terminating in a second control flow changing instruction that diverts control flow to a target address. The apparatus also has prediction confidence calculation circuitry to calculate confidence levels for respective multi-taken sequences. Each confidence level is indicative of a confidence in an accuracy of prediction of its respective multi-taken sequence. When the confidence level for a particular multi-taken sequence satisfies a prediction confidence condition, the prediction confidence tracking circuitry allows the particular multi-taken sequence to be predicted by the prediction circuitry. The prediction circuitry causes the series of instructions and the target instruction for the particular multi-taken sequence to be identified in the fetch queue when the prediction circuitry predicts the particular multi-taken sequence and further predictions to be made starting from the target address for the particular multi-taken sequence.
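One simple way to picture the confidence gating is a per-sequence saturating counter, as in the sketch below. The abstract does not state how confidence is calculated; the counter scheme, the reset on a misprediction, and the threshold and maximum values are assumptions chosen for illustration.

```python
class MultiTakenConfidence:
    """Toy confidence tracker: one saturating counter per multi-taken sequence."""

    def __init__(self, threshold: int = 2, max_count: int = 3):
        self.threshold = threshold  # assumed prediction confidence condition
        self.max_count = max_count  # assumed saturation limit
        self.counters = {}          # sequence id -> confidence level

    def update(self, seq_id, correct: bool) -> None:
        """Raise confidence after a correct outcome, reset it after an incorrect one."""
        level = self.counters.get(seq_id, 0)
        self.counters[seq_id] = min(level + 1, self.max_count) if correct else 0

    def allowed(self, seq_id) -> bool:
        """True when the sequence may be predicted as a multi-taken sequence."""
        return self.counters.get(seq_id, 0) >= self.threshold

conf = MultiTakenConfidence()
seq = (0x100, 0x240)  # hypothetical id: addresses of the two control flow changing instructions
conf.update(seq, correct=True)
before = conf.allowed(seq)
conf.update(seq, correct=True)
after = conf.allowed(seq)
```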