Patent classifications
G06F9/3802
Inserting a proxy read instruction in an instruction pipeline in a processor
Inserting a proxy read instruction in an instruction pipeline in a processor is disclosed. A scheduler circuit is configured to recognize when a produced value generated by execution of a producer instruction in the instruction pipeline will not be available through a data forwarding path to be consumed for processing of a subsequent consumer instruction. In this case, the scheduler circuit is configured to insert a proxy read instruction in the instruction pipeline to cause execution of an operation that generates the same produced value as was generated by the previous execution of the producer instruction. Thus, the produced value is again available through a data forwarding path to an earlier stage of the instruction pipeline to be consumed by the consumer instruction, which may avoid a pipeline stall.
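The insertion rule described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Instr` record, the `FORWARD_WINDOW` distance, and the `proxy_read` opcode are hypothetical names chosen for illustration, and the abstract does not say how the scheduler circuit measures forwarding availability.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dst: Optional[str] = None
    srcs: Tuple[str, ...] = ()


# Hypothetical: a produced value can be forwarded only to consumers issued
# at most this many slots after its producer.
FORWARD_WINDOW = 2


def insert_proxy_reads(program: List[Instr], window: int = FORWARD_WINDOW) -> List[Instr]:
    """Return a copy of `program` with proxy reads inserted where a source
    value would otherwise have left the (assumed) forwarding window."""
    out: List[Instr] = []
    last_produced = {}  # register name -> index in `out` of its latest producer
    for ins in program:
        for src in ins.srcs:
            produced_at = last_produced.get(src)
            # The consumer would land at index len(out); check its distance
            # from the most recent producer of this source register.
            if produced_at is not None and len(out) - produced_at > window:
                # Re-produce the same value so it can be forwarded again.
                out.append(Instr("proxy_read", src, (src,)))
                last_produced[src] = len(out) - 1
        out.append(ins)
        if ins.dst is not None:
            last_produced[ins.dst] = len(out) - 1
    return out


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("nop"),
    Instr("nop"),
    Instr("nop"),
    Instr("sub", "r4", ("r1",)),
]
scheduled = insert_proxy_reads(program)
```

In this model the `sub` sits four slots after the `add`, so a `proxy_read` of `r1` is placed immediately before it; a consumer that directly follows its producer is left untouched.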
METHODS AND APPARATUS FOR INSTRUCTION STORAGE
Aspects of the present disclosure relate to an apparatus comprising fetch circuitry and instruction storage circuitry. The fetch circuitry is to fetch instructions for execution by execution circuitry. The instruction storage circuitry is to store temporary copies of fetched instructions. The fetch circuitry is configured to preferentially fetch instructions from the instruction storage circuitry. The instruction storage circuitry is configured to, responsive to a storage condition being met, begin storing copies of consecutive fetched instructions, the storage condition indicating a utility of a current fetched instruction; and to, responsive to determining that a number of said stored consecutive instructions has reached a storage threshold, cease storing copies of subsequent fetched instructions.
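The start and stop behaviour of the instruction storage circuitry can be sketched as a small state machine. The class name `InstructionStore`, the `fetch` helper, and the boolean `storage_condition` argument are assumptions made for illustration; the abstract does not specify how the utility of an instruction is determined.

```python
class InstructionStore:
    """Toy model: begin recording when a storage condition is met and stop
    once `threshold` consecutive instructions have been stored."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = {}       # pc -> stored copy of the instruction
        self.recording = False
        self.recorded = 0

    def lookup(self, pc):
        return self.entries.get(pc)

    def observe(self, pc, instr, storage_condition):
        # Begin a new recording run only when not already recording.
        if not self.recording and storage_condition:
            self.recording = True
            self.recorded = 0
        if self.recording:
            self.entries[pc] = instr
            self.recorded += 1
            if self.recorded >= self.threshold:
                self.recording = False  # storage threshold reached


def fetch(pc, store, memory, storage_condition=False):
    """Prefer the instruction store; fall back to memory on a miss."""
    cached = store.lookup(pc)
    if cached is not None:
        return cached, "store"
    instr = memory[pc]
    store.observe(pc, instr, storage_condition)
    return instr, "memory"


memory = {0: "ld", 1: "add", 2: "st", 3: "br"}
store = InstructionStore(threshold=2)
first = fetch(0, store, memory, storage_condition=True)  # starts recording
second = fetch(1, store, memory)                         # threshold reached
third = fetch(2, store, memory)                          # not stored
again = fetch(0, store, memory)                          # served from the store
```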
RESCHEDULING A LOAD INSTRUCTION BASED ON PAST REPLAYS
Rescheduling a load instruction based on past replays is disclosed. A load replay predictor of a processor device determines, at a first time, that a load instruction is scheduled to be executed by a load store unit to load data from a memory location. The load replay predictor accesses load replay data associated with a previous replay of the load instruction and, based on the load replay data, causes the load instruction to be rescheduled.
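A minimal sketch of such a predictor follows. It assumes the load replay data is a per-load replay count indexed by program counter and that rescheduling means delaying issue by a fixed number of cycles; both choices, and the names `LoadReplayPredictor` and `delay_cycles`, are hypothetical and not taken from the abstract.

```python
class LoadReplayPredictor:
    """Toy predictor: loads that were replayed before are issued later."""

    def __init__(self, delay_cycles=2):
        self.replays = {}            # load pc -> number of past replays
        self.delay_cycles = delay_cycles

    def record_replay(self, pc):
        self.replays[pc] = self.replays.get(pc, 0) + 1

    def schedule(self, pc, cycle):
        """Return the cycle at which the load should issue."""
        if self.replays.get(pc, 0) > 0:
            return cycle + self.delay_cycles  # reschedule based on history
        return cycle


predictor = LoadReplayPredictor(delay_cycles=3)
before = predictor.schedule(pc=0x40, cycle=10)  # no history: issue as planned
predictor.record_replay(0x40)
after = predictor.schedule(pc=0x40, cycle=10)   # history present: delayed
```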
METHOD AND APPARATUS FOR IMPLIED BIT HANDLING IN FLOATING POINT MULTIPLICATION
A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
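The arithmetic that makes this overlap possible can be shown for IEEE 754 binary32 operands. In the sketch below the product of the encoded (stored) fraction fields does not depend on the implied bits, so the implied-bit contribution can be added afterwards as a correction. The function names are illustrative assumptions, the code runs sequentially rather than in parallel, and it covers only the significand product (no exponent handling, rounding, or special values).

```python
FRAC_BITS = 23  # binary32 stores 23 fraction bits; the leading bit is implied


def decode(bits):
    """Split a binary32 bit pattern into (sign, exponent, fraction)."""
    sign = bits >> 31
    exp = (bits >> FRAC_BITS) & 0xFF
    frac = bits & ((1 << FRAC_BITS) - 1)
    return sign, exp, frac


def implied_bit(exp):
    # Subnormals and zero (exponent field 0) have an implied bit of 0.
    return 0 if exp == 0 else 1


def significand_product(a_bits, b_bits):
    _, ea, fa = decode(a_bits)
    _, eb, fb = decode(b_bits)
    # Step that needs only the encoded mantissas.
    partial = fa * fb
    # Step that could proceed concurrently in hardware.
    ia, ib = implied_bit(ea), implied_bit(eb)
    # (ia*2^23 + fa) * (ib*2^23 + fb) expanded into correction terms.
    correction = ((ia * fb + ib * fa) << FRAC_BITS) + ((ia & ib) << (2 * FRAC_BITS))
    return partial + correction


# 1.5f is 0x3FC00000; its full significand is 0xC00000.
product = significand_product(0x3FC00000, 0x3FC00000)
```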
SERVICING CPU DEMAND REQUESTS WITH INFLIGHT PREFETCHES
Disclosed embodiments provide a technique in which a memory controller determines whether a fetch address is a miss in an L1 cache and, when a miss occurs, allocates a way of the L1 cache, determines whether the allocated way matches a scoreboard entry of pending service requests, and, when such a match is found, determines whether a request address of the matching scoreboard entry matches the fetch address. When the matching scoreboard entry also has a request address matching the fetch address, the scoreboard entry is modified to a demand request.
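The lookup sequence in this abstract can be sketched as follows. The `ScoreboardEntry` record, the `allocate_way` callback, and the fallback of issuing a fresh demand request when no entry matches are assumptions for illustration; the abstract only describes the case in which a matching in-flight entry is promoted.

```python
from dataclasses import dataclass


@dataclass
class ScoreboardEntry:
    way: int
    address: int
    kind: str  # "prefetch" or "demand"


def handle_fetch(fetch_addr, l1_lines, scoreboard, allocate_way):
    """Return "hit", "promoted", or "new_request" for a fetch address."""
    if fetch_addr in l1_lines:
        return "hit"
    way = allocate_way(fetch_addr)           # allocate a way on the miss
    for entry in scoreboard:
        # First match on the allocated way, then on the request address.
        if entry.way == way and entry.address == fetch_addr:
            entry.kind = "demand"            # in-flight prefetch becomes a demand
            return "promoted"
    # Assumed fallback, not described in the abstract.
    scoreboard.append(ScoreboardEntry(way, fetch_addr, "demand"))
    return "new_request"


scoreboard = [ScoreboardEntry(way=1, address=0x100, kind="prefetch")]
result = handle_fetch(0x100, l1_lines=set(), scoreboard=scoreboard,
                      allocate_way=lambda addr: 1)
```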
SYSTEM WITH DYNAMICALLY SELECTABLE FIRMWARE IMAGE SEQUENCING FOR PRODUCTION TEST, DEBUG, PROTOTYPING
A system has a memory programmed with multiple firmware images each having an associated distinct entry point, a processor, a writable hardware register, and a controller external to the processor that, prior to each reset of a sequence of resets of the processor, reads the entry point of a firmware image from the hardware register and causes the processor to begin fetching instructions at the entry point read from the hardware register. The firmware images include boot, mission mode, and at least one other firmware image. The memory may be writeable with a modifiable version of a post-production mission mode, debug, prototype, or patched ROM firmware image. A second controller writes a second entry point to the hardware register prior to an initial reset such that the external controller reads the second entry point and causes fetching instructions at the second entry point rather than the initial entry point.
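The register-driven sequencing can be modelled in a few lines. The class `FirmwareSequencer`, the entry-point addresses, and the image names are hypothetical; the sketch only shows that whichever entry point is in the hardware register at reset time determines where fetching begins.

```python
class FirmwareSequencer:
    """Toy model of an external controller that reads an entry point from
    a writable register before each processor reset."""

    def __init__(self, images, initial_entry):
        self.images = images            # entry point -> image name
        self.register = initial_entry   # the writable hardware register
        self.boot_log = []

    def write_register(self, entry):
        # Could be written by firmware or by a second controller.
        self.register = entry

    def reset(self):
        entry = self.register           # read before releasing reset
        self.boot_log.append(self.images[entry])
        return entry                    # processor begins fetching here


images = {0x0000: "boot", 0x1000: "mission", 0x2000: "debug"}
seq = FirmwareSequencer(images, initial_entry=0x0000)
seq.reset()                  # fetches from the boot image
seq.write_register(0x2000)   # select the debug image for the next reset
seq.reset()
```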
Reducing operations of sum-of-multiply-accumulate (SOMAC) instructions
Methods, systems and apparatuses for reducing operations of Sum-Of-Multiply-Accumulate (SOMAC) instructions are disclosed. One method includes scheduling, by a scheduler, a thread for execution, executing, by a processor of a plurality of processors, the thread, fetching, by the processor, a plurality of instructions for the thread from a memory, selecting, by a thread arbiter of the processor, an instruction of the plurality of instructions for execution in an arithmetic logic unit (ALU) pipeline of the processor, and reading the instruction, and determining, by a macro-instruction iterator of the processor, whether the instruction is a Sum-Of-Multiply-Accumulate (SOMAC) instruction with an instruction size, wherein the instruction size indicates a number of iterations that the SOMAC instruction is to be executed.
Store prefetches for dependent loads in a processor
An information handling system, method, and processor are disclosed that detect a store instruction for data in a processor where the store instruction is a reliable indicator of a future load for the data; in response to detecting the store instruction, send a prefetch request to memory for an entire cache line containing the data referenced in the store instruction, and preferably only the single cache line containing the data; and receive, in response to the prefetch request, the entire cache line containing the data referenced in the store instruction.
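A short sketch of the store-triggered prefetch: the 64-byte `LINE_SIZE`, the `is_reliable_indicator` flag, and the `memory_fetch` callback are assumptions for illustration, since the abstract does not state a line size or how a store is classified as a reliable indicator.

```python
LINE_SIZE = 64  # hypothetical cache line size in bytes


def line_base(addr):
    """Address of the first byte of the cache line containing `addr`."""
    return addr & ~(LINE_SIZE - 1)


def on_store(addr, is_reliable_indicator, cache, memory_fetch):
    """Prefetch the single line containing `addr` if the store qualifies.
    Returns the line base that was requested, or None."""
    if not is_reliable_indicator:
        return None
    base = line_base(addr)
    if base not in cache:
        cache[base] = memory_fetch(base)  # the entire line, and only that line
    return base


requests = []


def memory_fetch(base):
    requests.append(base)
    return bytes(LINE_SIZE)


cache = {}
prefetched = on_store(0x1234, True, cache, memory_fetch)
ignored = on_store(0x5678, False, cache, memory_fetch)
```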
MULTI-PROCESSOR SYSTEM WITH DYNAMICALLY SELECTABLE MULTI-STAGE FIRMWARE IMAGE SEQUENCING AND DISTRIBUTED PROCESSING SYSTEM THEREOF
A distributed processing system has multiple systems connected by an inter-system communication interface. Each system has a memory programmed with multiple firmware images each having a distinct entry point, a processor, a hardware register that is writable by another system of the distributed processing system and is initially seeded with an initial firmware image entry point, and a controller external to the processor that, prior to an initial reset, reads the entry point from the hardware register and causes the processor to begin fetching instructions at the initial entry point. Prior to a subsequent reset of the processor, the external controller facilitates a transition to another firmware image by reading its entry point from the hardware register and causing the processor to begin fetching instructions at the other entry point. Each system may have multiple processors and multiple associated hardware registers writable by another processor of the system or by a host processor.
Zero latency prefetching in caches
This invention involves a cache system in a digital data processing apparatus including: a central processing unit core; a level one instruction cache; and a level two cache. The cache lines in the level two cache are twice the size of the cache lines in the level one instruction cache. The central processing unit core requests additional program instructions when needed via a request address. Upon a miss in the level one instruction cache that causes a hit in the upper half of a level two cache line, the level two cache supplies the upper half of the level two cache line to the level one instruction cache. On a following level two cache memory cycle, the level two cache supplies the lower half of the cache line to the level one instruction cache. This cache technique thus prefetches the lower half of the level two cache line employing fewer resources than an ordinary prefetch.
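The two-step supply can be sketched as below. Several points are assumptions for illustration: the 64-byte and 128-byte line sizes, treating the "upper half" as the higher-addressed half of the level two line, and supplying only the requested half when the miss falls in the lower half (a case the abstract does not describe).

```python
L1_LINE = 64            # hypothetical level one instruction cache line size
L2_LINE = 2 * L1_LINE   # level two lines are twice as large


def service_l1i_miss(addr, l2, l1i):
    """Fill `l1i` from `l2` on a miss; return the list of supply events."""
    l2_base = addr - (addr % L2_LINE)
    line = l2.get(l2_base)
    if line is None:
        return []  # level two miss: not modelled here
    lower, upper = line[:L1_LINE], line[L1_LINE:]
    events = []
    if addr % L2_LINE >= L1_LINE:
        # Hit in the upper half: supply it first as the demand fill ...
        l1i[l2_base + L1_LINE] = upper
        events.append(("demand", l2_base + L1_LINE))
        # ... then supply the lower half on the following level two cycle.
        l1i[l2_base] = lower
        events.append(("prefetch", l2_base))
    else:
        # Assumed behaviour for a lower-half hit: supply that half only.
        l1i[l2_base] = lower
        events.append(("demand", l2_base))
    return events


l2 = {0: bytes(range(128))}
l1i = {}
events = service_l1i_miss(70, l2, l1i)
```

Because the second half is read from a line the level two cache has already located, no separate prefetch lookup is modelled for it.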