G06F9/3832

FETCHED DATA IN AN ULTRA-SHORT PIPED LOAD STORE UNIT
20170351521 · 2017-12-07 ·

Techniques are disclosed for receiving an instruction for processing data that includes a plurality of sectors. A method includes decoding the instruction to determine which of the plurality of sectors are needed to process the instruction and fetching at least one of the plurality of sectors from memory. The method includes determining whether each sector that is needed to process the instruction has been fetched. If all sectors needed to process the instruction have been fetched, the method includes transmitting a sector valid signal and processing the instruction. If all sectors needed to process the instruction have not been fetched, the method includes blocking a data valid signal from being transmitted, fetching an additional one or more of the plurality of sectors until all sectors needed to process the instruction have been fetched, transmitting a sector valid signal, and reissuing and processing the instruction using the fetched sectors.
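The gating described above can be sketched in software (class and method names here are illustrative assumptions, not taken from the patent):

```python
# Hypothetical sketch of the sector-valid gating described above; the
# SectorTracker class and its method names are illustrative, not from the patent.

class SectorTracker:
    def __init__(self, needed_sectors):
        self.needed = set(needed_sectors)   # sectors the decoded instruction requires
        self.fetched = set()                # sectors already returned from memory

    def on_fetch(self, sector):
        self.fetched.add(sector)

    def sector_valid(self):
        """True only when every needed sector has been fetched."""
        return self.needed <= self.fetched

    def issue(self):
        """Assert sector-valid and process, or block the valid signal and
        report which sectors still need fetching before the instruction
        can be reissued."""
        missing = self.needed - self.fetched
        if missing:
            return ("blocked", missing)
        return ("processed", set())
```

A decode step would populate `needed_sectors`, and the fetch loop would call `on_fetch` and retry `issue` until it returns `"processed"`.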

Apparatus and method with value prediction for load operation
11513966 · 2022-11-29 ·

An apparatus has processing circuitry, load tracking circuitry and value prediction circuitry. In response to an actual value of first target data becoming available for a value-predicted load operation, it is determined whether the actual value matches the predicted value of the first target data determined by the value prediction circuitry, and whether the tracking information indicates that, for a given younger load operation issued before the actual value of the first target data was available, there is a risk of second target data associated with that given load operation having changed after having been loaded. Independent of whether the addresses of the value-predicted load operation and the younger load operation correspond, at least the given load operation is re-processed when the value prediction is correct and the tracking information indicates there is a risk of the second target data having changed after being loaded. This protects against ordering violations.
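The re-processing decision can be sketched as follows (a simplification under assumed names: the `at_risk` flag stands in for the load tracking circuitry's hazard indication and is not the patented mechanism):

```python
def resolve_value_prediction(predicted_value, actual_value, younger_loads):
    """Return the load operations that must be re-processed once the actual
    value of a value-predicted load arrives. Each entry in `younger_loads`
    is a dict whose `at_risk` flag models the tracking information: it is
    set when that load's target data may have changed after being loaded."""
    if predicted_value != actual_value:
        # Misprediction: every younger load that may have consumed the
        # predicted value is suspect, so re-process them all.
        return list(younger_loads)
    # Correct prediction: still re-process any younger load flagged as at
    # risk, independent of whether its address matches the predicted load's.
    return [load for load in younger_loads if load["at_risk"]]
```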

APPARATUS AND METHOD WITH PREDICTION FOR LOAD OPERATION
20230185573 · 2023-06-15 ·

An apparatus has processing circuitry, load tracking circuitry and load prediction circuitry. It is determined whether tracking information indicates a risk that target data, corresponding to the address of a load operation which is speculatively issued (bypassing an older operation) based on a prediction determined by the load prediction circuitry, changed between being loaded for the speculatively-issued load operation and data being loaded for a given older load operation bypassed by it. If so, independent of whether the addresses of the speculatively-issued load operation and the given older load operation correspond, at least the speculatively-issued load operation is reissued, even when the prediction is correct. This protects against ordering violations.

Selectively enabled result lookaside buffer based on a hit rate

A processing system selectively enables and disables a result lookaside buffer (RLB) based on a hit rate tracked by a counter, thereby reducing power consumption for lookups at the result lookaside buffer during periods of low hit rates and improving the overall hit rate for the result lookaside buffer. A controller increments the counter in the event of a hit at the RLB and decrements the counter in the event of a miss at the RLB. If the value of the counter falls below a threshold value, the processing system temporarily disables the RLB for a programmable period of time. After the period of time, the processing system re-enables the RLB and resets the counter to an initial value.
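The counter-driven hysteresis described above can be sketched as follows (parameter values and names are illustrative assumptions, not from the patent):

```python
class RLBController:
    """Hit-rate-driven enable/disable for a result lookaside buffer (RLB).
    All names and default parameter values are illustrative, not from the
    patent."""

    def __init__(self, initial=8, threshold=4, disable_window=1024):
        self.initial = initial                # counter reset value
        self.threshold = threshold            # disable the RLB below this value
        self.disable_window = disable_window  # programmable off period (cycles)
        self.counter = initial
        self.disabled_until = 0

    def enabled(self, cycle):
        return cycle >= self.disabled_until

    def record_lookup(self, cycle, hit):
        """Increment on a hit, decrement on a miss; temporarily disable the
        RLB (saving lookup power) when the counter falls below the threshold."""
        if not self.enabled(cycle):
            return
        self.counter += 1 if hit else -1
        if self.counter < self.threshold:
            self.disabled_until = cycle + self.disable_window
            self.counter = self.initial       # reset for when the RLB re-enables
```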

PROVIDING LOAD ADDRESS PREDICTIONS USING ADDRESS PREDICTION TABLES BASED ON LOAD PATH HISTORY IN PROCESSOR-BASED SYSTEMS
20170286119 · 2017-10-05 ·

Aspects disclosed in the detailed description include providing load address predictions using address prediction tables based on load path history in processor-based systems. In one aspect, a load address prediction engine provides a load address prediction table containing multiple load address prediction table entries. Each load address prediction table entry includes a predictor tag field and a memory address field for a load instruction. The load address prediction engine generates a table index and a predictor tag based on an identifier and a load path history for a detected load instruction. The table index is used to look up a corresponding load address prediction table entry. If the predictor tag matches the predictor tag field of the load address prediction table entry corresponding to the table index, the memory address field of the load address prediction table entry is provided as a predicted memory address for the load instruction.
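The index/tag lookup flow can be sketched as follows (the hash function, table size, and tag width are illustrative assumptions; the patent does not specify them):

```python
TABLE_BITS = 10   # illustrative table size (1024 entries)
TAG_BITS = 8      # illustrative predictor tag width

table = {}        # table index -> (predictor tag, predicted memory address)

def index_and_tag(pc, load_path_history):
    """Derive the table index and predictor tag from the load's identifier
    (its PC here) and the load path history; this hash is an assumption."""
    h = (pc ^ (load_path_history * 0x9E3779B1)) & 0xFFFFFFFF
    return h & ((1 << TABLE_BITS) - 1), (h >> TABLE_BITS) & ((1 << TAG_BITS) - 1)

def predict_address(pc, load_path_history):
    index, tag = index_and_tag(pc, load_path_history)
    entry = table.get(index)
    if entry is not None and entry[0] == tag:   # predictor tag must match
        return entry[1]                         # predicted memory address
    return None                                 # no prediction made

def train(pc, load_path_history, actual_address):
    index, tag = index_and_tag(pc, load_path_history)
    table[index] = (tag, actual_address)
```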

Auxiliary Cache for Reducing Instruction Fetch and Decode Bandwidth Requirements
20170286110 · 2017-10-05 ·

A hardware-software co-designed processor includes a front end to decode an instruction, an execution unit to execute the instruction, an auxiliary cache to store auxiliary information for consumption during execution of the instruction, an instruction blender, and a retirement unit to retire the instruction. The auxiliary information may include long immediate values, non-working instructions for emulating an untranslated instruction stream, or execution hints, and is not decoded by the front end. The auxiliary cache includes circuitry to receive the auxiliary information from a binary translator, to store the auxiliary information in the auxiliary cache, and to provide the auxiliary information to the instruction blender prior to execution. The instruction blender includes circuitry to receive the auxiliary information, to blend the instruction with the auxiliary information, and to provide the blended instruction to the execution unit. Use of the auxiliary cache may reduce fetch and decode bandwidth requirements.
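The blending step can be sketched as follows (the dict-based instruction representation and function names are illustrative stand-ins for the hardware structures):

```python
def blend(decoded_instruction, aux_cache):
    """Attach auxiliary information (e.g. a long immediate or an execution
    hint supplied by the binary translator) to a decoded instruction before
    it reaches the execution unit. A software sketch under assumed names."""
    aux = aux_cache.get(decoded_instruction["pc"])
    if aux is None:
        return decoded_instruction          # nothing to blend for this PC
    # The auxiliary data bypasses the front end entirely: it is merged in
    # here rather than being fetched and decoded.
    return {**decoded_instruction, "aux": aux}
```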

COMBINING LOADS OR STORES IN COMPUTER PROCESSING

Aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in processors. An exemplary method includes detecting a pattern of pipelined instructions to access memory using a first portion of available bus width and, in response to detecting the pattern, combining the pipelined instructions into a single instruction to access the memory using a second portion of the available bus width that is wider than the first portion. Devices including processors using disclosed aspects may execute currently available software in a more efficient manner without the software being modified.
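The pattern detection and combining can be sketched as follows (a simplification of the hardware detector, assuming each pipelined load is an `(address, size)` pair):

```python
def combine_loads(ops, bus_width=16):
    """Coalesce runs of contiguous narrow loads into single wide accesses
    that use more of the available bus width. `ops` is a list of
    (address, size) pipelined load operations; the greedy run detector
    here is an illustrative simplification of the hardware pattern match."""
    combined, i = [], 0
    while i < len(ops):
        addr, size = ops[i]
        j = i + 1
        # Extend the run while the next load is contiguous and the combined
        # access still fits on the bus.
        while j < len(ops) and ops[j][0] == addr + size and size + ops[j][1] <= bus_width:
            size += ops[j][1]
            j += 1
        combined.append((addr, size))
        i = j
    return combined
```

Four contiguous 4-byte loads on a 16-byte bus, for instance, collapse into one 16-byte access with no change to the software that issued them.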

Computer processor employing byte-addressable dedicated memory for operand storage

A computer processor including a first memory structure that operates over multiple cycles to temporarily store operands referenced by at least one instruction. A plurality of functional units performs operations that produce and access operands stored in the first memory structure. A second memory structure is provided, separate from the first memory structure. The second memory structure is configured as a dedicated memory for storage of operands copied from the first memory structure. The second memory structure is organized with a byte-addressable memory space and each operand stored in the second memory structure is accessed by a given byte address into the byte-addressable memory space.
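The second memory structure's byte-addressable interface can be sketched as follows (sizes and method names are assumptions; `spill`/`fill` model copying operands to and from the dedicated memory):

```python
class OperandScratchpad:
    """A dedicated, byte-addressable second memory for operands copied out
    of the multi-cycle first memory structure. A minimal sketch; the size
    and method names are illustrative assumptions."""

    def __init__(self, size_bytes=4096):
        self.mem = bytearray(size_bytes)

    def spill(self, byte_addr, operand):
        """Copy an operand (as bytes) into the dedicated memory at the
        given byte address."""
        self.mem[byte_addr:byte_addr + len(operand)] = operand

    def fill(self, byte_addr, length):
        """Retrieve an operand by its byte address into the memory space."""
        return bytes(self.mem[byte_addr:byte_addr + length])
```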

Efficient load value prediction

Certain aspects of the present disclosure provide techniques for training load value predictors, comprising: determining whether a prediction has been made by one or more of a plurality of load value predictors; determining that a misprediction has been made by one or more load value predictors of the plurality of load value predictors; training each of the one or more load value predictors that made the misprediction; and resetting a confidence value associated with each of the one or more load value predictors that made the misprediction.
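The selective train-and-reset flow can be sketched with a simple last-value predictor (the predictor design, thresholds, and names are illustrative assumptions, not the disclosed implementation):

```python
class LastValuePredictor:
    """One member of an ensemble of load value predictors: a last-value
    predictor with a saturating confidence counter (illustrative design)."""

    CONF_PREDICT = 3   # minimum confidence before a prediction is made
    CONF_MAX = 7

    def __init__(self):
        self.last_value = None
        self.confidence = 0

    def predict(self):
        return self.last_value if self.confidence >= self.CONF_PREDICT else None

    def train(self, actual_value):
        if self.last_value == actual_value:
            self.confidence = min(self.confidence + 1, self.CONF_MAX)
        else:
            self.confidence = 0            # reset confidence on a wrong value
        self.last_value = actual_value

def on_load_retire(predictors, actual_value):
    """Train (and thereby reset the confidence of) only the predictors that
    actually made a misprediction, mirroring the selective flow above."""
    for p in predictors:
        prediction = p.predict()
        if prediction is not None and prediction != actual_value:
            p.train(actual_value)          # training includes the confidence reset
```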

Processor with memory-embedded pipeline for table-driven computation

A processor and a method implemented by the processor to obtain computation results are described. The processor includes a unified reuse table embedded in a processor pipeline, the unified reuse table including a plurality of entries, each entry of the plurality of entries corresponding with a computation instruction or a set of computation instructions. The processor also includes a functional unit to perform a computation based on a corresponding instruction.
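The reuse-table lookup can be sketched as memoization keyed on the instruction and its operands (a software analogy under assumed names, not the patented pipeline structure):

```python
reuse_table = {}   # (opcode, operands) -> result; stands in for the embedded table

def execute(opcode, a, b):
    """Consult the unified reuse table before dispatching to a functional
    unit; a software sketch of table-driven computation reuse."""
    key = (opcode, a, b)
    if key in reuse_table:
        return reuse_table[key]            # reuse: skip the functional unit
    result = {"add": a + b, "mul": a * b}[opcode]   # functional-unit computation
    reuse_table[key] = result              # fill the table for future reuse
    return result
```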