IPIQ

G06F9/3832

Providing load address predictions using address prediction tables based on load path history in processor-based systems

11709679 · 2023-07-25 ·

Qualcomm Incorporated

Aspects disclosed in the detailed description include providing load address predictions using address prediction tables based on load path history in processor-based systems. In one aspect, a load address prediction engine provides a load address prediction table containing multiple load address prediction table entries. Each load address prediction table entry includes a predictor tag field and a memory address field for a load instruction. The load address prediction engine generates a table index and a predictor tag based on an identifier and a load path history for a detected load instruction. The table index is used to look up a corresponding load address prediction table entry. If the predictor tag matches the predictor tag field of the load address prediction table entry corresponding to the table index, the memory address field of the load address prediction table entry is provided as a predicted memory address for the load instruction.

Enabling removal and reconstruction of flag operations in a processor

11709678 · 2023-07-25 ·

Intel Corporation

In one embodiment, a processor includes fetch logic to fetch instructions, decode logic to decode the fetched instructions, and execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer.

ZERO OPERAND INSTRUCTION CONVERSION FOR ACCELERATING SPARSE COMPUTATIONS IN A CENTRAL PROCESSING UNIT PIPELINE

20230024089 · 2023-01-26 ·

A processing device includes a zero detection circuit to determine that an operand of a first instruction is zero and instruction conversion logic coupled with the zero detection circuit to, in response to the zero detection circuit determining that the operand is zero, convert the first instruction to a register move instruction executable by the processing device.

PROCESSOR CIRCUIT AND DATA PROCESSING METHOD

20230020505 · 2023-01-19 ·

CHIA-I CHEN

A processor circuit includes an instruction decode unit, an instruction detector, an address generator and a data buffer. The instruction decode unit is configured to decode a first load instruction included in a plurality of load instructions to generate a first decoding result. The instruction detector, coupled to the instruction decode unit, is configured to detect if the load instructions use a same register. The address generator, coupled to the instruction decode unit, is configured to generate a first address requested by the first load instruction according to the first decoding result. The data buffer is coupled to the instruction detector and the address generator. When the instruction detector detects that the load instructions use the same register, the data buffer is configured to store the first address generated from the address generator, and store data requested by the first load instruction according to the first address.

Combining load or store instructions

11593117 · 2023-02-28 ·

Qualcomm Incorporated

Various aspects disclosed herein relate to combining instructions to load data from or store data in memory while processing instructions in a computer processor. More particularly, at least one pattern of multiple memory access instructions that reference a common base register and do not fully utilize an available bus width may be identified in a processor pipeline. In response to determining that the multiple memory access instructions target adjacent memory or non-contiguous memory that can fit on a single cache line, the multiple memory access instructions may be replaced within the processor pipeline with one equivalent memory access instruction that utilizes more of the available bus width than either of the replaced memory access instructions.

Vector prefetching for computing systems

11500779 · 2022-11-15 ·

Marvell Asia Pte, Ltd.

Shubhendu Sekhar Mukherjee

Described is a computing system for vector prefetching which includes a hierarchical memory including multiple caches, a missing address storage unit (MASU) associated with each cache which stores prefetch requests suffering a cache miss, a prefetcher which sends prefetch requests towards the hierarchical memory, and a vector prefetch unit. The vector prefetch unit determines existence of at least one of a relationship between a cache block associated with the prefetch request and cache blocks associated with one or more entries in a MASU, or a relationship between cache blocks associated with different entries in a MASU, and sends a vector prefetch request based on related prefetch requests including indicators indicating a starting cache block and a number of related cache blocks to a higher memory level to obtain data associated with each cache block. The hierarchical memory stores the data received in at least one response message from the higher memory level if available.

PHYSICAL ADDRESS PROXY REUSE MANAGEMENT

20220358037 · 2022-11-10 ·

Each load/store queue entry holds a load/store physical address proxy (PAP) for use as a proxy for a load/store physical memory line address (PMLA). The load/store PAP comprises a set index and a way that uniquely identifies an L2 cache entry holding a memory line at the load/store PMLA when an L1 cache provides the load/store PAP during the load/store instruction execution. The microprocessor removes a line at a removal PMLA from an L2 entry, forms a removal PAP as a proxy for the removal PMLA that comprises a set index and a way, snoops the load/store queue with the removal PAP to determine whether the removal PAP is being used as a proxy for the removal PMLA, fills the removed entry with a line at a fill PMLA, and prevents the removal PAP from being used as a proxy for the removal PMLA and the fill PMLA concurrently.

STORE-TO-LOAD FORWARDING CORRECTNESS CHECKS AT STORE INSTRUCTION COMMIT

20220357955 · 2022-11-10 ·

A microprocessor includes a load queue, a store queue, and a load/store unit that, during execution of a store instruction, records store information to a store queue entry. The store information comprises store address and store size information about store data to be stored by the store instruction. The load/store unit, during execution of a load instruction that is younger in program order than the store instruction, performs forwarding behavior with respect to forwarding or not forwarding the store data from the store instruction to the load instruction and records load information to a load queue entry, which comprises load address and load size information about load data to be loaded by the load instruction, and records the forwarding behavior in the load queue entry. The load/store unit, during commit of the store instruction, uses the recorded store information and the recorded load information and the recorded forwarding behavior to check correctness of the forwarding behavior.

Complex I/O value prediction for multiple values with physical or virtual addresses

11573800 · 2023-02-07 ·

Marvell Asia Pte, Ltd.

An apparatus, and corresponding method, for input/output (I/O) value determination, generates an I/O instruction for an I/O device, the I/O device including a state machine with state transition logic. The apparatus comprises a controller that includes a simplified state machine with a reduced version of the state transition logic of the state machine of the I/O device. The controller is configured to improve instruction execution performance of a processor core by employing the simplified state machine to predict at least one state value of at least one I/O device true state value to be affected by the I/O instruction at the I/O device.

Reuse in-flight register data in a processor

11614942 · 2023-03-28 ·

Micron Technology, Inc.

Devices and techniques for short-thread rescheduling in a processor are described herein. When an instruction for a thread completes, a result is produced. The condition that the same thread is scheduled in a next execution slot and that the next instruction of the thread will use the result can be detected. In response to this condition, the result can be provided directly to an execution unit for the next instruction.

Patent classifications

G06F9/3832