G06F9/3806

Controlling accesses to a branch prediction unit for sequences of fetch groups

An electronic device is described that handles control transfer instructions (CTIs) when executing instructions in program code. The electronic device has a processor that includes a branch prediction functional block and a sequential fetch logic functional block. The sequential fetch logic functional block determines, based on a record associated with a CTI, that a specified number of fetch groups of instructions that were previously determined to include no CTIs are to be fetched for execution in sequence following the CTI. When each of the specified number of fetch groups is fetched and prepared for execution, the sequential fetch logic prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that fetch group.
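
The suppression behavior in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `SequentialFetchLogic`, the dictionary used as the per-CTI record, and the helper `count_bpu_accesses` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SequentialFetchLogic:
    # Hypothetical record: CTI address -> number of fetch groups after the
    # CTI that were previously found to contain no CTIs.
    cti_free_counts: dict = field(default_factory=dict)
    remaining: int = 0  # fetch groups still covered by the suppression window

    def on_cti(self, cti_addr: int) -> None:
        """Load the suppression counter from the record for this CTI."""
        self.remaining = self.cti_free_counts.get(cti_addr, 0)

    def should_access_bpu(self) -> bool:
        """Return False (and count down) while inside the CTI-free run."""
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


def count_bpu_accesses(logic: SequentialFetchLogic, cti_addr: int, num_groups: int) -> int:
    """Count branch predictor accesses for num_groups fetch groups after a CTI."""
    logic.on_cti(cti_addr)
    return sum(1 for _ in range(num_groups) if logic.should_access_bpu())
```

In this model, a CTI whose record says three CTI-free fetch groups follow it causes three branch predictor lookups to be skipped; a CTI with no record suppresses nothing.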

Branch confidence throttle
11507380 · 2022-11-22

A processing system includes a processor with a branch predictor including one or more branch target buffer tables. The processor also includes a branch prediction pipeline including a throttle unit and an uncertainty accumulator. The processor assigns an uncertainty value for each of a plurality of branch predictions generated by the branch predictor and adds the uncertainty value for each of the plurality of branch predictions to an accumulated uncertainty counter associated with the uncertainty accumulator. The throttle unit of the branch prediction pipeline throttles operations of the branch prediction pipeline based on the accumulated uncertainty counter.
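
A minimal software sketch of the accumulate-and-throttle idea follows. The class name, the integer threshold, and the step that subtracts uncertainty when a branch resolves are assumptions made for illustration; the abstract only states that uncertainty values are added to a counter and that throttling is based on that counter.

```python
class UncertaintyThrottle:
    def __init__(self, threshold: int):
        self.threshold = threshold  # assumed throttle threshold
        self.accumulated = 0        # accumulated uncertainty counter

    def record_prediction(self, uncertainty: int) -> None:
        """Add the uncertainty value assigned to a new branch prediction."""
        self.accumulated += uncertainty

    def resolve_prediction(self, uncertainty: int) -> None:
        """Assumption: a resolved branch removes its uncertainty again."""
        self.accumulated = max(0, self.accumulated - uncertainty)

    def should_throttle(self) -> bool:
        """Throttle the prediction pipeline once the counter reaches the threshold."""
        return self.accumulated >= self.threshold
```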

Full asynchronous execution queue for accelerator hardware
11593157 · 2023-02-28

A method for providing an asynchronous execution queue for accelerator hardware includes replacing a malloc operation in an execution queue to be sent to an accelerator with an asynchronous malloc operation that returns a unique reference pointer. Execution of the asynchronous malloc operation in the execution queue by the accelerator allocates a requested memory size and adds an entry to a look-up table accessible by the accelerator that maps the reference pointer to a corresponding memory address.
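
The deferred allocation described here can be sketched as follows. All names (`Accelerator`, `async_malloc`, the tuple layout of queue entries, the bump-allocator base address) are hypothetical; the sketch only shows that the host receives a reference pointer immediately while the real address is bound later, when the accelerator executes the queued operation.

```python
import itertools

_ref_ids = itertools.count(1)  # source of unique reference pointers (assumed scheme)


def async_malloc(queue: list, size: int) -> int:
    """Enqueue an allocation and return a reference pointer without waiting."""
    ref = next(_ref_ids)
    queue.append(("async_malloc", ref, size))
    return ref


class Accelerator:
    def __init__(self):
        self.lookup = {}         # look-up table: reference pointer -> memory address
        self.next_free = 0x1000  # hypothetical bump-allocator base address

    def run(self, queue: list) -> None:
        """Execute queued operations; allocations fill in the look-up table."""
        for op, ref, size in queue:
            if op == "async_malloc":
                self.lookup[ref] = self.next_free
                self.next_free += size
```

Later operations in the same queue could then refer to the buffer by its reference pointer and have the accelerator resolve it through `lookup` at execution time.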

METADATA PREDICTOR

Embodiments are provided for a metadata predictor. An index pipeline generates indices in an index buffer in which the indices are used for reading out a memory device. A prediction cache is populated with metadata of instructions read from the memory device. A prediction pipeline generates a prediction using the metadata of the instructions from the prediction cache, the populating of the prediction cache with the metadata of the instructions being performed asynchronously to the operating of the prediction pipeline.

USING METADATA PRESENCE INFORMATION TO DETERMINE WHEN TO ACCESS A HIGHER-LEVEL METADATA TABLE

Embodiments are provided for using metadata presence information to determine when to access a higher-level metadata table. It is determined that an incomplete hit occurred for a line of metadata in a lower-level structure of a processor, the lower-level structure being coupled to a higher-level structure in a hierarchy. It is determined that metadata presence information in a metadata presence table is a match to the line of metadata from the lower-level structure. Responsive to determining the match, it is determined to avoid accessing the higher-level structure of the processor.
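
One possible reading of this decision, expressed as a sketch: the presence table is modeled as a dictionary from line address to the set of entry slots known to exist in the higher-level structure. That representation, the function name, and the set-equality test for a "match" are all assumptions; the abstract does not specify how presence information is encoded.

```python
def should_access_higher_level(line_addr: int, lower_entries: set, presence_table: dict) -> bool:
    """Decide whether an incomplete hit in the lower level warrants a higher-level access.

    lower_entries: entry slots found in the lower-level line (assumed encoding).
    presence_table: line address -> slots known to exist in the higher level.
    """
    known = presence_table.get(line_addr)
    if known is None:
        return True  # no presence information: fall back to accessing the higher level
    # A match means the higher level holds nothing beyond what is already present.
    return known != lower_entries
```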

MANAGING RETURN PARAMETER ALLOCATION
20230058935 · 2023-02-23

A hybrid threading processor (HTP) supports thread creation by executing an instruction that indicates an amount of storage space to reserve for return values. Before a thread is created, the indicated amount of space is reserved. The newly created child thread sends a return packet back to the parent thread when the child thread completes. The child thread writes its return information into the reserved space and waits for the parent thread to execute a thread join instruction. The thread join instruction takes the returned information from the reserved space and transfers it to the parent thread's register state. The reserved space is released once the child thread is joined. Using a configurable amount of space for each child thread may allow for more child threads to be executed simultaneously.
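
The reserve, return, join, release lifecycle can be modeled in a few lines. This is a sketch with hypothetical names (`ReturnSpacePool`, slot counts as the unit of space); it is not the HTP's actual mechanism, only an illustration of why per-thread sizing lets more children run at once.

```python
class ReturnSpacePool:
    def __init__(self, total_slots: int):
        self.free = total_slots  # unreserved return-value slots
        self.reserved = {}       # thread id -> {"slots": n, "values": list or None}

    def create_thread(self, tid: int, return_slots: int) -> bool:
        """Reserve space before creating the child; fail if not enough is free."""
        if return_slots > self.free:
            return False
        self.free -= return_slots
        self.reserved[tid] = {"slots": return_slots, "values": None}
        return True

    def child_return(self, tid: int, values: list) -> None:
        """The child writes its return information into its reserved space."""
        if len(values) > self.reserved[tid]["slots"]:
            raise ValueError("return values exceed reserved space")
        self.reserved[tid]["values"] = list(values)

    def join(self, tid: int) -> list:
        """The parent takes the returned values and the space is released."""
        rec = self.reserved.pop(tid)
        self.free += rec["slots"]
        return rec["values"]
```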

VARIABLE FORMATTING OF BRANCH TARGET BUFFER

Embodiments include a hierarchical metadata prediction system that includes a first line-based predictor having a first line for storage of metadata entries, and a second line-based predictor configured to store metadata entries from the first line-based predictor. The second line-based predictor has a second line, the second line including a plurality of containers, the plurality of containers including at least a first set of containers having a first size and a second set of containers having a second size. The system also includes a processing device configured to transfer one or more metadata entries between the first line-based predictor and the second line-based predictor. Embodiments also include a computer-implemented method and a computer program product.

Link stack based instruction prefetch augmentation

A computer-implemented method of performing a link stack based prefetch augmentation using sequential prefetching includes observing a call instruction in a program being executed, and pushing a return address onto a link stack for processing the next instruction. A stream of instructions is prefetched starting from a cached line address of the next instruction and is stored in an instruction cache.
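
As an illustration, the call-time behavior might be modeled like this. The 64-byte line size, the fixed instruction length, the `num_lines` prefetch depth, and the use of a Python set as the instruction cache are all assumptions made for the sketch.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


def on_call(call_addr: int, instr_len: int, link_stack: list, icache: set, num_lines: int = 2) -> int:
    """On a call: push the return address, then prefetch sequential lines from it."""
    return_addr = call_addr + instr_len
    link_stack.append(return_addr)
    # Align the return address down to its cache line, then prefetch sequentially.
    first_line = return_addr - (return_addr % LINE_SIZE)
    for i in range(num_lines):
        icache.add(first_line + i * LINE_SIZE)
    return return_addr
```

The intent in this model is that the lines holding the code after the call are already in the instruction cache by the time the callee returns.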

Servicing CPU demand requests with inflight prefetches

Disclosed embodiments provide a technique in which a memory controller determines whether a fetch address is a miss in an L1 cache and, when a miss occurs, allocates a way of the L1 cache, determines whether the allocated way matches a scoreboard entry of pending service requests, and, when such a match is found, determines whether a request address of the matching scoreboard entry matches the fetch address. When the matching scoreboard entry also has a request address matching the fetch address, the scoreboard entry is modified to a demand request.
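
The promotion step can be sketched as below. Scoreboard entries are modeled as dictionaries with `way`, `addr`, and `type` keys, which is a hypothetical layout; issuing a new demand request when no entry matches is also an assumption, since the abstract does not describe that path.

```python
def handle_l1_miss(fetch_addr: int, allocated_way: int, scoreboard: list) -> dict:
    """On an L1 miss, promote a matching in-flight prefetch to a demand request."""
    for entry in scoreboard:
        # Both the allocated way and the request address must match.
        if entry["way"] == allocated_way and entry["addr"] == fetch_addr:
            entry["type"] = "demand"
            return entry
    # Assumption: with no matching in-flight request, a new demand request is issued.
    new_entry = {"way": allocated_way, "addr": fetch_addr, "type": "demand"}
    scoreboard.append(new_entry)
    return new_entry
```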

FETCH QUEUES USING CONTROL FLOW PREDICTION
20220357953 · 2022-11-10

A data processing apparatus is provided. It includes control flow detection prediction circuitry that performs a presence prediction of whether a block of instructions contains a control flow instruction. A fetch queue stores, in association with prediction information, a queue of indications of the instructions; the prediction information comprises the presence prediction. An instruction cache stores fetched instructions that have been fetched according to the fetch queue. Post-fetch correction circuitry receives the fetched instructions before they are received by decode circuitry. The post-fetch correction circuitry includes analysis circuitry that causes the fetch queue to be at least partly flushed in dependence on a type of a given fetched instruction and the prediction information associated with the given fetched instruction.
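
A rough sketch of the post-fetch check follows. It assumes one particular correction policy: if a fetched instruction turns out to be a control flow instruction although its block was predicted to contain none, all younger fetch queue entries are flushed. The entry layout and the function name are hypothetical, and the patent may cover other instruction types and flush conditions.

```python
from collections import deque


def post_fetch_check(fetch_queue: deque, index: int, is_control_flow: bool) -> bool:
    """Flush younger fetch queue entries when the presence prediction was wrong.

    Returns True if a flush happened.
    """
    entry = fetch_queue[index]
    if is_control_flow and not entry["predicted_cti"]:
        # Entries after `index` were queued on the assumption of sequential flow.
        while len(fetch_queue) > index + 1:
            fetch_queue.pop()
        return True
    return False
```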