G06F9/3854

Multi-nullification

Apparatus and methods are disclosed for nullifying memory store instructions and one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions and a register identification of at least one of a plurality of registers, based on first and second target fields of the nullification instruction. The at least one register and the memory access instruction associated with the instruction identification are nullified. Based on the nullified memory access instruction, a subsequent memory access instruction is executed.

LOAD/STORE UNIT FOR A PROCESSOR, AND APPLICATIONS THEREOF
20180203702 · 2018-07-19

A load/store unit for a processor, and applications thereof. In an embodiment, the load/store unit includes a load/store queue configured to store information and data associated with a particular class of instructions. Data stored in the load/store queue can be bypassed to dependent instructions. When an instruction belonging to the particular class of instructions graduates and the instruction is associated with a cache miss, control logic causes a pointer to be stored in a load/store graduation buffer that points to an entry in the load/store queue associated with the instruction. The load/store graduation buffer ensures that graduated instructions access a shared resource of the load/store unit in program order.
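The graduation-buffer behavior described above can be pictured with a small software model. This is only a sketch under stated assumptions: the class name `LoadStoreUnit`, its methods, and the use of integer queue indices as pointers are hypothetical and do not come from the abstract, which describes hardware control logic.

```python
from collections import deque
from typing import Deque, Optional


class LoadStoreUnit:
    """Toy model (hypothetical names) of a load/store graduation buffer."""

    def __init__(self) -> None:
        # FIFO of pointers (here: integer indices) into the load/store queue.
        self.graduation_buffer: Deque[int] = deque()

    def graduate(self, lsq_index: int, cache_miss: bool) -> None:
        # Only graduating instructions associated with a cache miss
        # leave a pointer behind in the graduation buffer.
        if cache_miss:
            self.graduation_buffer.append(lsq_index)

    def next_for_shared_resource(self) -> Optional[int]:
        # Pointers are released in the order they were stored, so graduated
        # instructions reach the shared resource in program order.
        if self.graduation_buffer:
            return self.graduation_buffer.popleft()
        return None
```

Because instructions graduate in program order, a first-in, first-out buffer of pointers is enough in this model to preserve that order at the shared resource.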

Vector processor with extended vector registers
12124849 · 2024-10-22

A processor includes a time counter, a vector coprocessor, and an extended vector register file for executing vector instructions and extending the data width of vector registers. The processor statically dispatches vector instructions with preset execution times based on a write time of a register in a coprocessor register scoreboard and a time counter provided to a vector execution pipeline.

Reordered speculative instruction sequences with a disambiguation-free out of order load store queue
10019263 · 2018-07-10

In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and implementing speculative execution, wherein results of speculative execution can be saved in the store retirement/reorder buffer as a speculative state. The method further includes, upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching; and, in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning the store retirement buffer for a first match, and forwarding data from the first match to the subsequent load. Once speculative outcomes are known, the speculative state is retired to memory.
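The forwarding search described above can be illustrated with a minimal sketch. The abstract does not state the scan direction, so this model assumes the buffer is scanned from the youngest store toward the oldest, which makes the first match the most recent store to the address. All names (`StoreRetirementBuffer`, `forward`, `retire_all`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoreEntry:
    address: int
    data: int


class StoreRetirementBuffer:
    """Toy model: stores are kept in original program order (oldest first)."""

    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []

    def add_store(self, address: int, data: int) -> None:
        self.entries.append(StoreEntry(address, data))

    def forward(self, load_address: int) -> Optional[int]:
        # Assumption: scan youngest to oldest, so the first match is the
        # latest store to this address.
        for entry in reversed(self.entries):
            if entry.address == load_address:
                return entry.data
        return None  # no match: the load would read memory instead

    def retire_all(self, memory: Dict[int, int]) -> None:
        # Once speculative outcomes are known, drain to memory in program order.
        for entry in self.entries:
            memory[entry.address] = entry.data
        self.entries.clear()
```

With two buffered stores to the same address, `forward` returns the data of the later one, and `retire_all` leaves memory holding that same value.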

Selectively performing a single cycle write operation with ECC in a data processing system
10019266 · 2018-07-10

A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.

Architecture emulation in a parallel processing environment

An integrated circuit includes a plurality of processor cores. Processing instructions in the integrated circuit includes: managing a plurality of sets of processor cores, each set including one or more processor cores assigned to a function associated with executing instructions; and reconfiguring the number of processor cores assigned to at least one of the sets during execution based on characteristics associated with executing the instructions.

METHOD TO DO CONTROL SPECULATION ON LOADS IN A HIGH PERFORMANCE STRAND-BASED LOOP ACCELERATOR

An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.
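A rough software analogy of the hoisted load and its SCL instruction follows. It is a sketch only: the `OrderingBuffer` class, the `run_hoisted` function, the RPO value 7, and the address 0x1000 are hypothetical, and a dictionary keyed by RPO stands in for the hardware ordering buffer.

```python
from typing import Dict


class OrderingBuffer:
    """Toy ordering buffer keyed by real program order (RPO)."""

    def __init__(self) -> None:
        self._entries: Dict[int, int] = {}  # rpo -> load address

    def insert_load(self, rpo: int, address: int) -> None:
        self._entries[rpo] = address

    def discard(self, rpo: int) -> bool:
        # Locate the entry by RPO and drop it; report whether it existed.
        return self._entries.pop(rpo, None) is not None

    def __contains__(self, rpo: int) -> bool:
        return rpo in self._entries


def run_hoisted(condition: bool, buffer: OrderingBuffer) -> str:
    load_rpo = 7  # hypothetical RPO the load had before it was hoisted
    # The hoisted load executes before the condition is evaluated.
    buffer.insert_load(load_rpo, 0x1000)
    if condition:
        return "load result used"
    # Complementary branch: the SCL instruction carries the load's RPO
    # and uses it to discard the speculative entry.
    buffer.discard(load_rpo)
    return "load discarded"
```

When the branch that originally contained the load is taken, the entry stays in the buffer; otherwise the SCL path removes it.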

Freelist based global completion table having both thread-specific and global completion table identifiers

Managing a global completion table used to track progress of groups of instructions, in which each group of instructions includes one or more instructions. Entries of the global completion table are allocated to the groups of instructions from a freelist of entries. That is, entries are allocated from a pool of entries, rather than allocating entries in-order in a circular queue.
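The difference between freelist allocation and an in-order circular queue can be sketched as follows. The class and method names are hypothetical, and the sketch leaves out the thread-specific and global identifiers mentioned in the title.

```python
from typing import Dict, List, Optional


class GlobalCompletionTable:
    """Toy completion table whose entries come from a pool, not a ring."""

    def __init__(self, size: int) -> None:
        self.freelist: List[int] = list(range(size))  # indices of free entries
        self.entries: Dict[int, dict] = {}            # index -> group state

    def allocate(self, thread_id: int, num_instructions: int) -> Optional[int]:
        # Take any free entry; no head or tail pointer constrains the choice.
        if not self.freelist:
            return None  # table full: the group must wait
        index = self.freelist.pop()
        self.entries[index] = {"thread": thread_id, "remaining": num_instructions}
        return index

    def complete(self, index: int) -> None:
        # A finished group returns its entry to the pool immediately,
        # regardless of whether older groups are still in flight.
        del self.entries[index]
        self.freelist.append(index)
```

In this model a freed entry can be reused at once, whereas a circular queue would have to wait for every older entry ahead of it to drain.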

MEMORY SEQUENCING WITH COHERENT AND NON-COHERENT SUB-SYSTEMS

Operations associated with a memory and operations associated with one or more functional units may be received. A dependency between the operations associated with the memory and the operations associated with one or more of the functional units may be determined. A first ordering may be created for the operations associated with the memory. Furthermore, a second ordering may be created for the operations associated with one or more of the functional units based on the determined dependency and the first ordering of the operations associated with the memory.