Patent classifications
G06F9/3005
Computation and prediction of linked access
An example operation includes one or more of traversing, by a modeling node, a supply chain downstream from an initial step; detecting, by the modeling node, a multi-organization step; and, responsive to the detection of the multi-organization step, executing a branch prediction algorithm to determine downstream granted organizations.
PROCESSOR WITH SLAVE FREE LIST THAT HANDLES OVERFLOW OF RECYCLED PHYSICAL REGISTERS AND METHOD OF RECYCLING PHYSICAL REGISTERS IN A PROCESSOR USING A SLAVE FREE LIST
A processor including physical registers, a reorder buffer, a master free list, a slave free list, a master recycle circuit, and a slave recycle circuit. The reorder buffer includes instruction entries in which each entry stores physical register indexes for recycling physical registers. The reorder buffer retires up to N instructions in each processor cycle. Each master and slave free list includes N input ports and stores physical register indexes, in which the master free list stores indexes of physical registers to be allocated to instructions being issued. When an instruction is retired, the master recycle circuit routes a first physical register index stored in an instruction entry of the instruction to an input port of the master free list, and the slave recycle circuit routes a second physical register index stored in the instruction entry of the instruction to an input port of the slave free list.
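As an illustration only, the retirement path in this abstract can be modeled in a few lines of Python. The class and field names (`RobEntry`, `FreeLists`, `first_index`, `second_index`) are hypothetical, and the `refill_master` step is an assumption about how overflow held in the slave list might later reach the master list; the abstract does not describe that transfer.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    # Physical register indexes to recycle when this instruction retires.
    first_index: Optional[int] = None
    second_index: Optional[int] = None


class FreeLists:
    def __init__(self, master_initial, n_ports=4):
        self.n_ports = n_ports               # N: retire width and input ports per list
        self.master = deque(master_initial)  # indexes allocated to issuing instructions
        self.slave = deque()                 # receives the second recycled index

    def allocate(self):
        # Allocation at issue always draws from the master free list.
        return self.master.popleft()

    def retire(self, entries):
        # Up to N instructions retire per cycle, one input port each per list.
        if len(entries) > self.n_ports:
            raise ValueError("cannot retire more than N instructions per cycle")
        for entry in entries:
            if entry.first_index is not None:
                self.master.append(entry.first_index)   # master recycle circuit
            if entry.second_index is not None:
                self.slave.append(entry.second_index)   # slave recycle circuit

    def refill_master(self, count):
        # ASSUMPTION (not in the abstract): move indexes from slave to master.
        moved = 0
        while self.slave and moved < count:
            self.master.append(self.slave.popleft())
            moved += 1
        return moved
```

Splitting the two indexes across two lists keeps each list at N input ports even when a retiring instruction frees two registers.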
SCATTER TO GATHER OPERATION
Systems and methods relate to efficient memory operations. A single instruction multiple data (SIMD) gather operation is implemented with a gather result buffer located within or in close proximity to memory, to receive or gather multiple data elements from multiple orthogonal locations in a memory, and once the gather result buffer is complete, the gathered data is transferred to a processor register. A SIMD copy operation is performed by executing two or more instructions for copying multiple data elements from multiple orthogonal source addresses to corresponding multiple destination addresses within the memory, without an intermediate copy to a processor register. Thus, the memory operations are performed in a background mode without direction by the processor.
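A rough software analogy of the gather path, assuming memory is a flat Python list and a register is a tuple: the hypothetical `GatherResultBuffer` collects one element per step and hands the result over only once every slot is filled. The names and the one-element-per-step pacing are illustrative choices, not details from the abstract.

```python
class GatherResultBuffer:
    """Hypothetical buffer sitting next to memory that fills one slot per step."""

    def __init__(self, memory, addresses):
        self.memory = memory
        self.addresses = list(addresses)
        self.slots = [None] * len(self.addresses)
        self.filled = 0

    def step(self):
        # Fetch the next element, if any; return True once the buffer is complete.
        if self.filled < len(self.slots):
            self.slots[self.filled] = self.memory[self.addresses[self.filled]]
            self.filled += 1
        return self.filled == len(self.slots)


def simd_gather(memory, addresses):
    buf = GatherResultBuffer(memory, addresses)
    while not buf.step():
        pass  # the processor is not involved in the individual fetches
    # Only the completed buffer is transferred to the (modeled) register.
    return tuple(buf.slots)
```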
LOAD-STORE QUEUE FOR MULTIPLE PROCESSOR CORES
Technology related to load-store queues for block-based processor architectures is disclosed. In one example of the disclosed technology, a processor includes multiple processor cores and a load-store queue. Each processor core is configured to execute an instruction block including load and store instructions. The instruction block can be identified by a block identifier, and each of the load and store instructions is identified with a load-store identifier. The load-store queue can be configured to enqueue load and store instructions from the processor cores in a buffer indexed based on a function of the block identifier and the load-store identifier. The buffer can be searched for store instructions having a target address matching a target address of a load instruction received from a first processor core. Load response data can be returned for the received load instruction to the first processor core based on the search of the buffer.
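The indexing and search described here can be sketched as follows. This is a simplified model under stated assumptions: the index function `block_id * ids_per_block + lsid` is one possible "function of the block identifier and the load-store identifier", block identifiers are assumed to be assigned in program order without wrapping, and `memory` is a stand-in dictionary for the backing store. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LsqEntry:
    is_store: bool
    address: int
    data: Optional[int] = None


class LoadStoreQueue:
    def __init__(self, ids_per_block=32):
        self.ids_per_block = ids_per_block
        self.buffer = {}   # slot index -> LsqEntry
        self.memory = {}   # stand-in for backing memory

    def index(self, block_id, lsid):
        # ASSUMPTION: smaller index means older in program order.
        if not 0 <= lsid < self.ids_per_block:
            raise ValueError("load-store identifier out of range")
        return block_id * self.ids_per_block + lsid

    def enqueue_store(self, block_id, lsid, address, data):
        self.buffer[self.index(block_id, lsid)] = LsqEntry(True, address, data)

    def load(self, block_id, lsid, address):
        slot = self.index(block_id, lsid)
        self.buffer[slot] = LsqEntry(False, address)
        # Search older entries for stores whose target address matches the load.
        older = [i for i, e in self.buffer.items()
                 if i < slot and e.is_store and e.address == address]
        if older:
            return self.buffer[max(older)].data  # forward from the youngest older store
        return self.memory.get(address, 0)
```

Because the slot is derived from both identifiers, stores from different cores land in one ordered buffer, and a load only has to look at slots below its own.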
Dynamic branch hints using branches-to-nowhere conditional branch
A processor includes an execution pipeline having one or more execution units to execute instructions and a branch prediction unit coupled to the execution units. The branch prediction unit includes a branch history table to store prior branch predictions, a branch predictor, in response to a conditional branch instruction, to predict a branch target address of the conditional branch instruction based on the branch history table, and address match logic to compare the predicted branch target address with an address of a next instruction executed immediately following the conditional branch instruction. The address match logic is to cause the execution pipeline to be flushed if the predicted branch target address does not match the address of the next instruction to be executed.
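The address match step can be illustrated with a toy model. It assumes the branch history table is a dictionary keyed by branch address that remembers the last observed next address, which is a simplification of whatever history the real table holds; `BranchPredictionUnit`, `predict`, and `resolve` are hypothetical names.

```python
class BranchPredictionUnit:
    def __init__(self):
        self.history = {}   # branch address -> last observed next address (simplified)
        self.flushes = 0

    def predict(self, branch_pc, fallthrough_pc):
        # With no history, fall back to the fall-through address (an assumption).
        return self.history.get(branch_pc, fallthrough_pc)

    def resolve(self, branch_pc, predicted_target, next_pc):
        # Address match logic: compare the prediction with the instruction
        # actually executed next, and flush the pipeline on a mismatch.
        mismatch = predicted_target != next_pc
        if mismatch:
            self.flushes += 1
        self.history[branch_pc] = next_pc
        return mismatch
```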
MULTI-LAYER DATA CACHE TO PREVENT USER EXPERIENCE INTERRUPTS DURING FEATURE FLAG MANAGEMENT
There are provided systems and methods for a multi-layer cache to prevent user experience interrupts during feature flag management. A service provider may provide applications, including mobile applications, to computing devices of users. Use and availability of features in an application may be configured using feature flags; however, changing these feature flags may initiate an application refresh that affects user experiences with the application. To prevent interruptions, a multi-layer data cache may be used where feature flag data for the feature flags may initially be loaded, after a time period, to a first layer cache that is not used to update the application. When conditions exist for updating the application without affecting the user experience, such as if the user is no longer using a workflow, the feature flag data may be loaded to a second layer cache. The second layer cache may then be used for updating.
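A minimal sketch of the two cache layers, assuming feature flags are simple name-to-boolean pairs and that "safe to update" is reduced to a single `user_in_workflow` boolean. The class and method names are hypothetical.

```python
class TwoLayerFlagCache:
    def __init__(self, initial_flags):
        self.staging = dict(initial_flags)  # first layer: never read by the app
        self.active = dict(initial_flags)   # second layer: used to update the app

    def receive(self, flags):
        # New flag data lands in the first layer only, so no refresh is triggered.
        self.staging.update(flags)

    def promote_if_safe(self, user_in_workflow):
        # Copy staged data to the second layer only when the user would not be interrupted.
        if user_in_workflow:
            return False
        self.active = dict(self.staging)
        return True

    def is_enabled(self, name):
        return self.active.get(name, False)
```

The application only ever reads the second layer, so a flag change cannot take effect in the middle of a workflow.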
HARDWARE SUPPORTED SPLIT BARRIER
A disclosed technique includes executing, for a first wavefront, a barrier arrival notification instruction, for a first barrier, indicating arrival at a first barrier point; performing, for the first wavefront, work prior to the first barrier point; executing, for the first wavefront, a barrier check instruction; and continuing execution, for the first wavefront, along a control flow path selected based on a result of the barrier check instruction.
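The arrive, work, check, branch sequence can be modeled in software as below. This is a sketch only: `SplitBarrier`, `run_wavefront`, and the path labels are hypothetical, and the barrier is assumed to complete once a fixed number of distinct wavefronts have signaled arrival.

```python
class SplitBarrier:
    def __init__(self, expected):
        self.expected = expected   # number of wavefronts that must arrive
        self.arrived = set()

    def arrive(self, wavefront_id):
        # Barrier arrival notification: non-blocking, just records arrival.
        self.arrived.add(wavefront_id)

    def check(self):
        # Barrier check: reports whether every expected wavefront has arrived.
        return len(self.arrived) >= self.expected


def run_wavefront(barrier, wavefront_id, independent_work):
    barrier.arrive(wavefront_id)
    result = independent_work()      # work done between arrival and check
    if barrier.check():
        path = "past_barrier"        # all arrived: take the post-barrier path
    else:
        path = "keep_working"        # not yet: take an alternative path
    return path, result
```

Separating arrival from the check lets a wavefront do useful work instead of stalling while the other wavefronts catch up.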
Out-of-order block-based processors and instruction schedulers using ready state data indexed by instruction position identifiers
Apparatus and methods are disclosed for implementing block-based processors, including field programmable gate-array implementations. In one example of the disclosed technology, a block-based processor includes an instruction decoder configured to generate decoded ready dependencies for a transactional block of instructions, where each of the instructions is associated with a different instruction identifier encoded in the transactional block. The processor further includes an instruction scheduler configured to issue an instruction from a set of instructions of the transactional block of instructions. The instruction is issued based on determining that the decoded ready dependencies for the instruction are satisfied. The determining includes accessing storage with the decoded ready dependencies indexed with a respective instruction identifier that is encoded in the transactional block of instructions.
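As a hedged illustration of ready state indexed by instruction identifier, the sketch below represents a block as a dictionary from instruction identifier to the set of identifiers it waits on. The function name `schedule` and the lowest-identifier-first tie-break are assumptions made for the example.

```python
def schedule(block):
    """Return an issue order for a block given as {instruction_id: set_of_dependency_ids}."""
    # Ready-state storage, indexed by instruction identifier.
    ready_state = {iid: set(deps) for iid, deps in block.items()}
    pending = set(block)
    issued = []
    while pending:
        # An instruction is ready when all of its decoded dependencies are satisfied.
        ready = sorted(i for i in pending if not ready_state[i])
        if not ready:
            raise ValueError("unsatisfiable dependencies in block")
        iid = ready[0]  # ASSUMPTION: lowest identifier wins among ready instructions
        issued.append(iid)
        pending.remove(iid)
        for other in pending:
            ready_state[other].discard(iid)  # wake up dependents
    return issued
```

For instance, `schedule({0: {1}, 1: set()})` issues instruction 1 before instruction 0, which is the out-of-order behavior the title refers to.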
Processing a Plurality of Threads of a Single Instruction Multiple Data Group
Methods, systems and apparatuses for processing a plurality of threads of a single-instruction multiple data (SIMD) group are disclosed. One method includes initializing a current instruction pointer of the SIMD group; initializing a thread instruction pointer for each of the plurality of threads of the SIMD group, including setting a flag for each of the plurality of threads; determining whether a current instruction of the processing includes a conditional branch; resetting the flag of each thread of the plurality of threads that fails a condition of the conditional branch, and setting the thread instruction pointer of each such thread to a jump instruction pointer; and, if at least one of the threads does not fail the condition, incrementing the current instruction pointer and the thread instruction pointer of each thread that does not fail.
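One conditional-branch step of this method might look like the following in Python. The class `SimdGroup` and its attribute names are hypothetical, and the case where every thread fails is left unhandled because the abstract does not describe it.

```python
class SimdGroup:
    def __init__(self, num_threads, start_ip=0):
        self.current_ip = start_ip                 # instruction pointer of the group
        self.thread_ip = [start_ip] * num_threads  # per-thread instruction pointers
        self.flag = [True] * num_threads           # flag set for every thread at start

    def conditional_branch(self, passes, jump_ip):
        """passes[t] is True if thread t satisfies the branch condition."""
        for t in range(len(self.flag)):
            if self.flag[t] and not passes[t]:
                self.flag[t] = False               # reset the flag of a failing thread
                self.thread_ip[t] = jump_ip        # park it at the jump target
        if any(self.flag):
            # At least one thread did not fail: advance the group and those threads.
            self.current_ip += 1
            for t in range(len(self.flag)):
                if self.flag[t]:
                    self.thread_ip[t] += 1
        # If no thread remains flagged, this sketch leaves the pointers unchanged.
```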
TECHNOLOGIES FOR EXECUTE ONLY TRANSACTIONAL MEMORY
Technologies for execute only transactional memory include a computing device with a processor and a memory. The processor includes an instruction translation lookaside buffer (iTLB) and a data translation lookaside buffer (dTLB). In response to a page miss, the processor determines whether a page physical address is within an execute only transactional (XOT) range of the memory. If within the XOT range, the processor may populate the iTLB with the page physical address and prevent the dTLB from being populated with the page physical address. In response to an asynchronous change of control flow such as an interrupt, the processor determines whether a last iTLB translation is within the XOT range. If within the XOT range, the processor clears or otherwise secures the processor register state. The processor ensures that an XOT range starts execution at an authorized entry point. Other embodiments are described and claimed.
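The page-miss and interrupt handling can be approximated with a small model. It assumes TLBs are dictionaries from virtual to physical address, the XOT range is a half-open physical address interval, and "securing" register state means zeroing it; the register names `r0` and `r1` and all method names are hypothetical. The authorized-entry-point check is not modeled.

```python
class XotProcessor:
    def __init__(self, xot_start, xot_end):
        self.xot = range(xot_start, xot_end)  # physical addresses in the XOT range
        self.itlb = {}
        self.dtlb = {}
        self.registers = {"r0": 0, "r1": 0}   # hypothetical register file
        self.last_itlb_paddr = None

    def handle_page_miss(self, vaddr, paddr, is_fetch):
        if paddr in self.xot:
            if not is_fetch:
                return False                  # dTLB is never populated for XOT pages
            self.itlb[vaddr] = paddr
            self.last_itlb_paddr = paddr
            return True
        if is_fetch:
            self.itlb[vaddr] = paddr
            self.last_itlb_paddr = paddr
        else:
            self.dtlb[vaddr] = paddr
        return True

    def on_interrupt(self):
        # Asynchronous change of control flow: clear registers if the last
        # instruction translation was inside the XOT range.
        if self.last_itlb_paddr is not None and self.last_itlb_paddr in self.xot:
            for name in self.registers:
                self.registers[name] = 0
```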