Patent classifications
G06F9/312
Methods and systems for using state vector data in a state machine engine
A state machine engine includes a state vector system. The state vector system includes an input buffer configured to receive state vector data from a restore buffer and to provide state vector data to a state machine lattice. The state vector system also includes an output buffer configured to receive state vector data from the state machine lattice and to provide state vector data to a save buffer.
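The data path described above can be modeled as four FIFO queues. This Python sketch is illustrative only. The following are assumptions, not details from the abstract:
- the class name `StateVectorSystem`;
- the use of a `deque` for each buffer;
- representing a state vector as a list of bits;
- the `lattice_step` callback, which stands in for the state machine lattice.

```python
from collections import deque
from typing import Callable, List

StateVector = List[int]  # assumption: a state vector is a list of state bits

class StateVectorSystem:
    """Hypothetical model: each buffer is a FIFO of state vectors."""

    def __init__(self) -> None:
        self.restore_buffer: deque = deque()
        self.input_buffer: deque = deque()
        self.output_buffer: deque = deque()
        self.save_buffer: deque = deque()

    def fill_input(self) -> None:
        # Restore buffer -> input buffer.
        while self.restore_buffer:
            self.input_buffer.append(self.restore_buffer.popleft())

    def run_lattice(self, lattice_step: Callable[[StateVector], StateVector]) -> None:
        # Input buffer -> state machine lattice -> output buffer.
        while self.input_buffer:
            self.output_buffer.append(lattice_step(self.input_buffer.popleft()))

    def drain_output(self) -> None:
        # Output buffer -> save buffer.
        while self.output_buffer:
            self.save_buffer.append(self.output_buffer.popleft())

# Usage: restore one vector, advance it with a stand-in lattice step, and save it.
svs = StateVectorSystem()
svs.restore_buffer.append([1, 0, 0, 1])
svs.fill_input()
svs.run_lattice(lambda v: v[1:] + v[:1])  # hypothetical step: rotate the bits left
svs.drain_output()
```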
Dynamically selecting a memory boundary to be used in performing operations
A memory boundary to be used in processing an instruction is selected dynamically, based on a predictor. Decoding the instruction produces a sequence of operations that performs a specified operation. This sequence includes a load-to-boundary operation, which loads data up to the selected memory boundary as part of the specified operation.
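A load-to-boundary operation can be sketched as loading at most the requested number of bytes without crossing the selected boundary. In this hypothetical Python model, the predictor chooses between a 64-byte and a 4096-byte boundary for each instruction address. It switches to the larger boundary after a load is cut short. The candidate sizes, the `BoundaryPredictor` class, and its update rule are assumptions for illustration, not the predictor described in the patent.

```python
class BoundaryPredictor:
    """Hypothetical per-instruction predictor choosing a memory boundary size."""
    CANDIDATES = (64, 4096)  # assumption: cache-line and page boundaries

    def __init__(self) -> None:
        self.table: dict = {}  # instruction address -> index into CANDIDATES

    def predict(self, pc: int) -> int:
        return self.CANDIDATES[self.table.get(pc, 0)]

    def update(self, pc: int, truncated: bool) -> None:
        # If the last load stopped at the boundary before getting all the data
        # it wanted, predict the next larger boundary for this instruction.
        if truncated:
            self.table[pc] = min(self.table.get(pc, 0) + 1, len(self.CANDIDATES) - 1)

def load_to_boundary(memory: bytes, address: int, length: int, boundary: int) -> bytes:
    # Bytes remaining before the next multiple of `boundary`.
    to_boundary = boundary - (address % boundary)
    return memory[address:address + min(length, to_boundary)]

memory = bytes(range(256)) * 32  # 8 KiB of sample data
predictor = BoundaryPredictor()
pc = 0x400  # hypothetical instruction address
first = load_to_boundary(memory, 60, 16, predictor.predict(pc))   # stops at byte 64
predictor.update(pc, truncated=len(first) < 16)
second = load_to_boundary(memory, 60, 16, predictor.predict(pc))  # full 16 bytes
```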
Method to do control speculation on loads in a high performance strand-based loop accelerator
An apparatus includes a binary translator that hoists a load instruction from one branch of a conditional statement to above the conditional statement. The translator also inserts a speculation control of load (SCL) instruction in the complementary branch. The SCL instruction indicates the real program order (RPO) that the load instruction had before it was hoisted. The apparatus further includes an execution circuit. The execution circuit executes the load instruction, which performs the load and causes an entry for the load instruction to be inserted into an ordering buffer. The execution circuit also executes the SCL instruction: using the RPO that the SCL instruction provides, it locates the load instruction's entry in the ordering buffer and discards it.
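The following Python sketch models the hoisted load and the SCL instruction in software. The `OrderingBuffer` class, the dictionary keyed by RPO, the address `0x10`, and the RPO value 7 are all hypothetical. Real hardware would track considerably more state per entry.

```python
class OrderingBuffer:
    """Hypothetical ordering buffer whose entries are keyed by real program order."""

    def __init__(self) -> None:
        self.entries: dict = {}  # RPO -> (address, loaded value)

    def insert(self, rpo: int, address: int, value: int) -> None:
        self.entries[rpo] = (address, value)

    def discard(self, rpo: int) -> None:
        # Effect of the SCL instruction: drop the hoisted load's entry.
        self.entries.pop(rpo, None)

LOAD_RPO = 7  # hypothetical RPO the load had before it was hoisted

def translated_code(cond: bool, memory: dict, buf: OrderingBuffer):
    # The load originally lived in the `if` branch; the translator hoisted it
    # above the condition so it can start early.
    value = memory[0x10]
    buf.insert(LOAD_RPO, 0x10, value)
    if cond:
        return value          # load was needed; its entry stays for ordering checks
    buf.discard(LOAD_RPO)     # SCL instruction in the complementary branch
    return None
```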
Supporting binary translation alias detection in an out-of-order processor
In one implementation, a processing device includes a memory to store instructions and a processor core to execute them. The processor core receives, for execution, a sequence of instructions reordered by a binary translator. A first load in the sequence is identified; it references a memory location that stores the data item to be loaded. An occurrence of a second load is detected, where the second load accesses the same memory location after the first load instruction executes. Based on this detection, a protection field in the first load is enabled. The enabled protection field indicates that the first load must be checked for aliasing with any subsequent store instruction that accesses the memory location. The second load is then eliminated based on the enabled protection field.
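As a rough illustration, this Python model eliminates a second load to the same address. It reuses the first load's value and sets the first load's protection field. A later store to that address is then reported as an alias; in hardware, this would presumably trigger recovery. The `Load` and `Core` classes, and the way aliases are reported, are assumptions rather than the patented mechanism.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(eq=False)  # compare loads by identity
class Load:
    address: int
    protected: bool = False      # the protection field
    value: Optional[int] = None

class Core:
    """Hypothetical core model for load elimination with alias checking."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory
        self.protected_loads: List[Load] = []

    def execute_load(self, load: Load) -> int:
        load.value = self.memory[load.address]
        return load.value

    def try_eliminate(self, first: Load, second: Load) -> bool:
        # A second load to the same location reuses the first load's value,
        # and the first load is protected so later stores are checked against it.
        if first.address != second.address:
            return False
        first.protected = True
        self.protected_loads.append(first)
        second.value = first.value
        return True

    def execute_store(self, address: int, value: int) -> List[Load]:
        self.memory[address] = value
        # Protected loads to this address were aliased; the elimination may be invalid.
        return [ld for ld in self.protected_loads if ld.address == address]
```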
Method and apparatus for supporting quasi-posted loads
A processor includes a decoder, a data return buffer, and an execution unit. The decoder decodes an instruction for a non-posted load into a decoded instruction that loads data from memory-mapped input/output (MMIO). The execution unit executes the decoded instruction. This execution does the following:
- starts a timer, which measures the time taken for the non-posted load to return;
- determines whether the timer exceeds a timeout threshold;
- allocates an entry in the data return buffer for the load;
- determines whether an event has arrived.

Whether an event has arrived is determined in response to the allocation of the entry for the load, a determination that the timer exceeds the timeout threshold, or both.
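The execution unit's control flow can be sketched as a cycle-counting loop. In this hypothetical Python model:
- `latency` is the number of cycles the device takes to return data;
- the data return buffer is a plain list;
- the returned strings are illustrative outcome labels.

None of these names come from the patent.

```python
from typing import List

def execute_non_posted_load(load_id: int, data_return_buffer: List[int],
                            latency: int, timeout: int, event_arrived: bool) -> str:
    """Hypothetical model of a quasi-posted MMIO load."""
    timer = 0
    while timer < latency:
        timer += 1                 # timer measures how long the load has been outstanding
        if timer > timeout:
            # Timeout: park the load in the data return buffer so its data can
            # arrive later, then check whether an event (e.g. an interrupt) is pending.
            data_return_buffer.append(load_id)
            return "service_event" if event_arrived else "wait_for_data"
    return "completed"             # data returned before the timeout
```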
Instruction to cancel outstanding cache prefetches
Techniques relate to handling outstanding cache-miss prefetches. A processor pipeline recognizes that a prefetch-canceling instruction is being executed. In response, every outstanding prefetch is evaluated against a criterion specified by the prefetch-canceling instruction, and the prefetches that meet it are selected as qualified prefetches. The processor then communicates with a cache subsystem to cancel the qualified prefetches. Once the qualified prefetches are successfully canceled, a local cache is prevented from being updated by them.
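A minimal Python model of the cancel-and-suppress behavior follows. The criterion is passed as a predicate; the address range used here is chosen only for illustration. The `Prefetch` and `CacheSubsystem` names are assumptions and do not describe any specific processor's interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass(eq=False)  # compare prefetches by identity
class Prefetch:
    address: int
    cancelled: bool = False

@dataclass
class CacheSubsystem:
    outstanding: List[Prefetch] = field(default_factory=list)
    local_cache: Dict[int, bytes] = field(default_factory=dict)

    def cancel_prefetches(self, criterion: Callable[[Prefetch], bool]) -> List[Prefetch]:
        # Evaluate every outstanding prefetch against the instruction's criterion.
        qualified = [p for p in self.outstanding if criterion(p)]
        for p in qualified:
            p.cancelled = True
        self.outstanding = [p for p in self.outstanding if not p.cancelled]
        return qualified

    def fill(self, prefetch: Prefetch, data: bytes) -> None:
        # Data from a cancelled prefetch must not update the local cache.
        if not prefetch.cancelled:
            self.local_cache[prefetch.address] = data
```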
Fused adjacent memory stores
A processing device includes a store instruction identification unit that identifies a pair of store instructions among a plurality of instructions in an instruction queue. The pair consists of a first store instruction and a second store instruction. First data of the first store instruction corresponds to a first memory region, which is adjacent to a second memory region; second data of the second store instruction corresponds to that second memory region. The processing device also includes a store instruction fusion unit, which fuses the first store instruction with the second store instruction to produce a fused store instruction.
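One way to picture the fusion step is a pass over the instruction queue. The pass merges two consecutive stores when the first store's region ends exactly where the second begins. This Python sketch rests on three assumptions:
- stores are byte-addressed;
- the first store is at the lower address;
- only neighboring queue entries are considered.

Hardware would likely also need to check data widths, alignment, and ordering constraints.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Store:
    address: int
    data: bytes

def fuse_adjacent_stores(queue: List[Store]) -> List[Store]:
    """Merge neighboring stores whose memory regions are adjacent (hypothetical model)."""
    result: List[Store] = []
    i = 0
    while i < len(queue):
        first = queue[i]
        if i + 1 < len(queue):
            second = queue[i + 1]
            # Adjacent regions: the first store ends where the second begins.
            if first.address + len(first.data) == second.address:
                result.append(Store(first.address, first.data + second.data))
                i += 2
                continue
        result.append(first)
        i += 1
    return result
```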
Streaming engine with stream metadata saving for context switching
A streaming engine employed in a digital data processor specifies a fixed, read-only data stream defined by plural nested loops. An address generator produces the addresses of the data elements. A stream head register stores the data elements to be supplied next to functional units for use as operands. Stream metadata is stored in response to a stream store instruction, and stored stream metadata is restored to the streaming engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state, discarding stored stream data. A return from interrupt changes a frozen stream back to an active state.
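The save, freeze, and restore sequence can be modeled with a two-level nested loop over a read-only array. The following are assumptions in this Python sketch:
- the loop parameter names (`icnt0`, `icnt1`, `dim1`);
- a one-element stride for the inner loop;
- representing the metadata as a dictionary.

The point the sketch illustrates is that the saved loop counters are enough to resume the stream after the buffered head data has been discarded.

```python
from typing import Dict, Optional, Sequence

class Stream:
    """Hypothetical read-only stream defined by two nested loops."""

    def __init__(self, base: int, icnt0: int, icnt1: int, dim1: int) -> None:
        # Metadata: loop parameters plus the current loop counters.
        self.meta: Dict[str, int] = {"base": base, "icnt0": icnt0, "icnt1": icnt1,
                                     "dim1": dim1, "i0": 0, "i1": 0}
        self.state = "active"
        self.head: Optional[int] = None  # stream head register

    def address(self) -> int:
        m = self.meta
        return m["base"] + m["i1"] * m["dim1"] + m["i0"]

    def prefetch(self, memory: Sequence[int]) -> None:
        # Fill the stream head register with the next element.
        if self.head is None:
            self.head = memory[self.address()]

    def read(self, memory: Sequence[int]) -> int:
        if self.state != "active":
            raise RuntimeError("stream is frozen")
        if self.meta["i1"] >= self.meta["icnt1"]:
            raise IndexError("stream exhausted")
        self.prefetch(memory)
        value, self.head = self.head, None
        m = self.meta
        m["i0"] += 1                  # advance the inner loop...
        if m["i0"] == m["icnt0"]:     # ...and wrap into the outer loop
            m["i0"] = 0
            m["i1"] += 1
        return value

    def save(self) -> Dict[str, int]:                 # stream store instruction
        return dict(self.meta)

    def restore(self, meta: Dict[str, int]) -> None:  # stream restore instruction
        self.meta = dict(meta)
        self.head = None

    def interrupt(self) -> None:
        self.state = "frozen"
        self.head = None              # buffered stream data is discarded

    def return_from_interrupt(self) -> None:
        self.state = "active"
```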
Write nullification
Apparatus and methods are disclosed for nullifying one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit that, based at least in part on receiving a nullification instruction, obtains a register identification of at least one of a plurality of registers from the target field of the nullification instruction. A write to the at least one register associated with that register identification is then nullified. The nullification instruction is in a first instruction block of the plurality of instruction blocks. Based on the nullified write to the at least one register, a subsequent instruction is executed from a second, different instruction block.
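This Python sketch assumes, as is common in EDGE-style block-based designs, that an instruction block can commit only after every register in its declared write set has been resolved. A nullification instruction lets the block mark a write as resolved on a path that produces no value, so the block can commit and execution can move on to the next block. The `InstructionBlock` class and the write-set representation are hypothetical.

```python
from typing import Dict, Iterable

class InstructionBlock:
    """Hypothetical block that commits once every declared register write is resolved."""

    def __init__(self, name: str, write_set: Iterable[str]) -> None:
        self.name = name
        self.pending = set(write_set)    # registers the block must still resolve
        self.writes: Dict[str, int] = {}

    def write(self, reg: str, value: int) -> None:
        self.writes[reg] = value
        self.pending.discard(reg)

    def nullify(self, target_field: str) -> None:
        # Nullification instruction: the register named in its target field
        # counts as resolved, but no value is written to it.
        self.pending.discard(target_field)

    def can_commit(self) -> bool:
        return not self.pending

def commit(block: InstructionBlock, registers: Dict[str, int]) -> None:
    # Nullified registers keep their previous architectural values.
    registers.update(block.writes)
```

After `commit` returns, a simulator built on this model would continue fetching from the next instruction block.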
Conflict mask generation
Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can generate a first bitmap comprising a plurality of bits, where the plurality of bits includes a first bit that represents a first memory location. The processor core can determine that the value of the first bit is equal to the value of a second bit in the first bitmap. The processor core can determine the location of the second bit in relation to the first bit in the first bitmap. The processor core can generate a second bitmap including a third bit indicating that the first bit is the last bit in the first bitmap with the same value as the second bit.
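The abstract is terse, so this Python sketch takes one plausible reading, modeled on AVX-512's `VPCONFLICTD`:
- each lane of a SIMD vector holds an index, which stands for a memory location;
- a per-lane conflict mask records which earlier lanes hold the same index;
- a second bitmap marks the lanes that are the last occurrence of their index.

This interpretation and the function names are assumptions, not the claimed method.

```python
from typing import List

def conflict_masks(indices: List[int]) -> List[int]:
    """For lane i, bit j is set when an earlier lane j < i holds the same index."""
    masks = []
    for i, idx in enumerate(indices):
        m = 0
        for j in range(i):
            if indices[j] == idx:
                m |= 1 << j
        masks.append(m)
    return masks

def last_occurrence_bitmap(indices: List[int]) -> int:
    """Bit i is set when no later lane holds the same index as lane i."""
    bitmap = 0
    for i, idx in enumerate(indices):
        if idx not in indices[i + 1:]:
            bitmap |= 1 << i
    return bitmap
```

Consider a vectorized scatter in which several lanes target the same location. The lanes set in the last-occurrence bitmap are the ones whose stores should survive under last-writer-wins semantics.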