Patent classifications
G06F9/30101
INFERRING FUTURE VALUE FOR SPECULATIVE BRANCH RESOLUTION IN A MICROPROCESSOR
A system, processor, programming product and/or method including: an instruction dispatch unit configured to dispatch instructions of a compare immediate-conditional branch instruction sequence; and a compare register having at least one entry to hold information in a plurality of fields. Operations include: writing information from a first instruction of the compare immediate-conditional branch instruction sequence into one or more of the plurality of fields in an entry in the compare register; writing an immediate field and the ITAG of a compare immediate instruction into the entry in the compare register; writing, in response to dispatching a conditional branch instruction, an inferred compare result value into the entry in the compare register; comparing a computed compare result value to the inferred compare result value stored in the entry in the compare register; and not executing the compare immediate instruction or the conditional branch instruction.
Memory tagging apparatus and method
An apparatus and method for tagged memory management. For example, one embodiment of a processor comprises: execution circuitry to execute instructions and process data, at least one instruction to generate a system memory access request having a first address pointer; and address translation circuitry to determine whether to translate the first address pointer with or without metadata processing, wherein if the first address pointer is to be translated with metadata processing, the address translation circuitry to: perform a lookup in a memory metadata table to identify a memory metadata value, determine a pointer metadata value associated with the first address pointer, and compare the memory metadata value with the pointer metadata value, the comparison to generate a validation of the memory access request or a fault condition, wherein if the comparison results in a validation of the memory access request, then accessing a set of one or more address translation tables to translate the first address pointer to a first physical address and to return the first physical address responsive to the memory access request.
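The check-then-translate flow described above can be sketched roughly as follows. This is only an illustrative model, not the claimed circuitry: the names `AddressTranslator` and `MetadataFault`, the placement of the pointer metadata in the top 8 bits, the 16-byte granule, and the 4 KiB page size are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field

TAG_SHIFT = 56     # assumption: pointer metadata sits in the top 8 bits of a 64-bit pointer
TAG_MASK = 0xFF
GRANULE = 16       # assumption: one memory metadata value per 16-byte granule
PAGE_SIZE = 4096   # assumption: 4 KiB pages


class MetadataFault(Exception):
    """Fault condition: pointer metadata does not match memory metadata."""


@dataclass
class AddressTranslator:
    metadata_table: dict = field(default_factory=dict)  # granule index -> memory metadata
    page_table: dict = field(default_factory=dict)      # virtual page -> physical page

    def translate(self, pointer: int, with_metadata: bool) -> int:
        # Strip the metadata bits to recover the plain virtual address.
        address = pointer & ((1 << TAG_SHIFT) - 1)
        if with_metadata:
            # Look up the memory metadata and compare it with the pointer metadata.
            memory_tag = self.metadata_table.get(address // GRANULE, 0)
            pointer_tag = (pointer >> TAG_SHIFT) & TAG_MASK
            if memory_tag != pointer_tag:
                raise MetadataFault(f"pointer tag {pointer_tag:#x} != memory tag {memory_tag:#x}")
        # Only a validated (or metadata-free) request reaches the translation tables.
        page, offset = divmod(address, PAGE_SIZE)
        return self.page_table[page] * PAGE_SIZE + offset
```

In this sketch a mismatching pointer never touches `page_table`, which mirrors the ordering in the abstract: validation first, translation only afterwards.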
Fast incremental shared constants
This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for fast incremental shared constants. In aspects, a CPU may determine/update shared constant data for a first draw call of a plurality of draw calls. The shared constant data, which may correspond to at least one shader, may be updated based on a draw call update for the first draw call. The CPU may communicate the updated shared constant data for the first draw call to a GPU. The GPU may receive, in at least one register, the updated shared constant data from the CPU and configure the at least one register based on the updated shared constant data corresponding to the draw call update of the first draw call of the plurality of draw calls.
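One way to picture the CPU-to-GPU update path is the sketch below. It assumes that "incremental" means the CPU sends only the constants that changed since the previous draw call; the abstract does not spell this out, and the classes `Cpu` and `Gpu`, the shadow copy, and the dictionary-shaped update are hypothetical.

```python
class Gpu:
    def __init__(self, num_registers: int):
        # Registers holding the shared constant data used by the shader(s).
        self.registers = [0] * num_registers

    def receive(self, update: dict) -> None:
        # Configure only the registers named in the draw call update.
        for index, value in update.items():
            self.registers[index] = value


class Cpu:
    def __init__(self, gpu: Gpu):
        self.gpu = gpu
        # Assumption: the CPU keeps a shadow copy of what the GPU currently holds.
        self.shadow = list(gpu.registers)

    def draw_call(self, constants: list) -> dict:
        # Determine which shared constants differ for this draw call.
        update = {i: v for i, v in enumerate(constants) if self.shadow[i] != v}
        for i, v in update.items():
            self.shadow[i] = v
        # Communicate only the updated shared constant data to the GPU.
        self.gpu.receive(update)
        return update
```

Under these assumptions, a second draw call that changes a single constant results in a one-entry update rather than a full re-upload.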
Method of completing a programmable atomic transaction by ensuring memory locks are cleared
Disclosed is an instruction for a programmable atomic transaction that is executed as the last instruction of the transaction. The instruction terminates the executing thread, waits for all outstanding store operations to finish, clears the programmable atomic lock, and sends a completion response back to the issuing process. Because thread termination is coupled with clearing the lock bit, the thread cannot terminate without clearing the lock, so the programmable atomic lock is guaranteed to be clear when the transaction completes.
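The four effects of the terminating instruction can be modeled in software as a single operation. The sketch below is a simplified, single-threaded illustration; `ProgrammableAtomicUnit`, `begin`, `store`, and `atomic_return` are hypothetical names, and the queue of pending stores stands in for whatever the hardware actually tracks.

```python
class ProgrammableAtomicUnit:
    def __init__(self):
        self.memory = {}            # address -> value
        self.lock_bits = set()      # addresses whose lock bit is set
        self.pending_stores = []    # stores issued but not yet written
        self.responses = []         # (requester, result) completion responses
        self.thread_running = False

    def begin(self, address: int) -> None:
        if address in self.lock_bits:
            raise RuntimeError("address is locked by another transaction")
        self.lock_bits.add(address)
        self.thread_running = True

    def store(self, address: int, value: int) -> None:
        self.pending_stores.append((address, value))

    def atomic_return(self, address: int, requester: str, result: int) -> None:
        # The final instruction bundles all four steps, so none can be skipped.
        self.thread_running = False                    # 1. terminate the thread
        while self.pending_stores:                     # 2. wait for outstanding stores
            addr, value = self.pending_stores.pop(0)
            self.memory[addr] = value
        self.lock_bits.discard(address)                # 3. clear the lock
        self.responses.append((requester, result))     # 4. send the completion response
```

Since the only way to stop the thread in this model is `atomic_return`, there is no code path that ends the transaction while the lock bit is still set.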
Responding to branch misprediction for predicated-loop-terminating branch instruction
A predicated-loop-terminating branch instruction controls, based on whether a loop termination condition is satisfied, whether the processing circuitry should process a further iteration of a predicated loop body or process a following instruction. If at least one unnecessary iteration of the predicated loop body is processed following a mispredicted-non-termination branch misprediction when the loop termination condition is mispredicted as unsatisfied for a given iteration when it should have been satisfied, processing of the at least one unnecessary iteration of the predicated loop body is predicated to suppress an effect of the at least one unnecessary iteration. When the mispredicted-non-termination branch misprediction is detected for the given iteration of the predicated-loop-terminating branch instruction, in response to determining that a flush suppressing condition is satisfied, flushing of the at least one unnecessary iteration of the predicated loop body is suppressed as a response to the mispredicted-non-termination branch misprediction.
STREAMING ENGINE WITH STREAM METADATA SAVING FOR CONTEXT SWITCHING
A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state, discarding stored stream data. A return from interrupt changes a frozen stream to an active state.
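As a rough illustration of nested-loop address generation with saved and restored metadata, consider the sketch below. The class `StreamAddressGenerator`, its `save` and `restore` methods, and the choice of loop indices as the saved metadata are assumptions made for the example; the abstract does not say what the stream metadata contains.

```python
class StreamAddressGenerator:
    def __init__(self, base: int, counts: list, strides: list):
        # counts[0]/strides[0] describe the innermost loop (assumed convention).
        self.base = base
        self.counts = counts
        self.strides = strides
        self.indices = [0] * len(counts)
        self.done = False

    def next_address(self) -> int:
        if self.done:
            raise RuntimeError("stream exhausted")
        address = self.base + sum(i * s for i, s in zip(self.indices, self.strides))
        # Advance the nested loop counters, carrying from inner to outer loops.
        for level in range(len(self.counts)):
            self.indices[level] += 1
            if self.indices[level] < self.counts[level]:
                break
            self.indices[level] = 0
        else:
            self.done = True  # the outermost loop wrapped around
        return address

    def save(self) -> tuple:
        # Stream metadata to keep across a context switch (assumed contents).
        return (tuple(self.indices), self.done)

    def restore(self, metadata: tuple) -> None:
        indices, done = metadata
        self.indices = list(indices)
        self.done = done
```

Only the loop position is saved here, not the data already fetched, which is consistent with the abstract's note that stored stream data is discarded when a stream is frozen.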
Monolithic vector processor configured to operate on variable length vectors using a vector length register
A computer processor comprising a vector unit is disclosed. The vector unit may comprise a vector register file comprising at least one register to hold a varying number of elements. The vector unit may further comprise a vector length register file comprising at least one register to specify the number of operations of a vector instruction to be performed on the varying number of elements in the at least one register of the vector register file. The computer processor may be implemented as a monolithic integrated circuit.
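The role of the vector length register can be illustrated with a small software model. The class `VectorUnit`, the register counts, the 8-element capacity, and the `vadd` operation are hypothetical; the abstract specifies only that a vector length register gives the number of operations a vector instruction performs.

```python
class VectorUnit:
    MAX_ELEMENTS = 8  # assumption: each vector register holds up to 8 elements

    def __init__(self):
        # Vector register file: 4 registers (assumed size).
        self.vregs = [[0] * self.MAX_ELEMENTS for _ in range(4)]
        # Vector length register file: 2 registers (assumed size).
        self.vlen = [0] * 2

    def vadd(self, dst: int, a: int, b: int, vl_reg: int) -> None:
        # The selected vector length register decides how many element
        # operations this instruction performs; the rest of dst is untouched.
        n = self.vlen[vl_reg]
        for i in range(n):
            self.vregs[dst][i] = self.vregs[a][i] + self.vregs[b][i]
```

For example, with the length register set to 3, `vadd` writes only the first three elements of the destination register.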
Data processing device and method for processing an interrupt
A data processing device is described including one or more processors implementing a plurality of data processing entities, one or more software interrupt nodes and an access register for each software interrupt node. The access register specifies which one or more data processing entities of the plurality of data processing entities is/are each allowed to, as interrupt source data processing entity, trigger an interrupt service request on the software interrupt node for another one of the plurality of data processing entities as an interrupt target processing entity. Each software interrupt node is configured to forward an interrupt service request triggered by an interrupt source data processing entity which is allowed to trigger an interrupt service request on the software interrupt node to an interrupt target processing entity.
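A minimal sketch of the access check might look like the following. It assumes the access register is a bitmask in which bit i grants entity i permission to trigger the node; that encoding, and the names `SoftwareInterruptNode`, `trigger`, and `delivered`, are illustrative choices and do not come from the abstract.

```python
class SoftwareInterruptNode:
    def __init__(self, target: int, access_register: int):
        self.target = target                    # interrupt target processing entity
        self.access_register = access_register  # assumption: bit i set => entity i may trigger
        self.delivered = []                     # forwarded requests as (source, target)

    def trigger(self, source: int) -> bool:
        # Reject sources whose bit is not set in the access register.
        if not (self.access_register >> source) & 1:
            return False
        # Forward the interrupt service request to the target entity.
        self.delivered.append((source, self.target))
        return True
```

With `access_register=0b0011`, entities 0 and 1 can raise an interrupt on the node for the target, while a request from entity 3 is dropped.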
Bit width reconfiguration using a shadow-latch configured register file
A processor includes a front-end with an instruction set that operates at a first bit width and a floating point unit coupled to receive the instruction set in the processor that operates at the first bit width. The floating point unit operates at a second bit width and, based upon a bit width assessment of the instruction set provided to the floating point unit, the floating point unit employs a shadow-latch configured floating point register file to perform bit width reconfiguration. The shadow-latch configured floating point register file includes a plurality of regular latches and a plurality of shadow latches for storing data that is to be either read from or written to the shadow latches. The bit width reconfiguration enables the floating point unit that operates at the second bit width to operate on the instruction set received at the first bit width.
Virtual trusted platform modules
In some examples, a storage medium stores a plurality of information elements that relate to corresponding virtual trusted platform module (TPM) interfaces, where each respective information element of the plurality of information elements corresponds to a respective virtual machine (VM). A controller provides virtual TPMs for respective security operations. A processor resource executes the VMs to use the information elements to access the corresponding virtual TPM interfaces to invoke the security operations of the virtual TPMs, where a first VM is to access a first virtual TPM interface of the virtual TPM interfaces to request that a security operation of a respective virtual TPM be performed.