Patent classifications
G06F9/3856
Tile Assignment to Processing Cores Within a Graphics Processing Unit
A graphics processing unit configured to process graphics data using a rendering space which is sub-divided into a plurality of tiles, the graphics processing unit comprising: a plurality of processing cores configured to render graphics data; cost indication logic configured to obtain a cost indication for each of a plurality of sets of one or more tiles of the rendering space, wherein the cost indication for a set of one or more tiles is suggestive of a cost of processing the set of one or more tiles; similarity indication logic configured to obtain similarity indications between sets of one or more tiles of the rendering space, wherein the similarity indication between two sets of one or more tiles is indicative of a level of similarity between the two sets of tiles according to at least one processing metric; and scheduling logic configured to assign the sets of one or more tiles to the processing cores for rendering in dependence on the cost indications and the similarity indications.
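The scheduling idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how cost and similarity are combined, so the greedy heuristic below, the function name `assign_tiles`, and the `similarity_weight` parameter are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence


def assign_tiles(
    costs: Sequence[float],
    similarity: Callable[[int, int], float],
    num_cores: int,
    similarity_weight: float = 1.0,
) -> List[List[int]]:
    """Greedily assign tile sets (indexed 0..n-1) to cores.

    Hypothetical heuristic: place expensive tile sets first, and for each
    one pick the core with the lowest score, where the score is the core's
    current load minus a bonus for holding a similar tile set.
    """
    loads = [0.0] * num_cores
    assignment: List[List[int]] = [[] for _ in range(num_cores)]

    # Most expensive tile sets first; ties keep their original order.
    order = sorted(range(len(costs)), key=lambda t: -costs[t])

    for tile in order:
        def score(core: int) -> float:
            # Best similarity between this tile set and any already on the core.
            best_sim = max(
                (similarity(tile, other) for other in assignment[core]),
                default=0.0,
            )
            return loads[core] - similarity_weight * best_sim

        core = min(range(num_cores), key=score)
        assignment[core].append(tile)
        loads[core] += costs[tile]

    return assignment
```

With `similarity_weight` set to zero this reduces to plain cost-based load balancing; raising it trades balance for keeping similar tile sets on the same core.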
ACCELERATION OF OPERATIONS
Apparatuses, systems, and techniques to reduce a sequence of operations to an equivalent sequence having a smaller number of operations. In at least one embodiment, a sequence of matrix operations is accelerated by combining operations that reorder a matrix with a matrix multiplication operation.
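As a rough illustration of combining a reordering operation with a multiplication, the sketch below folds a transpose into the multiply so that the transposed matrix is never materialized. The choice of transpose as the reordering operation and all function names are assumptions; the abstract does not name specific operations.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Materialize the transpose of m as a new matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def matmul_transposed_b(a: Matrix, b: Matrix) -> Matrix:
    """Compute a @ transpose(b) in one pass.

    Rows of b are read directly as the columns of its transpose, so the
    two-operation sequence (transpose, then multiply) becomes one operation.
    """
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]
```

Both paths give the same result; the fused version simply skips the intermediate matrix.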
METHOD AND APPARATUS FOR IMPLIED BIT HANDLING IN FLOATING POINT MULTIPLICATION
A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
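One way such overlap could work, sketched under assumptions: the product of the stored fraction bits does not depend on the implied bits, so it can be started first and the implied-bit contribution added as a correction. The abstract does not describe the mechanism, so this decomposition, the binary32 field width, and the function names are illustrative only. Python runs the steps sequentially; in hardware the two parts could proceed side by side.

```python
FRACTION_BITS = 23  # fraction width of IEEE 754 binary32 (assumed format)


def implied_bit(exponent_field: int) -> int:
    """Leading bit is 0 for a zero exponent field (zero/subnormal), else 1."""
    return 0 if exponent_field == 0 else 1


def significand_product(exp_a: int, frac_a: int, exp_b: int, frac_b: int) -> int:
    """Multiply significands without waiting on the implied bits.

    (ia*2^n + fa) * (ib*2^n + fb)
        = fa*fb + (ia*fb + ib*fa)*2^n + ia*ib*2^(2n)
    """
    # Step that needs only the encoded mantissas.
    partial = frac_a * frac_b

    # Step that needs only the exponent fields; independent of `partial`.
    ia = implied_bit(exp_a)
    ib = implied_bit(exp_b)

    # Correction terms added once both are available.
    correction = ((ia * frac_b + ib * frac_a) << FRACTION_BITS) + (
        (ia & ib) << (2 * FRACTION_BITS)
    )
    return partial + correction


def reference_product(exp_a: int, frac_a: int, exp_b: int, frac_b: int) -> int:
    """Straightforward version: decode the full significands, then multiply."""
    sig_a = (implied_bit(exp_a) << FRACTION_BITS) | frac_a
    sig_b = (implied_bit(exp_b) << FRACTION_BITS) | frac_b
    return sig_a * sig_b
```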
System and Method for Efficient Queue Management
A method, computer program product, and computing system for defining a queue. The queue may be based on a linked list and may be a first-in, first-out (FIFO) queue that may be configured to be used with multiple producers and a single consumer. The queue may include a plurality of queue elements. A tail element and a head element may be defined from the plurality of elements within the queue. The tail element may point to a last element of the plurality of elements and the head element may point to a first element of the plurality of elements. An element may be dequeued from the tail element, which may include determining if the tail element is in a null state. An element may be enqueued to the head element, which may include adding a new element to the queue.
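A linked-list FIFO for multiple producers and one consumer might look like the sketch below. It is not the patented design: it uses a lock to serialize producers where a real implementation would likely use atomic operations, it uses the neutral names `_consume_end` and `_produce_end` instead of the abstract's head and tail, and the class name `MPSCQueue` is hypothetical.

```python
import threading
from typing import Any, Optional


class _Node:
    __slots__ = ("value", "next")

    def __init__(self, value: Any = None) -> None:
        self.value = value
        self.next: Optional["_Node"] = None


class MPSCQueue:
    """Linked-list FIFO: many producers may enqueue, one consumer dequeues."""

    def __init__(self) -> None:
        sentinel = _Node()  # dummy node so the list is never structurally empty
        self._consume_end = sentinel  # node just before the oldest element
        self._produce_end = sentinel  # most recently added node
        self._lock = threading.Lock()  # serializes producers only

    def enqueue(self, value: Any) -> None:
        node = _Node(value)
        with self._lock:
            self._produce_end.next = node
            self._produce_end = node

    def dequeue(self) -> Optional[Any]:
        """Return the oldest value, or None if the queue is empty.

        Must be called from a single consumer thread. The null check on
        `next` plays the role of the abstract's null-state test.
        """
        first = self._consume_end.next
        if first is None:
            return None
        self._consume_end = first  # the dequeued node becomes the new sentinel
        value = first.value
        first.value = None  # drop the reference held by the sentinel
        return value
```

Because `None` signals an empty queue here, callers of this sketch should not enqueue `None` as a value.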
ASSIGNMENT OF MICROPROCESSOR REGISTER TAGS AT ISSUE TIME
Provided is a method for assigning register tags to instructions at issue time. The method comprises receiving an instruction for execution by a microprocessor. The method further comprises dispatching the instruction to an issue queue without assigning a register tag to the instruction. The method further comprises determining that the instruction is ready to issue. In response to determining that the instruction is ready to issue, the method comprises assigning an available register tag to the instruction. The method further comprises issuing the instruction.
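The steps in this abstract can be modeled in a few lines. This is a toy software model under assumptions, not the hardware design: the free list of tags, the `release` method, and the names `Instruction` and `IssueQueue` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Instruction:
    name: str
    ready: bool = False
    tag: Optional[int] = None  # stays None until issue time


class IssueQueue:
    def __init__(self, num_tags: int) -> None:
        self.free_tags: Deque[int] = deque(range(num_tags))
        self.entries: List[Instruction] = []

    def dispatch(self, instr: Instruction) -> None:
        """Place the instruction in the queue without assigning a tag."""
        self.entries.append(instr)

    def issue_ready(self) -> List[Instruction]:
        """Issue ready instructions, assigning each a tag only now.

        A ready instruction stays queued if no tag is currently available.
        """
        issued: List[Instruction] = []
        remaining: List[Instruction] = []
        for instr in self.entries:
            if instr.ready and self.free_tags:
                instr.tag = self.free_tags.popleft()
                issued.append(instr)
            else:
                remaining.append(instr)
        self.entries = remaining
        return issued

    def release(self, tag: int) -> None:
        """Return a tag to the free list (assumed to happen at completion)."""
        self.free_tags.append(tag)
```

Deferring the assignment means instructions waiting in the queue hold no tags, so a small tag pool is shared only among instructions that are actually issuing.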
PROXY SYSTEM CONFIGURED TO IMPROVE MESSAGE-TO-EXECUTION RATIO OF DISTRIBUTED SYSTEM
A proxy system receives an order instruction that combines multiple components configured for execution at a distributed system of subsystems. The proxy system predicts probable values and probable quantities for assets at which the proxy system is allowed to cause execution of the multiple components at the one or more subsystems. The proxy system creates a virtual proxy instruction that is a proxy for the order instruction and detects optimal conditions of the distributed system, and sends messages to cause execution of the multiple components at the subsystems. The indirect execution through the virtual proxy instruction reduces a message-to-execution ratio compared to direct execution of the multiple components based on the order instruction.
Execution control of a multi-threaded, self-scheduling reconfigurable computing fabric
Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array. A representative configurable circuit includes a configurable computation circuit and a configuration memory having a first, instruction memory storing a plurality of data path configuration instructions to configure a data path of the configurable computation circuit; and a second, instruction and instruction index memory storing a plurality of spoke instructions and data path configuration instruction indices for selection of a master synchronous input, a current data path configuration instruction, and a next data path configuration instruction for a next configurable computation circuit.
Pausing execution of a first machine code instruction with injection of a second machine code instruction in a processor
Aspects of the present disclosure provide a processor having: an execution unit configured to execute machine code instructions, at least one of the machine code instructions requiring multiple cycles for its execution; instruction memory holding instructions for execution, wherein the execution unit is configured to access the memory to fetch instructions for execution; an instruction injection mechanism configured to inject an instruction into the execution pipeline during execution of the at least one machine code instruction fetched from the memory; the execution unit configured to pause execution of the at least one machine code instruction, to execute the injected instruction to termination, to detect termination of the injected instruction and to automatically recommence execution of the at least one machine code instruction on detection of termination of the injected instruction.
Techniques for instruction perturbation for improved device security
Methods, systems, and devices for techniques for instruction perturbation for improved device security are described. A device may assign a set of executable instructions to an instruction packet based on a parameter associated with the instruction packet, and each executable instruction of the set of executable instructions may be independent from other executable instructions of the set of executable instructions. The device may select an order of the set of executable instructions based on a slot instruction rule associated with the device, and each executable instruction of the set of executable instructions may correspond to a respective slot associated with memory of the device. The device may modify the order of the set of executable instructions in a memory hierarchy post pre-decode based on the slot instruction rule and process the set of executable instructions of the instruction packet based on the modified order.
Processor dependency-aware instruction execution
In an approach to processor dependency-aware instruction execution, responsive to a new instruction being issued to an instruction issue queue in a processor, a future dependency count is incremented for each instruction of a plurality of instructions in the instruction issue queue that has a dependency on the new instruction. The plurality of instructions in the instruction issue queue are prioritized based on the future dependency count. The highest priority instruction of the plurality of instructions in the instruction issue queue is issued.
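The prioritization described here can be sketched as follows. The sketch reads the abstract as counting, for each queued instruction, how many later instructions consume its result; that reading, the register-name model of dependencies, the simplified readiness check, and the names `Entry` and `DependencyAwareQueue` are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Entry:
    name: str
    dest: str  # register written by this instruction
    sources: Set[str]  # registers read by this instruction
    future_dependency_count: int = 0


class DependencyAwareQueue:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def add(self, name: str, dest: str, sources: Iterable[str]) -> None:
        """Insert a new instruction and bump counts of the entries it depends on."""
        new = Entry(name, dest, set(sources))
        for entry in self.entries:
            if entry.dest in new.sources:
                entry.future_dependency_count += 1
        self.entries.append(new)

    def issue(self) -> Optional[str]:
        """Issue the ready entry with the highest future dependency count.

        An entry counts as ready when no older queued entry writes one of
        its sources. Ties go to the oldest entry.
        """
        ready: List[Entry] = []
        for i, entry in enumerate(self.entries):
            older_dests = {older.dest for older in self.entries[:i]}
            if not (entry.sources & older_dests):
                ready.append(entry)
        if not ready:
            return None
        best = max(ready, key=lambda e: e.future_dependency_count)
        self.entries.remove(best)
        return best.name
```

For example, if `b` writes a register that two later instructions read, `b` is issued ahead of an older independent instruction `a`, since issuing it unblocks more work.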