Patent classifications
G06F2009/3883
LOW POWER AND LOW LATENCY GPU COPROCESSOR FOR PERSISTENT COMPUTING
Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a single instruction, multiple data (SIMD) unit that can self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor also includes an inter-lane crossbar and an intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files: the first is a larger register file with one read port and one write port, and the second is a smaller register file with multiple read ports and one write port. The second VGPR file makes it possible to co-issue more than one instruction per clock cycle.
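The self-scheduling behaviour described above, where a sub-task runs only in response to a message arriving in the queue, can be modelled in software. The following is a minimal Python sketch under stated assumptions: the `Message` and `CoprocessorModel` names, the message `kind` field, and the table mapping message kinds to sub-task handlers are hypothetical and do not come from the patent; the SIMD unit and the split VGPR file are not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Message:
    kind: str      # hypothetical field selecting which sub-task to run
    payload: int   # hypothetical input data for the sub-task


class CoprocessorModel:
    """Toy event-driven coprocessor: a sub-task is scheduled only when
    a message is found in the queue."""

    def __init__(self, handlers: Dict[str, Callable[[int], int]]) -> None:
        self.queue: Deque[Message] = deque()
        self.handlers = handlers          # hypothetical kind -> sub-task table
        self.results: List[int] = []

    def host_send(self, msg: Message) -> None:
        # The host processor enqueues a message targeting the coprocessor.
        self.queue.append(msg)

    def step(self) -> bool:
        # Stay idle (return False) while no message is waiting.
        if not self.queue:
            return False
        msg = self.queue.popleft()
        # Detecting a message triggers scheduling of the matching sub-task.
        self.results.append(self.handlers[msg.kind](msg.payload))
        return True


cop = CoprocessorModel({"square": lambda x: x * x, "negate": lambda x: -x})
cop.host_send(Message("square", 3))
cop.host_send(Message("negate", 5))
while cop.step():
    pass
print(cop.results)  # [9, -5]
```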
Framework to provide time bound execution of co-processor commands
When a main processor issues a command to a co-processor, a timeout value is included in the command. As the co-processor attempts to execute the command, it is determined whether the attempt is taking longer than the timeout value permits. If the timeout is exceeded, responsive action is taken, such as generating a command-timeout failure message. The command with its timeout value may be received, and the resulting timeout condition detected, either by the co-processor that receives the command or by a watchdog timer that is separate from the co-processor. Also described is the detection of a co-processor hang, or of a hung co-processor, while the co-processor is executing a command for the main processor.
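The timeout mechanism can be sketched on the host side with Python's `concurrent.futures`, where the bounded wait in `Future.result(timeout=...)` plays the part of the watchdog timer. This is an illustration under assumptions, not the patented framework: the `Command` type, the `issue` helper, the two jobs, and the message strings are hypothetical, and a worker thread stands in for the co-processor.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Command:
    name: str
    work: Callable[[], str]  # hypothetical stand-in for the co-processor's job
    timeout_s: float         # timeout value carried inside the command


def issue(pool: concurrent.futures.Executor, cmd: Command) -> str:
    """Run `cmd` on the pool; the bounded wait acts as the watchdog."""
    future = pool.submit(cmd.work)
    try:
        return f"OK {cmd.name}: {future.result(timeout=cmd.timeout_s)}"
    except concurrent.futures.TimeoutError:
        # Timeout exceeded: report a command-timeout failure message.
        # Python cannot forcibly stop the worker thread, so the overrunning
        # job keeps running in the background.
        return f"FAIL {cmd.name}: command timeout"


def quick_job() -> str:
    return "done"


def hung_job() -> str:
    time.sleep(0.5)  # simulates a co-processor that overruns its budget
    return "late"


# Leaving the `with` block waits for the overrunning job to finish.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    fast = issue(pool, Command("fast", quick_job, timeout_s=1.0))
    slow = issue(pool, Command("slow", hung_job, timeout_s=0.05))

print(fast)  # OK fast: done
print(slow)  # FAIL slow: command timeout
```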
Method and apparatus for asynchronous processor pipeline and bypass passing
A clock-less asynchronous processor comprising a plurality of parallel asynchronous processing logic circuits, each processing logic circuit configured to generate an instruction execution result. The processor comprises an asynchronous instruction dispatch unit coupled to each processing logic circuit, the instruction dispatch unit configured to receive multiple instructions from memory and dispatch individual instructions to each of the processing logic circuits. The processor comprises a crossbar coupled to an output of each processing logic circuit and to the dispatch unit, the crossbar configured to store the instruction execution results.
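The dispatch-and-crossbar arrangement can be imitated in software to show how results move between units without a global clock. This is a minimal Python `asyncio` sketch, not the patented circuit: the three-operand instruction format, the `Crossbar`, `logic_unit` and `dispatch` names, the round-robin assignment, and the per-unit latencies are all assumptions. Each instruction runs as its own task and the unit index only sets its simulated latency. A later instruction waits on an event for its operand in the crossbar, a rough analogue of bypass passing, rather than on a clock edge.

```python
import asyncio
from typing import Dict, List, Tuple, Union

# Hypothetical instruction format: (destination, opcode, src_a, src_b).
# A source is either an int literal or the name of an earlier result.
Operand = Union[int, str]
Instr = Tuple[str, str, Operand, Operand]

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


class Crossbar:
    """Holds each unit's execution result so other units can pick it up."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.ready: Dict[str, asyncio.Event] = {}

    def _event(self, key: str) -> asyncio.Event:
        if key not in self.ready:
            self.ready[key] = asyncio.Event()
        return self.ready[key]

    def write(self, key: str, value: int) -> None:
        self.values[key] = value
        self._event(key).set()  # wake any unit waiting on this result

    async def read(self, src: Operand) -> int:
        if isinstance(src, int):
            return src
        await self._event(src).wait()  # wait for the producer, not a clock edge
        return self.values[src]


async def logic_unit(unit_id: int, instr: Instr, xbar: Crossbar) -> None:
    dest, op, a, b = instr
    x = await xbar.read(a)
    y = await xbar.read(b)
    await asyncio.sleep(0.001 * (unit_id + 1))  # simulated unit-specific latency
    xbar.write(dest, OPS[op](x, y))


async def dispatch(program: List[Instr], n_units: int) -> Dict[str, int]:
    xbar = Crossbar()
    # Round-robin dispatch: instruction i goes to unit i % n_units.
    tasks = [asyncio.create_task(logic_unit(i % n_units, ins, xbar))
             for i, ins in enumerate(program)]
    await asyncio.gather(*tasks)
    return dict(xbar.values)


program: List[Instr] = [("r1", "add", 2, 3), ("r2", "mul", "r1", 4),
                        ("r3", "add", "r2", "r1")]
result = asyncio.run(dispatch(program, n_units=2))
print(result)  # {'r1': 5, 'r2': 20, 'r3': 25}
```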
Method and apparatus for asynchronous processor removal of meta-stability
A clock-less asynchronous processing circuit or system having a plurality of pipelined processing stages utilizes self-clocked generators to tune the delay needed in each of the processing stages to complete the processing cycle. Because different processing stages may require different amounts of time to complete processing or may require different delays depending on the processing required in a particular stage, the self-clocked generators may be tuned to each stage's necessary delay(s) or may be programmably configured.
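Because each self-clocked generator is tuned to its own stage, one instruction's latency through the pipeline is roughly the sum of the per-stage delays, rather than the number of stages times the slowest stage's delay as it would be under a single global clock. The sketch below makes that arithmetic concrete under idealized assumptions: the stage names and picosecond delays are hypothetical, and handshake overhead between stages is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    delay_ps: int  # hypothetical self-timed completion delay for this stage


def self_timed_latency(stages: List[Stage]) -> int:
    # Each stage's generator fires after that stage's own delay, so one
    # instruction's latency is the sum of the individual delays.
    return sum(s.delay_ps for s in stages)


def clocked_latency(stages: List[Stage]) -> int:
    # A global clock period must cover the slowest stage, in every stage.
    return len(stages) * max(s.delay_ps for s in stages)


def reprogram(stages: List[Stage], name: str, delay_ps: int) -> List[Stage]:
    # Programmable configuration: replace one stage's generator delay.
    return [Stage(s.name, delay_ps) if s.name == name else s for s in stages]


pipeline = [Stage("fetch", 800), Stage("decode", 500),
            Stage("execute", 1200), Stage("writeback", 400)]
print(self_timed_latency(pipeline), clocked_latency(pipeline))  # 2900 4800
tuned = reprogram(pipeline, "execute", 900)
print(self_timed_latency(tuned), clocked_latency(tuned))  # 2600 3600
```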
Method and apparatus for asynchronous processor based on clock delay adjustment
A clock-less asynchronous processing circuit or system uses a self-clocked generator to adjust the processing delay (latency) required or allowed for the processing cycle in the circuit or system. The timing of the self-clocked generator is dynamically adjustable depending on various parameters, which may include the instruction being processed, opcode information, the type of processing to be performed by the circuit or system, or the overall desired processing performance. The latency may also be adjusted to change processing performance characteristics such as power consumption and speed.
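One way to picture a dynamically adjustable self-clocked generator is as a lookup from opcode to a base delay, scaled by a performance target. The Python sketch below assumes a hypothetical delay table (`BASE_DELAY_PS`) and hypothetical mode names and scale factors (`MODE_SCALE`); the abstract does not specify how its parameters map to delays.

```python
from typing import Dict, List

# Hypothetical minimum safe delays per opcode, in picoseconds.
BASE_DELAY_PS: Dict[str, int] = {"add": 300, "mul": 900, "load": 1200}

# Hypothetical performance targets: a scale factor applied to the base delay.
MODE_SCALE: Dict[str, float] = {"performance": 1.0, "balanced": 1.25, "relaxed": 1.5}


def generator_delay(opcode: str, mode: str) -> int:
    """Delay the self-clocked generator would use for one instruction."""
    return round(BASE_DELAY_PS[opcode] * MODE_SCALE[mode])


def program_time(opcodes: List[str], mode: str) -> int:
    # Total time for a sequence, each instruction getting its own delay.
    return sum(generator_delay(op, mode) for op in opcodes)


print(generator_delay("mul", "relaxed"))                 # 1350
print(program_time(["load", "add", "mul"], "balanced"))  # 3000
```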
Buffer checker for task processing fault detection
A graphics processing system with a data store includes processing units for processing tasks. A check unit forms a signature that is characteristic of an output from processing a task on a processing unit, and a fault detection unit compares signatures formed at the check unit. Each task is processed a first and a second time at the processing units to generate first and second processed outputs. The graphics processing system writes out the first processed output to the data store, reads it back from the data store, and forms at the check unit a first signature characteristic of the first processed output as read back. It also forms at the check unit a second signature characteristic of the second processed output. The fault detection unit then compares the first and second signatures and raises a fault signal if they do not match.
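The write-out, read-back, and compare sequence can be modelled in a few lines. This is a sketch under assumptions: CRC-32 (`zlib.crc32`) stands in for the unspecified signature, a Python dict stands in for the data store, and `double_bytes` is a hypothetical task. Flipping one bit in the stored copy shows how signing the data as read back exposes a fault in the write/read path that comparing the two in-flight outputs alone would miss.

```python
import zlib
from typing import Callable, Dict


class FaultDetected(Exception):
    """Raised when the two signatures disagree (the 'fault signal')."""


def signature(data: bytes) -> int:
    # Assumption: CRC-32 as the signature; the abstract does not name one.
    return zlib.crc32(data)


def run_checked(task: bytes, process: Callable[[bytes], bytes],
                store: Dict[str, bytearray], inject_fault: bool = False) -> bytes:
    first = process(task)    # first processing of the task
    second = process(task)   # second processing of the same task
    store["out"] = bytearray(first)   # write out the first output
    if inject_fault:
        store["out"][0] ^= 0x01       # simulated memory/bus error
    read_back = bytes(store["out"])   # read back from the data store
    if signature(read_back) != signature(second):
        raise FaultDetected("signature mismatch")
    return read_back


def double_bytes(task: bytes) -> bytes:
    # Hypothetical task: double each byte modulo 256.
    return bytes((b * 2) % 256 for b in task)


memory: Dict[str, bytearray] = {}
print(run_checked(b"\x01\x02\x03", double_bytes, memory))  # b'\x02\x04\x06'
try:
    run_checked(b"\x01\x02\x03", double_bytes, memory, inject_fault=True)
except FaultDetected as exc:
    print("fault:", exc)  # fault: signature mismatch
```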
FLEXIBLE VECTOR-PROCESSING ALGORITHMS FOR NUMERICALLY SOLVING EXTREME-SCALE, LINEAR AND NON-LINEAR, PREDICTIVE AND PRESCRIPTIVE, PROBLEMS IN SCIENCE AND ENGINEERING, ON PARALLEL-PROCESSING SUPER COMPUTERS
A computer-implemented method for numerical solution of a geometric programming problem is described, including the computer-implemented steps of: reformulating the geometric programming problem as an equivalent generalized geometric programming optimization problem with only linear constraints, and solving the equivalent generalized geometric programming optimization problem by vector processing, including determining by computer-implemented numerical computation a solution for an unconstrained objective function whose independent vector variable is the generalized geometric programming conjugate dual of a primal decision vector variable of the geometric programming problem, and includes a variable linear combination of fixed vectors enabling the vector processing. Also described are computer-readable storage devices, computer program products, and computer systems for such numerical solution methodology.
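The idea of writing the dual variable as a fixed vector plus a variable linear combination of fixed vectors, so that the dual can be searched without equality constraints, can be illustrated with the classic Duffin-Peterson-Zener dual of a posynomial geometric program. This is not the patent's generalized formulation or its vector-processing solver; the one-variable problem, the vectors `B0` and `B1`, and the golden-section search are assumptions chosen for illustration. Positivity of the dual variables still confines the free parameter `r` to an interval, and the primal is minimized directly as a cross-check.

```python
import math
from typing import Callable, List

# Example problem (assumed for illustration): minimize f(t) = 1/t + t + t**2, t > 0.
COEFFS = [1.0, 1.0, 1.0]
EXPONENTS = [-1.0, 1.0, 2.0]

# Dual feasibility: sum(delta) == 1 and sum(a_i * delta_i) == 0.
# Every solution has the form delta(r) = B0 + r * B1 with fixed vectors.
B0 = [0.5, 0.5, 0.0]   # a particular solution of both constraints
B1 = [1.0, -3.0, 2.0]  # spans the null space of the constraint matrix


def delta(r: float) -> List[float]:
    return [b0 + r * b1 for b0, b1 in zip(B0, B1)]


def log_dual(r: float) -> float:
    # log v(delta) = sum_i delta_i * ln(c_i / delta_i); concave in r.
    return sum(d * math.log(c / d) for c, d in zip(COEFFS, delta(r)))


def golden_max(f: Callable[[float], float], lo: float, hi: float,
               iters: int = 100) -> float:
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0


def primal(t: float) -> float:
    return sum(c * t ** a for c, a in zip(COEFFS, EXPONENTS))


# delta stays positive for 0 < r < 1/6; search strictly inside that interval.
eps = 1e-12
r_star = golden_max(log_dual, eps, 1.0 / 6.0 - eps)
dual_value = math.exp(log_dual(r_star))

# Cross-check by minimizing the primal directly.
t_star = golden_max(lambda t: -primal(t), 0.1, 2.0)
primal_value = primal(t_star)

print(round(dual_value, 6), round(primal_value, 6))
```

With one degree of difficulty, the dual search is one-dimensional; a problem with more terms would add more null-space vectors and a multi-dimensional unconstrained search over their coefficients.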