Patent classifications
G06F9/30076
Control flow mechanism for execution of graphics processor instructions using active channel packing
An apparatus to facilitate control flow in a graphics processing system is disclosed. The apparatus includes logic a plurality of execution units to execute single instruction, multiple data (SIMD) and flow control logic to detect a diverging control flow in a plurality of SIMD channels and reduce the execution of the control flow to a subset of the SIMD channels.
Compiler-assisted inter-SIMD-group register sharing
Systems, apparatuses, and methods for efficiently sharing registers among threads are disclosed. A system includes at least a processor, control logic, and a register file with a plurality of registers. The processor assigns a base set of registers to each thread of a plurality of threads executing on the processor. When a given thread needs more than the base set of registers to execute a given phase of program code, the given thread executes an acquire instruction to acquire exclusive access to an extended set of registers from a shared resource pool. When the given thread no longer needs additional registers, the given thread executes a release instruction to release the extended set of registers back into the shared register pool for other threads to use. In one implementation, the compiler inserts acquire and release instructions into the program code based on a register liveness analysis performed during compilation.
NEURAL NETWORK PROCESSING ASSIST INSTRUCTION
A first processor processes an instruction configured to perform a plurality of functions. The plurality of functions includes one or more functions to operate on one or more tensors. A determination is made of a function of the plurality of functions to be performed. The first processor provides to a second processor information related to the function. The second processor is to perform the function. The first processor and the second processor share memory providing memory coherence.
PROCESSING-IN-MEMORY (PIM) SYSTEM AND OPERATING METHODS OF THE PIM SYSTEM
A processing-in-memory (PIM) system includes a host configured to generate a first request for a memory access operation and a second request for an arithmetic operation, a PIM controller configured to generate a first command based on the first request or the second request, a high speed interface configured to generate a second command based on the second request, and a PIM device configured to perform the memory access operation in response to the first command from the PIM controller and to perform the arithmetic operation in response to the second command from the high speed interface.
INSTRUCTION TO QUERY FOR MODEL-DEPENDENT INFORMATION
An instruction is executed to perform a query function. The executing includes obtaining information relating to a selected model of a processor. The information includes at least one model-dependent data attribute of the selected model of the processor. The information is placed in a selected location for use by at least one application in performing one or more functions.
Executing multiple programs simultaneously on a processor core
Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
GRAPHICS PROCESSING
There is disclosed an instruction that can be included into a graphics processor shader program to be executed by a group of execution threads that when executed will cause a group of execution lanes to be in an ‘active’ (e.g. SIMD) execution state in which active state processing operations can be performed using the group of plural execution lanes together. The processing operations can then be performed using the execution lanes in the active state together. The execution lanes are then allowed or caused to return to their prior execution state once the processing operations have finished.
OPERATING SYSTEM DEACTIVATION OF STORAGE BLOCK WRITE PROTECTION ABSENT QUIESCING OF PROCESSORS
Operating system deactivation of write protection for a storage block is provided absent quiescing of processors in a multi-processor computing environment. The process includes receiving an address translation protection exception interrupt resulting from an attempted write access by a processor to a storage block, and determining by the operating system whether write protection for the storage block is active. Based on write protection for the storage block not being active, the operating system issues an instruction to clear or modify translation lookaside buffer entries of the processor associated with the storage block, absent waiting for an action by another processor of multiple processors of the computing environment, to facilitate write access to the storage block proceeding at the processor.
Non-posted write transactions for a computer bus
Systems and devices can include a controller and a command queue to buffer incoming write requests into the device. The controller can receive, from a client across a link, a non-posted write request (e.g., a deferred memory write (DMWr) request) in a transaction layer packet (TLP) to the command queue; determine that the command queue can accept the DMWr request; identify, from the TLP, a successful completion (SC) message that indicates that the DMWr request was accepted into the command queue; and transmit, to the client across the link, the SC message that indicates that the DMWr request was accepted into the command queue. The controller can receive a second DMWr request in a second TLP; determine that the command queue is full; and transmit a memory request retry status (MRS) message to be transmitted to the client in response to the command queue being full.
Spoofing a processor identification instruction
Embodiments of processors, methods, and systems for a processor core supporting processor identification instruction spoofing are described. In an embodiment, a processor includes an instruction decoder and processor identification instruction spoofing logic. The processor identification spoofing logic is to respond to a processor identification instruction by reporting processor identification information from a processor identification spoofing data structure. The processor identification spoofing data structure is to include processor identification information of one or more other processors.