G06F9/3009

Initialisation of Worker Threads

A processing device comprises at least one execution unit configured to interleave execution of a plurality of worker threads, wherein each worker thread executes the same set of code to perform operations on a different set of data held in an input buffer of a memory of the processing device and outputs its results to an output buffer. An instruction is executed to populate a plurality of operand registers, each associated with one of the worker threads, with one or more variables enabling each worker to determine where its set of input data is located in the input buffer and where to store its results.
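
As a rough host-side illustration of this scheme, the sketch below models each worker's operand registers as a small struct carrying its input and output offsets, so that every thread runs the same function over its own slice of shared buffers. The names WorkerRegs, worker, and the doubling operation are invented for the example and are not from the patent.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

struct WorkerRegs {          // stands in for the per-worker operand registers
    std::size_t in_offset;   // where this worker's input data starts
    std::size_t out_offset;  // where this worker stores its results
    std::size_t count;       // number of elements in this worker's set
};

void worker(const std::vector<int>& input, std::vector<int>& output,
            WorkerRegs regs) {
    // Every worker runs this same code; only the register contents differ.
    for (std::size_t i = 0; i < regs.count; ++i)
        output[regs.out_offset + i] = input[regs.in_offset + i] * 2;
}

int main() {
    constexpr std::size_t kWorkers = 4, kChunk = 8;
    std::vector<int> input(kWorkers * kChunk, 1), output(kWorkers * kChunk);

    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < kWorkers; ++w) {
        // The "instruction" populating the operand registers: each worker is
        // handed offsets identifying its own slice of the shared buffers.
        WorkerRegs regs{w * kChunk, w * kChunk, kChunk};
        threads.emplace_back([&, regs] { worker(input, output, regs); });
    }
    for (auto& t : threads) t.join();
}
```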

Data storage apparatus including swap memory and operating method thereof
11550578 · 2023-01-10

A data storage apparatus includes a storage device, a controller to control data input and output operations of the storage device, and a swap memory provided outside the controller. The controller includes a thread manager that, in response to a request to process a task comprising a first thread and at least one subsequent thread, performs a preparation operation on the first thread, requests the storage device to process the first thread once the preparation operation has been performed, performs the preparation operation on the at least one subsequent thread while the storage device processes the first thread, and stores context data of the first thread and the at least one subsequent thread in the swap memory. The preparation operation includes an address mapping operation.
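
A rough software analogue of the pipeline is sketched below, assuming the preparation operation can be modeled as address mapping and the storage device as an asynchronous job. ThreadCtx, prepare, device_process, and the address arithmetic are illustrative stand-ins, not the patent's design.

```cpp
#include <future>
#include <iostream>
#include <vector>

struct ThreadCtx { int id; int mapped_addr; };
std::vector<ThreadCtx> swap_memory;  // stands in for the external swap memory

ThreadCtx prepare(int id) {          // preparation = address mapping here
    return ThreadCtx{id, 0x1000 + id * 0x100};
}

void device_process(ThreadCtx ctx) { // the storage device handling a thread
    std::cout << "device processing thread " << ctx.id << " at addr 0x"
              << std::hex << ctx.mapped_addr << std::dec << '\n';
}

int main() {
    // Prepare the first thread, save its context, and hand it to the device.
    ThreadCtx first = prepare(0);
    swap_memory.push_back(first);
    auto job = std::async(std::launch::async, device_process, first);

    // While the device works on thread 0, prepare the subsequent threads and
    // store their contexts in the swap memory.
    for (int id = 1; id < 4; ++id)
        swap_memory.push_back(prepare(id));

    job.wait();
}
```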

Mechanism to trigger early termination of cooperating processes

Devices and techniques for triggering early termination of cooperating processes in a processor are described herein. A system includes multiple memory-compute nodes, each comprising event manager circuitry configured to establish a broadcast channel for event messages, and thread manager circuitry configured to organize a plurality of threads to perform portions of a cooperative task. Each thread monitors the broadcast channel for event messages. When the threshold is achieved, the thread manager circuitry uses the event manager circuitry to broadcast an event message on the channel indicating that the cooperative task is complete; in response to receiving the event message, the other threads terminate execution of their respective portions of the task.
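
A user-space sketch of this early-termination pattern follows, modeling the broadcast channel as a shared atomic flag that every cooperating thread polls. The search task and the cooperative_search name are invented for the example.

```cpp
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<bool> task_complete{false};  // stands in for the broadcast channel

void cooperative_search(int id, int begin, int end, int target) {
    for (int v = begin; v < end; ++v) {
        if (task_complete.load(std::memory_order_relaxed))
            return;                      // another thread already finished
        if (v == target) {               // threshold achieved: broadcast
            task_complete.store(true, std::memory_order_relaxed);
            std::cout << "thread " << id << " completed the task\n";
            return;
        }
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int id = 0; id < 4; ++id)
        threads.emplace_back(cooperative_search, id, id * 250, (id + 1) * 250, 612);
    for (auto& t : threads) t.join();    // non-finders exit early
}
```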

Systems, methods, and apparatuses for heterogeneous computing

Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more of a plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more heterogeneous processing elements.
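
An illustrative dispatch loop is sketched below, assuming each code fragment is tagged with the ISA it was compiled for and each processing element advertises its native ISA. The Isa enum and the struct names are invented for the sketch; the patent's scheduler is hardware.

```cpp
#include <iostream>
#include <vector>

enum class Isa { Scalar, Simd, Accelerator };

struct ProcessingElement { int id; Isa native; };
struct CodeFragment     { const char* name; Isa compiled_for; };

// Pick an element for which the fragment's instructions are native.
const ProcessingElement* dispatch(const std::vector<ProcessingElement>& pes,
                                  const CodeFragment& frag) {
    for (const auto& pe : pes)
        if (pe.native == frag.compiled_for) return &pe;
    return nullptr;  // no element can run these instructions natively
}

int main() {
    std::vector<ProcessingElement> pes{{0, Isa::Scalar}, {1, Isa::Simd}};
    CodeFragment frag{"vector_add", Isa::Simd};
    if (const auto* pe = dispatch(pes, frag))
        std::cout << frag.name << " -> PE " << pe->id << '\n';
}
```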

Method of completing a programmable atomic transaction by ensuring memory locks are cleared
11693690 · 2023-07-04

Disclosed is an instruction for a programmable atomic transaction that executes as the transaction's last instruction: it terminates the executing thread, waits for all outstanding store operations to finish, clears the programmable atomic lock, and sends a completion response back to the issuing process. Because thread termination is coupled with clearing the lock bit, the thread cannot terminate without clearing the lock, which guarantees that the lock is cleared when the transaction completes.
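
A software sketch of these terminate-and-unlock semantics is shown below, using an atomic flag for the programmable atomic lock and a release fence to stand in for waiting on outstanding stores. The function names are invented, and std::exit stands in for hardware thread termination.

```cpp
#include <atomic>
#include <cstdlib>
#include <iostream>

std::atomic<bool> pa_lock{true};     // lock held while the transaction runs

void send_completion_response() {    // illustrative stand-in for the response
    std::cout << "completion sent to issuing process\n";
}

// The transaction's last instruction: the only exit path, so termination and
// lock clearing cannot be separated.
[[noreturn]] void terminate_transaction() {
    std::atomic_thread_fence(std::memory_order_release); // stores drained/visible
    pa_lock.store(false, std::memory_order_release);     // clear the lock
    send_completion_response();
    std::exit(0);                                        // terminate the thread
}

int main() { terminate_transaction(); }
```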

GPU networking using an integrated command processor

Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.
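A host-thread analogue of this handoff is sketched below, assuming the cached indication can be modeled as an atomic flag next to a message buffer, and the command processor as a polling thread that forwards to a stub network interface. MessageSlot and nic_send are invented names.

```cpp
#include <atomic>
#include <iostream>
#include <string>
#include <thread>

struct MessageSlot {
    std::string payload;
    std::atomic<bool> ready{false};  // the indication the command processor watches
};

void nic_send(const std::string& msg) {  // stand-in for the network interface unit
    std::cout << "NIC sends: " << msg << '\n';
}

int main() {
    MessageSlot slot;

    // "Kernel thread": builds the network message and posts the indication.
    std::thread kernel([&] {
        slot.payload = "message for remote node";
        slot.ready.store(true, std::memory_order_release);
    });

    // "Command processor": detects the indication and conveys the message to
    // the NIC, with no general purpose processor in the path.
    std::thread cmd_proc([&] {
        while (!slot.ready.load(std::memory_order_acquire)) { /* poll */ }
        nic_send(slot.payload);
    });

    kernel.join();
    cmd_proc.join();
}
```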

Method and system for constructing persistent memory index in non-uniform memory access architecture
20220413952 · 2022-12-29

A method for constructing a persistent memory index in a non-uniform memory access architecture includes maintaining partial persistent views in a persistent memory and a global volatile view in a DRAM. When cold data is accessed, the underlying persistent memory index processes the request in a foreground thread. When hot data is accessed, a query operation carried in the request reads the key-value pair from the global volatile view, while an insert, update, or delete operation updates both a local partial persistent view and the global volatile view. On a hotspot migration, a background thread generates new partial persistent views and a new global volatile view, and recycles the partial persistent views and the global volatile view for the old hot data into the underlying persistent memory index.
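
A toy model of the hot/cold routing is sketched below, greatly simplified: the global volatile view is an in-DRAM hash map in front of a single map standing in for the underlying persistent index, and the partial persistent views, NUMA placement, and background migration thread are elided. All names are invented.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

std::unordered_map<std::string, int> volatile_view;    // global volatile view (DRAM)
std::unordered_map<std::string, int> persistent_index; // underlying PM index stand-in

int get(const std::string& key) {
    auto it = volatile_view.find(key);       // hot data: answered from DRAM
    if (it != volatile_view.end()) return it->second;
    return persistent_index.at(key);         // cold data: underlying index
}

void put(const std::string& key, int value) {
    persistent_index[key] = value;           // persistent side of the update
    auto it = volatile_view.find(key);
    if (it != volatile_view.end()) it->second = value;  // keep hot copy in sync
}

void promote(const std::string& key) {       // simplified hotspot migration
    volatile_view[key] = persistent_index.at(key);
}

int main() {
    put("user:42", 7);                       // cold at first
    promote("user:42");                      // key becomes hot
    put("user:42", 8);                       // update reaches both views
    std::cout << get("user:42") << '\n';     // served from the volatile view
}
```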

Partition migration with critical task prioritization

An embodiment includes issuing an indication that a thread of a program is a time-critical thread. The embodiment initiates an active partition migration, from a source server to a destination server, of the source partition on which the program is stored. During the migration, the embodiment stores records of the locations of memory pages referenced by the time-critical thread. The embodiment detects that a migration threshold has been reached, indicating that a threshold portion of the migration is complete. Responsive to detecting the migration threshold, the embodiment performs a priority migration of the time-critical thread: it suspends execution of the time-critical thread at the source server, retrieves the records of the locations of the pages referenced by the thread, and issues a command to transfer the content of those pages to the destination server. The embodiment also issues a migration command to complete the migration.
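
The prioritization step alone is sketched schematically below, assuming pages can be modeled as integers, record-keeping as a set, and "transfer" as a print. The suspend/resume of the thread and the bulk migration are elided, and all names are invented.

```cpp
#include <iostream>
#include <set>

std::set<int> critical_pages;          // records of pages the thread referenced

void record_reference(int page) { critical_pages.insert(page); }

void priority_migration() {
    // 1. suspend the time-critical thread at the source server (elided)
    // 2. retrieve the recorded page locations and transfer their content first
    for (int page : critical_pages)
        std::cout << "transfer page " << page << " to destination\n";
    // 3. resume the thread on the destination server (elided)
}

int main() {
    record_reference(7);               // logged while the migration runs
    record_reference(42);
    bool migration_threshold_reached = true;  // e.g. most pages already copied
    if (migration_threshold_reached) priority_migration();
}
```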

Compiler-assisted inter-SIMD-group register sharing

Systems, apparatuses, and methods for efficiently sharing registers among threads are disclosed. A system includes at least a processor, control logic, and a register file with a plurality of registers. The processor assigns a base set of registers to each thread of a plurality of threads executing on the processor. When a given thread needs more than the base set of registers to execute a given phase of program code, the given thread executes an acquire instruction to acquire exclusive access to an extended set of registers from a shared register pool. When the given thread no longer needs additional registers, the given thread executes a release instruction to release the extended set of registers back into the shared register pool for other threads to use. In one implementation, the compiler inserts acquire and release instructions into the program code based on a register liveness analysis performed during compilation.
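
A thread-level analogue of the acquire/release pairing is sketched below, modeling the shared register pool as a counter guarded by a mutex. In the patent the compiler inserts the acquire and release around a register-hungry phase; here they are called explicitly, and RegisterPool is an invented name.

```cpp
#include <cstdio>
#include <mutex>

struct RegisterPool {
    int available = 32;              // registers left in the shared pool
    std::mutex m;

    bool acquire(int n) {            // claim an extended set of n registers
        std::lock_guard<std::mutex> lock(m);
        if (available < n) return false;
        available -= n;
        return true;                 // caller now has exclusive access
    }
    void release(int n) {            // return the extended set to the pool
        std::lock_guard<std::mutex> lock(m);
        available += n;
    }
};

int main() {
    RegisterPool pool;
    if (pool.acquire(8)) {           // compiler-inserted before the hot phase
        std::puts("phase runs with 8 extra registers");
        pool.release(8);             // compiler-inserted once liveness ends
    }
}
```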

Scheduling tasks using swap flags

A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task that is in an active state, and checking, by an instruction controller, whether a swap flag is set in the decoded instruction. If the swap flag is set, a scheduler is triggered to de-activate the scheduled task by changing it from the active state to a non-active state.
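
A compact sketch of the decode-and-deschedule path is given below, assuming the swap flag is a single bit in the instruction encoding. The bit position, the Task/TaskState names, and the print are invented for illustration.

```cpp
#include <cstdint>
#include <iostream>

constexpr std::uint32_t kSwapFlagBit = 1u << 31;  // assumed flag position

enum class TaskState { Active, NonActive };

struct Task { int id; TaskState state; };

void scheduler_deactivate(Task& t) {       // the scheduler, once triggered
    t.state = TaskState::NonActive;
    std::cout << "task " << t.id << " moved to non-active state\n";
}

void on_decode(Task& t, std::uint32_t instruction) {
    // Instruction controller: check the swap flag in the decoded instruction.
    if (instruction & kSwapFlagBit)
        scheduler_deactivate(t);
}

int main() {
    Task t{3, TaskState::Active};
    on_decode(t, 0x80000001u);             // swap flag set -> de-activate task
}
```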