G06F2209/521

Compare and swap functionality for key-value and object stores

Embodiments for providing compare and swap (CAS) functionality to key value storage to allow multi-threaded applications to share storage devices and synchronize multiple concurrent threads or processes. A key-value application programming interface (API) is modified to include a CAS API in addition to the standard Put and Get APIs. The CAS function uses a key, expected old value, and new value to compare and swap an existing key value only if its current value equals the expected old value. Hash values of the key value and expected old value may be used by the CAS function to improve performance and reduce bandwidth.

Novel RTOS/OS Architecture for Context Switching Without Disabling Interrupts
20210089481 · 2021-03-25 ·

The present invention is a novel RTOS/OS architecture that changes the fundamental way that context switching is performed. In all prior operating system implementations, context switching required disabling of interrupts. This opens the possibility that data can be lost. This novel approach consists of a context switching method in which interrupts are never disabled. Two implementations are presented. In the first implementation, the cost is a negligible amount of memory. In the second, the cost is only a minimal impact on the context switching time. This RTOS/OS architecture requires specialized hardware. Concretely, an advanced interrupt controller that supports nesting and tail chaining of prioritized interrupts is needed (e.g. the Nested Vectored Interrupt Controller (NVIC) found on many ARM processors). The novel RTOS/OS architecture redefines how task synchronization primitives such as semaphores and mutexes are released. Whereas previous architectures directly accessed internal structures, this architecture does so indirectly by saving information in shared buffers or setting flags, and then activating a low priority software interrupt that subsequently interprets this data and performs all context switching logic. The software interrupt must be set as the single lowest priority interrupt in the system.

Efficiently managing the interruption of user-level critical sections

Techniques for efficiently managing the interruption of user-level critical sections are provided. In certain embodiments, a physical CPU of a computer system can execute a critical section of a user-level thread of an application, where program code for the critical section is marked with CPU instruction(s) indicating that the critical section should be executed atomically. The physical CPU can detect, while executing the critical section, an event to be handled by an OS kernel of the computer system and upon detecting the event, revert changes performed within the critical section. The physical CPU can then invoke a trap handler of the OS kernel, and in response the OS kernel can invoke a user-level handler of the application with information including (1) the identity of the user-level thread, (2) an indication of the event, (3) the physical CPU state upon detecting the event, and (4) an indication that the user-level thread was interrupted while in the critical section.

Techniques for efficiently performing data reductions in parallel processing units

Techniques are disclosed for reducing the latency associated with performing data reductions in a multithreaded processor. In response to a single instruction associated with a set of threads executing in the multithreaded processor, a warp reduction unit acquires register values stored in source registers, where each register value is associated with a different thread included in the set of threads. The warp reduction unit performs operation(s) on the register values to compute an aggregate value. The warp reduction unit stores the aggregate value in a destination register that is accessible to at least one of the threads in the set of threads. Because the data reduction is performed via a single instruction using hardware specialized for data reductions, the number of cycles required to perform the data reduction is decreased relative to prior-art techniques that are performed via multiple instructions using hardware that is not specialized for data reductions.

Thread checkpoint table for computer processor

Examples of techniques for a thread checkpoint table for a computer processor are described herein. An aspect includes, based on detecting an early power-off warning (EPOW) signal, determine, based on a thread checkpoint table, whether a status of a thread of a processor indicates that the thread has begun a unit of atomic work. Another aspect includes, based on determining that the status of the thread of the processor indicates that the thread has begun the unit of atomic work, allowing the thread to continue execution of the unit of atomic work. Another aspect includes determining, based the status of the thread in the thread checkpoint table, that the thread has completed the unit of atomic work. Another aspect includes, based on determining that the thread has completed the unit of atomic work, suspending the thread.

TECHNIQUES FOR EFFICIENTLY PERFORMING DATA REDUCTIONS IN PARALLEL PROCESSING UNITS
20210019198 · 2021-01-21 ·

Techniques are disclosed for reducing the latency associated with performing data reductions in a multithreaded processor. In response to a single instruction associated with a set of threads executing in the multithreaded processor, a warp reduction unit acquires register values stored in source registers, where each register value is associated with a different thread included in the set of threads. The warp reduction unit performs operation(s) on the register values to compute an aggregate value. The warp reduction unit stores the aggregate value in a destination register that is accessible to at least one of the threads in the set of threads. Because the data reduction is performed via a single instruction using hardware specialized for data reductions, the number of cycles required to perform the data reduction is decreased relative to prior-art techniques that are performed via multiple instructions using hardware that is not specialized for data reductions.

Method and apparatus to manage flush of an atomic group of writes to persistent memory in response to an unexpected power loss

A group of cache lines in cache may be identified as cache lines not to be flushed to persistent memory until all cache line writes for the group of cache lines have been completed.

REPLACING PREEMPTIBLE RCU WITH AN AUGMENTED SRCU IMPLEMENTATION
20200409938 · 2020-12-31 ·

An augmented sleepable read-copy update implementation (PREEMPT_SRCU) combines elements of a tree-based sleepable read-copy update environment (Tree-SRCU) with elements of a preemptible read-copy update environment (Preemptible-RCU). The elements of Tree-SRCU may be used to manage PREEMPT_SRCU grace periods and handle PREEMPT_SRCU callbacks. The elements of Preemptible-RCU may be used to drive existing PREEMPT_SRCU grace periods to completion.

Simulation of exclusive instructions
10853071 · 2020-12-01 · ·

A method and apparatus for simulating target program code on a host data processing apparatus, the simulation mapping load-exclusive instructions in the target program code to load instructions, and mapping store-exclusive instructions in the target program code to compare-and-swap instructions.

Request of an MCS lock by guests

In example implementations, a method include receiving a request for a lock in a Mellor-Crummey Scott (MCS) lock protocol from a guest user that is context free (e.g., a process that does not bring a queue node). The lock determines that it contains a null value. The lock is granted to the guest user. A pi value is received from the guest user to store in the lock. The pi value notifies subsequent users that the guest user has the lock.