G06F2209/521

Request of an MCS lock by guests

In example implementations, a method includes receiving a request for a lock in a Mellor-Crummey Scott (MCS) lock protocol from a guest user that is context free (e.g., a process that does not bring a queue node). The lock determines that it contains a null value. The lock is granted to the guest user. A pi value is received from the guest user to store in the lock. The pi value notifies subsequent users that the guest user has the lock.
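
The guest path described above can be sketched as a compare-and-swap on the lock word. This is a minimal illustration, not the patented method: the names `LockWord`, `PI`, `guest_try_acquire`, and `guest_release` are hypothetical, the atomic operation is emulated with a Python mutex, and the release path is an assumption because the abstract does not describe it.

```python
import threading

# Hypothetical sentinel standing in for the "pi value": it marks the lock
# as held by a guest that brought no queue node.
PI = object()


class LockWord:
    """Models the MCS lock word with an emulated compare-and-swap."""

    def __init__(self):
        self._value = None               # None plays the role of the null value
        self._guard = threading.Lock()   # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        # Replace the stored value only if it is currently `expected`.
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False

    def load(self):
        with self._guard:
            return self._value


def guest_try_acquire(lock):
    # The lock is granted only if it held null; the guest leaves PI behind
    # so that later requesters can see that a guest holds the lock.
    return lock.compare_and_swap(None, PI)


def guest_release(lock):
    # Assumption: no waiter has enqueued behind the guest, so the lock
    # word is simply restored to null.
    return lock.compare_and_swap(PI, None)


lock = LockWord()
first = guest_try_acquire(lock)    # lock was null, so the guest is granted it
second = guest_try_acquire(lock)   # a later requester finds PI, not null
held_by_guest = lock.load() is PI
released = guest_release(lock)
```

A regular MCS user would instead swap a pointer to its own queue node into the lock word; the sentinel exists only because the guest has no node to publish.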

Method and apparatus for secure and verifiable composite service execution and fault management on blockchain

A method is implemented by one or more network devices to identify an originating point of failure in a composite service executed in a cloud computing environment. Execution of the composite service includes execution of a plurality of atomic services in an ordered sequence. For each atomic service that is executed, an execution trace for that atomic service is stored in a blockchain to form an ordered sequence of execution traces, and each execution trace is signed using the private key associated with the atomic service that generated it. The method includes analyzing one or more of the ordered sequence of execution traces to determine which of the plurality of atomic services originated the failure, where each execution trace that is analyzed is authenticated using the public key that corresponds to the private key associated with the atomic service that generated that execution trace.

Memory system, operation method thereof, and database system including the memory system
11782840 · 2023-10-10

A method for operating a multi-transaction memory system includes: storing Logical Block Address (LBA) information changed in response to a request from a host and a transaction identification (ID) of the request into one page of a memory block; and performing a transaction commit in response to a transaction commit request including the transaction ID from the host, wherein the performing of the transaction commit includes changing a valid block bitmap in a controller of the multi-transaction memory system based on the LBA information.
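
The store-then-commit flow above can be sketched as follows. This is a simplified model under stated assumptions: the class `MultiTransactionController` and its methods are hypothetical, the page is modeled as a Python list of (transaction ID, LBA) records, and each LBA is assumed to map directly to one bit of the valid block bitmap, which the abstract does not specify.

```python
class MultiTransactionController:
    """Toy model of a controller that commits writes per transaction ID."""

    def __init__(self, num_blocks):
        # Valid block bitmap held in the controller (one flag per LBA here).
        self.valid_bitmap = [False] * num_blocks
        # Records of (transaction ID, LBA) as they would be stored in a page.
        self.page_log = []

    def write(self, txn_id, lba):
        # Store the changed LBA together with the transaction ID of the request.
        self.page_log.append((txn_id, lba))

    def commit(self, txn_id):
        # On a commit request, update the bitmap from the LBA information
        # recorded for this transaction ID only.
        committed = [lba for tid, lba in self.page_log if tid == txn_id]
        for lba in committed:
            self.valid_bitmap[lba] = True
        # Drop the committed records; other transactions stay pending.
        self.page_log = [(tid, lba) for tid, lba in self.page_log if tid != txn_id]
        return committed


ctrl = MultiTransactionController(num_blocks=8)
ctrl.write(txn_id=1, lba=2)
ctrl.write(txn_id=2, lba=5)
ctrl.write(txn_id=1, lba=3)
committed = ctrl.commit(txn_id=1)
```

In this sketch, transaction 2 stays invisible in the bitmap until its own commit request arrives, which is the isolation property that tagging each record with a transaction ID is meant to support.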

HARDWARE ACCELERATED SYNCHRONIZATION WITH ASYNCHRONOUS TRANSACTION SUPPORT

A new transaction barrier synchronization primitive enables executing threads and asynchronous transactions to synchronize across parallel processors. The asynchronous transactions may include transactions resulting from, for example, hardware data movement units such as direct memory units, etc. A hardware synchronization circuit may provide for the synchronization primitive to be stored in a cache memory so that barrier operations may be accelerated by the circuit. A new wait mechanism reduces software overhead associated with waiting on a barrier.

Barrierless and fenceless shared memory synchronization with write flag toggling
11620169 · 2023-04-04

When communicating through shared memory, a producer thread generates a value that is written to a location in a shared memory. The value is read from the shared memory by a consumer thread. The challenge is to ensure that the consumer thread reads the location only after the value is written and is thereby synchronized. When a memory location is written by a producer thread, a flag that is stored in the memory location along with the value is toggled in the same write. The consumer thread tracks information to determine whether the flag stored in the location indicates that the producer has written the value to the location. The flag is read and written simultaneously with reading and writing the location in memory, thereby eliminating the need for a memory fence. After all of the consumer threads read the value, the location may be reused to write additional value(s) and simultaneously toggle the flag.
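
The toggling scheme above can be sketched with a single word that packs the value and the flag together. This is an illustrative, single-threaded model only: the names `SharedSlot`, `Producer`, and `Consumer` are hypothetical, placing the flag in the low bit is an assumption, and Python does not model hardware memory ordering, so the sketch shows the bookkeeping rather than the fence-free guarantee itself.

```python
FLAG_BIT = 1  # assumption: the flag occupies the low bit of the shared word


class SharedSlot:
    def __init__(self):
        self.word = 0  # value 0 with flag 0: nothing has been written yet


class Producer:
    def __init__(self, slot):
        self.slot = slot
        self.next_flag = 1  # the first write flips the flag from 0 to 1

    def write(self, value):
        # One store carries both the value and the toggled flag.
        self.slot.word = (value << 1) | self.next_flag
        self.next_flag ^= 1


class Consumer:
    def __init__(self, slot):
        self.slot = slot
        self.expected_flag = 1  # tracked locally by the consumer

    def try_read(self):
        word = self.slot.word  # one load returns value and flag together
        if (word & FLAG_BIT) != self.expected_flag:
            return None        # the producer has not written a new value yet
        self.expected_flag ^= 1
        return word >> 1


slot = SharedSlot()
producer, consumer = Producer(slot), Consumer(slot)
before = consumer.try_read()   # nothing written yet
producer.write(42)
first = consumer.try_read()    # sees the toggled flag and takes the value
repeat = consumer.try_read()   # same flag as last time, so nothing new
producer.write(7)              # location reused, flag toggled again
second = consumer.try_read()
```

Because the flag travels in the same word as the value, a consumer can never observe a set flag paired with a stale value, which is the property that a separate flag plus a fence would otherwise have to provide.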

METHOD AND SYSTEM OF A HIERARCHICAL TASK SCHEDULER FOR A MULTI-THREAD SYSTEM
20220214925 · 2022-07-07

A method for scheduling tasks from a program executed by a multi-processor core system is disclosed. The method includes a scheduler that groups a plurality of tasks, each having an assigned priority, by priority in a task group. The task group is assembled with other task groups having identical priorities in a task group queue. A hierarchy of task group queues is established based on priority levels of the assigned tasks. Task groups are assigned to one of a plurality of worker threads based on the hierarchy of task group queues. Each of the worker threads is associated with a processor in the multi-processor system. The tasks of the task groups are executed via the worker threads according to the order in the hierarchy.
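
The grouping and assignment steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the functions `build_hierarchy` and `assign_to_workers` are hypothetical, a lower number is assumed to mean higher priority, the group size is fixed, and task groups are handed to workers round-robin, none of which is specified by the abstract.

```python
from collections import defaultdict, deque


def build_hierarchy(tasks, group_size):
    """Group (name, priority) tasks by priority into queues of task groups.

    Returns a list of (priority, queue of task groups), ordered so that the
    highest priority (lowest number, by assumption) comes first.
    """
    by_priority = defaultdict(list)
    for name, priority in tasks:
        by_priority[priority].append(name)

    hierarchy = []
    for priority in sorted(by_priority):
        names = by_priority[priority]
        # Each task group holds tasks of one priority only.
        groups = deque(
            names[i:i + group_size] for i in range(0, len(names), group_size)
        )
        hierarchy.append((priority, groups))
    return hierarchy


def assign_to_workers(hierarchy, num_workers):
    """Hand out task groups to workers, walking the hierarchy in order."""
    assignments = [[] for _ in range(num_workers)]
    slot = 0
    for _priority, groups in hierarchy:
        for group in groups:
            assignments[slot % num_workers].append(group)
            slot += 1
    return assignments


tasks = [("a", 1), ("b", 0), ("c", 1), ("d", 0), ("e", 0)]
hierarchy = build_hierarchy(tasks, group_size=2)
assignments = assign_to_workers(hierarchy, num_workers=2)
```

Each inner list of `assignments` is the ordered work of one worker thread, so higher-priority groups always appear before lower-priority ones within a worker.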

Replacing preemptible RCU with an augmented SRCU implementation

An augmented sleepable read-copy update implementation (PREEMPT_SRCU) combines elements of a tree-based sleepable read-copy update environment (Tree-SRCU) with elements of a preemptible read-copy update environment (Preemptible-RCU). The elements of Tree-SRCU may be used to manage PREEMPT_SRCU grace periods and handle PREEMPT_SRCU callbacks. The elements of Preemptible-RCU may be used to drive existing PREEMPT_SRCU grace periods to completion.

PERFORMANCE MODELING OF GRAPH PROCESSING COMPUTING ARCHITECTURES

A distributed simulation system is provided that includes a timing simulator and functional simulator(s) on different computing nodes to simulate a graph processing system. The functional simulators are to simulate execution of a set of instructions on the graph processing system and to send information associated with the simulated set of instructions to the timing simulator over the network. The timing simulator is to determine timing information associated with execution of the sets of instructions sent by the functional simulators and send the timing information to the functional simulators over the network. The timing simulator may determine a global synchronization point for the functional simulators and send the timing information for the sets of instructions to respective functional simulators at the global synchronization point. Each functional simulator may stall simulation of further instructions until the timing information for its set of instructions is received from the timing simulator.

LOW LATENCY AND HIGHLY PROGRAMMABLE INTERRUPT CONTROLLER UNIT

A graph processing core includes a plurality of processing pipelines and an interrupt controller unit. Each processing pipeline executes one or more threads and includes, for each thread, a first register indicating a currently executing program counter vector and a second register indicating an interrupt or exception handler vector. The interrupt controller unit may receive interrupt or exception notifications from the processing pipelines, determine a handler vector based on the notification and a set of registers of the interrupt controller unit, and transmit the handler vector to the processing pipeline that issued the interrupt or exception notification. Further, the issuing pipeline may receive the handler vector from the interrupt controller unit, write a value in the first register into the second register, write the handler vector into the first register, and invoke an interrupt or exception handler based on the value written into the first register.
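
The register exchange described above can be sketched as follows. This is a toy model, not the hardware design: the classes `InterruptControllerUnit` and `ThreadContext`, the notification name `"timer"`, and the vector addresses are hypothetical, and a plain dictionary stands in for the controller's set of registers.

```python
class InterruptControllerUnit:
    def __init__(self, handler_table):
        # Stands in for the set of registers the controller consults.
        self.handler_table = dict(handler_table)

    def resolve(self, notification):
        # Determine the handler vector for a given notification.
        return self.handler_table[notification]


class ThreadContext:
    """Per-thread registers of a processing pipeline."""

    def __init__(self, pc):
        self.first_reg = pc      # currently executing program counter vector
        self.second_reg = None   # interrupt or exception handler vector register

    def take_interrupt(self, icu, notification):
        vector = icu.resolve(notification)
        self.second_reg = self.first_reg  # first register -> second register
        self.first_reg = vector           # handler vector -> first register
        return self.first_reg             # the handler is invoked from here


icu = InterruptControllerUnit({"timer": 0x400})
ctx = ThreadContext(pc=0x100)
target = ctx.take_interrupt(icu, "timer")
```

After the exchange, the second register holds the interrupted program counter vector, which is what a handler would need in order to resume the thread; the resume step itself is not covered by the abstract and is left out here.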

PARALLEL MEMORY MODEL FOR DISTRIBUTED FUNCTIONAL SIMULATIONS

A distributed simulation system is provided that includes a plurality of computing nodes interconnected via a network implementing a Message Passing Interface (MPI) protocol. Each computing node is to simulate hardware logic of a core of a graph processing system and to simulate a respective system memory portion of the graph processing system.