G06F11/1474

Method and system for managing fault recovery in system-on-chips
11609821 · 2023-03-21

A fault recovery system including a fault controller is disclosed. The fault controller is coupled between a processor and an interconnect, and configured to receive a time-out signal that is indicative of a failure of the processor to execute a transaction after a fault is detected in the processor. The failure in the execution of the transaction results in queuing of the interconnect. Based on the time-out signal, the fault controller is further configured to generate and transmit a control signal to the processor to disconnect the processor from the interconnect. Further, the fault controller is configured to execute the transaction, and in turn, dequeue the interconnect. When the transaction is successfully executed, the fault controller is further configured to generate a status signal to reset the processor, thereby managing a fault recovery of the processor.

INTERPROCESS COMMUNICATION FOR ASYNCHRONOUS TASKS

Methods and systems are disclosed herein for managing software operations in a computer system. A software operation may include many tasks. The tasks may be grouped together based on the tasks' dependencies on output generated from other tasks. Each group of tasks may be placed in a block of a blockchain based on the dependencies. If the output of a block fails to pass a validation test, the tasks in each block may be undone in an organized order (e.g., starting with the most recently performed task and using the one or more rollback functions associated with each task), which may prevent problems that could occur when some asynchronous tasks complete and others fail. Use of the blockchain may allow the computer system to determine more precisely where an operation failed and may allow the computer system to determine more information about the failure.
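
The rollback order described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: it leaves out the blockchain entirely and models one block as a list of tasks. The names `run_block`, `do`, `undo`, and `validate` are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical task shape: (name, do, undo). Not taken from the abstract.
Task = Tuple[str, Callable[[], None], Callable[[], None]]


def run_block(tasks: List[Task], validate: Callable[[], bool]) -> List[str]:
    """Run every task in order, then validate the block's output.

    If validation fails, undo the tasks starting with the most recently
    performed one. Returns the names of the tasks that were rolled back.
    """
    done = []
    for name, do, undo in tasks:
        do()
        done.append((name, undo))

    if validate():
        return []

    rolled_back = []
    for name, undo in reversed(done):  # newest task first
        undo()
        rolled_back.append(name)
    return rolled_back
```

With two tasks `a` and `b` and a failing validation, the function undoes `b` before `a`, so no later task is left depending on output that has already been removed.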

Method and apparatus for reading and writing committed data

According to some aspects, a database system comprising a processor configured to execute a plurality of system components is provided. The plurality of system components may include an interface component configured to receive a write commit command and provide a write commit confirmation, a snapshot component configured to generate a plurality of snapshots of data stored in a data storage node of a plurality of data storage nodes and identify a committed snapshot representative of data that has been replicated on a majority of the plurality of data storage nodes, and a command processing component configured to modify a data element based on the write commit command, determine whether the majority of the plurality of storage nodes have replicated the modification using the committed snapshot, and generate the write commit confirmation responsive to a determination that the majority of the plurality of data storage nodes have replicated the modification.
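
The majority check can be illustrated with a short sketch. It assumes, purely for illustration, that each data storage node reports the highest write version it has replicated. The function names and the integer versioning scheme are hypothetical and do not come from the abstract.

```python
from typing import List


def committed_version(replicated_versions: List[int]) -> int:
    """Return the highest version that a majority of nodes have replicated.

    replicated_versions holds one entry per node: the highest version that
    node has replicated (an assumed representation).
    """
    if not replicated_versions:
        raise ValueError("at least one storage node is required")
    majority = len(replicated_versions) // 2 + 1
    ordered = sorted(replicated_versions, reverse=True)
    # The majority-th highest entry is reached by at least `majority` nodes.
    return ordered[majority - 1]


def can_confirm(write_version: int, replicated_versions: List[int]) -> bool:
    """A write commit is confirmed once the committed version covers it."""
    return committed_version(replicated_versions) >= write_version
```

For three nodes at versions 5, 3, and 4, the committed version is 4: a write at version 4 can be confirmed, while a write at version 5 must wait for one more node.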

Exception handling in transactions

An apparatus and a method of operating a data processing apparatus, and simulators thereof, are disclosed. Data processing circuitry performs data processing operations in response to instructions, where some sets of instructions may be defined as a transaction which is to be performed atomically with respect to other operations performed by the data processing circuitry. When a synchronous exception occurs during a transaction, the transaction is aborted and an exception counter is incremented. When the counter reaches a threshold value, a transaction failure signal is generated, allowing, if appropriate, a response to this number of exceptions causing transaction aborts to be carried out.
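
The counter behaviour can be sketched in software as follows. This is only an illustration of the counting logic, not the disclosed circuitry. The class and method names are hypothetical.

```python
class TransactionMonitor:
    """Counts transaction aborts caused by synchronous exceptions."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.exception_count = 0

    def on_synchronous_exception(self) -> bool:
        """Record one abort caused by a synchronous exception.

        Returns True when the counter has reached the threshold, which
        stands in for the transaction failure signal.
        """
        self.exception_count += 1
        return self.exception_count >= self.threshold
```

With a threshold of 3, the first two aborts return `False` and the third returns `True`, at which point a caller could stop retrying the transaction and take another path.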

Using persistent memory to enable restartability of bulk load transactions in cloud databases

Systems, methods, and computer-readable media are disclosed for managing the storing of data in a data storage system using a client tag. In some examples, a first portion of a data load as part of a transaction and a client identifier that uniquely identifies a client are received from the client at a data storage system. The transaction can be tagged with a client tag including the client identifier, and the first portion of the data load can be stored in storage at the data storage system. A first log entry including the client tag is added to a data storage log in response to storing the first portion of the data load in the storage. The first log entry is then written from the data storage log to a persistent storage log in persistent memory, which is used to track progress of storing the data load in the storage.

Handling failures when synchronizing objects during a write operation

A method for execution by a dispersed storage network (DSN). The method begins by obtaining a data object for synchronized storage within a plurality of storage vaults, identifying the plurality of storage vaults, encoding the data object for each storage vault, initiating storage of data slices for each storage vault, and interpreting received data slice information from at least some of the storage vaults to determine a number of storage vaults that have successfully stored the corresponding plurality of sets of encoded data slices. When a vault threshold number of storage vaults have not successfully stored the corresponding plurality of sets of encoded data slices within a synchronization timeframe, the method continues by initiating a rollback process to abandon storage of the data object in the plurality of storage vaults and issuing a store data response to indicate unsuccessful synchronized storage of the data object in the plurality of storage vaults.
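
The threshold decision can be sketched as below. The sketch assumes that `results` already reflects the responses received within the synchronization timeframe, with vaults that did not respond in time recorded as `False`. The function name, the dictionary shapes, and the status strings are hypothetical.

```python
from typing import Dict, List


def synchronize(results: Dict[str, bool], vault_threshold: int) -> Dict[str, object]:
    """Decide between success and rollback for a synchronized write.

    results maps a vault name to whether that vault stored its sets of
    encoded data slices within the synchronization timeframe (assumed input).
    """
    stored = sum(1 for ok in results.values() if ok)
    if stored >= vault_threshold:
        return {"status": "success", "rollback": []}

    # Threshold not met: abandon storage in every vault involved.
    rollback: List[str] = sorted(results)
    return {"status": "unsuccessful", "rollback": rollback}
```

With three vaults, a threshold of 2, and only one successful vault, the result is an unsuccessful response and a rollback list naming all three vaults.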

TRANSACTION EXCHANGE PLATFORM WITH WATCHDOG MICROSERVICE
20230125639 · 2023-04-27

Aspects described herein may relate to a transaction exchange platform using a streaming data platform (SDP) and microservices to process transactions according to review and approval workflows. The transaction exchange platform may receive transactions from origination sources, which may be added to the SDP as transaction objects. Microservices on the transaction exchange platform may interact with the transaction objects based on configured workflows associated with the transactions. Processing on the transaction exchange platform may facilitate clearing and settlement of transactions. Some aspects may provide for dynamic and flexible reconfiguration of workflows and/or microservices. Other aspects may provide for data snapshots and workflow tracking, allowing for monitoring, quality control, and auditability of transactions on the transaction exchange platform.

ERROR RECOVERY FOR NON-VOLATILE MEMORY MODULES

A memory controller includes a command queue, a memory interface queue, at least one storage queue, and a replay control circuit. The command queue has a first input for receiving memory access commands. The memory interface queue receives commands selected from the command queue and couples to a heterogeneous memory channel which is coupled to at least one non-volatile storage class memory (SCM) module. The at least one storage queue stores memory access commands that are placed in the memory interface queue. The replay control circuit detects that an error has occurred requiring a recovery sequence, and in response to the error, initiates the recovery sequence. In the recovery sequence, the replay control circuit transmits selected memory access commands from the at least one storage queue by grouping non-volatile read commands together separately from all pending volatile reads, volatile writes, and non-volatile writes.
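
The grouping step of the recovery sequence can be sketched as follows. The abstract does not state the order in which the two groups are transmitted, so the sketch only separates them. The command dictionaries and the name `group_for_replay` are hypothetical.

```python
from typing import Dict, List, Tuple

Command = Dict[str, object]  # assumed shape: {"id", "op", "nonvolatile"}


def group_for_replay(pending: List[Command]) -> Tuple[List[Command], List[Command]]:
    """Split pending commands into non-volatile reads and everything else.

    The second group holds volatile reads, volatile writes, and
    non-volatile writes. Original order is preserved within each group.
    """
    nv_reads: List[Command] = []
    others: List[Command] = []
    for cmd in pending:
        if cmd["nonvolatile"] and cmd["op"] == "read":
            nv_reads.append(cmd)
        else:
            others.append(cmd)
    return nv_reads, others
```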

Snapshot isolation in a distributed storage system
11474977 · 2022-10-18

Systems and methods for snapshot isolation in a distributed storage system are provided. In some examples, a method can include receiving a request for data from a data store comprising a plurality of storage nodes, the request comprising a key associated with the data and a request timestamp, identifying a storage node in the plurality of storage nodes from which to read the data based on the key and a mapping of keys to storage nodes, determining that the request is associated with a transaction comprising operations associated with at least one other storage node in the plurality of storage nodes, determining a status of the entry for the key in the key-value table based on a lookup of a transaction table using the transaction identifier, and providing the data when the status of the entry for the key indicates the data is ready to be read.

Error recovery from submission queue fetching errors
11599481 · 2023-03-07

An apparatus includes a non-volatile memory media and a storage controller. The storage controller is configured to fetch a storage command from a submission queue of a host. The submission queue has a submission queue identifier (SQID). The storage controller then determines a submission queue fetch error in response to receiving a Transport Layer Packet (TLP) error as a result of fetching the storage command. Next, the storage controller is configured to determine a command identifier (CID) for the storage command associated with the submission queue fetch error. The storage controller then sends a completion message to the host. The completion message uniquely identifies the storage command of the submission queue associated with the submission queue fetch error using the SQID and CID.
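
The role of the SQID and CID pair can be illustrated with a small sketch. It assumes that a command identifier is only unique within its own submission queue, so both fields are needed to name one command. The `Completion` type, the function name, and the status string `SQ_FETCH_ERROR` are hypothetical and are not taken from the abstract or from any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Completion:
    """Hypothetical completion message sent back to the host."""
    sqid: int    # submission queue identifier
    cid: int     # command identifier within that queue
    status: str  # illustrative status label


def completion_for_fetch_error(sqid: int, cid: int) -> Completion:
    """Build a completion that names the command whose fetch failed."""
    return Completion(sqid=sqid, cid=cid, status="SQ_FETCH_ERROR")
```

Two commands that share CID 7 but sit in submission queues 1 and 2 produce distinct completions, so the host can tell which command to retry or abort.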