G06F11/181

System and method for n-modular redundant communication

A fault-tolerant consensus generation and communication system and method are described. Each processing node in the system receives a plurality of measurements from a sensor, calculates a consolidated value for the received plurality of measurements, transmits the consolidated value to other processing nodes, receives consolidated values from the other processing nodes, calculates a consensus value based on the calculated consolidated value and the received one or more consolidated values, transmits the calculated consensus value to the other processing nodes, receives consensus values from the other processing nodes, generates a consensus message based on the calculated consensus value, the received one or more consensus values, and a predefined criterion, and, in a case where the consensus message is not present in a consensus queue, adds the consensus message to the consensus queue.
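
The per-node steps in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the use of the median for consolidation and consensus, the "at least min_agree matching values" criterion, and all function names are assumptions chosen for the example.

```python
from collections import Counter
from statistics import median


def consolidate(measurements):
    """Collapse one node's raw sensor readings into one value (median is an assumption)."""
    return median(measurements)


def consensus_value(own, received):
    """Combine this node's consolidated value with those received from its peers."""
    return median([own, *received])


def consensus_message(own_consensus, received_consensus, min_agree):
    """Return the value that at least `min_agree` nodes report, or None.

    `min_agree` stands in for the abstract's unspecified "predefined criterion".
    """
    counts = Counter([own_consensus, *received_consensus])
    value, votes = counts.most_common(1)[0]
    return value if votes >= min_agree else None


def enqueue_once(queue, message):
    """Add the message to the consensus queue only if it is not already present."""
    if message is not None and message not in queue:
        queue.append(message)
        return True
    return False


# One node's view of a three-node system.
own = consolidate([10.0, 10.2, 9.8])
value = consensus_value(own, [10.1, 10.0])
message = consensus_message(value, [10.0, 10.0], min_agree=2)
queue = []
enqueue_once(queue, message)
enqueue_once(queue, message)  # second call is a no-op: the message is already queued
```

A median is used here only because it tolerates a single outlying reading; the abstract does not say which consolidation function is used.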

DATABASE MANAGEMENT SYSTEM WITH CODING CLUSTER AND METHODS FOR USE THEREWITH

A networked database management system (DBMS) is disclosed. In particular, the disclosed DBMS includes a plurality of nodes, one of which is elected as a designated leader. The designated leader is elected using a consensus algorithm, such as tabulated random votes, Raft, or Paxos. The designated leader is responsible for managing open coding lines and determining when to close an open coding line.

MANAGING NODES OF A DBMS

A tool for replacing a first database node of a database management system by a second database node. The tool receives an indication that the first database node received a data access request for accessing a database shared between the first database node and the second database node. The tool duplicates the data access request at the first database node. Responsive to a determination that the duplicated data access request includes a data changing statement and a previously executed statement, the tool modifies the duplicated data access request to prevent execution of the data changing statement and the previously executed statement at the second database node. The tool executes the modified duplicated data access request at the second database node. The tool replaces, based on a replacement condition being met, the first database node with the second database node in the database management system.

Mediator assisted switchover between clusters

Techniques are provided for metadata management for enabling automated switchover. An initial quorum vote may be performed before a node executes an operation associated with metadata comprising operational information and switchover information. After the initial quorum vote is performed, the node executes the operation upon one or more mailbox storage devices. Once the operation has executed, a final quorum vote is performed. The final quorum vote and the initial quorum vote are compared to determine whether the operation is to be designated as successful or failed, and whether any additional actions are to be performed.
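
The vote, execute, vote sequence might look like the sketch below. It assumes that a quorum means a strict majority of mailbox devices, that the operation is skipped when the initial vote fails, and that devices lost between the two votes are what the "additional actions" would target; none of these details, nor the function names, come from the abstract.

```python
def has_quorum(reachable, total):
    """True when a strict majority of the `total` mailbox devices is reachable (assumed rule)."""
    return len(reachable) > total // 2


def run_with_quorum(operation, probe, total):
    """Run `operation` between an initial and a final quorum vote.

    `probe` returns the set of mailbox devices currently reachable.
    Returns (status, lost), where `lost` holds devices seen in the initial
    vote but missing from the final one.
    """
    initial = probe()
    if not has_quorum(initial, total):
        return "failed", set()
    operation(initial)
    final = probe()
    # Compare the two votes: only devices present in both count toward success.
    status = "successful" if has_quorum(initial & final, total) else "failed"
    return status, initial - final


# Usage: three mailbox devices, one of which drops out during the operation.
votes = iter([{"m1", "m2", "m3"}, {"m1", "m2"}])
written = []
status, lost = run_with_quorum(
    lambda devices: written.append(sorted(devices)),
    lambda: next(votes),
    total=3,
)
```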

Safety relay box system

A dual redundant computer safety relay box system includes first and second fail-safe computing systems (FSCs) individually mounted to first and second printed circuit boards. Each FSC includes two computing modules (CPUs) designated as a first CPU and a second CPU. The first and second FSCs are both connected to a safety relay box. The printed circuit boards are isolable from each other permitting maintenance on one of the printed circuit boards while operation of the FSC of the other printed circuit board is maintained. In each FSC a health signal generated from the first and second printed circuit boards of the first and second CPUs defines a multi-level dynamic pulse signal. Presence of the dynamic pulse signal produces an output identified as each of a first and a second healthy indication signal from each of the CPUs of one of the first or second FSCs.

Node failure detection and resolution in distributed databases
11500743 · 2022-11-15

Methods and systems to detect and resolve failure in a distributed database system are described herein. A first node in the distributed database system can detect an interruption in communication with at least one other node in the distributed database system. This indicates a network failure. In response to detection of this failure, the first node starts a failure resolution protocol. This invokes coordinated broadcasts of respective lists of suspicious nodes among neighbor nodes. Each node compares its own list of suspicious nodes with its neighbors' lists of suspicious nodes to determine which nodes are still directly connected to each other. Each node determines the largest group of these directly connected nodes and whether or not it is in that group. If a node isn't in that group, it fails itself to resolve the network failure.
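
The group-selection step can be illustrated with a small brute-force sketch. It assumes that two nodes count as directly connected when neither lists the other as suspicious, and that ties between equally large groups are broken by sorted node name; both rules, and the function names, are assumptions made for the example. The exhaustive search is only practical for small clusters.

```python
from itertools import combinations


def largest_connected_group(suspects):
    """Return the largest set of nodes in which no member suspects another.

    `suspects` maps each node name to the set of nodes it considers suspicious,
    as collected from the broadcast lists.
    """
    nodes = sorted(suspects)

    def connected(a, b):
        return b not in suspects[a] and a not in suspects[b]

    # Try the largest candidate groups first and stop at the first match.
    for size in range(len(nodes), 0, -1):
        for group in combinations(nodes, size):
            if all(connected(a, b) for a, b in combinations(group, 2)):
                return set(group)
    return set()


def should_fail(node, suspects):
    """A node fails itself when it is outside the largest connected group."""
    return node not in largest_connected_group(suspects)


# Node D has lost its links to A and B, so A, B, and C form the surviving group.
suspects = {"A": {"D"}, "B": {"D"}, "C": set(), "D": {"A", "B"}}
survivors = largest_connected_group(suspects)
```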

REDUNDANT SYSTEM AND METHOD OF OPERATING A REDUNDANT SYSTEM
20220045809 · 2022-02-10

A redundant system for processing at least one signal is described, wherein the redundant system has N+1 devices including N operational devices and one reserve device. The N operational devices and the reserve device are interconnected with each other. The redundant system includes a system control integrated within one of the devices of the redundant system. The redundant system further includes switches that are associated with the operational devices. In case of a failure of a respective operational device, the system control is configured to cause at least one of the devices to operate the switch associated with the respective operational device having the failure. Further, a method of operating a redundant system for processing at least one signal is described.

CONTINUING OPERATION OF A QUORUM BASED SYSTEM AFTER FAILURES

A processor-implemented method for continuing operation of a quorum based system is provided. The method detects a loss of quorum. A plurality of speculative configurations is created, whereby each speculative configuration is isolated from other speculative configurations in the quorum based system. Each speculative configuration continues to order requests during the creation of speculative configurations. The method selects and starts one of the plurality of speculative configurations as a new operational configuration. Ordered requests continue to the new operational configuration. The original configuration of the quorum based system is restarted in response to the plurality of speculative configurations not being isolated.

SYSTEM AND METHOD FOR N-MODULAR REDUNDANT COMMUNICATION

A fault-tolerant consensus generation and communication system and method are described. Each processing node in the system receives a plurality of measurements from a sensor, calculates a consolidated value for the received plurality of measurements, transmits the consolidated value to other processing nodes, receives consolidated values from the other processing nodes, calculates a consensus value based on the calculated consolidated value and the received one or more consolidated values, transmits the calculated consensus value to the other processing nodes, receives consensus values from the other processing nodes, generates a consensus message based on the calculated consensus value, the received one or more consensus values, and a predefined criterion, and, in a case where the consensus message is not present in a consensus queue, adds the consensus message to the consensus queue.

Scaling performance for large scale replica sets for a strongly consistent distributed system

A system and a method are disclosed that provide a data replication management technique for a distributed environment that eliminates a need to order members of a replica set. A node of a node cluster in the distributed system may be configured to send in parallel an IO request to each respective member of the replica set. Responses are received from members of the replica set that indicate a completion status of the IO request at the replica set member sending the IO response. A request is sent to other nodes of the node cluster to remove a replica from the replica set based on an error response received from the replica. The replica that responded with the error response is removed from the replica set based on an agreement of nodes of the node cluster to remove the replica from the replica set.
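
A rough sketch of the parallel send and agreed removal follows. Replicas are modeled as plain callables returning "ok" or "error", peer nodes as callables that vote on a removal, and "agreement" is taken to mean a unanimous vote; these are all assumptions for illustration, since the abstract does not specify the transport, the response format, or the agreement rule.

```python
from concurrent.futures import ThreadPoolExecutor


def send_parallel(replicas, io_request):
    """Send the IO request to every replica at once; no ordering of members is imposed."""
    with ThreadPoolExecutor(max_workers=max(1, len(replicas))) as pool:
        futures = {
            name: pool.submit(handler, io_request)
            for name, handler in replicas.items()
        }
        return {name: future.result() for name, future in futures.items()}


def remove_failed(replicas, responses, peer_votes):
    """Drop each replica that returned an error, provided every peer node agrees."""
    for name, status in responses.items():
        if status == "error" and all(vote(name) for vote in peer_votes):
            replicas.pop(name)
    return replicas


# Usage: r2 reports an error and both peer nodes agree to remove it.
replicas = {
    "r1": lambda request: "ok",
    "r2": lambda request: "error",
    "r3": lambda request: "ok",
}
responses = send_parallel(replicas, {"op": "write", "block": 7})
remaining = remove_failed(replicas, responses, [lambda n: True, lambda n: True])
```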