Patent classifications
G06F11/181
Fault tolerant distributed computing system based on dynamic reconfiguration
A fault tolerant distributed computing system includes a communication link and a plurality of nodes in electronic communication with one another by the communication link. Each node executes at least one node-specific application, includes a standby database that stores a standby copy of one of the node-specific applications executed by one of the remaining nodes of the distributed computing system, and includes spare computational capacity sufficient to execute at least one standby copy stored in the standby database. In response to determining that a specific node is non-operational, the remaining nodes execute all the standby copies of the one or more node-specific applications that were previously executed by that node.
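A minimal Python sketch of the failover step described above, not the patented implementation. It assumes each node records, for every standby copy it holds, which node owns the original application. The names `Node` and `fail_over` are hypothetical, and the spare-capacity check is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    apps: Set[str]  # node-specific applications this node currently executes
    standby: Dict[str, str] = field(default_factory=dict)  # standby app -> node that owns it
    operational: bool = True


def fail_over(nodes: Dict[str, Node], failed: str) -> Dict[str, List[str]]:
    """Mark `failed` non-operational and have the surviving nodes start its standby copies."""
    nodes[failed].operational = False
    started: Dict[str, List[str]] = {}
    for node in nodes.values():
        if not node.operational:
            continue
        for app, owner in node.standby.items():
            if owner == failed:
                node.apps.add(app)  # assumes the node has spare capacity for this copy
                started.setdefault(node.name, []).append(app)
    return started
```

Spreading the standby copies of one node's applications across several peers, as in the usage below, keeps the extra load on any single survivor small.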
Mediator assisted switchover between clusters
Techniques are provided for managing metadata to enable automated switchover. An initial quorum vote may be performed before a node executes an operation associated with metadata that comprises operational information and switchover information. After the initial quorum vote, the node executes the operation on one or more mailbox storage devices. Once the operation has executed, a final quorum vote is performed. The final and initial quorum votes are compared to determine whether the operation is designated as successful or failed, and whether any additional actions are to be performed.
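A hedged Python sketch of bracketing an operation with two quorum votes. The abstract does not say how the votes are compared; this sketch assumes a vote is the set of responding nodes, that quorum means a strict majority, and that a change in membership between the votes marks the operation as failed. `QuorumVote` and `run_with_quorum` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class QuorumVote:
    voters: frozenset  # nodes that answered the vote (hypothetical representation)
    cluster_size: int

    @property
    def has_quorum(self) -> bool:
        # Strict majority of the configured cluster.
        return len(self.voters) > self.cluster_size // 2


def run_with_quorum(
    take_vote: Callable[[], QuorumVote], operation: Callable[[], None]
) -> Tuple[str, Optional[str]]:
    """Bracket a metadata operation with an initial and a final quorum vote."""
    initial = take_vote()
    if not initial.has_quorum:
        return "failed", "no quorum before operation"
    operation()  # e.g. write metadata to the mailbox storage devices
    final = take_vote()
    if not final.has_quorum:
        return "failed", "quorum lost during operation"
    if final.voters != initial.voters:
        # Membership changed mid-operation; the caller may retry (assumed policy).
        return "failed", "membership changed; retry"
    return "successful", None
```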
MULTIPROCESSOR SYSTEM
The present invention achieves functional safety in a multiprocessor system without tightly coupling the processor elements. To achieve functional safety by having a plurality of processor elements execute the same data processing, a bus interface unit is adopted. The bus interface unit performs safety-measure processing once a mismatch between the access requests issued by the processor elements has been confirmed, and starts access processing in response to the access requests when those requests coincide with one another.
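A minimal Python sketch of a bus interface unit that compares access requests from loosely coupled processor elements. It assumes requests arrive at different times, so the unit waits until every element has issued one before comparing. `AccessRequest`, `BusInterfaceUnit`, and the `on_mismatch` callback are hypothetical names standing in for the safety-measure processing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AccessRequest:
    address: int
    write: bool
    data: int = 0


class BusInterfaceUnit:
    """Compares the access requests of loosely coupled processor elements (sketch)."""

    def __init__(self, n_elements: int, memory: Dict[int, int],
                 on_mismatch: Callable[[List[AccessRequest]], None]):
        self.n_elements = n_elements
        self.memory = memory
        self.on_mismatch = on_mismatch
        self.pending: Dict[int, AccessRequest] = {}

    def submit(self, element: int, request: AccessRequest) -> Tuple[str, Optional[int]]:
        self.pending[element] = request
        if len(self.pending) < self.n_elements:
            # Elements are not in lockstep: wait until every element has issued its request.
            return "waiting", None
        requests = list(self.pending.values())
        self.pending.clear()
        if any(r != requests[0] for r in requests[1:]):
            self.on_mismatch(requests)  # mismatch confirmed: run the safety measure
            return "mismatch", None
        if request.write:
            self.memory[request.address] = request.data
            return "done", None
        return "done", self.memory.get(request.address, 0)
```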
Node failure detection and resolution in distributed databases
Methods and systems to detect and resolve failures in a distributed database system are described herein. A first node in the distributed database system can detect an interruption in communication with at least one other node, which indicates a network failure. In response to detecting this failure, the first node starts a failure resolution protocol, which invokes coordinated broadcasts of each node's list of suspicious nodes among neighboring nodes. Each node compares its own list of suspicious nodes with its neighbors' lists to determine which nodes are still directly connected to each other. Each node then determines the largest group of directly connected nodes and whether it belongs to that group. If a node is not in that group, it fails itself to resolve the network failure.
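A hedged Python sketch of the final decision each node makes, assuming the suspicious-node lists have already been gathered from the broadcasts. Two nodes count as directly connected when neither suspects the other, and the largest group is found by brute force (exponential, so only reasonable for small clusters). The tie-break rule and the function names are assumptions, not taken from the abstract.

```python
from itertools import combinations
from typing import Dict, Iterable, Set


def largest_connected_group(nodes: Iterable[str], suspicious: Dict[str, Set[str]]) -> Set[str]:
    """Return the largest set of nodes that are all directly connected to one another.

    suspicious[n] holds the nodes that n has lost contact with.
    """
    def linked(a: str, b: str) -> bool:
        return b not in suspicious.get(a, set()) and a not in suspicious.get(b, set())

    ordered = sorted(nodes)
    for size in range(len(ordered), 0, -1):
        # combinations() over a sorted list yields groups in lexicographic order,
        # so every node picks the same winner when sizes tie.
        for group in combinations(ordered, size):
            if all(linked(a, b) for a, b in combinations(group, 2)):
                return set(group)
    return set()


def should_fail_self(me: str, nodes: Iterable[str], suspicious: Dict[str, Set[str]]) -> bool:
    return me not in largest_connected_group(nodes, suspicious)
```

Because every node evaluates the same lists with the same deterministic rule, all nodes agree on the surviving group without a further round of messages.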
APPARATUS AND METHOD FOR GRACEFUL DEGRADATION OF REDUNDANT PROCESSING
An apparatus and method for redundant data processing with graceful degradation. For example, one embodiment of an apparatus comprises: three processing elements operable in a first redundancy mode, the three processing elements to execute the same sequence of instructions to produce three corresponding results; detection circuitry to detect when any one of the three processing elements produces a result different from those of the other two; and tracking circuitry to associate an error with that processing element when it produces the different result, wherein, if an error threshold is reached for that processing element, the other two processing elements are to operate in a second redundancy mode that excludes it.
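A minimal Python sketch of triple modular redundancy that degrades to dual mode, modeling the detection and tracking circuitry in software. The class names and the default threshold are hypothetical; the sketch assumes that in dual mode a mismatch can only be reported, since the faulty element can no longer be identified by voting.

```python
from collections import Counter
from typing import List


class RedundancyError(Exception):
    """Raised when the active processing elements disagree and no majority exists."""


class GracefulRedundancy:
    def __init__(self, error_threshold: int = 3):
        self.active: List[int] = [0, 1, 2]  # indices of elements still in use
        self.errors: Counter = Counter()    # mismatch count per element
        self.error_threshold = error_threshold

    @property
    def mode(self) -> str:
        return "triple" if len(self.active) == 3 else "dual"

    def step(self, results: List[int]) -> int:
        """results[i] is element i's result for the same instruction sequence."""
        values = [results[i] for i in self.active]
        value, votes = Counter(values).most_common(1)[0]
        if votes == len(values):
            return value  # all active elements agree
        if self.mode == "triple" and votes == 2:
            odd = next(i for i in self.active if results[i] != value)
            self.errors[odd] += 1
            if self.errors[odd] >= self.error_threshold:
                self.active.remove(odd)  # degrade to the second (dual) redundancy mode
            return value
        # Dual-mode mismatch (or three-way disagreement): the faulty element cannot be identified.
        raise RedundancyError(f"no majority among {values}")
```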
Parallel processing system runtime state reload
A parallel processing system includes at least three parallel processors, state monitoring circuitry, and state reload circuitry. The state monitoring circuitry couples to the at least three parallel processors and is configured to monitor runtime states of the at least three parallel processors and identify a first processor of the at least three parallel processors having at least one runtime state error. The state reload circuitry couples to the at least three parallel processors and is configured to select a second processor of the at least three parallel processors for state reload, access a runtime state of the second processor, and load the runtime state of the second processor into the first processor. Monitoring and reload may be performed only on sub-systems of the at least three parallel processors. During reload, clocks and supply voltages of the processors may be altered. The state reload may relate to sub-systems.
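A hedged Python sketch of the reload step. The abstract does not say how the monitoring circuitry identifies an erroneous runtime state; this sketch assumes a majority comparison across the processors, and it does not model clock or supply-voltage changes. `reload_faulty_states` is a hypothetical name, and a runtime state is represented as a dictionary of register values.

```python
import copy
from collections import Counter
from typing import Dict, List


def reload_faulty_states(states: List[Dict[str, int]]) -> List[int]:
    """Copy a healthy processor's runtime state over any processor that disagrees with the majority.

    Returns the indices of the processors that were reloaded.
    """
    keys = [tuple(sorted(s.items())) for s in states]  # hashable snapshot of each state
    majority_key, votes = Counter(keys).most_common(1)[0]
    if votes <= len(states) // 2:
        raise RuntimeError("no majority state to reload from")
    donor = keys.index(majority_key)  # the healthy "second processor"
    reloaded = []
    for i, key in enumerate(keys):
        if key != majority_key:
            states[i] = copy.deepcopy(states[donor])
            reloaded.append(i)
    return reloaded
```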
Database reversion with backup data structures
A system for database reversion is described. The system comprises: a database engine configured to host an active database; a log engine configured to generate transaction logs for transactions affecting the active database; a backup engine configured to create a backup data structure to allow for database reversion; and a memory buffer separate from the active database. A page in the active database has an associated page timestamp indicating the most recent update of the page in the active database. The database engine is configured to flush an updated copy of a page in the memory buffer to the active database. Prior to the flush, the backup engine is configured to store an image of the page in the active database to the backup data structure when that page is older than a time value related to the creation time of the backup data structure.
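A minimal Python sketch of the copy-before-flush rule, assuming integer timestamps and that flushed pages carry timestamps no older than the backup's creation time. `Page`, `Backup`, and `flush` are hypothetical names; transaction logs are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Page:
    data: bytes
    timestamp: int  # time of the most recent update in the active database


@dataclass
class Backup:
    created_at: int  # stands in for "a time value related to the creation time"
    images: Dict[int, Page] = field(default_factory=dict)


def flush(active: Dict[int, Page], page_id: int, buffered: Page, backup: Backup) -> None:
    """Flush a buffered page, first preserving the old image if it predates the backup."""
    current = active.get(page_id)
    if current is not None and current.timestamp < backup.created_at:
        # The page has not been updated since the backup was created,
        # so its current image is the one a reversion needs.
        backup.images[page_id] = Page(current.data, current.timestamp)
    active[page_id] = buffered
```

After the first flush the page's timestamp is newer than the backup, so later flushes of the same page skip the copy and the saved image stays the pre-backup version. Reverting would write the saved images back; pages created after the backup are not covered by this sketch.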
Consensus loss in distributed control systems
A device may correspond to a physical access controller in a distributed physical access control system. A method, performed by the device in a distributed system, may include detecting that another device in the distributed system has become unavailable; determining that a loss of consensus has occurred in the distributed system based on detecting that the other device has become unavailable; generating a list of available devices in the distributed system; and sending an alarm message to an administrative device, wherein the alarm message indicates the loss of consensus and wherein the alarm message includes the list of available devices.
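A hedged Python sketch of the alarm path. The abstract does not define when consensus is lost; this sketch assumes it is lost once the available devices no longer form a strict majority of the configured members. `on_device_unavailable` and the alarm field names are hypothetical, and `send_alarm` stands in for delivery to the administrative device.

```python
from typing import Callable, Dict, Optional, Set


def on_device_unavailable(me: str, members: Set[str], unavailable: Set[str],
                          send_alarm: Callable[[Dict], None]) -> Optional[Dict]:
    """Called when `me` detects that devices in `unavailable` can no longer be reached."""
    available = sorted(members - unavailable)
    if len(available) > len(members) // 2:
        return None  # a strict majority is still available, so consensus is assumed intact
    alarm = {
        "event": "consensus_lost",
        "reported_by": me,
        "available_devices": available,  # the list of available devices carried in the alarm
    }
    send_alarm(alarm)  # e.g. deliver to the administrative device
    return alarm
```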
Detection of hardware errors using periodically synchronized redundant transactions and comparing results from cores of a multi-core processor
A method for detecting errors in hardware, including running a transaction on a plurality of cores, wherein each of the cores runs a respective copy of the transaction, periodically synchronizing the transaction on the cores throughout execution of the transaction, comparing results of the transaction on the cores, and determining an error in one or more of the cores.
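A minimal Python sketch of periodically synchronized redundant execution, with threads standing in for cores (under CPython's GIL they do not run truly in parallel, which does not affect the comparison logic). A `threading.Barrier` provides the synchronization points and its action compares the copies' states. `run_redundant`, `sync_every`, and the bit-flip fault injection are hypothetical choices for illustration.

```python
import threading
from typing import Callable, List, Optional, Sequence, Tuple


def run_redundant(steps: Sequence[Callable[[int], int]], initial: int, n_cores: int = 2,
                  sync_every: int = 2,
                  fault_at: Optional[Tuple[int, int]] = None) -> List[Tuple[int, Tuple[int, ...]]]:
    """Run the same transaction on n_cores threads and compare their states at each sync point.

    fault_at=(core, step) flips the low bit of that core's state after that step,
    modeling a transient hardware error. Returns (checkpoint, states) for each mismatch.
    """
    states = [initial] * n_cores
    mismatches: List[Tuple[int, Tuple[int, ...]]] = []
    checkpoint = [0]

    def compare() -> None:
        # Barrier action: runs in one thread after all have arrived, before any is released.
        checkpoint[0] += 1
        if len(set(states)) > 1:
            mismatches.append((checkpoint[0], tuple(states)))

    barrier = threading.Barrier(n_cores, action=compare)

    def core(idx: int) -> None:
        for i, step in enumerate(steps, start=1):
            states[idx] = step(states[idx])
            if fault_at == (idx, i):
                states[idx] ^= 1
            if i % sync_every == 0 or i == len(steps):
                barrier.wait()  # periodic synchronization point

    threads = [threading.Thread(target=core, args=(k,)) for k in range(n_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches
```

With two copies a mismatch shows that an error occurred but not which core caused it; a shorter `sync_every` catches errors closer to where they happen at the cost of more synchronization.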
METHOD FOR REDUNDANT PROCESSING OF DATA
A method for redundant processing of data by at least two processing units is described. After a restart or reset, the first processing unit of the at least two processing units receives first portions of the data for processing from at least one second processing unit of the at least two processing units.