G06F11/18

Error correction in a redundant processing system
11354203 · 2022-06-07 · ·

A processing system comprises several processing devices and a comparison device. A method for controlling the processing system comprises: processing identical information items on the processing devices by means of associated processing processes; furnishing a characteristic value for each processing process as a function of the processing that has occurred; comparing the characteristic values by way of the comparison device; and determining a defectively operating processing process on the basis of the comparison. The defectively operating processing process is replaced by a processing process restarted on the same processing device.
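The comparison step can be illustrated with a minimal sketch. The abstract does not say how the characteristic value is computed or how the defective process is singled out; the SHA-256 digest, the strict-majority rule, and the names `characteristic_value` and `find_defective` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def characteristic_value(output: bytes) -> str:
    """Hypothetical characteristic value: a SHA-256 digest of the process output."""
    return hashlib.sha256(output).hexdigest()


def find_defective(values: dict[str, str]) -> list[str]:
    """Return the ids of processes whose value differs from the strict-majority value.

    Assumption: a process is treated as defective when it disagrees with a
    value reported by more than half of the processes.
    """
    majority_value, count = Counter(values.values()).most_common(1)[0]
    if count <= len(values) // 2:
        raise ValueError("no strict majority; cannot identify the defective process")
    return [pid for pid, value in values.items() if value != majority_value]


# Three processes handle the same input; proc_c produces a deviating result.
outputs = {"proc_a": b"42", "proc_b": b"42", "proc_c": b"41"}
values = {pid: characteristic_value(out) for pid, out in outputs.items()}
defective = find_defective(values)
```

In this sketch `defective` names the process that would then be restarted on its own processing device, as the abstract describes.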

Method, system and device to test a plurality of devices by comparing test results of test chains of the plurality of devices
11353508 · 2022-06-07 · ·

A method tests a plurality of devices, each device including a test chain having a plurality of positions storing test data. The testing includes comparing test data in a last position of the test chain of each of the devices. The test data in the test chains of the devices is shifted forward by one position. The shifting includes writing test data in the last position of a test chain to a first position in the test chain. The comparing and the shifting are repeated until the test data that was in the last position of each test chain when the testing started is shifted back into the last position of the respective test chain. The plurality of devices may have a same structure and a same functionality.
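The compare-and-shift loop can be sketched in software as follows. This is only an illustration under assumptions: the test chains are modeled as Python deques of equal length, and the function name `compare_test_chains` is hypothetical. A real implementation would operate on hardware scan chains.

```python
from collections import deque


def compare_test_chains(chains: list[list[int]]) -> list[int]:
    """Return the shift steps at which the last positions of the chains disagreed."""
    if not chains:
        return []
    rings = [deque(chain) for chain in chains]
    length = len(rings[0])
    if any(len(ring) != length for ring in rings):
        raise ValueError("all test chains must have the same length")

    mismatches = []
    for step in range(length):
        # Compare the test data in the last position of every chain.
        if len({ring[-1] for ring in rings}) > 1:
            mismatches.append(step)
        # Shift forward by one: the last entry is written back to the first position.
        for ring in rings:
            ring.rotate(1)
    # After `length` shifts every chain holds its original contents again.
    return mismatches


mismatch_steps = compare_test_chains([
    [1, 0, 1, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 1],  # differs from the other devices in its third position
])
```

Because the last entry is recirculated rather than discarded, the test is non-destructive: once the loop ends, each chain is back in its starting state.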

MEDIATOR ASSISTED SWITCHOVER BETWEEN CLUSTERS

Techniques are provided for metadata management for enabling automated switchover. An initial quorum vote may be performed before a node executes an operation associated with metadata comprising operational information and switchover information. After the initial quorum vote is performed, the node executes the operation upon one or more mailbox storage devices. Once the operation has executed, a final quorum vote is performed. The final quorum vote and the initial quorum vote are compared to determine whether the operation is to be designated as successful or failed, and whether any additional actions are to be performed.

Apparatuses, methods, and systems for hardware-assisted lockstep of processor cores
11340960 · 2022-05-24 · ·

Systems, methods, and apparatuses relating to circuitry to implement lockstep of processor cores are described. In one embodiment, a hardware processor comprises a first processor core comprising a first control flow signature register and a first execution circuit, a second processor core comprising a second control flow signature register and a second execution circuit, and at least one signature circuit to perform a first state history compression operation on a first instruction that executes on the first execution circuit of the first processor core to produce a first result, store the first result in the first control flow signature register, perform a second state history compression operation on a second instruction that executes on the second execution circuit of the second processor core to produce a second result, and store the second result in the second control flow signature register.
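A rough software analogy of the control flow signature registers is shown below. The abstract does not specify the state history compression operation; folding each instruction into a running CRC-32 is an assumption, as are the `Core` class and the `run` helper. Comparing the two signatures afterwards is likewise an assumed use of the registers.

```python
import zlib


class Core:
    """Hypothetical core model with a control flow signature register."""

    def __init__(self) -> None:
        self.signature = 0

    def execute(self, instruction: bytes) -> None:
        # Assumed compression step: fold the instruction into the running CRC-32.
        self.signature = zlib.crc32(instruction, self.signature)


def run(program: list[bytes]) -> int:
    """Execute a program on a fresh core and return its final signature."""
    core = Core()
    for instruction in program:
        core.execute(instruction)
    return core.signature


program = [b"load r1", b"add r1, r2", b"store r1"]
faulty = [b"load r1", b"add r1, r3", b"store r1"]  # one instruction diverges

signatures_match = run(program) == run(program)
divergence_detected = run(program) != run(faulty)
```

The point of the sketch is that a compact signature summarizes the whole execution history, so two cores can be checked against each other by comparing a single register value instead of every intermediate state.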

Flexible byzantine fault tolerance

A method and system for performing a flexible Byzantine fault tolerant (BFT) protocol. The method includes sending, from a client device, a proposed value to a plurality of replica devices and receiving, from at least one of the plurality of replica devices, a safe vote on the proposed value. The replica device sends the safe vote, based on a first quorum being reached, to the client device and each of the other replica devices of the plurality of replica devices. The method further includes determining that a number of received safe votes for the proposed value meets or exceeds a second quorum threshold, selecting the proposed value based on the determination, and setting a period of time within which to receive additional votes. The method further includes, based on the period of time elapsing without receiving the additional votes, committing the selected value for the single view.

Performing remote part reseat actions

A tool for performing remote part reseat actions. Responsive to receiving a request for a scheduled operation, the tool generates an operation table in a push file. Responsive to a determination that there is at least one redundant component for the scheduled operation, the tool identifies the at least one redundant component. The tool determines one or more tolerable errors for the at least one redundant component. The tool appends the at least one redundant component and the one or more tolerable errors to the operation table in the push file. The tool schedules the push file to prescribe one or more recovery operations for the scheduled operation.

Node Failure Detection and Resolution in Distributed Databases
20220147426 · 2022-05-12 · ·

Methods and systems to detect and resolve failure in a distributed database system are described herein. A first node in the distributed database system can detect an interruption in communication with at least one other node in the distributed database system. This indicates a network failure. In response to detection of this failure, the first node starts a failure resolution protocol. This invokes coordinated broadcasts of respective lists of suspicious nodes among neighbor nodes. Each node compares its own list of suspicious nodes with its neighbors' lists of suspicious nodes to determine which nodes are still directly connected to each other. Each node determines the largest group of these directly connected nodes and whether or not it is in that group. If a node isn't in that group, it fails itself to resolve the network failure.
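The group-selection step can be sketched as below, assuming every node has already collected the suspect lists of all other nodes. The brute-force search, the rule that two nodes count as connected only if neither suspects the other, and the lexicographic tie-break are assumptions for illustration; the abstract does not describe how the largest group is computed.

```python
from itertools import combinations


def largest_connected_group(nodes: set[str], suspects: dict[str, set[str]]) -> set[str]:
    """Return the largest set of nodes in which no member suspects another member."""

    def connected(a: str, b: str) -> bool:
        return b not in suspects.get(a, set()) and a not in suspects.get(b, set())

    # Try the biggest candidate groups first; sorted input makes ties deterministic.
    for size in range(len(nodes), 0, -1):
        for group in combinations(sorted(nodes), size):
            if all(connected(a, b) for a, b in combinations(group, 2)):
                return set(group)
    return set()


# A partition separates {A, B, C} from {D, E}.
nodes = {"A", "B", "C", "D", "E"}
suspects = {
    "A": {"D", "E"}, "B": {"D", "E"}, "C": {"D", "E"},
    "D": {"A", "B", "C"}, "E": {"A", "B", "C"},
}
survivors = largest_connected_group(nodes, suspects)
nodes_that_fail_themselves = nodes - survivors
```

The exhaustive search is exponential in the number of nodes, so it is only suitable for showing the idea on small clusters.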

Multicore system for determining processor state abnormality based on a comparison with a separate checker processor
11327853 · 2022-05-10 · ·

A multicore system according to one or more embodiments is disclosed, which may include processors that execute processing different from each other, a selector that selects one of the processors, a checker processor, a comparator that compares an external state of the processor selected by the selector with an external state of the checker processor, or compares an internal state of the processor selected by the selector with an internal state of the checker processor, and a controller that determines that the selected processor or the checker processor is abnormal in response to the external states or the internal states not matching each other based on comparison results obtained by the comparator.

System and method for an adaptive election in semi-distributed environments

Systems, methods, and computer-readable storage media for receiving, at a central server from a first remote data transmission device, first product data for a product at a first location and second product data for the product from a second remote data transmission device at a second location. The respective data is processed sequentially, then determined to contain identical data, such that the system selects a data transmission device as the leader. Then, at a second time, the system receives additional product data from only the selected data transmission device and not from the ignored transmission device, then processes the additional product data as though it had been received from both the first remote data transmission device and the second remote data transmission device.

CPU-GPU LOCKSTEP SYSTEM
20230251941 · 2023-08-10 · ·

A lockstep controller operates a lockstep system of three or more CPU-GPU pairs, compares the outputs from the CPU-GPU pairs and, by way of a majority vote, provides the output for the lockstep system. Based on comparing the outputs, if one of the CPU-GPU pairs provides outputs that disagree with the majority outputs, it can be switched out of the lockstep system. The removed CPU is replaced by a backup CPU. So that the backup CPU can be part of a CPU-GPU pair, a portion of the address space from the GPU of one of the other CPU-GPU pairs is assigned to the backup CPU to operate as a replacement CPU-GPU pair, while the CPU already associated with this GPU retains another portion of the GPU's address space to continue operating as a CPU-GPU pair.
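The vote-and-replace behavior of the controller might be modeled as in the following sketch. It covers only the majority vote and the swap of a dissenting pair for a backup; the GPU address-space sharing is not modeled. The functions `vote` and `swap_out` and the pair identifiers are hypothetical.

```python
from collections import Counter


def vote(outputs: dict[str, int]) -> tuple[int, list[str]]:
    """Return the majority output and the ids of the pairs that disagree with it."""
    winner, count = Counter(outputs.values()).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among the CPU-GPU pairs")
    return winner, [pid for pid, out in outputs.items() if out != winner]


def swap_out(active: list[str], dissenters: list[str], backups: list[str]) -> list[str]:
    """Replace each dissenting pair with the next available backup, if any."""
    spare = list(backups)
    result = []
    for pid in active:
        if pid not in dissenters:
            result.append(pid)
        elif spare:
            result.append(spare.pop(0))
    return result


outputs = {"pair0": 7, "pair1": 7, "pair2": 9}
system_output, dissenters = vote(outputs)
active_pairs = swap_out(list(outputs), dissenters, ["backup0"])
```

With three or more pairs, a single faulty pair is outvoted, so the lockstep system keeps producing output while the replacement is brought in.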