G06F11/2025

Artificial intelligence-based redundancy management framework

Methods, apparatus, and processor-readable storage media for artificial intelligence-based redundancy management are provided herein. An example computer-implemented method includes obtaining telemetry data from one or more client devices within at least one system; predicting one or more hardware component failures in at least a portion of the one or more client devices within the at least one system by processing at least a portion of the telemetry data using a first set of one or more artificial intelligence techniques; determining, using a second set of one or more artificial intelligence techniques, one or more redundant hardware components for implementation in connection with the one or more predicted hardware component failures; and performing at least one automated action based at least in part on the one or more redundant hardware components.

Locality based quorums

Disclosed are various embodiments for distributing data items within a plurality of nodes. A data item that is subject to a data item update request is updated from a master node to a plurality of slave nodes. The update of the data item is determined to be locality-based durable based at least in part on acknowledgements received from the slave nodes. Upon detection that the master node has failed, a new master candidate is determined via an election among the plurality of slave nodes.
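
The durability check described above can be illustrated with a minimal sketch. The abstract does not define the exact rule, so this example assumes (hypothetically) that an update is locality-based durable once a majority of the nodes in at least `min_localities` localities have acknowledged it. The function name, the `node_locality` mapping, and the threshold are illustrative assumptions, not part of the disclosure.

```python
from collections import defaultdict


def is_locality_durable(acks, node_locality, min_localities=2):
    """Assumed rule: durable when a majority of nodes in at least
    `min_localities` localities have acknowledged the update."""
    # Group all known nodes by locality (for example, by data center).
    nodes_per_locality = defaultdict(set)
    for node, locality in node_locality.items():
        nodes_per_locality[locality].add(node)

    # Group the acknowledging nodes by locality, ignoring unknown nodes.
    acked_per_locality = defaultdict(set)
    for node in acks:
        if node in node_locality:
            acked_per_locality[node_locality[node]].add(node)

    # Count the localities in which a strict majority acknowledged.
    satisfied = sum(
        1
        for locality, nodes in nodes_per_locality.items()
        if len(acked_per_locality[locality]) * 2 > len(nodes)
    )
    return satisfied >= min_localities


# Hypothetical six-node layout across two localities.
layout = {
    "a1": "east", "a2": "east", "a3": "east",
    "b1": "west", "b2": "west", "b3": "west",
}
partial = is_locality_durable({"a1", "a2", "b1"}, layout)
full = is_locality_durable({"a1", "a2", "b1", "b3"}, layout)
```

With this layout, acknowledgements from only one node in the second locality are not enough under the assumed rule; a second acknowledgement there makes the update durable.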

Method and system for generating latency aware workloads using resource devices in a resource device pool

A method for managing data includes obtaining, by a management module, a workload generation request, wherein the workload generation request specifies a plurality of resource devices, identifying available resource devices in a resource device pool based on the plurality of resource devices, performing a latency analysis on the available resource devices to obtain a plurality of resource device combinations and a total latency cost of each resource device combination, and selecting a resource device combination of the plurality of resource device combinations based on the total latency cost of each resource device combination, wherein the resource device combination comprises a second plurality of resource devices and wherein each of the second plurality of resource devices is one of the plurality of resource devices.
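
The selection step above can be sketched as follows. This is an illustration under stated assumptions rather than the claimed method: the request is assumed to list device kinds, a combination is assumed to contain one available device per requested kind, and the total latency cost is assumed to be the sum of pairwise latencies between the chosen devices. The names `select_combination`, `pool`, and `latency` are hypothetical.

```python
from itertools import combinations, product


def select_combination(requested_kinds, pool, latency):
    """Return (combination, total_cost) with the lowest total latency,
    or None if some requested kind has no available device."""
    # Identify the available devices in the pool for each requested kind.
    candidates = [
        [device for device, kind in pool.items() if kind == wanted]
        for wanted in requested_kinds
    ]
    best = None
    for combo in product(*candidates):
        # Skip combinations that would reuse the same device twice.
        if len(set(combo)) < len(combo):
            continue
        # Assumed cost model: sum of pairwise latencies within the combination.
        cost = sum(latency[frozenset(pair)] for pair in combinations(combo, 2))
        if best is None or cost < best[1]:
            best = (combo, cost)
    return best


# Hypothetical pool and pairwise latency table (arbitrary units).
pool = {"cpu0": "cpu", "cpu1": "cpu", "gpu0": "gpu", "ssd0": "storage"}
latency = {
    frozenset(("cpu0", "gpu0")): 5,
    frozenset(("cpu0", "ssd0")): 2,
    frozenset(("cpu1", "gpu0")): 1,
    frozenset(("cpu1", "ssd0")): 3,
    frozenset(("gpu0", "ssd0")): 4,
}
best = select_combination(["cpu", "gpu", "storage"], pool, latency)
```

Exhaustive enumeration is only practical for small pools; a larger pool would likely need pruning or a heuristic, which the abstract does not specify.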

MEDIATOR ASSISTED SWITCHOVER BETWEEN CLUSTERS

Techniques are provided for metadata management for enabling automated switchover. An initial quorum vote may be performed before a node executes an operation associated with metadata comprising operational information and switchover information. After the initial quorum vote is performed, the node executes the operation upon one or more mailbox storage devices. Once the operation has executed, a final quorum vote is performed. The final quorum vote and the initial quorum vote are compared to determine whether the operation is to be designated as successful or failed, and whether any additional actions are to be performed.

LOCKSTEP PROCESSOR RECOVERY FOR VEHICLE APPLICATIONS
20230092343 · 2023-03-23

A fault tolerant processing environment wherein multiple processors are configured as worker nodes and redundant nodes, with a failed worker node replaced programmatically by a manager node. Each of the processing nodes may include a processor and memory associated with the processor and communicate with other processing nodes using a network. A manager node creates a message passing interface (MPI) communication group having worker nodes and redundant nodes, instructs the worker nodes to perform lockstep processing of tasks for an application, and monitors execution of the tasks. If a node fails, the manager node creates a replacement worker node from one of the redundant processing nodes and creates a new communications group. It then instructs those nodes in the new communications group to resume processing based on the application state and checkpoint backup data.
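
The replacement step described above can be sketched without any MPI dependency. This is a simplified simulation under assumptions: node names are plain strings, the communication group is modeled as a list, and the checkpoint is a dictionary with a hypothetical `step` field. It does not reproduce the MPI group handling of the disclosure.

```python
def replace_failed_worker(workers, spares, failed, checkpoint):
    """Swap a failed worker for a redundant node and report where to resume.

    Returns (new_workers, remaining_spares, resume_step).
    """
    if failed not in workers:
        raise ValueError(f"{failed} is not a worker in the current group")
    if not spares:
        raise RuntimeError("no redundant node available")

    # Promote the first redundant node into the failed worker's slot.
    replacement = spares[0]
    new_workers = [replacement if node == failed else node for node in workers]
    remaining_spares = spares[1:]

    # The new group resumes from the last checkpointed step (assumed field).
    resume_step = checkpoint["step"]
    return new_workers, remaining_spares, resume_step


group, spares_left, step = replace_failed_worker(
    ["n0", "n1", "n2"], ["r0", "r1"], failed="n1", checkpoint={"step": 42}
)
```

Keeping the replacement in the failed node's position preserves the rank ordering of the group, which is a design choice of this sketch rather than something the abstract states.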

INFOTAINMENT DEVICE FOR VEHICLE AND METHOD FOR OPERATING SAME
20220342782 · 2022-10-27

Disclosed is a base unit including a video link hub electrically connected to a user interface device to transmit a signal, a first system-on-chip (SoC) configured to provide a first infotainment function, and a processor configured to determine whether the first SoC is operating abnormally. When a second SoC is powered on, the first SoC performs authentication with respect to the second SoC, and when the processor determines that the first SoC is operating normally, the first SoC generates a first execution signal for display of a composite infotainment function, obtained by combining the first infotainment function with a second infotainment function provided by the second SoC, on the user interface device, and transmits the first execution signal to the video link hub, and the processor controls the video link hub to transmit the first execution signal to the user interface device.

MANAGING FAILOVER REGION AVAILABILITY FOR IMPLEMENTING A FAILOVER SERVICE

The present disclosure generally relates to managing a failover service. The failover service can receive a list of regions and a list of rules that must be satisfied for a region to be considered available for failover. The failover service can then determine the regions that satisfy each rule of the list of rules and are available for failover. The failover service can then deliver this information to a client. The failover service can determine the regions that do not satisfy one or more of the rules from the list of rules and deliver this information to a client. The failover service can perform automatic remediation to the unavailable failover regions and client remediation to the unavailable failover regions.
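
The rule evaluation described above can be sketched as follows. The region names, attributes, and rule predicates are hypothetical; the abstract does not say how rules are expressed, so each rule is modeled here as a named function over a region's attributes.

```python
def classify_regions(regions, rules):
    """Split regions into those satisfying every rule and those failing any.

    Returns (available, unavailable), where `unavailable` maps each region
    name to the list of rule names it failed.
    """
    available = []
    unavailable = {}
    for name, attributes in regions.items():
        failed = [rule for rule, check in rules.items() if not check(attributes)]
        if failed:
            unavailable[name] = failed
        else:
            available.append(name)
    return available, unavailable


# Hypothetical regions and rules.
regions = {
    "region-a": {"capacity": 80, "healthy": True},
    "region-b": {"capacity": 20, "healthy": True},
    "region-c": {"capacity": 10, "healthy": False},
}
rules = {
    "min_capacity": lambda region: region["capacity"] >= 50,
    "healthy": lambda region: region["healthy"],
}
available, unavailable = classify_regions(regions, rules)
```

Recording which rules each region failed gives a client or an automated remediation step something concrete to act on.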

EVENT-DRIVEN SYSTEM FAILOVER AND FAILBACK
20230083450 · 2023-03-16

A system determines that a primary event processor, included in a primary data center, is associated with a failure. The primary event processor is configured to process first events stored in a main event store of the primary data center. The system identifies a secondary event processor, in a secondary data center, that is to process one or more first events based on the failure. The primary event processor and the secondary event processor are configured to process a same type of event. The system causes, based on a configuration associated with the primary or secondary event processor, the one or more first events to be retrieved from one of the main event store or a replica event store. The replica event store is included in the secondary data center and mirrors the main event store of the primary data center.
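
The configuration-driven choice of event store can be illustrated with a short sketch. The configuration key `failover_source`, its default value, and the event dictionaries are assumptions made for illustration; the abstract does not specify the configuration format.

```python
def events_for_secondary(config, main_store, replica_store, event_type):
    """Pick the store named by the configuration and return the events of
    the type that the secondary event processor handles."""
    # Hypothetical configuration key; defaults to the replica store.
    source = config.get("failover_source", "replica")
    if source not in ("main", "replica"):
        raise ValueError(f"unknown failover source: {source}")
    store = main_store if source == "main" else replica_store
    return [event for event in store if event["type"] == event_type]


# The replica mirrors the main store, so both hold the same events here.
main_store = [
    {"id": 1, "type": "payment"},
    {"id": 2, "type": "audit"},
    {"id": 3, "type": "payment"},
]
replica_store = [dict(event) for event in main_store]
pending = events_for_secondary(
    {"failover_source": "replica"}, main_store, replica_store, "payment"
)
```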

REMOTE TERMINAL UNIT PROCESSOR REDUNDANCY SYNCHRONIZATION
20230125853 · 2023-04-27

Redundancy synchronization of remote terminal unit (RTU) central processing units (CPUs) associated with an industrial operation includes queuing time-stamped events on a main RTU CPU for transfer to a standby RTU CPU as the time-stamped events are generated on the main RTU CPU (i.e., in real-time). The synchronized RTU CPUs further permit synchronization of logic states and synchronization of firmware upgrades. Synchronization activities occur on the same synchronization communications channel between redundant RTU CPUs.
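
The event queuing described above can be sketched as a simple in-memory model. The class names and the `sync_to` method are hypothetical, and the synchronization channel is reduced to a direct method call; a real RTU would transfer events over a dedicated link.

```python
from collections import deque


class StandbyCpu:
    """Standby RTU CPU that receives time-stamped events from the main CPU."""

    def __init__(self):
        self.events = []


class MainCpu:
    """Main RTU CPU that queues events as they are generated."""

    def __init__(self):
        self.pending = deque()

    def record(self, timestamp, name):
        # Queue the event for transfer at the moment it is generated.
        self.pending.append((timestamp, name))

    def sync_to(self, standby):
        # Drain the queue in generation order; return how many were sent.
        sent = 0
        while self.pending:
            standby.events.append(self.pending.popleft())
            sent += 1
        return sent


main = MainCpu()
standby = StandbyCpu()
main.record(1000.0, "valve_open")
main.record(1000.5, "pressure_high")
transferred = main.sync_to(standby)
```

A first-in, first-out queue keeps the standby's event history in the same order as the main CPU's, which matters if the standby has to take over mid-sequence.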

MANAGING APPLICATIONS IN A CLUSTER

Approaches for managing applications in a cluster are described. In an example, a first agent may be executing on a first programmable network adapter card installed within a first computing node within a cluster. The first agent may isolate an application executing on the first computing node. Thereafter, the application may be managed by a second computing node.