Patent classifications
G06F11/2023
Ceph Failure and Verification
A media failure server includes a processor and a non-transitory machine-readable medium including instructions. The instructions, when loaded and executed by the processor, cause the processor to aggregate software defined storage (SDS) performance data from a plurality of media servers, process the aggregated SDS performance data, and determine whether the aggregated SDS performance data indicates that a first media server includes a potentially failing storage medium.
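The aggregate-then-flag flow described above can be illustrated with a minimal sketch. The abstract does not say which metrics or thresholds are used, so the latency samples, the `factor` parameter, and the function names `aggregate` and `suspect_servers` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate(samples):
    """Group (server, latency_ms) samples and return the mean latency per server."""
    by_server = defaultdict(list)
    for server, latency_ms in samples:
        by_server[server].append(latency_ms)
    return {server: mean(values) for server, values in by_server.items()}


def suspect_servers(samples, factor=3.0):
    """Return servers whose mean latency exceeds `factor` times the fleet median.

    The threshold rule is an assumption; the source only says the aggregated
    data is processed to find a potentially failing storage medium.
    """
    per_server = aggregate(samples)
    if not per_server:
        return []
    baseline = median(per_server.values())
    return sorted(s for s, v in per_server.items() if v > factor * baseline)


samples = [("a", 5), ("a", 7), ("b", 6), ("c", 40), ("c", 60)]
print(suspect_servers(samples))
```

Comparing each server against a fleet-wide median, rather than a fixed limit, keeps the check meaningful when the whole cluster is under load.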
Linear view-change BFT
Techniques for implementing linear view-change in a Byzantine Fault Tolerant (BFT) protocol running on a distributed system comprising n replicas are provided. According to one set of embodiments, at a time of performing a view-change from a current view number v to a new view number v+1, a replica in the n replicas corresponding to a new proposer for new view number v+1 can generate a PREPARE message comprising a single COMMIT certificate, where the single COMMIT certificate is the highest COMMIT certificate the new proposer is aware of. The new proposer can then transmit the PREPARE message with the single COMMIT certificate to all other replicas in the n replicas.
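The view-change step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define how certificates are ordered, so "highest" is taken here to mean the largest view number, and the types `CommitCertificate` and `PrepareMessage` and the functions `build_prepare` and `broadcast` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CommitCertificate:
    view: int    # view in which the certificate was formed (assumed ordering key)
    value: str   # value the certificate commits to


@dataclass(frozen=True)
class PrepareMessage:
    new_view: int
    certificate: Optional[CommitCertificate]


def build_prepare(current_view: int, known: Iterable[CommitCertificate]) -> PrepareMessage:
    """New proposer for view v+1 attaches only the highest certificate it knows."""
    highest = max(known, key=lambda c: c.view, default=None)
    return PrepareMessage(new_view=current_view + 1, certificate=highest)


def broadcast(msg: PrepareMessage, replica_ids: Iterable[int], proposer_id: int) -> dict:
    """Return the per-replica outbox: one message for every replica except the proposer."""
    return {rid: msg for rid in replica_ids if rid != proposer_id}


certs = [CommitCertificate(1, "a"), CommitCertificate(3, "c"), CommitCertificate(2, "b")]
prepare = build_prepare(current_view=3, known=certs)
outbox = broadcast(prepare, replica_ids=range(4), proposer_id=1)
print(prepare.new_view, prepare.certificate.value, sorted(outbox))
```

Because each outgoing message carries a single certificate, the proposer sends n-1 messages of constant size, which is where the "linear" in the title comes from.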
Managing data center failure events
Managing data center recovery from failure events can include a failure event platform having aspects provided via a user interface that integrates multiple failure and recovery management and execution features. The features can include, among others, application drift monitoring between production and recovery environments, real-time health checks of system components, user-modifiable scripting for prioritizing and customizing data center recovery actions, and a recovery execution tool.
Systems and methods for online brand continuity
The present disclosure provides a system and method for online brand continuity. Online brand continuity can include a number of Internet or intranet access points via which one or more network addresses can be advertised. A client can be provided with availability of a business image application via at least one of the Internet or intranet access points.
Digital twin of IT infrastructure
A digital twin of an IT infrastructure is created to identify a group of critical servers (called “base servers”) needed to replicate the IT infrastructure in a cloud-computing environment. To identify the correct base servers and their actual server configurations, the IT infrastructure is crawled and various telemetry, connection, and network data is analyzed against data sets of other known servers. The digital twin is created to include these base servers and their particular configurations. Then, the digital twin may be deployed on demand in the cloud-computing environment using executable scripts that mimic the base servers and their particular configurations, creating a replication of the IT infrastructure for various purposes (e.g., redundancy, testing, etc.).
Automated media agent state management
Described herein are techniques for automating media agent state management. For example, if a media agent is running poorly, then the media agent can be disabled and an alternate media agent can perform secondary copy job operations in place of the poorly running media agent. To determine whether a media agent is running poorly, a storage manager can determine whether the media agent has an anomalous number of failed jobs, pending jobs, and/or long running jobs and/or can determine whether the amount of resources used by the media agent is high or is increasing constantly, at a constant rate, or at a near constant rate.
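The two checks mentioned above (an anomalous job count, and resource usage rising at a near constant rate) can be sketched as below. The abstract gives no formulas, so the mean-plus-k-standard-deviations rule, the `tolerance` parameter, and the function names are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def is_steadily_increasing(samples, tolerance=0.25):
    """True if usage rises at every step and each step is within
    `tolerance` (as a fraction) of the average step size."""
    if len(samples) < 3:
        return False
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    if any(d <= 0 for d in deltas):
        return False
    avg = mean(deltas)
    return all(abs(d - avg) <= tolerance * avg for d in deltas)


def has_anomalous_count(count, peer_counts, k=3.0):
    """True if `count` exceeds the peer mean by more than k standard deviations."""
    if not peer_counts:
        return False
    return count > mean(peer_counts) + k * pstdev(peer_counts)


def should_disable(failed_jobs, peer_failed_jobs, memory_samples):
    """Hypothetical policy: disable the media agent if either check fires."""
    return (has_anomalous_count(failed_jobs, peer_failed_jobs)
            or is_steadily_increasing(memory_samples))


print(should_disable(2, [1, 2, 3, 2], [10, 20, 30, 41]))
```

A storage manager following this policy would then route secondary copy jobs to an alternate media agent, as the abstract describes.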
Connection reestablishment protocol for peer communication in distributed systems
Communication resumption information can be retained at nodes of a cluster of nodes that form a distributed computing system. The communication resumption information can be exchanged between a node of the cluster and a peer node of the cluster after resumption of communication following a loss of communication between the node and the peer node. A determination of whether communication between the node and the peer node can be reestablished without losing messages can include comparing the communication resumption information received by the node from the peer node with the communication resumption information retained at the node. Communication between the node and the peer node can be resumed when the determination indicates that communication between the node and the peer node can be reestablished without losing messages.
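One way to picture the comparison step is with per-peer sequence numbers. The abstract does not say what the communication resumption information contains, so the `ResumptionInfo` fields and the replay rule below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResumptionInfo:
    last_received: int    # highest sequence number received from the other side
    oldest_retained: int  # lowest sequence number of own sent messages still buffered


def can_resume_without_loss(local: ResumptionInfo, remote: ResumptionInfo) -> bool:
    """Compare locally retained info with the info received from the peer.

    Resumption is lossless only if each side still buffers every message
    the other side has not yet received (assumed rule).
    """
    local_can_replay = remote.last_received + 1 >= local.oldest_retained
    remote_can_replay = local.last_received + 1 >= remote.oldest_retained
    return local_can_replay and remote_can_replay


local = ResumptionInfo(last_received=5, oldest_retained=8)
peer_ok = ResumptionInfo(last_received=9, oldest_retained=4)
peer_gap = ResumptionInfo(last_received=5, oldest_retained=4)
print(can_resume_without_loss(local, peer_ok), can_resume_without_loss(local, peer_gap))
```

In the second case the peer still needs messages 6 and 7, which the local node has already discarded, so the connection cannot simply be resumed.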
Methods and systems of managing deletes in a database node of a NoSQL database
In one aspect, a computerized method for managing tombstones in a node of a Distributed Database System (DDBS) includes the step of providing a rule that, for a namespace in a record of the node of the DDBS that allows expiration, mandates that a later generation's expiration time of the namespace never decreases. The computerized method includes the step of determining that an administrator of the DDBS has set an expiration time of the namespace to infinity. The computerized method includes the step of implementing a background process of the DDBS, wherein the background process scans a DDBS node's drive and flags a set of extant tombstones that are no longer covering viable namespaces or viable records. The computerized method includes the step of deleting all the flagged tombstones.
Lifecycle management of virtual infrastructure management server appliance
A method of upgrading a VIM server appliance includes: creating a snapshot of logical volumes mapped to physical volumes that store configuration and database files of virtual infrastructure management (VIM) services provided by a first VIM server appliance to be upgraded; after the snapshot is created, expanding the configuration and database files to be compatible with a second VIM server appliance; replicating the logical volumes which have been modified as a result of expanding the configuration and database files, in the second VIM server appliance; after replication, performing a switchover of VIM services that are provided, from the first VIM server appliance to the second VIM server appliance; and upon failure of any of the steps of expanding, replicating, and performing the switchover, aborting the upgrade, and reverting to a version of the configuration and database files that was preserved by creating the snapshot.
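The snapshot-then-revert control flow in this method can be sketched generically. The sketch below models the configuration and database files as a plain dictionary and the expand, replicate, and switchover steps as callables; these stand-ins, and the name `upgrade_with_rollback`, are hypothetical and do not represent any real VIM appliance interface.

```python
import copy


def upgrade_with_rollback(state, steps):
    """Run the upgrade steps against a working copy of `state`.

    A snapshot is taken first. If any step raises, the upgrade is aborted
    and the snapshot is returned unchanged. Returns (resulting_state, succeeded).
    """
    snapshot = copy.deepcopy(state)   # preserved pre-upgrade version
    working = copy.deepcopy(state)
    try:
        for step in steps:
            step(working)
    except Exception:
        return snapshot, False        # abort and revert
    return working, True


def expand(cfg):
    cfg["schema"] = 2                 # make the files compatible with the new appliance


def replicate(cfg):
    cfg["replicated"] = True


def failing_switchover(cfg):
    raise RuntimeError("switchover failed")


original = {"schema": 1}
print(upgrade_with_rollback(original, [expand, replicate]))
print(upgrade_with_rollback(original, [expand, replicate, failing_switchover]))
```

Working on a copy means a failure at any step leaves the preserved version untouched, which mirrors the abort-and-revert behavior the abstract describes.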
Selecting a witness service when implementing a recovery plan
Methods, systems, and computer program products are provided for selecting a witness during virtualization system recovery after a disaster event. A recovery plan is configured to identify a witness that is then used to elect a leader to implement the recovery. Various system, network, and/or component failures, and/or loss of function of components of the virtualization system, can trigger initiation of the recovery plan. Based on the particular recovery plan that is invoked upon a determination of a network outage, component failure, or loss of function of a component of the virtualization system, a particular witness corresponding to a subset of entities of the particular recovery plan is selected. The witness is used to elect the leader, and the leader initiates actions of the recovery plan. The implementation of the recovery plan includes consideration of the health of components that would potentially be involved in the recovery actions.