Patent classifications
G06F11/2041
SERVER SYSTEM AND METHOD OF MANAGING SERVER SYSTEM
A server system including a first server to execute first role, other server to execute at other role, spare server and management layer server. The management layer server is configured to allocate first group of users to access first server and other group of users to access other server, receive status information sent by first server and status information sent by other server, analyse status information to determine an operational status of first server and operational status of other server, update role of spare server to first role when operational status of first server indicates failed state and reallocate first group of users to the spare server, and update a role of another spare server to the other role when the operational status of the other server indicates a failed state and reallocate the other group of users to the other spare server.
Proactive cluster compute node migration at next checkpoint of cluster upon predicted node failure
While scheduled checkpoints are being taken of a cluster of active compute nodes distributively executing an application in parallel, a likelihood of failure of the active compute nodes is periodically and independently predicted. Responsive to the likelihood of failure of a given active compute node exceeding a threshold, the given active compute node is proactively migrated to a spare compute node of the cluster at a next scheduled checkpoint. Another spare compute node of the cluster can perform prediction and migration. Prediction can be based on both hardware events and software events regarding the active compute nodes.
Network virtualization policy management system
Concepts and technologies are disclosed herein for providing a network virtualization policy management system. An event relating to a service can be detected. A first policy that defines allocation of hardware resources to host the virtual network functions can be obtained, as can a second policy that defines deployment of the virtual network functions to the hardware resources. The hardware resources can be allocated based upon the first policy and the virtual network functions can be deployed to the hardware resources based upon the second policy.
Systems and methods for enabling a highly available managed failover service
a computing system that receives and stores configuration information for the application in a data store. The configuration information comprises (1) identifiers for a plurality of cells of the application that include at least a primary cell and a secondary cell, (2) a defined state for each of the plurality of cells, (3) one or more dependencies for the application, and (4) a failover workflow defining actions to take in a failover event. The computing system receives an indication, from a customer, of a change in state of the primary cell or a request to initiate the failover event. The computing system updates, in the data store, the states for corresponding cells of the plurality of cells based on the failover workflow and updates, in the data store, the one or more dependencies for the application based on the failover workflow.
Information processing system and information processing method
There are provided an information processing system that operates virtual machines and storage controllers on a processor, and an information processing method executed by the information processing system. A storage controller group capable of taking over processing between the storage controllers arranged in different nodes is provided. The virtual machine is movable between the different nodes by deploy. The virtual machine and the storage controller that processes data input and output by the virtual machine are arranged in the same node. A combination of the virtual machines that cannot be arranged in the same node is defined by a restriction. A management unit arranges one of the virtual machines that cannot be arranged in the same node in the node in which the storage controller included in the storage controller group to which the storage controller used by the other virtual machine belongs is not arranged.
Arbitration method and related apparatus
This application discloses an arbitration method and a related apparatus. The method includes: detecting, by the first ABS apparatus, first status information of a service module in the first DC; when determining that a communication link between the first DC and the second DC is faulty, obtaining, by the first ABS apparatus, second status information, wherein the second status information is status information that is of a service module in the second DC and that is detected by the second ABS apparatus; and arbitrating, by the first ABS apparatus, a subsequent service providing capability of the first DC based on the first status information and the second status information.
System and method for automatically scaling a cluster based on metrics being monitored
In accordance with an embodiment, described herein is a system and method for use in a distributed computing environment, for automatically scaling a cluster based on metrics being monitored. A cluster that comprises a plurality of nodes or brokers and supports one or more colocated partitions across the nodes, can be associated with an exporter process and alert manager that monitors metrics associated with the cluster. Various metrics can be associated with user-configured alerts that trigger or otherwise indicate the cluster should be scaled. When a particular alert is raised, a callback handler associated with the cluster, for example an operator, can automatically bring up one or more new nodes, that are added to the cluster, and then reassign a selection of existing colocated partitions to the new nodes/brokers, such that computational load can be distributed within the newly-scaled cluster environment.
Locality based quorums
Disclosed are various embodiments for distributing data items within a plurality of nodes. A data item that is subject to a data item update request is updated from a master node to a plurality of slave notes. The update of the data item is determined to be locality-based durable based at least in part on acknowledgements received from the slave nodes. Upon detection that the master node has failed, a new master candidate is determined via an election among the plurality of slave nodes.
Encryption for a distributed filesystem
A computing device comprising a frontend and a backend is operably coupled to a plurality of storage devices. The backend comprises a plurality of buckets. Each bucket is operable to build a failure-protected stipe that spans two or more of the plurality of the storage devices. The frontend is operable to encrypt data as it enters the plurality of storage devices and decrypt data as it leaves the plurality of storage devices.
Targeted repair of hardware components in a computing device
A method for targeted repair of a hardware component in a computing device that is part of a cloud computing system includes monitoring a plurality of hardware components in the computing device. At some point, a defective sub-component within the hardware component of the computing device is identified. In addition to the defective sub-component, the hardware component also includes at least one sub-component that is functioning properly and a spare component that can be used in place of the defective sub-component. The method also includes initiating a targeted repair action while the computing device is connected to the cloud computing system. The targeted repair action prevents the defective sub-component from being used by the computing device without preventing sub-components that are functioning properly from being used by the computing device. The targeted repair action causes the spare component to be used in place of the defective sub-component.