Patent classifications
G06F11/1479
Failover management for batch jobs
Computer-implemented methods, computer program products, and computer systems are provided. A method includes generating a running result matrix for a plurality of batch jobs, indicating corresponding running results for respective processing actions in batch jobs of the plurality of batch jobs. The method further includes obtaining an internal dependency matrix for the plurality of batch jobs, indicating corresponding dependencies between respective processing actions within a batch job of the plurality of batch jobs. The method further includes calculating a recovery matrix for the plurality of batch jobs based, at least in part, on the running result matrix and the internal dependency matrix, the recovery matrix indicating corresponding recovery actions for respective processing actions in batch jobs of the plurality of batch jobs. The method further includes executing failover management for one or more batch jobs based, at least in part, on the calculated recovery matrix.
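As a rough illustration of how a recovery matrix could be derived from the two inputs, here is a minimal Python sketch. The abstract does not state the actual calculation, so the status codes (`ok`, `failed`), the recovery actions (`rerun`, `skip`), and the propagation rule (an action is rerun if it failed or if it depends on an action that is rerun) are all assumptions made for this example.

```python
from typing import List

OK, FAILED = "ok", "failed"  # hypothetical status codes, not from the abstract


def recovery_row(results: List[str], deps: List[List[int]]) -> List[str]:
    """Compute recovery actions for the processing actions of one batch job.

    results[i] is the running result of action i.
    deps[i][j] == 1 means action i depends on action j. This sketch assumes
    dependencies only point backwards (j < i), so one forward pass suffices.
    """
    rerun = [r == FAILED for r in results]
    for i in range(len(results)):
        # Assumed rule: an action must be rerun if anything it depends on is rerun.
        if any(deps[i][j] and rerun[j] for j in range(i)):
            rerun[i] = True
    return ["rerun" if flag else "skip" for flag in rerun]


def recovery_matrix(running_results: List[List[str]],
                    dependency_matrices: List[List[List[int]]]) -> List[List[str]]:
    """One row of recovery actions per batch job."""
    return [recovery_row(r, d) for r, d in zip(running_results, dependency_matrices)]


# Two jobs with three actions each; action 1 depends on 0, action 2 on 1.
chain = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
matrix = recovery_matrix([[OK, FAILED, OK], [OK, OK, OK]], [chain, chain])
```

In this example the first job reruns the failed action and the action downstream of it, while the second job needs no recovery.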
Live migration of clusters in containerized environments
The technology provides for live migration from a first cluster to a second cluster. For instance, when requests to one or more cluster control planes are received, a predetermined fraction of the received requests may be allocated to a control plane of the second cluster, while a remaining fraction of the received requests may be allocated to a control plane of the first cluster. The predetermined fraction of requests are handled using the control plane of the second cluster. While handling the predetermined fraction of requests, it is detected whether there are failures in the second cluster. Based on not detecting failures in the second cluster, the predetermined fraction of requests allocated to the control plane of the second cluster may be increased in predetermined stages until all requests are allocated to the control plane of the second cluster.
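The staged shift of requests described above can be sketched in Python as follows. The function names, the stage values, and the health-check callback are hypothetical. The abstract also does not say what happens when a failure is detected, so holding at the last healthy fraction is an assumption of this sketch.

```python
import random
from typing import Callable, Sequence


def route(fraction_to_second: float, rng: random.Random) -> str:
    """Allocate one request: 'second' with the given probability, else 'first'."""
    return "second" if rng.random() < fraction_to_second else "first"


def staged_migration(stages: Sequence[float],
                     is_healthy: Callable[[float], bool]) -> float:
    """Step through increasing fractions while the second cluster stays healthy.

    is_healthy(fraction) is a hypothetical check that reports whether failures
    were detected in the second cluster while it handled that fraction.
    Returns the fraction of requests finally allocated to the second cluster.
    """
    last_good = 0.0
    for fraction in stages:
        if not is_healthy(fraction):
            # Assumption: stop advancing and keep the last healthy allocation.
            return last_good
        last_good = fraction
    return last_good


# Example stages: 10%, then 50%, then all requests.
final = staged_migration([0.1, 0.5, 1.0], lambda f: True)
```

A probabilistic router is only one way to realize a "predetermined fraction"; a deterministic split, for example by hashing a request identifier, would serve equally well.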
Intelligently adaptive log level management of a service mesh
Systems, methods and/or computer program products for dynamically managing log levels of microservices in a service mesh based on predicted error rates of calls made to the service mesh. A first AI module predicts health, status and/or failures of microservices individually or as part of microservice chains with a particular confidence level. Using health status mapped to the microservices and historical information inputted into a knowledge base (including error rates), the first AI module predicts error rates of API calls for each user profile or generally by the service mesh. A second AI module analyzes the predictions provided by the first AI module and determines whether the predictions meet threshold levels of confidence. To improve the confidence of predictions that are below threshold levels, the second AI module dynamically adjusts application logs of the microservices and/or proxies thereof to an appropriate level to capture more detailed information within the logs.
Preemptible-based scaffold hopping
In a method of molecular scaffold hopping, an interface of a scheduler computer sends instructions, prepared by the scheduler computer, to a job runner computer to perform a plurality of separate computational tasks. Each of the separate computational tasks includes calculating one or more chemical properties for a query molecule or molecules in a library of molecules. One or more of the plurality of separate computational tasks performed on the job runner computer are preemptible computing instances. Status indicators sent from the job runner computer are received by the interface for each of the plurality of separate computational tasks. Each indicator is one of: incomplete, completed, or failed computing instance. The interface resends the instructions to the job runner computer that correspond to the separate computational tasks having the failed computing instance indicator to increase fault-tolerance against the separate computational tasks not attaining the completed computing instance indicator.
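The resend behavior can be sketched in Python as below. The `submit` callback is a hypothetical stand-in for sending instructions to the job runner and receiving a status indicator back, and the `max_rounds` cap is an assumption added so the loop terminates. Neither appears in the abstract.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class Status(Enum):
    INCOMPLETE = "incomplete"
    COMPLETED = "completed"
    FAILED = "failed"


def run_with_resend(task_ids: Iterable[str],
                    submit: Callable[[str], Status],
                    max_rounds: int = 3) -> Dict[str, Status]:
    """Send every task once, then resend only those reported as failed."""
    statuses = {task_id: Status.INCOMPLETE for task_id in task_ids}
    pending = list(statuses)
    for _ in range(max_rounds):
        if not pending:
            break
        for task_id in pending:
            statuses[task_id] = submit(task_id)
        # Only tasks with the failed indicator are queued for another send.
        pending = [t for t in pending if statuses[t] is Status.FAILED]
    return statuses


# Usage: a fake job runner that preempts task "b" on its first attempt.
attempts: Dict[str, int] = {}

def flaky_runner(task_id: str) -> Status:
    attempts[task_id] = attempts.get(task_id, 0) + 1
    if task_id == "b" and attempts[task_id] == 1:
        return Status.FAILED
    return Status.COMPLETED

result = run_with_resend(["a", "b"], flaky_runner)
```

Because each task is a separate computation, a preempted instance costs only the rerun of that one task, which is what makes preemptible instances usable here.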
Error recovery in digital communications
Electronic communications between a client device and a server device are improved by providing a middleware component that incorporates electronic data read and/or written to a database in a hybrid data structure. The hybrid structure is further designed to allow for “NULL” or other pre-defined data values when one or more data fields are unavailable or erroneous. The client device, in turn, can be configured to check for the pre-defined data values in certain fields and to gracefully process such values. The hybrid structure with pre-defined error values therefore provides for very efficient data transmittal and processing, while retaining the ability to handle errors or other unusual situations relating to the data.
High Availability and Software Upgrades in Network Software
Ensuring the high availability of a Passive Optical Network (PON). A broadband network architecture comprises (a) at least a portion of optical fiber in a communication path to individual subscriber premises, (b) one or more software-implemented Optical Line Terminal (OLT) Controllers, (c) one or more software-implemented Service Provisioning Applications (SPAs), and (d) one or more software-implemented Broadband Network Gateways (BNGs). Each of the one or more OLT Controllers, one or more SPAs, and one or more BNGs execute on Commercial Off-the-Shelf (COTS) computer systems and entirely upon a plurality of protection groups. Each of the plurality of protection groups consists of a plurality of pods. The pods in a particular protection group which are active are dynamically adjusted to ensure the high availability of the broadband network architecture.
Live migrating virtual machines to a target host upon fatal memory errors
The disclosed technology provides techniques, systems, and apparatus for containing and recovering from uncorrectable memory errors in a distributed computing environment through migration of virtual machines and associated memory to a target host machine. An aspect of the disclosed technology includes a hypervisor or virtual machine manager that receives signaling of an uncorrectable memory error detected by a host machine. The virtual machine manager then uses information received via the signaling to identify virtual memory addresses or memory pages associated with the corrupted memory element so as to allow for containment and recovery from the error, and for live migration of the virtual machine.
APPLICATION LAUNCH SUPPORT
A method of software launch regression testing comprises monitoring an operational parameter of an existing application running on a plurality of client devices and determining a probability interval from the operational parameter of the existing application. A candidate update version of the application is then launched to a subset of the plurality of client devices. The method then proceeds with monitoring a corresponding operational parameter of the candidate update version running on the subset of client devices, determining if the corresponding operational parameter of the candidate update version falls within the probability interval, and, based on the corresponding operational parameter falling within the probability interval, providing a testing pass notification.
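A minimal Python sketch of such a pass check follows. The abstract does not say how the probability interval is determined, so the choice of mean plus or minus `z` sample standard deviations, the default `z` of 1.96, and the comparison against the candidate's mean are assumptions of this example.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def probability_interval(baseline: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Interval around the baseline mean (assumed form: mean +/- z * stdev)."""
    center = mean(baseline)
    spread = z * stdev(baseline)
    return center - spread, center + spread


def launch_passes(baseline: Sequence[float], candidate: Sequence[float],
                  z: float = 1.96) -> bool:
    """True if the candidate's mean parameter falls within the baseline interval."""
    low, high = probability_interval(baseline, z)
    return low <= mean(candidate) <= high


# Hypothetical operational parameter: startup latency in milliseconds.
existing = [100.0, 102.0, 98.0, 101.0, 99.0]
ok = launch_passes(existing, [101.0, 100.0, 102.0])
regressed = launch_passes(existing, [120.0, 118.0, 121.0])
```

Here `ok` would trigger the testing pass notification, while `regressed` would not.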
Techniques for LIF placement in SAN storage cluster synchronous disaster recovery
Improved techniques for disaster recovery within storage area networks are disclosed. Embodiments include replicating a LIF of a primary cluster on a secondary cluster. LIF configuration information is extracted from the primary cluster. A peer node from a secondary cluster is located. One or more ports are located on the located peer node that match a connectivity of the LIF from the primary cluster. One or more ports are identified based upon one or more filtering criteria to generate a candidate port list. A port from the candidate port list is selected based at least upon a load of the port. Other embodiments are described and claimed.
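The port selection steps can be sketched in Python as follows. The `Port` fields are hypothetical: `fabric` stands in for "connectivity", `is_up` for one possible filtering criterion, and `load` for whatever load measure an implementation uses. Picking the least-loaded candidate is also an assumption, since the abstract says only that selection is based at least upon load.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    name: str
    fabric: str   # hypothetical stand-in for the LIF's connectivity
    is_up: bool   # hypothetical filtering criterion
    load: int     # hypothetical load measure, e.g. LIFs already hosted


def select_port(peer_ports: List[Port], lif_fabric: str) -> Optional[Port]:
    """Match connectivity, filter to a candidate list, then pick by load."""
    matching = [p for p in peer_ports if p.fabric == lif_fabric]
    candidates = [p for p in matching if p.is_up]
    # Assumption: prefer the least-loaded candidate; None if no port qualifies.
    return min(candidates, key=lambda p: p.load, default=None)


peer_node_ports = [
    Port("0a", fabric="A", is_up=True, load=4),
    Port("0b", fabric="A", is_up=True, load=1),
    Port("0c", fabric="A", is_up=False, load=0),
    Port("0d", fabric="B", is_up=True, load=0),
]
chosen = select_port(peer_node_ports, "A")
```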