Patent classifications
G06F11/1479
Database replication error recovery based on supervised learning
Systems and methods are described for automated recovery from errors occurring during replication of a database. The method includes obtaining text from one or more log files generated during database replication processing in a cloud computing environment, transforming the text into a structured language form represented by vectors, and identifying patterns in the vectors. The method further includes classifying one or more errors based on the identified patterns using supervised learning as either a recoverable error or an unrecoverable error, analyzing the one or more errors to determine one or more recovery jobs associated with database replication processing in the cloud computing environment for each of the recoverable errors, and invoking the one or more recovery jobs.
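The pipeline in this abstract (log text, vectors, supervised classification, recovery jobs) can be illustrated with a minimal sketch. The abstract does not name a vectorization scheme or a classifier, so the bag-of-words vectors, the nearest-centroid rule, the training lines, and the job names below are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def vectorize(line):
    """Turn a log line into a bag-of-words vector (token -> count)."""
    return Counter(re.findall(r"[a-z]+", line.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(labeled_lines):
    """Supervised step: sum the vectors of each class into one centroid."""
    centroids = {}
    for line, label in labeled_lines:
        centroids.setdefault(label, Counter()).update(vectorize(line))
    return centroids


def classify(centroids, line):
    """Assign the label whose centroid is most similar to the line."""
    vec = vectorize(line)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical labeled log lines and recovery job names (not from the source).
TRAINING = [
    ("replication lag exceeded threshold on replica", "recoverable"),
    ("connection timeout while shipping redo log", "recoverable"),
    ("checksum mismatch: data file corrupted", "unrecoverable"),
    ("disk failure: storage volume lost", "unrecoverable"),
]
RECOVERY_JOBS = {"timeout": "restart_log_shipping", "lag": "resync_replica"}


def recovery_jobs_for(line):
    """Map a recoverable error to recovery jobs by keyword (an assumed rule)."""
    tokens = vectorize(line)
    return [job for keyword, job in RECOVERY_JOBS.items() if keyword in tokens]


centroids = train(TRAINING)
new_error = "connection timeout on replica"
label = classify(centroids, new_error)
jobs = recovery_jobs_for(new_error) if label == "recoverable" else []
print(label, jobs)
```

Only errors classified as recoverable are mapped to jobs; an unrecoverable error yields an empty job list in this sketch.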
Dynamic hardware resource shadowing and memory error protection
Embodiments of the present disclosure are directed to dynamic shadow operations configured to dynamically shadow data-plane resources in a network device. In some embodiments, the dynamic resource shadow operations are used to locally maintain a shadow copy of data plane resources to avoid having to read them through a bus interconnect. In other embodiments, the dynamic shadow framework is used to provide memory protection for hardware resources against single-event upset (SEU) failures. The dynamic shadow framework may operate in conjunction with adaptive memory scrubbing operations. In other embodiments, the dynamic shadow infrastructure is used to facilitate fast boot-up and fast upgrade operations.
TECHNIQUES FOR LIF PLACEMENT IN SAN STORAGE CLUSTER SYNCHRONOUS DISASTER RECOVERY
Improved techniques for disaster recovery within storage area networks are disclosed. Embodiments include replicating a LIF of a primary cluster on a secondary cluster. LIF configuration information is extracted from the primary cluster. A peer node from a secondary cluster is located. One or more ports are located on the located peer node that match a connectivity of the LIF from the primary cluster. One or more ports are identified based upon one or more filtering criteria to generate a candidate port list. A port from the candidate port list is selected based at least upon a load of the port. Other embodiments are described and claimed.
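The port selection steps in this abstract (match connectivity, filter to a candidate list, pick by load) can be sketched as follows. The `Port` fields, the "port is up" filter, and the "lowest load wins" rule are assumptions; the abstract only says that selection is based at least on load.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    name: str
    fabric: str   # connectivity the port belongs to (hypothetical field)
    is_up: bool   # example filtering criterion (assumption)
    load: int     # e.g. number of LIFs already hosted (assumption)


def select_port(lif_fabric: str, peer_ports: List[Port]) -> Optional[Port]:
    """Pick a port on the peer node for a LIF replicated from the primary."""
    # Step 1: ports whose connectivity matches that of the primary LIF.
    matching = [p for p in peer_ports if p.fabric == lif_fabric]
    # Step 2: apply filtering criteria to build the candidate port list.
    candidates = [p for p in matching if p.is_up]
    if not candidates:
        return None
    # Step 3: choose by load; here, the least-loaded candidate.
    return min(candidates, key=lambda p: p.load)


ports = [
    Port("0c", "fabric-A", True, 3),
    Port("0d", "fabric-A", True, 1),
    Port("0e", "fabric-A", False, 0),
    Port("0f", "fabric-B", True, 0),
]
chosen = select_port("fabric-A", ports)
print(chosen.name if chosen else "no candidate port")
```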
Method for controlling and automatically restarting a technical apparatus
The invention is part of the field of computer technology. It describes the architecture of a secure automation system and a method for safe autonomous operation of a technical apparatus, in particular a motor vehicle. The architecture disclosed herein solves the problem that any Byzantine error in one of the complex subsystems of a distributed real-time computer system, regardless of whether the error was triggered by a random hardware failure, a design error in the software or an intrusion, must be recognized and controlled in such a way that no security-relevant incident occurs. The architecture includes four largely independent subsystems which are arranged hierarchically and each form an isolated Fault-Containment Unit (FCU). At the top of the hierarchy is a secure subsystem, which executes simple software on fault-tolerant hardware. The other three subsystems are insecure because they contain complex software executed on non-fault-tolerant hardware.
Live Migrating Virtual Machines to a Target Host Upon Fatal Memory Errors
The disclosed technology provides techniques, systems, and apparatus for containing and recovering from uncorrectable memory errors in a distributed computing environment through migration of virtual machines and associated memory to a target host machine. An aspect of the disclosed technology includes a hypervisor or virtual machine manager that receives signaling of an uncorrectable memory error detected by a host machine. The virtual machine manager then uses information received via the signaling to identify virtual memory addresses or memory pages associated with the corrupted memory element so as to allow for containment and recovery from the error, and for live migration of the virtual machine.
FAILOVER MANAGEMENT FOR BATCH JOBS
Computer-implemented methods, computer program products, and computer systems are provided. A method includes generating a running result matrix for a plurality of batch jobs, indicating corresponding running results for respective processing actions in batch jobs of the plurality of batch jobs. The method further includes obtaining an internal dependency matrix for the plurality of batch jobs, indicating corresponding dependencies between respective processing actions within a batch job of the plurality of batch jobs. The method further includes calculating a recovery matrix for the plurality of batch jobs based, at least in part, on the running result matrix and the internal dependency matrix, the recovery matrix indicating corresponding recovery actions for respective processing actions in batch jobs of the plurality of batch jobs. The method further includes executing failover management for one or more batch jobs based, at least in part, on the calculated recovery matrix.
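One way to picture the recovery matrix is as a function of the two input matrices. The abstract does not state the combining rule, so the sketch below assumes one: an action is rerun if it did not succeed or if it depends on an action that is being rerun. It also assumes a single dependency matrix shared by all jobs and that an action depends only on lower-numbered actions.

```python
def recovery_matrix(results, depends_on):
    """Derive per-job recovery actions from run results and dependencies.

    results[j][a]    -- status of action a in batch job j ("ok" or "failed")
    depends_on[a][b] -- 1 if action a depends on action b, else 0
    """
    recovery = []
    for job_results in results:
        row = []
        for action, status in enumerate(job_results):
            # An action must rerun if anything it depends on is rerun.
            upstream_rerun = any(
                depends_on[action][prev] and row[prev] == "rerun"
                for prev in range(action)
            )
            row.append("rerun" if status != "ok" or upstream_rerun else "skip")
        recovery.append(row)
    return recovery


# Hypothetical data: three jobs, three actions chained 0 -> 1 -> 2.
results = [
    ["ok", "failed", "ok"],
    ["ok", "ok", "ok"],
    ["ok", "ok", "failed"],
]
depends_on = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
]
for row in recovery_matrix(results, depends_on):
    print(row)
```

In the first job, the third action succeeded but is still marked for rerun because its input came from the failed second action.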
Workflow error handling for device driven management
Disclosed are various embodiments for workflow error handling for device driven management. A workflow can be received from a management service by a management agent. The workflow can define a sequence of actions to be implemented by the management agent on a client device and a set of error conditions associated with individual actions in the sequence of actions. The management agent can then process the individual actions in the sequence of actions defined by the workflow. Subsequently, the management agent can monitor the individual actions to determine whether the individual actions trigger an error condition in the set of error conditions. Finally, in response to a determination that the individual actions triggered the error condition in the set of error conditions, the management agent can perform an error response specified by the workflow.
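The agent-side loop described here can be sketched in a few lines. The workflow layout, the action names, the error condition names, and the "abort" response are hypothetical; the abstract only says that each action carries error conditions and that the workflow specifies the response.

```python
def run_workflow(workflow, run_action):
    """Process actions in order and apply the workflow's error responses.

    run_action(name) returns None on success or the name of an error condition.
    """
    log = []
    for step in workflow:
        action = step["action"]
        error = run_action(action)
        if error is None:
            log.append((action, "ok"))
            continue
        # Look up the response the workflow specifies for this condition.
        response = step["error_responses"].get(error, "unhandled")
        log.append((action, response))
        if response == "abort":
            break  # assumption: "abort" stops the remaining actions
    return log


# Hypothetical workflow received from the management service.
workflow = [
    {"action": "install_profile", "error_responses": {"timeout": "retry_later"}},
    {"action": "enable_encryption", "error_responses": {"unsupported": "abort"}},
    {"action": "install_app", "error_responses": {}},
]
# Simulated device behavior: the second action hits an error condition.
simulated_errors = {"enable_encryption": "unsupported"}
print(run_workflow(workflow, simulated_errors.get))
```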
Live Migration Of Clusters In Containerized Environments
The technology provides for live migration from a first cluster to a second cluster. For instance, when requests to one or more cluster control planes are received, a predetermined fraction of the received requests may be allocated to a control plane of the second cluster, while a remaining fraction of the received requests may be allocated to a control plane of the first cluster. The predetermined fraction of requests is handled using the control plane of the second cluster. While handling the predetermined fraction of requests, it is detected whether there are failures in the second cluster. Based on not detecting failures in the second cluster, the predetermined fraction of requests allocated to the control plane of the second cluster may be increased in predetermined stages until all requests are allocated to the control plane of the second cluster.
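The staged shift of requests can be sketched as below. The stage percentages, the modulo-based routing rule, and the rollback to the first cluster on failure are assumptions; the abstract describes only the increase when no failures are detected.

```python
def route(request_id, percent_to_second):
    """Send a fixed share of requests to the second cluster's control plane."""
    return "second" if request_id % 100 < percent_to_second else "first"


def staged_migration(stages, failures_detected):
    """Walk through the stages, stopping if the second cluster shows failures.

    stages            -- increasing percentages, e.g. [10, 50, 100]
    failures_detected -- callable(percent) -> bool, checked at each stage
    Returns the sequence of percentages allocated to the second cluster.
    """
    history = []
    for percent in stages:
        history.append(percent)      # allocate this share to the second cluster
        if failures_detected(percent):
            history.append(0)        # assumption: roll everything back
            break
    return history


print(staged_migration([10, 50, 100], lambda percent: False))
print(sum(route(i, 10) == "second" for i in range(1000)))
```

A healthy second cluster ends with 100 percent of requests; a failure at any stage stops the progression before the next increase.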
Distributed metadata servers for cluster file systems using shared low latency persistent key-value metadata store
A cluster file system is provided having a plurality of distributed metadata servers with shared access to one or more shared low latency persistent key-value metadata stores. A metadata server comprises an abstract storage interface comprising a software interface module that communicates with at least one shared persistent key-value metadata store providing a key-value interface for persistent storage of key-value metadata. The software interface module provides the key-value metadata to the at least one shared persistent key-value metadata store in a key-value format. The shared persistent key-value metadata store is accessed by a plurality of metadata servers. A metadata request can be processed by a given metadata server independently of other metadata servers in the cluster file system. A distributed metadata storage environment is also disclosed that comprises a plurality of metadata servers having an abstract storage interface to at least one shared persistent key-value metadata store.
APPARATUS AND METHODS FOR ERROR DETECTION CODING
A first error-detecting code (EDC) is computed based on a first segment of a block of information that is to be encoded, and a second EDC is computed based on at least a second segment of the block of information. The first EDC is masked with a first masking segment and the second EDC with a second masking segment to generate a first masked EDC and a second masked EDC. The first masking segment and the second masking segment are associated with a target receiver of the block of information. A codeword is generated based on a code and an input vector that includes the first segment, the first masked EDC, the second segment, and the second masked EDC. This type of coding could be useful to support early termination of blind detection at a decoder, for example.
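A minimal sketch of the masking step follows, under stated assumptions: CRC-32 from Python's `zlib` stands in for the error-detecting code, masking is done by XOR, each EDC covers only its own segment, and the masks are hypothetical 32-bit receiver-specific values. The final encoding of the input vector into a codeword is omitted.

```python
import zlib


def masked_edc(segment: bytes, mask: int) -> int:
    """Compute an EDC over the segment and mask it with a receiver-specific value."""
    return zlib.crc32(segment) ^ mask


def build_input_vector(seg1: bytes, seg2: bytes, mask1: int, mask2: int) -> bytes:
    """Lay out: first segment, first masked EDC, second segment, second masked EDC."""
    edc1 = masked_edc(seg1, mask1).to_bytes(4, "big")
    edc2 = masked_edc(seg2, mask2).to_bytes(4, "big")
    return seg1 + edc1 + seg2 + edc2


def first_segment_ok(vector: bytes, seg1_len: int, mask1: int) -> bool:
    """Check only the first masked EDC, as an early test at the receiver."""
    seg1 = vector[:seg1_len]
    received = int.from_bytes(vector[seg1_len:seg1_len + 4], "big")
    return (received ^ mask1) == zlib.crc32(seg1)


# Hypothetical masks tied to the target receiver.
MASK1, MASK2 = 0x1234ABCD, 0x0F0F0F0F
vector = build_input_vector(b"ctrl", b"payload", MASK1, MASK2)
print(first_segment_ok(vector, 4, MASK1))
print(first_segment_ok(vector, 4, 0x1234ABCE))
```

A receiver holding a different mask fails the first check and can stop there without processing the second segment, which is the kind of early termination the abstract mentions.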