G06F11/1482

Automated query retry in a database system

Techniques for automated query retry in a database platform include assigning, by at least one hardware processor, a first execution of a query directed to database data to a first execution node of a plurality of execution nodes of an execution platform. The first execution node uses a first set of configurations during the first execution. The techniques further include determining that the first execution of the query by the first execution node results in a failed execution. The query is transferred to a second execution node of the plurality of execution nodes. A second execution of the query at the second execution node is caused. The second execution node uses a second set of configurations during the second execution. A cause of the failed execution at the first execution node is determined based on a result of the second execution of the query at the second execution node.
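The retry flow in this abstract can be illustrated with a minimal Python sketch. Everything named here (`ExecutionNode`, `run_with_retry`, `toy_engine`, the `release` configuration key, and the three cause labels) is hypothetical, and the rule used to infer a cause is an assumption: the abstract only says that the cause is determined from the result of the second execution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical diagnosis labels; the abstract does not enumerate causes.
CAUSE_NONE = "no failure"
CAUSE_FIRST_NODE = "likely the first node or its configuration"
CAUSE_QUERY = "likely the query or the data it touches"


@dataclass
class ExecutionNode:
    name: str
    config: Dict[str, str]                      # the node's "set of configurations"
    run: Callable[[str, Dict[str, str]], str]   # raises an exception on failure


def run_with_retry(query: str, first: ExecutionNode,
                   second: ExecutionNode) -> Tuple[Optional[str], str]:
    """Run on `first`; on failure retry on `second` and infer a cause."""
    try:
        return first.run(query, first.config), CAUSE_NONE
    except Exception:
        pass  # failed execution: fall through and transfer the query

    try:
        result = second.run(query, second.config)
    except Exception:
        # Both configurations failed, so the node setup is an unlikely culprit.
        return None, CAUSE_QUERY
    # The retry succeeded under a different set of configurations.
    return result, CAUSE_FIRST_NODE


def toy_engine(query: str, config: Dict[str, str]) -> str:
    """Stand-in executor that fails only under a simulated bad release."""
    if config.get("release") == "candidate":
        raise RuntimeError("simulated regression")
    return f"result of {query}"
```

In this sketch, two nodes that differ only in their `release` setting are enough to separate a configuration-related failure from a query-related one.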

Distributed application orchestration management in a heterogeneous distributed computing environment

Distributed application orchestration management is provided. A first passive member of a set of passive members sends a notification message to other members indicating that the first passive member is initiating start of a distributed application in response to the first passive member validating that a self-restart by a leader member failed. The first passive member compares timestamps associated with an attempt to start the distributed application by other passive members in the set of passive members. The first passive member stops a particular attempt to start the distributed application in response to the first passive member determining that a timestamp associated with the particular attempt to start the distributed application by the first passive member is newer than another timestamp of another passive member. The first passive member designates the other passive member having an older timestamp as a new leader member to continue starting the distributed application.
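The timestamp comparison in this abstract can be sketched as a small decision function. The function name `resolve_start_attempt`, the dictionary of attempts, deferring to the oldest attempt among all members, and the tie-break on member id are assumptions made for illustration; the abstract only states that a member with a newer timestamp stops and designates a member with an older timestamp as the new leader.

```python
from typing import Dict, Optional


def resolve_start_attempt(my_id: str, attempts: Dict[str, float]) -> Optional[str]:
    """Decide what `my_id` should do given all known start attempts.

    `attempts` maps member id -> timestamp of its start attempt.
    Returns None if this member keeps going, otherwise the id of the
    member it should designate as the new leader.
    """
    mine = (attempts[my_id], my_id)
    # Ties on timestamp are broken by member id (an assumption of this sketch).
    oldest_id = min(attempts, key=lambda m: (attempts[m], m))
    if (attempts[oldest_id], oldest_id) < mine:
        return oldest_id   # my attempt is newer: stop and defer
    return None            # my attempt is the oldest: continue starting
```

Because every member applies the same ordering to the same set of attempts, all members that see the full set converge on one leader without further messages.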

AUTOMATED OPERATIONS MANAGEMENT FOR COMPUTER SYSTEMS
20230118222 · 2023-04-20

Techniques are disclosed relating to automated operations management. In various embodiments, a computer system accesses operational information that defines commands for an operational scenario and accesses blueprints that describe operational entities in a target computer environment related to the operational scenario. The computer system implements the operational scenario for the target computer environment. The implementing may include executing a hierarchy of controller modules that include an orchestrator controller module at the top level of the hierarchy that is executable to carry out the commands by issuing instructions to controller modules at a next level. The controller modules may be executable to manage the operational entities according to the blueprints to complete the operational scenario. In various embodiments, the computer system includes additional features such as an application programming interface (API), a remote routing engine, a workflow engine, a reasoning engine, a security engine, and a testing engine.
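A two-level controller hierarchy of the kind described here might look like the following sketch. The class names (`Orchestrator`, `EntityController`) and the blueprint layout (a dictionary with `name`, `initial_state`, and `commands` keys) are hypothetical; the abstract does not specify a blueprint format.

```python
from typing import Dict, List


class EntityController:
    """Leaf controller managing one operational entity per its blueprint."""

    def __init__(self, blueprint: Dict):
        self.name = blueprint["name"]
        self.transitions = blueprint["commands"]  # command -> resulting state
        self.state = blueprint["initial_state"]

    def handle(self, command: str) -> List[str]:
        if command not in self.transitions:
            return [f"{self.name}: ignored {command}"]
        self.state = self.transitions[command]
        return [f"{self.name}: {command} -> {self.state}"]


class Orchestrator:
    """Top-level controller that forwards commands to the next level."""

    def __init__(self, children: List[EntityController]):
        self.children = children

    def run_scenario(self, commands: List[str]) -> List[str]:
        log: List[str] = []
        for command in commands:
            for child in self.children:
                log.extend(child.handle(command))
        return log
```

The orchestrator knows only the scenario's commands, while each lower-level controller consults its own blueprint to decide what a command means for its entity.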

Reducing recovery time of an application

Examples provided herein describe a method for reducing recovery time for an application. For example, a first physical processor of a computing device may monitor, based on a first application instance of the application running in a first mode, for failure detection of the first application instance running on a first computing device. The first physical processor may determine that the first application instance is to be changed from the first mode to a second mode. Based on the determination, the first physical processor may validate that a second application instance can run in the first mode by performing a data integrity compliance check. Responsive to validating that the second application instance can run in the first mode, the first physical processor may facilitate running of the second application instance in the first mode.
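The validate-then-promote step can be sketched as follows. This is an illustration under stated assumptions: the "first mode" and "second mode" are modelled as `active` and `standby`, and the data integrity compliance check is modelled as a SHA-256 comparison against an expected digest. The abstract names neither, and `AppInstance`, `checksum`, and `fail_over` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

ACTIVE, STANDBY = "active", "standby"   # stand-ins for "first mode" / "second mode"


@dataclass
class AppInstance:
    name: str
    mode: str
    data: bytes   # replicated state the instance would serve


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fail_over(first: AppInstance, second: AppInstance, expected: str) -> bool:
    """Promote `second` only if its data passes the integrity check."""
    if checksum(second.data) != expected:
        return False                      # validation failed: leave modes unchanged
    first.mode, second.mode = STANDBY, ACTIVE
    return True
```

Checking the second instance before switching modes means a stale replica is never promoted, at the cost of one digest computation on the failover path.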

System and method for hybrid kernel and user-space checkpointing using a character device
11656954 · 2023-05-23

A system, method, and computer readable medium for hybrid kernel-mode and user-mode checkpointing of multi-process applications. The computer readable medium includes computer-executable instructions for execution by a processing system. A multi-process application runs on primary hosts and is checkpointed by a checkpointer comprised of a kernel-mode checkpointer module and one or more user-space interceptors providing barrier synchronization, checkpointing thread, resource flushing, and an application virtualization space. Checkpoints may be written to storage and the application restored from said stored checkpoint at a later time. Checkpointing is transparent to the application and requires no modification to the application, operating system, networking stack or libraries. In an alternate embodiment the kernel-mode checkpointer is built into the kernel.

WORKFLOWS FOR AUTOMATED OPERATIONS MANAGEMENT
20230073909 · 2023-03-09

Techniques are disclosed relating to automated operations management. In various embodiments, a computer system accesses operational information that defines commands for an operational scenario and accesses blueprints that describe operational entities in a target computer environment related to the operational scenario. The computer system implements the operational scenario for the target computer environment. The implementing may include executing a hierarchy of controller modules that include an orchestrator controller module at the top level of the hierarchy that is executable to carry out the commands by issuing instructions to controller modules at a next level. The controller modules may be executable to manage the operational entities according to the blueprints to complete the operational scenario. In various embodiments, the computer system includes additional features such as an application programming interface (API), a remote routing engine, a workflow engine, a reasoning engine, a security engine, and a testing engine.

System and method for hybrid kernel- and user-space incremental and full checkpointing

A system includes a multi-process application that runs on primary hosts and is checkpointed by a checkpointer comprised of a kernel-mode checkpointer module and one or more user-space interceptors providing at least one of barrier synchronization, checkpointing thread, resource flushing, and an application virtualization space. Checkpoints may be written to storage and the application restored from said stored checkpoint at a later time. Checkpointing may be incremental using Page Table Entry (PTE) pages and Virtual Memory Areas (VMA) information. Checkpointing is transparent to the application and requires no modification to the application, operating system, networking stack or libraries. In an alternate embodiment the kernel-mode checkpointer is built into the kernel.
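The difference between full and incremental checkpoints can be illustrated in user space with a page-level comparison. This is a conceptual sketch only: the abstract describes a kernel-mode checkpointer that uses PTE pages and VMA information, whereas this example simply compares page contents of a fixed-size byte image. `PAGE_SIZE` and all function names are assumptions.

```python
from typing import Dict, List

PAGE_SIZE = 4096  # assumed page size for this sketch


def pages_of(memory: bytes) -> Dict[int, bytes]:
    """Split a memory image into page-number -> page-contents."""
    return {i // PAGE_SIZE: memory[i:i + PAGE_SIZE]
            for i in range(0, len(memory), PAGE_SIZE)}


def full_checkpoint(memory: bytes) -> Dict[int, bytes]:
    """Record every page of the image."""
    return pages_of(memory)


def incremental_checkpoint(memory: bytes,
                           previous: Dict[int, bytes]) -> Dict[int, bytes]:
    """Keep only pages that differ from `previous`.

    `previous` must be the complete page map as of the last checkpoint,
    not just the last increment.
    """
    return {n: page for n, page in pages_of(memory).items()
            if previous.get(n) != page}


def restore(full: Dict[int, bytes], increments: List[Dict[int, bytes]]) -> bytes:
    """Replay increments, oldest first, on top of the full checkpoint."""
    image = dict(full)
    for inc in increments:
        image.update(inc)
    return b"".join(image[n] for n in sorted(image))
```

An incremental checkpoint stores only the changed pages, so restoring requires the last full checkpoint plus every increment taken since.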

Auto-recovery job scheduling framework

The present disclosure relates to computer-implemented methods, software, and systems for an automatic recovery job execution through a scheduling framework in a cloud environment. One or more recovery jobs are scheduled to be performed periodically for one or more registered service components included in a service instance running on a cluster node of a cloud platform. Each recovery job is associated with a corresponding service component of the service instance. A health check operation is invoked at a service component based on executing a recovery job at the scheduling framework corresponding to the service component. In response to determining that the service component needs a recovery measure based on a result from the health check operation, a recovery operation is invoked as part of executing a set of scheduled routines of the recovery job. Implemented logic for the recovery operation is stored and executed at the service component.
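The split between the scheduling framework and the service components can be sketched as below. The names `ServiceComponent`, `RecoveryJob`, `RecoveryScheduler`, and `FlakyCache` are hypothetical, and a manual `tick()` stands in for the periodic trigger, which the abstract does not detail.

```python
from typing import List, Protocol


class ServiceComponent(Protocol):
    """What a registered component is assumed to expose."""
    def health_check(self) -> bool: ...
    def recover(self) -> None: ...


class RecoveryJob:
    """One job per registered service component."""

    def __init__(self, component: ServiceComponent):
        self.component = component

    def execute(self) -> bool:
        """Return True if a recovery operation was invoked."""
        if self.component.health_check():
            return False
        self.component.recover()   # recovery logic lives in the component
        return True


class RecoveryScheduler:
    def __init__(self) -> None:
        self.jobs: List[RecoveryJob] = []

    def register(self, component: ServiceComponent) -> None:
        self.jobs.append(RecoveryJob(component))

    def tick(self) -> int:
        """Run every job once; return how many recoveries were triggered."""
        return sum(job.execute() for job in self.jobs)


class FlakyCache:
    """Toy component used only to exercise the scheduler."""

    def __init__(self, healthy: bool):
        self.healthy = healthy

    def health_check(self) -> bool:
        return self.healthy

    def recover(self) -> None:
        self.healthy = True
```

The scheduler only decides when to check and whether to recover; how to check and how to recover stay inside each component, matching the abstract's statement that the recovery logic is stored and executed at the service component.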

Method and system for providing coordinated checkpointing to a group of independent computer applications

A system and method thereof for performing loss-less migration of an application group. In an exemplary embodiment, the system may include a high-availability services module structured for execution in conjunction with an operating system, and one or more computer nodes of a distributed system upon which at least one independent application can be executed. The high-availability services module may be structured to be executable on the one or more computer nodes for loss-less migration of the one or more independent applications, and is operable to perform checkpointing of all state in a transport connection.

SYSTEMS AND METHODS FOR RATE MATCHING VIA A HETEROGENEOUS KERNEL WHEN USING GENERAL POLAR CODES

Systems and methods are disclosed for performing rate matching when using general polar codes. In one embodiment, a method of generating a codeword includes receiving bits at a polar encoder and encoding the bits using polar encoder kernels. The polar encoder kernels include a first kernel and a second kernel. The first kernel receives a set of input q-ary symbols and modifies the set of input q-ary symbols according to a first kernel generator matrix to produce a set of output q-ary symbols. The second kernel receives a set of input l-ary symbols, where l does not equal q, and modifies the set of input l-ary symbols according to a second kernel generator matrix to produce a set of output l-ary symbols. For example, the first kernel may be a binary kernel and the second kernel may be a Reed-Solomon (RS) based kernel.
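Applying kernels over two different alphabet sizes can be illustrated with a small sketch. The binary kernel below uses the standard 2x2 Arikan generator matrix. The second kernel is a ternary (l = 3) stand-in with an arbitrary illustrative matrix, chosen because arithmetic modulo a prime is easy to show; it is not the Reed-Solomon based kernel the abstract mentions, and `apply_kernel` is a hypothetical helper.

```python
from typing import List

ARIKAN_KERNEL = [[1, 0],
                 [1, 1]]          # binary kernel, q = 2

TERNARY_KERNEL = [[1, 0, 0],
                  [1, 1, 0],
                  [1, 2, 1]]      # illustrative only, l = 3


def apply_kernel(symbols: List[int], generator: List[List[int]], q: int) -> List[int]:
    """Multiply a row vector of q-ary symbols by a generator matrix.

    Arithmetic is modulo q, which is only a field when q is prime.
    """
    n = len(generator)
    if len(symbols) != n:
        raise ValueError("kernel size and input length differ")
    if any(not 0 <= s < q for s in symbols):
        raise ValueError("symbol outside the alphabet")
    return [sum(symbols[i] * generator[i][j] for i in range(n)) % q
            for j in range(n)]
```

For example, `apply_kernel([1, 1], ARIKAN_KERNEL, 2)` yields `[0, 1]`, and `apply_kernel([1, 2, 1], TERNARY_KERNEL, 3)` yields `[1, 1, 1]`. A non-prime alphabet such as GF(4) would need explicit field tables rather than modular arithmetic.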