G06F11/2035

AUTOMATED DISCOVERY OF DATABASES
20230161786 · 2023-05-25

In some examples, a networked computing system comprises: a backup node cluster of a backup service in communication with a host database node cluster of a host; a host database at least initially undiscovered by the backup node cluster; and one or more processors coupled with memory storing instructions that, when executed, perform operations comprising at least: installing a backup agent on at least one node of the host database node cluster; registering the host at the backup service; based on the host registration, triggering a host database discovery process to discover the undiscovered database automatically, the discovery process including a discovery call; in response to the discovery call, receiving metadata relating to the discovered database; and communicating with the discovered database.

Input/output apparatus and methods for monitoring and/or controlling dynamic environments

Apparatus and methods for flexible input/output signaling over a same signaling channel are described. A programmable interface circuit includes a signaling channel that can be adapted, prior to use or during operation, for transmission and/or reception of different types of analog and digital signals. The interface circuit can be used for communications between an isolating communication controller and components of a machine that use diverse signaling types.

INTRA-FOOTPRINT COMPUTING CLUSTER BRING-UP
20230153169 · 2023-05-18

Methods, systems and computer program products for intra-footprint computing cluster bring-up within a virtual private cloud. A network connection is established between an initiating module and a virtual private cloud (VPC). The initiating module allocates resources of the virtual private cloud including a plurality of nodes that correspond to members of a to-be-configured computing cluster. A cluster management module having coded therein an intended computing cluster configuration is configured into at least one of the plurality of nodes. The members of the to-be-configured computing cluster interoperate from within the VPC to accomplish a set of computing cluster bring-up operations that configure the plurality of members into the intended computing cluster configuration. Execution of bring-up instructions of the management module serves to allocate networking IP addresses of the virtual private cloud. The allocated networking IP addresses of the virtual private cloud are assigned to networking interfaces of the plurality of nodes.

Virtualized file server user views

In one embodiment, a system for managing a virtualization environment includes: a plurality of host machines, wherein each of the host machines comprises a hypervisor, one or more user virtual machines (user VMs), and a virtual machine controller; one or more virtual disks comprising a plurality of storage devices; and a virtualized file server (VFS) comprising a plurality of file server virtual machines (FSVMs), wherein each of the FSVMs is running on one of the host machines. The VFS may be configured to receive a request for storage system information from a user and generate and send a response to the request, wherein the response is customized according to configuration information of the VFS that is specific to the user. The storage system information requested may include a total size of storage available to the user, and the user may have an associated storage quota limit.
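
As an illustration of a user-specific response, the following minimal Python sketch reports a storage total that is capped by the user's quota. The abstract does not say how the quota shapes the response, so the capping rule, the `UserConfig` type, and the `storage_view` function are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserConfig:
    """Hypothetical per-user configuration held by the file server."""
    quota_bytes: Optional[int] = None  # None means no quota limit is set
    used_bytes: int = 0


def storage_view(share_capacity_bytes: int, share_free_bytes: int,
                 user: UserConfig) -> dict:
    """Return the storage totals as this particular user should see them."""
    if user.quota_bytes is None:
        # No quota: report the real capacity of the share.
        return {"total": share_capacity_bytes, "free": share_free_bytes}
    # Assumed rule: the quota caps the reported total, and free space can
    # exceed neither the remaining quota nor the real free space.
    total = min(user.quota_bytes, share_capacity_bytes)
    free = max(0, min(total - user.used_bytes, share_free_bytes))
    return {"total": total, "free": free}
```

Under this assumed rule, two users querying the same share can receive different totals, which is one way to read "customized according to configuration information ... specific to the user".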

Managing failures in edge computing environments

A computer-implemented method, computer system and computer program product dynamically manage failure in an edge computing environment. According to the method, a request for executing a task may be sent to a first edge device according to a defined process, where the defined process is used to schedule tasks to be executed on edge devices. In response to the first edge device failing to execute the task, the defined process may be suspended. Then, a request for executing the task may be sent to a second edge device. A task result that is received first may be taken as the task result for the task, where the task result is from either the first edge device or the second edge device. The rest of the defined process may then be continued.
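
The failover described above can be sketched in Python roughly as follows. This is a simplified illustration, not the claimed method: edge devices are modeled as plain callables, "failing to execute" is assumed to mean either raising an exception or not answering within `timeout_s`, and all names are hypothetical.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_with_failover(task, first_device, second_device, timeout_s=0.05):
    """Return (device_label, result) from whichever device answers first."""
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        futures = {pool.submit(first_device, task): "first"}
        done, _ = wait(futures, timeout=timeout_s)
        first_ok = any(f.exception() is None for f in done)
        if not first_ok:
            # The first device failed or is late: the normal schedule is
            # considered suspended and the task is re-sent to a second device.
            futures[pool.submit(second_device, task)] = "second"
        remaining = set(futures)
        while remaining:
            done, remaining = wait(remaining, return_when=FIRST_COMPLETED)
            for f in done:
                if f.exception() is None:
                    # The first successful result wins, whichever device sent it.
                    return futures[f], f.result()
        raise RuntimeError("both devices failed to execute the task")
    finally:
        pool.shutdown(wait=False)


# Hypothetical devices used only for demonstration.
def slow_device(task):
    time.sleep(0.3)
    return task.upper()


def fast_device(task):
    return task.upper()


def broken_device(task):
    raise IOError("device unavailable")
```

After `run_with_failover` returns, the caller would resume the remaining steps of its schedule. Note that a late first device is not cancelled here; its result is still accepted if it arrives before the second device's.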

Resource manager for transaction processing systems

A resource manager (RM) instance is associated with each transaction processing system (TPS) member of a TPS group. Each RM instance monitors performance of the associated TPS member. If a TPS member becomes unavailable for any reason (a failing TPS), the associated RM instance broadcasts status of the failing TPS to RMs associated with “surviving” members of the group. RM instances associated with surviving members initiate a series of actions that reduce the resources used by the surviving TPS members. Consequently, the surviving TPS members are better able to process the additional workload imposed on them due to the unavailability of the failing TPS. Once the failing TPS is brought back online and made available again (or a replacement TPS is brought online), RM instances associated with the surviving members perform actions to undo the resource usage reduction tasks, and the TPS group returns to a nominal configuration.
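
A minimal Python sketch of this broadcast-and-undo behavior is shown below. The abstract does not name the resource-reducing actions, so suspending optional background tasks (here `tracing` and `statistics`) is an assumption, as are the class and method names.

```python
class ResourceManager:
    """Hypothetical RM instance attached to one TPS member."""

    def __init__(self, member):
        self.member = member
        self.peers = []                       # RMs of the other group members
        self.optional_tasks = {"tracing", "statistics"}  # assumed examples
        self.suspended = set()
        self.failed_members = set()

    def report_failure(self):
        """Broadcast that this RM's TPS member has become unavailable."""
        for peer in self.peers:
            peer.on_peer_failed(self.member)

    def report_recovery(self):
        """Broadcast that this RM's TPS member is available again."""
        for peer in self.peers:
            peer.on_peer_recovered(self.member)

    def on_peer_failed(self, member):
        # Surviving member: shed optional work to free resources.
        self.failed_members.add(member)
        self.suspended |= self.optional_tasks
        self.optional_tasks = set()

    def on_peer_recovered(self, member):
        # Undo the reduction only once every failed member is back.
        self.failed_members.discard(member)
        if not self.failed_members:
            self.optional_tasks |= self.suspended
            self.suspended = set()


def link(managers):
    """Make every RM aware of every other RM in the group."""
    for rm in managers:
        rm.peers = [other for other in managers if other is not rm]
```

For example, after `link([a, b, c])`, calling `a.report_failure()` empties `optional_tasks` on `b` and `c`, and `a.report_recovery()` restores them.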

Hardware-Assisted Memory Disaggregation with Recovery from Network Failures Using Non-Volatile Memory
20230205649 · 2023-06-29

Techniques for implementing hardware-assisted memory disaggregation with recovery from network failures/problems are provided. In one set of embodiments, a hardware controller of a computer system can maintain a copy of a “remote memory” of the computer system (i.e., a section of the physical memory address space of the computer system that maps to a portion of the physical system memory of a remote computer system) in a local backup memory. The backup memory may be implemented using a non-volatile memory that is slower, but also less expensive, than conventional dynamic random-access memory (DRAM). Then, if the hardware controller is unable to retrieve data in the remote memory from the remote computer system within a specified time window due to, e.g., a network failure or other problem, the hardware controller can retrieve the data from the backup memory, thereby avoiding a hardware error condition (and potential application/system crash).
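
The read path described above can be approximated in software as follows. This Python sketch only mimics the control flow: the abstract describes a hardware controller, whereas here the "specified time window" is assumed to surface as a `TimeoutError` from the remote side, and `RemoteMemoryController` and `FakeRemote` are hypothetical names.

```python
class RemoteMemoryController:
    """Software stand-in for the controller that fronts remote memory."""

    def __init__(self, remote):
        self.remote = remote
        self.backup = {}  # stands in for the local non-volatile backup memory

    def write(self, addr, value):
        # Keep the local backup copy in step with remote memory.
        self.backup[addr] = value
        self.remote.write(addr, value)

    def read(self, addr):
        try:
            return self.remote.read(addr)
        except TimeoutError:
            # Remote data did not arrive in time: serve the backup copy
            # instead of surfacing an error to the application.
            return self.backup[addr]


class FakeRemote:
    """Hypothetical remote host; only reads are affected by an outage here."""

    def __init__(self):
        self.cells = {}
        self.reachable = True

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        if not self.reachable:
            raise TimeoutError("remote memory did not respond in time")
        return self.cells[addr]
```

Handling of failed remote writes is left out of this sketch.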

METERING FRAMEWORK FOR IMPROVING RESOURCE UTILIZATION FOR A DISASTER RECOVERY ENVIRONMENT
20230205653 · 2023-06-29

A framework is described that improves resource utilization during operations executing within workflows of a distributed data processing system (e.g., having a plurality of interconnected nodes) in a disaster recovery (DR) environment. The DR environment is configured to support synchronous and asynchronous (i.e., heterogeneous) DR workflows (e.g., generating snapshots and replicating data) that include synchronous replication, asynchronous replication, nearsync (i.e., short duration snapshots of metadata) replication and migration of data objects associated with the workflows for failover (e.g., replication and/or migration) to a secondary site in the event of failure of a primary site. The framework meters (regulates) execution of the operations directed to the workloads so as to efficiently use the resources in a manner that allows timely progress (completion) of certain (e.g., high-frequency) operations and reduction in blocking (stalling) of other (e.g., low-frequency) operations by avoiding unnecessary resource hoarding/consumption and contention. Notably, the framework also provides metering and tuning of properties during execution of the workflows and maintains their state to provide for recovery.

VIRTUAL MACHINE RECOVERY IN SHARED MEMORY ARCHITECTURE

Examples provide for virtual machine recovery using pooled memory. A shared partition is created on pooled memory accessible by a plurality of virtual machine hosts. A set of memory pages for virtual machines running on the hosts is moved to the shared partition. A master agent polls memory page tables associated with the plurality of hosts for write access. If the master agent obtains write access to a memory page table of a given host, the given host that previously held the write access is identified as a failed host or an isolated host. The virtual machines of the given host that are enabled to resume from pooled memory are respawned on a new host while maintaining memory state of the virtual machines using data within the pooled memory, including the virtual machine memory pages, memory page table, host profile data, and/or host-to-VM table data.
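
The polling step can be illustrated with a small Python sketch. Write access to each host's page table is modeled with a `threading.Lock` that a healthy host keeps held; this modeling choice and the function name are assumptions, since the abstract does not specify the locking mechanism.

```python
import threading


def poll_for_failed_hosts(page_table_locks):
    """Try to take write access to each host's page table without blocking.

    A healthy host is assumed to hold the lock on its own page table, so
    the master agent succeeding in acquiring it marks the host as failed
    or isolated. The master keeps the lock for the recovery that follows.
    """
    failed = []
    for host, lock in page_table_locks.items():
        if lock.acquire(blocking=False):
            failed.append(host)
    return failed
```

A caller would then respawn the virtual machines of each returned host on a new host, using the memory pages and tables that remain in pooled memory.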

FAULT TOLERANCE USING SHARED MEMORY ARCHITECTURE

Examples provide a fault tolerant virtual machine (VM) using pooled memory. When fault tolerance is enabled for a VM, a primary VM is created on a first host in a server cluster. A secondary VM is created on a second host in the server cluster. Memory for the VMs is maintained on a shared partition in pooled memory. The pooled memory is accessible to all hosts in the cluster. The primary VM has read and write access to the VM memory in the pooled memory. The secondary VM has read-only access to the VM memory. If the second host fails, a new secondary VM is created on another host in the cluster. If the first host fails, the secondary VM becomes the new primary VM and a new secondary VM is created on another host in the cluster.
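
The role changes described above can be summarized as a small state machine. The Python sketch below is illustrative only: the `FaultTolerantVM` class, its method names, and the policy of picking the next spare host in list order are assumptions, and the pooled memory itself is not modeled.

```python
class FaultTolerantVM:
    """Tracks which host runs the primary and which runs the secondary VM."""

    def __init__(self, hosts):
        if len(hosts) < 2:
            raise ValueError("fault tolerance needs at least two hosts")
        # First host runs the primary, second the secondary, rest are spares.
        self.primary, self.secondary, *self.spares = hosts

    def access_mode(self, host):
        """Access the VM on this host has to the VM memory in pooled memory."""
        if host == self.primary:
            return "read-write"
        if host == self.secondary:
            return "read-only"
        return None

    def host_failed(self, host):
        if host == self.primary:
            # The secondary is promoted and a fresh secondary is created.
            self.primary = self.secondary
            self.secondary = self._new_secondary()
        elif host == self.secondary:
            self.secondary = self._new_secondary()
        elif host in self.spares:
            self.spares.remove(host)

    def _new_secondary(self):
        # Assumed policy: take the next spare host, or None if none remain.
        return self.spares.pop(0) if self.spares else None
```

Because the VM memory lives in pooled memory reachable from every host, promotion in this model changes only access rights and roles; no memory is copied between hosts.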