G06F11/2043

CLUSTER WIDE REBUILD REDUCTION AGAINST STORAGE NODE FAILURES
20230013798 · 2023-01-19

Systems, apparatuses and methods may provide for technology that detects a first failure in a first storage server, wherein the first storage server is connected to a first non-volatile memory (NVM) via a switch, selects a second storage server that is connected to the first NVM via the switch, wherein the first storage server and the second storage server are in a storage cluster, and configures the second storage server to host first data resident on the first NVM, wherein configuring the second storage server to host the first data bypasses a cluster-wide rebalance of the storage cluster.
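The failover selection described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `StorageServer` class, the `fail_over` function, the string switch identifiers, and the least-loaded selection policy are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StorageServer:
    name: str
    switch: str                      # hypothetical switch identifier
    healthy: bool = True
    hosted_nvms: set = field(default_factory=set)


def fail_over(failed: StorageServer, nvm: str,
              cluster: List[StorageServer]) -> Optional[StorageServer]:
    """Move hosting of one NVM to a healthy peer on the same switch."""
    failed.healthy = False
    failed.hosted_nvms.discard(nvm)
    # Only peers behind the same switch can reach the NVM directly.
    candidates = [s for s in cluster if s.healthy and s.switch == failed.switch]
    if not candidates:
        return None  # no peer can reach the NVM without moving data
    # Assumed policy: pick the peer that currently hosts the fewest NVMs.
    target = min(candidates, key=lambda s: len(s.hosted_nvms))
    target.hosted_nvms.add(nvm)      # data stays in place on the NVM
    return target


a = StorageServer("a", "sw0", hosted_nvms={"nvm0"})
b = StorageServer("b", "sw0")
c = StorageServer("c", "sw1")
target = fail_over(a, "nvm0", [a, b, c])
```

In this sketch only the hosting assignment changes; servers on other switches are never touched, which is the sense in which a cluster-wide rebalance is bypassed.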

MEDIATOR ASSISTED SWITCHOVER BETWEEN CLUSTERS

Techniques are provided for metadata management for enabling automated switchover. An initial quorum vote may be performed before a node executes an operation associated with metadata comprising operational information and switchover information. After the initial quorum vote is performed, the node executes the operation upon one or more mailbox storage devices. Once the operation has executed, a final quorum vote is performed. The final quorum vote and the initial quorum vote are compared to determine whether the operation is to be designated as successful or failed, and whether any additional actions are to be performed.
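The vote, execute, vote sequence can be sketched as below. The majority rule, the mailbox dictionaries, and the three outcome labels are assumptions made for illustration; the abstract does not specify how the two votes are compared.

```python
def has_quorum(mailboxes):
    """Assumed rule: quorum means a strict majority of mailboxes is reachable."""
    reachable = sum(1 for box in mailboxes if box["reachable"])
    return reachable > len(mailboxes) // 2


def run_with_quorum(mailboxes, operation):
    """Vote, run the operation on the mailboxes, vote again, compare."""
    initial = has_quorum(mailboxes)
    operation(mailboxes)
    final = has_quorum(mailboxes)
    if initial and final:
        return "success"
    if not initial and not final:
        return "failed"
    return "needs_followup"  # votes disagree: assumed to need extra action


def write_metadata(mailboxes):
    # Hypothetical operation: record switchover information where possible.
    for box in mailboxes:
        if box["reachable"]:
            box["metadata"] = {"switchover_allowed": True}


boxes = [{"reachable": True}, {"reachable": True}, {"reachable": False}]
result = run_with_quorum(boxes, write_metadata)
```

Comparing the two votes catches the case where quorum changes while the operation is in flight, which a single vote would miss.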

Memory scanning operation in response to common mode fault signal

An apparatus comprises a plurality of redundant processing units to perform data processing redundantly in lockstep; common mode fault detection circuitry to detect an event indicative of a potential common mode fault affecting each of the plurality of redundant processing units; a memory shared between the plurality of redundant processing units; and memory checking circuitry to perform a memory scanning operation to scan at least part of the memory for errors; in which the memory checking circuitry performs the memory scanning operation in response to a common mode fault signal generated by the common mode fault detection circuitry indicating that the event indicative of a potential common mode fault has been detected.

Method and apparatus for performing node information exchange management of all flash array server
11636012 · 2023-04-25

A method and apparatus for performing node information exchange management of an all flash array (AFA) server are provided. The method may include: utilizing a hardware manager module among multiple program modules running on any node of multiple nodes of the AFA server to control multiple hardware components in a hardware layer of that node, for establishing a Board Management Controller (BMC) path between that node and a remote node among the multiple nodes; utilizing at least two communications paths to exchange respective node information of that node and the remote node, to control a high availability (HA) architecture of the AFA server according to the respective node information, for continuously providing a service to a user of the AFA server; and in response to a malfunction of any communications path, utilizing the remaining communications path(s) to exchange the node information.
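The fallback behavior across communications paths can be sketched as follows. The path names, the callable-per-path model, and the use of `ConnectionError` to signal a malfunction are assumptions for illustration; only the BMC path is named in the abstract.

```python
def exchange_node_info(paths, info):
    """Send node info over every path; return the names of paths that worked."""
    delivered = []
    for name, send in paths.items():
        try:
            send(info)
            delivered.append(name)
        except ConnectionError:
            continue  # malfunctioning path: rely on the remaining ones
    if not delivered:
        raise ConnectionError("all communications paths failed")
    return delivered


received = []


def bmc_send(info):
    # Stand-in for delivery over the BMC path.
    received.append(("bmc", info))


def broken_send(info):
    # Stand-in for a second, hypothetical path that has malfunctioned.
    raise ConnectionError("link down")


paths = {"primary": broken_send, "bmc": bmc_send}
used = exchange_node_info(paths, {"node": "A", "role": "active"})
```

Here the remote node still receives the information over the BMC path even though the other path is down.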

DATA TRANSMISSION METHOD AND ELECTRONIC CHIP OF THE MANYCORE TYPE

A method for transmitting data between functions implemented on a first electronic chip of the manycore type. The first electronic chip includes a plurality of execution cores, the execution cores being grouped in clusters, the clusters being interconnected by at least two communication systems. The data transmission method includes the steps of: implementing a first function on a first cluster; implementing a second function on a second cluster, characterised in that the second function is also implemented on a third cluster distinct from the first and second clusters; and transmitting at least one data item between the first function and the second function.

HIGH-AVAILABILITY CLOUD-BASED AUTOMATION SOLUTION WITH OPTIMIZED TRANSMISSION TIMES

The real-time capability of a cloud-based control system for an automation plant is to be improved. To this end, a redundant cloud-based control system is proposed, having a plurality of computing resources distributed over a network with control applications running on them. Configured as a primary and backups, the control applications execute a control program almost simultaneously and send corresponding program instructions to the automation plant. Long transmission times of individual computing resources therefore do not have a negative effect on the control of the automation plant.

VIRTUAL MACHINE RECOVERY IN SHARED MEMORY ARCHITECTURE

Examples provide for virtual machine recovery using pooled memory. A shared partition is created on pooled memory accessible by a plurality of virtual machine hosts. A set of memory pages for virtual machines running on the hosts is moved to the shared partition. A master agent polls memory page tables associated with the plurality of hosts for write access. If the master agent obtains write access to a memory page table of a given host, the given host that previously held the write access is identified as a failed host or an isolated host. The virtual machines of the given host that are enabled to resume from pooled memory are respawned on a new host while maintaining the memory state of the virtual machines using data within the pooled memory, including the virtual machine memory pages, memory page table, host profile data, and/or host-to-VM table data.
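The polling-based failure detection can be sketched as below. The `PageTable` class, the `MASTER` identifier, and modeling write access as a single `writer` field are assumptions for illustration; a real system would use a lock in pooled memory rather than a Python attribute.

```python
MASTER = "master-agent"  # hypothetical identifier for the master agent


class PageTable:
    def __init__(self, host):
        self.host = host
        self.writer = host  # a live host holds write access to its own table

    def try_acquire(self, agent):
        """Succeeds only if no one currently holds write access."""
        if self.writer is None:
            self.writer = agent
            return True
        return False


def poll_once(tables, host_to_vms, new_host):
    """Poll every page table; respawn VMs of hosts that lost write access."""
    moved = []
    for table in tables:
        if table.try_acquire(MASTER):
            # Obtaining write access means the host is failed or isolated.
            vms = host_to_vms.pop(table.host, [])
            host_to_vms.setdefault(new_host, []).extend(vms)
            moved.extend(vms)
    return moved


tables = [PageTable("host1"), PageTable("host2")]
host_to_vms = {"host1": ["vm-a"], "host2": ["vm-b", "vm-c"]}
tables[1].writer = None  # simulate host2 dropping its write access
moved = poll_once(tables, host_to_vms, "host3")
```

Because the master agent keeps the write access it obtained, a second poll does not move the same virtual machines again.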

FAULT TOLERANCE USING SHARED MEMORY ARCHITECTURE

Examples provide a fault tolerant virtual machine (VM) using pooled memory. When fault tolerance is enabled for a VM, a primary VM is created on a first host in a server cluster. A secondary VM is created on a second host in the server cluster. Memory for the VMs is maintained on a shared partition in pooled memory. The pooled memory is accessible to all hosts in the cluster. The primary VM has read and write access to the VM memory in the pooled memory. The secondary VM has read-only access to the VM memory. If the second host fails, a new secondary VM is created on another host in the cluster. If the first host fails, the secondary VM becomes the new primary VM and a new secondary VM is created on another host in the cluster.
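The two failure cases in this abstract amount to a small state transition, sketched here. The dictionary representation of the VM pair and the list of spare hosts are assumptions for illustration.

```python
def handle_host_failure(pair, failed_host, spare_hosts):
    """Update a primary/secondary VM placement after a host failure.

    pair maps "primary" and "secondary" to host names; spare_hosts is a
    list of other hosts in the cluster that can take a new secondary VM.
    """
    if failed_host == pair["primary"]:
        # The secondary is promoted and gains read/write access to VM memory.
        pair["primary"] = pair["secondary"]
        pair["secondary"] = spare_hosts.pop(0)
    elif failed_host == pair["secondary"]:
        # The primary is unaffected; only a new read-only secondary is needed.
        pair["secondary"] = spare_hosts.pop(0)
    return pair


pair = {"primary": "h1", "secondary": "h2"}
spares = ["h3", "h4"]
handle_host_failure(pair, "h1", spares)   # primary host fails
after_primary_failure = dict(pair)
handle_host_failure(pair, "h3", spares)   # new secondary host fails
```

No memory is copied in either case, since the VM memory stays on the shared partition in pooled memory.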

METHOD AND APPARATUS FOR FAILOVER PROCESSING
20170364423 · 2017-12-21

Embodiments of the present disclosure provide a method and apparatus for failover. In one embodiment, a method is provided that is implemented at a first node in a cluster comprising a plurality of heterogeneous nodes. The method comprises: determining whether an application at a second node in the cluster has failed; and in response to determining that the application has failed, causing migration of data and services associated with the application from the second node to a third node in the cluster, the migration involving at least one node heterogeneous to the second node in the cluster. The present disclosure further provides a method implemented at the third node in the cluster and corresponding devices and computer program products.
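The detection and migration step, as seen from the first node, can be sketched as follows. The node dictionaries, the `kind` field with values such as "physical" and "virtual", and the first-available target choice are assumptions for illustration; the abstract does not say how the third node is chosen.

```python
def migrate_on_failure(app, nodes, app_failed):
    """If the app has failed, move it to another live node and describe the move."""
    if not app_failed(app):
        return None
    source = app["node"]
    for name, node in nodes.items():
        if name != source and node["up"]:
            app["node"] = name  # data and services now live on the target
            return {
                "from": source,
                "to": name,
                # True when the target differs in kind from the source node.
                "heterogeneous": node["kind"] != nodes[source]["kind"],
            }
    return None  # no live target available


nodes = {
    "n2": {"kind": "physical", "up": False},
    "n3": {"kind": "virtual", "up": True},
}
app = {"name": "db", "node": "n2"}
plan = migrate_on_failure(app, nodes, lambda a: True)
```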

CORE PAIRING IN MULTICORE SYSTEMS
20170364421 · 2017-12-21

A method, executed by a computer, includes pairing a first core with a second core to form a first core group, wherein each core of the group has a plurality of functional units, transferring instructions received by the first core to the second core for execution via a first inter-core communication bus, and executing the instructions on the second core. A computer system and computer program product corresponding to the above method are also disclosed herein.