G06F11/2007

Tiering Valid Data after a Disaster Recovery Operation

Staging data on a storage element integrating fast durable storage and bulk durable storage, including: receiving, at a storage element integrating fast durable storage and bulk durable storage, a data storage operation from a host computer; storing data corresponding to the data storage operation within fast durable storage in accordance with a first data resiliency technique; and responsive to detecting a condition for transferring data between fast durable storage and bulk durable storage, transferring the data from fast durable storage to bulk durable storage in accordance with a second data resiliency technique.

Communications Method and Related Apparatus
20230016684 · 2023-01-19 ·

A PCIe-based communications system includes a first processor and a plurality of switches, the plurality of switches include a first switch and a second switch, a first link exists between the first processor and the first switch, a second link exists between the first switch and the second switch, and a first standby link is configured between the first processor and the second switch. If the first link and the second link are not faulty, communicating, by the first processor, with the second switch through the first link and the second link; or if the first link or the second link is faulty, activating the first standby link, and communicating, by the first processor, with the second switch through the activated first standby link. Thereby stability of the communications system can be improved.

SITE LOCALITY SUPPORT FOR FILE SERVICES IN A STRETCHED CLUSTER ENVIRONMENT
20230021195 · 2023-01-19 · ·

The location of resources for file services are located within the same site, thereby eliminating or reducing performance issues caused by cross-site accesses in a stretched cluster environment. A file server placement algorithm initially places file servers at a site based at least in part on host workload and affinity settings, and can perform failover to move the file servers to a different location (e.g., to a different host on the same site or to another site) in the event of a failure of the host where the file servers were initially placed. File servers may be co-located with clients at a location based on client latencies and site workload. Failover support is also provided in the event that the sites in the stretched cluster have different subnet addresses.

CLUSTER WIDE REBUILD REDUCTION AGAINST STORAGE NODE FAILURES
20230013798 · 2023-01-19 ·

Systems, apparatuses and methods may provide for technology that detects a first failure in a first storage server, wherein the first storage server is connected to a first non-volatile memory (NVM) via a switch, selects a second storage server that is connected to the first NVM via the switch, wherein the first storage server and the second storage server are in a storage cluster, and configures the second storage server to host first data resident on the first NVM, wherein configuring the second storage server to host the first data bypasses a cluster-wide rebalance of the storage cluster.

SELECTING INTERFACES FOR DEVICE-GROUP IDENTIFIERS

In one embodiment, a computer networking device calculates a first hash value for an identifier of a group of computing devices, as well as a second hash value for the identifier of the group of computing devices, with each hash value being at least in part on the identifier of the group of computing devices and an identifier of the respective interface. The computer networking device may also analyze the first hash value with respect to the second hash value and select the first interface for association with the identifier of the group of computing devices based at in part on the analyzing. The computer networking device may further store an indication that the identifier of the group of computing devices is associated with the first interface.

PERIPHERAL COMPONENT INTERCONNECT EXPRESS DEVICE AND OPERATING METHOD THEREOF
20220374319 · 2022-11-24 ·

A Peripheral Component Interconnect Express (PCIe) device includes a plurality of lanes comprising a plurality of ports, a link controller setting a link including the plurality of lanes, wherein the link is set to have a link width that includes the remaining of lanes, except for a fail lane from among the plurality of lanes, and an EQ controller performing an equalization operation for determining a transmitter or receiver setting of each of the remaining lanes, wherein the EQ controller determining a final EQ coefficient using a log information and an error information.

PERIPHERAL COMPONENT INTERCONNECT EXPRESS DEVICE AND COMPUTING SYSTEM INCLUDING THE SAME
20220374384 · 2022-11-24 · ·

A PCIe device setting, when a fail lane is detected during a link setting operation, a link by using remaining lanes according to the present disclosure includes a plurality of lanes comprising a plurality of ports, and a link controller setting a link including the plurality of lanes, wherein the link is set to have a link width that includes the plurality of lanes, except for a fail lane from among the plurality of lanes, wherein the fail lane from among the plurality of lanes has a state in which the fail lane is unable to form a link with remaining lanes that have not failed from among the plurality of lanes.

Staging data within a unified storage element

Staging data on a storage element integrating fast durable storage and bulk durable storage, including: receiving, at a storage element integrating fast durable storage and bulk durable storage, a data storage operation from a host computer; storing data corresponding to the data storage operation within fast durable storage in accordance with a first data resiliency technique; and responsive to detecting a condition for transferring data between fast durable storage and bulk durable storage, transferring the data from fast durable storage to bulk durable storage in accordance with a second data resiliency technique.

Managing replication of computing nodes for provided computer networks

Techniques are described for providing managed computer networks, such as for managed virtual computer networks overlaid on one or more other underlying computer networks. In some situations, the techniques include facilitating replication of a primary computing node that is actively participating in a managed computer network, such as by maintaining one or more other computing nodes in the managed computer network as replicas, and using such replica computing nodes in various manners. For example, a particular managed virtual computer network may span multiple broadcast domains of an underlying computer network, and a particular primary computing node and a corresponding remote replica computing node of the managed virtual computer network may be implemented in distinct broadcast domains of the underlying computer network, with the replica computing node being used to transparently replace the primary computing node in the virtual computer network if the primary computing node becomes unavailable.

High reliability fault tolerant computer architecture

A fault tolerant computer system and method are disclosed. The system may include a plurality of CPU nodes, each including: a processor and a memory; at least two IO domains, wherein at least one of the IO domains is designated an active IO domain performing communication functions for the active CPU nodes; and a switching fabric connecting each CPU node to each IO domain. One CPU node is designated a standby CPU node and the remainder are designated as active CPU nodes. If a failure, a beginning of a failure, or a predicted failure occurs in an active node, the state and memory of the active CPU node are transferred to the standby CPU node which becomes the new active CPU node. If a failure occurs in an active IO domain, the communication functions performed by the failing active IO domain are transferred to the other IO domain.