G06F11/1423

Managing datapath validation on per-transaction basis

A technique for managing a datapath of a data storage system includes receiving a request to access target data and creating a transaction that includes multiple datapath elements in a cache, where the datapath elements are used for accessing the target data. In response to detecting that one of the datapath elements is invalid, the technique further includes processing the transaction in a rescue mode. The rescue mode attempts to replace each invalid datapath element of the transaction with a valid version thereof obtained from elsewhere in the data storage system. The technique further includes committing the transaction as processed in the rescue mode.
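
The rescue-mode flow described in this abstract can be sketched as follows. This is a minimal illustration only, not the patented implementation: the names `Element`, `Transaction`, `process`, and `backup_store` are hypothetical, and a plain dictionary stands in for "elsewhere in the data storage system".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Element:
    """Hypothetical datapath element held in cache."""
    key: str
    data: bytes
    valid: bool = True


@dataclass
class Transaction:
    """Hypothetical transaction grouping the elements needed for one access."""
    elements: List[Element]
    rescue_mode: bool = False
    committed: bool = False


def process(txn: Transaction, backup_store: Dict[str, Element]) -> Transaction:
    """Process a transaction, switching to rescue mode if any element is invalid.

    backup_store is an assumed stand-in for another location in the storage
    system that may hold a valid version of an element.
    """
    if any(not e.valid for e in txn.elements):
        txn.rescue_mode = True
        for i, element in enumerate(txn.elements):
            if element.valid:
                continue
            # Attempt the replacement; leave the element as-is if no valid copy exists.
            replacement = backup_store.get(element.key)
            if replacement is not None and replacement.valid:
                txn.elements[i] = replacement
    # The transaction is committed as processed, whether or not rescue mode ran.
    txn.committed = True
    return txn
```

Because validation is scoped to the elements of one transaction, only the data needed for the current request is checked and repaired.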

Fault tolerant memory systems and components with interconnected and redundant data interfaces
11709736 · 2023-07-25

A memory system includes dynamic random-access memory (DRAM) components that include interconnected and redundant component data interfaces. The redundant interfaces facilitate memory interconnect topologies that accommodate considerably more DRAM components per memory channel than do traditional memory systems, and thus offer considerably more memory capacity per channel, without concomitant reductions in signaling speeds. The memory components can be configured to route data around defective data connections to maintain full capacity and continue to support memory transactions.

Integrated circuit and address mapping method for cache memory

An integrated circuit (IC) is provided. The IC includes a cache memory divided into a plurality of groups and an address decoder. The groups are assigned in rotation for a plurality of time periods. Each group is assigned in a corresponding single one of the time periods. The address decoder is configured to obtain a set address according to an access address and provide a physical address according to the set address. When the access address corresponds to a first group, the physical address is different from the set address. When the access address corresponds to the groups other than the first group, the physical address is the same as the set address. The sets of the first group that is assigned in a first time period are not overlapping with the sets of other first groups assigned in the time periods other than the first time period.

SYSTEMS AND METHODS FOR DETECTION OF DEGRADATION OF A VIRTUAL DESKTOP ENVIRONMENT

Described embodiments provide systems and methods for detection of the degradation of a virtual desktop environment. A computing device may receive data from a plurality of client devices. The computing device may identify a subset of client devices from the plurality of client devices with at least one characteristic in common based on the received data. The computing device may determine a ratio of the identified subset of client devices, the ratio being a comparison of client devices of the subset with a value above a first threshold to a total number of client devices of the subset, and the value being indicative of a characteristic of performance for that client device. The computing device may identify a cause of an anomaly in the performance of the application based on the ratio exceeding a second threshold.
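
The two-threshold check in this abstract can be illustrated with a short sketch. The function name `find_degraded_groups`, the report fields `characteristic` and `value`, and the sample thresholds are assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def find_degraded_groups(
    reports: Iterable[Mapping],
    value_threshold: float,
    ratio_threshold: float,
) -> Dict[str, float]:
    """Group client reports by a shared characteristic and flag anomalous groups.

    Each report is assumed to look like {"characteristic": "gw-a", "value": 300},
    where "value" is some per-client performance indicator (for example latency).
    """
    groups = defaultdict(list)
    for report in reports:
        groups[report["characteristic"]].append(report["value"])

    flagged = {}
    for characteristic, values in groups.items():
        # Ratio of clients above the first threshold to all clients in the subset.
        ratio = sum(v > value_threshold for v in values) / len(values)
        # The shared characteristic is treated as a likely cause when the
        # ratio exceeds the second threshold.
        if ratio > ratio_threshold:
            flagged[characteristic] = ratio
    return flagged
```

For example, if two of three clients behind the same gateway report a value above the first threshold, the ratio is 2/3, and a second threshold of 0.5 would flag that gateway.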

Input-output path selection using switch topology information

Switch topology-aware path selection in an information processing system is provided. For example, an apparatus comprises a host device comprising a processor coupled to a memory. The host device is configured to communicate with a storage system over a network with a plurality of switches. The host device is further configured to obtain topology information associated with the plurality of switches in the network, and select a path from the host device to the storage system through one or more of the plurality of switches based at least in part on the obtained topology information.
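
One simple way a host could use switch topology information is to pick the path with the fewest hops. The sketch below assumes the topology is available as an adjacency mapping and uses breadth-first search; the abstract does not state which selection criterion is used, so both the `select_path` function and the fewest-hops rule are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence


def select_path(
    topology: Dict[str, Sequence[str]], host: str, storage: str
) -> Optional[List[str]]:
    """Return a fewest-hop path from host to storage, or None if unreachable.

    topology maps each node (host, switch, or storage port) to its neighbors.
    """
    previous = {host: None}  # also serves as the visited set
    queue = deque([host])
    while queue:
        node = queue.popleft()
        if node == storage:
            # Walk the predecessor chain back to the host.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in topology.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None
```

A real multipath driver would likely weigh other topology attributes as well, such as shared switches between paths or link speed.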

SITE LOCALITY SUPPORT FOR FILE SERVICES IN A STRETCHED CLUSTER ENVIRONMENT
20230021195 · 2023-01-19

Resources for file services are located within the same site, thereby eliminating or reducing performance issues caused by cross-site accesses in a stretched cluster environment. A file server placement algorithm initially places file servers at a site based at least in part on host workload and affinity settings, and can perform failover to move the file servers to a different location (e.g., to a different host on the same site or to another site) in the event of a failure of the host where the file servers were initially placed. File servers may be co-located with clients at a location based on client latencies and site workload. Failover support is also provided in the event that the sites in the stretched cluster have different subnet addresses.

Automating the failover of a relational database in a cloud computing environment

Described herein is a method, system, and non-transitory computer readable medium for helping customers access data through an application from a replica database, detecting whether the replica database, the zone of availability of the replica database, or the geographical region encompassing the zone of availability is experiencing an outage or other failure, and re-routing traffic to a backup replica database accordingly. To assess the status of the database, metrics are pushed in a secure manner from a private subnet to a public-facing monitoring agent, achieving a clear segregation of private-subnet and public-facing components. Further, circuit-breaker logic is included for preventing failures while DNS addresses are updated during the re-routing process.

Cell-based backup for recovering from system failures in a multi-tenant computing system
11700556 · 2023-07-11

A multi-tenant computing system provides services to a number of different tenant organizations. To address the problem of failure of portions of the system, the hardware infrastructure of the system is located at a number of different geographical locations. The various tenants are assigned to one of a set of “cells,” each cell corresponding to one of the geographical locations. Additionally, each cell has another one of the cells assigned to it as a backup cell, and the data of each cell is replicated within its assigned backup cell. At system run time, if a failure is detected within one of the cells, network redirection is used within the multi-tenant system to reflect that the backup cell for the failing cell is now handling requests for the failing cell. Upon determination that the failing cell has been repaired and is again functioning correctly, the network redirection is no longer employed, such that the (formerly) failing cell again handles its own requests.
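
The redirect-and-restore behavior in this abstract can be sketched with a small routing table. The `CellRouter` class and its method names are hypothetical, and data replication between a cell and its backup is assumed to happen elsewhere.

```python
from typing import Dict, Set


class CellRouter:
    """Hypothetical router that redirects requests for a failed cell to its backup."""

    def __init__(self, backup_of: Dict[str, str]) -> None:
        # backup_of maps each cell to the cell assigned as its backup.
        self.backup_of = dict(backup_of)
        self.failed: Set[str] = set()

    def mark_failed(self, cell: str) -> None:
        """Record a detected failure so that redirection takes effect."""
        self.failed.add(cell)

    def mark_repaired(self, cell: str) -> None:
        """Stop redirecting once the cell is functioning correctly again."""
        self.failed.discard(cell)

    def route(self, tenant_cell: str) -> str:
        """Return the cell that should handle a request for tenant_cell."""
        if tenant_cell in self.failed:
            return self.backup_of[tenant_cell]
        return tenant_cell
```

This sketch does not cover the case where a cell and its backup fail at the same time; the abstract does not say how that situation is handled.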

Highspeed shared-memory optical network interfaces and topology

Examples herein include a computer system and methods. Some computer systems comprise two or more devices (each device comprises at least one processing circuit), where each computing device comprises or is communicatively coupled to one or more optical network interface controller (O-NIC) cards. Each O-NIC card comprises at least two bidirectional optical channels to transmit data and to receive additional data from each O-NIC card communicatively coupled to a device, over a channel. The system also includes one or more interfaces and a memory. Program instructions execute a method on one or more processors in communication with a memory, and the method includes modifying, during runtime of at least one application, a pairing over a given bidirectional optical channel of an interface of the interfaces to a given device.

System and method for improved fault tolerance in a network cloud environment

Described herein are systems and methods for fault tolerance in a network cloud environment. In accordance with various embodiments, the present disclosure improves the fault tolerance of systems by way of failure prediction, that is, predicting when an underlying infrastructure will fail, and using the predictions to counteract the failure by spinning up or otherwise providing new component pieces to compensate for it.