Patent classifications
G06F11/2053
Front End Traffic Handling In Modular Switched Fabric Based Data Storage Systems
Systems, methods, apparatuses, and software for data storage systems are provided herein. In one example, a data storage system is provided that includes storage drives each comprising a PCIe interface, and configured to store data and retrieve the data stored on associated storage media responsive to data transactions received over a switched PCIe fabric. The data storage system includes processors configured to each manage only an associated subset of the storage drives over the switched PCIe fabric. A first processor is configured to identify first data packets received over a network interface associated with the first processor within a network buffer of the first processor as comprising a storage operation associated with at least one of the plurality of storage drives managed by a second processor, and responsively transfer the first data packets into a network buffer of the second processor.
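The forwarding step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Processor` class, the `route_packet` function, and the dictionary packet layout are hypothetical names chosen for illustration, and in-memory queues stand in for the network buffers.

```python
from collections import deque


class Processor:
    """Hypothetical model of a processor that manages a subset of drives."""

    def __init__(self, name, drives):
        self.name = name
        self.drives = set(drives)      # drives managed only by this processor
        self.network_buffer = deque()  # stands in for the network buffer


def route_packet(receiver, processors, packet):
    """Leave the packet with the receiver or move it to the managing processor."""
    receiver.network_buffer.append(packet)  # packet first lands at the receiver
    if packet["drive"] in receiver.drives:
        return receiver
    owner = next((p for p in processors if packet["drive"] in p.drives), None)
    if owner is None:
        raise KeyError(f"no processor manages drive {packet['drive']!r}")
    # Transfer the packet into the managing processor's network buffer.
    receiver.network_buffer.remove(packet)
    owner.network_buffer.append(packet)
    return owner
```

Under these assumptions, a write for a drive managed by a second processor ends up in that processor's buffer, while a packet for a locally managed drive stays where it arrived.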
Maintaining A Synchronous Replication Relationship Between Two Or More Storage Systems
Maintaining a synchronous replication relationship between two or more storage systems, including: receiving, by at least one of a plurality of storage systems across which a dataset will be synchronously replicated, timing information for at least one of the plurality of storage systems; and establishing, based on the timing information, a synchronous replication lease describing a period of time during which the synchronous replication relationship is valid, wherein a request to modify the dataset may only be acknowledged after a copy of the dataset has been modified on each of the storage systems.
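A lease of this kind can be sketched as follows. The abstract does not say what the timing information contains, so the sketch assumes it reduces to a worst-case clock skew (`max_skew`); `Lease`, `establish_lease`, and `acknowledge_write` are hypothetical names, and plain dictionaries stand in for the storage systems.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    start: float     # local time at which the lease begins
    duration: float  # seconds for which the replication relationship is valid

    def valid_at(self, now):
        return self.start <= now < self.start + self.duration


def establish_lease(local_now, max_skew, nominal_duration):
    """Assumption: shorten the lease by the worst-case clock skew."""
    return Lease(start=local_now, duration=max(0.0, nominal_duration - max_skew))


def acknowledge_write(lease, now, replicas, key, value):
    """Acknowledge only after every copy is modified under a valid lease."""
    if not lease.valid_at(now):
        return False  # lease expired: do not modify or acknowledge
    for replica in replicas:
        replica[key] = value
    return True
```

Shrinking the lease by the skew is one conservative design choice; other ways of using the timing information would fit the abstract equally well.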
Managing storage systems that are synchronously replicating a dataset
Managing storage systems that are synchronously replicating a dataset, including: detecting a change in membership to the set of storage systems synchronously replicating the dataset; and applying one or more membership protocols to determine a new set of storage systems to synchronously replicate the dataset, wherein the one or more membership protocols include a quorum protocol, an external management protocol, or a racing protocol, and wherein one or more I/O operations directed to the dataset are applied to a new set of storage systems.
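Of the membership protocols named here, the quorum protocol is the simplest to illustrate. The sketch below assumes a strict-majority rule, which the abstract does not specify, and `quorum_survivors` is a hypothetical name.

```python
def quorum_survivors(all_members, reachable):
    """Return the new member set if the reachable systems form a strict
    majority of the previous membership, otherwise None."""
    members = set(all_members)
    survivors = set(reachable) & members
    if 2 * len(survivors) > len(members):
        return survivors
    return None  # no quorum: this side must not continue serving the dataset
```

With three systems, two that can still reach each other continue as the new set. With two systems, a single survivor has no majority, which is where an external management or racing protocol would be needed.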
Host-based error correction
Systems, apparatuses, and methods related to host-based error correction are described. Error correction operations can be performed on a host computing system as opposed to on a memory system. For instance, data containing erroneous bits can be transferred from a memory system to a host computing system and error correction operations can be performed using circuitry resident on the host computing system. In an example, a method can include receiving, by a host computing system, data that comprises a plurality of uncorrected bits from a memory system coupleable to the host computing system, determining an acceptable error range for the data based at least in part on an application associated with the data, and performing, using error correction logic resident on the host computing system, an initial error correction operation on the data based at least in part on the acceptable error range.
Targeted repair of hardware components in a computing device
A method for targeted repair of a hardware component in a computing device that is part of a cloud computing system includes monitoring a plurality of hardware components in the computing device. At some point, a defective sub-component within the hardware component of the computing device is identified. In addition to the defective sub-component, the hardware component also includes at least one sub-component that is functioning properly and a spare component that can be used in place of the defective sub-component. The method also includes initiating a targeted repair action while the computing device is connected to the cloud computing system. The targeted repair action prevents the defective sub-component from being used by the computing device without preventing sub-components that are functioning properly from being used by the computing device. The targeted repair action causes the spare component to be used in place of the defective sub-component.
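The repair action can be sketched as a remapping step. The dictionary layout with `active`, `spares`, and `disabled` keys and the function name `targeted_repair` are assumptions made for illustration only.

```python
def targeted_repair(component, defective):
    """Swap a spare in for one defective sub-component, leaving the
    properly functioning sub-components in use."""
    if defective not in component["active"]:
        raise ValueError(f"{defective!r} is not an active sub-component")
    if not component["spares"]:
        raise RuntimeError("no spare available")
    spare = component["spares"].pop(0)
    index = component["active"].index(defective)
    component["active"][index] = spare  # spare takes the defective slot
    component.setdefault("disabled", []).append(defective)
    return spare
```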
SYSTEMS AND METHODS TO FLUSH DATA IN PERSISTENT MEMORY REGION TO NON-VOLATILE MEMORY USING AUXILIARY PROCESSOR
A computing system that enables data stored in a persistent memory region to be preserved when a processor fails can include volatile memory comprising the persistent memory region, non-volatile memory, and a system on a chip (SoC). The SoC can include a main processor that is communicatively coupled to both the volatile memory and the non-volatile memory. The SoC can also include an auxiliary processor that is communicatively coupled to both the volatile memory and the non-volatile memory. The SoC can also include instructions that are executable by the auxiliary processor to cause the data in the persistent memory region of the volatile memory to be transferred to the non-volatile memory in response to a failure of the main processor.
Directing Operations To Synchronously Replicated Storage Systems
Managing connectivity to synchronously replicated storage systems, including: identifying a plurality of storage systems across which a dataset is synchronously replicated; identifying a host that can issue I/O operations directed to the dataset; identifying a plurality of data communications paths between the host and the plurality of storage systems across which a dataset is synchronously replicated; identifying, from amongst the plurality of data communications paths between the host and the plurality of storage systems across which a dataset is synchronously replicated, one or more optimal paths; and issuing, to the host, an identification of the one or more optimal paths.
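Path selection can be sketched as below. The abstract does not define "optimal", so this sketch assumes the metric is measured latency; the `latency_ms` field, the `tolerance_ms` parameter, and the function name `optimal_paths` are hypothetical.

```python
def optimal_paths(paths, tolerance_ms=0.0):
    """Return the ids of the paths whose latency is within tolerance_ms of
    the best path (assumption: lower latency means more optimal)."""
    if not paths:
        return []
    best = min(p["latency_ms"] for p in paths)
    return [p["id"] for p in paths if p["latency_ms"] <= best + tolerance_ms]
```

The returned identifiers correspond to what would be issued to the host as the identification of the optimal paths.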
COMPUTING DEVICE, REDUNDANT SYSTEM, PROGRAM, AND METHOD FOR CONSTRUCTING REDUNDANT CONFIGURATION
A computing device for configuring a redundant system includes: a detection unit configured to detect another computing device newly added to the redundant system during operation of the computing device; and a construction processing unit configured to execute construction processing for constructing a redundant configuration with the other computing device. The construction processing unit includes a first transmission unit for identifying construction start state information, and transmitting the construction start state information to the other computing device, and a second transmission unit for repeatedly performing identification of a change difference that arises in the state information as a result of computing processing of the computing device performed after identifying the construction start state information, from a time when the construction start state information was identified or a time when a previous timing was reached, and transmission of the change difference to the other computing device.
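The two transmission steps, an initial state transfer followed by repeated change differences, can be sketched with dictionaries standing in for the state information. The functions `change_difference` and `apply_difference` are hypothetical names, and the actual transmission is omitted.

```python
def change_difference(previous, current):
    """Identify what changed in the state since the previous checkpoint."""
    changed = {k: v for k, v in current.items()
               if k not in previous or previous[k] != v}
    removed = [k for k in previous if k not in current]
    return changed, removed


def apply_difference(state, changed, removed):
    """Bring the other device's copy up to date with one change difference."""
    state.update(changed)
    for key in removed:
        state.pop(key, None)
```

In use, the running device would send a full copy once, keep a checkpoint, and at each later timing send only the result of `change_difference(checkpoint, state)` before refreshing the checkpoint.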
METHOD, ELECTRONIC DEVICE, AND PROGRAM PRODUCT FOR FAILURE HANDLING
Embodiments of the present disclosure relate to a method, an electronic device, and a computer program product for failure handling. This failure handling method includes determining a sector set failure type associated with at least one failed sector set of a disk; if the sector set failure type indicates that the number of failed sector sets in the at least one failed sector set is greater than a first threshold number, generating an instruction for replacing the disk; and otherwise performing at least one of the following: migrating data from a failed sector set in which the number of failed sectors is greater than a second threshold number to a spare sector set, and performing a failure recovery for a failed sector set in which the number of failed sectors is less than or equal to the second threshold number.
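The two-threshold decision described in this abstract can be sketched as follows. The input layout (a mapping from sector-set id to its count of failed sectors), the action labels, and the name `handle_failure` are assumptions made for illustration.

```python
def handle_failure(failed_sector_sets, first_threshold, second_threshold):
    """failed_sector_sets maps a sector-set id to its number of failed sectors.

    Returns a list of (action, sector_set_id) pairs.
    """
    # Too many failed sector sets: replace the whole disk.
    if len(failed_sector_sets) > first_threshold:
        return [("replace_disk", None)]
    actions = []
    for set_id, failed_sectors in failed_sector_sets.items():
        if failed_sectors > second_threshold:
            actions.append(("migrate_to_spare", set_id))  # badly damaged set
        else:
            actions.append(("recover", set_id))           # lightly damaged set
    return actions
```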