G06F11/1428

RAID system with fault resilient storage devices

A storage system, and a method for operating a storage system. In some embodiments, the system includes a first storage device and a second storage device, and the method includes: determining that the first storage device is in a read-only state and that the second storage device is in a read-write state; performing a write operation, of a first stripe, to the storage system; performing a first read operation, of a second stripe, from the storage system; and performing a second read operation, of the first stripe, from the storage system, wherein: the performing of the write operation includes: writing a portion of the first stripe to the second storage device, and making an entry in a mapping table for the first stripe.
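
The write and read paths in this abstract can be pictured with a small sketch. The class names, the two-part stripe layout, and the idea of a "rescue" area on the read-write device that holds the portion that would have gone to the read-only device are all illustrative assumptions; the abstract states only that a portion of the stripe is written to the second device and that a mapping-table entry is made.

```python
class Device:
    """Hypothetical storage device; not taken from the patent text."""
    def __init__(self, read_only=False):
        self.read_only = read_only
        self.strips = {}   # stripe_id -> bytes at the normal location
        self.rescue = {}   # stripe_id -> bytes redirected here (assumed)


class Array:
    """Two-device array with a mapping table for redirected stripes."""
    def __init__(self, first, second):
        self.first, self.second = first, second
        self.mapping = {}  # stripe_id -> device holding the redirected part

    def write_stripe(self, stripe_id, part_a, part_b):
        if self.first.read_only and not self.second.read_only:
            # The first device cannot accept writes: place both parts on
            # the second device and record the stripe in the mapping table.
            self.second.strips[stripe_id] = part_b
            self.second.rescue[stripe_id] = part_a
            self.mapping[stripe_id] = "second"
        else:
            self.first.strips[stripe_id] = part_a
            self.second.strips[stripe_id] = part_b

    def read_stripe(self, stripe_id):
        if stripe_id in self.mapping:
            # Stripe written while the first device was read-only.
            return self.second.rescue[stripe_id] + self.second.strips[stripe_id]
        # Stripe written earlier: the read-only device can still serve it.
        return self.first.strips[stripe_id] + self.second.strips[stripe_id]
```

In this sketch a stripe written before the fault is still read from both devices, while a stripe written afterwards is located through the mapping table.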

Transmission link testing
11748220 · 2023-09-05

A computing system can comprise a processing resource and a memory device coupled together via a first transmission link. The processing resource can be configured to test the first transmission link in response to the memory device failing to execute a command by sending the command to the memory device again for retry and monitoring the first transmission link for signals that indicate whether the command was executed by the memory device.

Flexible bus management

Methods, systems, and devices for flexible bus management are described. A memory device may transfer data between the memory device and another device (e.g., host device) using a bus including a plurality of data pins. The memory device may transfer data according to a first bus configuration (e.g., according to a first width corresponding to using all of the data pins). After receiving an indication to adjust the configuration, the memory device may adjust the first bus configuration to a second bus configuration where the bus operates according to a second width (e.g., using a subset of the data pins). The memory device may adjust the bus width between the other device and the memory device without adjusting an internal bus width of the memory device (e.g., internal busses that transfer data from the data pins to various components within the memory device).
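
The effect of narrowing the external bus while leaving the internal bus alone can be illustrated as below. The class, the pin counts, and the "beats" measure are assumptions chosen for the example, not figures from the abstract.

```python
import math


class MemoryBus:
    """Hypothetical model of an external data bus with adjustable width."""
    def __init__(self, data_pins=16, internal_width=128):
        self.data_pins = data_pins            # pins physically available
        self.active_pins = data_pins          # first configuration: all pins
        self.internal_width = internal_width  # never changed by adjust()

    def adjust(self, active_pins):
        # Second configuration: operate on a subset of the data pins.
        if not 1 <= active_pins <= self.data_pins:
            raise ValueError("active_pins must be within the available pins")
        self.active_pins = active_pins

    def beats_for(self, n_bits):
        # Transfers needed on the external bus to move n_bits.
        return math.ceil(n_bits / self.active_pins)
```

Halving the active pins doubles the number of external transfers for the same payload, while `internal_width` stays at its original value.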

SYSTEM AND DEVICE FOR DATA RECOVERY FOR EPHEMERAL STORAGE
20230251931 · 2023-08-10

In various embodiments, a method for page cache management is described. The method can include: identifying a storage device fault associated with a fault-resilient storage device; determining that a first region associated with the fault-resilient storage device comprises an inaccessible space and that a second region associated with the fault-resilient storage device comprises an accessible space; identifying a read command at the second storage device for the data and determining, based on the read command, first data requested by a read operation from a local memory of the second storage device; determining, based on the read command, second data requested by the read operation from the second region; retrieving the second data from the second region; and scheduling a transmission of the second data from the fault-resilient storage device to the second storage device.

Managing storage reduction and reuse in the presence of storage device failures
11650881 · 2023-05-16

A system and method for managing a reduction in capacity of a memory sub-system. An example method involving a host system includes: determining, by the host system, that a failure affects a storage capacity of a memory sub-system, wherein the memory sub-system comprises stored data of a storage structure; instructing, by the host system, the memory sub-system to operate at a reduced capacity and to retain the stored data of the storage structure; receiving, by the host system, a set of storage units of the memory sub-system that are affected by the failure; and recovering, by the host system, data that was in the set of storage units affected by the failure.

Fault resilient storage device

A storage device, and a method for operating a storage device. In some embodiments, the storage device includes storage media, and the method includes: determining, by the storage device, that the storage device is in a first fault state from which recovery is possible by power cycling the storage device or by formatting the storage media; determining, by the storage device, that the storage device is in a second fault state from which partial recovery is possible by operating the storage device with reduced performance, with reduced capacity, or in a read-only mode; and operating the storage device with reduced performance, with reduced capacity, or in the read-only mode.
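
The two fault classes described here can be sketched as a small decision function. The enum names and the returned action strings are hypothetical labels; the abstract names the recovery options but not how a device chooses among them.

```python
from enum import Enum, auto


class FaultState(Enum):
    NONE = auto()
    RECOVERABLE = auto()            # first fault state in the abstract
    PARTIALLY_RECOVERABLE = auto()  # second fault state in the abstract


class Mode(Enum):
    NORMAL = auto()
    REDUCED_PERFORMANCE = auto()
    REDUCED_CAPACITY = auto()
    READ_ONLY = auto()


def next_step(state, degraded_mode=Mode.READ_ONLY):
    """Return (action, operating mode) for a fault state (assumed policy)."""
    if state is FaultState.RECOVERABLE:
        # Recovery is possible by power cycling or formatting the media.
        return "power_cycle_or_format", Mode.NORMAL
    if state is FaultState.PARTIALLY_RECOVERABLE:
        if degraded_mode is Mode.NORMAL:
            raise ValueError("partial recovery requires a degraded mode")
        # Keep operating, but with reduced performance, reduced capacity,
        # or in read-only mode.
        return "continue_degraded", degraded_mode
    return "none", Mode.NORMAL
```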

FAULT ISOLATION SYSTEM, METHOD AND PROGRAM
20220261319 · 2022-08-18

The configuration information generation unit 101, when given a set of constituent requirements, generates configuration information representing the system by repeating the operation of replacing a constituent requirement with a set of more concrete constituent requirements according to a replacement rule. The verification program generation unit 106 generates, for each constituent requirement, a verification program for verifying whether the parts in the system corresponding to that constituent requirement in the configuration information are normal. The verification program execution unit 107 causes the system to execute the verification program. The fault isolation unit 109 separates the part of the system where a fault may have occurred from the part where no fault has occurred, according to whether the verification program executed successfully.

METHODS AND SYSTEMS FOR PREVENTING HANGUP IN A POST ROUTINE FROM FAULTY BIOS SETTINGS
20220269565 · 2022-08-25

A system and method for preventing a hang up after initiation of a watch dog time out in a computer system. A start-up routine is run via a basic input output system (BIOS). The routine applies settings for hardware components. It is determined if a watch dog timer triggered a restart from timing out when the start-up routine ran previously. The system checks a database storing settings for each of the plurality of hardware components for a proper setting for the hardware components if the watch dog timer triggered the restart. The system applies the settings from the database for the hardware components to avoid another hang up.
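
The settings fallback can be sketched as a single selection step. The function name, the dictionary layout, and the component keys are assumptions made for illustration; the abstract says only that a database of proper settings is consulted when the watchdog timer caused the previous restart.

```python
def select_settings(requested, known_good, watchdog_restart):
    """Pick the settings to apply during the start-up routine.

    requested:  component -> setting that the routine would normally apply
    known_good: component -> proper setting from the database (assumed shape)
    """
    if not watchdog_restart:
        # Previous boot did not time out: apply the requested settings.
        return dict(requested)
    # Previous boot hung: prefer the database value for each component,
    # keeping the requested value only where the database has no entry.
    return {comp: known_good.get(comp, value) for comp, value in requested.items()}
```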

MANAGING CAPACITY REDUCTION DUE TO STORAGE DEVICE FAILURE
20220300376 · 2022-09-22

A system and method for managing a reduction in capacity of a memory sub-system. An example method involving a memory sub-system includes: detecting a failure of at least one memory device of a set of memory devices, wherein the failure affects stored data; notifying a host system of a change in a capacity of the set of memory devices; receiving from the host system an indication to continue at a reduced capacity; and updating the set of memory devices to change the capacity to the reduced capacity.
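
The notify-then-update exchange between the memory sub-system and the host can be sketched as follows. The class, the callback signature, and the assumption that a failure removes one whole device's capacity are illustrative only; the abstract does not say how the reduced capacity is computed.

```python
class MemorySubSystem:
    """Hypothetical sub-system tracking capacity across a set of devices."""
    def __init__(self, device_capacities):
        self.devices = dict(device_capacities)   # device id -> capacity
        self.capacity = sum(self.devices.values())

    def handle_failure(self, failed_id, host_accepts_reduction):
        # Capacity that would remain without the failed device (assumed).
        reduced = self.capacity - self.devices[failed_id]
        # Notify the host of the change and wait for its indication.
        if host_accepts_reduction(self.capacity, reduced):
            # Host chose to continue: update the set and the capacity.
            del self.devices[failed_id]
            self.capacity = reduced
            return True
        return False
```

Here `host_accepts_reduction` stands in for the host system's reply; a `False` return leaves the sub-system state unchanged.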

MANAGING STORAGE REDUCTION AND REUSE IN THE PRESENCE OF STORAGE DEVICE FAILURES
20220300377 · 2022-09-22

A system and method for managing a reduction in capacity of a memory sub-system. An example method involving a host system includes: determining, by the host system, that a failure affects a storage capacity of a memory sub-system, wherein the memory sub-system comprises stored data of a storage structure; instructing, by the host system, the memory sub-system to operate at a reduced capacity and to retain the stored data of the storage structure; receiving, by the host system, a set of storage units of the memory sub-system that are affected by the failure; and recovering, by the host system, data that was in the set of storage units affected by the failure.