Patent classifications
G06F11/0727
Fast multipath failover
A host device is configured to obtain a default timeout value of the host device for the submission of an input-output (IO) operation to a storage system and to determine a first timeout value that is less than the default timeout value. The host device is further configured to submit the IO operation to the storage system along a first path using the first timeout value and to determine that the submission of the IO operation along the first path has timed out. The host device is further configured to determine a second timeout value that is greater than the first timeout value and to submit the IO operation to the storage system along a second path using the second timeout value.
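The failover sequence in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `PathTimeout` exception, the path callables, the `first_fraction` parameter, and the choice to reuse the default timeout as the second timeout are all assumptions made for the example.

```python
from typing import Callable, Sequence


class PathTimeout(Exception):
    """Hypothetical error a path raises when a submission exceeds its timeout."""


def submit_with_fast_failover(
    io: bytes,
    paths: Sequence[Callable[[bytes, float], str]],
    default_timeout: float,
    first_fraction: float = 0.25,
) -> str:
    """Submit on the first path with a shortened timeout, then fail over."""
    if len(paths) < 2:
        raise ValueError("need at least two paths")
    if not 0 < first_fraction < 1:
        raise ValueError("first_fraction must be between 0 and 1")

    # First timeout: strictly less than the host's default timeout.
    first_timeout = default_timeout * first_fraction
    try:
        return paths[0](io, first_timeout)
    except PathTimeout:
        pass  # the first path timed out quickly; move on to the second path

    # Second timeout: greater than the first. Using the default here is an
    # assumption; the abstract only requires it to exceed the first value.
    second_timeout = default_timeout
    return paths[1](io, second_timeout)
```

The short first timeout is what makes the failover fast: a stuck path is abandoned after a fraction of the default wait, while the retry on the second path still gets a longer window.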
Memory device and operating method of the same
A memory device includes a memory cell array including memory cells connected to word lines and bit lines. Each of the memory cells includes a switch element and a memory element, and has a first state or a second state in which a threshold voltage is within a first voltage range or a second voltage range, lower than the first voltage range. A memory controller is configured to execute a first read operation for the memory cells using a first read voltage, higher than a median value of the first voltage range, program first defect memory cells turned off during the first read operation to the first state, execute a second read operation for the memory cells using a second read voltage, lower than a median value of the second voltage range, and execute a repair operation for second defect memory cells turned on during the second read operation.
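The two read passes can be modeled in a few lines. This is a sketch under stated assumptions: a cell is treated as turned on when the read voltage is at or above its threshold voltage, the "median value" of a range is taken as its midpoint, and the `margin` parameter that places each read voltage is hypothetical.

```python
from typing import List, Sequence, Tuple


def find_defect_cells(
    thresholds: Sequence[float],
    first_range: Tuple[float, float],
    second_range: Tuple[float, float],
    margin: float = 0.25,
) -> Tuple[List[int], List[int]]:
    """Return (first_defects, second_defects) as lists of cell indices."""
    lo1, hi1 = first_range    # higher threshold-voltage range (first state)
    lo2, hi2 = second_range   # lower threshold-voltage range (second state)

    # First read voltage sits above the midpoint of the first range.
    v_read1 = (lo1 + hi1) / 2 + margin * (hi1 - lo1)
    # Second read voltage sits below the midpoint of the second range.
    v_read2 = (lo2 + hi2) / 2 - margin * (hi2 - lo2)

    # Cells still off during the first read: candidates to program to the first state.
    first_defects = [i for i, vth in enumerate(thresholds) if vth > v_read1]
    # Cells already on during the second read: candidates for the repair operation.
    second_defects = [i for i, vth in enumerate(thresholds) if vth <= v_read2]
    return first_defects, second_defects
```

With ranges of 3.0 to 4.0 V and 1.0 to 2.0 V, the read voltages land at 3.75 V and 1.25 V, so only cells that have drifted toward the outer edge of either range are flagged.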
Optimized relocation of data based on data characteristics
A command is transmitted to a storage device to relocate first data that partially fills a first erase block of the storage device and second data that partially fills a second erase block of the storage device to a third erase block of the storage device, wherein the command causes the relocation of the first data and the second data while bypassing sending the data to the storage controller. An acknowledgement that the first data and the second data have been stored at the third erase block is received from the storage device.
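A toy model of the device-side relocation command is shown below. The `ToyStorageDevice` class, its `relocate` method, and the page-list representation of an erase block are hypothetical; emptying the source blocks after the copy is also an assumption, since the abstract only describes the relocation and the acknowledgement.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyStorageDevice:
    block_capacity: int  # pages per erase block
    blocks: Dict[int, List[bytes]] = field(default_factory=dict)

    def relocate(self, sources: List[int], dest: int) -> bool:
        """Merge live pages from the source blocks into dest inside the device.

        The data never travels to the storage controller; only the command
        and the returned acknowledgement cross the interface.
        """
        merged = [page for b in sources for page in self.blocks.get(b, [])]
        if len(merged) + len(self.blocks.get(dest, [])) > self.block_capacity:
            return False  # would not fit; nothing is moved
        self.blocks.setdefault(dest, []).extend(merged)
        for b in sources:
            self.blocks[b] = []  # assumption: sources are now free to erase
        return True  # acknowledgement sent back to the controller
```

From the controller side, the whole operation is one call such as `device.relocate([1, 2], 3)` followed by a check of the returned acknowledgement.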
Fleet health management device classification framework
An approach to identifying a corrective action for a data storage device (DSD), such as one implemented in a fleet of DSDs in a data center, involves receiving error data about excursions from normal operational behavior of the DSD, inputting data representing a particular excursion into a probabilistic decision network which characterizes a set of DSD operational metrics and certain DSD controller rules that represent internal controls of the DSD and corresponding conditional relationships among the operational metrics, determining from the decision network the likelihood that one or more possible causes was a contributing factor to the particular excursion, and determining a corrective action for the particular excursion based on the determined likelihood of a particular cause of the one or more possible causes. The corrective action may then be shared with the DSD for in-situ execution of corresponding self-repair operations.
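The inference step can be illustrated with a single application of Bayes' rule. This is a deliberately simplified stand-in for the probabilistic decision network described above: it has one layer of causes, and every cause name, excursion name, probability, and action in the usage example is hypothetical.

```python
from typing import Dict, Tuple


def rank_causes(
    priors: Dict[str, float],
    likelihoods: Dict[str, Dict[str, float]],
    excursion: str,
) -> Dict[str, float]:
    """Posterior P(cause | excursion), proportional to prior * P(excursion | cause)."""
    unnormalized = {
        cause: priors[cause] * likelihoods[cause].get(excursion, 0.0)
        for cause in priors
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no cause explains this excursion")
    return {cause: value / total for cause, value in unnormalized.items()}


def pick_corrective_action(
    posterior: Dict[str, float], actions: Dict[str, str]
) -> Tuple[str, str]:
    """Choose the action mapped to the most likely contributing cause."""
    cause = max(posterior, key=posterior.get)
    return cause, actions[cause]
```

A usage example with made-up numbers:

```python
priors = {"head_wear": 0.2, "media_defect": 0.3, "vibration": 0.5}
likelihoods = {
    "head_wear": {"high_read_error_rate": 0.8},
    "media_defect": {"high_read_error_rate": 0.4},
    "vibration": {"high_read_error_rate": 0.1},
}
actions = {"head_wear": "depopulate head", "media_defect": "remap sectors",
           "vibration": "flag enclosure"}
posterior = rank_causes(priors, likelihoods, "high_read_error_rate")
cause, action = pick_corrective_action(posterior, actions)
```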
Determining remaining hardware life in a storage device
Determining remaining hardware life in a storage system, including: receiving data about a plurality of hardware components including data describing the usage of each hardware component and the state of each hardware component; analyzing the data to determine a remaining hardware life for each hardware component in a group of components; and distributing workloads in order to balance wear amongst the hardware components in the group.
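One way to picture the wear-balancing step is a greedy assignment that always hands the next workload to the component with the most life left. The sketch below assumes remaining life can be expressed as rated cycles minus used cycles and that each workload has a known cycle cost; both are simplifications introduced for the example.

```python
import heapq
from typing import Dict, List, Tuple


def remaining_life(rated_cycles: int, used_cycles: int) -> int:
    """Remaining life in cycles, floored at zero."""
    return max(rated_cycles - used_cycles, 0)


def distribute(
    workloads: List[int], components: Dict[str, Tuple[int, int]]
) -> Dict[str, List[int]]:
    """Assign each workload (a cycle cost) to the least-worn component.

    components maps a name to (rated_cycles, used_cycles).
    """
    # Max-heap on remaining life, implemented by negating the key.
    heap = [(-remaining_life(r, u), name) for name, (r, u) in components.items()]
    heapq.heapify(heap)
    assignment: Dict[str, List[int]] = {name: [] for name in components}

    for cost in sorted(workloads, reverse=True):  # place the largest jobs first
        neg_life, name = heapq.heappop(heap)
        assignment[name].append(cost)
        new_life = max(-neg_life - cost, 0)
        heapq.heappush(heap, (-new_life, name))
    return assignment
```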
Detection and mitigation for solid-state storage device read failures due to weak erase
Weak erase detection and mitigation techniques are provided that detect permanent failures in solid-state storage devices. One exemplary method comprises obtaining an erase fail bits metric for a solid-state storage device; and detecting a permanent failure in at least a portion of the solid-state storage device causing a weak erase failure mode by comparing the erase fail bits metric to a predefined fail bits threshold. In at least one embodiment, the method also comprises mitigating the permanent failure causing the weak erase failure mode for one or more cells of the solid-state storage device. The mitigating comprises, for example, changing a status of the one or more cells to a defective state and/or a retired state. The detection of the permanent failure causing the weak erase failure mode comprises, for example, detecting the weak erase failure mode without an erase failure.
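The detect-then-retire flow might look like the sketch below. The `Region` record, its field names, and the direction of the comparison (a metric above the threshold signals weak erase) are assumptions; the abstract only states that the metric is compared to a predefined threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Region:
    region_id: int
    erase_fail_bits: int      # fail bits counted after the last erase
    erase_reported_ok: bool   # True when the erase itself did not fail
    status: str = "good"


def is_weak_erase(region: Region, fail_bits_threshold: int) -> bool:
    """Flag weak erase even though no erase failure was reported."""
    return region.erase_reported_ok and region.erase_fail_bits > fail_bits_threshold


def mitigate(regions: Iterable[Region], fail_bits_threshold: int) -> List[int]:
    """Mark weak-erase regions as retired and return their ids."""
    retired = []
    for region in regions:
        if is_weak_erase(region, fail_bits_threshold):
            region.status = "retired"
            retired.append(region.region_id)
    return retired
```

The `erase_reported_ok` check reflects the last sentence of the abstract: the interesting case is the one where the erase command succeeds but the fail-bit count still gives the defect away.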
Exact repair regenerating codes for distributed storage systems
A distributed storage system includes a plurality of nodes comprising a first node, wherein a total number of nodes in the distributed storage system is represented by n, wherein a file stored in the distributed storage system is recovered from a subset of a number of nodes represented by k upon a file failure on a node in the distributed storage system, and wherein a failed node in the plurality of nodes is recovered from a number of helper nodes of the plurality of nodes represented by d. Upon detecting a failure in the first node, each helper node of the number of helper nodes is configured to determine a repair-encoder matrix, multiply a content matrix by the repair-encoder matrix to obtain a repair matrix, extract each linearly independent column of the repair matrix, and send the linearly independent columns of the repair matrix to the first node.
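The helper-node computation can be sketched in pure Python over a small prime field. Several things here are assumptions: the field size `P`, the multiplication order (content matrix on the left), and the greedy left-to-right choice of independent columns. How the repair-encoder matrix is determined is not covered; it is simply passed in.

```python
from typing import List

P = 257  # hypothetical prime field size
Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix, p: int = P) -> Matrix:
    """Matrix product with entries reduced modulo p."""
    return [[sum(x * y for x, y in zip(row, col)) % p for col in zip(*b)] for row in a]


def independent_columns(m: Matrix, p: int = P) -> List[int]:
    """Indices of a maximal set of linearly independent columns, left to right."""
    basis = []   # (pivot_row, vector) pairs, each vector scaled so vector[pivot_row] == 1
    picked = []
    for j, col in enumerate(zip(*m)):
        v = list(col)
        for pivot, b in basis:  # eliminate components along the existing basis
            if v[pivot]:
                f = v[pivot]
                v = [(x - f * y) % p for x, y in zip(v, b)]
        pivot = next((i for i, x in enumerate(v) if x), None)
        if pivot is not None:   # nonzero remainder: the column is independent
            inv = pow(v[pivot], -1, p)
            basis.append((pivot, [(x * inv) % p for x in v]))
            picked.append(j)
    return picked


def helper_repair_payload(content: Matrix, repair_encoder: Matrix) -> Matrix:
    """Compute the repair matrix and keep only its independent columns."""
    repair = matmul(content, repair_encoder)
    keep = independent_columns(repair)
    return [[row[j] for j in keep] for row in repair]
```

Dropping the dependent columns is what saves repair bandwidth: the failed node can rebuild them from the columns it does receive.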
Notifying memory system of host events via modulated reset signals
An example memory sub-system includes a memory device and a processing device, operatively coupled to the memory device. The processing device is configured to receive a reset signal from a host computer system in communication with the memory sub-system; identify, by decoding the reset signal, a host event specified by the reset signal; and process the identified host event.
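The abstract does not say how the reset signal is modulated, so the sketch below picks one scheme purely for illustration: pulse-width encoding, where a short pulse is a 0 bit and a long pulse is a 1 bit. The `HostEvent` values, the 50 microsecond cutoff, and the handler messages are all hypothetical.

```python
from enum import Enum
from typing import Sequence


class HostEvent(Enum):
    PLAIN_RESET = 0
    POWER_LOSS_IMMINENT = 1
    ENTER_LOW_POWER = 2
    THERMAL_WARNING = 3


def decode_reset_signal(pulse_widths_us: Sequence[float], short_max_us: float = 50.0) -> HostEvent:
    """Decode pulses into an event code, most significant bit first."""
    code = 0
    for width in pulse_widths_us:
        bit = 1 if width > short_max_us else 0
        code = (code << 1) | bit
    return HostEvent(code)  # raises ValueError for an unknown code


def process_event(event: HostEvent) -> str:
    """Dispatch to a placeholder action for the decoded event."""
    handlers = {
        HostEvent.PLAIN_RESET: "reinitialize",
        HostEvent.POWER_LOSS_IMMINENT: "flush write cache",
        HostEvent.ENTER_LOW_POWER: "suspend background tasks",
        HostEvent.THERMAL_WARNING: "throttle",
    }
    return handlers[event]
```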
Determining capacity in storage systems using machine learning techniques
Methods, apparatus, and processor-readable storage media for determining capacity in storage systems using machine learning techniques are provided herein. An example computer-implemented method includes obtaining capacity-related data from a storage system; forecasting, for a given temporal period, capacity of one or more storage objects of the storage system by applying machine learning techniques to at least a portion of the capacity-related data; aggregating the forecasted capacity for at least portions of the one or more storage objects; determining, based on the aggregated forecasted capacity of the storage objects, whether at least a portion of the storage system will run out of capacity in connection with the given temporal period; and performing one or more automated actions based at least in part on the determination as to whether the at least a portion of the storage system will run out of capacity.
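The forecast, aggregate, and decide steps can be sketched with an ordinary least-squares trend line standing in for the unspecified machine learning technique. The function names, the evenly spaced usage samples, and the single pool-capacity figure are assumptions made for the example.

```python
from typing import Dict, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for samples taken at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(ys: Sequence[float], steps_ahead: int) -> float:
    """Predicted usage steps_ahead samples after the last observation."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


def will_run_out(
    history_by_object: Dict[str, Sequence[float]],
    pool_capacity: float,
    steps_ahead: int,
) -> Tuple[bool, float]:
    """Aggregate per-object forecasts and compare against the pool capacity."""
    total = sum(forecast(ys, steps_ahead) for ys in history_by_object.values())
    return total >= pool_capacity, total
```

An automated action, such as raising an alert or expanding the pool, would then be triggered when the first element of the returned tuple is true.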
Dynamic modification of IO shaping mechanisms of multiple storage nodes in a distributed storage system
At least one processing device is configured to detect a failure event impacting at least a first storage node of a plurality of storage nodes of a distributed storage system, and responsive to the detected failure event, to modify an input-output (IO) shaping mechanism in each of the storage nodes in order to at least temporarily reduce a total number of IO operations that are concurrently processed in the distributed storage system. For example, modifying an IO shaping mechanism in each of the storage nodes illustratively comprises transitioning the IO shaping mechanism in each of the storage nodes from a first operating mode to a second operating mode that is different than the first operating mode. The second operating mode of the IO shaping mechanism illustratively has a relatively faster responsiveness to changes in IO operation latency as compared to the first operating mode of the IO shaping mechanism.
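The mode switch can be illustrated with a latency-driven concurrency limiter whose smoothing factor changes when a failure event is detected. This is a sketch only: the `IOShaper` class, the exponentially weighted moving average, the halving policy, and the specific smoothing factors are assumptions, not details from the abstract.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IOShaper:
    target_latency_ms: float
    max_inflight: int = 64     # IO operations allowed concurrently on this node
    alpha: float = 0.1         # first operating mode: slow-moving average
    smoothed_ms: float = 0.0

    def enter_failure_mode(self) -> None:
        """Second operating mode: react faster to latency changes."""
        self.alpha = 0.6

    def observe(self, latency_ms: float) -> int:
        """Update the smoothed latency and shrink the limit if it is too high."""
        self.smoothed_ms += self.alpha * (latency_ms - self.smoothed_ms)
        if self.smoothed_ms > self.target_latency_ms:
            self.max_inflight = max(1, self.max_inflight // 2)
        return self.max_inflight


def on_failure_event(shapers: Iterable[IOShaper]) -> None:
    """Switch every node's shaper, not only the node that saw the failure."""
    for shaper in shapers:
        shaper.enter_failure_mode()
```

With a 10 ms target, a single 20 ms sample leaves a first-mode shaper at 64 in-flight operations but halves a second-mode shaper to 32, which is the faster responsiveness the abstract describes.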