Patent classifications
G06F11/1428
SAFE-STATING A SYSTEM INTERCONNECT WITHIN A DATA PROCESSING SYSTEM
A data processing system includes a system interconnect, a first master, and a bridge circuit. The bridge circuit is coupled between the first master and the system interconnect. The bridge circuit is configured to, in response to occurrence of an error in the first master, isolate the first master from the system interconnect, wherein the isolating by the bridge circuit is performed while the first master has one or more outstanding issued write commands to the system interconnect which have not been completed. The bridge circuit is further configured to, after isolating the first master from the system interconnect, complete the one or more outstanding issued write commands while the first master remains isolated from the system interconnect.
Configuring acceleration components over a network
Aspects extend to methods, systems, and computer program products for (re)configuring acceleration components over a network. (Re)configuration can be implemented for any of a variety of reasons, including to address an error in functionality at the acceleration component or to update functionality at the acceleration component. During (re)configuration, connectivity can be maintained for any other functionality at the acceleration component untouched by the (re)configuration. Network (re)configuration of acceleration components facilitates management of acceleration components and accelerated services from a centralized service. Network (re)configuration of acceleration components also relieves host components from having to store (potentially diverse and numerous) image files.
Method and apparatus for predictive failure handling of interleaved dual in-line memory modules
An information handling system includes interleaved dual in-line memory modules (DIMMs) that are partitioned into logical partitions, wherein each logical partition is associated with a namespace. A DIMM controller sets a custom DIMM-level namespace-based threshold to detect a DIMM error and to identify one of the logical partitions of the DIMM error using the namespace associated with the logical partition. The detected DIMM error is repaired if it exceeds an error correcting code (ECC) threshold.
TRANSMISSION LINK TESTING
A computing system can comprise a processing resource and a memory device coupled together via a first transmission link. The processing resource can be configured to test the first transmission link in response to the memory device failing to execute a command by sending the command to the memory device again for retry and monitoring the first transmission link for signals that indicate whether the command was executed by the memory device.
METHOD FOR A RELIABILITY, AVAILABILITY, AND SERVICEABILITY-CONSCIOUS HUGE PAGE SUPPORT
A method includes, in response to a memory error indication indicating an uncorrectable error in a faulted segment, associating in a remapping table the faulted segment with a patch segment in a patch memory region, and in response to receiving from a processor a memory access request directed to the faulted segment, servicing the memory access request from the patch segment by performing the requested memory access at the patch segment based on a patch segment address identifying the location of the patch segment. The patch segment address is determined from the remapping table and corresponds to a requested memory address specified by the memory access request.
SEMICONDUCTOR DIE FAILURE RECOVERY IN A DATA STORAGE DEVICE
Apparatus and method for a die kill and recovery sequence for a non-volatile memory (NVM). Data are stored in the NVM as data sets in garbage collection units (GCUs) that span multiple semiconductor dies. A die failure management circuit is configured to detect a die failure event associated with a selected die, and to generate a recovery strategy to accommodate the detected die failure event by selecting recovery actions to be taken in a selected sequence to maintain a current level of data transfer performance with a client device. The selected recovery actions are carried out in the selected sequence to transfer at least a portion of the user data stored in the selected die to a new replacement die, after which the selected die is decommissioned from further use. The NVM may be a flash memory of a solid-state drive (SSD).
FAULT RESILIENT STORAGE DEVICE
A storage device, and a method for operating a storage device. In some embodiments, the storage device includes storage media, and the method includes: determining, by the storage device, that the storage device is in a first fault state from which recovery is possible by power cycling the storage device or by formatting the storage media; determining, by the storage device, that the storage device is in a second fault state from which partial recovery is possible by operating the storage device with reduced performance, with reduced capacity, or in a read-only mode; and operating the storage device with reduced performance, with reduced capacity, or in the read-only mode.
Method and apparatus for detecting and remedying single event effects
A method is provided. The method of resetting a system, comprising: receiving data from the electronic sub-system; determine if a non-hardware fault is detected; if a non-hardware fault is detected, then performing a software reset of the electronics sub-system; if no non-hardware fault has been detected, then determining if a hardware fault is detected; and if a hardware fault is detected, then performing a hardware reset of the system.
SYSTEM AND DEVICE FOR DATA RECOVERY FOR EPHEMERAL STORAGE
In various embodiments, a method for page cache management is described. The method can include: identifying a storage device fault associated with a fault-resilient storage device; determining that a first region associated with the fault-resilient storage device comprises an inaccessible space and that a second region associated with the fault-resilient storage device comprises an accessible space; identifying a read command at the second storage device for the data and determine, based on the read command, first data requested by a read operation from a local memory of the second storage device; determining, based on the read command, second data requested by the read operation from the second region; retrieving the second data from the second region; and scheduling a transmission of the second data from the fault-resilient storage device to the second storage device.
SYSTEMS, METHODS, AND DEVICES FOR FAULT RESILIENT STORAGE
A method of operating a storage device may include determining a fault condition of the storage device, selecting a fault resilient mode based on the fault condition of the storage device, and operating the storage device in the selected fault resilient mode. The selected fault resilient mode may include one of a power cycle mode, a reformat mode, a reduced capacity read-only mode, a reduced capacity mode, a reduced performance mode, a read-only mode, a partial read-only mode, a temporary read-only mode, a temporary partial read-only mode, or a vulnerable mode. The storage device may be configured to perform a namespace capacity management command received from the host. The namespace capacity management command may include a resize subcommand and/or a zero-size namespace subcommand. The storage device may report the selected fault resilient mode to a host.