G06F11/16

Transparent north port recovery

Examples of techniques for transparent north port recovery of an error in an input/output device are disclosed. In one example implementation according to aspects of the present disclosure, a computer-implemented method may include: detecting, by a processing device, a command timeout; sending, by the processing device, an input/output (I/O) error signal to a host processing system connected to the hardware device via a north port of the hardware device; terminating, by the host processing system, a link between the north port of the hardware device and the host processing system; enabling, by the processing device, halt command forwarding on the hardware device; halting, by the processing device, commands upon detecting the halt command forwarding; and resetting, by the processing device, the link between the north port of the hardware device and the host processing system.

Management of test resources to perform testing of memory components under different temperature conditions

Filter information including a first temperature level and a second temperature level associated with a test process to be executed on one or more memory components is determined. Information associated with the test process is distributed to a first test component including a first set of memory components and a first temperature control component and a second test component including a second set of memory components and a second temperature control component. First feedback information associated with execution of the test process by the first test component at the first temperature level established by the first temperature control component is received. Second feedback information associated with execution of the test process by the second test component at the second temperature level established by the second temperature control component is received. Based on at least one of the first feedback information or the second feedback information, a failure of the test process executed using at least one of the first temperature level or the second temperature level is determined.

Efficient handling of RAID-F component repair failures

In one set of embodiments, a storage system can execute a repair process for a first component of a file or object stored on the storage system, where the repair process is initiated in response to the first component becoming inaccessible by the storage system, and where the file or object is split across a plurality of components including the first component. The executing can include, for each chunk in an address space of the first component starting from an initial chunk pointed to by a cursor: (1) determining whether the chunk is mapped to the first component, (2) if the chunk is mapped to the first component, copying data for the chunk from a mirror copy of the first component to a second component in the plurality of components, and (3) updating the cursor to point to a next chunk in the address space.

Electronic fault detection unit
09823983 · 2017-11-21 · ·

An electronic fault detection unit is provided that has a first register, a second register, a comparator circuit, and a timer circuit. The first and second register can be written from a first software portion, and a second software portion, respectively. The comparator circuit is arranged to detect that both the first and second register have been written, verify a relationship between first data written to the first register and second data written to the second register, and signal a fault upon said verification failing. The timer circuit is arranged to signal a fault if said verification of the comparator circuit does not occur within a time limit.

Storage of key-value entries in a distributed storage system
11256717 · 2022-02-22 · ·

A distributed storage system, such as a distributed storage system in a virtualized computing environment, stores data in storage nodes as immutable key-value entries. A coordinator storage node creates a key-value entry and attempts to store the key-value entry in the coordinator storage node and in neighbor storage nodes. If the storage of the key-value entry in the in the coordinator storage node and in the neighbor storage node is successful, the coordinator storage node pushes the key-value entry to other storage nodes in the distributed storage system for storage as replicas.

Methods and apparatus for providing hypervisor level data services for server virtualization
09785513 · 2017-10-10 · ·

A data center for data backup and replication, including a pool of multiple storage units for storing a journal of I/O write commands issued at respective times, wherein the journal spans a history window of a pre-specified time length, and a journal manager for dynamically allocating more storage units for storing the journal as the journal size increases, and for dynamically releasing storage units as the journal size decreases.

Remapping of memory in memory control architectures
09823984 · 2017-11-21 · ·

In accordance with embodiments disclosed herein, there is provided systems and methods for remapping of memory in memory control architectures. A processing device includes a processing core and a platform controller hub (PCH) coupled to the processing core. The PCH is to receive an indication of a failure associated with a first memory region of a plurality of memory regions residing in a memory. The PCH is also to interrupt an operating system to prompt for a reboot. Upon the reboot, the PCH is to remap a memory address range associated with the first memory region to a second memory region of the plurality of regions.

Intelligent access to a storage device

A method of failure detection in a storage system is performed by the storage system. The method includes detecting a failure in a nonvolatile random access memory device that is in or coupled to a storage device having storage memory. The storage system has multiple NVRAM devices and multiple storage devices that have storage memory. The method includes taking a portion or all of the NVRAM device offline. Taking a portion or all of the NVRAM device offline is responsive to detecting the failure. Taking a portion or all of the NVRAM device off-line is while keeping online the storage memory of the storage device, sufficient ones of the NVRAM devices, and sufficient ones of the storage devices to provide reliable access to data and metadata in the storage system.

Synchronizing replicated data in a storage network

method and apparatus for synchronizing replicated data in a storage network. In an embodiment, a method begins by a processing module of a computing device identifying a first storage set and a second storage set for replicated storage of a data object. The processing module initiates storage of the data object in both the first and second storage sets, and further maintains a synchronization status for the data object. The processing module determines, based at least in part on the synchronization status, to resynchronize the first storage set and the second storage set. In response to determining to resynchronize the first storage set and the second storage set, the processing module identifies a latest available revision of the data object, determines that the second storage set requires the latest available revision of the data object to maintain synchronization, and facilitates storage of the identified latest available revision of the data object in the second storage set.

Driver switch for device error recovery for assigned devices
09785519 · 2017-10-10 · ·

An error recovery system includes a memory, a processor in communication with the memory, a primary device, a backup device, a hypervisor executing on the processor, and a virtual machine. The virtual machine includes a guest operating system (OS) executing on the hypervisor, a pass-through device, and a guest driver. The hypervisor executes to detect an error associated with the primary device and to send a request to save a device state to the guest driver. The hypervisor also grants the guest OS access to the backup device. The guest driver receives the request from the hypervisor, and responsive to receiving the request, saves a state signature in the memory. The state signature includes a device signature and the device state of the primary device. Additionally, the guest driver determines a status of the device signature as one of matching and mismatching the backup device.