G06F11/16

Fault Tolerant Communication System
20170371754 · 2017-12-28 ·

Described is a differential data bus system that maintains error-free communication despite faults in one of the data bus lines.

MEMORY APPARATUS FOR APPLYING FAULT REPAIR BASED ON PHYSICAL REGION AND VIRTUAL REGION AND CONTROL METHOD THEREOF

Provided are a memory apparatus for applying fault repair based on a physical region and a virtual region and a control method thereof. That is, the fault repair is applied based on the physical region and the virtual region which use an information storage table of a virtual basic region using a hash function, thereby improving efficiency of the fault repair.

METHOD AND SYSTEM FOR MANAGING STORAGE SYSTEM

Embodiments of the present invention provide a method and a system for managing a storage system. In one embodiment of the present invention, there is provided a method for managing a storage system, where the storage system comprises a first controller, a second controller, a first communication area and a second communication area. The method comprises: with respect to a storage device in the storage system, in response to the first controller successfully accessing the storage device, writing to the first communication area a first state that indicates a state relationship between the first controller and the storage device, where the first communication area is readable and writable to the first controller and readable to the second controller; reading from the second communication area a second state that indicates a state relationship between the second controller and the storage device, where the second communication area is readable to the first controller and readable and writable to the second controller; and in response to the second state indicating that the second controller successfully accesses the storage device, initializing the storage system. In one embodiment of the present invention, there is further provided a corresponding system and apparatus.
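
The two-controller handshake described in this abstract can be illustrated with a minimal Python sketch. The names CommArea, try_initialize and the "ACCESSIBLE" state string are hypothetical and do not come from the patent; the sketch only assumes that each communication area is writable by its owning controller and readable by both.

```python
from dataclasses import dataclass, field


@dataclass
class CommArea:
    """Hypothetical communication area: writable by its owner, readable by anyone."""
    owner: str
    states: dict = field(default_factory=dict)  # device id -> state string

    def write(self, writer: str, device: str, state: str) -> None:
        # Only the owning controller may write; the peer has read access only.
        if writer != self.owner:
            raise PermissionError(f"{writer} may only read {self.owner}'s area")
        self.states[device] = state

    def read(self, device: str):
        return self.states.get(device)


def try_initialize(me, my_area, peer_area, device, can_access):
    """Return True when this controller sees that both controllers reach the device."""
    if not can_access:
        return False
    # Publish this controller's state in its own area.
    my_area.write(me, device, "ACCESSIBLE")
    # Read the peer's state from the peer's area; initialize only if it also succeeded.
    return peer_area.read(device) == "ACCESSIBLE"
```

In this sketch the controller that checks first gets False because its peer has not yet published a state, and the controller that checks second gets True.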

Method of recovering application data from a memory of a failed node

A method of recovering application data from the memory of a failed node in a computer system comprising a plurality of nodes connected by an interconnect and of writing the application data to a replacement node; wherein a node of the computer system executes an application which creates application data storing the most recent state of the application in a node memory; the node fails; the node memory of the failed node is then controlled using a failover memory controller; and the failover memory controller copies the application data from the node memory of the failed node to a node memory of the replacement node over the interconnect.

STORAGE SYSTEM AND STORAGE SYSTEM CONTROL METHOD
20230205650 · 2023-06-29 ·

A storage system includes one or more storage units, and a calculator. Each of the one or more storage units has plural physical storage devices, and a processor. The calculator has a controller that processes data inputted to and outputted from the storage unit by the processor. The storage system subjects the data to redundancy and stores the redundant data, and when a failure in which the data related to a read request cannot be read from one of the physical storage devices has occurred, reads the data from the readable physical storage device, recovers the data related to the read request, and transmits the recovered data to a request source of the read request. A process for recovering the data related to the read request from the read data can be selectively executed by the controller of the calculator and the processor of the storage unit.
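
As a rough illustration of recovering unreadable data from the remaining devices, the sketch below assumes a single XOR parity block per stripe. The abstract does not name a redundancy scheme, so XOR parity, the function names and the recover_on label are all assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def read_with_recovery(devices, index, recover_on="controller"):
    """Read block `index` from a stripe of data blocks plus one parity block.

    `devices` holds bytes for readable devices and None for an unreadable one.
    Returns (data, who_recovered), where who_recovered is None for a normal read.
    `recover_on` is a hypothetical label for where recovery runs, for example
    the calculator's controller or the storage unit's processor.
    """
    block = devices[index]
    if block is not None:
        return block, None
    survivors = [b for i, b in enumerate(devices) if i != index]
    if any(b is None for b in survivors):
        raise IOError("more than one device unreadable; single parity cannot recover")
    # XOR of all surviving blocks (data and parity) reproduces the missing block.
    return xor_blocks(survivors), recover_on
```

The recover_on argument only records the choice; in a real system the decision between the two execution sites would depend on factors the abstract does not specify.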

IDENTIFICATION OF OPTIMAL BIT APPORTIONMENTS FOR DIGITAL FUNCTIONS SUBJECT TO SOFT ERRORS
20230205651 · 2023-06-29 ·

A method includes identifying multiple apportionments, where each apportionment identifies numbers of bit copies to be stored in at least one memory for at least some bits of a data value. The method also includes, for each apportionment, estimating a numerical error associated with use of the apportionment with a specified function, where the numerical error is estimated by creating errors in bit copies of multiple data values processed using the specified function. The method further includes combining portions of different ones of the apportionments having lower estimated numerical errors to create multiple derived apportionments. The method also includes, for each derived apportionment, estimating a numerical error associated with use of the derived apportionment with the specified function. In addition, the method includes selecting a final apportionment for use with the specified function, where the final apportionment includes or is based on at least one of the derived apportionments.

Virtual machine data placement in a virtualized computing environment

An example method is provided for virtual machine data placement on a distributed storage system accessible by a cluster in a virtualized computing environment. The method may comprise, based on location data relating to the cluster, identifying a first fault domain and a second fault domain of the distributed storage system. The method may further comprise selecting a first host with a first storage resource from the first fault domain and a second host with a second storage resource from the second fault domain. The method may further comprise placing a first copy of the virtual machine data on the first storage resource and a second copy of the virtual machine data on the second storage resource.
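
A minimal sketch of fault-domain-aware placement follows. It assumes each host already carries a fault-domain label derived from location data (for example a rack name) and uses free space as a tie-breaking heuristic; the function name place_copies, the tuple layout and the heuristic are assumptions and are not taken from the abstract.

```python
from collections import defaultdict


def place_copies(hosts, copies=2):
    """Pick one host per fault domain for each copy of the virtual machine data.

    `hosts` is a list of (host_name, fault_domain, free_bytes) tuples.
    Returns a list of (fault_domain, host_name), one entry per copy.
    """
    by_domain = defaultdict(list)
    for name, domain, free in hosts:
        by_domain[domain].append((free, name))
    if len(by_domain) < copies:
        raise ValueError("not enough fault domains for the requested copies")
    # Best host in each domain (most free space), then domains ranked by that host.
    best = sorted(
        ((max(members), domain) for domain, members in by_domain.items()),
        reverse=True,
    )
    return [(domain, name) for (free, name), domain in best[:copies]]
```

Because every selected host comes from a different fault domain, a failure confined to one domain leaves at least one copy reachable.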

ANALYZING SYSTEM FOR MANAGING INFORMATION STORAGE TABLE AND CONTROL METHOD THEREOF

Disclosed are an analyzing system for managing an information storage table and a control method thereof. That is, each of the physical basic regions, which are units of hashing an information storage position, is divided into a plurality of physical sub regions having the same size. Then, a combination of virtual basic regions which satisfies a predetermined target value is checked from all configurable combinations, and a hash value related to the combination of the virtual basic regions which satisfies the checked target value is stored in a predetermined region of a memory. Therefore, even when the information storage space required for a unit region is overloaded, if another region has extra information storage space, that extra space is used to maximize the efficiency of the information storage space.

VIRTUAL MACHINE RECOVERY IN SHARED MEMORY ARCHITECTURE

Examples provide for virtual machine recovery using pooled memory. A shared partition is created on pooled memory accessible by a plurality of virtual machine hosts. A set of memory pages for virtual machines running on the hosts is moved to the shared partition. A master agent polls memory page tables associated with the plurality of hosts for write access. If the master agent obtains write access to a memory page table of a given host, the given host that previously held the write access is identified as a failed host or an isolated host. The virtual machines of the given host enabled to resume from pooled memory are respawned on a new host while maintaining memory state of the virtual machines using data within the pooled memory, including the virtual machine memory pages, memory page table, host profile data, and/or host-to-VM table data.
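
The polling step can be pictured with the toy model below, in which a live host holds the write access to its own page table and a failed or isolated host no longer does. The PageTable class, its release method and the poll function are hypothetical stand-ins; real pooled-memory locking is not modelled.

```python
class PageTable:
    """Toy page table whose write access is held by exactly one party at a time."""

    def __init__(self, host):
        self.host = host
        self.writer = host  # a live host holds write access to its own table

    def release(self):
        # Simulates the host failing or becoming isolated and losing write access.
        self.writer = None

    def try_acquire(self, agent):
        if self.writer is None:
            self.writer = agent
            return True
        return False


def poll(tables, agent="master"):
    """Return hosts whose page-table write access the master agent could obtain."""
    return [t.host for t in tables if t.try_acquire(agent)]
```

Hosts returned by poll would be the candidates whose virtual machines are respawned on a new host from the pooled memory.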

FINGERPRINTING OF REDUNDANT THREADS USING COMPILER-INSERTED TRANSFORMATION CODE
20170364332 · 2017-12-21 ·

A first processing element is configured to execute a first thread and one or more second processing elements are configured to execute one or more second threads that are redundant to the first thread. The first thread and the one or more second threads are to selectively bypass one or more comparisons of results of operations performed by the first thread and the one or more second threads depending on whether an event trigger for the comparison has occurred a configurable number of times since a previous comparison of previously encoded values of the results. In some cases the comparison can be performed based on hashed (or encoded) values of the results of a current operation and one or more previous operations.
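
The selective bypass of comparisons can be sketched as follows. The sketch treats every recorded result as one event trigger and uses CRC-32 as the encoding; both choices, along with the names Fingerprint and run_redundant, are assumptions for illustration. It models the two redundant threads as two result sequences and does not show compiler-inserted code.

```python
import zlib


class Fingerprint:
    """Running hash over results, exposed for comparison every `interval` events."""

    def __init__(self, interval):
        self.interval = interval
        self.events = 0
        self.value = 0

    def record(self, result: int):
        """Fold a result into the hash; return it when a comparison is due, else None."""
        data = result.to_bytes(8, "little", signed=True)
        # The running CRC covers the current result and all previous ones.
        self.value = zlib.crc32(data, self.value)
        self.events += 1
        if self.events < self.interval:
            return None  # bypass the comparison this time
        self.events = 0
        return self.value


def run_redundant(results_a, results_b, interval):
    """Return the index of the first comparison that detects a mismatch, or None."""
    fa, fb = Fingerprint(interval), Fingerprint(interval)
    for i, (a, b) in enumerate(zip(results_a, results_b)):
        va, vb = fa.record(a), fb.record(b)
        if va is not None and va != vb:
            return i
    return None
```

A larger interval reduces comparison overhead but delays detection, since a divergence is only noticed at the next comparison point.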