G06F11/16

Methods and apparatus for automatic database failover in a master-replica replication configuration

Methods, apparatus, systems and articles of manufacture for automatic database failover in a master-replica replication configuration are disclosed. An example node within a database system having a plurality of nodes includes an agent to select a first database operated at one of the plurality of nodes to function as a master database in a master-replica replication configuration. The agent is to cause the selected database to be configured as the master database. The agent is to configure a first reverse proxy of the node. The first reverse proxy is to receive a query from a load balancer and forward the query to the master database based on the configuration.
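
The agent behaviour described above can be illustrated with a minimal Python sketch. The class names, the `replication_lag` metric, and the "lowest lag wins" selection policy are assumptions made for illustration; the abstract does not state how the agent chooses the master.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Database:
    node: str
    healthy: bool = True
    replication_lag: int = 0  # hypothetical metric, not from the abstract
    role: str = "replica"


@dataclass
class ReverseProxy:
    upstream: Optional[str] = None  # node currently configured as master

    def forward(self, query: str) -> str:
        # Forward a query (as received from a load balancer) to the master.
        if self.upstream is None:
            raise RuntimeError("no master configured")
        return f"{self.upstream} <- {query}"


def select_master(databases: List[Database]) -> Database:
    # Assumed policy: healthy database with the lowest replication lag,
    # ties broken by node name.
    candidates = [db for db in databases if db.healthy]
    if not candidates:
        raise RuntimeError("no healthy database available")
    return min(candidates, key=lambda db: (db.replication_lag, db.node))


def fail_over(databases: List[Database], proxy: ReverseProxy) -> Database:
    # Select the master, configure roles, then point the local proxy at it.
    master = select_master(databases)
    for db in databases:
        db.role = "master" if db is master else "replica"
    proxy.upstream = master.node
    return master
```

In this sketch each node would run `fail_over` against its own `ReverseProxy`, so any proxy the load balancer reaches forwards to the same master.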

Encoding Data and Associated Metadata in a Storage Network

A storage network operates by: generating metadata for a data object; first disperse storage error encoding the metadata to produce a set of metadata slices, wherein the first disperse storage error encoding utilizes first dispersal parameters, the first dispersal parameters including a first decode threshold of 1; generating sets of first data slices via a second disperse storage error encoding of data segments associated with the data object, wherein the second disperse storage error encoding utilizes second dispersal parameters, the second dispersal parameters different from the first dispersal parameters and the second dispersal parameters including a second decode threshold greater than 1; producing an additional data segment associated with the data object wherein the additional data segment is different from the data segments and the metadata; and third disperse storage error encoding the additional data segment to produce a set of second data slices, wherein the third disperse storage error encoding utilizes the first dispersal parameters including the first decode threshold of 1.
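
The difference between the two sets of dispersal parameters can be sketched as follows. This is not the encoding used by the storage network: plain replication stands in for a decode threshold of 1, and a 2-of-3 XOR parity scheme stands in for a decode threshold greater than 1. All function names are hypothetical.

```python
from typing import Dict, List


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_replicated(payload: bytes, width: int) -> List[bytes]:
    # Decode threshold 1: every slice alone is enough to recover the payload.
    return [payload for _ in range(width)]


def decode_replicated(slices: Dict[int, bytes]) -> bytes:
    if not slices:
        raise ValueError("need at least 1 slice")
    return next(iter(slices.values()))


def encode_2_of_3(segment: bytes) -> List[bytes]:
    # Decode threshold 2, width 3: two halves plus their XOR parity.
    half = (len(segment) + 1) // 2
    a = segment[:half]
    b = segment[half:].ljust(half, b"\x00")  # pad the second half
    return [a, b, _xor(a, b)]


def decode_2_of_3(slices: Dict[int, bytes], length: int) -> bytes:
    # slices maps slice index (0, 1, 2) to slice content; any two suffice.
    if len(slices) < 2:
        raise ValueError("need at least 2 slices")
    if 0 in slices and 1 in slices:
        a, b = slices[0], slices[1]
    elif 0 in slices:
        a = slices[0]
        b = _xor(a, slices[2])
    else:
        b = slices[1]
        a = _xor(b, slices[2])
    return (a + b)[:length]  # strip padding using the original length
```

Under these assumptions, the metadata and the additional data segment would both go through `encode_replicated`, while the ordinary data segments would go through `encode_2_of_3`.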

Efficient memory utilisation in a processing cluster having a split mode and a lock mode
11221899 · 2022-01-11

An apparatus is described comprising a cluster of processing elements. The cluster has a split mode in which the processing elements are configured to process independent processing workloads, and a lock mode in which the processing elements comprise at least one primary processing element and at least one redundant processing element, each redundant processing element configured to perform a redundant processing workload for checking correctness of a primary processing workload performed by the primary processing element. Each processing element has an associated local memory comprising a plurality of memory locations. A local memory access control mechanism is configured, during the lock mode, to allow the at least one primary processing element to access memory locations within the local memory associated with the at least one redundant processing element.
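
The access rule can be written as a small predicate. The sketch below assumes that an element may always access its own local memory and that every other cross-element access is denied; the abstract states only the lock-mode permission for primary elements, so those two defaults are assumptions.

```python
from enum import Enum
from typing import Set


class Mode(Enum):
    SPLIT = "split"
    LOCK = "lock"


def may_access(mode: Mode, requester: int, owner: int,
               primaries: Set[int], redundants: Set[int]) -> bool:
    """Return True if `requester` may access the local memory of `owner`."""
    if requester == owner:
        return True  # assumed: own local memory is always accessible
    # From the abstract: in lock mode a primary element may use the local
    # memory associated with a redundant element.
    if mode is Mode.LOCK and requester in primaries and owner in redundants:
        return True
    return False  # assumed default for all other cross-element accesses
```

With this rule, the local memory of a redundant element becomes extra capacity for the primary element during lock mode.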

Electronic Control Device and Parallel Processing Method

An electronic control device includes a first processing control unit and a second processing control unit, wherein: the first processing control unit and the second processing control unit alternately start arithmetic processing with information of an external environment as an input; and second arithmetic processing of the second processing control unit is started after first arithmetic processing by the first processing control unit is started and before the first arithmetic processing is ended.
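
The alternating start pattern can be made concrete with a short timing sketch. The fixed start interval, the fixed duration, and the constraint that a unit finishes before its own next start are assumptions added for illustration.

```python
from typing import List, Tuple


def build_schedule(n_jobs: int, start_interval: float,
                   duration: float) -> List[Tuple[int, float, float]]:
    """Return (unit, start, end) tuples for alternating units 1 and 2.

    duration > start_interval gives the overlap described in the abstract:
    the next unit starts before the previous one has ended.
    duration <= 2 * start_interval (assumed) lets each unit finish before
    it is started again.
    """
    if not (start_interval < duration <= 2 * start_interval):
        raise ValueError("need start_interval < duration <= 2 * start_interval")
    return [
        (1 + i % 2, i * start_interval, i * start_interval + duration)
        for i in range(n_jobs)
    ]
```

For example, `build_schedule(4, 10, 15)` starts unit 2 at time 10, after unit 1 has started at time 0 and before unit 1 ends at time 15.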

SYSTEMS, METHODS, AND DEVICES FOR DATA RECOVERY USING PARITY SPACE AS RECOVERY SPACE
20210349781 · 2021-11-11

A method may include operating a first storage device and a second storage device as a redundant array configured to use parity information to recover information from a faulty storage device, operating the first storage device in a fault resilient mode with at least partial read capability based on a fault condition of the first storage device, and rebuilding information from the first storage device in a parity space of the second storage device. Rebuilding the information from the first storage device in the parity space of the second storage device may include copying the information from the first storage device to the parity space of the second storage device. The method may further include copying the rebuilt information from the parity space of the second storage device to a replacement storage device.
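
A minimal sketch of the two copy steps follows. It models each device as a list of data blocks plus a list of parity blocks and assumes every block of the faulty device is still readable; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageDevice:
    data: List[Optional[bytes]]
    parity: List[Optional[bytes]]
    fault_resilient: bool = False  # True: faulted but still readable


def rebuild_into_parity_space(first: StorageDevice,
                              second: StorageDevice) -> None:
    # Reuse the parity space of the second device as recovery space by
    # copying the first device's information into it.
    if not first.fault_resilient:
        raise ValueError("first device is not in fault resilient mode")
    if len(second.parity) < len(first.data):
        raise ValueError("parity space too small")
    for i, block in enumerate(first.data):
        second.parity[i] = block  # overwrites the old parity block


def copy_to_replacement(second: StorageDevice, n_blocks: int) -> StorageDevice:
    # Copy the rebuilt information from the parity space to a new device.
    return StorageDevice(data=list(second.parity[:n_blocks]),
                         parity=[None] * n_blocks)
```

Recomputing parity once the replacement device is in service is outside the scope of this sketch.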

FENCING NON-RESPONDING PORTS IN A NETWORK FABRIC
20210349794 · 2021-11-11

A computer-implemented method according to one aspect includes determining whether an operating system of a node of a distributed computing environment is functioning correctly by sending a first management query to the node; in response to determining that the operating system of the node is not functioning correctly, determining whether the node has an active communication link by sending a second management query to ports associated with the node; and in response to determining that the node has an active communication link, resetting the active communication link for the node by sending a reset request to the ports associated with the node.
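
The three-step decision can be sketched as a single function. The callables stand in for the management queries and the reset request; their names and signatures are assumptions, and resetting only the ports that report an active link is one possible reading of the abstract.

```python
from typing import Callable, Iterable, List


def fence_node(os_responds: Callable[[], bool],
               port_link_active: Callable[[str], bool],
               reset_port: Callable[[str], None],
               ports: Iterable[str]) -> List[str]:
    """Return the list of ports that were reset."""
    # Step 1: first management query, sent to the node itself.
    if os_responds():
        return []
    # Step 2: second management query, sent to the node's ports.
    active = [p for p in ports if port_link_active(p)]
    # Step 3: reset each link that is still active so peers stop seeing
    # the non-responding node as reachable.
    for p in active:
        reset_port(p)
    return active
```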

SYSTEMS, METHODS, AND DEVICES FOR DATA RECOVERY WITH SPARE STORAGE DEVICE AND FAULT RESILIENT STORAGE DEVICE
20210349780 · 2021-11-11

A method may include operating a first storage device and a second storage device as a redundant array, operating the first storage device in a fault resilient mode with at least partial read capability based on a fault condition of the first storage device, and rebuilding information from the first storage device on a spare storage device based on the fault condition of the first storage device. Rebuilding information from the first storage device on the spare storage device may include copying information from the first storage device to the spare storage device. The information from the first storage device may include data and/or parity information. The method may further include reading first information for a read or write operation from the first storage device based on a rebuild point of the spare storage device.
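
The role of the rebuild point can be shown with a short sketch. It assumes the rebuild proceeds in ascending block order and that blocks below the rebuild point are served from the spare device; both are illustrative assumptions, as are the function names.

```python
from typing import List, Optional


def rebuild_step(first: List[bytes], spare: List[Optional[bytes]],
                 rebuild_point: int, batch: int) -> int:
    """Copy the next `batch` blocks to the spare; return the new rebuild point."""
    end = min(rebuild_point + batch, len(first))
    spare[rebuild_point:end] = first[rebuild_point:end]
    return end


def read_block(lba: int, first: List[bytes], spare: List[Optional[bytes]],
               rebuild_point: int) -> Optional[bytes]:
    # Blocks below the rebuild point are already on the spare; the rest are
    # still read from the first device in its fault resilient mode.
    return spare[lba] if lba < rebuild_point else first[lba]
```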

Data storage system with metadata check-pointing

A data storage system includes multiple head nodes and data storage sleds. Volume data is replicated between a primary and one or more secondary head nodes for a volume partition and is further flushed to a set of mass storage devices of the data storage sleds. Volume metadata is maintained in a primary and one or more secondary head nodes for a volume partition and is updated in response to volume data being flushed to the data storage sleds. Also, the primary and secondary head nodes store check-points of volume metadata to the data storage sleds, wherein in response to a failure of a primary or secondary head node for a volume partition, a replacement secondary head node for the volume partition recreates a secondary replica for the volume partition based, at least in part, on a stored volume metadata checkpoint.
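
A simplified sketch of the check-pointing flow follows. The metadata is modelled as a mapping from volume offset to sled location and the sleds as a dictionary keyed by partition name. These structures, and the replay of later updates on top of the checkpoint, are assumptions for illustration.

```python
import copy
from typing import Dict, Iterable, Tuple


class HeadNode:
    def __init__(self) -> None:
        self.metadata: Dict[int, str] = {}  # volume offset -> sled location

    def on_flush(self, offset: int, sled_location: str) -> None:
        # Metadata is updated when volume data is flushed to the sleds.
        self.metadata[offset] = sled_location

    def checkpoint(self, sled_store: Dict[str, Dict[int, str]],
                   partition: str) -> None:
        # Store a snapshot of the volume metadata on the sleds.
        sled_store[partition] = copy.deepcopy(self.metadata)


def recreate_secondary(sled_store: Dict[str, Dict[int, str]], partition: str,
                       later_updates: Iterable[Tuple[int, str]] = ()) -> HeadNode:
    # Start from the stored checkpoint, then apply updates made after it
    # (assumed to be supplied by a surviving head node).
    node = HeadNode()
    node.metadata = copy.deepcopy(sled_store[partition])
    for offset, location in later_updates:
        node.on_flush(offset, location)
    return node
```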

Method of using a single controller (ECU) for a fault-tolerant/fail-operational self-driving system

In a self-driving autonomous vehicle, a controller architecture includes multiple processors within the same box. Each processor monitors the others and takes appropriate safe action when needed. Some processors may run dormant or low priority redundant functions that become active when another processor is detected to have failed. The processors are independently powered and independently execute redundant algorithms from sensor data processing to actuation commands using different hardware capabilities (GPUs, processing cores, different input signals, etc.). Intentional hardware and software diversity improves fault tolerance. The resulting fault-tolerant/fail-operational system meets ISO 26262 ASIL D specifications based on a single electronic controller unit platform that can be used for self-driving vehicles.
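
The mutual monitoring and the activation of dormant functions can be sketched as below. Heartbeat timestamps as the detection mechanism, and all names in the code, are assumptions; the abstract does not say how a failed processor is detected.

```python
from typing import Dict, Set, Tuple


def detect_failed(last_heartbeat: Dict[str, float], now: float,
                  timeout: float) -> Set[str]:
    # A processor is treated as failed if its last heartbeat is too old.
    return {p for p, t in last_heartbeat.items() if now - t > timeout}


def activate_backups(failed: Set[str],
                     backups: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Map each affected function to the standby processor that takes over.

    `backups` maps a function name to (primary processor, standby processor).
    A dormant function is activated only if its primary has failed and its
    standby has not.
    """
    return {
        fn: standby
        for fn, (primary, standby) in backups.items()
        if primary in failed and standby not in failed
    }
```

A function whose primary and standby have both failed is absent from the result, which is where a real system would fall back to a safe action.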