G06F11/2043

BUNDLING OF WIRED AND WIRELESS INTERFACES IN A REDUNDANT INTERFACE OF A HIGH-AVAILABILITY CLUSTER
20210160140 · 2021-05-27

A system may include a first node in a high-availability cluster; a second node in the high-availability cluster; a redundant interface between a network device and both the first node and the second node, wherein the redundant interface is associated with a redundancy group that designates one of the first node or the second node as a primary node in the high-availability cluster and that designates the other of the first node or the second node as a backup node in the high-availability cluster; a wireless interface of the first node, wherein the wireless interface is included in the redundant interface; and a wired interface of the second node, wherein the wired interface is included in the redundant interface.

Asymmetric coherency protocol for first and second processing circuitry having different levels of fault protection or fault detection
10997076 · 2021-05-04

An apparatus has first processing circuitry and second processing circuitry. The second processing circuitry has at least one hardware mechanism providing a greater level of fault protection or fault detection than is provided for the first processing circuitry. Coherency control circuitry controls access to data from at least part of a shared address space by the first and second processing circuitry according to an asymmetric coherency protocol in which a local-only update of data in a local cache of the first processing circuitry is restricted in comparison to a local-only update of data in a local cache of the second processing circuitry.

System and method for reducing failover times in a redundant management module configuration

While the management module of an information handling system is set as a standby module, an enclosure controller provides first requests for attribute data of the information handling system, and receives and stores first response data for attribute data associated with a first subset of the first requests in a local memory of the enclosure controller. The enclosure controller receives request failure responses associated with a second subset of the first requests directed to a subset of the attribute data for the information handling system stored in a shared memory. While the management module is set as an active module, the management module is granted access to the shared memory. The enclosure controller provides retry requests for attributes associated with the request failure responses, and receives and stores second response data associated with the retry requests in the local memory.

FAULT STATE TRANSITIONS IN AN AUTONOMOUS VEHICLE
20210133057 · 2021-05-06

Fault state transitions in an autonomous vehicle may include determining that a first node of a plurality of nodes has failed; determining, in response to the first node failing, a failure state; determining, based on the failure state, a configuration for the plurality of nodes excluding the first node; and applying the configuration.
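
The four steps in this abstract (detect a failed node, derive a failure state, derive a configuration that excludes the failed node, apply it) can be illustrated with a rough Python sketch. Everything named here is hypothetical: the state labels, the role names, and the policy of choosing a state from the number of failed nodes are assumptions for illustration, not details taken from the filing.

```python
from typing import Dict, List, Set


def determine_failure_state(failed: Set[str], total: int) -> str:
    """Assumed policy: the state depends only on how many nodes are down."""
    if not failed:
        return "nominal"
    return "degraded" if len(failed) < total / 2 else "critical"


def determine_configuration(state: str, healthy: List[str]) -> Dict[str, str]:
    """Assign roles to healthy nodes only, so the failed node is excluded."""
    if not healthy:
        return {}
    config = {healthy[0]: "primary"}
    # In the assumed "critical" state the remaining nodes are left idle.
    role = "secondary" if state == "degraded" else "idle"
    for node in healthy[1:]:
        config[node] = role
    return config


def handle_node_failure(nodes: List[str], failed_node: str) -> Dict[str, str]:
    """Run the sequence: failure -> failure state -> configuration -> apply."""
    failed = {failed_node}
    state = determine_failure_state(failed, len(nodes))
    healthy = [n for n in nodes if n not in failed]
    # "Applying" the configuration is represented here by returning it.
    return determine_configuration(state, healthy)
```

Calling `handle_node_failure(["a", "b", "c"], "b")` yields a configuration that mentions only nodes `a` and `c`.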

Computer or microchip with a secure system bios having a separate private network connection to a separate private network
10965645 · 2021-03-30

A method for a computer or microchip with one or more inner hardware-based access barriers or firewalls that establish one or more private units disconnected from a public unit or units having a connection to the public Internet, wherein one or more of the private units have a connection to one or more non-Internet-connected private networks for private network control of the configuration of the computer or microchip using active hardware configuration, including field programmable gate arrays (FPGA). The hardware-based access barriers include a single out-only bus and/or another in-only bus with a single on/off switch.

Component redundancy systems, devices, and methods

Discussed herein are component redundancy systems, devices, and methods. A method to transfer a workload from a first component to a second component of a same device may include monitoring a wear indicator associated with the first component, and in response to an indication that the first component is stressed based on the wear indicator, transferring a workload of the first component to the second component.
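
A minimal sketch of the wear-triggered transfer described in this abstract might look as follows. The `Component` class, the normalized `wear` value, and the `threshold` of 0.8 are all assumptions made for illustration; the abstract does not say how the wear indicator is measured or compared.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str
    wear: float = 0.0  # hypothetical wear indicator, normalized to [0, 1]
    workload: List[str] = field(default_factory=list)


def transfer_if_stressed(first: Component, second: Component,
                         threshold: float = 0.8) -> bool:
    """Move the workload of `first` to `second` when `first` looks stressed.

    Returns True if a transfer happened, False otherwise.
    """
    if first.wear >= threshold:
        second.workload.extend(first.workload)
        first.workload.clear()
        return True
    return False
```

In a real device the check would run periodically against a sensor or counter; here it is a single call so the transfer logic is easy to see.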

Distributed database system and resource management method for distributed database system

The data processing times of data processing nodes are heterogeneous, and hence the execution time of a whole system is not optimized. A task is executed using a plurality of optimal computing devices by distributing a data amount of data to be processed with a processing command of the task for the plurality of optimal computing devices depending on a difference in computing power between the plurality of optimal computing devices, to thereby execute the task in a distributed manner using the plurality of optimal computing devices.
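
One simple way to distribute a data amount according to differences in computing power is a proportional split. The sketch below assumes each device's power is available as a positive number and that data is counted in whole items; the function name `split_by_power` and the tie-breaking rule for rounding leftovers are illustrative choices, not taken from the filing.

```python
from typing import List


def split_by_power(total_items: int, powers: List[float]) -> List[int]:
    """Split `total_items` across devices in proportion to computing power."""
    if total_items < 0 or not powers or any(p <= 0 for p in powers):
        raise ValueError("need a non-negative total and positive powers")
    total_power = sum(powers)
    # Round each share down first so the shares never exceed the total.
    shares = [int(total_items * p / total_power) for p in powers]
    # Hand out the items lost to rounding, most powerful devices first.
    leftover = total_items - sum(shares)
    order = sorted(range(len(powers)), key=lambda i: powers[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

With powers `[1, 1, 2]` and 100 items, the device with twice the power receives half of the data, so all three would be expected to finish at about the same time.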

Programming model and framework for providing resilient parallel tasks

Exemplary embodiments herein describe programming models and frameworks for providing parallel and resilient tasks. Tasks are created in accordance with predetermined structures. Defined tasks are stored as data objects in a shared pool of memory that is made up of disaggregated memory communicatively coupled via a high performance interconnect that supports atomic operations as described herein. Heterogeneous compute nodes are configured to execute tasks stored in the shared memory. When compute nodes fail, they do not impact the shared memory, the tasks or other data stored in the shared memory, or the other non-failing compute nodes. The non-failing compute nodes can take on the responsibility of executing tasks owned by other compute nodes, including tasks of a compute node that fails, without needing a centralized manager or scheduler to re-assign those tasks. Task processing can therefore be performed in parallel and without impact from node failures.
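
The idea of taking over a failed node's tasks without a central scheduler can be sketched with an atomic compare-and-swap on a task's owner field. In the Python sketch below, a `threading.Lock` merely stands in for the atomic operation that the interconnect is said to support; `TaskPool`, `compare_and_swap`, and `steal_from` are hypothetical names, and how a node learns that another node has failed is left out.

```python
import threading
from typing import Dict, List, Optional


class TaskPool:
    """Stand-in for the shared memory pool holding task ownership records."""

    def __init__(self, task_ids: List[str]) -> None:
        self._owner: Dict[str, Optional[str]] = {t: None for t in task_ids}
        self._lock = threading.Lock()  # emulates an atomic memory operation

    def compare_and_swap(self, task: str, expected: Optional[str],
                         new: str) -> bool:
        """Set the owner to `new` only if it is currently `expected`."""
        with self._lock:
            if self._owner[task] == expected:
                self._owner[task] = new
                return True
            return False

    def owner(self, task: str) -> Optional[str]:
        with self._lock:
            return self._owner[task]


def steal_from(pool: TaskPool, task: str, failed_node: str, me: str) -> bool:
    """Take over a task owned by a failed node; at most one caller succeeds."""
    return pool.compare_and_swap(task, failed_node, me)
```

Because the swap succeeds only when the owner still matches the expected value, two surviving nodes racing for the same task cannot both claim it.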

Role management of compute nodes in distributed clusters
10922199 · 2021-02-16

In one example, a distributed cluster may include compute nodes having a master node and a replica node, an in-memory data grid formed from memory associated with the compute nodes, a first high availability agent running on the replica node, and a second high availability agent running on the master node. The first high availability agent may determine a failure of the master node by accessing data in the in-memory data grid and designate a role of the replica node as a new master node to perform cluster management tasks of the master node. The second high availability agent may determine that the new master node is available in the distributed cluster by accessing the data in the in-memory data grid when the master node is restored after the failure and demote a role of the master node to a new replica node.
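
The promotion and demotion flow in this abstract can be illustrated with a small Python sketch. It assumes the data grid stores a current-master record plus per-node heartbeat timestamps, and that a stale heartbeat is how failure is detected; those details, along with the `DataGrid` and `HAAgent` names and the 5-second timeout, are assumptions rather than anything stated in the source. Time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Optional


class DataGrid:
    """Stand-in for the in-memory data grid shared by all compute nodes."""

    def __init__(self, master: Optional[str] = None) -> None:
        self.master = master
        self.heartbeats: Dict[str, float] = {}


class HAAgent:
    """High availability agent running on one node."""

    def __init__(self, node: str, grid: DataGrid, timeout: float = 5.0) -> None:
        self.node = node
        self.grid = grid
        self.timeout = timeout

    def check(self, now: float) -> str:
        """Record a heartbeat, then return the role this node should hold."""
        self.grid.heartbeats[self.node] = now
        master = self.grid.master
        if master == self.node:
            return "master"
        last = self.grid.heartbeats.get(master) if master else None
        if last is None or now - last > self.timeout:
            # The master looks dead: promote this node via the grid.
            self.grid.master = self.node
            return "master"
        # A live master exists, so this node is (or becomes) a replica.
        return "replica"
```

A restored former master that calls `check` finds a different, live master recorded in the grid and therefore takes the replica role. The sketch ignores races between several replicas promoting themselves at once, which a real grid would need to resolve.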

CPU HOT-SWAPPING

There is disclosed in one example a multi-core computing system configured to provide a hot-swappable CPU0, including: a first CPU in a first CPU socket and a second CPU in a second CPU socket; a switch including a first media interface to the first CPU socket and a second media interface to the second CPU socket; and one or more mediums including non-transitory instructions to detect a hot swap event of the first CPU, designate the second CPU as CPU0, determine that a new CPU has replaced the first CPU, operate the switch to communicatively couple the new CPU to a backup initialization code store via the first media interface, initialize the new CPU, and designate the new CPU as CPUN, wherein N ≠ 0.