Patent classifications
G06F11/2028
Defect repair for a reconfigurable data processor for homogeneous subarrays
A device architecture includes a spatially reconfigurable array of processors, such as configurable units of a CGRA, having spare homogeneous subarrays, and a parameter store on the device which stores parameters that tag one or more elements as unusable. Configuration data is distributed using a statically reconfigurable bus system, to implement the pattern of placement of configuration data, in dependence on the tagged elements. As a result, a spatially reconfigurable array having unusable elements can be repaired.
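The placement idea can be illustrated with a minimal sketch, assuming the array is a flat list of interchangeable subarrays and the parameter store is simply a set of tagged indices. The function name and its parameters are hypothetical and do not come from the patent.

```python
def place_configuration(logical_count, physical_count, unusable):
    """Map each logical subarray to a usable physical subarray, in order.

    `unusable` plays the role of the parameter store: the set of physical
    subarray indices tagged as defective (an assumption for illustration).
    """
    usable = [p for p in range(physical_count) if p not in unusable]
    if logical_count > len(usable):
        raise ValueError("not enough usable subarrays to repair the array")
    return {logical: usable[logical] for logical in range(logical_count)}


# Four logical subarrays, five physical ones (one spare), subarray 1 tagged.
placement = place_configuration(logical_count=4, physical_count=5, unusable={1})
print(placement)  # {0: 0, 1: 2, 2: 3, 3: 4}
```

The configuration for logical subarrays 1 to 3 shifts past the tagged element onto the spare, which is the repair effect the abstract describes.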
Assigning identifiers to processing units in a column to repair a defective processing unit in the column
A method of recording tile identifiers in each of a plurality of tiles of a multitile processor is described. Tiles are arranged in columns, each column having a plurality of processing circuits, each processing circuit comprising one or more tiles, wherein a base processing circuit in each column is connected to a set of processing circuit identifier wires. A base value is generated on each of the set of processing circuit identifier wires for the base processing circuit in each column. At the base processing circuit, the base value on the set of processing circuit identifier wires is read and incremented by one. The incremented value is propagated to a next processing circuit in the column, and at the next processing circuit a unique identifier is recorded by concatenating an identifier of the column and the incremented value.
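The read, increment, and propagate chain can be sketched as follows. The wire width `ROW_BITS`, the function name, and the choice that the base processing circuit records the base value itself are assumptions for illustration. The abstract only states that the next circuit records the column identifier concatenated with the incremented value.

```python
ROW_BITS = 4  # assumed width of the processing circuit identifier wires


def assign_identifiers(column_id, circuits_in_column, base_value=0):
    """Return the identifier recorded by each processing circuit in a column."""
    if base_value + circuits_in_column > (1 << ROW_BITS):
        raise ValueError("identifier wires too narrow for this column")
    identifiers = []
    value = base_value
    for _ in range(circuits_in_column):
        # Concatenate the column identifier (high bits) with the value read
        # from the identifier wires (low bits).
        identifiers.append((column_id << ROW_BITS) | value)
        # Increment by one and propagate to the next circuit in the column.
        value += 1
    return identifiers


print(assign_identifiers(column_id=2, circuits_in_column=3))  # [32, 33, 34]
```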
Node recovery solution for composable and disaggregated environment
In an aspect of the disclosure, a method, a computer-readable medium, and an apparatus are provided. The apparatus may be a pod manager. The pod manager receives a request for composing a target composed-node. The pod manager employs a first set of pooled hardware resources of the computing pod to build the target composed-node. The pod manager determines to reserve a second set of pooled hardware resources of the computing pod for a backup node of the target composed-node. The pod manager determines that the target composed-node has failed. The pod manager employs the second set of pooled hardware resources to build the backup node.
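A minimal sketch of the reserve-then-failover flow, assuming pooled resources are plain integer identifiers and that the backup reservation is the same size as the primary set. The class and method names are hypothetical.

```python
class PodManager:
    """Toy pod manager: composes a node and reserves resources for its backup."""

    def __init__(self, pooled_resources):
        self.free = set(pooled_resources)
        self.active = {}    # node name -> resources currently in use
        self.reserved = {}  # node name -> resources held for the backup node

    def _take(self, count):
        taken = set(sorted(self.free)[:count])
        self.free -= taken
        return taken

    def compose(self, node, count):
        # Assumption: the backup needs as many resources as the primary.
        if 2 * count > len(self.free):
            raise RuntimeError("insufficient pooled resources")
        self.active[node] = self._take(count)
        self.reserved[node] = self._take(count)

    def handle_failure(self, node):
        """Build the backup node from the reserved set; return the failed set."""
        failed = self.active[node]
        self.active[node] = self.reserved.pop(node)
        return failed


manager = PodManager(range(6))
manager.compose("node-a", 2)
manager.handle_failure("node-a")
print(sorted(manager.active["node-a"]))  # [2, 3]
```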
Adaptive multipath fabric for balanced performance and high availability
A computing system providing high-availability access to computing resources includes: a plurality of interfaces; a plurality of sets of computing resources, each of the sets of computing resources including a plurality of computing resources; and at least three switches, each of the switches being connected to a corresponding one of the interfaces via a host link and being connected to a corresponding one of the sets of computing resources via a plurality of resource connections, each of the switches being configured such that data traffic is distributed to remaining ones of the switches through a plurality of cross-connections between the switches if one of the switches fails.
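The failover behaviour can be sketched as below. The even split of the failed switch's traffic across the remaining switches is an assumption made for illustration. The abstract says only that traffic is distributed to the remaining switches over the cross-connections.

```python
def distribute_traffic(load_by_switch, failed):
    """Redistribute the failed switch's load evenly across the remaining ones."""
    remaining = [name for name in load_by_switch if name != failed]
    if not remaining:
        raise RuntimeError("no switches remain to carry traffic")
    share = load_by_switch[failed] / len(remaining)
    return {name: load_by_switch[name] + share for name in remaining}


loads = {"sw0": 30, "sw1": 30, "sw2": 30}
print(distribute_traffic(loads, failed="sw0"))  # {'sw1': 45.0, 'sw2': 45.0}
```

With at least three switches, a single failure still leaves two paths, so the redirected load is shared rather than falling on one surviving switch.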
Systems and methods for enabling a highly available managed failover service
A computing system receives and stores configuration information for an application in a data store. The configuration information comprises (1) identifiers for a plurality of cells of the application that include at least a primary cell and a secondary cell, (2) a defined state for each of the plurality of cells, (3) one or more dependencies for the application, and (4) a failover workflow defining actions to take in a failover event. The computing system receives an indication, from a customer, of a change in state of the primary cell or a request to initiate the failover event. The computing system updates, in the data store, the states for corresponding cells of the plurality of cells based on the failover workflow and updates, in the data store, the one or more dependencies for the application based on the failover workflow.
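One way to picture the stored configuration and the update step is the sketch below, which assumes the data store is an in-memory dictionary and the failover workflow is a pair of update maps. All keys and values here are hypothetical.

```python
def apply_failover(store, workflow):
    """Update cell states and dependencies in the store per the workflow."""
    store["cells"].update(workflow["cell_states"])
    store["dependencies"].update(workflow["dependencies"])
    return store


store = {
    "cells": {"primary": "active", "secondary": "standby"},
    "dependencies": {"database": "primary"},
}
workflow = {
    "cell_states": {"primary": "fenced", "secondary": "active"},
    "dependencies": {"database": "secondary"},
}
apply_failover(store, workflow)
print(store["cells"])  # {'primary': 'fenced', 'secondary': 'active'}
```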
Handling input data errors in an autonomous vehicle
Handling input data errors in an autonomous vehicle using predictive inputs, including: determining an error in input data for a model of a plurality of models of an automation system of the autonomous vehicle; generating predicted input data for the model; and generating, based on the predicted input data, output data for the model.
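The three steps (determine an error, generate predicted input, generate output) can be sketched as follows. Treating a missing or NaN reading as the error, and predicting by linear extrapolation from the last two good readings, are both assumptions for illustration. The abstract does not specify either.

```python
import math


def run_model(model, reading, history):
    """Run `model` on `reading`, substituting a predicted input on error."""
    if reading is None or math.isnan(reading):
        if not history:
            raise ValueError("no history available to predict from")
        if len(history) >= 2:
            # Assumed predictor: linear extrapolation of the last two inputs.
            reading = 2 * history[-1] - history[-2]
        else:
            reading = history[-1]
    history.append(reading)
    return model(reading)


history = [1.0, 2.0]
print(run_model(lambda x: x * 10, float("nan"), history))  # 30.0
```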
Vehicle system for autonomous control in response to abnormality
A vehicle system includes a first vehicle platform including a first computer configured to operate by means of electric power from a first electric power source and perform traveling control of a vehicle, a second vehicle platform including a second computer configured to operate by means of electric power from a second electric power source different from the first electric power source and perform traveling control of the vehicle, and an autonomous driving platform including a third computer configured to perform autonomous driving control of the vehicle by transmitting a control instruction including data for autonomously driving the vehicle to the first computer when the first vehicle platform is in a normal state and perform autonomous stoppage control of the vehicle by transmitting a control instruction including data for causing the vehicle to autonomously stop to the second computer when the first vehicle platform is in an abnormal state.
Data encoding, decoding and recovering method for a distributed storage system
Disclosed is a data encoding, decoding and recovering method of a distributed storage system for data protection of the distributed storage system. The methods include using local recoverable coding: Reed-Solomon coding is applied, based on coding parameters, to data blocks obtained by dividing file segments, generating global coding blocks, and local coding is then applied to the data blocks and the global coding blocks respectively to generate local coding blocks. The methods can also include computing decoded block indices and recovered block indices according to the current node state, reading block data of an assistant node, and implementing decoding of file segments and recovery of failed blocks. By adding local coding blocks, the coding method of the present disclosure can reduce the amount of data that needs to be transmitted when recovering a failed node and speed up node recovery.
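The benefit of local coding blocks can be shown with a simplified sketch. It uses plain XOR parity per group as a stand-in for the local code and omits the Reed-Solomon global coding blocks entirely, so it is not the disclosed scheme. The function names, group size, and equal-length blocks are assumptions.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def local_parities(data_blocks, group_size):
    """One XOR parity block per group of `group_size` data blocks."""
    return [
        xor_blocks(data_blocks[i:i + group_size])
        for i in range(0, len(data_blocks), group_size)
    ]


def recover_block(group_survivors, parity):
    """Rebuild the single missing block of a group from its peers and parity."""
    return xor_blocks(group_survivors + [parity])


data = [b"ab", b"cd", b"ef", b"gh"]
parities = local_parities(data, group_size=2)
# Block 1 is lost: only block 0 and the group's parity are read.
print(recover_block([data[0]], parities[0]))  # b'cd'
```

Recovery reads two blocks from the failed block's own group instead of the whole stripe, which is the reduction in transferred data that the abstract attributes to local coding blocks.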
Redundant control in a distributed automation system
A method for redundant control in a distributed automation system, preferably a real-time automation system, for operating a client device of the distributed automation system is discussed. The method includes using the client device to monitor for the occurrence of a fault in communication between the client device and a first computing infrastructure that is part of the distributed automation system and operates the client device. The method may also include using the client device, once the fault occurs, to instruct a second computing infrastructure of the distributed automation system to operate the client device.
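A minimal sketch of client-driven switchover, assuming the communication fault is detected by a message timeout. The class name, the timeout mechanism, and the infrastructure labels are hypothetical.

```python
class ClientDevice:
    """Client that monitors its operator and switches to a second one on fault."""

    def __init__(self, first, second, timeout_s):
        self.first = first
        self.second = second
        self.timeout_s = timeout_s
        self.operator = first  # infrastructure currently operating the client

    def check(self, now_s, last_message_s):
        """Return the operator after checking for a communication fault."""
        fault = now_s - last_message_s > self.timeout_s
        if fault and self.operator == self.first:
            # Instruct the second computing infrastructure to take over.
            self.operator = self.second
        return self.operator


client = ClientDevice("infra-1", "infra-2", timeout_s=0.5)
print(client.check(now_s=1.0, last_message_s=0.9))  # infra-1
print(client.check(now_s=2.0, last_message_s=0.9))  # infra-2
```

The decision is made on the client rather than by a central supervisor, which matches the abstract's emphasis on the client device doing the monitoring.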
High frequency snapshot technique for improving data replication in disaster recovery environment
A high frequency snapshot technique improves data replication in a disaster recovery (DR) environment. A base snapshot is generated from failover data at a primary site and replicated to a placeholder file at a secondary site. Upon commencement of the base snapshot generation and replication, incremental light weight snapshots (LWSs) of the failover data are captured and replicated to the secondary site. A staging file at the secondary site accumulates the replicated LWSs (“high-frequency snapshots”). The staging file is populated with the LWSs in parallel with the replication of the base snapshot at the placeholder file. At a subsequent predetermined time interval, the accumulated LWSs are synthesized to capture a “checkpoint” snapshot by applying and pruning the accumulated LWSs at the staging file. Once the base snapshot is fully replicated, the pruned LWSs are merged to the base snapshot to synchronize the replicated failover data.
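The accumulate, prune, and merge sequence can be sketched as below, assuming each snapshot is a mapping from block address to contents and that pruning means keeping only the newest write per block. These representations and the function names are assumptions for illustration.

```python
def synthesize_checkpoint(accumulated_lws):
    """Apply LWSs oldest-first; later writes to a block replace earlier ones."""
    checkpoint = {}
    for lws in accumulated_lws:
        checkpoint.update(lws)
    return checkpoint


def merge_into_base(base_snapshot, checkpoint):
    """Overlay the pruned checkpoint on the fully replicated base snapshot."""
    merged = dict(base_snapshot)
    merged.update(checkpoint)
    return merged


base = {0: "A0", 1: "B0", 2: "C0"}
staging = [{1: "B1"}, {2: "C1"}, {1: "B2"}]  # LWSs captured during replication
checkpoint = synthesize_checkpoint(staging)
print(checkpoint)                         # {1: 'B2', 2: 'C1'}
print(merge_into_base(base, checkpoint))  # {0: 'A0', 1: 'B2', 2: 'C1'}
```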