Patent classifications
G06F11/1629
Hot updates to controller software using tool chain
Disclosed embodiments relate to performing updates to Electronic Control Unit (ECU) software while an ECU of a vehicle is operating. Operations may include receiving, at the vehicle while the ECU of the vehicle is operating, a software update file for the ECU software; writing, while the ECU is operating, the software update file into a first memory location in a memory of the ECU while simultaneously executing a code segment of existing code in a second memory location in the memory of the ECU; and updating a plurality of memory addresses associated with the memory of the ECU based on the software update file and without interrupting the execution of the code segment currently being executed in the second memory location in the memory of the ECU.
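The write-then-remap sequence in this abstract can be illustrated with a toy model. This is only a sketch under stated assumptions: `EcuMemory`, the region names, and the `scale` symbol are hypothetical, and a Python dictionary assignment stands in for the address update performed on real ECU memory.

```python
from typing import Callable, Dict


class EcuMemory:
    """Toy model: named memory regions plus a table mapping symbols to regions."""

    def __init__(self) -> None:
        self.regions: Dict[str, Callable[[int], int]] = {}
        self.symbols: Dict[str, str] = {}  # symbol -> region holding its code

    def write(self, region: str, code: Callable[[int], int]) -> None:
        # Writing a region leaves symbols that point at other regions untouched.
        self.regions[region] = code

    def remap(self, symbol: str, region: str) -> None:
        # One assignment: a caller resolves either the old or the new region.
        self.symbols[symbol] = region

    def call(self, symbol: str, arg: int) -> int:
        return self.regions[self.symbols[symbol]](arg)


mem = EcuMemory()
mem.write("region_b", lambda x: x + 1)  # existing code (second memory location)
mem.remap("scale", "region_b")
before = mem.call("scale", 10)

mem.write("region_a", lambda x: x * 2)  # update written to the first location
during = mem.call("scale", 10)          # old code is still the one executed
mem.remap("scale", "region_a")          # address update switches to new code
after = mem.call("scale", 10)
```

In this model the existing code stays callable for the whole time the update is being written, and only the final remap changes which code runs.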
Fast Recovery for Dual Core Lock Step
An exemplary fault-tolerant computing system comprises a secondary processor configured to execute in delayed lock step with a primary processor from a common program store, comparators in the store data and writeback paths to detect a fault based on comparing primary and secondary processor states, and a writeback path delay permitting aborting execution when a fault is detected, before writeback of invalid data. The secondary processor execution and the primary processor store data and writeback may be delayed a predetermined number of cycles, permitting fault detection before writing invalid data. Store data and writeback paths may include triple modular redundancy configured to pass only majority data through the store data and writeback path delay stages. Some implementations may forward data from the store data path delay stages to the writeback stage or memory if the load data address matches the address of data in a store data path delay stage.
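The delayed comparison can be sketched in software. This is a minimal model, not the patented hardware: `run_lockstep`, the `DELAY` value, and the per-operation callables are assumptions made for illustration. Primary results wait in a queue that plays the role of the writeback delay stages, and a result is committed only after the delayed secondary has reproduced it.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

DELAY = 2  # assumed number of cycles the secondary runs behind the primary


def run_lockstep(
    inputs: List[int],
    primary: Callable[[int], int],
    secondary: Callable[[int], int],
    delay: int = DELAY,
) -> Tuple[List[int], Optional[int]]:
    """Return (committed results, index of the faulting operation or None)."""
    committed: List[int] = []
    pending: Deque[Tuple[int, int]] = deque()  # models the writeback delay stages
    for cycle in range(len(inputs) + delay):
        if cycle < len(inputs):
            pending.append((cycle, primary(inputs[cycle])))
        lagged = cycle - delay  # operation the secondary executes this cycle
        if lagged >= 0:
            idx, primary_value = pending.popleft()
            if secondary(inputs[lagged]) != primary_value:
                return committed, idx  # abort before the bad value is written
            committed.append(primary_value)
    return committed, None


def faulty_square(x: int) -> int:
    # Hypothetical primary that produces a wrong result for one input.
    return -1 if x == 3 else x * x


ok_run = run_lockstep([1, 2, 3, 4], lambda x: x * x, lambda x: x * x)
bad_run = run_lockstep([1, 2, 3, 4], faulty_square, lambda x: x * x)
```

In the faulty run, the results for the first two operations are committed and the invalid third result never leaves the delay queue.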
SYSTEM AND METHOD FOR PROTECTING GPU MEMORY INSTRUCTIONS AGAINST FAULTS
A system and method for protecting memory instructions against faults are described. The system and method include converting the slave instructions to dummy operations, modifying the memory arbiter to issue up to N master and N slave global/shared memory instructions per cycle, sending master memory requests to the memory system, using slave requests for error checking, entering master requests into the GM/LM FIFO, storing slave requests in a register, and comparing the entered master requests with the stored slave requests.
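The master/slave check can be sketched as follows. The `MemRequest` fields and the `check_requests` function are hypothetical names chosen for illustration, and the model issues one master and one slave request per cycle rather than up to N.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Tuple


@dataclass(frozen=True)
class MemRequest:
    op: str        # "load" or "store"
    address: int
    data: int = 0


def check_requests(
    master_stream: Iterable[MemRequest],
    slave_stream: Iterable[MemRequest],
) -> Tuple[List[MemRequest], List[int]]:
    """Return (requests entered into the FIFO, cycles where master and slave differ)."""
    fifo: Deque[MemRequest] = deque()
    mismatches: List[int] = []
    for cycle, (master, slave) in enumerate(zip(master_stream, slave_stream)):
        fifo.append(master)     # only the master request goes on to memory
        slave_register = slave  # the slave request is held purely for checking
        if fifo[-1] != slave_register:
            mismatches.append(cycle)
    return list(fifo), mismatches


master = [MemRequest("load", 0x10), MemRequest("store", 0x20, 7)]
slave = [MemRequest("load", 0x10), MemRequest("store", 0x24, 7)]  # corrupted address
entered, faults = check_requests(master, slave)
```

Because the slave requests never reach memory, the check adds no memory traffic in this model; a mismatch only flags the cycle for whatever recovery the system applies.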
FAULT TOLERANCE USING SHARED MEMORY ARCHITECTURE
Examples provide a fault tolerant virtual machine (VM) using pooled memory. When fault tolerance is enabled for a VM, a primary VM is created on a first host in a server cluster. A secondary VM is created on a second host in the server cluster. Memory for the VMs is maintained on a shared partition in pooled memory. The pooled memory is accessible to all hosts in the cluster. The primary VM has read and write access to the VM memory in the pooled memory. The secondary VM has read-only access to the VM memory. If the second host fails, a new secondary VM is created on another host in the cluster. If the first host fails, the secondary VM becomes the new primary VM and a new secondary VM is created on another host in the cluster.
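The failover rules in this abstract can be sketched with a small model. The `Cluster` class, its method names, and the host names are assumptions for illustration; a plain dictionary stands in for the shared partition in pooled memory.

```python
from typing import Dict, List, Optional


class Cluster:
    """Toy model of a primary/secondary VM pair backed by pooled memory."""

    def __init__(self, hosts: List[str]) -> None:
        self.healthy: List[str] = list(hosts)
        self.shared_memory: Dict[str, int] = {}  # survives any single host failure
        self.primary: Optional[str] = self.healthy[0]
        self.secondary: Optional[str] = self.healthy[1]

    def _pick_spare(self) -> Optional[str]:
        for host in self.healthy:
            if host not in (self.primary, self.secondary):
                return host
        return None

    def write(self, host: str, key: str, value: int) -> None:
        if host != self.primary:
            raise PermissionError("only the primary VM has write access")
        self.shared_memory[key] = value

    def fail(self, host: str) -> None:
        self.healthy.remove(host)
        if host == self.primary:
            # The secondary is promoted; VM memory stays in place in the pool.
            self.primary, self.secondary = self.secondary, None
            self.secondary = self._pick_spare()
        elif host == self.secondary:
            self.secondary = None
            self.secondary = self._pick_spare()


cluster = Cluster(["h1", "h2", "h3", "h4"])
cluster.write("h1", "x", 1)
cluster.fail("h1")  # first host fails: h2 is promoted, h3 becomes secondary
```

Since the memory lives in the pool rather than on either host, promotion in this sketch copies no VM state.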
FINGERPRINTING OF REDUNDANT THREADS USING COMPILER-INSERTED TRANSFORMATION CODE
A first processing element is configured to execute a first thread and one or more second processing elements are configured to execute one or more second threads that are redundant to the first thread. The first thread and the one or more second threads are to selectively bypass one or more comparisons of results of operations performed by the first thread and the one or more second threads depending on whether an event trigger for the comparison has occurred a configurable number of times since a previous comparison of previously encoded values of the results. In some cases the comparison can be performed based on hashed (or encoded) values of the results of a current operation and one or more previous operations.
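A rough sketch of the selective comparison: each redundant thread folds its results into a running hash and exposes the fingerprint only every `compare_every` events. The `Fingerprint` class, `run_redundant`, and the choice of SHA-256 are assumptions for illustration; the abstract does not name a hash function, and here the comparison logic is written by hand rather than inserted by a compiler.

```python
import hashlib
from typing import List, Optional, Tuple


class Fingerprint:
    """Running hash over a thread's results, exposed every compare_every events."""

    def __init__(self, compare_every: int) -> None:
        self.compare_every = compare_every
        self.events = 0
        self.digest = hashlib.sha256()

    def record(self, value: int) -> Optional[str]:
        # Fold the result into the hash; bypass the comparison unless it is due.
        self.digest.update(value.to_bytes(8, "little", signed=True))
        self.events += 1
        if self.events % self.compare_every == 0:
            return self.digest.hexdigest()
        return None


def run_redundant(
    results_a: List[int], results_b: List[int], compare_every: int
) -> Tuple[int, Optional[int]]:
    """Return (comparisons performed, index where a mismatch was seen or None)."""
    fa, fb = Fingerprint(compare_every), Fingerprint(compare_every)
    comparisons = 0
    for i, (a, b) in enumerate(zip(results_a, results_b)):
        ha, hb = fa.record(a), fb.record(b)
        if ha is None:
            continue
        comparisons += 1
        if ha != hb:
            return comparisons, i
    return comparisons, None


clean = run_redundant([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], 3)
faulty = run_redundant([1, 2, 3, 4, 5, 6], [1, 9, 3, 4, 5, 6], 3)
```

Because the hash covers the current and previous results, the error injected at index 1 is still caught at the next comparison point (index 2) even though the result at that point matches.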
OPPORTUNISTIC SOFTWARE UPDATES DURING SELECT OPERATIONAL MODES
Disclosed embodiments relate to opportunistically updating Electronic Control Unit (ECU) software in a vehicle. Operations may include receiving, at a controller in a vehicle, a wireless transmission indicating a need to update software running on at least one ECU in the vehicle; monitoring an operational status of the vehicle to determine whether the vehicle is in a first mode of operation in which an ECU software update is prohibited; delaying the ECU software update when the vehicle is in the first mode of operation; continuing to monitor the operational status of the vehicle to determine whether the vehicle is in a second mode of operation in which the ECU software update is permitted; and enabling updating of the at least one ECU with the delayed ECU software update when it is determined that the vehicle is in the second mode of operation.
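The monitor-delay-enable loop can be sketched as below. The `Mode` values, the `PROHIBITED` set, and `schedule_update` are hypothetical; the abstract does not say which operational modes prohibit an update, so treating driving as prohibited is an assumption.

```python
from enum import Enum
from typing import Iterable, Optional, Set, Tuple


class Mode(Enum):
    DRIVING = "driving"
    PARKED = "parked"


PROHIBITED: Set[Mode] = {Mode.DRIVING}  # assumed "first mode of operation"


def schedule_update(
    mode_samples: Iterable[Mode], prohibited: Set[Mode] = PROHIBITED
) -> Tuple[Optional[int], int]:
    """Return (sample index at which the update is enabled, samples it was delayed)."""
    delayed = 0
    for i, mode in enumerate(mode_samples):
        if mode in prohibited:
            delayed += 1  # keep monitoring; the update stays pending
            continue
        return i, delayed  # permitted mode reached: enable the delayed update
    return None, delayed   # never permitted during the observed samples


result = schedule_update([Mode.DRIVING, Mode.DRIVING, Mode.PARKED])
never = schedule_update([Mode.DRIVING])
```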
Automated error detection and recovery for GPU computations in a service environment
A service provider system may implement ECC-like features when executing computations on GPUs that do not include sufficient error detection and recovery for computations that are sensitive to bit errors. During execution of critical computations on behalf of customers, the system may automatically instrument program instructions received from the customers to cause each computation to be executed using multiple sets of hardware resources (e.g., different host machines, processor cores, or internal hardware resources). The service may provide APIs with which customers may instrument their code for execution using redundant resource instances, or specify parameters for applying the ECC-like features. The service or customer may instrument code to perform (or cause the system to perform) checkpointing operations at particular points in the code, and to compare intermediate results produced by different hardware resources. If the intermediate results do not match, the computation may be restarted from a checkpointed state.
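The checkpoint-and-compare loop can be sketched as below. The `run_with_checkpoints` function, the executor signature, and the retry limit are assumptions for illustration, not the service's API; each executor stands in for one set of hardware resources.

```python
from typing import Callable, List, Tuple

Step = Callable[[int], int]
Executor = Callable[[Step, int], int]


def run_with_checkpoints(
    steps: List[Step], executors: List[Executor], state: int, max_retries: int = 3
) -> Tuple[int, int]:
    """Run each step on every executor; return (final state, restarts performed)."""
    restarts = 0
    for step in steps:
        checkpoint = state  # state saved before the step runs
        for _ in range(max_retries + 1):
            results = [run(step, checkpoint) for run in executors]
            if all(r == results[0] for r in results):
                state = results[0]  # intermediate results match: accept them
                break
            restarts += 1           # mismatch: redo the step from the checkpoint
        else:
            raise RuntimeError("results still disagree after retries")
    return state, restarts
```

A usage example with one reliable executor and one that suffers a single simulated bit flip:

```python
calls = {"n": 0}

def good(step, s):
    return step(s)

def flaky(step, s):
    calls["n"] += 1
    r = step(s)
    return r ^ 1 if calls["n"] == 2 else r  # flip the low bit on the second call

final_state, restarts = run_with_checkpoints(
    [lambda s: s + 1, lambda s: s * 2], [good, flaky], 3
)
```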
SOLID STATE POWER CONTROLLERS
In accordance with at least one aspect of this disclosure, a system can include a main board, two or more redundant CPUs for any number of channels operatively connected to the main board and two or more power sources operatively connected to provide power to at least one or more of the two or more CPUs. A respective isolated communications block can be operatively connected to each of the two or more CPUs configured to provide external communications from other components on the main board to a respective CPU.
Information processing system and method
An information processing system includes: a first system that includes a group of arithmetic units, a controller, and an external device; and a second system configured to execute calculation which is the same as calculation executed in the first system and compare calculation results to each other, wherein the controller is configured to: stop a plurality of arithmetic units when it is detected that an output request to the external device is output from one or more arithmetic units among the plurality of arithmetic units that execute first calculation in the group of arithmetic units, the plurality of arithmetic units including one or more arithmetic units that do not output the output request, transmit first comparison target data including a value output in response to the output request to the second system, and instruct the stopped one or more arithmetic units to execute second calculation.
Mediator assisted switchover between clusters
Techniques are provided for metadata management for enabling automated switchover. An initial quorum vote may be performed before a node executes an operation associated with metadata comprising operational information and switchover information. After the initial quorum vote is performed, the node executes the operation upon one or more mailbox storage devices. Once the operation has executed, a final quorum vote is performed. The final quorum vote and the initial quorum vote are compared to determine whether the operation is to be designated as successful or failed, and whether any additional actions are to be performed.
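The vote-operate-vote sequence can be sketched as follows. The abstract does not give the rule for comparing the two votes, so the rule used here (both votes must reach a majority and must be identical) is an assumption, as are the `majority` and `run_operation` names and the voter labels.

```python
from typing import Callable, Dict

Votes = Dict[str, bool]


def majority(votes: Votes) -> bool:
    """True when more than half of the voters voted yes."""
    return sum(votes.values()) * 2 > len(votes)


def run_operation(take_vote: Callable[[], Votes], operation: Callable[[], None]) -> str:
    initial = take_vote()  # initial quorum vote, before the operation
    operation()            # stand-in for the operation on the mailbox devices
    final = take_vote()    # final quorum vote, after the operation
    # Assumed rule: quorum must hold both times and membership must not change.
    ok = majority(initial) and majority(final) and initial == final
    return "successful" if ok else "failed"


voters = {"node_a": True, "node_b": True, "mediator": True}
steady = run_operation(lambda: dict(voters), lambda: None)


def lose_node_b() -> None:
    voters["node_b"] = False  # a voter drops out while the operation runs


changed = run_operation(lambda: dict(voters), lose_node_b)
```

In the second run a majority still exists after the operation, but the vote differs from the initial one, so this sketch marks the operation as failed; a real system might instead take one of the "additional actions" the abstract mentions.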