Patent classifications
G06F11/1629
Workload Repetition Redundancy
A graphics processing system includes a plurality of processing units for processing tasks, each processing unit being configured to process a task independently from any other processing unit of the plurality of processing units; a check unit operable to form a signature which is characteristic of an output of a processing unit on processing a task; and a fault detection unit operable to compare signatures formed at the check unit; wherein the graphics processing system is configured to process each task of a first type first and second times at the plurality of processing units so as to, respectively, generate first and second processed outputs, wherein the check unit is configured to form first and second signatures which are characteristic of, respectively, the first and second processed outputs, and wherein the fault detection unit is configured to compare the first and second signatures and raise a fault signal if the first and second signatures do not match.
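The repeat-and-compare flow in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented design: CRC-32 as the signature, the function names, and the toy "processing units" are all assumptions made for the example.

```python
import zlib
from typing import Callable, Tuple


def form_signature(output: bytes) -> int:
    """Check unit: form a compact value characteristic of a processed output.

    CRC-32 is an assumption; the abstract does not name a signature function.
    """
    return zlib.crc32(output)


def process_twice(
    task: bytes,
    first_unit: Callable[[bytes], bytes],
    second_unit: Callable[[bytes], bytes],
) -> Tuple[bytes, bool]:
    """Process one task a first and second time and compare the signatures.

    Returns the first output and a fault flag (True if signatures differ).
    """
    first_output = first_unit(task)
    second_output = second_unit(task)
    fault = form_signature(first_output) != form_signature(second_output)
    return first_output, fault


def healthy_unit(task: bytes) -> bytes:
    """Hypothetical processing unit: inverts every byte of the task."""
    return bytes(b ^ 0xFF for b in task)


def faulty_unit(task: bytes) -> bytes:
    """Same unit with a single flipped bit, standing in for a hardware fault."""
    out = bytearray(healthy_unit(task))
    out[0] ^= 0x01
    return bytes(out)
```

Comparing signatures rather than full outputs keeps the fault detection unit's work small, at the cost of a small chance that two different outputs share a signature.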
MEDIATOR ASSISTED SWITCHOVER BETWEEN CLUSTERS
Techniques are provided for metadata management for enabling automated switchover. An initial quorum vote may be performed before a node executes an operation associated with metadata comprising operational information and switchover information. After the initial quorum vote is performed, the node executes the operation upon one or more mailbox storage devices. Once the operation has executed, a final quorum vote is performed. The final quorum vote and the initial quorum vote are compared to determine whether the operation is to be designated as successful or failed, and whether any additional actions are to be performed.
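One way to picture the vote, execute, vote sequence is the sketch below. The abstract does not state the comparison rule, so the rule used here (success only when both votes reach a majority and name the same voters) is an assumption, as are all function and parameter names.

```python
from typing import Callable, FrozenSet


def has_quorum(voters: FrozenSet[str], cluster_size: int) -> bool:
    """A strict majority of the cluster responded (assumed quorum rule)."""
    return len(voters) > cluster_size // 2


def guarded_operation(
    poll_voters: Callable[[], FrozenSet[str]],
    operation: Callable[[], None],
    cluster_size: int,
) -> str:
    """Run an operation between an initial and a final quorum vote."""
    initial_vote = poll_voters()
    if not has_quorum(initial_vote, cluster_size):
        # Assumption: without an initial quorum the operation is not attempted.
        return "failed"

    operation()  # e.g. a metadata write to the mailbox storage devices

    final_vote = poll_voters()
    # Assumption: the votes "match" when the same members hold quorum
    # both before and after the operation.
    if has_quorum(final_vote, cluster_size) and final_vote == initial_vote:
        return "successful"
    return "failed"
```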
Apparatuses, methods, and systems for hardware-assisted lockstep of processor cores
Systems, methods, and apparatuses relating to circuitry to implement lockstep of processor cores are described. In one embodiment, a hardware processor comprises a first processor core comprising a first control flow signature register and a first execution circuit, a second processor core comprising a second control flow signature register and a second execution circuit, and at least one signature circuit to perform a first state history compression operation on a first instruction that executes on the first execution circuit of the first processor core to produce a first result, store the first result in the first control flow signature register, perform a second state history compression operation on a second instruction that executes on the second execution circuit of the second processor core to produce a second result, and store the second result in the second control flow signature register.
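As a rough software analogy for the control flow signature registers, each core below folds every executed instruction into a running value. Using a chained CRC-32 as the "state history compression operation" and comparing the two registers afterwards are assumptions for illustration; the abstract specifies neither.

```python
import zlib
from typing import Iterable


class Core:
    """Toy model of a core with a control flow signature register."""

    def __init__(self) -> None:
        self.cf_signature = 0  # control flow signature register

    def execute(self, instruction: bytes) -> None:
        # Compress the instruction into the running history. zlib.crc32
        # accepts the previous value as its second argument, so the register
        # depends on the entire sequence executed so far.
        self.cf_signature = zlib.crc32(instruction, self.cf_signature)


def in_lockstep(first: Core, second: Core) -> bool:
    """Hypothetical check: equal registers suggest identical histories."""
    return first.cf_signature == second.cf_signature


def run(core: Core, program: Iterable[bytes]) -> Core:
    """Execute every instruction of a program on the given core."""
    for instruction in program:
        core.execute(instruction)
    return core
```

Because each register holds a compressed history rather than the full trace, two cores can be compared by exchanging a single word.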
System and method for protecting GPU memory instructions against faults
A system and method for protecting memory instructions against faults are described. The system and method include converting the slave instructions to dummy operations, modifying a memory arbiter to issue up to N master and N slave global/shared memory instructions per cycle, sending master memory requests to the memory system, using slave requests for error checking, entering master requests to the GM/LM FIFO, storing slave requests in a register, and comparing the entered master requests with the stored slave requests.
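The master/slave checking path might be modeled along these lines. This is a loose sketch: the request fields, the class and method names, and the one-request-per-call simplification (N = 1) are assumptions, not details taken from the abstract.

```python
from collections import deque
from typing import Deque, List, NamedTuple, Optional


class MemRequest(NamedTuple):
    """Hypothetical memory request: address, payload, and direction."""
    address: int
    data: int
    is_store: bool


class CheckedMemoryPath:
    """Only master requests reach memory; slave copies are used for checking."""

    def __init__(self) -> None:
        self.fifo: Deque[MemRequest] = deque()  # stands in for the GM/LM FIFO
        self.slave_register: Optional[MemRequest] = None
        self.sent_to_memory: List[MemRequest] = []

    def issue(self, master: MemRequest, slave: MemRequest) -> bool:
        """Issue one master/slave pair; return True if the pair matches."""
        self.sent_to_memory.append(master)  # master goes to the memory system
        self.fifo.append(master)            # master is entered into the FIFO
        self.slave_register = slave         # slave is a dummy op, kept for checking
        # Compare the entered master request with the stored slave request.
        return self.fifo.popleft() == self.slave_register
```

A mismatch signals that the redundant copies of the instruction disagreed on an address or payload before the fault could be masked by memory.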
Recovery from a clustered file system queue failure event using a modified extended attribute of a file
A computer-implemented method according to one embodiment includes, in response to a determination that a predetermined operation has been performed on an object of a first file stored on a first cluster site, storing predetermined information about the object of the first file stored on the first cluster site. The predetermined information is stored on an extended attribute of the first file stored on the first cluster site. In response to a determination that the predetermined operation is performed on an object of a first file stored on a second cluster site, the predetermined information is removed from the extended attribute of the first file stored on the first cluster site. In response to a determination that a failure event has occurred on a queue of the first cluster site, a predetermined recovery process is performed, thereby enabling fulfillment of entries of the queue of the first cluster site.
Self-healing learning system for one or more controllers
Disclosed embodiments relate to automatically providing updates to at least one vehicle. Operations may include receiving, at a server remote from the at least one vehicle, Electronic Control Unit (ECU) activity data from the at least one vehicle, the ECU activity data corresponding to actual operation of the ECU in the at least one vehicle; determining, at the server and based on the ECU activity data, a software vulnerability affecting the at least one vehicle, the software vulnerability being determined based on a deviation between the received ECU activity data and expected ECU activity data; identifying, at the server, an ECU software update based on the determined software vulnerability; and sending, from the server, a delta file configured to update software on the ECU with a software update corresponding to the identified ECU software update.
DETECTING ANOMALIES USING REAL-TIME CONTROLLER PROCESSING ACTIVITY
Disclosed embodiments relate to performing updates to Electronic Control Unit (ECU) software while an ECU of a vehicle is operating. Operations may include receiving, at the vehicle while the ECU of the vehicle is operating, a software update file for the ECU software; writing, while the ECU is operating, the software update file into a first memory location in a memory of the ECU while simultaneously executing a code segment of existing code in a second memory location in the memory of the ECU; and updating a plurality of memory addresses associated with the memory of the ECU based on the software update file and without interrupting the execution of the code segment currently being executed in the second memory location in the memory of the ECU.
Apparatus and method for communications in a safety critical system
A safety communication scheme for a safety-critical system which includes two or more higher level units that have voting capabilities and one or two sets of lower level units that do not have voting capabilities, involves using one channel between the high and low level units for safety and two channels for redundancy.
Program Generation Apparatus and Parallel Arithmetic Device
A program for causing a parallel arithmetic device including a plurality of arithmetic groups to execute parallel arithmetic is input. The program includes information defining each of the following: application arithmetic constituting predetermined processing; redundant arithmetic (which is redundant arithmetic of the application arithmetic and is arithmetic assigned to a surplus core(s) in a diagnosis target arithmetic group); and diagnostic arithmetic (arithmetic that is a comparison of results of the same redundant arithmetic by two or more diagnosis target arithmetic groups and is assigned to surplus cores in an arithmetic group for diagnosis). The surplus core(s) is a core(s) to which no application arithmetic is assigned.
Efficient memory utilisation in a processing cluster having a split mode and a lock mode
An apparatus is described comprising a cluster of processing elements. The cluster has a split mode in which the processing elements are configured to process independent processing workloads, and a lock mode in which the processing elements comprise at least one primary processing element and at least one redundant processing element, each redundant processing element configured to perform a redundant processing workload for checking correctness of a primary processing workload performed by the primary processing element. Each processing element has an associated local memory comprising a plurality of memory locations. A local memory access control mechanism is configured, during the lock mode, to allow the at least one primary processing element to access memory locations within the local memory associated with the at least one redundant processing element.