Patent classifications
G06F11/1497
INFORMATION PROCESSING DEVICE, CONTROL METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM
An information processing device that executes an arithmetic process includes a first processing circuit and a second processing circuit. The first processing circuit executes the arithmetic process N times consecutively. The second processing circuit executes the arithmetic process N times consecutively. N is an integer of 2 or more. The first processing circuit and the second processing circuit continue to operate according to a match between at least one result among the results of the N arithmetic processes executed by the first processing circuit and at least one result among the results of the N arithmetic processes executed by the second processing circuit. As a result, it is possible to suppress an increase in cost required for hardware and to suppress a temporary stop due to a temporary failure.
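The continue-on-match rule can be sketched in a few lines. This is only one possible reading of the abstract, assuming that any single result from the first circuit equal to any single result from the second circuit counts as a match; the names `run_n_times` and `may_continue` are hypothetical and do not come from the patent.

```python
from typing import Callable, List


def run_n_times(process: Callable[[], int], n: int) -> List[int]:
    """Run the same arithmetic process n times consecutively on one circuit."""
    if n < 2:
        raise ValueError("N must be an integer of 2 or more")
    return [process() for _ in range(n)]


def may_continue(first_results: List[int], second_results: List[int]) -> bool:
    """Assumed rule: keep operating if any result of the first circuit
    equals any result of the second circuit."""
    return any(result in second_results for result in first_results)
```

Under this reading, a transient fault that corrupts one of the N runs on one circuit does not stop the device, because the remaining runs still agree.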
FAULT DETECTION IN NEURAL NETWORKS
A method of performing fault detection during computations relating to a neural network comprising a first neural network layer and a second neural network layer in a data processing system, the method comprising: scheduling computations onto data processing resources for the execution of the first neural network layer and the second neural network layer, wherein the scheduling includes: for a given one of the first neural network layer and the second neural network layer, scheduling a respective given one of a first computation and a second computation as a non-duplicated computation, in which the given computation is at least initially scheduled to be performed only once during the execution of the given neural network layer; and for the other of the first and second neural network layers, scheduling the respective other of the first and second computations as a duplicated computation.
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
An information processing device and information processing method with improved error tolerance are implemented. The device includes a data processing unit that executes lockstep processing, in which a plurality of processing systems execute the same task and errors are verified by comparing execution results. In a case where an error is detected in the lockstep processing, the data processing unit performs control for improving noise tolerance of the CPU circuit system that executes the task, such as increasing the supply voltage or lowering the supply clock, and moreover re-executes the task by using more processing systems than were used before the error detection.
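A minimal sketch of the detect, harden, and re-execute flow follows. It assumes a software model in which each processing system is a call to `task(system_id)`, the noise-tolerance control is an opaque `harden` callback, and the re-executed results are resolved by majority vote. The majority vote and all names here are assumptions for illustration; the abstract does not say how the wider re-execution is resolved.

```python
from collections import Counter
from typing import Callable, List


def lockstep(task: Callable[[int], int], systems: int) -> List[int]:
    """Run the same task on each processing system and collect the results."""
    return [task(system_id) for system_id in range(systems)]


def run_with_escalation(task, systems=2, extra_systems=1, harden=lambda: None):
    """Lockstep run; on mismatch, harden the circuit and re-run on more systems."""
    results = lockstep(task, systems)
    if len(set(results)) == 1:
        return results[0]  # all systems agree, no error detected

    # Error detected: apply noise-tolerance control (for example, raise the
    # supply voltage or lower the clock), then re-execute on more systems.
    harden()
    results = lockstep(task, systems + extra_systems)

    # Assumption: accept the value reported by a strict majority of systems.
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 > len(results):
        return value
    raise RuntimeError("no majority after re-execution")
```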
REDUNDANT COMMUNICATIONS FOR MULTI-CHIP SYSTEMS
An electronic device, comprising: a first component configured to transmit a first set of data to a second component by providing a first memory request specifying the first set of data and an input memory address, and a transaction tracking unit coupled to a first transport interface, the transaction tracking unit configured to: receive the first memory request; transmit a second memory request that specifies at least a first portion of the first set of data, via the first transport interface, to the second component; receive a response to the second memory request from the second component; determine that the response corresponds to the second memory request; and provide, to the first component, an output response based on the received response to the second memory request.
Detection of faults in performance of micro instructions
Micro-architectural fault detectors are described. An example storage medium includes instructions for receiving one or more micro instructions for scheduling in a processor, the processor including one or more processing resources; and performing fault detection in performance of the one or more micro instructions utilizing one or more of a first idle canary detection mode, wherein the first mode includes assigning at least one component as an idle canary detector to perform a canary process with an expected outcome, and a second micro-architectural redundancy execution mode, wherein the second mode includes replicating a first micro instruction to generate micro instructions for performance by a set of processing resources.
ERROR DETECTION AND CORRECTION DEVICE AND METHOD THEREOF
An error detection and correction device and a method thereof are provided. The method for error detection and correction includes the following steps. Converting an operation program into a two's complement inverse operation program. Converting the variables of the operation program into two's complement variables. Executing the operation program to obtain a first operation result according to the variables. Executing the operation program to obtain a second operation result according to the two's complement variables. Calculating the checksum result corresponding to the variables, and calculating the checksum result corresponding to the two's complement variables. Performing at least one of comparing the first operation result with the second operation result, comparing the checksum result corresponding to the variables, and comparing the checksum result corresponding to the two's complement variables. Outputting the first operation result, the second operation result or an error message according to the above comparison results.
Buffer Checker for Task Processing Fault Detection
A graphics processing system for operation with a data store, comprising: one or more processing units for processing tasks; a check unit operable to form a signature which is characteristic of an output from processing a task on a processing unit; and a fault detection unit operable to compare signatures formed at the check unit; wherein the graphics processing system is operable to process each task first and second times at the one or more processing units so as to, respectively, generate first and second processed outputs, the graphics processing system being configured to: write out the first processed output to the data store; read back the first processed output from the data store and form at the check unit a first signature which is characteristic of the first processed output as read back from the data store; form at the check unit a second signature which is characteristic of the second processed output; compare the first and second signatures at the fault detection unit; and raise a fault signal if the first and second signatures do not match.
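The write-out, read-back, and signature comparison can be sketched as follows. This is a software illustration only, assuming CRC-32 (`zlib.crc32`) as the signature and a dict-like object as the data store; the claim does not name a signature function, and `signature` and `fault_detected` are hypothetical names.

```python
import zlib


def signature(data: bytes) -> int:
    """Assumed signature: CRC-32 of the processed output."""
    return zlib.crc32(data)


def fault_detected(task, payload: bytes, store, key) -> bool:
    """Process the task twice and compare signatures; True means raise a fault."""
    first_output = task(payload)
    store[key] = first_output        # write the first output to the data store
    read_back = store[key]           # read it back from the data store
    first_signature = signature(read_back)

    second_output = task(payload)    # second pass is checked without a store round trip
    second_signature = signature(second_output)

    return first_signature != second_signature
```

Because the first signature is formed over the data as read back, a mismatch can indicate either a processing fault or corruption on the path to and from the data store.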
METRIC PAYLOAD INGESTION AND REPLAY
In one aspect, a method for metric payload ingestion and playback is disclosed. The method includes receiving time series of metric payloads for a plurality of performance metrics indicating performance of a node or machine and storing the received time series of metric payloads in a payload tracking table of a database. The storing includes storing the received time series of metric payloads in different layers and partitioned regions of the payload tracking table. The layers represent time ranges corresponding to time points when the time series of metric payloads are received. The partitioned regions are assigned to receive certain ones of the received time series of metric payloads. The method includes replaying the stored time series of metric payloads from a selected one or more of the partitioned regions, the layers, or both.
Electronic fault detection unit
An electronic fault detection unit is provided that has a first register, a second register, a comparator circuit, and a timer circuit. The first and second register can be written from a first software portion, and a second software portion, respectively. The comparator circuit is arranged to detect that both the first and second register have been written, verify a relationship between first data written to the first register and second data written to the second register, and signal a fault upon said verification failing. The timer circuit is arranged to signal a fault if said verification of the comparator circuit does not occur within a time limit.
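A small software model of the unit may make the interplay of comparator and timer easier to follow. It is a sketch under stated assumptions: the timer starts at the first register write of a pair, time is passed in explicitly as `now`, the fault flag latches, and the relationship to verify is supplied by the caller. None of these details, nor the class and method names, come from the abstract.

```python
class FaultDetectionUnit:
    """Software model of two registers, a comparator, and a timer."""

    def __init__(self, relation, time_limit):
        self.relation = relation      # assumed: callable(first, second) -> bool
        self.time_limit = time_limit
        self.first = None
        self.second = None
        self.started_at = None        # assumed: timer starts at the first write
        self.fault = False            # latched fault signal

    def write_first(self, value, now):
        self.first = value
        self._on_write(now)

    def write_second(self, value, now):
        self.second = value
        self._on_write(now)

    def tick(self, now):
        """Timer check: fault if a pending verification has exceeded the limit."""
        if self.started_at is not None and now - self.started_at > self.time_limit:
            self.fault = True

    def _on_write(self, now):
        if self.started_at is None:
            self.started_at = now
        else:
            self.tick(now)
        # Comparator: once both registers are written, verify the relationship.
        if self.first is not None and self.second is not None:
            if not self.relation(self.first, self.second):
                self.fault = True
            self.first = self.second = None
            self.started_at = None
```

For example, two software portions could write a value and its bitwise inverse, with `relation=lambda a, b: a == (~b & 0xFF)` for 8-bit registers.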
Data quality pre-selection in read retry operations
Implementations disclosed herein provide a method comprising iteratively reading data from a failing media sector prior to an error minimization operation, analyzing read data at each iteration using an error minimization operation to determine sections of read data as good data, storing the good data in a buffer during each iteration, and using the good data as input for the read data during a subsequent error retry operation. In another implementation, the method further comprises comparing new read data with stored good data in the buffer, and replacing the stored good data with the new read data if the new read data has a higher quality than the stored good data.
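The good-data buffer update can be sketched as below. The sketch assumes each read of the sector is already split into sections, each paired with a numeric quality score, and that a section counts as good when its score meets a threshold. The score, the threshold, and the name `update_good_buffer` are assumptions for illustration; the abstract does not specify how quality is measured.

```python
def update_good_buffer(buffer, read, threshold):
    """Merge one read iteration into the buffer of good sections.

    buffer: dict mapping section index -> (data, quality)
    read:   list of (data, quality) tuples, one per section of the sector
    """
    for index, (data, quality) in enumerate(read):
        if quality < threshold:
            continue  # not good data, leave the buffer untouched
        stored = buffer.get(index)
        # Replace stored good data only if the new read has higher quality.
        if stored is None or quality > stored[1]:
            buffer[index] = (data, quality)
    return buffer
```

After several iterations the buffer holds the best good copy seen for each section, which can then be used as input to a later retry.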