G06F11/0724

PROCESSOR SYSTEM AND FAULT DETECTION METHOD THEREOF

Provided is a processor system including a first processor driven by a first driving voltage and a first driving clock, a second processor driven by a second driving voltage and a second driving clock and configured to perform an identical task to the first processor, and a defect detector configured to perform level synchronization or clock domain synchronization on a first output signal provided from the first processor and a second output signal provided from the second processor to compare the first and second output signals, wherein the first and second driving voltages are respectively provided from mutually independent power supply sources and the first and second driving clocks are respectively provided from mutually independent clock generators.
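
The comparison step of the defect detector can be illustrated with a minimal software sketch. It assumes both output streams have already been level-shifted and brought into a common clock domain, so that samples line up by index; the function name `first_mismatch` and the integer sample representation are hypothetical and do not come from the abstract.

```python
from typing import Iterable, Optional


def first_mismatch(out_a: Iterable[int], out_b: Iterable[int]) -> Optional[int]:
    """Return the index of the first differing sample pair, or None if all match.

    Assumption: both streams are already synchronized, so the i-th sample of
    one processor corresponds to the i-th sample of the other.
    """
    for index, (a, b) in enumerate(zip(out_a, out_b)):
        if a != b:
            return index  # a divergence between the two processors indicates a fault
    return None


# Two processors performing the identical task should produce identical outputs.
healthy = first_mismatch([1, 2, 3], [1, 2, 3])
faulty = first_mismatch([1, 2, 3], [1, 9, 3])
```

Because the two processors use independent supplies and clock generators, a single supply or clock fault is unlikely to corrupt both streams in the same way, which is what makes a plain equality check meaningful here.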

Information processing device, method, and computer readable medium

An information processing device includes a plurality of processors. One of the plurality of processors is configured to: acquire a first code from a plurality of codes included in a program to be emulated; determine whether a plurality of native codes corresponding to the acquired first code are for an atomic operation and whether the first code accesses a memory; and generate a modified native code string with delay, in which a certain code for delaying the completion of executing the plurality of native codes is inserted in the plurality of native codes corresponding to the first code, when the one of the plurality of processors determines that the plurality of native codes corresponding to the first code are not for the atomic operation and that the first code accesses the memory.

SERIALIZING MACHINE CHECK EXCEPTIONS FOR PREDICTIVE FAILURE ANALYSIS
20220229714 · 2022-07-21

Upon occurrence of multiple errors in a central processing unit (CPU) package, data indicating the errors is stored in machine check (MC) banks. A timestamp corresponding to each error is stored, the timestamp indicating a time of occurrence for each error. A machine check exception (MCE) handler is generated to address the errors based on the timestamps. The timestamps can be stored in the MC banks or in a utility box (U-box). The MCE handler can then address the errors based on order of occurrence, for example by determining that the first error in time caused the remaining errors. The MCE handler can isolate hardware/software associated with the first error to recover from a failure. The MCE handler can report only the first error to the operating system (OS) or other error management software/hardware. The U-box may also convert the timestamps into real time to support user debugging.
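
The ordering step can be sketched as follows. This is only an illustration of sorting recorded errors by timestamp and picking the earliest one as the likely root cause; the `MCError` record, its fields, and the tick-counter timestamps are hypothetical and are not taken from the abstract or from any real machine check interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MCError:
    bank: int          # hypothetical MC bank index
    timestamp: int     # hypothetical tick count at the time of the error
    description: str


def serialize_errors(errors: List[MCError]) -> List[MCError]:
    """Order the recorded errors by time of occurrence."""
    return sorted(errors, key=lambda e: e.timestamp)


def first_error(errors: List[MCError]) -> Optional[MCError]:
    """Return the earliest error, treated here as the likely root cause."""
    ordered = serialize_errors(errors)
    return ordered[0] if ordered else None


errors = [
    MCError(bank=3, timestamp=120, description="cache parity"),
    MCError(bank=1, timestamp=95, description="bus timeout"),
    MCError(bank=5, timestamp=130, description="tlb error"),
]
root = first_error(errors)  # the bank 1 error, which has the smallest timestamp
```

Reporting only `root` to the OS corresponds to the option in the abstract of reporting just the first error.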

Processor Repair
20210406115 · 2021-12-30 ·

A processor comprises a plurality of processing units, wherein there is a fixed transmission time for transmitting a message from a sending processing unit to a receiving processing unit, based on the physical positions of the sending and receiving processing units in the processor. The processing units are arranged in a column, and the fixed transmission time depends on the position of a processing circuit in the column. An exchange fabric is provided for exchanging messages between sending and receiving processing units, the columns being arranged with respect to the exchange fabric such that the fixed transmission time depends on the distances of the processing circuits with respect to the exchange fabric.

CROSS-COMPONENT HEALTH MONITORING AND IMPROVED REPAIR FOR SELF-HEALING PLATFORMS
20210397507 · 2021-12-23

Systems, apparatuses and methods may provide for technology that detects a successful boot of a first firmware component in a computing system, receives a signal from a second firmware component in the computing system, and detects an incompatibility of the first firmware component with respect to the second firmware component based on the signal. In one example, only the first firmware component is repaired in response to the incompatibility.

METHOD AND SYSTEM FOR FAULT COLLECTION AND REACTION IN SYSTEM-ON-CHIP
20210397502 · 2021-12-23 ·

A fault collection and reaction system on a system-on-chip (SoC) includes a plurality of reaction cores assigned to a plurality of applications being executed by a plurality of processor cores on the SoC, at least one look-up table (LUT), and a controller. The at least one LUT stores therein a first mapping between the plurality of reaction cores and corresponding plurality of domain identifiers, and a second mapping between a plurality of faults and a set of reaction combinations. The controller receives a fault indication and a first domain identifier in response to occurrence of a first fault and selects from the plurality of reaction cores, a first reaction core mapped to the first domain identifier, and from the set of reaction combinations, a first reaction combination mapped to the first fault. The first reaction core responds to the fault indication with a reaction based on the selected reaction combination.
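
The two look-up table mappings and the controller's selection can be sketched in software as below. The table contents, domain identifiers, fault names, and reaction names are all hypothetical placeholders; the abstract specifies only that one mapping goes from domain identifiers to reaction cores and the other from faults to reaction combinations.

```python
from typing import Dict, List, Tuple

# First mapping (hypothetical contents): domain identifier -> reaction core.
DOMAIN_TO_CORE: Dict[int, str] = {
    0: "reaction_core_0",
    1: "reaction_core_1",
}

# Second mapping (hypothetical contents): fault -> reaction combination.
FAULT_TO_REACTIONS: Dict[str, List[str]] = {
    "ecc_error": ["log", "reset_domain"],
    "watchdog_timeout": ["interrupt"],
}


def select_reaction(fault: str, domain_id: int) -> Tuple[str, List[str]]:
    """Pick the reaction core for the domain and the reactions for the fault."""
    core = DOMAIN_TO_CORE[domain_id]
    reactions = FAULT_TO_REACTIONS[fault]
    return core, reactions


core, reactions = select_reaction("ecc_error", 1)
```

Keeping the two mappings separate means a fault's reaction combination can be changed without touching the assignment of reaction cores to applications.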

Reprogrammable quantum processor architecture incorporating quantum error correction

A novel and useful quantum computing machine architecture that includes a classic computing core as well as a quantum computing core. A programmable pattern generator executes sequences of instructions that control the quantum core. In accordance with the sequences, a pulse generator functions to generate the control signals that are input to the quantum core to perform quantum operations. A partial readout of the quantum state in the quantum core is generated that is subsequently re-injected back into the quantum core to extend decoherence time. Access gates control movement of quantum particles in the quantum core. Errors are corrected from the partial readout before being re-injected back into the quantum core. Internal and external calibration loops calculate error syndromes and calibrate the control pulses input to the quantum core. Control of the quantum core is provided from an external support unit via the pattern generator or can be retrieved from classic memory where sequences of commands for the quantum core are stored a priori in the memory. A cryostat unit functions to provide several temperatures to the quantum machine including a temperature to cool the quantum computing core to approximately 4 Kelvin.

METHOD FOR REGULATING CHIP TEMPERATURE
20210382803 · 2021-12-09 ·

A method for regulating a chip temperature, belonging to the technical field of integrated chips and applied to multi-core processor chips. The method comprises: pre-setting a threshold set consisting of multiple thresholds graded by magnitude; step S1, detecting, in real time, the temperature of a processor chip; step S2, judging whether the real-time temperature has exceeded one or more thresholds in the threshold set; and step S3, using a pre-set strategy to shut down a number of cores, determined by a pre-set rule, corresponding to the number of thresholds exceeded by the real-time temperature. The method can handle the scenario where the chip temperature rises above a threshold due to a continuously increasing load, and has high reliability.
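
Steps S2 and S3 can be sketched as below. The abstract does not state the pre-set rule, so this sketch assumes a simple linear one (a fixed number of cores per exceeded threshold); the function name, the `cores_per_threshold` parameter, and the example threshold values are hypothetical.

```python
from typing import Sequence


def cores_to_shut_down(
    temp_c: float,
    thresholds: Sequence[float],
    cores_per_threshold: int = 1,
) -> int:
    """Return how many cores to shut down for the current temperature.

    Assumed rule: shut down `cores_per_threshold` cores for every threshold
    that the temperature exceeds.
    """
    exceeded = sum(1 for t in thresholds if temp_c > t)  # step S2
    return exceeded * cores_per_threshold                # step S3


thresholds = [80.0, 90.0, 100.0]  # hypothetical graded thresholds in degrees Celsius
to_stop = cores_to_shut_down(92.0, thresholds)  # exceeds 80 and 90
```

Graded thresholds let the response scale with the severity of overheating rather than shutting down all cores at a single trip point.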

CONTROL DEVICE, NON-TRANSITORY STORAGE MEDIUM, AND SYSTEM

To manage a series of processes performed by a plurality of devices more efficiently.

There is provided a control device comprising: a control unit configured to transmit an execution instruction to execute a prescribed process, which is a process prescribed in advance, to at least one of a plurality of processing devices and to cause the plurality of processing devices, including that processing device, to execute the prescribed process, wherein, when the control unit detects that an error has occurred in execution of the prescribed process by one of the plurality of processing devices, the control unit executes control such that an interruption instruction to interrupt the execution of the prescribed process reaches at least one of the processing devices different from the processing device in which the error has occurred.
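
The interruption decision can be sketched as follows. The abstract requires only that the interruption instruction reach at least one device other than the failed one; this sketch assumes the stricter policy of interrupting every device that is still running. The status strings and the function name are hypothetical.

```python
from typing import Dict, List


def interruption_targets(statuses: Dict[str, str]) -> List[str]:
    """Return the devices that should receive an interruption instruction.

    `statuses` maps a device name to "running", "error", or "done".
    Assumed policy: if any device reports an error, interrupt all devices
    that are still running; the failed device itself is not a target.
    """
    if "error" not in statuses.values():
        return []
    return [name for name, status in statuses.items() if status == "running"]


targets = interruption_targets({"dev_a": "running", "dev_b": "error", "dev_c": "running"})
```

Stopping the remaining devices early avoids spending work on a series of processes that can no longer complete successfully.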

METHOD AND SYSTEM FOR DETECTING GPU-RELATED FACTORS OF MULTI-MODE DISTRIBUTED CLUSTER

A method for detecting comprehensive GPU-related factors of a distributed cluster, the method including: (1) checking whether configuration file content exists for an operating node; (2) reading a mode parameter in an environment variable of the operating node, and switching an operating mode according to the mode parameter; (3) reading a timer frequency value from the environment variable of the operating node so as to set a time period for reading a GPU information parameter according to the timer frequency value; (4) calculating the maximum value of the GPU information parameter of the operating node, and storing the maximum value in a GPU information list cache; and (5) initializing the transmitted information and determining whether there is a GPU in the GPU information list cache of the operating node.
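
Steps (2) to (5) can be sketched as below. The environment variable names `GPU_DETECT_MODE` and `GPU_TIMER_PERIOD_S`, their defaults, and the reading of the timer value directly as a period in seconds are all assumptions made for illustration; the abstract does not name the variables or say how the timer frequency value maps to a period. Actual GPU sampling is left out, and the samples are passed in as plain numbers.

```python
from typing import Dict, List, Mapping


def read_settings(env: Mapping[str, str]) -> Dict[str, object]:
    """Steps (2) and (3): read the mode and the sampling period from the environment."""
    mode = env.get("GPU_DETECT_MODE", "single")          # hypothetical variable name
    period_s = float(env.get("GPU_TIMER_PERIOD_S", "10"))  # hypothetical variable name
    return {"mode": mode, "period_s": period_s}


def update_cache(cache: Dict[str, float], node: str, samples: List[float]) -> None:
    """Step (4): store the maximum sampled value for the node, if any samples exist."""
    if samples:
        cache[node] = max(samples)


def node_has_gpu(cache: Mapping[str, float], node: str) -> bool:
    """Step (5): a node counts as having a GPU if the cache holds an entry for it."""
    return node in cache


settings = read_settings({"GPU_DETECT_MODE": "cluster", "GPU_TIMER_PERIOD_S": "2"})
cache: Dict[str, float] = {}
update_cache(cache, "node-a", [35.0, 71.5, 60.0])
update_cache(cache, "node-b", [])  # no samples, so no cache entry is created
```

In a real deployment `read_settings` would be called with `os.environ`; passing a plain mapping keeps the sketch easy to test.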