G06F11/0721

Data processing system and method for acquiring data for training a machine learning model for use in monitoring the data processing system for anomalies
11586983 · 2023-02-21

A data processing system and a method are provided for acquiring data for training a machine learning (ML) model for use in self-monitoring the data processing system. The data processing system operates in a data acquisition mode, during which it acquires training data for the ML model from an anomaly detector of the data processing system. At least a portion of the training data is determined to be biased, and another portion is unbiased. The unbiased portion is transferred to a training environment external to the data processing system, where it is used to train the ML model to function with the anomaly detector during a normal operating mode to determine when an anomaly is present in the data processing system.
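The abstract does not say how bias is detected, so the following Python sketch only illustrates the data flow. Samples are collected only in the acquisition mode, a placeholder predicate splits off the biased ones, and the rest is handed to the external training environment. The `Mode`, `Sample`, `is_biased` and `export_to_training_environment` names, and the "captured during a known disturbance" bias criterion, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    DATA_ACQUISITION = auto()
    NORMAL = auto()


@dataclass
class Sample:
    features: list
    # Hypothetical marker: captured while a known disturbance was active.
    during_known_disturbance: bool


def is_biased(sample: Sample) -> bool:
    # Placeholder criterion; the abstract does not define how bias is detected.
    return sample.during_known_disturbance


def acquire_training_data(mode: Mode, detector_output: list) -> list:
    """Keep only unbiased anomaly-detector samples, and only in acquisition mode."""
    if mode is not Mode.DATA_ACQUISITION:
        return []  # nothing is collected in the normal operating mode
    return [s for s in detector_output if not is_biased(s)]


def export_to_training_environment(samples: list) -> list:
    # Stand-in for the transfer to the external training environment.
    return [s.features for s in samples]


samples = [Sample([0.1, 0.2], False), Sample([5.0, 9.9], True), Sample([0.3, 0.1], False)]
unbiased = acquire_training_data(Mode.DATA_ACQUISITION, samples)
print(export_to_training_environment(unbiased))  # [[0.1, 0.2], [0.3, 0.1]]
```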

System and method for determining error occurrence in graphics memory of graphics processing unit
11500711 · 2022-11-15

A system may include a graphics processing unit (GPU) and a processor. The GPU may include a GPU core and non-error-detection-and-correction (non-EDAC) graphics memory. The graphics memory may contain a data object and a copy of the data object. The processor may be configured to instruct the GPU to handle the data object and its copy as textures, and to instruct the GPU to execute a texture comparison shader program. The GPU core may be configured to execute the texture comparison shader program, compare the data object with its copy, generate comparison results, and output the comparison results as pixels to an off-screen area of a framebuffer. The processor may further be configured to obtain either (a) a hash value of the off-screen area or (b) the off-screen area itself, and to determine whether the comparison results match at least one expected value.
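A CPU-side emulation can make the comparison scheme concrete. This minimal sketch assumes one byte per texel and hypothetical pixel values 0x00 for a match and 0xFF for a mismatch. The emulated "shader" writes one result pixel per texel into an off-screen buffer, and the processor compares a SHA-256 hash of that buffer against the hash of an all-match buffer (option (a) above). The function names are illustrative; a real implementation would run the comparison in a GPU shader.

```python
import hashlib

# Hypothetical pixel values the comparison shader writes per texel.
MATCH, MISMATCH = 0x00, 0xFF


def texture_compare_shader(texture: bytes, copy: bytes) -> bytearray:
    """CPU emulation of the shader: one output pixel per texel pair."""
    if len(texture) != len(copy):
        raise ValueError("data object and copy must be the same size")
    return bytearray(MATCH if a == b else MISMATCH for a, b in zip(texture, copy))


def memory_ok(texture: bytes, copy: bytes) -> bool:
    offscreen = texture_compare_shader(texture, copy)  # the off-screen area
    # Expected value: hash of an off-screen area in which every pixel is MATCH.
    expected = hashlib.sha256(bytes([MATCH]) * len(offscreen)).hexdigest()
    return hashlib.sha256(offscreen).hexdigest() == expected


data = bytes(range(16))
corrupted = bytearray(data)
corrupted[5] ^= 0x01  # simulated bit flip in the copy
print(memory_ok(data, bytes(data)), memory_ok(data, bytes(corrupted)))  # True False
```

Hashing lets the processor read back a single digest instead of the whole off-screen area; option (b) would instead scan the returned pixels for any `MISMATCH` value.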

Processor core debugging with processor virtualization

A device, such as a system on a chip (SoC), includes a plurality of processor cores, a broadcaster module, a plurality of decoder units, and an aggregator module. The broadcaster module broadcasts a debug request from a debugger device to one or more of the processor cores via a bus; the debug request includes an address specifying a logical identifier associated with a target processor core. Each decoder unit, associated with a respective processor core, forwards the debug request to that core's debug module in response to detecting a match. If no match is detected, the decoder unit forwards the debug request to a subsequent processor core via the bus. The aggregator module forwards a response message, originating from the target processor core, to the debugger device.
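The routing described above behaves like a daisy chain keyed on logical identifiers. The sketch below assumes each request carries a `logical_id` and a register name, and that the debug module simply reads that register; it shows why the debugger can address cores by logical ID regardless of their physical position. `Core`, `debug_module` and `broadcast_debug_request` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Core:
    physical_index: int
    logical_id: int  # virtualized ID that the debugger addresses
    registers: dict = field(default_factory=dict)

    def debug_module(self, request: dict) -> dict:
        # Hypothetical debug action: read one register of this core.
        return {"logical_id": self.logical_id,
                "value": self.registers.get(request["reg"], 0)}


def broadcast_debug_request(cores: list, request: dict) -> Optional[dict]:
    """Pass the request along the bus; the first decoder whose core matches handles it."""
    for core in cores:  # cores in bus order
        if core.logical_id == request["logical_id"]:
            # Match: the decoder forwards to this core's debug module; the
            # aggregator would then return this response to the debugger.
            return core.debug_module(request)
        # No match: the decoder forwards the request to the next core.
    return None  # no core owns the requested logical ID


# Physical core 1 has been assigned logical ID 0.
cores = [Core(0, 2, {"pc": 0x100}), Core(1, 0, {"pc": 0x200}), Core(2, 1, {"pc": 0x300})]
print(broadcast_debug_request(cores, {"logical_id": 0, "reg": "pc"}))
# {'logical_id': 0, 'value': 512}
```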

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
20220358026 · 2022-11-10

An information processing device and an information processing method with improved error tolerance are provided. The device includes a data processing unit that executes lockstep processing, in which a plurality of processing systems execute the same task and errors are detected by comparing the execution results. When an error is detected in the lockstep processing, the data processing unit performs control to improve the noise tolerance of the CPU circuit system that executes the task, such as raising its supply voltage or lowering its supply clock, and also re-executes the task using more processing systems than were used before the error was detected.
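A sketch of the error-handling flow, assuming each processing system is modeled as a function call and that the re-execution picks its result by majority vote (the abstract only says that more systems are used). `apply_noise_countermeasures` is a hypothetical hook standing in for raising the supply voltage or lowering the clock.

```python
from collections import Counter


def run_lockstep(task, n_systems: int) -> list:
    # Each processing system is modeled as one call that receives its index.
    return [task(i) for i in range(n_systems)]


def execute_with_recovery(task, apply_noise_countermeasures, n_initial=2, n_retry=3):
    results = run_lockstep(task, n_initial)
    if len(set(results)) == 1:
        return results[0]  # all systems agree: no error detected
    # Error detected: hypothetical hook, e.g. raise voltage or lower the clock.
    apply_noise_countermeasures()
    # Re-execute on more systems; majority voting is an assumption of this sketch.
    results = run_lockstep(task, n_retry)
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= n_retry:
        raise RuntimeError("no majority result after re-execution")
    return value


state = {"noisy": True}


def task(i):
    # System 1 returns a corrupted result while the circuit is noisy.
    return 43 if state["noisy"] and i == 1 else 42


def countermeasures():
    state["noisy"] = False


print(execute_with_recovery(task, countermeasures))  # 42
```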

Streaming engine with deferred exception reporting

This invention is a streaming engine employed in a digital signal processor. A fixed data stream sequence is specified by a control register. The streaming engine fetches stream data ahead of use by a central processing unit and stores it in a stream buffer. When a fault occurs while reading data from memory, the streaming engine identifies the data element that triggered the fault, preferably storing its address in a fault address register. The streaming engine defers signaling the fault to the central processing unit until this data element is used as an operand; if the central processing unit never uses the data element, the streaming engine never signals the fault. The streaming engine preferably stores data identifying the fault in a fault source register. The fault address register and the fault source register are preferably extended control registers accessible only via a debugger.
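The deferral can be modeled in a few lines. This sketch assumes memory is a dict in which unmapped addresses fault, and that `start`, `count` and `stride` stand in for the control-register fields that define the fixed stream; the class and attribute names are hypothetical. The fault address is recorded at fetch time, but `StreamFault` is raised only when the CPU consumes the faulted element.

```python
from dataclasses import dataclass
from typing import Optional


class StreamFault(Exception):
    """Raised to the CPU when a faulted element is consumed as an operand."""


@dataclass
class Element:
    address: int
    value: Optional[int]  # None if the memory read faulted
    faulted: bool = False


class StreamingEngine:
    def __init__(self, memory: dict, start: int, count: int, stride: int = 1):
        # start/count/stride are hypothetical stand-ins for control-register fields.
        self.fault_address = None  # models the fault address register
        self.fault_source = None   # models the fault source register
        self.buffer = []
        self.pos = 0
        # Fetch the whole fixed sequence ahead of use.
        for i in range(count):
            addr = start + i * stride
            if addr in memory:
                self.buffer.append(Element(addr, memory[addr]))
            else:
                self.buffer.append(Element(addr, None, faulted=True))
                if self.fault_address is None:
                    # Record the first faulting element, but do not signal yet.
                    self.fault_address = addr
                    self.fault_source = "unmapped address"

    def next_operand(self) -> int:
        elem = self.buffer[self.pos]
        self.pos += 1
        if elem.faulted:
            # Deferred: the fault reaches the CPU only when the element is used.
            raise StreamFault(f"stream fault at {elem.address:#x}")
        return elem.value
```

If the CPU stops after the first two elements of a three-element stream whose third address is unmapped, `fault_address` is set but no exception is ever raised, which matches the "never signals the fault" case above.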

Multi-Threaded Secure Processor with Control Flow Attack Detection

A multi-thread pipeline processor with fault detection operates with a single pipeline stage that generates branch status comprising at least one of branch taken/not_taken, branch direction, and branch target. A first thread contains control and data instructions; the control instructions comprise loop instructions, including unconditional and conditional branch instructions, loop initialization instructions, loop arithmetic instructions, and no-operation (NOP) instructions. A second thread contains only the control instructions: its non-control instructions are either replaced with NOP instructions or removed entirely. A fault detector compares the branch status of the first and second threads and asserts a fault output when they do not match.
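A toy model of the comparison, assuming a small instruction set (`init`, `dec`, `add`, `bnez`, `nop`) invented for illustration and a branch status of (address, taken, target). The second thread is built with the NOP-replacement variant, which keeps branch targets at the same addresses; the removal variant would need its targets relocated before they could be compared.

```python
CONTROL_OPS = {"init", "dec", "bnez", "nop"}  # loop control instructions and NOP


def run(program, max_steps=1000):
    """Execute a toy program; return the branch status (pc, taken, target) of each branch."""
    regs, pc, steps, status = {}, 0, 0, []
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        steps += 1
        if op == "init":      # loop initialization
            regs[args[0]] = args[1]
        elif op == "dec":     # loop arithmetic
            regs[args[0]] -= 1
        elif op == "add":     # data instruction
            regs[args[0]] = regs.get(args[0], 0) + args[1]
        elif op == "bnez":    # conditional branch
            taken = regs[args[0]] != 0
            status.append((pc, taken, args[1]))
            if taken:
                pc = args[1]
                continue
        pc += 1               # "nop" simply falls through
    return status


def control_only(program):
    # Replace non-control instructions with NOPs so branch targets keep their addresses.
    return [ins if ins[0] in CONTROL_OPS else ("nop",) for ins in program]


def fault_detector(status_a, status_b) -> bool:
    return status_a != status_b  # assert the fault output on any mismatch


program = [
    ("init", "r1", 3),   # 0: loop counter
    ("add", "acc", 5),   # 1: data work
    ("dec", "r1"),       # 2
    ("bnez", "r1", 1),   # 3: loop back while r1 != 0
]
print(fault_detector(run(program), run(control_only(program))))  # False
```

Flipping a bit in the first thread's loop counter, for example replacing `("init", "r1", 3)` with `("init", "r1", 3 ^ 0b100)`, changes the number of taken branches and makes `fault_detector` return True. Comparing whole traces is a simplification; a pipeline would compare each branch status as it is produced.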

MONITORING EXECUTION OF APPLICATION SCHEDULES IN COMPUTING SYSTEMS

One or more embodiments of the present disclosure relate to monitoring the execution of runnables by a computing system, where the execution is based at least on a schedule. The monitoring may include one or more of: monitoring the timing of execution of the runnables, monitoring one or more sequences of execution of the runnables, or monitoring the health of at least a portion of the computing system executing the runnables. Additionally or alternatively, one or more embodiments may relate to determining compliance with one or more execution constraints based at least in part on the monitoring.
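A minimal compliance check covering the timing and sequence aspects. It assumes each runnable has a scheduled start time and an execution-time budget in milliseconds, and that a fixed start-jitter tolerance is acceptable; health monitoring is left out. The `Slot` and `Observed` types and the tolerance value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    start_ms: float   # scheduled start
    budget_ms: float  # allowed execution time


@dataclass
class Observed:
    name: str
    start_ms: float
    end_ms: float


def check_compliance(schedule, observed, jitter_ms=0.5):
    """Return a list of constraint violations; an empty list means compliant."""
    violations = []
    # Sequence monitoring: the observed order must match the scheduled order.
    expected = [s.name for s in sorted(schedule, key=lambda s: s.start_ms)]
    actual = [o.name for o in sorted(observed, key=lambda o: o.start_ms)]
    if actual != expected:
        violations.append(f"sequence: expected {expected}, got {actual}")
    # Timing monitoring: start jitter and execution-time budget per runnable.
    plan = {s.name: s for s in schedule}
    for o in observed:
        s = plan.get(o.name)
        if s is None:
            violations.append(f"{o.name}: not in schedule")
            continue
        if abs(o.start_ms - s.start_ms) > jitter_ms:
            violations.append(f"{o.name}: started at {o.start_ms:.2f} ms, scheduled {s.start_ms:.2f} ms")
        if o.end_ms - o.start_ms > s.budget_ms:
            violations.append(f"{o.name}: ran {o.end_ms - o.start_ms:.2f} ms, budget {s.budget_ms:.2f} ms")
    return violations


schedule = [Slot("A", 0.0, 2.0), Slot("B", 2.0, 3.0)]
print(check_compliance(schedule, [Observed("A", 0.0, 1.5), Observed("B", 2.1, 6.0)]))
# ['B: ran 3.90 ms, budget 3.00 ms']
```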

Method and system for variable level of logging based on (long term steady state) system error equilibrium

In general, embodiments of the invention relate to a method for enabling enhanced logging. The method includes obtaining historical data for a target entity, determining a steady state error probability of the target entity using the historical data, and enabling, based on the steady state error probability, a first level of enhanced logging on the target entity.
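One plausible way to compute a steady-state error probability, assumed here rather than taken from the patent, is to treat the target entity as a two-state Markov chain (OK, error). With a = P(OK to error) and b = P(error to OK) estimated from consecutive historical samples, the stationary error probability is a / (a + b). The logging-level names and thresholds below are hypothetical.

```python
def steady_state_error_probability(history: list) -> float:
    """history[i] is True if the target entity was in an error state in interval i."""
    ok_n = ok_to_err = err_n = err_to_ok = 0
    for prev, cur in zip(history, history[1:]):
        if prev:
            err_n += 1
            err_to_ok += int(not cur)
        else:
            ok_n += 1
            ok_to_err += int(cur)
    a = ok_to_err / ok_n if ok_n else 0.0    # P(OK -> error)
    b = err_to_ok / err_n if err_n else 0.0  # P(error -> OK)
    if a + b == 0:
        # No transitions observed: fall back to the raw error fraction.
        return sum(history) / len(history) if history else 0.0
    return a / (a + b)


def choose_logging_level(p_err: float) -> str:
    # Hypothetical thresholds and level names.
    if p_err >= 0.2:
        return "enhanced-2"
    if p_err >= 0.05:
        return "enhanced-1"
    return "default"


history = [False, False, True, False, False, False, True, True, False, False]
p = steady_state_error_probability(history)
print(round(p, 3), choose_logging_level(p))  # 0.333 enhanced-2
```

Using transition rates rather than the raw error fraction lets the estimate reflect how quickly the entity recovers from errors, which is one reading of "long term steady state" in the title.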

Detection of system anomalies
11614992 · 2023-03-28

Systems and methods are provided for detecting system anomalies. The described technique includes receiving system parameters that specify the functionality of a computing system and detecting an anomaly within the computing system. In response to detecting the anomaly, a recovery method is determined based on a recovery-method model and information about the detected anomaly; the determined recovery method is configured to ensure that the requirements of the computing system are met. Furthermore, in response to detecting the anomaly, the determined recovery method is implemented in response to installation of the selected system-compatible tool.
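A sketch of the selection step. It assumes the recovery-method model is a list of candidate methods annotated with the anomaly types they handle, their expected downtime, and the tool they require, and that "requirements of the computing system" means a maximum tolerable downtime. All names, including the tool names in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryMethod:
    name: str
    handles: frozenset   # anomaly types this method can address
    downtime_s: float    # expected service interruption in seconds
    required_tool: str   # hypothetical system-compatible tool needed to run it


def select_recovery(model, anomaly_type: str, max_downtime_s: float) -> Optional[RecoveryMethod]:
    """Pick the least disruptive method that handles the anomaly within the downtime limit."""
    candidates = [m for m in model
                  if anomaly_type in m.handles and m.downtime_s <= max_downtime_s]
    return min(candidates, key=lambda m: m.downtime_s, default=None)


def implement(method: RecoveryMethod, installed_tools: set) -> str:
    # Stand-in: "install" the selected tool, then run the recovery method.
    installed_tools.add(method.required_tool)
    return f"ran {method.name} with {method.required_tool}"


model = [
    RecoveryMethod("restart_service", frozenset({"hang", "memory_leak"}), 5.0, "service-tool"),
    RecoveryMethod("reboot_node", frozenset({"hang", "kernel_panic"}), 120.0, "node-tool"),
]
method = select_recovery(model, "hang", max_downtime_s=30.0)
print(implement(method, set()))  # ran restart_service with service-tool
```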

AUTOMATIC SENSOR TRACE VALIDATION USING MACHINE LEARNING

The disclosure provides a computer-implemented method for detecting a failure of a device, wherein the device is connected to a sensor, the method comprising: receiving, by a machine learning model, a trace signal from the sensor indicating a status of the device; encoding, by the machine learning model, the trace signal into a plurality of vector representations; and determining, by the machine learning model, whether the trace signal is valid or invalid based on the plurality of vector representations.
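The abstract does not name a model architecture, so the following toy stand-in uses a nearest-centroid classifier rather than a neural network. Each non-overlapping window of the trace is encoded as a (mean, standard deviation) vector, and the trace is valid only if every window lies closest to the "valid" centroid. The window size, feature choice, class labels and training traces are assumptions for illustration.

```python
import math
from statistics import fmean, pstdev


def encode(trace, window=4):
    """Encode each non-overlapping window of the trace as a (mean, std) vector."""
    return [(fmean(trace[i:i + window]), pstdev(trace[i:i + window]))
            for i in range(0, len(trace) - window + 1, window)]


def centroid(vectors):
    return tuple(fmean(component) for component in zip(*vectors))


class NearestCentroidValidator:
    """Toy stand-in for the learned model: one centroid per labeled class."""

    def fit(self, labeled_traces):
        # labeled_traces maps a class label to a list of example traces.
        self.centroids = {label: centroid([v for t in traces for v in encode(t)])
                          for label, traces in labeled_traces.items()}
        return self

    def classify(self, vector):
        return min(self.centroids, key=lambda lbl: math.dist(vector, self.centroids[lbl]))

    def is_valid(self, trace):
        # Valid only if every window is nearest to the "valid" centroid.
        # (A trace shorter than one window yields no vectors and passes.)
        return all(self.classify(v) == "valid" for v in encode(trace))


model = NearestCentroidValidator().fit({
    "valid": [[1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.0]],
    "stuck": [[0.0] * 8],
    "noisy": [[3.0, -2.0, 4.0, -3.0] * 2],
})
print(model.is_valid([1.0, 0.98, 1.02, 1.0]), model.is_valid([1.0] * 4 + [0.0] * 4))  # True False
```

Keeping a separate centroid per failure mode matters here: averaging the stuck and noisy examples into one "invalid" centroid would place it far from both, and a stuck-at-zero window could then end up closer to the "valid" centroid.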