G06F11/079

SELF-MANAGING DATABASE SYSTEM USING MACHINE LEARNING

A self-managing database system includes a metrics collector to collect metrics data from one or more databases of a computing system and an anomaly detector to analyze the metrics data and detect one or more anomalies. The system includes a causal inference engine to mark one or more nodes in a knowledge representation corresponding to the metrics data for the one or more anomalies and to determine a root cause with a highest probability of causing the one or more anomalies using the knowledge representation. The system includes a self-healing engine to take at least one remedial action for the one or more databases in response to determination of the root cause.
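The root-cause step can be illustrated with a minimal sketch. The abstract does not specify the knowledge representation or the inference method, so everything here is an assumption: a table of candidate causes with prior probabilities, per-metric likelihoods of an anomaly given each cause, and a naive-Bayes-style score. The names `rank_root_causes`, `priors`, and `likelihoods`, and all cause and metric names, are hypothetical.

```python
from math import prod


def rank_root_causes(priors, likelihoods, anomalous, baseline=0.05):
    """Score each candidate cause against the set of anomalous metrics.

    priors:      cause -> P(cause)                        (assumed input)
    likelihoods: cause -> {metric: P(metric anomalous | cause)}
    anomalous:   set of metric names marked as anomalous
    baseline:    assumed anomaly probability for metrics a cause does not list
    """
    # Every cause is scored over the same set of metrics.
    metrics = set().union(*(m.keys() for m in likelihoods.values()))
    scores = {}
    for cause, prior in priors.items():
        table = likelihoods[cause]
        scores[cause] = prior * prod(
            table.get(m, baseline) if m in anomalous
            else 1.0 - table.get(m, baseline)
            for m in metrics
        )
    total = sum(scores.values())
    # Normalize so the scores read as posterior probabilities.
    return {c: s / total for c, s in scores.items()} if total else scores


# Hypothetical example data.
priors = {"disk_full": 0.2, "slow_query": 0.5, "network": 0.3}
likelihoods = {
    "disk_full": {"write_latency": 0.9, "cpu": 0.1},
    "slow_query": {"write_latency": 0.3, "cpu": 0.8},
    "network": {"write_latency": 0.2, "cpu": 0.1},
}
posterior = rank_root_causes(priors, likelihoods, anomalous={"write_latency"})
root_cause = max(posterior, key=posterior.get)
```

In this toy data, `disk_full` gets the highest score because it best explains an anomaly in `write_latency` without one in `cpu`. A self-healing step would then look up a remedial action keyed on the selected cause.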

MOVEMENT DATA FOR FAILURE IDENTIFICATION

Configurations for data center component monitoring are disclosed. In at least one embodiment, movement of a server component is determined based on sensor data and the movement is used to diagnose a root cause for a server component failure.

METHOD AND SYSTEM TO GENERATE KNOWLEDGE GRAPH AND SUB-GRAPH CLUSTERS TO PERFORM ROOT CAUSE ANALYSIS
20230050889 · 2023-02-16

The present invention discloses a method and system for generating a knowledge graph and sub-graph clusters to perform root cause analysis. The method comprises extracting at least one of objects, data entities, links between the objects and the data entities, or relationships between the objects and the data entities from input content. Thereafter, the method comprises generating a knowledge graph from the extracted data, generating sub-graphs from the knowledge graph using an unsupervised ML technique, and extracting graph data structure information for each sub-graph. Subsequently, the method comprises generating a root cause model based on the sub-graphs and the graph data structure information, and generating at least one sub-graph cluster and a corresponding probabilistic graphical model using the root cause model and the knowledge graph. The generated knowledge graph, the root cause model, and the at least one sub-graph cluster and corresponding probabilistic graphical model are used to determine a root cause for an issue from issue content.

Systems and methods for margin based diagnostic tools for priority preemptive schedulers

In one embodiment, a method for margin determination for a computing system with a real time operating system and priority preemptive scheduling comprises: scheduling a set of tasks to be executed in one or more partitions, wherein each is assigned a priority, wherein the tasks comprise periodic and/or aperiodic tasks; executing the set of tasks on the computing system within the scheduled periodic time window; introducing an overhead task executed for an execution duration controlled either by the real time operating system or by the overhead task; controlling the overhead task to converge on a point of failure at which a length of the execution duration of the overhead task causes either: 1) a periodic task to fail to execute within a deadline, or 2) time available for the aperiodic tasks to execute to fall below a threshold; and defining a partition margin corresponding to the point of failure.
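The convergence on a point of failure can be sketched with a toy model. This is not the patented method. It assumes a single time window, an overhead task whose duration is simply added to the periodic workload, and a bisection search standing in for whatever control loop the embodiment uses. The names `schedule_fails` and `find_partition_margin` are hypothetical, and the margin is taken here to be the largest overhead duration that does not yet cause a failure.

```python
def schedule_fails(overhead_us, window_us, periodic_us, aperiodic_min_us):
    """Toy failure check for one time window (all durations in microseconds)."""
    busy = overhead_us + sum(periodic_us)
    if busy > window_us:
        return True  # 1) a periodic task cannot finish within its deadline
    # 2) time left for aperiodic tasks falls below the threshold
    return window_us - busy < aperiodic_min_us


def find_partition_margin(window_us, periodic_us, aperiodic_min_us):
    """Bisect on the overhead duration until it converges on the point of failure."""
    if schedule_fails(0, window_us, periodic_us, aperiodic_min_us):
        raise ValueError("schedule already fails without any overhead")
    lo, hi = 0, window_us + 1  # invariant: lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if schedule_fails(mid, window_us, periodic_us, aperiodic_min_us):
            hi = mid
        else:
            lo = mid
    return lo  # largest overhead duration that still passes


margin_us = find_partition_margin(
    window_us=10_000, periodic_us=[2_000, 3_000], aperiodic_min_us=1_000
)
```

Bisection is valid here because failure is monotonic in the overhead duration: once a given overhead causes a failure, any longer overhead does too. On a real system the check would be an observed run of the scheduler rather than arithmetic.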

Reporting control information errors

Methods, systems, and devices for reporting control information errors are described. A state of a memory array may be monitored during operation. After detecting an error (e.g., in received control information), the memory device may enter a first state (e.g., a locked state) and may indicate to a host device that an error was detected, the state of the memory array before the error was detected, and/or at least a portion of a control signal carrying the received control information. The host device may diagnose a cause of the error based on receiving the indication of the error and/or the copy of the control signal. After identifying and/or resolving the cause of the error, the host device may transmit one or more commands (e.g., unlocking the memory device and returning the memory array to the original state) based on receiving the original state from the memory device.

Computing system and method for automated program error repair

This application relates to a computing system and method for an automated program error repair. In one aspect, the computing system includes a storage, a preprocessing processor, and an automated error repair processor. The storage stores a program code. The preprocessing processor acquires the program code from the storage and preprocesses the program code. Preprocessing includes tokenizing the program code with tokens, converting the tokens into vectors, and adding location information for the tokens. The automated error repair processor receives the preprocessed program code as an input from the preprocessing processor, detects an error in the preprocessed program code, corrects the detected error, and outputs the error-corrected program code. Detecting and correcting the error are performed based on a deep learning result and the location information for the tokens.

Technologies for providing shared memory for accelerator sleds

Technologies for providing shared memory for accelerator sleds include an accelerator sled to receive, with a memory controller, a memory access request from an accelerator device to access a region of memory. The request is to identify the region of memory with a logical address. Additionally, the accelerator sled is to determine, from a map of logical addresses and associated physical addresses, the physical address associated with the region of memory. In addition, the accelerator sled is to route the memory access request to a memory device associated with the determined physical address.
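The logical-to-physical lookup and routing can be sketched in software as follows. The abstract does not describe the map's granularity or layout, so the page-based table, the 4096-byte page size, and the names `MemoryController`, `map_page`, and `route` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    device_id: int         # which memory device holds the data
    physical_address: int  # address within that device


class MemoryController:
    """Toy controller that maps logical pages to physical page bases."""

    def __init__(self, page_size: int = 4096):
        self.page_size = page_size
        self._map: dict[int, PhysicalLocation] = {}

    def map_page(self, logical_page: int, device_id: int, physical_base: int) -> None:
        self._map[logical_page] = PhysicalLocation(device_id, physical_base)

    def route(self, logical_address: int) -> PhysicalLocation:
        """Translate a logical address and return the device to route the request to."""
        page, offset = divmod(logical_address, self.page_size)
        base = self._map[page]  # raises KeyError for an unmapped region
        return PhysicalLocation(base.device_id, base.physical_address + offset)


controller = MemoryController()
controller.map_page(logical_page=0, device_id=1, physical_base=0x10000)
controller.map_page(logical_page=1, device_id=2, physical_base=0x0)
target = controller.route(0x10)  # lands on device 1 at 0x10010
```

Because accelerators only see logical addresses, the controller can place consecutive logical pages on different memory devices, which is what lets several accelerators share one pool.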

VERIFYING PROCESSING LOGIC OF A GRAPHICS PROCESSING UNIT
20230043280 · 2023-02-09

A method of verifying processing logic of a graphics processing unit receives a test task including a predefined set of instructions for execution on the graphics processing unit, the predefined set of instructions being configured to perform a predetermined set of operations on the graphics processing unit when executed for predefined input data. In a test phase, the test task is processed by executing the predefined set of instructions for the predefined input data first and second times at the graphics processing unit so as to, respectively, generate first and second outputs. A fault signal is raised if the first and second outputs do not match.
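The test phase amounts to redundant execution with output comparison. The sketch below runs on the CPU and only illustrates the control flow. The kernel is an ordinary Python callable standing in for the predefined instruction set, and `run_test_task`, `square_all`, and `FlakyKernel` are hypothetical names.

```python
from typing import Callable, Sequence


def run_test_task(kernel: Callable[[Sequence[int]], list], test_input: Sequence[int]) -> bool:
    """Execute the same task twice on the same input; return True to raise a fault."""
    first_output = kernel(test_input)
    second_output = kernel(test_input)
    return first_output != second_output


def square_all(data: Sequence[int]) -> list:
    """Deterministic stand-in for the predefined set of instructions."""
    return [x * x for x in data]


class FlakyKernel:
    """Stand-in for faulty logic: corrupts one result on its second run."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, data: Sequence[int]) -> list:
        self.calls += 1
        out = [x * x for x in data]
        if self.calls == 2:
            out[0] ^= 1  # simulate a single flipped bit
        return out


healthy_fault = run_test_task(square_all, [1, 2, 3])
faulty_fault = run_test_task(FlakyKernel(), [1, 2, 3])
```

Comparing two runs detects transient faults without needing a stored reference result. A fault that corrupts both runs identically would go unnoticed by this comparison alone.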

AUTOMATED SYSTEM AND METHOD FOR DETECTION AND REMEDIATION OF ANOMALIES IN ROBOTIC PROCESS AUTOMATION ENVIRONMENT

A method and/or system for automated detection and automated remediation of anomalies in a Robotic Process Automation (RPA) environment is disclosed. The method comprises auto-discovering resources (RPA components and their dependencies) in an RPA platform. The discovered resources are monitored through observation metrics whose values are obtained by executing pre-defined scripts. The obtained values are validated against threshold values to determine if there are any anomalies, wherein the threshold values may either be static values or dynamic values. If there is a breach of threshold, a remediation plan is automatically executed causing the remediation of anomalies. The system is trained to determine the dynamic threshold values through machine learning models which are developed and trained through metrics data and by determining error patterns from the historic unstructured log data.
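The validate-then-remediate loop can be sketched as below. The abstract derives dynamic thresholds from trained machine learning models. This sketch substitutes a plain mean-plus-k-standard-deviations rule over historical values, which is an assumption and not the disclosed technique. The names `dynamic_threshold`, `check_and_remediate`, and `restart_bot_runner` are hypothetical.

```python
from statistics import mean, stdev
from typing import Callable, Sequence


def dynamic_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Stand-in for a learned threshold: mean + k sample standard deviations."""
    return mean(history) + k * stdev(history)


def check_and_remediate(value: float, threshold: float,
                        remediation: Callable[[], None]) -> bool:
    """Run the remediation plan if the observed value breaches the threshold."""
    if value > threshold:
        remediation()
        return True
    return False


actions = []


def restart_bot_runner() -> None:
    """Hypothetical remediation plan; a real one would call the RPA platform."""
    actions.append("restart_bot_runner")


# Hypothetical queue-length observations for one discovered resource.
history = [10.0, 12.0, 11.0, 13.0, 9.0]
threshold = dynamic_threshold(history)  # use a constant here for a static threshold
breached = check_and_remediate(20.0, threshold, restart_bot_runner)
```

A static threshold is just a constant passed in place of `threshold`, so the same check serves both cases described in the abstract.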

METHOD AND APPARATUS FOR HIGH-PERFORMANCE PAGE-FAULT HANDLING FOR MULTI-TENANT SCALABLE ACCELERATORS

Apparatus and method for high-performance page fault handling. For example, one embodiment of an apparatus comprises: one or more accelerator engines to process work descriptors submitted by clients to a plurality of work queues; fault processing hardware logic associated with the one or more accelerator engines, the fault processing hardware logic to implement a specified page fault handling mode for each work queue of the plurality of work queues, the page fault handling modes including a first page fault handling mode and a second page fault handling mode.