Patent classifications
G06F11/2263
PREDICTING AND REDUCING HARDWARE RELATED OUTAGES
Disclosed here is a system to automatically predict and reduce hardware related outages. The system can obtain a performance indicator associated with a wireless telecommunication network, including a system performance indicator or an application log, along with a machine learning model trained to predict and resolve a hardware error based on the performance indicator. The machine learning model can detect an anomaly associated with the performance indicator by detecting an infrequent occurrence in the performance indicator. The machine learning model can determine whether the anomaly is similar to a prior anomaly indicating a prior hardware error. Upon determining that the anomaly is similar to the prior anomaly, the machine learning model can predict an occurrence of the hardware error.
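The two steps in this abstract (flag an infrequent occurrence, then compare it with prior anomalies) can be sketched as follows. This is a minimal illustration, not the patented model: the frequency cutoff `max_share`, the cosine-similarity measure, the `threshold` value, and all function names are assumptions introduced here.

```python
from collections import Counter
from math import sqrt


def find_infrequent(events, max_share=0.05):
    """Return event types whose share of all events is at or below max_share.

    max_share is a hypothetical cutoff for what counts as "infrequent".
    """
    counts = Counter(events)
    total = len(events)
    return {event for event, count in counts.items() if count / total <= max_share}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predicts_hardware_error(anomaly, prior_anomalies, threshold=0.9):
    """True if the anomaly resembles any prior anomaly that preceded a hardware error."""
    return any(cosine_similarity(anomaly, prior) >= threshold for prior in prior_anomalies)
```

For example, `find_infrequent(["ok"] * 98 + ["ecc_error"] * 2)` flags only `"ecc_error"`, and a feature vector that points in the same direction as a stored prior anomaly triggers a prediction.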
System and method for data-driven analytical redundancy relationships generation for early fault detection and isolation with limited data
Example implementations described herein involve a new data-driven analytical redundancy relationship (ARR) generation for fault detection and isolation. The proposed solution uses historical data during normal operation to extract the data-driven ARRs among sensor measurements, and then uses them for fault detection and isolation. The proposed solution thereby does not need to rely on the system model, can detect and isolate more faults than traditional data-driven methods, can work when the system is not fully observable, and does not rely on a vast amount of historical fault data, which can save on memory storage or database storage. The proposed solution can thereby be practical in many real cases where there are data limitations.
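As a rough illustration of the idea, the sketch below learns a relationship between two sensors from normal-operation data only, then uses the residual of that relationship to detect a fault. The linear form, the fixed `tolerance`, and the isolation rule (blame the sensors shared by every violated relationship) are simplifying assumptions made here, not the method claimed in the abstract.

```python
def fit_relation(xs, ys):
    """Least-squares fit of y ~ slope * x + intercept on normal-operation data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_violated(relation, x, y, tolerance):
    """True if the residual of the learned relationship exceeds the tolerance."""
    slope, intercept = relation
    return abs(y - (slope * x + intercept)) > tolerance


def isolate(violated_pairs):
    """Return the sensors that appear in every violated sensor pair."""
    sets = [set(pair) for pair in violated_pairs]
    return set.intersection(*sets) if sets else set()
```

With relationships learned for several sensor pairs, a fault in one sensor tends to violate every relationship that sensor takes part in, which is what `isolate` exploits. No historical fault data is needed to fit the relationships.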
Method of constructing prediction model that predicts number of plateable substrates, method of constructing selection model for predicting component that causes failure, and method of predicting number of plateable substrates
A method of the present disclosure includes: plating a plurality of substrates using a substrate holder; determining a total number of substrates that have been plated using the substrate holder until a failure occurs in the substrate holder; determining a first processable number and a second processable number; generating a first data set constituted by a combination of first condition data and the first processable number, the first condition data representing a state of a component of the substrate holder; generating a second data set constituted by a combination of second condition data and the second processable number, the second condition data representing a state of a component of the substrate holder; and optimizing a parameter of a prediction model constituted by a neural network using training data including the first data set and the second data set.
Hardware replacement predictions verified by local diagnostics
An example server includes a communication interface to receive telemetry data from a plurality of client devices. The telemetry data is to indicate a health of a client device from the plurality of client devices. The server further includes a prediction engine to process the telemetry data with a prediction model to determine the health of the client device and identify a hardware issue at the client device. The server also includes a diagnostic evaluator in communication with the prediction engine. The diagnostic evaluator is to request a local confirmation of the hardware issue from the client device upon identification of the hardware issue by the prediction engine. The local confirmation is determined at the client device via a diagnostic engine. The server also includes a reporter to report the hardware issue upon receipt of the local confirmation.
ANALYSIS OF MEMORY SUB-SYSTEMS BASED ON THRESHOLD DISTRIBUTIONS
Disclosed is a system comprising a memory component having a plurality of memory cells capable of being in a plurality of states, each state of the plurality of states corresponding to a value stored by the memory cell, and a processing device, operatively coupled with the memory component, to perform operations comprising: obtaining, for the plurality of memory cells, a plurality of distributions of threshold voltages, wherein each of the plurality of distributions corresponds to one of the plurality of states, classifying each of the plurality of distributions among one of a plurality of classes, generating a vector comprising a plurality of components, wherein each of the plurality of components represents the class of a respective one of the plurality of distributions, and processing, using a classifier, the generated vector to determine a likelihood that the memory component will fail within a target period of time.
FACILITATING DETECTION OF ANOMALIES IN DATA CENTER TELEMETRY
Facilitating detection of anomalies of a target entity is provided herein. A system can comprise a processor and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations. The operations can comprise training a model on a first set of variables that are constrained by a second set of variables. The second set of variables can characterize elements of a defined entity. The first set of variables can define a normality of the defined entity. The operations also can comprise employing the model to identify expected parameters and unexpected parameters associated with the defined entity to at least a defined level of confidence.
System and method for automatically monitoring and diagnosing user experience problems
The following relates generally to diagnosing problems with websites. In some embodiments, a webpage interaction processor receives a list of potential user experience problems. The webpage interaction processor then extracts click data from the website, and processes the extracted click data into grams. Subsequently, an analytics engine is trained based on the processed click data. The trained analytics engine may then diagnose the problem of the website with a potential user experience problem from the received list of potential user experience problems. In some embodiments, the process is entirely automated.
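The step of processing click data into grams can be pictured as building n-grams over a session's ordered click sequence. The sketch below assumes that "grams" means contiguous n-grams and that a session is a list of element names; both are assumptions, as are the function names.

```python
from collections import Counter


def click_ngrams(clicks, n=2):
    """Return the contiguous n-grams of an ordered click sequence as tuples."""
    return [tuple(clicks[i:i + n]) for i in range(len(clicks) - n + 1)]


def gram_features(sessions, n=2):
    """Count n-grams across all sessions, giving features an analytics engine could train on."""
    counts = Counter()
    for clicks in sessions:
        counts.update(click_ngrams(clicks, n))
    return counts
```

A session such as `["home", "cart", "cart", "cart"]` yields a repeated `("cart", "cart")` gram, the kind of pattern that might signal an unresponsive button.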
ROOT CAUSE IDENTIFICATION AND EVENT CLASSIFICATION IN SYSTEM MONITORING
An event graph schema for a technology landscape may be determined, where the technology landscape is characterized using scores assigned to performance metrics. The event graph schema may include a plurality of nodes corresponding to the performance metrics and the scores, and directional edges connecting node pairs of the plurality of nodes, with each directional edge having a score-dependent validity criterion defined by scores of a corresponding node pair. Anomalous scores associated with an event within the technology landscape may be used to find anomalous nodes. Valid edges that connect two of the anomalous nodes and satisfy their score-dependent validity criteria may be used to determine at least one path that includes the valid edges and connected anomalous nodes. In this way, it is possible to traverse the at least one path to identify at least one of the connected anomalous nodes as a root cause node of the event.
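A minimal sketch of this traversal, under stated assumptions: scores are plain numbers, a node is anomalous when its score meets a hypothetical `anomaly_threshold`, each edge carries its validity criterion as a predicate over the two node scores, and the root cause is taken to be a connected anomalous node with no incoming valid edge. The abstract does not prescribe that last rule.

```python
def find_root_causes(scores, edges, anomaly_threshold):
    """Return anomalous nodes that start a path of valid edges.

    scores: dict mapping node name to score.
    edges: iterable of (source, target, is_valid) where is_valid(src_score, dst_score)
           is the edge's score-dependent validity criterion.
    """
    anomalous = {node for node, score in scores.items() if score >= anomaly_threshold}
    # Keep only edges that join two anomalous nodes and satisfy their criterion.
    valid = [
        (src, dst)
        for src, dst, is_valid in edges
        if src in anomalous and dst in anomalous and is_valid(scores[src], scores[dst])
    ]
    connected = {node for edge in valid for node in edge}
    has_incoming = {dst for _, dst in valid}
    return sorted(connected - has_incoming)
```

In a chain such as disk to database to API, where all three metrics are anomalous and each edge is valid, only the disk node has no valid incoming edge, so it is reported as the root cause candidate.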
Maze-driven self-diagnostics using reinforcement learning
Systems and methods are provided for automatically troubleshooting a computing application (e.g., a cloud-based computing application). An application domain of the computing application is modeled as a two-dimensional array of cells, a first dimension of the array representing components or microservices of the application domain, and a second dimension of the array representing states of the components or microservices, the array including paths between pairs of cells in the array. A troubleshooting goal is defined as a target state of the application domain, the target state corresponding to a target cell in the array. An initial state of the application domain is also provided, the initial state corresponding to an initial cell in the array. A reinforcement-learning-trained machine-learning algorithm can determine a solution path in the array between the initial cell and the target cell. Divergence between a failure case and a solution path indicates a probable failure cause.
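To make the maze model concrete, the sketch below represents cells as `(component, state)` tuples and the allowed paths as an adjacency mapping. It uses breadth-first search as a simple stand-in for the reinforcement-learning-trained algorithm, which the abstract does not specify, and reports where an observed failure case first departs from the solution path. All names are illustrative.

```python
from collections import deque


def solution_path(paths, start, target):
    """Breadth-first search from start to target; returns the cell list or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == target:
            route = []
            while cell is not None:
                route.append(cell)
                cell = parent[cell]
            return route[::-1]
        for nxt in paths.get(cell, ()):
            if nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None


def first_divergence(observed, solution):
    """Index of the first step where the observed path leaves the solution path, else None."""
    for i, (seen, expected) in enumerate(zip(observed, solution)):
        if seen != expected:
            return i
    if len(observed) == len(solution):
        return None
    return min(len(observed), len(solution))
```

The cell at the divergence index names the component and state where the failure case went off the solution path, which is the probable failure cause in the abstract's terms.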
Machine learning-powered resolution resource service for HCI systems
A method of processing requests from users of a computer system having software and hardware components, wherein each request includes a description of a software or hardware issue, includes: upon receiving a first request, performing content matching using a first instance of the data model to determine if the issue described in the first request has been previously encountered; executing a machine learning algorithm against a new data set to update a second instance of the data model; after the second instance of the data model has been updated, switching the data model used for performing content matching from the first to the second instance of the data model; and upon receiving a second request after the switch, performing content matching using the second instance of the data model to determine if the issue described in the second request has been previously encountered.
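The two-instance switch described here resembles a double-buffering pattern: requests are served from one model instance while the other is updated, and the roles swap once the update completes. The sketch below is illustrative only. The "model" is a set of normalized issue descriptions, and "training" just adds new descriptions to that set; both stand in for the unspecified data model and machine learning algorithm. The class and method names are hypothetical.

```python
class ResolutionService:
    """Serve content matching from one model instance while the other is updated."""

    def __init__(self, known_issues):
        # Two instances of the data model; only the active one answers requests.
        self._models = [self._train(known_issues), set()]
        self._active = 0

    @staticmethod
    def _train(descriptions):
        """Stand-in for the learning step: normalize descriptions into a set."""
        return {text.strip().lower() for text in descriptions}

    def seen_before(self, description):
        """Content matching against the currently active model instance."""
        return description.strip().lower() in self._models[self._active]

    def retrain_and_switch(self, new_descriptions):
        """Update the standby instance with new data, then make it the active one."""
        standby = 1 - self._active
        self._models[standby] = self._models[self._active] | self._train(new_descriptions)
        self._active = standby
```

Because `seen_before` only reads the active instance, requests arriving during an update still get answers from the older model, and requests after the switch see the updated one.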