Patent classifications
G06F11/2263
SYSTEM AND METHOD FOR AUTOMATING FAULT DETECTION IN MULTI-TENANT ENVIRONMENTS
A system and method of automating fault prediction and remediation for a multi-tenant system are disclosed. The system and method offer an intelligent augmentation of a multi-tenant system by automating the harvesting and processing of raw data logs generated by the various aspects of the platform as well as the implementation of an appropriate response. In some embodiments, the proposed system includes a hybrid model that can be configured to offer both assisted and unassisted error handling. The incorporation of a dynamic learning algorithm minimizes operation errors for any set of computing units. Potential system faults can be detected with little to no human intervention, allowing for an unattended platform that collects performance data across the system from a wide range of sources to provide centralized and automated fault prediction, as well as expedited automated resolution of such faults.
SYSTEMS AND METHODS FOR DISTRIBUTED INCIDENT CLASSIFICATION AND ROUTING
Aspects of the present disclosure relate to incident routing in a cloud environment. In an example, cloud provider teams utilize a scout framework to build a team-specific scout based on that team's expertise. In examples, an incident is detected and a description is sent to each team-specific scout. Each team-specific scout uses the incident description and the scout specifications provided by the team to identify, access, and process monitoring data from cloud components relevant to the incident. Each team-specific scout utilizes one or more machine learning models to evaluate the monitoring data and generate an incident-classification prediction about whether the team is responsible for resolving the incident. In examples, a scout master receives predictions from each of the team-specific scouts and compares the predictions to determine to which team an incident should be routed.
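The routing flow in this abstract can be illustrated with a minimal sketch. It is not the disclosed implementation: the keyword-matching scouts stand in for the machine learning models that would evaluate monitoring data, and the names `Prediction`, `make_keyword_scout`, and `route_incident` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Prediction:
    team: str
    responsible: bool
    confidence: float  # assumed to lie in [0, 1]


# A scout is modeled as a function from an incident description to a Prediction.
Scout = Callable[[str], Prediction]


def make_keyword_scout(team: str, keywords: List[str]) -> Scout:
    """Hypothetical stand-in for a team-specific scout (keyword match, no ML)."""
    def scout(description: str) -> Prediction:
        text = description.lower()
        hits = sum(1 for k in keywords if k in text)
        confidence = hits / len(keywords) if keywords else 0.0
        return Prediction(team, hits > 0, confidence)
    return scout


def route_incident(description: str, scouts: List[Scout]) -> Optional[str]:
    """Scout master: gather every prediction, route to the most confident claimant."""
    predictions = [scout(description) for scout in scouts]
    claims = [p for p in predictions if p.responsible]
    if not claims:
        return None  # no team claimed the incident
    return max(claims, key=lambda p: p.confidence).team


scouts = [
    make_keyword_scout("network", ["packet", "latency", "switch"]),
    make_keyword_scout("storage", ["disk", "volume", "iops"]),
]
routed = route_incident("High latency and packet loss on rack 12", scouts)
```

Comparing predictions by a single confidence score is one possible design; the abstract does not say how the scout master compares them.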
Method for detecting repair-necessary motherboards and device using the method
A method for detecting which boards, among many repairable boards, actually require repair applies a board detection model based on training features of many sample repairable boards. The method obtains repair-relevant information of all the sample repairable boards, extracts predetermined features from the repair-relevant information, and analyzes the predetermined features to obtain the training features. The board detection model is established and trained based on the training features, and receives repair-relevant information of each repairable board to produce a detection result for that board. A device for detecting repairable boards is also provided.
HARDWARE REPLACEMENT PREDICTIONS VERIFIED BY LOCAL DIAGNOSTICS
An example server includes a communication interface to receive telemetry data from a plurality of client devices. The telemetry data is to indicate a health of a client device from the plurality of client devices. The server further includes a prediction engine to process the telemetry data to determine the health of the client device with a prediction model to identify a hardware issue at the client device. The server also includes a diagnostic evaluator in communication with the prediction engine. The diagnostic evaluator is to request a local confirmation of the hardware issue from the client device upon identification of the hardware issue by the prediction engine. The local confirmation is determined at the client device via a diagnostic engine. The server also includes a reporter to report the hardware issue upon receipt of the local confirmation.
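A minimal sketch of the predict-then-confirm flow follows. The telemetry field `reallocated_sectors`, the limit of 50, and the function names are assumptions made for illustration; the abstract does not specify the prediction model or the diagnostic.

```python
from typing import Callable, Optional


def predict_issue(telemetry: dict) -> Optional[str]:
    """Hypothetical prediction model: flag the disk above an assumed limit."""
    if telemetry.get("reallocated_sectors", 0) > 50:
        return "disk"
    return None


def evaluate(telemetry: dict,
             local_diagnostic: Callable[[str], bool]) -> Optional[str]:
    """Return the issue to report only if the client confirms it locally."""
    issue = predict_issue(telemetry)
    if issue is None:
        return None  # nothing predicted, so no confirmation is requested
    confirmed = local_diagnostic(issue)  # stands in for the client-side diagnostic engine
    return issue if confirmed else None


confirmed_report = evaluate({"reallocated_sectors": 120}, lambda issue: True)
rejected_report = evaluate({"reallocated_sectors": 120}, lambda issue: False)
healthy_report = evaluate({"reallocated_sectors": 3}, lambda issue: True)
```

Here the local diagnostic is passed in as a callable so the server-side logic can be exercised without a real client device.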
ANALYSIS OF MEMORY SUB-SYSTEMS BASED ON THRESHOLD DISTRIBUTIONS
Disclosed is a system comprising a memory component having a plurality of memory cells capable of being in a plurality of states, each state of the plurality of states corresponding to a value stored by the memory cell, and a processing device, operatively coupled with the memory component, to perform operations comprising: obtaining, for the plurality of memory cells, a plurality of distributions of threshold voltages, wherein each of the plurality of distributions corresponds to one of the plurality of states, classifying each of the plurality of distributions among one of a plurality of classes, generating a vector comprising a plurality of components, wherein each of the plurality of components represents the class of a respective one of the plurality of distributions, and processing, using a classifier, the generated vector to determine a likelihood that the memory component will fail within a target period of time.
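The pipeline described above (distributions, per-distribution class, class vector, classifier) can be sketched as follows. The width-based classes, the cutoffs `narrow` and `wide`, and the logistic scoring with fixed `weight` and `bias` are all assumptions chosen for illustration, not the disclosed classifier.

```python
import math
from statistics import pstdev


def classify_distribution(voltages, narrow=0.05, wide=0.15):
    """Assign an assumed class by spread: 0 = narrow, 1 = moderate, 2 = wide."""
    width = pstdev(voltages)
    if width < narrow:
        return 0
    if width < wide:
        return 1
    return 2


def build_vector(distributions):
    """One vector component per distribution, holding that distribution's class."""
    return [classify_distribution(d) for d in distributions]


def failure_likelihood(vector, weight=1.0, bias=-3.0):
    """Toy logistic classifier; weight and bias are illustrative, not trained."""
    score = bias + weight * sum(vector)
    return 1.0 / (1.0 + math.exp(-score))


# One list of threshold voltages per memory-cell state (made-up values).
distributions = [
    [1.00, 1.02, 0.98, 1.01],
    [2.00, 2.10, 1.90, 2.05],
    [3.00, 3.50, 2.50, 3.20],
]
vector = build_vector(distributions)
likelihood = failure_likelihood(vector)
```

In practice the classifier would be trained on vectors from components with known outcomes over the target period.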
DATA PROCESSING SYSTEM AND METHOD FOR ACQUIRING DATA FOR TRAINING A MACHINE LEARNING MODEL FOR USE IN MONITORING THE DATA PROCESSING SYSTEM FOR ANOMALIES
A data processing system and a method are provided for acquiring data for training a machine learning (ML) model for use in self-monitoring the data processing system. The data processing system operates in a data acquisition mode to acquire training data for training the ML model. The training data is acquired from an anomaly detector of the data processing system while operating in the data acquisition mode. At least a portion of the training data is determined to be biased, and a portion of the training data is unbiased. The unbiased portion of the training data is transferred to a training environment external to the data processing system. The unbiased portion of the training data is acquired for training the ML model to function with the anomaly detector during a normal operating mode to determine when an anomaly is present in the data processing system.
Machine Defect Prediction Based on a Signature
Methods, systems, and computer-readable media are presented for predicting defects using a machine learning component based on a generated signature. A machine learning component can be received that has been trained with historic data representing a series of events that occurred within a plurality of heterogeneous systems over a plurality of periods of change for the heterogeneous systems. A base signature for a first heterogeneous system that includes a first mix of modules can be compared to a current signature for the first heterogeneous system to identify one or more irregularities. The trained machine learning component can predict one or more defects for the first heterogeneous system based on the identified irregularities.
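The signature comparison step can be illustrated with a short sketch. Representing a signature as a mapping from module name to version is an assumption; the abstract does not define the signature format, and the defect prediction step is not shown.

```python
def find_irregularities(base: dict, current: dict) -> dict:
    """Return modules whose entry differs between the base and current signature.

    Each value is a (before, after) pair; None marks a module that is absent.
    """
    changes = {}
    for module in sorted(set(base) | set(current)):
        before, after = base.get(module), current.get(module)
        if before != after:
            changes[module] = (before, after)
    return changes


# Hypothetical module mixes for one system.
base_signature = {"billing": "2.1", "auth": "1.4", "reports": "3.0"}
current_signature = {"billing": "2.2", "auth": "1.4", "search": "0.9"}
irregularities = find_irregularities(base_signature, current_signature)
```

The resulting irregularities would then be the input to the trained machine learning component.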
FAULT INDICATOR DIAGNOSTIC SYSTEM AND FAULT INDICATOR DIAGNOSTIC METHOD
A fault indicator diagnostic system and fault indicator diagnostic method, with which a fault indicator of a machine can be more accurately diagnosed, has an operation sensor data table which indicates an association between sensor data and an acquisition time of the sensor data. An operation mode data table indicates an association between an operation mode and a time of operation in the operation mode. An operation data table is created by merge processing the operation sensor data table and the operation mode data table comprising the sensor data with regard to the operation mode at a given time. The system compares, in a given operation mode, a threshold determined on the basis of a diagnostic model created by learning from normal sensor data with a value computed on the basis of the diagnostic model from the sensor data to be diagnosed, and determines whether a malfunction is occurring.
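A sketch of the table merge and per-mode threshold check follows. The diagnostic model here is a per-mode mean with a three-standard-deviation threshold, which is an assumption standing in for whatever model the system learns from normal sensor data; all function names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def merge_tables(sensor_rows, mode_rows):
    """Label each (time, value) reading with the operation mode in effect then."""
    mode_rows = sorted(mode_rows)          # (start_time, mode)
    starts = [start for start, _ in mode_rows]
    merged = []
    for t, value in sensor_rows:
        i = bisect_right(starts, t) - 1    # last mode that started at or before t
        if i >= 0:
            merged.append((t, mode_rows[i][1], value))
    return merged


def learn_model(normal_rows):
    """Per mode: (center, threshold) learned from normal data only."""
    by_mode = {}
    for _, mode, value in normal_rows:
        by_mode.setdefault(mode, []).append(value)
    return {m: (mean(v), 3 * pstdev(v)) for m, v in by_mode.items()}


def diagnose(rows, model):
    """Return times whose deviation from the mode's center exceeds its threshold."""
    flagged = []
    for t, mode, value in rows:
        if mode not in model:
            continue
        center, threshold = model[mode]
        if abs(value - center) > threshold:
            flagged.append(t)
    return flagged


modes = [(0, "idle"), (10, "run")]
normal = merge_tables(
    [(1, 20.0), (2, 21.0), (3, 19.0), (11, 60.0), (12, 62.0), (13, 58.0)], modes
)
model = learn_model(normal)
flagged = diagnose(merge_tables([(4, 20.5), (14, 75.0)], modes), model)
```

Keeping a separate threshold per mode matters because a reading that is normal in one mode (60.0 while running) would look anomalous in another (idle).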
A UNIFYING SEMI-SUPERVISED APPROACH FOR MACHINE CONDITION MONITORING AND FAULT DIAGNOSIS
A computer-implemented method for performing machine condition monitoring for fault diagnosis includes collecting multivariate time series data from a plurality of sensors in a machine and partitioning the multivariate time series data into a plurality of segment clusters. Each segment cluster corresponds to one of a plurality of class labels related to machine condition monitoring. Next, the segment clusters are clustered into segment cluster prototypes. The segment clusters and the segment cluster prototypes are used to learn a discriminative model that predicts a class label. Then, as new multivariate time series data is collected from the sensors in the machine, the discriminative model may be used to predict a new class label corresponding to segments included in the new multivariate time series data. If the new class label indicates a potential fault in operation of the machine, a notification may be provided to one or more users.
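The segment-then-classify idea can be sketched in a simplified form. This example swaps in a plain nearest-prototype classifier over per-sensor means; it is not the semi-supervised discriminative model of the abstract, and the fixed window width, labels, and sensor values are assumptions.

```python
def segment(series, width):
    """Split a multivariate series (list of sensor tuples) into fixed windows."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, width)]


def features(seg):
    """Per-sensor mean of one segment."""
    n = len(seg)
    return tuple(sum(row[j] for row in seg) / n for j in range(len(seg[0])))


def build_prototypes(labeled_segments):
    """Average the feature vectors of all segments sharing a class label."""
    grouped = {}
    for label, seg in labeled_segments:
        grouped.setdefault(label, []).append(features(seg))
    return {
        label: tuple(sum(col) / len(col) for col in zip(*vecs))
        for label, vecs in grouped.items()
    }


def predict(seg, prototypes):
    """Return the label of the prototype closest to the segment's features."""
    f = features(seg)
    return min(
        prototypes,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(f, prototypes[label])),
    )


training = [
    ("normal", [(1.0, 0.1), (1.1, 0.1)]),
    ("fault", [(5.0, 0.9), (5.2, 1.0)]),
]
prototypes = build_prototypes(training)
new_series = [(1.0, 0.1), (1.05, 0.12), (5.1, 0.95), (5.0, 0.9)]
labels = [predict(seg, prototypes) for seg in segment(new_series, 2)]
```

A segment labeled "fault" is the point at which a notification to users would be raised.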
MACHINE LEARNING-POWERED RESOLUTION RESOURCE SERVICE FOR HCI SYSTEMS
A method of processing requests from users of a computer system having software and hardware components, wherein each request includes a description of a software or hardware issue, includes: upon receiving a first request, performing content matching using a first instance of the data model to determine if the issue described in the first request has been previously encountered; executing a machine learning algorithm against a new data set to update a second instance of the data model; after the second instance of the data model has been updated, switching the data model used for performing content matching from the first to the second instance of the data model; and upon receiving a second request after the switch, performing content matching using the second instance of the data model to determine if the issue described in the second request has been previously encountered.
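The two-instance switch can be sketched as below. Exact matching on normalized text stands in for both the content matching and the machine learning update, and the class and method names are hypothetical.

```python
def normalize(description: str) -> str:
    """Lowercase and collapse whitespace so equivalent descriptions compare equal."""
    return " ".join(description.lower().split())


class ResolutionService:
    def __init__(self, known_issues):
        self._active = {normalize(d) for d in known_issues}  # first instance, serves requests
        self._standby = set(self._active)                    # second instance, updated offline

    def seen_before(self, description: str) -> bool:
        """Content matching against whichever instance is currently active."""
        return normalize(description) in self._active

    def retrain_and_switch(self, new_descriptions) -> None:
        """Update the standby instance, then make it the active one."""
        self._standby |= {normalize(d) for d in new_descriptions}
        # Swap roles; the new standby starts as a copy of the new active instance.
        self._active, self._standby = self._standby, set(self._standby)


service = ResolutionService(["Disk full on node"])
first = service.seen_before("vSAN  resync stuck")    # first request, before the switch
service.retrain_and_switch(["vSAN resync stuck"])
second = service.seen_before("vSAN resync stuck")    # second request, after the switch
```

Requests keep being served from the active instance while the standby one is updated, so matching is never performed against a partially updated model.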