H04L41/0636

SERVICE ISSUE PRIORITISATION BASED ON IMPACT USING SOFTWARE TELEMETRY
20220116263 · 2022-04-14 ·

A system is provided herein that can correlate service issues with system telemetry from the software sessions associated with those service issues. Using a statistical approach, the system can evaluate data across numerous software sessions to rank the importance of the reported service issues. To accomplish the ranking, the system can parse the reports of service issues on a periodic basis, can extract telemetry identifiers (IDs) from the logs, can query the telemetry, can compute the relative importance of detected issues (in the context of calls going on for that day), and then can report this impact back to the service issue database.
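The ranking steps described above (parse reports, extract telemetry IDs, query telemetry, compute relative importance) can be sketched as follows. This is a minimal illustration rather than the claimed system: the `TID-…` identifier format, the boolean `failed` telemetry field, and the scoring rule (the share of the day's failed sessions linked to each issue) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical telemetry ID format; the abstract does not specify one.
TELEMETRY_ID = re.compile(r"TID-\d+")

def rank_issues(reports, telemetry):
    """Rank issues by the share of the day's failed sessions linked to them.

    reports:   issue ID -> free-text log attached to the report
    telemetry: telemetry ID -> session record with a boolean 'failed' field
    """
    total_failed = sum(1 for s in telemetry.values() if s["failed"])
    impact = Counter()
    for issue_id, log in reports.items():
        ids = set(TELEMETRY_ID.findall(log))          # extract telemetry IDs
        failed = sum(1 for t in ids if telemetry.get(t, {}).get("failed"))
        impact[issue_id] = failed / total_failed if total_failed else 0.0
    # Highest relative impact first
    return sorted(impact.items(), key=lambda kv: kv[1], reverse=True)

reports = {
    "ISSUE-1": "call dropped, see TID-100 and TID-101",
    "ISSUE-2": "audio glitch in TID-102",
}
telemetry = {
    "TID-100": {"failed": True},
    "TID-101": {"failed": True},
    "TID-102": {"failed": False},
    "TID-103": {"failed": True},
}
ranking = rank_issues(reports, telemetry)
```

In a real deployment the resulting scores would be written back to the service issue database; that step is omitted here because the abstract does not describe the database interface.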

Alarm Analysis Method and Related Device
20220086036 · 2022-03-17 ·

An alarm analysis method, including determining M alarm pairs in a first alarm set, where each alarm pair of the M alarm pairs includes a first alarm and a second alarm having an association, generating, according to an association rule, a first feature set of N alarm pairs, the first alarm of each alarm pair of the N alarm pairs being an alarm pair root in the first feature set, the first feature set including a first probability that a first subsystem to which each first alarm belongs is a subsystem root and a first alarm object is an alarm object root and a second probability that a second subsystem to which each second alarm belongs is a subsystem root and a second alarm object is an alarm object root, and determining root information of the first alarm set based on the first probability and the second probability.

HYPOTHESIS DRIVEN DIAGNOSIS OF NETWORK SYSTEMS
20220103417 · 2022-03-31 ·

An example method includes obtaining, by one or more processors, data indicating resource dependencies between a plurality of resources in a network and event dependencies between a plurality of network events and one or more of the plurality of resources; generating a Bayesian model based on resource types of the plurality of resources and event types of the plurality of network events; receiving an indication of a fault in the network; collecting fault data and generating, based on the Bayesian model and the fault data, a plurality of root cause hypotheses for the fault; ordering the plurality of root cause hypotheses based on respective root cause probabilities associated with the plurality of root cause hypotheses; and outputting the ordered plurality of root cause hypotheses.
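The ordering of root cause hypotheses by probability can be illustrated with a naive Bayes calculation. This sketch is a simplification, not the Bayesian model of the abstract: the cause names, event names, prior and likelihood values, and the `0.01` default likelihood are hypothetical, and the events are assumed conditionally independent given the cause.

```python
def rank_hypotheses(priors, likelihoods, observed_events, default=0.01):
    """Order candidate root causes by posterior probability.

    priors:      cause -> P(cause)
    likelihoods: cause -> {event: P(event | cause)}
    default:     assumed likelihood for events not linked to a cause
    """
    scores = {}
    for cause, prior in priors.items():
        score = prior
        for event in observed_events:
            score *= likelihoods[cause].get(event, default)
        scores[cause] = score
    total = sum(scores.values()) or 1.0   # avoid division by zero
    posteriors = {c: s / total for c, s in scores.items()}
    return sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)

priors = {"link_down": 0.2, "bgp_misconfig": 0.3, "power_failure": 0.5}
likelihoods = {
    "link_down": {"interface_flap": 0.9, "packet_loss": 0.8},
    "bgp_misconfig": {"packet_loss": 0.4},
    "power_failure": {"interface_flap": 0.1, "packet_loss": 0.1},
}
hypotheses = rank_hypotheses(
    priors, likelihoods, ["interface_flap", "packet_loss"]
)
```

Here `link_down` ranks first despite having the lowest prior, because the observed fault data fit it far better than the alternatives.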

Handling issues reported by network devices

Examples described herein relate to a method and an issue management system for handling issues reported from network devices. The issue management system may receive an issue from a network device of a plurality of network devices arranged in one or more computing environments. The issue management system may determine whether there exists a solution corresponding to the issue in a solution repository based on the issue and a knowledge base. Further, in response to determining that there exists the solution corresponding to the issue, the issue management system may communicate the solution to a computing environment of the one or more computing environments that hosts the network device reporting the issue. Alternatively, if there exists no solution corresponding to the issue, the issue management system may relay the issue to a management station for the management station to address the issue.
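The decision flow described above (look up a solution, return it to the hosting environment, otherwise relay the issue) might look like the sketch below. The dictionary-based knowledge base and solution repository, the issue fields `code` and `environment`, and the `relay_to_management` callback are hypothetical stand-ins; the abstract does not define these interfaces.

```python
def handle_issue(issue, knowledge_base, solution_repository, relay_to_management):
    """Return a solution addressed to the reporting environment, or relay the issue."""
    # Assumption: the knowledge base maps an issue code to a solution ID.
    solution_id = knowledge_base.get(issue["code"])
    solution = solution_repository.get(solution_id)
    if solution is not None:
        return {"target": issue["environment"], "solution": solution}
    relay_to_management(issue)   # no known solution: hand off to the management station
    return None

knowledge_base = {"E42": "SOL-7"}
solution_repository = {"SOL-7": "restart the routing daemon"}
relayed = []

known = handle_issue(
    {"code": "E42", "environment": "dc-east"},
    knowledge_base, solution_repository, relayed.append,
)
unknown = handle_issue(
    {"code": "E99", "environment": "dc-west"},
    knowledge_base, solution_repository, relayed.append,
)
```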

NETWORK ISSUE TRACKING AND RESOLUTION SYSTEM

In one embodiment, an issue analysis service obtains telemetry data from a plurality of devices in a network across a plurality of time intervals. The service detects a failure event in which a device in the network is in a failure state. The service clusters the telemetry data obtained prior to the failure event into rounds according to time intervals in which the telemetry data was collected. Each round corresponds to a particular time interval. The service applies a machine learning-based classifier to each one of the rounds of clustered telemetry data to identify one or more common traits appearing in the telemetry data for each round. The service generates a trait change report indicating a change in the one or more common traits appearing in the telemetry data across the rounds leading up to the failure event.
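The grouping of pre-failure telemetry into rounds and the resulting trait change report can be sketched as follows. A plain frequency count stands in for the machine learning-based classifier, which the abstract does not specify; the `(timestamp, trait)` sample format and the firmware-version traits are likewise assumptions.

```python
from collections import Counter, defaultdict

def trait_change_report(samples, failure_time, interval):
    """List the rounds in which the most common trait differs from the prior round.

    samples: iterable of (timestamp, trait) pairs
    """
    rounds = defaultdict(list)
    for timestamp, trait in samples:
        if timestamp < failure_time:              # keep only pre-failure telemetry
            rounds[timestamp // interval].append(trait)
    # Most common trait per round, in time order
    common = [
        (r, Counter(traits).most_common(1)[0][0])
        for r, traits in sorted(rounds.items())
    ]
    return [
        (r, prev, cur)
        for (_, prev), (r, cur) in zip(common, common[1:])
        if prev != cur
    ]

samples = [
    (0, "fw-1.0"), (5, "fw-1.0"), (12, "fw-1.0"),
    (15, "fw-1.1"), (18, "fw-1.1"), (21, "fw-1.1"), (35, "fw-1.1"),
]
report = trait_change_report(samples, failure_time=30, interval=10)
```

With these sample values the report contains a single entry for round 1, where the dominant trait shifts from `fw-1.0` to `fw-1.1` ahead of the failure at time 30.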

Failure impact analysis of network events

Failure impact analysis (or “impact analysis”) is a process that involves identifying effects that may or will result from a network event. In one example, this disclosure describes a method that includes generating, by a control system managing a resource group, a resource graph that models resource and event dependencies between a plurality of resources within the resource group; detecting, by the control system, a first event affecting a first resource of the plurality of resources, wherein the first event is a network event; and identifying, by the control system and based on the dependencies modeled by the resource graph, a second resource that is expected to be affected by the first event.
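Identifying resources expected to be affected by an event amounts to traversing the dependency graph from the first affected resource. The sketch below uses a breadth-first search over an adjacency map; the map layout and the resource names are assumptions for illustration, and event dependencies are left out for brevity.

```python
from collections import deque

def affected_resources(dependents, failed):
    """Return resources reachable from `failed`, nearest dependents first.

    dependents: resource -> list of resources that depend on it
    """
    seen = {failed}
    queue = deque([failed])
    order = []
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:         # visit each resource only once
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

graph = {
    "router-1": ["interface-1", "interface-2"],
    "interface-1": ["vpn-a"],
    "interface-2": ["vpn-a", "vpn-b"],
}
impacted = affected_resources(graph, "router-1")
```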

Cell accessibility prediction and actuation

A method for predicting cell accessibility issues for a mobile network. The method includes receiving a set of metrics from the mobile network, processing a set of key performance indicators (KPIs) derived from the set of metrics in an ensemble machine learning model, the ensemble machine learning model including an RRC model, an RACH model, an ERAB model, and an S1 signaling model to generate at least one cell accessibility degradation prediction and a confidence score, and applying a root cause mapping to the at least one cell accessibility degradation prediction and the confidence score to identify at least one recommended action to correct a correlated cell accessibility issue.

System and method for anomaly detection with root cause identification

A computer device may include a processor configured to obtain key performance indicator (KPI) values for KPI parameters associated with at least one device and compute a set of historical statistical values for the obtained KPI values associated with the at least one device. The processor may be further configured to provide the KPI values and the computed set of historical statistical values to an anomaly detection model to identify potential anomalies; filter the identified potential anomalies based on a designated desirable behavior for a particular KPI parameter to identify at least one anomaly; and send an alert that includes information identifying the at least one anomaly to a management system or a repair system associated with the device. The computer device may further determine a root cause KPI parameter for the identified at least one anomaly and include information identifying the determined root cause KPI parameter in the alert.
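The detect-then-filter pipeline described above can be illustrated with a z-score test. The z-score stands in for the anomaly detection model, which the abstract leaves unspecified; the KPI names, the threshold of 3.0, and the `"high"`/`"low"` encoding of desirable behavior are assumptions.

```python
from statistics import mean, stdev

def detect_anomalies(history, current, desirable, threshold=3.0):
    """Flag KPIs that deviate strongly from history in an undesirable direction.

    history:   KPI name -> list of past values
    current:   KPI name -> latest value
    desirable: KPI name -> "high" or "low" (direction that is considered good)
    """
    anomalies = []
    for kpi, value in current.items():
        mu, sigma = mean(history[kpi]), stdev(history[kpi])
        if sigma == 0:
            continue                       # no variation, z-score undefined
        z = (value - mu) / sigma
        if abs(z) < threshold:
            continue                       # within normal range
        # Filter out deviations in the desirable direction
        if (desirable[kpi] == "high" and z > 0) or (desirable[kpi] == "low" and z < 0):
            continue
        anomalies.append((kpi, z))
    return anomalies

history = {
    "throughput_mbps": [100, 102, 98, 101, 99],
    "latency_ms": [20, 21, 19, 20, 20],
}
current = {"throughput_mbps": 120, "latency_ms": 30}
desirable = {"throughput_mbps": "high", "latency_ms": "low"}
anomalies = detect_anomalies(history, current, desirable)
```

Both KPIs deviate sharply from their history, but only the latency spike is reported: unusually high throughput is a desirable deviation and is filtered out.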

Simulating multiple paths of a course of action executed in an information technology environment

Described herein are improvements for generating courses of action for an information technology (IT) environment. In one example, a method includes identifying a first course of action for responding to an incident type in an information technology environment and generating a simulated incident associated with the incident type. The method further includes initiating performance of the first course of action based on the generation of the simulated incident. The method also includes, upon reaching a particular step of the first course of action that prevents the performance of the first course of action from proceeding, providing a first simulated result that allows the performance of the first course of action to proceed.

Computer system alert situation detection based on trend analysis

A computer system to detect data alerts includes a data preprocessing system and a data analysis system. The data preprocessing system obtains system data of the computer system, generates predicted data based on the system data, and generates differential data indicating a difference between the system data and the predicted data. The data analysis system is in signal communication with the data preprocessing system and determines that the differential data is one of an expected event or an anomaly event in response to adding at least one new value from the differential data to the predicted data.
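The differential data described above, the difference between observed and predicted values, can be sketched as follows. The moving-average predictor, the window size, and the fixed threshold used to separate expected events from anomaly events are assumptions; the abstract does not specify the prediction method or the classification rule.

```python
def differential(series, window=3):
    """Observed value minus the mean of the preceding `window` values."""
    diffs = []
    for i in range(window, len(series)):
        predicted = sum(series[i - window:i]) / window   # assumed predictor
        diffs.append(series[i] - predicted)
    return diffs

def classify(diffs, threshold):
    """Label each differential value as an expected event or an anomaly event."""
    return ["anomaly" if abs(d) > threshold else "expected" for d in diffs]

cpu_percent = [50, 50, 50, 50, 80, 50]
diffs = differential(cpu_percent)
labels = classify(diffs, threshold=20)
```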