H04L41/0631

Monitoring and self-healing of deployed environments

In various examples, based on an alert of an active issue of a system resource, a system identifies a first issue object associated with the alert by making a first set of determinations. Based on the first issue object, the system can determine whether the active issue can be automatically corrected by one or more self-healing processes. If it can, the system can implement the one or more self-healing processes.
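
The flow in this abstract (alert, then issue object, then a decision on automatic correction) could be sketched roughly as follows. This is only an illustration: the `Alert` and `IssueObject` types, the `ISSUES` lookup, the `HEALERS` registry, and the symptom labels are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Alert:
    resource: str  # system resource the alert refers to
    symptom: str   # hypothetical symptom label, e.g. "disk_full"


@dataclass(frozen=True)
class IssueObject:
    name: str
    auto_correctable: bool


# Hypothetical lookup standing in for the "first set of determinations".
ISSUES: Dict[str, IssueObject] = {
    "disk_full": IssueObject("disk_full", auto_correctable=True),
    "hardware_fault": IssueObject("hardware_fault", auto_correctable=False),
}

# Hypothetical self-healing processes, keyed by issue name.
HEALERS: Dict[str, Callable[[Alert], str]] = {
    "disk_full": lambda alert: f"purged temp files on {alert.resource}",
}


def identify_issue(alert: Alert) -> Optional[IssueObject]:
    """Map an alert to an issue object, or None if it is not recognized."""
    return ISSUES.get(alert.symptom)


def handle_alert(alert: Alert) -> str:
    """Self-heal when the issue is marked auto-correctable, else escalate."""
    issue = identify_issue(alert)
    if issue is None:
        return "unknown issue: escalate"
    healer = HEALERS.get(issue.name)
    if not issue.auto_correctable or healer is None:
        return f"{issue.name}: escalate"
    return healer(alert)
```

Keeping the "can this be auto-corrected" decision on the issue object, separate from the registry of healing routines, mirrors the two-step wording of the abstract; a real system would likely make both determinations from richer data.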

Predictive routing using machine learning in SD-WANs

In one embodiment, a supervisory service for a software-defined wide area network (SD-WAN) obtains telemetry data from one or more edge devices in the SD-WAN. The service trains, using the telemetry data as training data, a machine learning-based model to predict tunnel failures in the SD-WAN. The service receives feedback from the one or more edge devices regarding failure predictions made by the trained machine learning-based model. The service retrains the machine learning-based model, based on the received feedback.

Unified recommendation engine
11711287 · 2023-07-25

A system receives, from one or more subsystems, one or more predicted outcomes associated with a device. The system provides at least a subset of the predicted outcomes as input to a machine learning model trained to identify a set of resolution actions. The system receives, from the machine learning model, the set of resolution actions for the subset of the predicted outcomes, wherein each resolution action in the set of resolution actions is associated with a probability of resolving at least one of the predicted outcomes in the subset of predicted outcomes. The system identifies a first resolution action from the set of resolution actions, wherein the first resolution action has a highest probability of resolving the at least one of the predicted outcomes in the subset of predicted outcomes. The system provides a first instruction to execute the first resolution action.
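
The selection step described here, choosing the resolution action with the highest probability, can be illustrated with a short sketch. The `stub_model` function is a hypothetical stand-in for the trained machine learning model, and the action names and probabilities are invented for the example.

```python
from typing import List, NamedTuple


class ResolutionAction(NamedTuple):
    name: str
    probability: float  # estimated probability of resolving a predicted outcome


def stub_model(predicted_outcomes: List[str]) -> List[ResolutionAction]:
    """Hypothetical stand-in for the trained model: returns fixed actions."""
    return [
        ResolutionAction("clear_cache", 0.35),
        ResolutionAction("restart_service", 0.80),
        ResolutionAction("replace_device", 0.60),
    ]


def pick_best_action(actions: List[ResolutionAction]) -> ResolutionAction:
    """Return the action with the highest resolution probability."""
    if not actions:
        raise ValueError("no resolution actions to choose from")
    return max(actions, key=lambda action: action.probability)


def first_instruction(predicted_outcomes: List[str]) -> str:
    """Build the instruction for the top-ranked action."""
    best = pick_best_action(stub_model(predicted_outcomes))
    return f"execute {best.name}"
```

With the stub values above, `first_instruction(["battery_failure"])` yields `"execute restart_service"`.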

DISTRIBUTED FAULT CODE AGGREGATION ACROSS APPLICATION CENTRIC DIMENSIONS

Systems, methods, and computer-readable media for fault code aggregation across application-centric dimensions. In an example embodiment, a system obtains respective fault codes corresponding to one or more network devices in a network and maps the one or more network devices and/or the respective fault codes to respective logical policy entities defined in a logical policy model of the network, to yield fault code mappings. The system aggregates one or more of the fault code mappings along respective logical policy dimensions in the network to yield an aggregation of fault codes across the respective logical policy dimensions and, based on the aggregation, presents, for each of the respective logical policy dimensions, one or more hardware-level errors along that dimension.
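
The map-then-aggregate idea in this abstract can be sketched as below. The device names, the fault codes, and the two dimensions (`tenant` and `epg`) are assumptions chosen for illustration; the abstract does not name specific dimensions or codes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical logical policy model: device -> {dimension: policy entity}.
DEVICE_POLICY: Dict[str, Dict[str, str]] = {
    "leaf-1": {"tenant": "tenant-a", "epg": "web"},
    "leaf-2": {"tenant": "tenant-a", "epg": "db"},
    "leaf-3": {"tenant": "tenant-b", "epg": "web"},
}


def map_faults(faults: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Map (device, fault_code) pairs to (dimension, entity, fault_code)."""
    mappings = []
    for device, code in faults:
        for dimension, entity in DEVICE_POLICY.get(device, {}).items():
            mappings.append((dimension, entity, code))
    return mappings


def aggregate(
    mappings: Iterable[Tuple[str, str, str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Group fault codes by dimension, then by policy entity."""
    grouped = defaultdict(lambda: defaultdict(set))
    for dimension, entity, code in mappings:
        grouped[dimension][entity].add(code)
    return {
        dimension: {entity: sorted(codes) for entity, codes in entities.items()}
        for dimension, entities in grouped.items()
    }


faults = [("leaf-1", "F0101"), ("leaf-2", "F0202"), ("leaf-3", "F0101")]
by_dimension = aggregate(map_faults(faults))
```

Here `by_dimension["tenant"]["tenant-a"]` collects the hardware-level codes from both `leaf-1` and `leaf-2`, which is the kind of per-dimension view the abstract describes presenting.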

NODE HEALTH PREDICTION BASED ON FAILURE ISSUES EXPERIENCED PRIOR TO DEPLOYMENT IN A CLOUD COMPUTING SYSTEM

To improve the reliability of nodes that are utilized by a cloud computing provider, information about the entire lifecycle of nodes can be collected and used to predict when nodes are likely to experience failures based at least in part on early lifecycle errors. In one aspect, a plurality of failure issues experienced by a plurality of production nodes in a cloud computing system during a pre-production phase can be identified. A subset of the plurality of failure issues can be selected based at least in part on correlation with service outages for the plurality of production nodes during a production phase. A comparison can be performed between the subset of the plurality of failure issues and a set of failure issues experienced by a pre-production node during the pre-production phase. A risk score for the pre-production node can be calculated based at least in part on the comparison.
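
One way to picture the selection and comparison steps is the sketch below. The abstract does not specify how correlation or the risk score is computed, so the outage-rate threshold and the overlap-fraction score used here are assumptions, as are the issue names and the sample history.

```python
from typing import Dict, List, Set


def select_issues(history: List[Dict], min_outage_rate: float = 0.5) -> Set[str]:
    """Keep pre-production issues whose nodes later had outages often enough.

    Assumption: "correlation" is approximated by the fraction of nodes with
    a given issue that went on to have a production outage.
    """
    seen: Dict[str, int] = {}
    with_outage: Dict[str, int] = {}
    for node in history:
        for issue in node["issues"]:
            seen[issue] = seen.get(issue, 0) + 1
            if node["had_outage"]:
                with_outage[issue] = with_outage.get(issue, 0) + 1
    return {
        issue
        for issue, count in seen.items()
        if with_outage.get(issue, 0) / count >= min_outage_rate
    }


def risk_score(candidate_issues: Set[str], selected: Set[str]) -> float:
    """Assumed score: share of the selected issues the candidate node has seen."""
    if not selected:
        return 0.0
    return len(candidate_issues & selected) / len(selected)


# Invented history of production nodes and their pre-production issues.
history = [
    {"issues": {"memory_ecc", "disk_smart"}, "had_outage": True},
    {"issues": {"memory_ecc"}, "had_outage": True},
    {"issues": {"disk_smart", "fan_speed"}, "had_outage": False},
    {"issues": {"fan_speed"}, "had_outage": False},
]
selected = select_issues(history)
score = risk_score({"memory_ecc", "fan_speed"}, selected)
```

With this sample data, `memory_ecc` and `disk_smart` are selected, and a pre-production node that has seen `memory_ecc` and `fan_speed` scores 0.5.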

Method and system to identify a source of signal impairment

A method and system for managing performance over a multimedia content distribution network (MCDN), such as a digital subscriber line network, involves receiving an indication of an impairment in network performance from an MCDN client. The MCDN node associated with the client may be identified and a community of MCDN clients coupled to the MCDN node may be further identified. Impairment information, representative of MCDN equipment, may be collected for each of the MCDN clients. Detailed network diagnostics and field service may be performed for MCDN clients based on a characterization of the impairment parameters. After remediation of the MCDN node, collection of the impairment information may be terminated.

Network system fault resolution via a machine learning model

Disclosed are embodiments for automatically resolving faults in a complex network system. Some embodiments monitor one or more of system operational parameter values and message exchanges between network components. A machine learning model detects a fault in the complex network system, and an action is selected based on a cause of the fault. After the action is applied to the complex network system, additional monitoring is performed to determine either that the fault has been resolved or that additional actions are to be applied to further resolve the fault.
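
The detect, act, and re-monitor cycle described here can be sketched as a simple loop. In this illustration `detect_cause` is a hypothetical stand-in for the machine learning model plus monitoring, and the simulated system, the cause label, and the two actions are invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Callable[[], None]


def resolve_fault(
    detect_cause: Callable[[], Optional[str]],
    actions_by_cause: Dict[str, List[Action]],
) -> Tuple[List[str], bool]:
    """Apply actions for the detected cause until monitoring reports no fault.

    Returns the names of the applied actions and whether the fault cleared.
    """
    applied: List[Action] = []
    cause = detect_cause()
    while cause is not None:
        remaining = [a for a in actions_by_cause.get(cause, []) if a not in applied]
        if not remaining:
            break  # nothing left to try for this cause
        action = remaining[0]
        action()
        applied.append(action)
        cause = detect_cause()  # additional monitoring after the action
    return [a.__name__ for a in applied], cause is None


# Simulated system: only a reboot clears the (invented) "link_flap" fault.
state = {"healthy": False}


def detect() -> Optional[str]:
    return None if state["healthy"] else "link_flap"


def reset_port() -> None:
    pass  # has no effect in this simulation


def reboot_device() -> None:
    state["healthy"] = True


result = resolve_fault(detect, {"link_flap": [reset_port, reboot_device]})
```

In this simulation the first action does not clear the fault, so the loop applies the second one and then confirms resolution; `result` is `(["reset_port", "reboot_device"], True)`.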

Machine learning device, information processing device and output device
11715013 · 2023-08-01

A machine learning device includes at least one processor; and at least one memory device configured to store a program, the program executed by the at least one processor to cause the at least one processor to obtain at least one first information from a communication relay device, the first information changing due to communication of the communication relay device; and to correlate the obtained at least one first information with at least one characteristic of a replaceable candidate device to perform machine learning.