Patent classifications
H04L41/064
Node health prediction based on failure issues experienced prior to deployment in a cloud computing system
To improve the reliability of nodes that are utilized by a cloud computing provider, information about the entire lifecycle of nodes can be collected and used to predict when nodes are likely to experience failures based at least in part on early lifecycle errors. In one aspect, a plurality of failure issues experienced by a plurality of production nodes in a cloud computing system during a pre-production phase can be identified. A subset of the plurality of failure issues can be selected based at least in part on correlation with service outages for the plurality of production nodes during a production phase. A comparison can be performed between the subset of the plurality of failure issues and a set of failure issues experienced by a pre-production node during the pre-production phase. A risk score for the pre-production node can be calculated based at least in part on the comparison.
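The selection-and-scoring steps in this abstract can be illustrated with a minimal Python sketch. It rests on a few assumptions that the abstract does not state. Failure issues are treated as string codes. The "correlation" of an issue with production outages is measured as an outage-rate lift, meaning the outage rate of nodes that had the issue minus the rate of nodes that did not. The risk score is the lift-weighted share of predictive issues that the pre-production node has already shown. `NodeHistory`, `outage_lift`, `select_predictive_issues`, `risk_score`, and the `min_lift` threshold are all hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class NodeHistory:
    preprod_issues: set   # failure issue codes seen before deployment (hypothetical format)
    had_outage: bool      # whether the node caused a service outage in production

def outage_lift(history, issue):
    """Outage rate with the issue minus outage rate without it (assumed correlation measure)."""
    with_issue = [n.had_outage for n in history if issue in n.preprod_issues]
    without = [n.had_outage for n in history if issue not in n.preprod_issues]
    if not with_issue or not without:
        return 0.0  # cannot compare when one group is empty
    return sum(with_issue) / len(with_issue) - sum(without) / len(without)

def select_predictive_issues(history, min_lift=0.2):
    """Keep only issues whose lift meets a hypothetical threshold."""
    all_issues = set().union(*(n.preprod_issues for n in history))
    lifts = {issue: outage_lift(history, issue) for issue in all_issues}
    return {issue: lift for issue, lift in lifts.items() if lift >= min_lift}

def risk_score(node_issues, predictive):
    """Lift-weighted fraction of predictive issues already seen on a pre-production node."""
    total = sum(predictive.values())
    if total == 0:
        return 0.0
    return sum(w for issue, w in predictive.items() if issue in node_issues) / total
```

A logistic regression or gradient-boosted model over the same issue indicators would be a natural replacement for the simple lift measure. The lift version is used here only because it is easy to inspect.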
System and method of comparing time periods before and after a network temporal event
The present technology pertains to a system, method, and non-transitory computer-readable medium for evaluating the impact of network changes. The technology can detect a temporal event, wherein the temporal event is associated with a change in a network configuration, implementation, or utilization. The technology defines, based on a nature of the temporal event, a first period prior to the temporal event or a second period subsequent to the temporal event. The technology compares network data collected in the first period and network data collected in the second period.
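The before/after comparison can be sketched in Python under stated assumptions. Network data is taken to be a list of evenly spaced metric samples. The "nature of the temporal event" only selects a window length, through a hypothetical `WINDOW_BY_EVENT` table. The comparison is a difference of means. `compare_periods` and the event-type names are illustrative, not taken from the source.

```python
from statistics import mean

# Hypothetical window lengths (in samples) chosen by the kind of event.
WINDOW_BY_EVENT = {"config_change": 3, "software_upgrade": 5}

def compare_periods(samples, event_index, event_type):
    """Compare the mean of a metric before the event with the mean from the event onward."""
    w = WINDOW_BY_EVENT.get(event_type, 3)          # default window if type is unknown
    before = samples[max(0, event_index - w):event_index]
    after = samples[event_index:event_index + w]    # includes the sample at the event
    if not before or not after:
        raise ValueError("not enough data on one side of the event")
    return {
        "before_mean": mean(before),
        "after_mean": mean(after),
        "delta": mean(after) - mean(before),
    }
```

For example, latency samples `[10, 12, 11, 20, 22, 21]` with a `config_change` at index 3 give a before mean of 11, an after mean of 21, and a delta of 10. A real system might apply a statistical test in place of a raw difference of means.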
Automated incident triage and diagnosis
Techniques for automated incident triage and diagnosis are described. A method of automated incident triage and diagnosis may include receiving incident data associated with an incident, identifying one or more mitigation actions to resolve the incident using at least one machine learning model based at least on the incident data, and automatically executing the one or more mitigation actions to mitigate the incident.
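The receive, identify, and execute flow can be sketched in Python. The abstract's machine learning model is replaced here by a deliberately simple nearest-neighbour lookup: past incidents are ranked by keyword (Jaccard) similarity to the new incident text. The incident history, the action names, and the `handlers` mapping from action name to callable are all hypothetical.

```python
from typing import Callable

# Hypothetical history of resolved incidents: symptom keywords -> mitigation that worked.
PAST_INCIDENTS = [
    ({"cpu", "high", "web"}, "restart_service"),
    ({"disk", "full", "db"}, "purge_logs"),
    ({"latency", "network", "lb"}, "failover_lb"),
]

def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_actions(incident_text: str, top_k: int = 1) -> list:
    """Nearest-neighbour stand-in for the ML model: rank past incidents by similarity."""
    tokens = set(incident_text.lower().split())
    scored = [(jaccard(tokens, kw), action) for kw, action in PAST_INCIDENTS]
    scored = [s for s in scored if s[0] > 0.0]  # never act on an incident with no overlap
    scored.sort(key=lambda s: s[0], reverse=True)
    return [action for _, action in scored[:top_k]]

def triage(incident_text: str, handlers: dict[str, Callable[[], str]]) -> list:
    """Automatically execute each suggested mitigation and collect the handler results."""
    return [handlers[action]() for action in suggest_actions(incident_text)]
```

The zero-similarity filter is a design choice. It lets unfamiliar incidents fall through to a human instead of triggering an arbitrary automated action.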
PREDICTIVE ANOMALY DETECTION IN COMMUNICATION SYSTEMS
Systems, methods, and software for operational anomaly detection in communication systems are provided herein. An exemplary method includes obtaining a measured sequence of state information associated with the communication system during a first timeframe, processing the measured sequence of state information to determine a predicted sequence of state information for the communication system during a second timeframe, and monitoring current state information for the communication system over at least a portion of the second timeframe. The method also includes determining operational anomalies associated with the communication system based at least on a comparison between the current state information and the predicted sequence of state information.
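The measure, predict, and compare loop can be sketched in Python. As a simplifying assumption, a least-squares linear trend fitted to the first timeframe stands in for whatever predictor the method actually uses, such as a learned sequence model. A sample from the second timeframe is flagged when it deviates from the prediction by more than a fixed `tolerance`. The function names are hypothetical.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys sampled at x = 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, y_mean - slope * x_mean

def predict(measured, horizon):
    """Extrapolate the fitted trend over the next `horizon` samples (second timeframe)."""
    slope, intercept = fit_line(measured)
    n = len(measured)
    return [intercept + slope * (n + k) for k in range(horizon)]

def detect_anomalies(measured, current, tolerance):
    """Indices in `current` whose value differs from the prediction by more than tolerance."""
    predicted = predict(measured, len(current))
    return [i for i, (c, p) in enumerate(zip(current, predicted)) if abs(c - p) > tolerance]
```

With `measured = [10, 12, 14, 16]` the predicted next three values are 18, 20, and 22. An observed `[18, 35, 22]` with a tolerance of 3 therefore flags index 1.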
Network system fault resolution via a machine learning model
Disclosed are embodiments for automatically resolving faults in a complex network system. Some embodiments monitor system operational parameter values, message exchanges between network components, or both. A machine learning model detects a fault in the complex network system, and an action is selected based on a cause of the fault. After the action is applied to the complex network system, additional monitoring is performed to determine either that the fault has been resolved or that additional actions should be applied to further resolve it.
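The monitor, detect, act, and re-monitor cycle can be sketched as a bounded loop in Python. Two stand-ins are assumptions. Threshold rules take the place of the machine learning model in `detect_fault`. A hypothetical `ACTIONS` table maps each cause to an ordered list of remedies that are tried one at a time. `read_metrics` and `apply_action` are caller-supplied hooks, not a real API.

```python
from typing import Callable, Optional

def detect_fault(metrics: dict) -> Optional[str]:
    """Threshold rules standing in for the ML fault detector; returns a cause or None."""
    if metrics["packet_loss"] > 0.05:
        return "link_degraded"
    if metrics["cpu"] > 0.9:
        return "overload"
    return None

# Hypothetical mapping from fault cause to remedial actions, tried in order.
ACTIONS = {
    "link_degraded": ["reset_port", "reroute_traffic"],
    "overload": ["shed_load"],
}

def resolve(read_metrics: Callable[[], dict],
            apply_action: Callable[[str], None],
            max_steps: int = 5) -> list:
    """Apply actions until monitoring shows no fault; return the actions applied."""
    applied = []
    for _ in range(max_steps):
        cause = detect_fault(read_metrics())  # re-monitor after every action
        if cause is None:
            return applied
        remaining = [a for a in ACTIONS[cause] if a not in applied]
        if not remaining:
            raise RuntimeError(f"no further actions for {cause}")
        apply_action(remaining[0])
        applied.append(remaining[0])
    raise RuntimeError("fault not resolved within step budget")
```

The step budget and the "no further actions" error keep the automation from looping indefinitely when an action does not take effect.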
Wireless access network element status reporting
A wireless communication network manages a wireless access node. The wireless access node wirelessly exchanges user data with wireless User Equipment (UEs) and exchanges the user data with one or more network elements. The wireless access node generates status indicators that characterize wireless access node operation during the user data exchanges. An Element Management System (EMS) determines EMS load based on EMS operation and transfers load data that indicates the EMS load for delivery to the wireless access node. The wireless access node receives the load data transferred by the EMS. The wireless access node identifies individual priorities for individual ones of the status indicators. The wireless access node determines individual reporting times for the individual ones of the status indicators based on the load data and the individual priorities. The wireless access node transfers the individual ones of the status indicators to the EMS per the individual reporting times.
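The way the access node turns EMS load and per-indicator priority into reporting times can be sketched in Python with a hypothetical delay formula. Priority 1 indicators are reported immediately. Lower-priority indicators are deferred in proportion to their priority rank and stretched further as the reported EMS load (assumed to be a fraction in [0, 1]) rises. `reporting_delay`, `schedule`, `base_s`, and the factor of 4 are illustrative choices, not values from the source.

```python
def reporting_delay(priority: int, ems_load: float, base_s: float = 10.0) -> float:
    """Seconds to defer an indicator; priority 1 is most urgent, ems_load is in [0, 1]."""
    if not 0.0 <= ems_load <= 1.0:
        raise ValueError("ems_load must be in [0, 1]")
    if priority <= 1:
        return 0.0  # critical indicators are never deferred
    # Lower priority and higher EMS load both push the report later.
    return base_s * (priority - 1) * (1.0 + 4.0 * ems_load)

def schedule(indicators, ems_load, now=0.0):
    """indicators: list of (name, priority); returns (report_time, name) in time order."""
    return sorted((now + reporting_delay(p, ems_load), name) for name, p in indicators)
```

At an EMS load of 0.5, indicators with priorities 1, 2, and 3 are reported at 0 s, 30 s, and 60 s. At zero load the same indicators go out at 0 s, 10 s, and 20 s.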
APPLICATION SESSION-SPECIFIC NETWORK TOPOLOGY GENERATION FOR TROUBLESHOOTING THE APPLICATION SESSION
A network management system (NMS) is described that provides a granular troubleshooting workflow at an application session level using an application session-specific topology from a client device to a cloud-based application server. During an application session of a cloud-based application, a client device running the application exchanges data through one or more access point (AP) devices, one or more switches at a wired network edge, and one or more network nodes, e.g., switches, routers, and/or gateway devices, to reach a cloud-based application server. For a particular application session, the NMS generates a topology based on network data received from a subset of network devices, e.g., client devices, AP devices, switches, routers, and/or gateways, that were involved in the particular application session over a duration of the particular application session. In this way, the NMS enables backward-looking troubleshooting of the particular application session.
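The per-session topology step can be sketched in Python. It assumes each network device reports hop records of the form "session S traversed link A to B during [start, end)". The NMS keeps only records that belong to the session and overlap its duration, then builds an adjacency map from them. `HopRecord` and `session_topology` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HopRecord:
    session_id: str
    src: str      # reporting device or client
    dst: str      # next hop toward the application server
    start: float  # time the hop was first observed
    end: float    # time the hop was last observed

def session_topology(records, session_id, t0, t1):
    """Adjacency map built only from hops of this session that overlap [t0, t1)."""
    edges = {}
    for r in records:
        if r.session_id == session_id and r.start < t1 and r.end > t0:
            edges.setdefault(r.src, set()).add(r.dst)
    return edges
```

Because the map is built from stored records, it can be regenerated after the session has ended. That is what makes the backward-looking troubleshooting described above possible.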
APPLICATION SERVICE LEVEL EXPECTATION HEALTH AND PERFORMANCE
Techniques are described for monitoring application performance in a computer network. For example, a network management system (NMS) includes a memory storing path data received from a plurality of network devices, the path data reported by each network device of the plurality of network devices for one or more logical paths of a physical interface from the given network device over a wide area network (WAN). Additionally, the NMS may include processing circuitry in communication with the memory and configured to: determine, based on the path data, one or more application health assessments for one or more applications, wherein the one or more application health assessments are associated with one or more application time periods for a site, and in response to determining at least one failure state, output a notification including identification of a root cause of the at least one failure state.
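The health-assessment and root-cause step can be sketched in Python under assumed service level expectation (SLE) thresholds. Path samples for one application and time period are averaged per metric, and any average above its threshold puts the application into a failure state. The root cause is the metric that exceeds its threshold by the largest ratio. `SLE_THRESHOLDS`, the metric names, and `assess` are hypothetical.

```python
from statistics import mean

# Hypothetical SLE thresholds for one application at one site.
SLE_THRESHOLDS = {"latency_ms": 150.0, "jitter_ms": 30.0, "loss_pct": 1.0}

def assess(path_samples):
    """Health assessment for one application time period from per-path samples."""
    averages = {m: mean(s[m] for s in path_samples) for m in SLE_THRESHOLDS}
    violations = {m: v for m, v in averages.items() if v > SLE_THRESHOLDS[m]}
    if not violations:
        return {"state": "healthy"}
    # Root cause: the metric that exceeds its threshold by the largest ratio.
    root = max(violations, key=lambda m: violations[m] / SLE_THRESHOLDS[m])
    return {"state": "failure", "root_cause": root}
```

A production NMS would more likely weight samples by traffic volume and attribute failures to specific paths or devices. The ratio rule is only a simple, readable placeholder.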
MONITORING CAUSATION ASSOCIATED WITH NETWORK CONNECTIVITY ISSUES
Described herein are systems, methods, and software to identify causes of connectivity issues in a computing environment. In one example, a computing system monitors network characteristics associated with the computing system and identifies an error notification from a service on the computing system that indicates a connectivity issue with one or more other computing systems. In response to the error notification, the computing system identifies additional network characteristics associated with connections to the one or more other computing systems and determines one or more probable causes of the connectivity issue based on the network characteristics and the additional network characteristics. The computing system can then generate a summary using the one or more probable causes.
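The cause-determination and summary steps can be sketched in Python as a small rule set. Each rule compares the baseline characteristics monitored before the error with the additional characteristics gathered when the error notification arrives. The characteristic names (`dns_ok`, `tcp_connect_ok`, `rtt_ms`, `retransmit_pct`), the thresholds, and both function names are hypothetical.

```python
def probable_causes(baseline: dict, on_error: dict) -> list:
    """Rule-based probable causes from baseline vs. at-error network characteristics."""
    causes = []
    if on_error.get("dns_ok") is False:
        causes.append("DNS resolution failure")
    if on_error.get("tcp_connect_ok") is False and on_error.get("dns_ok", True):
        causes.append("remote endpoint unreachable or port blocked")
    # A missing baseline RTT means "no comparison", so default it to infinity.
    if on_error.get("rtt_ms", 0) > 3 * baseline.get("rtt_ms", float("inf")):
        causes.append("network latency spike")
    if on_error.get("retransmit_pct", 0) > max(1.0, 2 * baseline.get("retransmit_pct", 0)):
        causes.append("packet loss / retransmissions")
    return causes

def summarize(service: str, peer: str, causes: list) -> str:
    """One-line summary of the probable causes for a service-to-peer connection."""
    if not causes:
        return f"{service} -> {peer}: no probable cause identified"
    return f"{service} -> {peer}: " + "; ".join(causes)
```

For a baseline RTT of 20 ms and 0.2% retransmissions, an error-time RTT of 200 ms with 5% retransmissions yields both the latency and the packet-loss causes.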