H04L41/0645

Node health prediction based on failure issues experienced prior to deployment in a cloud computing system

To improve the reliability of nodes that are utilized by a cloud computing provider, information about the entire lifecycle of nodes can be collected and used to predict when nodes are likely to experience failures based at least in part on early lifecycle errors. In one aspect, a plurality of failure issues experienced by a plurality of production nodes in a cloud computing system during a pre-production phase can be identified. A subset of the plurality of failure issues can be selected based at least in part on correlation with service outages for the plurality of production nodes during a production phase. A comparison can be performed between the subset of the plurality of failure issues and a set of failure issues experienced by a pre-production node during the pre-production phase. A risk score for the pre-production node can be calculated based at least in part on the comparison.
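The selection and scoring steps in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function names, the `min_outage_rate` and `min_support` parameters, and the use of a per-issue outage rate as the "correlation" measure are all assumptions made for the example.

```python
from collections import Counter

def select_predictive_issues(history, min_outage_rate=0.5, min_support=2):
    """Pick pre-production issues that co-occur with later production outages.

    history: list of (pre_production_issues: set[str], had_outage: bool),
    one entry per production node. Returns {issue: outage_rate}.
    The thresholds are hypothetical tuning knobs.
    """
    seen = Counter()
    with_outage = Counter()
    for issues, had_outage in history:
        for issue in issues:
            seen[issue] += 1
            if had_outage:
                with_outage[issue] += 1
    return {
        issue: with_outage[issue] / seen[issue]
        for issue in seen
        if seen[issue] >= min_support
        and with_outage[issue] / seen[issue] >= min_outage_rate
    }

def risk_score(node_issues, predictive):
    """Score a pre-production node by its worst matching predictive issue."""
    matched = [predictive[i] for i in node_issues if i in predictive]
    return max(matched, default=0.0)

# Hypothetical history for four production nodes.
history = [
    ({"dimm_err", "fan"}, True),
    ({"dimm_err"}, True),
    ({"fan"}, False),
    ({"nic_flap"}, False),
]
predictive = select_predictive_issues(history)
score = risk_score({"fan", "nic_flap"}, predictive)
```

Taking the maximum matching rate is only one possible aggregation; a weighted sum or a trained classifier would fit the same description.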

AUTOMATICALLY DETECTING AN ERROR IN A COMMUNICATION AND AUTOMATICALLY DETERMINING A SOURCE OF THE ERROR

A device may monitor a communication between network devices for an error associated with the communication. The device may detect the error associated with the communication between the network devices. The device may perform a comparison of an error metric and a threshold error metric. The error metric may be associated with the error. The device may determine whether the comparison indicates that the error metric satisfies the threshold error metric. The device may identify a source of the error using a loopback test based on determining whether the comparison indicates that the error metric satisfies the threshold error metric. The device may provide error source information based on identifying the source of the error. The error source information may identify the source of the error.
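As a rough sketch of the flow described above, the snippet below compares an error rate against a threshold and, only if the threshold is met, walks a list of loopback points to find the first one that fails. The names `locate_error_source`, `loopback_points`, and `run_loopback` are hypothetical, and treating "satisfies the threshold" as "meets or exceeds it" is an assumption.

```python
def locate_error_source(error_count, frame_count, threshold,
                        loopback_points, run_loopback):
    """Return the first loopback point whose test fails, or None.

    run_loopback is a caller-supplied callable (hypothetical) that returns
    True when the loopback test at the given point passes.
    """
    if frame_count == 0:
        return None  # nothing observed, nothing to compare
    error_rate = error_count / frame_count
    if error_rate < threshold:
        return None  # error metric does not satisfy the threshold
    for point in loopback_points:  # ordered from nearest to farthest
        if not run_loopback(point):
            return point
    return "unknown"  # every loopback passed; source not isolated

# Hypothetical link where only the remote optic fails its loopback test.
points = ["local_port", "fiber", "remote_optic"]
source = locate_error_source(50, 1000, 0.01, points,
                             lambda p: p != "remote_optic")
```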

ROOT CAUSE DETECTION OF ANOMALOUS BEHAVIOR USING NETWORK RELATIONSHIPS AND EVENT CORRELATION
20230016199 · 2023-01-19

This disclosure describes systems, devices, and techniques for determining a root cause of anomalous events in a networked computing environment. A node detects an alert corresponding to an anomalous event during a time period. The alert is correlated with previously detected alerts occurring within the time period and a causal relationship associated with nodes in the networked computing environment. The node may then recursively identify a root cause of the anomalous event detected in the networked computing environment based on a set of correlated alerts. An incident ticket may then be sent to the node identified as the root cause of the anomalous event, and the node may notify other nodes in the network having a causal relationship with the node of the anomalous event.
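The correlation and recursive search can be illustrated with a small sketch. It assumes the causal relationships are available as a map from each node to the nodes it depends on (`upstream`), and that alerts are correlated simply by falling within a time window of the triggering alert; both the data shapes and the function names are assumptions for illustration only.

```python
def correlated_alerts(alerts, trigger_time, window):
    """Nodes whose alert timestamp lies within `window` of the trigger."""
    return {n for n, t in alerts.items() if abs(t - trigger_time) <= window}

def find_root_causes(node, alerting, upstream, _visited=None):
    """Follow causal edges upstream through alerting nodes.

    A node is reported as a root cause when none of the nodes it depends on
    is also alerting. Nodes already visited are skipped, so a cyclic
    dependency graph terminates (possibly with an empty result).
    """
    visited = set() if _visited is None else _visited
    if node in visited:
        return set()
    visited.add(node)
    alerting_parents = [p for p in upstream.get(node, ()) if p in alerting]
    if not alerting_parents:
        return {node}
    roots = set()
    for parent in alerting_parents:
        roots |= find_root_causes(parent, alerting, upstream, visited)
    return roots

# Hypothetical topology: app depends on db and cache, both depend on switch.
upstream = {"app": ["db", "cache"], "db": ["switch"],
            "cache": ["switch"], "switch": []}
alerts = {"app": 100, "db": 98, "switch": 95, "cache": 10}
alerting = correlated_alerts(alerts, trigger_time=100, window=10)
roots = find_root_causes("app", alerting, upstream)
```

In this example the stale `cache` alert falls outside the window, so the search follows `app` to `db` to `switch` and stops there.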

Intelligent system for network and device performance improvement

Methods, systems, and computer-readable media are disclosed herein that monitor and improve network performance and reliability of a plurality of devices and nodes. In aspects, alert types are categorized based on the role, model, and operating system of a device or node within the network for which the alert was generated. A command set that is responsive to the alert and that is specially configured for the role, model, and operating system of the device or node is automatically selected to address the alert. The command set can be executed against the device or node (or neighboring device/node) in order to investigate the cause or source of the alert. Based on the results returned by the command set's execution, remediation actions can be selected and implemented to improve the technological performance (e.g., memory, CPU, connectivity) of the device or node in the network.

MONITORING USER EXPERIENCE USING DATA BLOCKS FOR SECURE DATA ACCESS

Techniques for enabling secure access to data using data blocks are described. Computing device(s) can provide instruction(s) to a component associated with an entity, wherein the instruction(s) are associated with an identifier corresponding to a data block of a plurality of data blocks. The computing device(s) can receive, from the component, data associated with the component, wherein the data is associated with the identifier and is indicative of a state of the component. The computing device(s) can store the data in the data block and monitor, using rule(s), changes to the state of the component based at least partly on the data in the data block. As a result, techniques described herein enable near real-time (and in some examples, automatic) reporting and/or remediation for correcting changes to the state of the component using data that is securely accessed by use of data blocks.

Intention-based device component tracking system

A component tracking system includes a plurality of devices that each include a device component monitoring subsystem that monitors for and records the addition and/or removal of device components to/from that device. A device component manager system that is coupled to each of the plurality of devices identifies, via a recording by a first device component monitoring subsystem in a first device included in the plurality of devices, the removal of a first device component from the first device. The device component manager system then determines, via a recording by a second device component monitoring subsystem in a second device included in the plurality of devices, that the first device component has been added to the second device within a time period and, in response, the device component manager system prevents a removed first device component alarm.
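The alarm-suppression rule can be sketched as a simple pass over recorded events. The event tuple layout, the `grace_period` parameter, and the function name `removals_to_alarm` are assumptions chosen for the example; the abstract does not specify how recordings are stored.

```python
def removals_to_alarm(events, grace_period):
    """Return (device, component) pairs whose removal should raise an alarm.

    events: list of (timestamp, device, component, action), where action is
    "removed" or "added". A removal is suppressed when the same component is
    recorded as added to a different device within grace_period afterwards.
    """
    alarms = []
    for t_removed, device, component, action in events:
        if action != "removed":
            continue
        moved = any(
            a == "added"
            and c == component
            and d != device
            and 0 <= t - t_removed <= grace_period
            for t, d, c, a in events
        )
        if not moved:
            alarms.append((device, component))
    return alarms

# Hypothetical recordings: ssd-A is moved quickly, dimm-B reappears too late.
events = [
    (0, "srv1", "ssd-A", "removed"),
    (30, "srv2", "ssd-A", "added"),
    (5, "srv1", "dimm-B", "removed"),
    (500, "srv3", "dimm-B", "added"),
]
alarms = removals_to_alarm(events, grace_period=60)
```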

Method for supporting service level agreement monitoring in a software defined network and corresponding software defined network

A method supports service level agreement monitoring in a software defined network. The software defined network has forwarding elements and a software defined network controller for controlling the forwarding elements. Data flows are transmitted between a first end-path forwarding element, of the forwarding elements, and a second end-path forwarding element, of the forwarding elements, via at least one intermediate forwarding element, of the forwarding elements. The software defined network controller configures the intermediate forwarding element such that a probe triggering packet is generated based on local information of the intermediate forwarding element. The software defined network controller configures at least one of the first end-path forwarding element or the second end-path forwarding element such that an end-to-end probing is triggered based on receiving the probe triggering packet. The end-to-end probing is performed in order to detect a service level agreement violation.

Virtualized network service management and diagnostics

A device monitors, for a software-defined networking wide area network (SD-WAN) deployment, a set of virtualized network services of the SD-WAN deployment, and applies a set of diagnostic tests to evaluate the set of virtualized network services. The device detects, based on monitoring the set of virtualized network services and in connection with applying the set of diagnostic tests, an event associated with a virtualized network service. The device analyzes, using an analytics model of SD-WAN operation, the event to identify an issue associated with the virtualized network service, and determines, based on the analytics model of SD-WAN operation, a recommendation relating to remediating the issue. The device generates an abstraction layer user interface to represent the set of virtualized network services and to convey the recommendation relating to remediating the issue, and implements, after providing the abstraction layer user interface, the recommendation to remediate the issue.

System and method for root cause analysis of call failures in a communication network
11695619 · 2023-07-04

The claimed system and method perform root cause analysis for a radio access network. Some aspects include automatic identification of possible causes of network issues, ranking of those causes, determination of the root (main) cause, and execution of related best actions, alerts, and reporting in order to automatically identify, mitigate, or eliminate the problem.