Orchestration of Activities of Entities Operating in a Network Cloud

20230060758 · 2023-03-02

    Abstract

    A method and a communication system configured to operate in a network cloud, are provided. The system comprising a plurality of physical network elements and a server comprising a database storing a plurality of pre-defined actions and a plurality of thresholds values associated with key performance indicators (KPIs); wherein the server is configured to operate as a cloud orchestrator and to receive from the plurality of physical network elements information that relates to key performance indicators (KPIs); wherein upon determining that an action that relates to one or more specific physical network elements from among the plurality of physical network elements needs to be executed, wherein the determination is based on a) the information received from the plurality of physical network elements, and b) one or more threshold stored values and associated with said KPIs, an action selected from among the stored plurality of pre-defined actions is initiated, and required details associated with said selected pre-defined action are retrieved; and wherein the cloud orchestrator is further configured to receive new KPIs from said one or more specific physical network elements and to verify that an improvement has occurred in their operation.

    Claims

    1. A communication system configured to operate in a network cloud, the communication system comprising a plurality of physical network elements each comprising a respective agent and a server comprising a database storing a plurality of pre-defined actions and a plurality of thresholds values associated with key performance indicators (KPIs); wherein said server is configured to operate as a cloud orchestrator and to receive from said plurality of physical network elements information that relates to key performance indicators (KPIs); wherein upon determining that an action that relates to one or more specific physical network elements from among the plurality of physical network elements needs to be executed, wherein said determination is based on a) the information received from the plurality of physical network elements, and b) one or more threshold stored values and associated with said KPIs, an action selected from among said stored plurality of pre-defined actions is initiated, and required details associated with said selected pre-defined action are retrieved; and wherein said cloud orchestrator is further configured to receive new KPIs from said one or more specific physical network elements and to verify that an improvement has occurred in their operation.

    2. The communication system of claim 1, wherein said communication system further comprising a cloud controller configured to communicate with said plurality of the physical network elements' agents over a L2/L3 communication channel and to communicate in a two ways communication with said cloud orchestrator.

    3. The communication system of claim 1, wherein said cloud orchestrator is further configured to trigger a configuration change at at least one of said plurality of physical network elements in response to determining that one or more threshold values have been exceeded.

    4. The communication system of claim 1, wherein said cloud orchestrator is operative to configure a communication channel for conveying messages exchanged between physical network elements that are members of a cluster comprising said plurality of physical network elements, and/or between physical network elements that are members of said cluster and the cloud orchestrator.

    5. The communication system of claim 4, wherein said messages comprise at least one type of messages being a member of a group that consists of: keepalive messages, configuration commands forwarded from said cloud orchestrator to modules installed at physical network elements, messages that are sent every pre-determined time interval to said cloud orchestrator and comprise information that relate to at least one of: current telemetry, statistics, events and KPIs.

    6. The communication system of claim 1, wherein said cloud orchestrator is configured to ensure that a plurality of physical network elements operate as a single virtual routing entity.

    7. The communication system of claim 1, wherein said cloud orchestrator is further configured to forward to said plurality of physical network elements the plurality of pre-defined actions and the plurality of KPI thresholds values for storage at said plurality of physical network elements, wherein the determination that an action needs to be executed is taken by one or more of said specific physical network elements, and wherein the action is selected from among said plurality of pre-defined actions stored at the one or more of said specific physical network elements and the required details associated with said selected action are retrieved from the one or more of said specific physical network elements' storage.

    8. The communication system of claim 1, wherein the determination that an action needs to be executed is taken by said cloud orchestrator, and wherein the action is selected from among said plurality of pre-defined actions stored at the said cloud orchestrator database and the required details associated with said selected action are retrieved from the said cloud orchestrator storage.

    9. The communication system of claim 1, wherein said cloud orchestrator is further configured to analyze traffic flows that are conveyed via said plurality of physical network elements and to identify traffic trends based on said traffic flows.

    10. The communication system of claim 9, wherein said cloud orchestrator is further configured to determine based on the identified traffic trends one or more automatic actions and their associated threshold values, to be carried out in said communication system.

    11. A method for use in a system operating in a network cloud, said system comprising a plurality of physical network elements each comprising a respective agent and a server operative as a cloud orchestrator and comprising a database, wherein the method comprises the steps of: storing a plurality of pre-defined actions and a plurality of thresholds values associated with key performance indicators (KPIs); receiving information that relates to key performance indicators (KPIs) from said plurality of physical network elements; determining whether an action that relates to one or more specific physical network elements from among the plurality of physical network elements needs to be executed, wherein said determination is based on a) the information received from the plurality of physical network elements, and b) one or more threshold stored values; upon determining that said action needs to be executed, retrieving details of the required action from among the plurality of pre-defined stored actions; retrieving details associated with said selected pre-defined action; carrying out the required action by the plurality of physical network elements; and receiving new KPIs from said one or more specific physical network elements and verifying that an improvement has occurred in their operation.

    12. The method of claim 11, further comprising the steps of determining whether one or more threshold values have been exceeded, and if in the affirmative, triggering a configuration change at at least one of said plurality of physical network elements.

    13. The method of claim 11, further comprising a step of establishing a communication channel for conveying messages exchanged between physical network elements that are members of a cluster comprising said plurality of physical network elements, and/or between physical network elements that are members of said cluster and the cloud orchestrator.

    14. The method of claim 13, wherein said messages comprise at least one type of messages being a member of a group that consists of: keepalive messages, configuration commands forwarded from said cloud orchestrator to modules installed at physical network elements, messages that are sent every pre-determined time interval to said cloud orchestrator and comprise information that relate to at least one of: current telemetry, statistics, events and KPIs.

    15. The method of claim 11, further comprising a step of ensuring that a plurality of physical network elements operate as a single virtual routing entity.

    16. The method of claim 11, further comprising a step of monitoring at least one physical network element and KPIs associated therewith at a pre-defined steady rate, and upon detecting that a malfunction is about to be associated with said physical network element, determining a time period during which relevant KPIs will be sampled at higher rate than the steady rate applied prior to making said determination.

    17. The method of claim 11, further comprising the steps of: storing the plurality of pre-defined actions and a plurality of thresholds values associated with key performance indicators (KPIs) at the cloud orchestrator database; forwarding the plurality of pre-defined actions and a plurality of thresholds values associated with key performance indicators (KPIs) to said plurality of physical network elements and storing them thereat; determining by one or more of said specific physical network elements whether said action needs to be executed; upon determining that said action needs to be executed, retrieving details of the required action from among the plurality of pre-defined actions stored at the one or more of said specific physical network elements' storage.

    18. The method of claim 11, further comprising the steps of: storing the plurality of pre-defined actions and a plurality of thresholds values associated with key performance indicators (KPIs) at the cloud orchestrator database; determining by said cloud orchestrator elements whether said action needs to be executed; upon determining that said action needs to be executed, retrieving details of the required action from among said cloud orchestrator database.

    19. A non-transitory computer readable medium storing a computer program for performing a set of instructions to be executed by one or more computer processors, the computer program is adapted to perform a method for use in a network cloud comprising a plurality of physical network elements and a server configured to operate as a cloud orchestrator, wherein said method comprising storing a plurality of pre-defined actions and a plurality of thresholds values associated with key performance indicators (KPIs); receiving information that relates to key performance indicators (KPIs) from said plurality of physical network elements; determining whether an action that relates to one or more specific physical network elements from among the plurality of physical network elements needs to be executed, wherein said determination is based on a) the information received from the plurality of physical network elements, and b) one or more threshold stored values; upon determining that said action needs to be executed, retrieving details of the required action from among the plurality of pre-defined stored actions; retrieving details associated with said selected pre-defined action; carrying out the required action by the plurality of physical network elements; and receiving new KPIs from said one or more specific physical network elements and verifying that an improvement has occurred in their operation.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0086] The accompanying drawings, which are incorporated herein and constitute a part of this specification, illustrate several embodiments of the disclosure and, together with the description, serve to explain the principles of the embodiments disclosed herein.

    [0087] FIG. 1 illustrates a network cloud construed in accordance with an embodiment of the present invention;

    [0088] FIG. 2 demonstrates a schematic block diagram of steps comprised at the stage of configuring a new network element when the latter is added to a network cloud, construed in accordance with an embodiment of the present invention;

    [0089] FIG. 3 depicts a schematic block diagram comprising steps included at the stage of executing a task by a network element in a network cloud, construed in accordance with an embodiment of the present invention; and

    [0090] FIG. 4 illustrates a schematic block diagram of steps comprised at the stage of traffic analysis and automatic execution of actions associated with network elements that belong to a network cloud, construed in accordance with an embodiment of the present invention.

    DESCRIPTION OF EXEMPLARY EMBODIMENTS

    [0091] Some of the specific details and values in the following detailed description refer to certain examples of the disclosure. However, this description is provided only by way of example and is not intended to limit the scope of the invention in any way. As will be appreciated by those skilled in the art, the claimed method and device may be implemented by using other methods and/or other devices that are known in the art per se. In addition, the described embodiments comprise different steps, not all of which are required in all embodiments of the invention. The scope of the invention can be summarized by referring to the appended claims.

    [0092] FIG. 1 illustrates a network cloud (100) construed in accordance with an embodiment of the present invention. The network cloud (100) comprises a cloud orchestrator (110) that includes a storage for KPIs and actions and a database, and that is in two-way communication with a cloud controller 120 (e.g., a mediator), which likewise comprises a storage for KPIs and actions and a database. The network cloud (100) further comprises a plurality of network elements (NEs) 130.sub.1 to 130.sub.N, each comprising a respective agent 140.sub.1 to 140.sub.N, wherein these agents are configured to communicate with cloud controller 120 over an L2/L3 communication channel.

    [0093] A cloud orchestrator automates the management, coordination and organization of complicated computer systems, services and middleware. In addition to a reduced requirement for personnel involvement, the orchestration functionality eliminates the potential errors that might be introduced while carrying out provisioning, scaling or other cloud processes.

    [0094] Once an operating system (OS) is installed at the cloud orchestrator 110 (and at the cloud controller 120, if the latter is deployed), agents 140.sub.1 to 140.sub.N may be installed at NEs 130.sub.1 to 130.sub.N to support communication from cloud orchestrator 110 either directly or through cloud controller 120. Once these agents have been installed, links are established at the L2 layer and respective tunnels may be configured, thereby enabling two-way communication between the cloud orchestrator and the network elements.

    [0095] FIGS. 2 to 4 demonstrate various steps included in three different operational stages of a method by which the network cloud referred to hereinabove, operates.

    [0096] FIG. 2 demonstrates a schematic block diagram of steps comprised at the stage of configuring a new network element when added to a network cloud, construed in accordance with an embodiment of the present invention.

    [0097] First, the newly added network element, being for example a router or a switch, is identified and a communication link is established between the cloud orchestrator and that new NE (step 200). The NE is then associated by the managing entity with a certain cluster (step 210) and the cloud orchestrator or the cloud controller, as the case may be, forwards images/dockers to the new NE (step 230). Once a keepalive message is sent by the NE to the cloud orchestrator/controller, checking/confirming that the communication link that has been established between the two is operative, a plurality of KPIs will be collected and stored at the cloud orchestrator, preferably at pre-configurable time intervals (step 240).
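
The onboarding sequence of FIG. 2 can be sketched roughly as follows. This is a minimal in-memory illustration, not the disclosed implementation: the `Orchestrator` class, its method names and its fields are all hypothetical, and the "link" and "images" are plain Python values rather than real tunnels or container images.

```python
from dataclasses import dataclass, field


@dataclass
class Orchestrator:
    """Hypothetical in-memory stand-in for the cloud orchestrator."""
    links: set[str] = field(default_factory=set)
    clusters: dict[str, set[str]] = field(default_factory=dict)
    images: dict[str, list[str]] = field(default_factory=dict)
    alive: set[str] = field(default_factory=set)
    kpi_store: dict[str, list[dict[str, float]]] = field(default_factory=dict)

    def onboard(self, ne_id: str, cluster: str, images: list[str]) -> None:
        self.links.add(ne_id)                                # step 200: identify NE, establish link
        self.clusters.setdefault(cluster, set()).add(ne_id)  # step 210: associate NE with a cluster
        self.images[ne_id] = list(images)                    # step 230: forward images to the NE

    def on_keepalive(self, ne_id: str) -> None:
        """A keepalive confirms that the established link is operative."""
        if ne_id not in self.links:
            raise KeyError(f"unknown network element: {ne_id}")
        self.alive.add(ne_id)

    def store_kpis(self, ne_id: str, kpis: dict[str, float]) -> None:
        """Step 240: store KPIs, but only once the link has been confirmed."""
        if ne_id not in self.alive:
            raise RuntimeError(f"no keepalive received from {ne_id}")
        self.kpi_store.setdefault(ne_id, []).append(dict(kpis))
```

In a real deployment `store_kpis` would presumably be driven by a timer at the pre-configurable interval; the sketch leaves scheduling out.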

    [0098] The next stage is exemplified in FIG. 3, which presents a schematic block diagram of steps included at the stage of task execution by the newly added network element. This example comprises the following steps. First, lists of KPIs are collected from the NEs that communicate (directly or indirectly) with the cloud orchestrator (step 300). The KPIs are compared with pre-defined respective threshold values (step 310) and if a certain KPI reaches its respective threshold value, a pre-defined action will be initiated (step 320) after retrieving the required action details from the database located at the cloud orchestrator (step 330). Preferably, once the cloud orchestrator has executed the required action, it confirms by checking with all NEs that are relevant to the action taken, that indeed the action had an effect on these NEs (step 340). Once the confirmation is obtained, the cloud orchestrator receives new KPIs (at least from these relevant NEs) to verify whether an improvement has occurred in their operation (step 350). After verifying that improvements have been achieved at the NEs, the action is logged at the cloud orchestrator storage (step 360).
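
The steps of FIG. 3 can be sketched as a single evaluation cycle. The sketch below makes several assumptions that are not in the text: a higher KPI value is taken to be worse, "improvement" is taken to mean that the new value is lower than the one that triggered the action, and each action is a hypothetical callable that returns `True` when the NE confirms it took effect. The names `Rule`, `find_triggered` and `run_cycle` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable

Kpis = dict[str, float]


@dataclass(frozen=True)
class Rule:
    kpi: str          # KPI name to watch
    threshold: float  # pre-defined threshold value
    action: str       # key into the action database


def find_triggered(kpis_by_ne: dict[str, Kpis], rules: list[Rule]) -> list[tuple[str, Rule]]:
    """Steps 300-310: compare collected KPIs with their thresholds."""
    return [
        (ne, rule)
        for ne, kpis in kpis_by_ne.items()
        for rule in rules
        if rule.kpi in kpis and kpis[rule.kpi] >= rule.threshold
    ]


def run_cycle(
    kpis_by_ne: dict[str, Kpis],
    rules: list[Rule],
    action_db: dict[str, Callable[[str], bool]],
    collect_new: Callable[[str], Kpis],
    log: list[tuple[str, str]],
) -> None:
    for ne, rule in find_triggered(kpis_by_ne, rules):
        action = action_db[rule.action]   # step 330: retrieve the action details
        confirmed = action(ne)            # steps 320 and 340: execute, then confirm the effect
        if not confirmed:
            continue
        new_kpis = collect_new(ne)        # step 350: receive new KPIs from the NE
        if new_kpis[rule.kpi] < kpis_by_ne[ne][rule.kpi]:
            log.append((ne, rule.action))  # step 360: log the action after verified improvement
```

Logging only after a verified improvement mirrors the order given in the paragraph above; a production system might also want to record actions that did not help.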

    [0099] Another phase of the network cloud operation is exemplified in FIG. 4, which illustrates a schematic block diagram of an embodiment of the present disclosure of carrying out traffic analysis and automatic execution of actions associated with network elements that belong to the network cloud.

    [0100] This phase starts by retrieving KPIs collected from different NEs (step 400). Then, the cloud orchestrator analyzes the traffic flows that are conveyed via these NEs and identifies traffic trends (such as possible future congestion) based on the analyzed traffic flows (step 410). In view of the identified trends, one or more automatic actions and their associated threshold values are suggested to be carried out in the network cloud (step 420), in order to adequately act on the scenarios predicted from the trends identified in step 410. Once the changes (the new automated actions) are approved (step 430), the new actions and their respective threshold values are added to the cloud orchestrator storage (step 440). Optionally, after these new actions and their respective threshold values have been stored at the cloud orchestrator storage, this information is forwarded to, and stored at, at least some of the physical network elements for automatic execution, so that when a situation arises in which a new action is needed, the action will be executed automatically (step 450). This embodiment has a number of advantages. First, it enables a faster response time, as it does not require communicating with the cloud orchestrator at the time when an action actually needs to be effected. Second, when maintenance operations need to be carried out at the cloud orchestrator server, real-time actions can still be carried out during that maintenance, independently of the cloud orchestrator itself, by one or more relevant physical network elements from among the plurality of physical network elements comprised in the communication system.
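
Steps 430 to 450, and the local execution they enable, can be illustrated with the sketch below. It assumes a rule has already been suggested in step 420; the classes `Rule`, `NetworkElement` and `Orchestrator`, the KPI name and the action name are all hypothetical, and an "executed" action is merely recorded in a list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    kpi: str
    threshold: float
    action: str


@dataclass
class NetworkElement:
    name: str
    local_rules: list[Rule] = field(default_factory=list)
    executed: list[str] = field(default_factory=list)

    def on_kpi(self, kpi: str, value: float) -> None:
        """Act on a local KPI reading with no round trip to the orchestrator."""
        for rule in self.local_rules:
            if rule.kpi == kpi and value >= rule.threshold:
                self.executed.append(rule.action)


@dataclass
class Orchestrator:
    stored_rules: list[Rule] = field(default_factory=list)

    def approve_and_store(self, rule: Rule, approved: bool) -> bool:
        """Steps 430-440: store the suggested rule only if it was approved."""
        if approved:
            self.stored_rules.append(rule)
        return approved

    def push(self, elements: list[NetworkElement]) -> None:
        """Step 450: forward the stored rules to the NEs for local storage."""
        for ne in elements:
            ne.local_rules = list(self.stored_rules)
```

Because each NE keeps its own copy of the rules, `on_kpi` keeps working even if the orchestrator object is unavailable, which is the maintenance scenario described above.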

    [0101] When a configuration change to a network element (e.g., a node) or a plurality of nodes is required, the cloud orchestrator (acting as an administrator) may define a configuration patch (e.g., certain configuration lines or scripts) and set a list of one or more threshold values that will trigger that configuration change. The cloud orchestrator triggers the required configuration change when threshold values are exceeded, logs the executed changes and allows rollbacks. The system may execute actions in response to threshold values being exceeded, and machine learning (Artificial Intelligence) actions can be carried out for configuration, administration and/or orchestration types of activities. Such actions may be, for example, one of the following:

    [0102] a. Install and add a NE to a cluster;

    [0103] b. Drop a NE from a cluster;

    [0104] c. Shut down interfaces (or the whole NCP) for reducing power consumption;

    [0105] d. Apply a configuration patch to a NE or to a Network Switch;

    [0106] e. Apply a configuration patch to one or more controllers;

    [0107] f. Add and remove communication tunnels;

    [0108] g. Change communication tunnel BW;

    [0109] h. Reroute to a Distributed Denial of Service (DDoS) scrubber;

    [0110] i. Dynamically apply rules for an ACL;

    [0111] j. Establish ACLs dynamically;

    [0112] k. Determine routes and routing policies on a dynamic basis;

    [0113] l. Apply and configure VRFs; and

    [0114] m. Apply quality of service to communications being conveyed via network interfaces.
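
A threshold-triggered configuration patch with logging and rollback might look like the sketch below. It is illustrative only: the configuration is modelled as a flat key/value dictionary, the patch fires when any one of its thresholds is exceeded (the text does not say whether one or all must be), and the names `ConfigPatch`, `Node` and `PatchManager` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConfigPatch:
    name: str
    lines: dict[str, str]       # configuration key -> new value
    triggers: dict[str, float]  # KPI name -> threshold that triggers the patch


@dataclass
class Node:
    config: dict[str, str]


@dataclass
class PatchManager:
    # Each entry: (node, patch name, previous values of the keys the patch touched)
    change_log: list[tuple[Node, str, dict[str, Optional[str]]]] = field(default_factory=list)

    def maybe_apply(self, node: Node, patch: ConfigPatch, kpis: dict[str, float]) -> bool:
        """Apply the patch if any trigger threshold is exceeded, and log the change."""
        exceeded = any(
            kpi in kpis and kpis[kpi] > limit for kpi, limit in patch.triggers.items()
        )
        if not exceeded:
            return False
        previous = {key: node.config.get(key) for key in patch.lines}
        node.config.update(patch.lines)
        self.change_log.append((node, patch.name, previous))
        return True

    def rollback(self) -> str:
        """Undo the most recently logged change and return the patch name."""
        node, name, previous = self.change_log.pop()
        for key, old in previous.items():
            if old is None:
                node.config.pop(key, None)  # key did not exist before the patch
            else:
                node.config[key] = old
        return name
```

Recording the previous values at apply time is what makes the rollback possible without re-reading the node's history.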

    [0115] During operation, periodic calculations may be carried out using recently retrieved KPIs in order to identify trends in the network cloud operation. A machine learning algorithm may be used to generate hourly and/or daily and/or weekly trends, which may then be displayed visually to the network operator.

    [0116] Furthermore, based on the calculated trends, predictions can be made and then translated into relevant threshold values. The threshold values may be saved in a thresholds database, which in this example is comprised within the cloud orchestrator, so that an event manager (part of the functionality carried out by the cloud orchestrator server) may trigger events when these relevant threshold values are exceeded.
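
As a rough illustration of turning trends into thresholds, the sketch below substitutes a plain per-hour average for the machine learning algorithm mentioned above, and derives each threshold by multiplying the expected value by a safety margin. Both the averaging and the `margin` parameter are assumptions made for the example; the text does not specify how predictions are translated into thresholds.

```python
from collections import defaultdict
from statistics import mean


def hourly_trend(samples: list[tuple[int, float]]) -> dict[int, float]:
    """Average KPI value per hour of day, from (hour, value) samples."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for hour, value in samples:
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


def thresholds_from_trend(trend: dict[int, float], margin: float = 1.2) -> dict[int, float]:
    """Hypothetical translation: threshold = expected value times a margin."""
    return {hour: expected * margin for hour, expected in trend.items()}


def should_trigger(hour: int, value: float, thresholds: dict[int, float]) -> bool:
    """Event-manager check: has the KPI exceeded the threshold for this hour?"""
    return hour in thresholds and value > thresholds[hour]
```

The same shape would apply to daily or weekly trends by changing the bucketing key.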

    [0117] In addition, a list of required or recommended actions may be generated based on the analysis of the collected information and the calculated predictions, and the managing entity of the cloud orchestrator (an administrator) may be used to determine whether a certain action should be executed, whether to avoid performing a certain action, or whether to automate a certain action, so that when applicable, that action will be carried out automatically.

    [0118] Moreover, monitoring of failures may be carried out according to the following embodiment:

    [0119] A) The node (a network element), as well as the generated KPIs associated with that node, are monitored at a pre-defined steady rate. However, once the system detects that a malfunction is about to occur, a detection window is opened for a certain (e.g., pre-defined) time period, during which relevant KPIs are sampled at a rate higher than the steady rate applied before that window was opened, thereby enabling detection of the cause of the possible malfunction.

    [0120] B) When monitoring the node (network element), as well as the generated KPIs associated with that node, at a steady state, the monitoring system may apply a new wave of streaming telemetry protocols (for example gRPC/gNMI) having a policy-based KPI collection mechanism, in which a priority is assigned to each KPI. This allows values of more significant KPIs to be collected at a higher rate, and such a KPI will receive better QoS treatment in order to ensure that updates of the significant KPI values are properly received.

    [0121] C) The monitoring system may also use the collected KPI values to build a data set for analyzing normal and abnormal KPI behaviors (in terms of statistical behavior, such as distribution, mean, standard deviation, bias, and the like).
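
The detection window of item A) and the statistical baseline of item C) can be sketched together as follows. The interval lengths, the window length and the "more than k standard deviations from the mean" test are all assumptions chosen for illustration; the text does not state how an imminent malfunction is recognised.

```python
from dataclasses import dataclass
from statistics import mean, stdev


def looks_abnormal(value: float, baseline: list[float], k: float = 3.0) -> bool:
    """Flag a value lying more than k standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return sigma > 0 and abs(value - mu) > k * sigma


@dataclass
class Sampler:
    steady_interval: float = 60.0   # seconds between samples in steady state
    fast_interval: float = 5.0      # seconds between samples inside the window
    window_length: float = 300.0    # how long the detection window stays open
    window_ends_at: float = float("-inf")

    def open_window(self, now: float) -> None:
        """Open the detection window starting at time `now` (in seconds)."""
        self.window_ends_at = now + self.window_length

    def interval(self, now: float) -> float:
        """Sampling interval to use at time `now`."""
        return self.fast_interval if now < self.window_ends_at else self.steady_interval
```

A caller would typically run `looks_abnormal` on each steady-rate sample and call `open_window` when it returns `True`; once the window expires, `interval` falls back to the steady rate on its own.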

    [0122] Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.