Communication Resilience in a Network
20220311694 · 2022-09-29
Inventors
- Richard Lee Goodson (Huntsville, AL, US)
- Darrin L. Gieger (Huntsville, AL, US)
- Andrew T. Ruble (Athens, AL, US)
- Brent Priddy (Huntsville, AL, US)
Cpc classification
H04L12/4641
ELECTRICITY
International classification
H04L45/00
ELECTRICITY
Abstract
Methods and systems for resilient network communication are provided. In one aspect, a network includes multiple edge network elements, core network elements, and off-network network elements. Each network element has multiple ports. Communication paths exist between edge network elements, traversing core network elements. A maintenance domain maintains communication resiliency in the network through maintenance domain entities that detect network communication faults. Maintenance domain entities are associated with ports of edge network elements. VLAN service provision to subscribers occurs over the network by mapping services to VLAN tags such that the service VLAN includes information about the resilient network. VLAN service assignment to maintenance domains is balanced.
Claims
1. A method of communication resilience in a network, comprising: establishing a working communication path between a first network element and a second network element, wherein the working communication path communicatively couples to a MEP of a first network element and a MEP of the second network element, wherein the working communication path traverses a spine network element, and wherein the working communication path is an active path; establishing a protection communication path between the first network element and the second network element, wherein the protection communication path communicatively couples to a second MEP of the first network element and a second MEP of the second network element, wherein the protection communication path traverses a second spine network element, and wherein the protection communication path is a standby path; detecting a network fault on the working communication path based on non-responsiveness of the MEP of the first network element or the MEP of the second network element; and responding to the network fault on the working communication path by promoting, at the first network element, the protection communication path, wherein the protection communication path becomes the active path and forwards received downstream network traffic.
2. The method of claim 1 wherein the working communication path and the protection communication path comprise a protection group.
3. The method of claim 1 wherein the working communication path has a MEG associated with the MEP of the first network element and the MEP of the second network element and the protection communication path has a second MEG associated with the second MEP of the first network element and the second MEP of the second network element.
4. The method of claim 1 wherein a MEG is associated with an OAM VLAN.
5. The method of claim 2 wherein the protection group is associated with a multicast service VLAN.
6. The method of claim 2 wherein the protection group is associated with a unicast service VLAN.
7. The method of claim 1 wherein the second network element operates as a proxy for the multicast service VLAN and a core network element operates as a snoop function for the multicast service VLAN.
8. A method for communication resilience comprising: assigning a unicast service to a protection group, wherein the protection group comprises an active communication path and a standby communication path between a service provider network element and a subscriber network element; associating with a unicast service VLAN packet a tag, wherein the tag identifies the unicast service VLAN packet as a member of the unicast service VLAN; encoding the tag with information for switching traffic from a service provider to a subscriber, wherein the tag uniquely identifies the subscriber, the service provider, the subscriber network element, the service provider network element, a port of the subscriber network element, and a maintenance domain; detecting a network failure along a path between the service provider and the subscriber; in response to detecting the network failure, switching to a protection path of the maintenance domain using the information encoded in the tag.
9. The method of claim 8, wherein the tag comprises an outer tag and an inner tag.
10. The method of claim 8, wherein encoding the tag further comprises: encoding the outer tag with information for switching traffic across a core network element between a subscriber network element and a service provider network element; and encoding the inner tag with information for switching traffic between the subscriber network element and the subscriber.
11. The method of claim 8, wherein assigning a unicast service to a protection group includes assigning a class and a weight to the unicast service.
12. The method of claim 8, wherein assigning a unicast service to a protection group further comprises balancing weight and class of unicast services at ports of the service provider network element.
13. The method of claim 12, wherein balancing comprises a round robin algorithm.
14. The method of claim 12, wherein balancing comprises a minimum PIR algorithm.
15. The method of claim 12, wherein balancing comprises a minimum CIR algorithm.
16. The method of claim 12, wherein assigning a unicast service to a protection group further comprises balancing the weight and class of services at the subscriber network elements ports.
17. A network comprising: a west network element; a spine network element; an east network element; a working communication path established between the west network element and the east network element, wherein the working communication path communicatively couples to a first MEP of the west network element and a first MEP of the east network element and wherein the working communication path is an active path; a protection communication path established between the west network element and the east network element, wherein the protection communication path communicatively couples to a second MEP of the west network element and a second MEP of the east network element and wherein the protection communication path is a standby path; wherein the west network element is configured to forward upstream network traffic to the active path and the standby path; wherein the west network element is configured to forward downstream network traffic received on the active path and drop downstream network traffic received on the standby path; wherein non-responsiveness of the first MEP of the west network element or the first MEP of the east network element indicates a network fault on the active path; and wherein, in response to the network fault on the active path, the east network element is configured to switch the protection communication path to the active path and forward received downstream network traffic.
18. The network of claim 17 wherein the first and second MEPs of the east and west network elements are configured to generate continuity check messages, wherein the continuity check messages include status information about a local port and a physical interface.
19. The network of claim 18 wherein: the working communication path has a MEG associated with the first MEP of the west network element and the first MEP of the east network element; and the protection communication path has a second MEG associated with the second MEP of the west network element and the second MEP of the east network element.
20. The network of claim 17 wherein the working communication path and the protection communication path comprise a protection group.
21. The network of claim 17 wherein a MEG is associated with an OAM VLAN.
22. The network of claim 20 wherein the protection group is associated with a multicast service VLAN.
23. The network of claim 20 wherein the protection group is associated with a unicast service VLAN.
24. The network of claim 17 wherein the east network element comprises a proxy for a multicast service VLAN.
25. The network of claim 17 wherein the spine element comprises a snoop function for a multicast service VLAN.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0018]
[0019]
[0020]
[0021]
[0022]
[0023] Like reference numbers and designations in the various drawings indicate like elements.
DETAILED DESCRIPTION
[0024] Methods and systems for communication resilience are discussed throughout this document. As will be discussed in more detail with reference to the figures, multiple communication paths are established to create redundant links. Services are assigned to the redundant links in an optimized manner. The multiple communication paths are monitored for network faults which result in state changes within the redundant links whereby protection paths become active in order to maintain communications in the network.
[0025] For example, multiple pairs of communication paths are established between network elements and each service is assigned to a pair of communication paths (e.g., a transport entity (TE)). The pair of paths includes a working path and a protection path. The pair of paths has an associated state such that one of the paths is an active path and the other is a standby path. Typically in a non-fault state for unicast services, the working path is the active path and the protection path is the standby path. Typically for multicast services, when a fault occurs on the active path, the group state changes such that the standby path becomes the active path and vice versa. For multicast, this state continues until a fault is detected on the active path.
[0026] Network elements may include routers, switches, OLTs (optical line terminals), spines, leafs, gateways, and the like. An OLT typically connects a passive optical network to aggregated uplinks and transmits shared downstream data over the passive optical network to users.
[0027] The disclosure herein may be used in diverse network topologies as will be appreciated by one of skill in the art. One such topology is a spine-leaf network.
[0028] In a spine-leaf network, every lower-tier switch (leaf) is connected to each of the top-tier switches (spine) in a full-mesh topology. The leaf layer consists of access switches that connect to subscribers and communications providers. The spine layer is the backbone of the network and is responsible for interconnecting all leaf switches. Every leaf switch connects to every spine switch in the fabric. The leaf switches may be a heterogeneous collection of network elements.
[0029] With respect to OAM (operations, administration and maintenance) network configurations, among many possible configurations, the availability of communication paths can be monitored using Maintenance Entity Groups (MEGs) and Maintenance End Points (MEPs). A MEG is a logical domain within an Ethernet network. The MEG consists of network entities that belong to the same service inside a common OAM domain. A MEG may be associated with a specific VLAN, with several MEGs able to use the same VLAN value. For multicast services, VLANs enable more efficient distribution of IPTV multicast streams. A MEP defines an edge of an Ethernet OAM domain. Network elements, such as West NEs and East NEs, have a MEP associated with each interface. A MEG is associated with each spine. This association of MEPs and MEGs minimizes the number of MEPs and reduces continuity check message (CCM) processing load.
[0030] Associated with each unicast service VLAN packet there is a tag or tags which identifies the packet as being a member of the service VLAN. The format for these tags is defined in IEEE 802.1ad and later incorporated into 802.1Q. Typically there is both an outer tag (sometimes called the S-tag) and an inner tag (sometimes called the C-tag).
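As a non-limiting illustration of the double-tag layout, the following Python sketch packs an outer S-tag and an inner C-tag into their 4-byte wire forms. The TPID values (0x88A8 for the S-tag, 0x8100 for the C-tag) and the 3-bit PCP, 1-bit DEI, 12-bit VID split follow IEEE 802.1Q. The function names and the sample VID values are hypothetical and are not taken from this disclosure.

```python
import struct

S_TAG_TPID = 0x88A8  # outer (service) tag, IEEE 802.1ad
C_TAG_TPID = 0x8100  # inner (customer) tag, IEEE 802.1Q


def pack_tag(tpid: int, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Pack one VLAN tag: 16-bit TPID followed by 16-bit TCI (PCP | DEI | VID)."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP is 3 bits and DEI is 1 bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", tpid, tci)  # network byte order


def unpack_tag(raw: bytes):
    """Return (tpid, pcp, dei, vid) from a 4-byte tag."""
    tpid, tci = struct.unpack("!HH", raw)
    return tpid, tci >> 13, (tci >> 12) & 1, tci & 0xFFF


def double_tag(s_vid: int, c_vid: int) -> bytes:
    """Outer S-tag followed by inner C-tag, as they appear in the frame header."""
    return pack_tag(S_TAG_TPID, s_vid) + pack_tag(C_TAG_TPID, c_vid)
```

For instance, double_tag(100, 200) yields the eight bytes 88 A8 00 64 81 00 00 C8.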
[0031] Services provisioned for the network are assigned to a pair of communication paths. Services to be provisioned are classified with a service type. Service type may include whether the service is a multicast service or unicast service. With respect to multicast services, each multicast service type is provisioned with a weight and class. The weight of a multicast service may include processing requirements, quality of service requirements, bandwidth requirements, and the like. The class of a multicast service may include standard definition video, high definition video, video conferencing, standard definition and high definition streaming audio, and the like. As one of skill in the art can appreciate, classes of multicast services may be differentiated by quality of service requirements or other factors. With respect to unicast services, each unicast service type is provisioned.
[0032] Service assignment is optimized in order to balance load on network elements, their ports, or communication paths, as one example. When a new multicast service is added to the system, the service may be assigned to a path pair such that the sum of the weights of all multicast services of the same class is balanced between the available pairs of paths. For example, setting the class and weight to one for all service types results in a round robin assignment of multicast services to the available pairs of paths.
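The balancing rule described above can be sketched as follows, purely by way of example. The class name PathPairBalancer, the pair identifiers, and the tie-breaking rule (the first pair in configured order wins a tie) are assumptions made for illustration. The sketch assigns each new multicast service to the pair of paths with the smallest summed weight for that service class.

```python
from collections import defaultdict


class PathPairBalancer:
    """Tracks, per pair of paths, the summed weight of services in each class."""

    def __init__(self, pair_ids):
        self.pair_ids = list(pair_ids)
        # load[pair_id][service_class] -> summed weight of assigned services
        self.load = {p: defaultdict(int) for p in self.pair_ids}

    def assign(self, service_class, weight):
        # Pick the pair with the minimum sum of weights for this class.
        # min() returns the first minimum, so ties go to the earlier pair.
        best = min(self.pair_ids, key=lambda p: self.load[p][service_class])
        self.load[best][service_class] += weight
        return best
```

With class and weight both set to one for every service, successive calls alternate between the pairs, which matches the round robin behavior noted above.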
[0033] For network elements such as an East NE, since upstream IGMP is forwarded to both the active and standby paths, both paths will have the same set of multicast services and the East NE port weight and class will be balanced. Therefore, in this situation, balancing of multicast class and weight needs to be done only for the ports of East network elements.
[0034] Multicast services may operate according to established protocols. One protocol used for multicast management is IGMP (Internet Group Management Protocol). IGMP is used by hosts and adjacent routers on IP networks to establish multicast group memberships. IGMP allows the network to direct multicast transmissions only to hosts that have requested them. IGMP can be used for one-to-many networking applications such as online streaming video and gaming, and allows more efficient use of resources when supporting these types of applications.
[0035]
[0036] With respect to
[0037] An ELPS protection group is established 330 to protect that communication link and communications proceed on the link protected by the ELPS protection group 340. Network traffic received at the network element is processed 370, including determining whether the network traffic received at the network element is upstream or downstream 380. If the network traffic is upstream 385, then the network element forwards that network traffic to the active TE and standby TE 388. If the network traffic is downstream 387, then the network element forwards the network traffic on the active TE and drops the network traffic on the standby TE 389. As network traffic is being received and processed 370, the network element also monitors CCM traffic 362. Using the CCM traffic, the network element can detect a network fault 363. In some implementations, the network fault is detected based on non-responsiveness of the MEP of the first network element or the MEP of the second network element. The network fault can be detected, for example, using continuity check messages generated by the MEP of either the first network element or the second network element. For example, if three continuity check messages in a row are not received, that can indicate that there is a network fault in the communication path. As another example, a continuity check message can be generated to include status information about a local port and/or physical interface, and this continuity check message can be examined to determine the status of a network element. In other implementations, the network fault can be detected based on a physical fault in a connection to the network element.
[0038] If a network fault is detected 364, the standby TE is promoted to active and the active TE is made the standby TE 368, so that the formerly standby communication path becomes the active path and carries or forwards received downstream network traffic. While no network fault is detected 365, communication proceeds with the active TE and the standby TE 340.
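The forwarding and fault-detection flow described above can be sketched, as one hedged example, with a small state holder. The class and method names are hypothetical. The threshold of three consecutive missed CCMs follows the example given above, and the per-interval polling model is an assumption made to keep the sketch self-contained.

```python
from dataclasses import dataclass

CCM_LOSS_THRESHOLD = 3  # consecutive missed CCMs treated as a fault (per the example above)


@dataclass
class ProtectionGroup:
    active: str = "working"
    standby: str = "protection"
    missed_ccms: int = 0

    def forward_upstream(self, frame):
        # Upstream traffic is sent on both the active TE and the standby TE.
        return [(self.active, frame), (self.standby, frame)]

    def accept_downstream(self, te, frame):
        # Downstream traffic is forwarded only if it arrived on the active TE;
        # traffic arriving on the standby TE is dropped (None).
        return frame if te == self.active else None

    def ccm_interval_elapsed(self, ccm_received: bool) -> bool:
        """Call once per CCM interval for the active TE. Returns True on a switch."""
        if ccm_received:
            self.missed_ccms = 0
            return False
        self.missed_ccms += 1
        if self.missed_ccms >= CCM_LOSS_THRESHOLD:
            # Promote the standby TE to active and demote the former active TE.
            self.active, self.standby = self.standby, self.active
            self.missed_ccms = 0
            return True
        return False
```

After three missed intervals the roles swap, so downstream frames arriving on the protection TE are accepted and frames arriving on the working TE are dropped.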
[0039] As one example, where each pair of paths, the working path and protection path, is part of a multicast tree in a network with two spine elements, upstream and downstream traffic is handled by a network element. For instance, upstream traffic received at an East NE will be forwarded from an East NE proxy function to both spines over both paths in the pair of paths. Downstream traffic received by the East NE proxy function will be forwarded from the active path and downstream traffic received on the standby path will be dropped. In this example, the West NE will operate proxy functions for each VLAN and the spine element will operate as a snoop function for each VLAN.
[0040] As a further example, pairs of paths over which upstream and downstream traffic flows can be maintained using 1+1 ELPS as described in ITU-T G.8031. The effect of this is that during normal operation upstream IGMP and multicast traffic will be duplicated on the working and protect paths. Also, the multicast tables of the spine elements and the West NEs will be synchronized. Synchronization may occur through IGMP proxy and snoop functions. An IGMP snoop function at a spine element listens to IGMP upstream packets and, based on changes in services, it may update IGMP state information at the spine element. This may be referred to as transparent snooping because there is no modification of the upstream packets. The West NE may provide a proxy function whereby if it is already serving certain network traffic to a network node it will not request content from an upstream server when it receives an additional request for that same content from another network node. Instead, the proxy function at the West NE will update its IGMP state table and serve that network traffic stream to the additional node. This may require the West NE proxy to modify downstream traffic.
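As a minimal sketch of the proxy behavior described above, the following example tracks group membership and issues an upstream request only for the first node that asks for a given stream. The class name, the string identifiers, and the list used to record upstream requests are hypothetical. Leave handling and query timers, which a real IGMP proxy would need, are intentionally omitted.

```python
class IgmpProxySketch:
    """Toy membership table: one upstream request per multicast group."""

    def __init__(self):
        self.members = {}            # group address -> set of downstream node ids
        self.upstream_requests = []  # groups for which an upstream join would be sent

    def join(self, group, node):
        """Record a join. Returns True only if an upstream request is needed."""
        nodes = self.members.setdefault(group, set())
        first_member = not nodes
        nodes.add(node)
        if first_member:
            # The stream is not yet being served, so request it upstream.
            self.upstream_requests.append(group)
        # Otherwise the existing stream is simply served to the additional node.
        return first_member
```

A second join for the same group updates the state table but does not generate another upstream request.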
[0041] In a spine and leaf network, the pairs of communication paths may traverse network spines. As one of skill in the art can appreciate, the disclosures herein can be extended to networks including more than two spines. Networks with more than two spines increase the number of pairs of paths between network elements. The spine and leaf topology may be dense, where a path exists from each leaf to each spine, but at a minimum each leaf must connect to two spine elements. To be scalable, the service assignment algorithm must balance class and weight between the multiple pairs of paths when making service assignments.
[0042]
[0043] With respect to
[0044] In some implementations, the assignment of the first partial communication path is determined based on the weight assigned to the service, the service class for the service, and/or one or more existing services carried by candidate partial communication paths between the first network element and the intermediate network element. Candidate partial communication paths are partial communications paths between two network elements that are available to have the new service assigned.
[0045] In some implementations, the assignment of the second partial communication path is determined based on the weight assigned to the service, the service class for the service, and one or more existing services carried by candidate partial working communication paths between the intermediate network element and the second network element.
[0046] In some situations, the assignment of the communication path can include balancing services provided over the candidate partial communication paths, as discussed throughout this document. Once the partial paths of the communication path are assigned, the service is provisioned over the communication path 470.
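One possible way to express the per-segment assignment described above is sketched below. It assumes, for illustration only, that the load on each candidate partial path is tracked as a summed weight per service class and that the least-loaded candidate is chosen independently on each side of the intermediate network element. The function names and path identifiers are hypothetical.

```python
def pick_partial_path(candidates, load, service_class):
    """Return the candidate partial path with the least summed weight for the class.

    load maps (path_id, service_class) -> summed weight of existing services.
    """
    return min(candidates, key=lambda p: load.get((p, service_class), 0))


def assign_path(first_segment, second_segment, load, service_class, weight):
    """Choose one partial path per segment, then record the new service's weight."""
    first = pick_partial_path(first_segment, load, service_class)
    second = pick_partial_path(second_segment, load, service_class)
    for path_id in (first, second):
        key = (path_id, service_class)
        load[key] = load.get(key, 0) + weight
    return first, second
```

Here both candidate lists are assumed to terminate on the same intermediate network element, so any combination of the two choices forms a complete communication path.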
[0047] Between a West NE and an East NE across a spine, a given service on a VLAN traverses one of two TEs: a working TE or a protection TE. A given TE has two states: active or standby. These two TEs and their associated services, running on VLANs, form an ELPS (Ethernet linear protection switching) group. In normal operation, the unicast service will traverse the working TE. However, in a fault state, the unicast service will traverse the protection TE. It may revert to the working TE when the failure has been corrected. This is known as 1:1 bidirectional revertive ELPS (G.8031). A down MEP is defined on the interface associated with each end of each TE. CCMs (continuity check messages) are used to determine connectivity and trigger the protection switch and reversion. APS coordinates the switching at the two ends and traverses the protection TE. While the OAM and APS traverse an OAM VLAN, the service VLANs are independent of the OAM VLAN.
[0048] With two spines, there are four possible transport entities (TEs) between any West NE and any East NE. At a given point in time, any of these TEs can be both the working TE for some services and the protection TE for other services, so that traffic will normally flow on all of these TEs. Every working TE is paired with a protection TE such that for every West NE/East NE combination there are four possible unicast ELPS groups. These groups and TEs must be established before any services can be assigned. The continuity of the transport entities is monitored using CCM from MEPs placed on the West NE and East NE physical interfaces. Each West NE has a MEP on each physical interface and each East NE has two MEPs on each physical interface. Each ELPS group is assigned a single S-VID and multiple C-VIDs.
[0049]
[0050] When a new unicast service is added to the system, the service is assigned to one of the ELPS groups which connects the West NE to the subscriber's East NE. This assignment is done by optimizing the balance of weight and class of services at the West NE ports while considering the weight and class of services at the East NE ports. Note that for a given TE, the link between West NE and spine may have a different mix of services compared to the link between spine and East NE. Each East NE link has traffic to and from all West NEs and each West NE link has traffic to and from all East NEs. Consequently, the balancing calculations must be done independently between West NE and spine versus spine and East NE.
[0051] Each unicast service type will be assigned a class and a weight. When adding a unicast service of given class and weight, two criteria are used jointly: East NE port balance and West NE port balance. East NE port balance may be computed, for each East NE port, through the sum of all services of the same class. Assuming two sums, S1 and S2, corresponding to the East NE ports 1 and 2, if abs(S1−S2) is greater than some threshold (X), then eliminate from consideration the two ELPS groups with working TEs associated with the East NE port with the larger sum. One threshold may be X=5% of the maximum number of subscribers on the East NE. West NE port balance may be computed, for each West NE port, through the sum of weights of all services of the same class. Then, considering the set of those ELPS groups that meet the East NE port balancing criteria, select the ELPS group with the working TE associated with the West NE port with the minimum sum of weights of the same class. The balancing algorithm may include multicast CIR in the East NE and West NE calculations. In another implementation, for each group, compute the sum of weights of the same class at that West NE plus the sum of weights of the same class at the East NE, and select the group with the minimum sum. A threshold may be used to eliminate groups whose sum plus the weight of the new service exceeds the threshold.
[0052] As an example for multicast services, a given West NE has four TEs to each East NE, paired into two multicast ELPS groups. For each multicast VLAN, the West NE and the spine act as normal IGMP proxy and IGMP snoop, respectively. The West NE and the spine have no requirement for additional multicast ELPS functionality. Each East NE will act as a 1+1 ELPS bridge with per-VLAN IGMP proxy. Upstream traffic will be broadcast from proxy function to both spines. Downstream traffic will be received by proxy function from the active TE. This results in a configuration where, during normal operation, IGMP and multicast traffic will be duplicated on the working and protection TEs, and the multicast tables in the spines and the CP ports will be synchronized.
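The two-stage selection described above (East NE port balance as a filter, then West NE port balance as the selector) can be sketched as follows. This is an illustrative example only. The ElpsGroup record, the port names, and the particular mapping of groups to ports are assumptions, and the per-port sums for the class of the new service are taken as inputs rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElpsGroup:
    name: str
    west_port: str  # West NE port used by the group's working TE
    east_port: str  # East NE port used by the group's working TE


def select_group(groups, west_load, east_load, threshold):
    """Pick an ELPS group for a new unicast service.

    west_load / east_load map port -> sum for the new service's class.
    """
    candidates = list(groups)
    east_ports = sorted({g.east_port for g in groups})
    if len(east_ports) == 2:
        p1, p2 = east_ports
        s1, s2 = east_load.get(p1, 0), east_load.get(p2, 0)
        if abs(s1 - s2) > threshold:
            # Drop the groups whose working TE uses the more heavily loaded East NE port.
            heavy = p1 if s1 > s2 else p2
            candidates = [g for g in groups if g.east_port != heavy]
    # Among the remaining groups, choose the least-loaded West NE port.
    return min(candidates, key=lambda g: west_load.get(g.west_port, 0))
```

When the East NE ports are within the threshold of each other, all four groups remain candidates and only the West NE port sums decide the outcome.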
[0053]
[0054] In one embodiment, there is an OAM VLAN and a service VLAN. An OAM VLAN is associated with the working communication path and another OAM VLAN is associated with the protection communication path. In this scenario, there is a 1:1 correspondence between an OAM VLAN and a MEG. The OAM VLAN provides for communication between the MEPs and the ELPS protection groups. The system protects the service VLAN. A network failure is detected by non-responsiveness of a MEP, which indicates that the communication between the MEPs over the OAM VLAN is interrupted. When a failure is detected, the service VLAN switches to the standby path (e.g., the protection path). The upstream service VLAN will continue to forward traffic to both the working and protection paths, but, at the East NE, the downstream service VLAN traffic that is forwarded will be the traffic received on the formerly standby path. The physical path (e.g., a series of physical links) has VLANs traversing it. The ELPS protection groups are configured to associate a specific OAM VLAN with the working path and another specific OAM VLAN with the protection path. The state of the ELPS protection group (e.g., designating which path is active and which is standby) determines whether to forward downstream service VLAN traffic received on the working path or the protection path.
[0055] As an example for multicast services, when a network fault is detected on the active TE, the East NE switches downstream receive to the standby TE and sets the standby TE to the active TE. The East NE then continues to forward upstream to both spines. After the failure is resolved, the East NE will not revert unless a failure occurs on the active TE (e.g., the standby TE to which downstream receive was switched). After the failure is resolved, the associated spine and West NEs will resynchronize their multicast tables through general membership queries. Additional protocols and processing are not required but may be provided. For multicast, because the East NE acts autonomously, APS is not needed for TE switching and East NEs unaffected by the network fault will not switch. This minimizes service disruption for unaffected OLTs and services.
[0056] As one example of assigning multicast services, in a network where each TE logically connects the West NE to every East NE via multicast replication, two trees are formed per West NE with the West NE as the root. When a new multicast VLAN is added to the system, the VLAN is assigned to one of the two ELPS groups which connects the West NE to the East NEs. The service is assigned to the ELPS group with the minimum sum of the weights of all multicast services of the same class. For example, setting the class and weight to one for all service types results in a round robin assignment of multicast services to pairs of paths, alternating between the two ELPS groups. Service assignment may be limited to groups where the CIR can be met following failover; however, this is not required and the network configuration may be such that service assignment is not so limited. For instance, CIR may be oversubscribed by communication providers.
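To contrast the non-revertive multicast behavior described here with the revertive unicast behavior described earlier, the following sketch models both with a single flag. The class name and method names are hypothetical, and timers such as a wait-to-restore delay are deliberately left out. It is a simplified illustration rather than a statement of how any embodiment is implemented.

```python
class TeSelector:
    """Chooses the active TE of a working/protection pair."""

    def __init__(self, revertive: bool):
        self.revertive = revertive
        self.active = "working"
        self.failed = set()

    def _other(self):
        return "protection" if self.active == "working" else "working"

    def _reselect(self):
        other = self._other()
        if self.active in self.failed and other not in self.failed:
            # Fault on the active TE: switch to the healthy standby TE.
            self.active = other
        elif self.revertive and "working" not in self.failed:
            # Revertive mode only: return to the working TE once it is healthy.
            self.active = "working"

    def fault(self, te):
        self.failed.add(te)
        self._reselect()
        return self.active

    def clear(self, te):
        self.failed.discard(te)
        self._reselect()
        return self.active
```

With revertive set to False, clearing the original fault leaves the protection TE active, and a later fault on the protection TE is what moves traffic back to the working TE.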
[0057] As one of skill in the art will appreciate, there are many possible algorithms to optimize service assignment to ELPS groups. For instance, for a given West NE/East NE pair where there are four ELPS groups, four possible algorithms are described. A random algorithm will randomly pick 1 of the 4 ELPS groups seeking uniform distribution of the number of services. A round robin algorithm will select the next ELPS group in a circular sequence [1, 2, 3, 4]. A minimum PIR algorithm (MinPIR) selects the ELPS group that has a minimum sum of PIR for the West NE to spine link associated with the working TE of that ELPS group. A minimum CIR algorithm (MinCIR) selects the ELPS group that has a minimum sum of CIR for the West NE to spine link associated with the working TE of that ELPS group. The West NE to spine link (e.g., 100G) of a TE generally has higher utilization than the spine to East NE link (e.g., 100G) because of the ratio of East NE links to West NE links. This may be because the same amount of unicast traffic flows from West NE to spine compared to spine to East NE.
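The four selection strategies named above can be sketched in a few lines each. In this example the ELPS groups are represented simply by the integers 1 through 4, and the summed PIR or CIR on the West NE to spine link of each group's working TE is supplied as a dictionary. Both representations, and the function names, are assumptions made for illustration.

```python
import itertools
import random


def make_random_selector(groups, rng=None):
    """Random: pick any group, aiming for a uniform spread over many services."""
    rng = rng or random.Random()
    return lambda: rng.choice(groups)


def make_round_robin_selector(groups):
    """Round robin: step through the groups in a circular sequence."""
    cycle = itertools.cycle(groups)
    return lambda: next(cycle)


def select_min_rate(groups, link_rate_sum):
    """MinPIR or MinCIR: pick the group whose working-TE West NE to spine link
    currently carries the smallest summed rate (PIR or CIR, chosen by the caller)."""
    return min(groups, key=lambda g: link_rate_sum[g])
```

The same select_min_rate function serves for both MinPIR and MinCIR, since the two differ only in which rate is summed per link.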
[0058] While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
[0059] Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products or in a single hardware element or multiple hardware elements, or some combination thereof.
[0060] Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.