Rapid Network Redundancy Failover

20230188874 · 2023-06-15

    Abstract

    Methods and systems for high-speed failover in a network are provided. To provide faster Type C GPON redundancy failover, the disclosure herein describes the use of G.8031 1:1 ELPS in a single ended application to ensure path integrity through the network. Single ended 1:1 ELPS means that a network device is configured with 1:1 ELPS and switches paths in the event of disruption of the working communication path without the other underlying transport entities having knowledge of either the ELPS protocol or state machine. ELPS (Ethernet Linear Protection Switching, ITU-T G.8031) is a standardized method for protection switching between two point-to-point paths through a network; however, its application here is novel. During a failure on the working path, traffic will switch over to the protection path. Type C PON protection provides a fully redundant path between the OLT and the ONU (2 separate PONs).

    Claims

    1. A method of communication resilience in a network, comprising: establishing a working communication path between an aggregation switch and a CPE, wherein the working communication path communicatively traverses an OLT and wherein a MEP of the aggregation switch is communicatively coupled with a MEP of the CPE; establishing a protection communication path between the aggregation switch or a second aggregation switch and the CPE, wherein the protection communication path traverses a second OLT and wherein a second MEP of the aggregation switch or the second aggregation switch is communicatively coupled with a second MEP of the CPE; wherein the CPEs transmit and receive data on the working communication path and monitor the protection communication path in a non-fault state; detecting a network fault on the working communication path based on non-responsiveness of the MEP of the aggregation switch or the MEP of the CPE; and responding to the network fault on the working communication path by promoting, at the aggregation switch, the protection communication path to an active state.

    2. The method of claim 1 wherein detecting a network fault on the working communication path based on non-responsiveness of the MEP of the aggregation switch or the MEP of the CPE further comprises: monitoring the working communication path using continuity check messages generated by the MEP of the aggregation switch.

    3. The method of claim 2 wherein the continuity check messages include status information about a local port and a physical interface.

    4. The method of claim 1 wherein the MEP of the CPE sends an RDI notification to the aggregation switch based on a determination that the CPE has detected a communication fault in the working communication path.

    6. The method of claim 1 wherein a communication path exists between each physical interface of the aggregation switch and the OLT element.

    7. The method of claim 1 wherein promoting, at the aggregation switch, the protection communication path comprises: switching upstream traffic from the CPE to the aggregation switch from the working communication path to the protection communication path; learning a MAC address of a port coupled to the protection path at the CPE; sending downstream traffic from the aggregation switch to the port at the CPE.

    8. The method of claim 1 further comprising: sending, by the CPE, a gratuitous ARP containing IP address and MAC address information.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0018] FIG. 1 illustrates an example network architecture.

    [0019] FIG. 2 illustrates an example network architecture with an ONU communication fault.

    [0020] FIG. 3 illustrates an example network architecture with a PON communication fault.

    [0021] FIG. 4 illustrates an example network architecture with an OLT fault.

    [0022] FIG. 5 provides a flow chart for fault detection and failover.

    [0023] FIG. 6 provides a flow chart for fault detection and failover.

    [0024] FIG. 7 illustrates an example network architecture.

    [0025] Like reference numbers and designations in the various drawings indicate like elements.

    DETAILED DESCRIPTION

    [0026] Methods and systems for Rapid Type C GPON Redundancy Failover are discussed throughout this document. As will be discussed in more detail with reference to the figures, redundant communication paths exist between a CPE and a network. Network access is via one or more aggregation switches. These redundant communication paths can be viewed as an ELPS protection group whose connectivity is protected from network failures through the use of a unique form of Single Ended 1:1 ELPS processing. Unlike traditional ELPS processing, only one endpoint is directly involved in fault detection and there is no coordination between the two endpoints using APS messages. The network fault detection and rapid failover scheme described herein also decouples the control and data planes: the control plane is monitored using the unique Single Ended 1:1 ELPS, while the data plane uses ELAN resiliency. Thus, as is disclosed herein, the network can individually or collectively control failover, as appropriate. As one example, CCMs associated with one VLAN could detect network faults causing failovers in different VLANs.

    [0027] Per ELPS, between two network entities, traffic traverses one of two paths: a working path or a protection path. A given path has two states: active or standby. These two paths and their associated traffic and services, running on VLANs, form an ELPS (Ethernet linear protection switching) group. In normal operation, the traffic and services traverse the working path, which is active while the protection path is standby. However, in a fault state, the ELPS group fails over from the active working path such that its traffic and services now traverse the newly active protection path. The ELPS group may revert the active path to the working path when the failure has been corrected; however, this is not required.
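
    As a non-limiting illustration, the active/standby behavior of an ELPS group described above can be sketched in a few lines of Python. The names ElpsGroup, Role, on_fault, on_clear, and the revertive flag are hypothetical and are introduced only for this sketch; they are not taken from G.8031 or from any particular product.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    WORKING = "working"
    PROTECTION = "protection"


@dataclass
class ElpsGroup:
    """Two-path protection group; exactly one path is active at a time."""
    active: Role = Role.WORKING
    revertive: bool = False  # reverting is optional, so it is a setting here

    def standby(self) -> Role:
        return Role.PROTECTION if self.active is Role.WORKING else Role.WORKING

    def on_fault(self, failed: Role) -> None:
        # Fail over only when the fault hits the currently active path.
        if failed is self.active:
            self.active = self.standby()

    def on_clear(self, recovered: Role) -> None:
        # Return to the working path only if the group is configured to revert.
        if self.revertive and recovered is Role.WORKING:
            self.active = Role.WORKING


group = ElpsGroup(revertive=True)
group.on_fault(Role.WORKING)
after_fault = group.active   # protection path is now active
group.on_clear(Role.WORKING)
after_clear = group.active   # reverted to the working path
```

    With revertive left at its default of False, the group in this sketch simply stays on the protection path after the working path recovers.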

    [0028] As described in the standard, G.8031 1:1 ELPS uses selectors and bridges at upstream and downstream network elements (EAST and WEST endpoints) that are coordinated using state machines tracking the active and standby status of the working transport entity (TE) and the protection transport entity (TE). To detect faults on the working and protection TEs, CCM traffic is sent over both paths. When a fault is detected, APS packets are sent on the protection TE. For clarity, the terms working TE and working path refer to the same element, and the terms protection TE and protection path refer to the same element.

    [0029] G.8031 can be advantageously modified by replacing the selector and bridge at the WEST endpoint with an Ethernet switch. CCM messages are communicated on each of the working and protect paths to monitor path health. CCM endpoints detect network faults and determine the path fault domain. The Ethernet switch generates CCM messages on the working and the protect paths that inform the CPE of their status and integrity, and ultimately allow it to decide which path to use. The EAST endpoints then choose which of the working or protect path should be designated the active path. Among the ports assigned to the working and protect paths, the active port of the WEST Ethernet switch is determined to be that port with a MAC address known to the system (e.g., through ARP tables, IP to MAC address mappings, etc.). Unlike G.8031 as defined in the standard, no APS packets are used. In other words, this solution can be implemented independent of APS packets.

    [0030] In one implementation, the CPEs only transmit and receive on the active path while monitoring both paths using CCMs. The aggregation switches are agnostic to the ELPS group. However, the aggregation switches contain MEPs to generate CCMs to each CPE. The CPEs make the decision as to which path to use based on the CCMs received from the switch, and trigger path changes in the ELPS state machine accordingly. For instance, the absence of received CCM traffic on a MEP of the CPE indicates a network fault on that communication path. On failover, the aggregation switches relearn the traffic MAC addresses on the newly active path as the traffic starts to flow through it, such as through ARP messaging for management or through upstream data packets. Accordingly, rapid fault detection and failover can occur in many embodiments.
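
    As a non-limiting illustration, the CPE-side decision described above can be sketched as follows. The sketch assumes a loss threshold of 3.5 CCM intervals, which is the loss-of-continuity criterion commonly used with IEEE 802.1ag CFM and is not specified in this document. The 10 ms interval, the Mep class, and the choose_path function are likewise hypothetical. When both paths are faulted, the sketch selects the working path, consistent with the handling described with reference to FIG. 6.

```python
from dataclasses import dataclass

LOSS_MULTIPLIER = 3.5  # assumption: loss threshold in CCM intervals (802.1ag practice)


@dataclass
class Mep:
    """Hypothetical CPE-side MEP that records the last CCM arrival time."""
    ccm_interval_s: float
    last_ccm_s: float = 0.0

    def on_ccm(self, now_s: float) -> None:
        self.last_ccm_s = now_s

    def is_faulted(self, now_s: float) -> bool:
        # Absence of CCMs for longer than the threshold indicates a path fault.
        return (now_s - self.last_ccm_s) > LOSS_MULTIPLIER * self.ccm_interval_s


def choose_path(current: str, working: Mep, protect: Mep, now_s: float) -> str:
    """Keep the current path while it is healthy; otherwise use the other path."""
    meps = {"working": working, "protect": protect}
    other = "protect" if current == "working" else "working"
    if not meps[current].is_faulted(now_s):
        return current
    if not meps[other].is_faulted(now_s):
        return other
    return "working"  # both paths faulted: select the working path


working = Mep(ccm_interval_s=0.01)  # assumed 10 ms CCM interval
protect = Mep(ccm_interval_s=0.01)
working.on_ccm(0.0)
protect.on_ccm(0.0)
healthy_choice = choose_path("working", working, protect, now_s=0.02)

protect.on_ccm(0.10)  # CCMs keep arriving on the protect path only
failover_choice = choose_path("working", working, protect, now_s=0.10)
```

    Because the aggregation switch is not consulted in choose_path, the sketch reflects the single ended character of the scheme: the decision is made entirely from what the CPE observes.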

    [0031] Because the WEST endpoint does not use the ELPS protocol or state machine, its functionality can be split between multiple aggregation switches, providing additional redundancy.

    [0032] As shown in FIG. 7, in some implementations, the network comprises a core network 710 and access networks 730. The core network 710 connects to access networks 730 through network gateways 720. Network edge nodes 740 connect the access networks 730 to a protection switch edge node 750. A host 760 connects to the networks 710, 730 through the protection switch edge node 750.

    [0033] As shown in FIG. 1, in some implementations, the network 100 comprises aggregation switches 110, OLTs 120, optical signal splitters 130, and CPEs 140 that are communicatively connected. A data plane VLAN connects an aggregation switch 110 to an uplink 150 to CPE 140 through both OLTs 120 and both aggregation switches 110. A control plane VLAN connects an aggregation switch 110 to the CPE 140 through one OLT 120. The CPEs 140 transmit and receive on a working (e.g., active) path 160 while monitoring both the working path 160 and a protect path 170. In some implementations, the working path 160 and the protect path 170 need not be the same across CPEs 140. For instance, a CPE 140 can have the working path 160 to an OLT 120 and the protect path 170 to a different OLT 120, whereas another CPE 140 may use the paths differently. Whether a path is working or protect is relevant to one CPE, individually; the system may be configured either way. As described, the aggregation switches 110 are unaware of the ELPS protection groups (e.g., the combination of working path 160 and protection path 170). The aggregation switches 110 contain MEPs to generate CCMs that are transmitted to each CPE 140. Each CPE 140 has the control logic executed, for example, by one or more processors or other data processing apparatus, to detect network faults and determine whether to use the working path 160 or the protection path 170 as the active path and whether a failover is necessary because the working path can no longer communicate due to a network fault. In the event failover is necessary, protection path 170 becomes the active path and working path 160 becomes the standby path. Aggregation switches 110 learn all of the CPE 140 MAC addresses on active paths (e.g., the working path 160) and send traffic on the data VLAN. If a failover occurs, the aggregation switches 110 relearn the MAC address of all communication paths subject to the failover (e.g., the protection path 170 made active).
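
    As a non-limiting illustration, the relearning behavior of the aggregation switches can be sketched as ordinary source-address learning, in which the most recent ingress port for a given MAC address replaces any earlier entry. The MacTable class, the port names, the VLAN number, and the MAC address below are hypothetical values chosen only for this sketch.

```python
from typing import Dict, Optional, Tuple


class MacTable:
    """Hypothetical per-VLAN MAC learning table of an aggregation switch."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], str] = {}

    def learn(self, vlan: int, src_mac: str, ingress_port: str) -> None:
        # Source learning: the latest ingress port for a MAC address wins.
        self._entries[(vlan, src_mac)] = ingress_port

    def lookup(self, vlan: int, dst_mac: str) -> Optional[str]:
        # Returns None for an unknown destination (a real switch would flood).
        return self._entries.get((vlan, dst_mac))


DATA_VLAN = 100                 # assumed data VLAN identifier
CPE_MAC = "02:00:00:00:00:01"   # assumed CPE MAC address

table = MacTable()
# No-fault state: upstream traffic from the CPE arrives via the working path.
table.learn(DATA_VLAN, CPE_MAC, "port-working")
before = table.lookup(DATA_VLAN, CPE_MAC)

# After failover: the first upstream frame arrives via the protection path.
table.learn(DATA_VLAN, CPE_MAC, "port-protect")
after = table.lookup(DATA_VLAN, CPE_MAC)
```

    No ELPS awareness is needed in the table itself, which is consistent with the aggregation switches being unaware of the protection groups.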

    [0034] Still with respect to FIG. 1, in a no-fault state, all CPEs 140 transmit and receive data on the active (e.g., working) path 160. Any data received by a CPE 140 on the standby (e.g., protect) path 170 is discarded. CPEs 140 do not actively listen for data on the standby path 170. In a no-fault state, the CCMs are transmitted and received on the working paths 160 and protect paths 170 (e.g., active and standby paths) that connect each aggregation switch 110 with every CPE 140. No CCMs or control plane traffic passes from one aggregation switch 110 to the other aggregation switch 110. CCMs do not traverse aggregation switches 110.
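
    As a non-limiting illustration, the receive-side handling at a CPE in the no-fault state can be sketched as a simple classification of incoming frames. The sketch assumes that CCMs are identified by the EtherType 0x8902 assigned to IEEE 802.1ag CFM; the classify_frame function and the path labels are hypothetical.

```python
CFM_ETHERTYPE = 0x8902   # EtherType used by IEEE 802.1ag CFM frames such as CCMs
IPV4_ETHERTYPE = 0x0800


def classify_frame(ingress_path: str, active_path: str, ethertype: int) -> str:
    """Decide what a CPE does with a frame received on a given path."""
    if ethertype == CFM_ETHERTYPE:
        # CCMs are processed on both paths so that both are monitored.
        return "monitor"
    if ingress_path == active_path:
        return "accept"
    # Data arriving on the standby path is discarded.
    return "discard"


data_on_active = classify_frame("working", "working", IPV4_ETHERTYPE)
data_on_standby = classify_frame("protect", "working", IPV4_ETHERTYPE)
ccm_on_standby = classify_frame("protect", "working", CFM_ETHERTYPE)
```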

    [0035] Still with respect to FIG. 1, in a no-fault state, peer-to-peer (P2P) communications (e.g., communications originating at one CPE and destined for one or more other CPEs) are received at the OLT 120 from the CPE 140 on the active path 160. For unicast P2P traffic, the OLT 120 locally switches the traffic. For multicast P2P traffic, the OLT 120 multicasts the P2P traffic to CPEs 140 on its shelf (e.g., to CPEs coupled to it on working paths 160) and to the aggregation switch 110. The aggregation switches 110 pass the P2P multicast traffic on the data VLAN between the working path 160 and protect path 170. The P2P multicast traffic then traverses the protect paths 170 to the CPEs 140 but is discarded at the CPEs 140 because they do not listen for data on the standby path 170.

    [0036] FIG. 5 illustrates generally how communication paths are established and fault detection and failover occurs. A working communication path is established 510, and a protection communication path is established 520. The working communication path can be established between an aggregation switch and a CPE, and can communicatively traverse an OLT. A MEP of the aggregation switch can be communicatively coupled with a MEP of the CPE by the working communication path. The protection communication path can be established between the aggregation switch or a second aggregation switch and the CPE, and can communicatively traverse a second OLT. A second MEP of the aggregation switch or the second aggregation switch can be communicatively coupled with a second MEP of the CPE by the protection communication path. In a non-fault state, CPEs transmit and receive data on the working communication path and monitor the working and protection communication paths.

    [0037] An ELPS protection group is established 530 comprising the working path and the protection path. Communication proceeds over the ELPS protection group 540. As communication proceeds over the ELPS protection group 540, CCM traffic is monitored 550 for fault detection 560. A fault on a working path can be detected 560 based on the absence of CCM traffic at a MEP, of a CPE device, associated with the working path. For example, if CCM traffic persists, no fault is indicated 565. If CCM traffic is absent, a fault is detected 570. When a fault is detected 570, the CPE may send an RDI notification to the aggregation switch over the protect path. RDI notifications can be sent over either the working or the protect path, depending on which has the fault. When a fault is detected 570 on the working path, the protection path is promoted to an active state and becomes the active path for that ELPS protection group 580. Communications continue on that ELPS protection group 540 and CCM traffic continues to be monitored 550. For instance, once the protection path is made active, the CPE switches upstream traffic to the aggregation switch from the working communication path to the protection communication path. For downstream traffic, the aggregation switches learn a MAC address of a port coupled to the active path at the CPE. The aggregation switches may learn this MAC address from a gratuitous ARP message: the CPE sends a gratuitous ARP so that the aggregation switch learns its management MAC address. Upstream traffic flowing through the CPE causes the aggregation switch to learn other MAC addresses. Once the MAC address of the port coupled to the active path at the CPE is learned, the aggregation switches send downstream traffic on the active path to the port at the CPE.
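
    As a non-limiting illustration, a gratuitous ARP of the kind a CPE could send after failover can be built as follows. The frame layout follows the standard Ethernet and ARP formats; the build_gratuitous_arp function, the MAC address, and the IP address (taken from a documentation range) are hypothetical, and the sketch omits any VLAN tag that a deployment would add.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an untagged Ethernet frame carrying a gratuitous ARP request."""
    sender_mac = bytes.fromhex(mac.replace(":", ""))
    sender_ip = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, sender source, ARP EtherType.
    eth_header = b"\xff" * 6 + sender_mac + struct.pack("!H", 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        sender_mac,
        sender_ip,
        b"\x00" * 6,  # target hardware address is unused in a request
        sender_ip,    # target IP equals sender IP, making the ARP gratuitous
    )
    return eth_header + arp_payload


frame = build_gratuitous_arp("02:00:00:00:00:01", "192.0.2.10")
frame_length = len(frame)  # 14-byte Ethernet header plus 28-byte ARP payload
```

    Because the frame is broadcast and carries the CPE's MAC address as its source, a switch on the newly active path learns that address on the receiving port as soon as the frame arrives.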

    [0038] FIG. 6 illustrates generally how communication paths are established and fault detection and failover occurs when a fault is detected on both the active and the standby paths. A working communication path is established 610, and a protection communication path is established 620. The working communication path can be established between an aggregation switch and a CPE, and can communicatively traverse an OLT. A MEP of the aggregation switch can be communicatively coupled with a MEP of the CPE by the working communication path. The protection communication path can be established between the aggregation switch or a second aggregation switch and the CPE, and can communicatively traverse a second OLT. A second MEP of the aggregation switch or the second aggregation switch can be communicatively coupled with a second MEP of the CPE by the protection communication path. In a non-fault state, CPEs transmit and receive data on the working communication path and monitor the working and protection communication paths.

    [0039] An ELPS protection group is established 630 comprising the working path and the protection path. Communication proceeds over the ELPS protection group 640. As communication proceeds over the ELPS protection group 640, CCM traffic is monitored 650 for fault detection 660. A fault on both the active and the standby paths can be detected 660 based on the absence of CCM traffic at a MEP, of a CPE device, associated with the working path. For example, if CCM traffic persists, no fault is indicated 665. If CCM traffic is absent, a fault is detected 670. When a fault is detected 670, the CPE may send an RDI notification to the aggregation switch over the protect path. RDI notifications can be sent over either the working or the protect path, depending on which has the fault. When a fault is detected 670 on both the active and the standby paths, the working path is promoted to an active state and becomes the active path for that ELPS protection group 680. Communications continue on that ELPS protection group 640 and CCM traffic continues to be monitored 650. For instance, once the protection path is made active, the CPE switches upstream traffic to the aggregation switch from the working communication path to the protection communication path. For downstream traffic, the aggregation switches learn a MAC address of a port coupled to the active path at the CPE. The aggregation switches may learn this MAC address from a gratuitous ARP message: the CPE sends a gratuitous ARP so that the aggregation switch learns its management MAC address. Upstream traffic flowing through the CPE causes the aggregation switch to learn other MAC addresses. Once the MAC address of the port coupled to the active path at the CPE is learned, the aggregation switches send downstream traffic on the active path to the port at the CPE.

    [0040] As shown in FIG. 2, in a fault state where an optical signal splitter 210 loses connection with a CPE 220 (e.g., due to a fiber cut or other error), the CPE 220 recognizes that CCM traffic is down on a working path 230 (e.g., due to a lack of data being received over the working path 230), declares the active path 230 down (e.g., as having a fault), and fails over to the protect path 240 (e.g., by making the protect path 240 the active path). The aggregation switches 250, 255 learn (e.g., obtain) the MAC address of the CPE 220 interface on the newly designated active path 240. For P2P communications in such a fault state, the affected CPE 220 declares the working (e.g., active) path 230 down and starts transmitting and listening for data on the protect path 240 only (e.g., the active path after failover). As a result, the affected CPE 220 now receives the P2P traffic on the protect path 240. For unaffected CPEs 260, the active path 235 remains intact and these CPEs 260 continue to transmit and listen for data on the active path 235 only. The OLTs 270 continue to operate without regard for the fault condition. The aggregation switches 250, 255 continue to pass the P2P multicast traffic on the data VLAN between the working and the protect paths. The aggregation switches 250, 255 learn the MAC address of the interface terminating the new active path 240 at the affected CPE 220 and forward unicast P2P traffic to the affected CPE 220 on the new active path 240. For other network communications in this fault state, the aggregation switches 250, 255 continue to pass network traffic to unaffected CPEs 260 on the data VLAN. The aggregation switch 255 terminating the newly active path 240 learns the MAC addresses of upstream traffic through the CPE 220 terminating the newly active path 240. The CPE 220 has a management MAC address, but other traffic also flows through the CPE 220, and all of this traffic has different MAC source addresses. Aggregation switches 250, 255 learn all of these MAC addresses. The aggregation switch 250 terminating the faulty active path no longer passes network traffic on the faulty path 230.

    [0041] As shown in FIG. 3, in a fault state where the OLT 310 loses connection with an optical splitter 320 (e.g., due to a cable cut between the OLT 310 and the optical splitter 320), this affects all CPEs 330 with active paths 340 traversing the optical splitter 320. The affected CPEs 330 recognize that CCM traffic is down on these affected active paths 340, declare the affected active paths 340 down, and fail over (e.g., by making the protect paths 350 the new active paths). The aggregation switches 360, 365 learn the MAC addresses of the CPE 330 interfaces on the newly designated active paths 350. For P2P communications in such a fault state, the affected CPEs 330 declare the working (e.g., active) paths 340 down and start transmitting and listening for data on the protect paths 350 only (e.g., the active paths after failover). As a result, the affected CPEs 330 now receive the P2P traffic on their protect paths 350. For unaffected CPEs 335, the active path 345 remains intact and these CPEs 335 continue to transmit and listen for data on the active path 345 only. The OLTs 310, 315 continue to operate without regard for the fault condition. The aggregation switches 360, 365 continue to pass the P2P multicast traffic on the data VLAN between the working and the protect paths. The aggregation switches 360, 365 learn the MAC addresses of interfaces terminating the new active paths 350 at the affected CPEs 330 and forward unicast P2P traffic to the affected CPEs 330 on the new active paths 350. For other network communications in this fault state, the aggregation switches 360, 365 continue to pass network traffic to unaffected CPEs 335 on the data VLAN. The aggregation switch 365 terminating the newly active paths 350 learns the MAC addresses of upstream traffic through CPEs 330 terminating the newly active paths 350. The aggregation switch 360 terminating the faulty active paths 340 no longer passes network traffic on the faulty active paths 340.

    [0042] As shown in FIG. 4, in a fault state where an OLT 410 or an aggregation switch 420 fails, the CCMs are down for all communication paths 430 connecting the CPEs 440 to the failed equipment 410. The affected CPEs 440 declare the affected active paths 430 down and fail over to standby paths 450 (e.g., protect paths). The unaffected aggregation switch 425 learns the MAC addresses of the CPE 440 interfaces on the newly designated active paths 450. For P2P communications in such a fault state, the affected CPEs 440 declare the working (e.g., active) paths 430 down and start transmitting and listening for data on the protect paths 450 only (e.g., the active paths after failover). As a result, the affected CPEs 440 now receive the P2P traffic on their protect paths 450. For other network communications in this fault state, the aggregation switches 420, 425 continue to pass network traffic to unaffected CPEs on the data VLAN. The aggregation switch 425 terminating the newly active paths 450 learns the MAC addresses of the physical interfaces of the CPEs 440 terminating the newly active paths 450. The aggregation switch 420 terminating the faulty active paths 430 no longer passes network traffic on the faulty active paths 430.

    [0043] CPE (customer-premises equipment) generally refers to devices such as telephones, routers, network switches, residential gateways, set-top boxes, fixed mobile convergence products, home networking adapters and Internet access gateways that enable consumers to access communication providers' services and distribute them in a residence or business over a local area network.

    [0044] While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

    [0045] Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products, or a single hardware product or multiple hardware products, or any combination thereof.

    [0046] Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.