REDUCING OR ELIMINATING ROUTING MICROLOOPS IN NETWORKS HAVING A CLOS TOPOLOGY, SUCH AS DATA CENTER CLOS NETWORKS EMPLOYING THE EXTERIOR BORDER GATEWAY PROTOCOL (EBGP) FOR EXAMPLE
20190363975 · 2019-11-28
Abstract
The problem of routing micro-loops in networks having a CLOS topology, such as data center CLOS networks employing the exterior border gateway protocol (eBGP) for example, is solved by: (a) receiving, on an interface of one of the nodes, a datagram, the datagram including destination information; (b) determining a next hop and an egress interface using (1) an identifier of the interface on which the datagram was received, (2) the destination information of the received datagram, and (3) stored forwarding information such that a routing micro-loop is avoided without discarding the datagram; and (c) forwarding the datagram via the egress interface. For example, this problem may be solved by (a) receiving, on an interface of a node of the CLOS network, a datagram, the datagram including destination information; (b) looking up, using the destination information of the received datagram and stored forwarding information, a next hop egress interface on the node; (c) determining whether or not the next hop egress interface on the node is the same as the interface on which the datagram was received; and (d) responsive to a determination that the next hop egress interface on the node is the same as the interface on which the datagram was received, (1) replacing the next hop egress interface with a safe multipath next hop egress interface, and (2) forwarding the datagram via the safe multipath next hop egress interface, and otherwise, responsive to a determination that the next hop egress interface on the node is not the same as the interface on which the datagram was received, simply forwarding the datagram via the next hop egress interface.
Claims
1. A computer-implemented method for reducing or eliminating routing micro-loops in a network having a CLOS topology in which nodes of the network are arranged in at least three (3) tiers, the computer-implemented method comprising: a) receiving, on an interface of one of the nodes, a datagram, the datagram including destination information; b) determining a next hop and an egress interface using (1) an identifier of the interface on which the datagram was received, (2) the destination information of the received datagram, and (3) stored forwarding information such that a routing micro-loop is avoided without discarding the datagram; and c) forwarding the datagram via the egress interface.
2. The computer implemented method of claim 1 wherein the act of determining a next hop and an egress interface includes performing a single lookup using the destination information and the interface on which the datagram was received to select the egress interface and a next hop.
3. The computer implemented method of claim 1 wherein the act of determining a next hop and an egress interface includes 1) looking up, using the destination information of the received datagram and the stored forwarding information, a preliminary next hop egress interface on the node, 2) determining whether or not the preliminary next hop egress interface on the node is the same as the interface on which the datagram was received, and 3) responsive to a determination that the preliminary next hop egress interface on the node is the same as the interface on which the datagram was received, replacing the preliminary next hop egress interface with a safe multipath next hop egress interface for use as the egress interface, and otherwise, responsive to a determination that the preliminary next hop egress interface on the node is not the same as the interface on which the datagram was received, using the preliminary next hop interface as the egress interface.
4. The computer-implemented method of claim 3 wherein it has been determined that the next hop egress interface on the node is the same as the interface on which the datagram was received, and wherein the act of replacing the next hop egress interface with a safe multipath next hop egress interface includes selecting one of a plurality of safe next hop interfaces in such a manner that the plurality of safe next hop interfaces are selected with an even distribution.
5. The computer-implemented method of claim 3 wherein it has been determined that the next hop egress interface on the node is the same as the interface on which the datagram was received, and wherein the act of replacing the next hop egress interface with a safe multipath next hop egress interface includes selecting one of a plurality of safe next hop interfaces in a round-robin manner.
6. The computer-implemented method of claim 3 wherein it has been determined that the next hop egress interface on the node is the same as the interface on which the datagram was received, and wherein the act of replacing the next hop egress interface with a safe multipath next hop egress interface includes selecting one of a plurality of safe next hop interfaces randomly.
7. The computer-implemented method of claim 3 wherein it has been determined that the next hop egress interface on the node is the same as the interface on which the datagram was received, and wherein the act of replacing the next hop egress interface with a safe multipath next hop egress interface includes selecting one of a plurality of safe next hop interfaces, each of which is stored on the node in association with a prefix matching the destination information.
8. The computer-implemented method of claim 1 wherein the destination information is a layer-3 destination address.
9. The computer-implemented method of claim 1 wherein the destination information is a layer-3 destination address of a server linked with at least one top-of-rack node of the network.
10. The computer-implemented method of claim 1 wherein the micro-loops are routing micro-loops between nodes of adjacent tiers in the CLOS network.
11. The computer-implemented method of claim 1 wherein the nodes of the network run a border gateway protocol (BGP).
12. The computer-implemented method of claim 1 wherein the nodes of the network run an exterior border gateway protocol (eBGP).
13. The computer-implemented method of claim 1 wherein the one of the nodes is located in a tier other than the third tier.
14. The computer-implemented method of claim 1 wherein the one of the nodes is located in a tier other than the top-of-rack tier.
15. A node provided in a network having a CLOS topology in which nodes of the network are arranged in at least three (3) tiers, the node comprising: a) an interface for receiving a datagram, the datagram including destination information; b) a storage medium storing forwarding information; and c) a forwarding engine configured to determine a next hop and an egress interface using (1) an identifier of the interface on which the datagram was received, (2) the destination information of the received datagram, and (3) the forwarding information stored, such that a routing micro-loop is avoided without discarding the datagram, and forward the datagram via the egress interface.
16. The node of claim 15 wherein the forwarding engine determines the next hop and the egress interface by performing a single lookup using the destination information and the interface on which the datagram was received to select the egress interface and the next hop.
17. The node of claim 15 wherein the forwarding engine determines the next hop and the egress interface by (1) looking up, using the destination information of the received datagram and the forwarding information stored, a preliminary next hop egress interface on the node, (2) determining whether or not the preliminary next hop egress interface on the node is the same as the interface on which the datagram was received, and (3) responsive to a determination that the preliminary next hop egress interface on the node is the same as the interface on which the datagram was received, replacing the preliminary next hop egress interface with a safe multipath next hop egress interface for use as the egress interface, and otherwise, responsive to a determination that the preliminary next hop egress interface on the node is not the same as the interface on which the datagram was received, using the preliminary next hop interface as the egress interface.
18. The node of claim 17 wherein when the forwarding engine has determined that the next hop egress interface on the node is the same as the interface on which the datagram was received, the forwarding engine is configured to replace the next hop egress interface with a safe multipath next hop egress interface by selecting one of a plurality of safe next hop interfaces in such a manner that the plurality of safe next hop interfaces are selected with an even distribution.
19. The node of claim 18 wherein the forwarding engine is configured to replace the next hop egress interface with a safe multipath next hop egress interface by either (A) selecting one of a plurality of safe next hop interfaces in a round-robin manner, or (B) selecting one of a plurality of safe next hop interfaces randomly.
20. The node of claim 15 wherein the destination information is a layer-3 destination address of a server linked with at least one top-of-rack node of the network.
21. A non-transitory computer-readable medium storing processor executable instructions which, when executed by at least one processor, cause the at least one processor to perform a method for reducing or eliminating routing micro-loops in a network having a CLOS topology in which nodes of the network are arranged in at least three (3) tiers, the method comprising: a) receiving, on an interface of one of the nodes, a datagram, the datagram including destination information; b) determining a next hop and an egress interface using (1) an identifier of the interface on which the datagram was received, (2) the destination information of the received datagram, and (3) stored forwarding information, such that a routing micro-loop is avoided without discarding the datagram; and c) forwarding the datagram via the egress interface.
Description
3. BRIEF DESCRIPTION OF THE DRAWINGS
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
4. DETAILED DESCRIPTION
[0044] The present disclosure may involve novel methods, apparatus, message formats, and/or data structures for reducing or eliminating routing micro-loops in networks having a CLOS topology, such as data center CLOS networks employing the exterior border gateway protocol (eBGP) for example. The following description is presented to enable one skilled in the art to make and use the described embodiments, and is provided in the context of particular applications and their requirements. Thus, the following description of example embodiments provides illustration and description, but is not intended to be exhaustive or to limit the present disclosure to the precise form disclosed. Various modifications to the disclosed embodiments will be apparent to those skilled in the art, and the general principles set forth below may be applied to other embodiments and applications. For example, although a series of acts may be described with reference to a flow diagram, the order of acts may differ in other implementations when the performance of one act is not dependent on the completion of another act. Further, non-dependent acts may be performed in parallel. No element, act or instruction used in the description should be construed as critical or essential to the present description unless explicitly described as such. Also, as used herein, the article "a" is intended to include one or more items. Where only one item is intended, the term "one" or similar language is used. Thus, the present disclosure is not intended to be limited to the embodiments shown and the inventor regards his invention as any patentable subject matter described.
4.1 DEFINITIONS
[0045] The following terms may be used in this disclosure.
[0046] BGP speaker: A router that implements BGP.
[0047] Exterior BGP (or eBGP): A BGP connection between external peers (that is, peers within different ASes).
[0048] Forwarding Information Base (or FIB): A data structure used to forward a received (ingress) packet towards its destination by determining a next hop.
[0049] Interior BGP (or iBGP): A BGP connection between internal peers (that is, peers within the same AS).
[0050] Interior Gateway Protocol (or IGP): A routing protocol used to exchange routing information among routers within a single Autonomous System (AS).
[0051] Next Hop: A next node (e.g., switch or router) to which a packet is sent from any given router as it traverses a network on its way to its final destination.
[0052] Prefix: part of an address that defines part of a communications network (e.g., a subnetwork), such as an Internet Protocol (IP) network for example.
[0053] Route: A unit of information that pairs a set of destinations with the attributes of a path to those destinations. The set of destinations are systems whose IP addresses are contained in one IP address prefix.
[0054] RIB: Routing Information Base.
4.2 EXAMPLE METHODS
[0055]
[0056] In response to receiving the datagram, the example method 500 determines a next hop and an egress interface using (1) an identifier of the interface on which the datagram was received, (2) the destination information of the received datagram, and (3) stored forwarding information such that a routing micro-loop is avoided without discarding the datagram (Block 520). That is, the example method 500 determines a next hop and an egress interface such that the egress interface is not the same as the interface on which the datagram was received. The method 500 then forwards the datagram via the egress interface (Block 530) before the method 500 is left (Node 540).
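As an illustration only, block 520 could be realized as a single lookup keyed on both the destination and the ingress interface, with the table precomputed from the ordinary forwarding entries. The following Python sketch assumes exact-match prefixes (a real forwarding plane would use longest-prefix matching), and the function name, interface names, and table layout are hypothetical and do not come from the disclosure.

```python
def build_single_lookup_table(entries, interfaces):
    """Precompute (prefix, ingress_if) -> egress_if.

    entries: dict mapping a prefix string to a tuple
             (primary_egress_if, [safe_multipath_ifs]).
    interfaces: all interfaces of the node (hypothetical names).
    """
    table = {}
    for prefix, (primary, safe_ifs) in entries.items():
        for ingress in interfaces:
            if primary != ingress:
                # Normal case: the primary egress does not point back
                # at the interface the datagram arrived on.
                table[(prefix, ingress)] = primary
            else:
                # Primary would send the datagram back where it came
                # from, so store a safe alternative instead.
                alternates = [s for s in safe_ifs if s != ingress]
                if alternates:
                    table[(prefix, ingress)] = alternates[0]
    return table


# Example: a spine node whose primary egress for the prefix is "to_E3".
entries = {"10.0.4.0/24": ("to_E3", ["to_E4"])}
table = build_single_lookup_table(entries, ["to_E1", "to_E3", "to_E4"])
print(table[("10.0.4.0/24", "to_E3")])  # a datagram that arrived from E3
```

With this layout, the data path performs one dictionary lookup per datagram and never compares interfaces at forwarding time; the cost is a table that grows with the number of interfaces.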
[0057]
[0058] In response to receiving the datagram, the example method 500 looks up, using the destination information of the received datagram and stored forwarding information, a next hop egress interface on the node. (Block 522) Note that the stored forwarding information may include, for example, a plurality of entries, each including a destination address prefix, a next hop node, a next hop egress interface and one or more safe multipath next hop interfaces.
[0059] The example method 500 then determines whether or not the next hop egress interface on the node is the same as the interface on which the datagram was received. (Decision 524) Responsive to a determination that the next hop egress interface on the node is the same as the interface on which the datagram was received (Decision 524, YES), the example method 500 (1) replaces the next hop egress interface with a safe multipath next hop egress interface (Block 526), and (2) forwards the datagram via the safe multipath next hop egress interface (Block 532), before the example method 500 is left (Node 540). Otherwise, responsive to a determination that the next hop egress interface on the node is not the same as the interface on which the datagram was received (Decision 524, NO), the example method 500 forwards the datagram via the next hop egress interface (Block 534) before the example method 500 is left (Node 540).
[0060] In some example methods, the destination information is a layer-3 destination address. For example, the destination address may be a layer-3 destination address (e.g., Internet Protocol version 4 (IPv4), or Internet Protocol version 6 (IPv6) address) of a server linked with at least one top-of-rack node of the network.
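The lookup, comparison, and replacement acts of blocks 522, 524, and 526 can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the `FibEntry` fields mirror the entry contents described above, but the class name, the interface names, the linear longest-prefix search, and the choice of the first usable safe interface are all assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class FibEntry:
    prefix: ipaddress.IPv4Network        # destination address prefix
    next_hop: str                        # next hop node
    egress_if: str                       # next hop egress interface
    safe_ifs: list = field(default_factory=list)  # safe multipath interfaces


def lookup(fib, dst):
    """Block 522: longest-prefix match over a small list of entries."""
    addr = ipaddress.ip_address(dst)
    matches = [e for e in fib if addr in e.prefix]
    if not matches:
        return None
    return max(matches, key=lambda e: e.prefix.prefixlen)


def choose_egress(fib, ingress_if, dst):
    entry = lookup(fib, dst)
    if entry is None:
        return None  # no route; handling of this case is outside the sketch
    if entry.egress_if != ingress_if:
        return entry.egress_if           # Decision 524, NO -> Block 534
    # Decision 524, YES -> Block 526: substitute a safe multipath interface.
    for candidate in entry.safe_ifs:
        if candidate != ingress_if:
            return candidate
    return None  # no safe alternative stored (assumed not to occur)


fib = [FibEntry(ipaddress.ip_network("10.0.4.0/24"), "E3", "to_E3", ["to_E4"])]
print(choose_egress(fib, "to_E1", "10.0.4.7"))  # primary egress is used
print(choose_egress(fib, "to_E3", "10.0.4.7"))  # safe interface is used
```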
[0061] In some example methods, the loops are routing micro-loops between nodes of adjacent tiers in the CLOS network.
[0062] In some example methods, the nodes of the network run a border gateway protocol (BGP), such as an exterior border gateway protocol (eBGP) for example.
[0063] In some example methods, the node executing the method is located in a tier other than the third tier, such as in a tier other than the top-of-rack tier for example.
[0064] In some example methods, the act of replacing the next hop egress interface with a safe multipath next hop egress interface includes selecting one of a plurality of safe next hop interfaces in such a manner that the plurality of safe next hop interfaces are selected with an even distribution. In some example methods, the act of replacing the next hop egress interface with a safe multipath next hop egress interface includes selecting one of a plurality of safe next hop interfaces in a round-robin manner. In some other example methods, the act of replacing the next hop egress interface with a safe multipath next hop egress interface includes selecting one of a plurality of safe next hop interfaces randomly. In some example methods, each of the plurality of safe next hop interfaces is stored on the node in association with a prefix matching the destination information.
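The round-robin and random selection options can be illustrated with two small selectors. This Python sketch is an assumption-laden example: the class names, the `pick` method, and the interface names are hypothetical, and the disclosure does not specify how the selection state is kept.

```python
import itertools
import random


class RoundRobinSelector:
    """Cycles through the safe interfaces in a fixed order."""

    def __init__(self, safe_ifs):
        self._cycle = itertools.cycle(list(safe_ifs))

    def pick(self):
        return next(self._cycle)


class RandomSelector:
    """Picks a safe interface uniformly at random."""

    def __init__(self, safe_ifs, seed=None):
        self._ifs = list(safe_ifs)
        self._rng = random.Random(seed)  # seed only for repeatable tests

    def pick(self):
        return self._rng.choice(self._ifs)


safe = ["to_E1", "to_E2", "to_E4"]
rr = RoundRobinSelector(safe)
print([rr.pick() for _ in range(6)])  # each interface appears twice
rnd = RandomSelector(safe)
print(rnd.pick())                     # one of the three safe interfaces
```

Round-robin gives an exactly even distribution over any whole number of cycles, while uniform random selection is even only on average over many datagrams.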
4.2.1 Example of Operations of Example Method
[0065] A simple example serving to illustrate operations of the example method 500 is now described with reference to
[0066] Consider routing for a datagram (e.g., a packet) destined for a server linked to TOR node T4. Referring to both
[0067] Referring to both
[0068] Referring to both
[0069] Finally, referring to both
[0070] Next, handling failures with fast reroute (FRR) using multipath, assuming T4 is the destination, is illustrated with reference to
[0071] Consider first a failure in the link between T1 and E1. Referring to
[0072] Now consider a failure in the link between E1 and S1. Referring to
[0073] Next, consider a failure in the link between S1 and E4. Referring to
[0074] Finally, consider a failure in the link between E3 and T4. Referring to
[0075] Consider, however, the potential problem of a routing micro-loop in this example. More specifically, if the link f between nodes E3 and T4 is down, node E3 may send a datagram destined for T4 back up to any one of the spine nodes S1-S4. However, it is possible that the receiving spine node S1, S2, S3, or S4 will send the datagram right back to node E3, causing a routing micro-loop. Applying the example method 500 of
[0076] As can be appreciated by this simple example, example methods consistent with the present description can be used to avoid micro-loops in a CLOS network, such as a CLOS network used in a data center.
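The micro-loop between E3 and a spine node, and its avoidance, can be reproduced with a toy hop-by-hop simulation. In this Python sketch, interfaces are identified by the name of the neighbor they lead to, the per-node tables hold only the entry for destination T4 after the E3-T4 link has failed, and the table contents, the function name, and the hop limit are assumptions chosen for illustration.

```python
def forward(tables, start, ingress, dst, avoid_ingress, max_hops=8):
    """Walk a datagram hop by hop; return (path, delivered)."""
    path = [start]
    node, came_from = start, ingress
    for _ in range(max_hops):
        if node == dst:
            return path, True
        primary, safe = tables[node]
        nxt = primary
        if avoid_ingress and primary == came_from:
            # Primary points back at the ingress: use a safe alternative.
            nxt = next(n for n in safe if n != came_from)
        came_from, node = node, nxt
        path.append(node)
    return path, node == dst


# Entries for destination T4: node -> (primary next hop, safe next hops).
tables = {
    "E3": ("S1", ["S2"]),  # link E3-T4 is down, so E3 sends back up
    "S1": ("E3", ["E4"]),  # S1 still prefers E3 for T4
    "E4": ("T4", []),
}

looped, ok1 = forward(tables, "E3", "S2", "T4", avoid_ingress=False)
fixed, ok2 = forward(tables, "E3", "S2", "T4", avoid_ingress=True)
print(looped, ok1)  # datagram bounces between E3 and S1
print(fixed, ok2)   # datagram reaches T4 via E4
```

Without the ingress check the datagram alternates between E3 and S1 until the hop limit is reached; with the check, S1 substitutes E4 and the datagram is delivered without being discarded.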
4.3 EXAMPLE APPARATUS
[0077]
[0078] As just discussed above, and referring to
[0079] The control component 910 may include an operating system (OS) kernel 920, routing protocol process(es) 930, label-based forwarding protocol process(es) 940, interface process(es) 950, user interface (e.g., command line interface) process(es) 960, and chassis process(es) 970, and may store routing table(s) 939, label forwarding information 945, and forwarding (e.g., route-based and/or label-based) table(s) 980. As shown, the routing protocol process(es) 930 may support routing protocols such as the routing information protocol (RIP) 931, the intermediate system-to-intermediate system protocol (IS-IS) 932, the open shortest path first protocol (OSPF) 933, the enhanced interior gateway routing protocol (EIGRP) 934 and the border gateway protocol (BGP) 935, and the label-based forwarding protocol process(es) 940 may support protocols such as BGP 935, the label distribution protocol (LDP) 936 and the resource reservation protocol (RSVP) 937. One or more components (not shown) may permit a user 965 to interact with the user interface process(es) 960. Similarly, one or more components (not shown) may permit an outside device to interact with one or more of the routing protocol process(es) 930, the label-based forwarding protocol process(es) 940, the interface process(es) 950, and the chassis process(es) 970, via SNMP 985, and such processes may send information to an outside device via SNMP 985.
[0080] The packet forwarding component 990 may include a microkernel 992, interface process(es) 993, distributed ASICs 994, chassis process(es) 995 and forwarding (e.g., route-based and/or label-based) table(s) 996.
[0081] In the example router 900 of
[0082] Still referring to
[0083] Referring to the routing protocol process(es) 930 of
[0084] Still referring to
[0085] The example control component 910 may provide several ways to manage the router. For example, it 910 may provide a user interface process(es) 960 which allows a system operator 965 to interact with the system through configuration, modifications, and monitoring. The SNMP 985 allows SNMP-capable systems to communicate with the router platform. This also allows the platform to provide necessary SNMP information to external agents. For example, the SNMP 985 may permit management of the system from a network management station running software, such as Hewlett-Packard's Network Node Manager (HP-NNM), through a framework, such as Hewlett-Packard's OpenView. Accounting of packets (generally referred to as traffic statistics) may be performed by the control component 910, thereby avoiding slowing traffic forwarding by the packet forwarding component 990.
[0086] Although not shown, the example router 900 may provide for out-of-band management, RS-232 DB9 ports for serial console and remote management access, and tertiary storage using a removable PC card. Further, although not shown, a craft interface positioned on the front of the chassis provides an external view into the internal workings of the router. It can be used as a troubleshooting tool, a monitoring tool, or both. The craft interface may include LED indicators, alarm indicators, control component ports, and/or a display screen. Finally, the craft interface may provide interaction with a command line interface (CLI) 960 via a console port, an auxiliary port, and/or a management Ethernet port.
[0087] The packet forwarding component 990 is responsible for properly outputting received packets as quickly as possible. If there is no entry in the forwarding table for a given destination or a given label and the packet forwarding component 990 cannot perform forwarding by itself, it 990 may send the packets bound for that unknown destination off to the control component 910 for processing. The example packet forwarding component 990 is designed to perform Layer 2 and Layer 3 switching, route lookups, and rapid packet forwarding.
[0088] As shown in
[0089] In the example router 900, the example method 500 consistent with the present disclosure may be implemented in the packet forwarding component 990.
[0090] Referring back to distributed ASICs 994 of
[0091] Still referring to
[0092] An FPC 1020 can contain one or more PICs 1010, and may carry the signals from the PICs 1010 to the midplane/backplane 1030 as shown in
[0093] The midplane/backplane 1030 holds the line cards. The line cards may connect into the midplane/backplane 1030 when inserted into the example router's chassis from the front. The control component (e.g., routing engine) 910 may plug into the rear of the midplane/backplane 1030 from the rear of the chassis. The midplane/backplane 1030 may carry electrical (or optical) signals and power to each line card and to the control component 910.
[0094] The system control board 1040 may perform forwarding lookup. It 1040 may also communicate errors to the routing engine. Further, it 1040 may also monitor the condition of the router based on information it receives from sensors. If an abnormal condition is detected, the system control board 1040 may immediately notify the control component 910.
[0095] Referring to
[0096] The I/O manager ASIC 1022 on the egress FPC 1020/920 may perform some value-added services. In addition to decrementing time to live (TTL) values and re-encapsulating the packet for handling by the PIC 1010, it can also apply class-of-service (CoS) rules. To do this, it may queue a pointer to the packet in one of the available queues, each having a share of link bandwidth, before applying the rules to the packet. Queuing can be based on various rules. Thus, the I/O manager ASIC 1022 on the egress FPC 1020/920 may be responsible for receiving the blocks from the second DBM ASIC 1035b, decrementing TTL values, queuing a pointer to the packet, if necessary, before applying CoS rules, re-encapsulating the blocks, and sending the encapsulated packets to the PIC I/O manager ASIC 1015.
[0097]
[0098] Referring back to block 1270, the packet may be queued. Actually, as stated earlier with reference to
[0099] Referring back to block 1280 of
[0100] Referring back to block 1250 of
[0101] Although example embodiments consistent with the present disclosure may be implemented on the example routers of
[0102]
[0103] In some embodiments consistent with the present disclosure, the processors 1310 may be one or more microprocessors and/or ASICs. The bus 1340 may include a system bus. The storage devices 1320 may include system memory, such as read only memory (ROM) and/or random access memory (RAM). The storage devices 1320 may also include a hard disk drive for reading from and writing to a hard disk, a magnetic disk drive for reading from or writing to a (e.g., removable) magnetic disk, an optical disk drive for reading from or writing to a removable (magneto-) optical disk such as a compact disk or other (magneto-) optical media, or solid-state non-volatile storage.
[0104] Some example embodiments consistent with the present disclosure may also be provided as a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may be non-transitory and may include, but is not limited to, flash memory, optical disks, CD-ROMs, DVD ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards or any other type of machine-readable media suitable for storing electronic instructions. For example, example embodiments consistent with the present disclosure may be downloaded as a computer program which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of a communication link (e.g., a modem or network connection) and stored on a non-transitory storage medium. The machine-readable medium may also be referred to as a processor-readable medium.
[0105] Example embodiments consistent with the present disclosure (or components or modules thereof) might be implemented in hardware, such as one or more field programmable gate arrays (FPGAs), one or more integrated circuits such as ASICs, one or more network processors, etc. Alternatively, or in addition, embodiments consistent with the present disclosure (or components or modules thereof) might be implemented as stored program instructions executed by a processor. Such hardware and/or software might be provided in an addressed data (e.g., packet, cell, etc.) forwarding device (e.g., a switch, a router, etc.), a laptop computer, desktop computer, a tablet computer, a mobile phone, or any device that has computing and networking capabilities.
4.4 REFINEMENTS AND ALTERNATIVES
[0106] Although described as routers and/or switches, nodes may represent other types of devices capable of performing the foregoing node operations.
[0107] Referring back to block 520 of
4.5 CONCLUSION
[0108] As should be appreciated from the foregoing, example methods and apparatus consistent with the present disclosure reduce or eliminate routing micro-loops in networks having a CLOS topology, such as data center CLOS networks employing the exterior border gateway protocol (eBGP) for example.