Multi-chassis link aggregation learning on standard ethernet links
10250489 · 2019-04-02
Cpc classification
Y02D30/50
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Abstract
A stacked switch packet communication system is connected to a Multi-Chassis Link Aggregation Group (MLAG). Devices in the system include a designated device for receiving packets that are destined for the MLAG. A new MLAG device is enabled while continuing packet communication by identifying an address of a single port in the new MLAG device. In first updates of the devices the single port is established in the forwarding databases of the devices and the packets transmitted through the devices to the single port. Thereafter, in second updates the single port is replaced in the forwarding databases by another port of the new MLAG device. Upon completion of respective second updates, the packets are transmitted through the devices to the other port in the MLAG.
Claims
1. A method, comprising the steps of: connecting a stacked switch system to a Multi-Chassis Link Aggregation Group (MLAG), the system comprising a set of devices for communication of data packets, wherein the devices each have a plurality of physical ports and a forwarding database, the devices including a designated device for receiving ones of the packets destined for the MLAG, the set of devices having spine devices and leaf devices; enabling a new MLAG device; and while communicating the packets through the stacked switch system: identifying an address of a single port in the new MLAG device and in first updates of the devices establishing the single port in the forwarding database of each of the devices; and transmitting the packets through the devices to the single port; and thereafter in second updates of the devices replacing the single port by another port in the new MLAG device in the forwarding database of each of the devices; and upon completing each of the second updates transmitting the packets through the devices to the other port in the MLAG.
2. The method according to claim 1, further comprising in the first updates and the second updates updating the forwarding database of each of the devices in order of respective distances thereof from the MLAG.
3. The method according to claim 1, further comprising defining a tree having a root comprising the designated device, and updating the forwarding database of each of the devices by visiting the devices in a breadth-first search (BFS) of the tree.
4. The method according to claim 1, further comprising defining a tree having a root comprising the designated device, and updating the forwarding database of each of the devices by visiting the spine devices first and then the leaf devices in a traversal of the tree.
5. The method according to claim 1, wherein the address of the new MLAG device is a Media Access Control (MAC) address.
6. The method according to claim 1, wherein updating the forwarding database in the first updates and the second updates comprises updating an egress port of each of the devices.
7. An apparatus, comprising: a stacked switch system connected to a Multi-Chassis Link Aggregation Group (MLAG), the system comprising a stack controller and a set of devices for communication of data packets, wherein each of the devices has a plurality of physical ports and a forwarding database, the devices including a designated device for receiving ones of the packets destined for the MLAG, the set of devices having spine devices and leaf devices, wherein the stack controller is operative for transmitting control signals to the devices to enable a new MLAG device and wherein each of the devices is operative, responsively to the control signals and while communicating the packets through the stacked switch system, for: identifying an address of a single port in the new MLAG device and in first updates of the devices establishing the single port in its forwarding database; and transmitting the packets through others of the devices to the single port; and thereafter in second updates of the devices replacing the single port by another port in the new MLAG device in its forwarding database; and upon completing each of the second updates transmitting the packets through the devices to the other port in the MLAG.
8. The apparatus according to claim 7, wherein in the first updates and the second updates the forwarding database of each of the devices is updated in order of respective distances thereof from the MLAG.
9. The apparatus according to claim 7, wherein each of the devices is operative for updating its forwarding database in a breadth-first search of a tree having a root comprising the designated device.
10. The apparatus according to claim 7, wherein each of the devices is operative for updating its forwarding database by visiting the spine devices first and then the leaf devices in a traversal of a tree having a root comprising the designated device.
11. The apparatus according to claim 7, wherein the address of the new MLAG device is a Media Access Control (MAC) address.
12. The apparatus according to claim 7, wherein updating the forwarding database in the first updates and the second updates comprises updating an egress port of each of the devices.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
(1) For a better understanding of the present invention, reference is made to the detailed description of the invention, by way of example, which is to be read in conjunction with the following drawings, wherein like elements are given like reference numerals, and wherein:
(2)
(3)
(4)
(5)
DETAILED DESCRIPTION OF THE INVENTION
(6) In the following description, numerous specific details are set forth in order to provide a thorough understanding of the various principles of the present invention. It will be apparent to one skilled in the art, however, that not all these details are necessarily always needed for practicing the present invention. In this instance, well-known circuits, control logic, and the details of computer program instructions for conventional algorithms and processes have not been shown in detail in order not to obscure the general concepts unnecessarily.
(7) Documents incorporated by reference herein are to be considered an integral part of the application except that, to the extent that any terms are defined in these incorporated documents in a manner that conflicts with definitions made explicitly or implicitly in the present specification, only the definitions in the present specification should be considered.
(8) Overview.
(9) Turning now to the drawings, reference is now made to
(10) In the pictured embodiment, decision logic 14 receives packets 16, each containing a header 18 and payload data 20. A processing pipeline 22 in decision logic 14 extracts a classification key from each packet, typically (although not necessarily) including the contents of certain fields of header 18. For example, the key may comprise the source and destination addresses and ports and a protocol identifier. Pipeline 22 matches the key against a matching database 24 containing a set of rule entries, which is stored in an SRAM 26 in network element 10, as described in detail hereinbelow. SRAM 26 also contains a list of actions 28 to be performed when a key is found to match one of the rule entries. For this purpose, each rule entry typically contains a pointer to the particular action that decision logic 14 is to apply to packets 16 in case of a match. Pipeline 22 typically comprises dedicated or programmable hardware logic, which is configured to carry out the functions described herein.
(11) In addition, network element 10 typically comprises a cache 30, which contains rules that have not been incorporated into the matching database 24 in SRAM 26. Cache 30 may contain, for example, rules that have recently been added to network element 10 and not yet incorporated into the data structure of matching database 24, and/or rules having rule patterns that occur with low frequency, so that their incorporation into the data structure of matching database 24 would be impractical. The entries in cache 30 likewise point to corresponding actions 28 in SRAM 26. Pipeline 22 may match the classification keys of all incoming packets 16 against both matching database 24 in SRAM 26 and cache 30. Typically, when there is a cache miss in cache 30, database 24 is addressed to determine if a given classification key matches any of the rule entries in database 24.
MLAG Traffic Flow.
(12) When an MLAG stack is based on standard Ethernet links, each device performs standard layer 2 (bridge) forwarding. When a new MAC on an MLAG is learned, it is virtually impossible to update the forwarding databases (FDBs) of all the switches in the stack at the same time unless specialized hardware is provided for that purpose. During the transition time, in which some of the switches have learned the new MAC while others have not, a packet may be received multiple times on the MLAG or not received at all by the MLAG. This behavior occurs when some switches perform unicast forwarding while others perform flood forwarding of an unknown packet. Controlling the learning order of the new MAC on the switches does not always resolve the problem.
(13) Reference is now made to
(14) In this example, MLAG 32 comprises two linked stack groups 36, 38. Assume that a switch 40 in stack group 36 has just come on line and that the MAC of switch 40 is not yet known to the other switches X, Y, Z, X1, Y1, Z1 in the stacked switch system 34. All the devices in the stacked switch system 34 are configured to flood BUM (broadcast, unknown unicast and multicast) packets to all devices in the stacked switch system 34.
(15) In general, BUM traffic should be forwarded to all switch interfaces in an MLAG system in case an MLAG interface, e.g., MLAG 32, is built from more than one device. In the event that each device member in the MLAG floods the BUM traffic to its local ports, the MLAG interface will receive one copy per device member in the MLAG. One method for preventing BUM traffic duplication is to select, for each MLAG interface, a single device, known as the BUM-designated forwarder. Only that single device forwards the BUM traffic to the MLAG interface.
(16) Accordingly, a packet is not forwarded to the MLAG 32 by an egress device in the stacked switch system 34 unless it has been designated to do so. Switch Y is the designated switch for flooding traffic to the MLAG 32 via link H3. Thus, switch Y can forward packets to the MLAG 32, but switch Y1 cannot, even though both switches Y, Y1 share the link H3 leading to the MLAG 32.
(17) Assume that a BUM packet is sent from link H1 to link H3. Link H3 is connected to the stacked switch system of MLAG 32. The MAC that can be reached via link H3 is not known to any of the switches. The flooding of the BUM packet is represented by a broken line extending from link H1 to the other switches Y, Z, X1, Y1, Z1 of the stacked switch system 34.
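The designated-forwarder rule described above can be illustrated with a minimal Python sketch. The class name `Switch`, its fields, and the port labels `to_Z` and `to_Z1` are hypothetical names introduced only for illustration; the sketch assumes a switch floods a BUM packet to every local port except the ingress port, and adds its MLAG member ports only when it is the BUM-designated forwarder.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    name: str
    local_ports: list            # stack-facing or host-facing ports (hypothetical labels)
    mlag_ports: list             # ports that are members of an MLAG interface
    is_designated: bool = False  # True only for the BUM-designated forwarder

    def flood_targets(self, ingress_port):
        """Return the ports that receive a copy of a BUM packet."""
        # Never send a flooded copy back out of the port it arrived on.
        targets = [p for p in self.local_ports if p != ingress_port]
        # Only the designated forwarder may flood onto the MLAG interface.
        if self.is_designated:
            targets += [p for p in self.mlag_ports if p != ingress_port]
        return targets


# Switches Y and Y1 share link H3; only Y is designated for MLAG 32.
switch_y = Switch("Y", local_ports=["to_Z"], mlag_ports=["H3"], is_designated=True)
switch_y1 = Switch("Y1", local_ports=["to_Z1"], mlag_ports=["H3"])

print(switch_y.flood_targets("to_Z"))    # ['H3']
print(switch_y1.flood_targets("to_Z1"))  # []
```

In this sketch the MLAG receives exactly one flooded copy, from switch Y, even though switch Y1 is also attached to link H3.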
(18) Reference is now made to
Problem to be Solved
(19) Reference is now made to
(20) 1. The packet is not sent to the MLAG.
(21) 2. The packet is sent twice to the MLAG.
(22) In this example assume a packet is forwarded from link H5 to link H3 where the destination address is on the MLAG 32 and the stacked switch system 34 is in a process of learning a new MAC address in the MLAG 32. Switch Y is designated for flooding to the MLAG 32. Two learning orders are discussed:
(23) Learning Order 1. Spine switches learn MAC addresses first; then leaf switches learn: 1. Spine switch Z1 performs unicast forwarding to switch Y1, link 1 (the packet does not reach switch Z because it is not on the optimum path to the MLAG 32). 2. Leaf switch Y1 performs flooding. However, it does not forward the packet to the MLAG 32 because it is not the designated switch for flooding to this MLAG.
(24) Result: The packet does not reach the MLAG 32.
(25) Learning Order 2. Leaf switches learn MAC addresses first; then the spine switches learn: 1. Switch Z1 performs flooding. 2. Switch Y1 performs unicast forwarding to the MLAG 32 (copy #1). Leaf switch Y1, having learned the MAC addresses of the MLAG 32, does not need to flood packets to the MLAG 32; it can forward them directly to the MLAG 32 via link H3. 3. Spine switch Z receives the packet from switch Z1 and also performs flooding, because it has not yet learned the MAC addresses. 4. Leaf switch Y has learned the MAC address and performs unicast forwarding to the MLAG 32 (copy #2).
(26) Result: The packet is forwarded twice to the MLAG 32.
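The two failure cases can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not the patented implementation: the stack links (Z1 to Y1, Z1 to Z, Z to Y) are inferred from the two paths named in the text, the packet is assumed to enter at switch Z1, and the names `NEIGHBORS`, `NAIVE_FDB`, and `copies_at_mlag` are hypothetical. A switch that has learned the MAC unicasts along its own shortest egress; a switch that has not learned it floods, and only the designated switch Y floods onto the MLAG.

```python
from collections import deque

# Assumed stack links, chosen to match the paths Z1->Y1->H3 and Z1->Z->Y->H3.
NEIGHBORS = {"Z1": ["Y1", "Z"], "Z": ["Z1", "Y"], "Y": ["Z"], "Y1": ["Z1"]}
DESIGNATED = "Y"   # only this switch may flood onto the MLAG
MLAG = "MLAG"      # pseudo next hop standing for link H3

# Single-step learning: each switch learns its own shortest egress to the MAC.
NAIVE_FDB = {"Z1": "Y1", "Z": "Y", "Y": MLAG, "Y1": MLAG}


def copies_at_mlag(learned, entry="Z1"):
    """Count copies of one packet that reach the MLAG.

    `learned` is the set of switches whose FDB already holds the new MAC.
    """
    copies = 0
    queue = deque([(entry, None)])   # (switch, neighbor the packet came from)
    while queue:
        switch, ingress = queue.popleft()
        if switch in learned:        # unicast forwarding from the FDB
            hops = [NAIVE_FDB[switch]]
        else:                        # flood forwarding of an unknown MAC
            hops = [n for n in NEIGHBORS[switch] if n != ingress]
            if switch == DESIGNATED:
                hops.append(MLAG)
        for hop in hops:
            if hop == MLAG:
                copies += 1
            else:
                queue.append((hop, switch))
    return copies


spines, leaves = {"Z1", "Z"}, {"Y", "Y1"}
print(copies_at_mlag(learned=spines))  # Learning Order 1 mid-transition: 0
print(copies_at_mlag(learned=leaves))  # Learning Order 2 mid-transition: 2
```

Under these assumptions, the transitional state of Learning Order 1 loses the packet and that of Learning Order 2 duplicates it, matching the two results stated above.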
Solution
(27) According to an embodiment of the invention, the problem outlined above is solved by learning MAC addresses in two phases. The strategy is as follows:
(28) Phase 1: Learn the MAC of a single port of a single switch device on the MLAG in all devices (no local port). The single switch device must be the designated BUM device, e.g., in
(29) This procedure ensures that the new MAC on the MLAG will always receive one and only one instance of a packet during a transitional period in which not all the FDBs are fully synchronized to accommodate the new MAC.
(30) Reverting to the example
(31) Phase 1.
(32) Learning Order 1. Spine switches learn MAC addresses first; then leaf switches learn. The learning order is accomplished by a breadth-first search (BFS): Switch Z1 performs unicast forwarding to switch Z. Switch Z performs unicast forwarding to switch Y. Switch Y performs unicast forwarding. Since switch Y is the designated switch for the MLAG 32, traffic is forwarded to the MLAG 32.
(33) Result: A single copy of the packet is forwarded to the MLAG 32 from port 2 of switch Y via link H3.
(34) Learning Order 2. Leaf switches learn MAC addresses first; then the spine switches learn. Switch Z1 performs flooding. Switch Y1 performs unicast forwarding. However, the forwarding decision is to link H3. Therefore, the packet is source-filtered, i.e., it never transits link H3, since switch Y1 is not the designated MLAG device. Switch Z performs flooding. Switch Y performs unicast forwarding to the MLAG 32.
(35) Result: A single copy is forwarded to the MLAG 32.
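The Phase 1 behavior can be checked with a short Python sketch. It rests on several assumptions that are not in the text: the stack links are inferred from the paths named elsewhere in the description, the packet enters at switch Z1, and source filtering is modeled as dropping any copy whose egress would be the neighbor it arrived from. The names `phase1_fdb` and `copies_at_mlag` are hypothetical. In Phase 1 every switch that has been updated points toward the single MLAG port on designated switch Y.

```python
from collections import deque
from itertools import combinations

# Assumed stack links, chosen to match the paths Z1->Y1->H3 and Z1->Z->Y->H3.
NEIGHBORS = {"Z1": ["Y1", "Z"], "Z": ["Z1", "Y"], "Y": ["Z"], "Y1": ["Z1"]}
DESIGNATED = "Y"   # owner of the single port learned in Phase 1
MLAG = "MLAG"      # pseudo next hop standing for link H3


def phase1_fdb():
    """Next hop of every switch toward the designated switch's MLAG port."""
    fdb = {DESIGNATED: MLAG}
    queue = deque([DESIGNATED])
    while queue:                       # breadth-first walk from the designated switch
        switch = queue.popleft()
        for neighbor in NEIGHBORS[switch]:
            if neighbor not in fdb:
                fdb[neighbor] = switch
                queue.append(neighbor)
    return fdb


def copies_at_mlag(learned, fdb, entry="Z1"):
    """Count copies reaching the MLAG when only `learned` switches are updated."""
    copies = 0
    queue = deque([(entry, None)])
    while queue:
        switch, ingress = queue.popleft()
        if switch in learned:          # unicast toward the single Phase 1 port
            hops = [fdb[switch]]
        else:                          # flood; only the designated switch adds the MLAG
            hops = list(NEIGHBORS[switch])
            if switch == DESIGNATED:
                hops.append(MLAG)
        for hop in hops:
            if hop == ingress:         # source filtering: never send back to the ingress
                continue
            if hop == MLAG:
                copies += 1
            else:
                queue.append((hop, switch))
    return copies


fdb = phase1_fdb()
print(copies_at_mlag({"Z1", "Z"}, fdb))   # spines updated first: 1
print(copies_at_mlag({"Y", "Y1"}, fdb))   # leaves updated first: 1

# Every possible partially updated state delivers exactly one copy.
switches = sorted(NEIGHBORS)
print({copies_at_mlag(set(c), fdb)
       for r in range(len(switches) + 1)
       for c in combinations(switches, r)})  # {1}
```

In this model the update order does not matter during Phase 1, because both the flooded path and the unicast path converge on the same designated port.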
(36) Phase 2.
(37) Update the FDBs of the switches (X, Y, Z, X1, Y1, Z1) in order of distance from the MLAG 32.
(38) 1. Select the egress port of the switch for the route leading to the MLAG 32. This can be accomplished, for example, by executing a known load-balancing algorithm. For example, Port 1 would typically be selected for the switch Y1. Similarly, port 1 would probably be selected for the switch Z1, as the path Z1→Y1→H3 is shorter than the alternative path Z1→Z→Y→H3.
(39) 2. Update the FDB of the switch to indicate the selected egress port. Because of the FDB update order, a packet arriving from a higher level of the tree cannot be misdirected to a longer path than the path defined by the FDB of a lower-level switch.
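The Phase 2 update order can be sketched in Python as a breadth-first walk outward from the switches attached to the MLAG. The link layout is an assumption inferred from the two paths named above, switches X and X1 are omitted because their links are not given in the text, and the function name `phase2_plan` is hypothetical. The text permits a load-balancing algorithm for egress selection; this sketch substitutes a plain shortest-path choice.

```python
from collections import deque

# Assumed stack links, chosen to match the paths Z1->Y1->H3 and Z1->Z->Y->H3.
NEIGHBORS = {"Z1": ["Y1", "Z"], "Z": ["Z1", "Y"], "Y": ["Z"], "Y1": ["Z1"]}
MLAG_ATTACHED = ["Y", "Y1"]   # both switches share link H3 to MLAG 32


def phase2_plan():
    """Return [(switch, egress)] in the order the FDBs would be updated."""
    # Switches attached to the MLAG are closest, so they are updated first
    # and use their local MLAG link as egress.
    egress = {switch: "H3" for switch in MLAG_ATTACHED}
    order = list(MLAG_ATTACHED)
    queue = deque(MLAG_ATTACHED)
    while queue:
        switch = queue.popleft()
        for neighbor in NEIGHBORS[switch]:
            if neighbor not in egress:
                # First discovery is along a shortest path, so the neighbor's
                # egress is the already updated switch it was reached from.
                egress[neighbor] = switch
                order.append(neighbor)
                queue.append(neighbor)
    return [(switch, egress[switch]) for switch in order]


for switch, port in phase2_plan():
    print(switch, "->", port)
# Y -> H3
# Y1 -> H3
# Z -> Y
# Z1 -> Y1
```

Because each switch is updated only after the switch it forwards to, a packet leaving an updated switch always lands on a switch whose Phase 2 entry is already in place.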
(40) It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof that are not in the prior art, which would occur to persons skilled in the art upon reading the foregoing description.