METHOD FOR IDENTIFYING FLOW, AND APPARATUS
20230155947 · 2023-05-18
Inventors
Cpc classification
H04L47/25
ELECTRICITY
H04L47/31
ELECTRICITY
H04L47/32
ELECTRICITY
International classification
H04L47/31
ELECTRICITY
Abstract
The technology of this application relates to a method for identifying a flow, where a communication device counts a received packet in a first filtering manner. When determining that a quantity of the received packets of the target flow is greater than or equal to a first threshold, the communication device marks, starting from a packet that exceeds the first threshold, a packet whose count is a multiple of m. When determining that a quantity of the received packets of the target flow is greater than or equal to a second threshold, the communication device counts, in a second filtering manner, a packet that is continuously received. When determining that a quantity of the received packets of the target flow is greater than or equal to a third threshold λ.sub.4, the communication device determines that the target flow is an elephant flow, and marks a packet that is continuously received.
Claims
1. A method for identifying a flow, comprising: counting, by a communication device, received packets in a first filtering manner, wherein the counting starts from an initial packet of a target flow; and marking, by the communication device, when determining that a quantity of the received packets of the target flow is greater than or equal to a first threshold, starting from a packet that exceeds the first threshold, a packet whose count is a multiple of m, wherein m is an integer greater than or equal to 2; or when determining that the quantity of the received packets of the target flow is greater than or equal to a second threshold, counting, by the communication device in a second filtering manner, a packet that is continuously received; or when determining that the quantity of the received packets of the target flow is greater than or equal to a third threshold, determining, by the communication device, that the target flow is an elephant flow, and marking the packet that is continuously received.
2. The method according to claim 1, further comprising: when determining that network congestion occurs, updating, by the communication device, a forwarding rate of the marked packet, wherein an updated forwarding rate is less than a non-updated forwarding rate; or when determining that network congestion occurs, discarding, by the communication device, the marked packet in the target flow.
3. The method according to claim 1, wherein a data structure used in the first filtering manner is a Bloom filter; and counting the received packets in the first filtering manner comprises: mapping, by using a Hash function and a vector that correspond to the Bloom filter, the received packets of the target flow to a vector location corresponding to the target flow in a one-to-one manner, wherein a count of the corresponding vector location during each mapping increases by 1.
4. The method according to claim 1, wherein marking the packet whose count is a multiple of m comprises: when determining that the quantity of the received packets of the target flow is equal to the first threshold, determining, by the communication device, that the target flow is a candidate elephant flow; and marking, starting from the packet that exceeds the first threshold, a packet whose count is a multiple of 2.sup.p, wherein m=2.sup.p, and p is an integer greater than or equal to 1.
5. The method according to claim 1, wherein a data structure used in the second filtering manner is a sketch data structure; and counting, in the second filtering manner, the packet that is continuously received comprises: mapping, in the second filtering manner, the packet that is continuously received to w×d counters by using d Hash functions, and determining a minimum value in counting results of the w.sup.d counters as a quantity of currently received packets, wherein w and d are integers greater than 1.
6. The method according to claim 1, further comprising: updating, by the communication device, the third threshold based on accuracy of receiving a packet by the communication device and a false positive rate.
7. A communication device, comprising: a processor; and a memory configured to store computer readable instructions that, when executed by the processor, cause the communication device to: count, starting from an initial packet of a target flow, received packets in a first filtering manner; and when it is determined that a quantity of the received packets of the target flow is greater than or equal to a first threshold, mark, starting from a packet that exceeds the first threshold, a packet whose count is a multiple of m, wherein m is an integer greater than or equal to 2; or when it is determined that the quantity of the received packets of the target flow is greater than or equal to a second threshold, count, in a second filtering manner, a packet that is continuously received; or when it is determined that the quantity of the received packets of the target flow is greater than or equal to a third threshold, determine that the target flow is an elephant flow, and mark the packet that is continuously received.
8. The communication device according to claim 7, the communication device is further caused to: when it is determined that network congestion occurs, update a forwarding rate of the marked packet, wherein an updated forwarding rate is less than a non-updated forwarding rate; or when it is determined that network congestion occurs, discard the marked packet in the target flow.
9. The communication device according to claim 7, wherein a data structure used in the first filtering manner is a Bloom filter; and the communication device is further caused to: map, by using a Hash function and a vector that correspond to the Bloom filter, the received packets of the target flow to a vector location corresponding to the target flow in a one-to-one manner, wherein a count of the corresponding vector location during each mapping increases by 1.
10. The communication device according to claim 7, wherein the communication device is further caused to: when it is determined that the quantity of the received packets of the target flow is equal to the first threshold, determine that the target flow is a candidate elephant flow; and mark, starting from the packet that exceeds the first threshold, a packet whose count is a multiple of 2.sup.p, wherein m=2.sup.p, and p is an integer greater than or equal to 1.
11. The communication device according to claim 7, wherein a data structure used in the second filtering manner is a sketch data structure; and the communication device is further caused to: map, in the second filtering manner, the packet that is continuously received to w×d counters by using d Hash functions, and determine a minimum value in counting results of the w.sup.d counters as a quantity of currently received packets, wherein w and d are integers greater than 1.
12. The communication device according to claim 7, wherein the communication device is further caused to: update the third threshold based on accuracy of receiving a packet by the communication device and a false positive rate.
13. A non-transitory computer-readable storage medium having computer readable instructions that, when executed by a processor of a communication device, cause the communication device to provide execution comprising: counting, starting from an initial packet of a target flow, received packets in a first filtering manner; and marking, when determining that a quantity of the received packets of the target flow is greater than or equal to a first threshold, starting from a packet that exceeds the first threshold, a packet whose count is a multiple of m, wherein m is an integer greater than or equal to 2; or when determining that the quantity of the received packets of the target flow is greater than or equal to a second threshold, counting, in a second filtering manner, a packet that is continuously received; or when determining that the quantity of the received packets of the target flow is greater than or equal to a third threshold, determining, that the target flow is an elephant flow, and marking the packet that is continuously received.
14. The non-transitory computer-readable storage medium according to claim 13, wherein when determining that network congestion occurs, updating, by the communication device, a forwarding rate of the marked packet, wherein an updated forwarding rate is less than a non-updated forwarding rate; or when determining that network congestion occurs, discard, by the communication device, the marked packet in the target flow.
15. The non-transitory computer-readable storage medium according to claim 13, wherein a data structure used in the first filtering manner is a Bloom filter; and counting, starting from the initial packet of the target flow, the received packets in the first filtering manner comprises: mapping, by using a Hash function and a vector that correspond to the Bloom filter, the received packets of the target flow to a vector location corresponding to the target flow in a one-to-one manner, wherein a count of the corresponding vector location during each mapping increases by 1.
16. The non-transitory computer-readable storage medium according to claim 13, wherein marking the packet whose count is the multiple of m comprises: when determining that the quantity of the received packets of the target flow is equal to the first threshold, determining that the target flow is a candidate elephant flow; and marking, starting from the packet that exceeds the first threshold, a packet whose count is a multiple of 2.sup.p, wherein m=2.sup.p, and p is an integer greater than or equal to 1.
17. The non-transitory computer-readable storage medium according to claim 13, wherein a data structure used in the second filtering manner is a sketch data structure; and counting, in the second filtering manner, the packet that is continuously received comprises: mapping, in the second filtering manner, the packet that is continuously received to w×d counters by using d Hash functions, and determining a minimum value in counting results of the w.sup.d counters as a quantity of currently received packets, wherein w and d are integers greater than 1.
18. The non-transitory computer-readable storage medium according to claim 13, wherein the third threshold is updated based on accuracy of receiving a packet by the communication device and a false positive rate.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
DESCRIPTION OF EMBODIMENTS
[0044] For ease of understanding, example descriptions of some concepts related to embodiments of this application are provided for reference. Details are described as follows.
[0045] A data center is a central storage place for information and data, and may be used by a surrounding networked enterprise or organization to store, manage, and spread information.
[0046] A data center network connects resources in a data center.
[0047] A cache is a buffer area for data exchange.
[0048] A priority is a category used to determine relative importance of a fault, a problem, or a change.
[0049] A queue is a limited cache space for receiving a packet. Packets enter different queues based on priorities and are forwarded in sequence.
[0050] Congestion is extra inter-network or intra-network traffic that reduces network service efficiency.
[0051] A mice flow is a process of transmitting a small amount of data in a short time through a network link.
[0052] An elephant flow is a process of continuously transmitting a large amount of data through a network link.
[0053] An entry specification is a size of entry storage space.
[0054] A core switch is a backbone part of a network, referred to as a core layer. The core layer aims at high-speed forwarding and providing a reliable backbone transmission interface. Therefore, the core switch has higher reliability, performance, and throughput.
[0055] A Hash function is used to map a data set of any length to a fixed length.
[0056] A Hash algorithm is used to calculate any group of input data to obtain a fixed-length output and fills the output in a corresponding location in a Hash table. A same output is certainly obtained in case of a same input. There is a high probability that different outputs are obtained in case of different inputs.
[0057] A false positive rate is a ratio of marked data packets to data packets that should not be marked in total data packets.
[0058] A false negative rate is a ratio of unmarked data packets to data packets that should be marked in total data packets.
[0059] A Hash collision means that different input data is processed by a same Hash algorithm and mapped to a same Hash value.
[0060] Differentiated services code point (DSCP) uses, according to the quality of service (QoS) classification standard of a differentiated service (Diff-Serv), in a type of service (TOS) byte of an Internet Protocol (IP) header of each data packet, used 6 bits and unused 2 bits are used to distinguish a priority by using a code value. The DSCP is an identifier of the used 6 bits in the TOS byte, and is a combination of “IP precedence” and “type of service” fields. To use an old router that supports only “IP precedence”, a DSCP value is used. This is because the DSCP value is compatible with the “IP precedence” field. Each DSCP code value is mapped to a defined PHB (Per-Hop Behavior) identification code. A terminal device may identify traffic by entering the DSCP value.
[0061] The following describes the technical solutions in embodiments of this application with reference to the accompanying drawings in embodiments of this application. In description in embodiments of this application, “I” means “or” unless otherwise specified. For example, A/B may represent A or B. In this specification, “and/or” describes only an association relationship for describing associated objects and represents that at least three relationships may exist. For example, A and/or B may represent the following three cases: Only A exists, both A and B exist, and only B exists. In addition, in description in embodiments of this application, “a plurality of” means two or more.
[0062] The terms “first” and “second” mentioned below are merely intended for a purpose of description, and shall not be understood as an indication or implication of relative importance or implicit indication of a quantity of indicated technical features. Therefore, a feature limited by “first” or “second” may explicitly or implicitly include one or more such features. In description of the embodiments, unless otherwise stated, “a plurality of” means two or more than two.
[0063] Currently, to save entry space in a device, for example, a switch, and improve search efficiency, there are a plurality of elephant flow identification technologies based on a Hash algorithm. Specifically, in these Hash algorithms, a Hash function is used to map a received packet to a location of a vector, a count of a corresponding vector location is checked, and the count is used as a basis for determining an elephant flow. Based on different data structures used in the Hash algorithm, the elephant flow identification methods based on a Hash algorithm may be classified into a Bloom filter method and a sketch method.
[0064] An idea of identifying the elephant flow by using a Bloom filter is simple. In the algorithm, a Hash function and a vector may be used, and initial counts of vector locations are all 0. The Hash function maps packets of a flow to a vector location one by one, and a count increases by 1 during each mapping. Packets of a same flow is mapped to a same vector location. Therefore, when an accumulated count exceeds a selected threshold, the flow may be marked as an elephant flow. Therefore, a method for identifying the elephant flow through counting by using the Bloom filter is simple and easy to operate.
[0065] In an upgraded version of the Bloom filter, the packet of the flow may be mapped to a plurality of vectors by using a plurality of Hash functions and by using a multi-stage Bloom filter. The flow is identified as the elephant flow only when values of counters of corresponding vector locations at all stages are greater than a threshold.
[0066]
[0067] However, in some scenarios, a Hash collision easily occurs in a method for identifying an elephant flow through counting by using the Bloom filter. To be specific, two flows are mapped to a same vector location. In this way, when a mice flow is also mistakenly marked as the elephant flow, a packet of the mice flow is also mistakenly discarded. This may be understood as false positive misjudgment. Even if a quantity of stages is increased, the false positive misjudgment still exists. To be specific, a flow less than the threshold is misjudged as an elephant flow.
[0068] For another sketch method, in Count-min(CM) sketch, as shown in
[0069] Similar to the Bloom filter, an elephant flow identification technology based on the CM sketch still has the Hash collision, that is, false positive misjudgment exists. In addition, the CM sketch counting method is high in space complexity, and is complex during implementation.
[0070] Therefore, this application provides two simple and easy-to-operate elephant flow identification methods, which are applicable to an elephant and mice flow identification scenario, and in particular, can effectively resolve an elephant flow identification problem in a scenario in which entry specification resources of a network device, for example, a switch, are limited, to improve usage of storage space of the network device, for example, the switch.
[0071] In a first method, this application provides an adaptive elephant flow identification method based on two-stage filtering. For example, when an initial packet of a flow is received, first-stage filtering may use a Bloom filter, to mark a packet that exceeds a threshold and is a multiple of a value. In this case, it may be considered that the flow is a candidate elephant flow. When a count of a packet exceeds an upper limit, the packet enters a second-stage filtering CM sketch. When a count of the packets at a second stage exceeds a threshold. In this case, it is considered that the flow is an elephant flow. A packet that is continuously received and that is of the flow is marked.
[0072] In a second method, this application provides a bitmap elephant flow identification method. In this method, Hash calculation may be performed on a quintuple of a packet of a flow. A Hash value corresponds to a bit, and a probability value is set to 0 or 1 through probability calculation. If bitmap values are all 1, the flow may be marked as an elephant flow. As shown in
[0073] The two methods may be applied to a plurality of scenarios. For example, in a scenario 1, when a cache at a link egress in a network is fully occupied, in the conventional technology, regardless of whether the flow is an elephant flow or a mice flow, newly arriving packets of a flow are discarded together. However, by using the elephant flow identification method in this application, a packet belonging to the elephant flow may be identified, and a packet (color) that is discarded with a high probability may be marked. In this way, the marked packet in the elephant flow may be discarded, and a case in which a packet of the mice flow is discarded does not occur.
[0074] As shown in
[0075] In a scenario 2, to reduce a packet loss rate of a network device, rate limiting needs to be performed on a flow of the network device. In this application, an elephant flow identification technology is used, so that a rate limiting policy can be applied only to the elephant flow, to ensure that the packet of the mice flow is not limited in rate and can be normally forwarded.
[0076] As shown in
[0077] In addition to the two scenarios, the method for identifying the elephant flow in this application may be further applied to another scenario.
[0078] The method for identifying the elephant flow in this application may be applied to the network device, for example, the switch. The switch is used as an example. As shown in
[0079] The following first describes the first method in this application.
Embodiment 1
[0080] An embodiment of this application provides a method for identifying a flow. As shown in
[0081] 701: A communication device counts, starting from an initial packet of a target flow, a received packet in a first filtering manner.
[0082] In this application, a communication device switch is used as an example for description.
[0083] It is assumed that the target flow is denoted as a flow f. When receiving the initial packet of the flow f, that is, the 1.sup.st packet, the switch may count, starting from the 1.sup.st packet, the packet in the first filtering manner. Counting herein may be understood as first-layer filtering.
[0084] In some embodiments, a data structure used in the first filtering manner is a Bloom filter, namely, the Bloom filter mentioned above.
[0085] Counting a received packet in a first filtering manner may include:
[0086] a Hash function and a vector that correspond to the Bloom filter are used to map the received packet of the target flow to a vector location corresponding to the target flow in a one-to-one manner, where a count of the corresponding vector location during each mapping increases by 1.
[0087] Herein, a single-stage Bloom filter may be used for counting, or a multi-stage Bloom filter shown in
[0088] It is assumed that when the single-stage Bloom filter is used for counting, if there is a counting vector A including w.sub.1 counters, a Hash function h may evenly map each packet of the flow f to [0, w.sub.1−1] counters based on a flow ID, and a count increases by 1 during each mapping. Then, a count obtained through each mapping is compared with the first threshold λ.sub.1.
[0089] As shown in
[0090] 702: When determining that a quantity of the received packets of the target flow is greater than or equal to a first threshold, the communication device marks, starting from a packet that exceeds the first threshold, a packet whose count is a multiple of m, namely, (λ2), where m is an integer greater than or equal to 2.
[0091] In some embodiments, when determining that the quantity of the received packets of the target flow is greater than or equal to the first threshold λ.sub.1, the communication device determines that the target flow is a candidate elephant flow.
[0092] It is assumed that m is a power of 2, namely, 2.sup.p, the switch may mark, starting from the packet that exceeds the first threshold, the packet whose count is a multiple of 2.sup.p, where m=2.sup.p, and p is an integer greater than or equal to 1.
[0093] For example, it is assumed that λ.sub.1=100, when receiving the 100.sup.th packet of the flow f, the switch determines that the flow f is the candidate elephant flow. If the switch continues to receive the packet of the flow f, that is, [h(f)]≥λ.sub.1, the switch may mark a packet whose current count is a multiple of 2.sup.p in the received packets. For example, when p=3, packets whose counts are 104, 112, 120, 128, and so on may be marked, or it may be understood that the switch marks the received 104.sup.th packet, 112.sup.th packet, 120.sup.th packet, 128.sup.th packet, and so on. If the switch receives the packets based on sequence numbers of the packets, it may also be understood that the switch marks the packets whose sequence numbers are 104, 112, 120, 128, and so on.
[0094] As shown in
[0095] 703: When determining that a quantity of the received packets of the target flow is greater than or equal to a second threshold, the communication device counts, in a second filtering manner, a packet that is continuously received.
[0096] In some embodiments, if a quantity of the packets received by the switch is greater than or equal to the second threshold λ.sub.3, second-layer filtering is performed, and the second filtering manner is used in second-layer filtering.
[0097] For example, a data structure used in the second filtering manner is a sketch data structure, and may be understood as the CM sketch mentioned above.
[0098] Therefore, Counting, in a second filtering manner, a packet that is continuously received may include:
[0099] the second filtering manner is used to map the packet that is continuously received to w×d counters by using d Hash functions, and a minimum value in counting results of the wd counters is determined as a quantity of currently received packets, where w and d are integers greater than 1.
[0100] For example, it is assumed that the second threshold λ.sub.3 is 500, the CM sketch herein is denoted as CM sketch C, and includes d counting vectors. Each counting vector C.sub.i (C.sub.1, C.sub.2, . . . , or C.sub.d) includes w.sub.2 counters. A packet, starting from the 501.sup.st packet, is evenly mapped to [0, w.sub.2−1] based on the flow ID of the flow f by using a Hash function group g.sub.i. D=1, 2, . . . , and w.sub.2≤w.sub.1. As shown in
[0101] 704: When determining that a quantity of the received packets of the target flow is greater than or equal to a third threshold, the communication device determines that the target flow is an elephant flow, and marks a packet that is continuously received. Then, Step 705 or Step 706 is performed.
[0102] If a count of the packet that is continuously received by the switch starting from the 501.sup.st packet is greater than or equal to the third threshold λ.sub.4, it may be determined that the flow f is an elephant flow.
[0103] For example, it is assumed that when the switch starts to perform the second-layer filtering, a minimum value in calculation results of each packet, starting from the 501.sup.st packet, relative to d vectors C may be compared with the third threshold λ.sub.4. If a minimum value obtained through calculation for is min(C.sub.1[g.sub.1(f)], C.sub.2[g.sub.2(f), . . . , C.sub.d[g.sub.d(f)]], when min(C.sub.1 [g.sub.1(f)], C.sub.2[g.sub.2(f), . . . , C.sub.d[g.sub.d(f)])≥λ.sub.4, a packet subsequently continuously received after the packet may be marked, as shown in
[0104] For example, it is assumed that starting from the 501.sup.st packet, when the 501.sup.st packet is considered as the 1.sup.st received packet, the packet that is continuously received is counted by using the CM sketch. In calculation results of a packet, if min(C.sub.1[g.sub.1(f)], C.sub.2[g.sub.2(f), . . . , C.sub.d[g.sub.d(f)]]≥λ.sub.4, and λ.sub.4=50, a packet that is continuously received may be marked starting from the 50.sup.th packet. Actually, it may also be understood that the packet that is received may be continuously marked starting from the 550.sup.th packet.
[0105] 705: When determining that network congestion occurs, the communication device updates a forwarding rate of a marked packet, where an updated forwarding rate is less than an un-updated forwarding rate.
[0106] In some embodiments, when the switch determines that the network congestion occurs, for example, storage space of a cache of the switch has no remaining space, the switch may limit a rate of a marked packet in the determined elephant flow. Rate limiting herein may be understood as reducing a rate at which the marked packet in the elephant flow is enqueued. A principle is similar to that in
[0107] 706: When determining that network congestion occurs, the communication device discards the marked packet in the target flow.
[0108] In some embodiments, when the switch determines that the network congestion occurs, for example, storage space of a cache of the switch has no remaining space, the switch may discard an elephant flow packet, namely, the marked packet in the flow f, to relieve pressure of the cache in the switch and improve a success rate of transmission of a mice flow.
[0109] In this way, in this application, a packet of the elephant flow can be identified by using a two-layer filtering method, and some packets in the elephant flow can be marked. In this way, rate limiting or discarding may be performed on the marked packet, and an unmarked packet may be forwarded normally. When a packet enters second-stage filtering and reaches a third threshold set in second-stage filtering, the packet is continuously marked. A larger flow indicates a larger total quantity of marked packets in a total quantity of packets in the flow. In this way, the elephant flow packet is sampled, to improve usage of entry storage space of the switch, and hierarchical processing can reduce a Hash collision. Further, accuracy of identifying the elephant flow packet can be improved, and a success rate of forwarding the mice flow is improved.
[0110] To describe an effect brought by the two-layer filtering method provided in this application, a marking ratio test model is provided: A fixed total quantity of flows is 400, and a percentage of a total byte count of a marked packet to a total byte count of a total quantity of packets is calculated by changing a byte count (for example, 80 KB, 200 KB, or 10 MB) for sending each of flows with different byte counts.
[0111] It is assumed that the foregoing parameters are set as follows: λ.sub.1=100, λ.sub.2=8, λ.sub.3=256, and λ.sub.4=50.
[0112] An aging rate of a counter is that a count decreases by 1 every 1 ms.
[0113] A DSCP of the unmarked packet is CS1, and a DSCP of the marked packet is AF13.
[0114] Table 1 shows test results.
TABLE-US-00001 TABLE 1 Byte count for Byte count Byte count Percentage sending a for the for the of the single flow CS1 packet AF13 packet AF13 packet 80 KB 15866256 0 0.0% 200 KB 28249596 885868 3.0% 10 MB 41836422 40623166 97.1%
[0115] It can be found from Table 1 that: (1) the mice flow is not marked, for example, there is no marked packet in the mice flow whose byte count is 80 KB; and (2) for a flow that exceeds 100 KB, a larger byte count of a sent flow indicates a higher ratio of marked packets. For a larger flow, for example, a 10 MB flow, a probability of marked packets is close to 100%.
[0116] Therefore, according to the method for identifying a flow provided in this application, an elephant flow identification method based on two-layer filtering may bypass the mice flow, and mark only the packet in the elephant flow. Actually, the rate limiting or discarding is performed only on the marked packet. If a total byte count of a flow is less than a threshold set in the first-stage filtering, a maximum of ⅛ of the packets are marked, and most packets are normally forwarded at an original rate. When a packet enters second-stage filtering and reaches a threshold λ.sub.4 set in second-stage filtering, the packet is continuously marked. A larger flow indicates a larger total quantity of marked packets in of a total byte count of packets.
[0117] In addition, this application further provides an adaptive parameter adjustment method, to adaptively adjust a parameter based on accuracy and a false positive rate of a packet. The parameter herein may be, for example, the foregoing λ.sub.1, λ.sub.2, λ.sub.3, and λ.sub.4.
[0118] This application provides two adaptive parameter adjustment solutions herein. In a possible manner, the parameters λ.sub.1, λ.sub.2, and λ.sub.3 may be fixed, and the parameter 4 may be adjusted. In this case, this embodiment of this application may further include: 707 (not shown in
[0119] Calculation of the accuracy may be determined based on the false positive rate f.sub.wrong and a false negative rate f.sub.loss of the packet.
[0120] For example, the switch may perform packet capture and counting on a flow, and record a total quantity of packets of the elephant flow as N.sub.e, and a total quantity of packets of the mice flow as N.sub.m. A quantity of the marked packets in the elephant flow is N.sub.e′, and a quantity of marked packets in the mice flow is N.sub.m. The false positive rate f.sub.wrong and a false negative rate f.sub.loss may be obtained based on a definition of the false positive rate, namely, a ratio of marked data packets to data packets that should not be marked in total data packets, and a definition of the false negative rate, namely, a ratio of unmarked data packets to data packets that should be marked in the total data packets:
f.sub.wrong=N.sub.m′/N.sub.m, and f.sub.loss=(N.sub.e−N.sub.e′)/N.sub.e.
[0121] Then, based on a calculation manner of the accuracy f.sub.right, namely, 1−(false positive rate+false negative rate)/2, the following may be obtained:
f.sub.right=1(f.sub.wrong+f.sub.loss)/2.
[0122] A false negative may be understood as that an elephant flow packet that should be marked is not marked. To be specific, the false negative rate=a quantity of unmarked elephant flow packets/a total quantity of elephant flow packets.
[0123] A false positive may be understood as that a mice flow packet in the mice flow is mistakenly marked as the elephant flow packet. To be specific, a false positive rate=a quantity of the mice flow packets that are mistakenly marked as the elephant flow packet/a total quantity of mice flow packets.
[0124] Based on this, it may be understood that in this application, λ.sub.1 is a threshold used to mark a packet, and in case of being greater than λ.sub.1, a flow may be classified as an elephant flow candidate.)
[0125] λ.sub.2 may be understood as a sampling rate of packets and is set to a power of 2. A larger value of λ.sub.2 indicates a lower false positive rate and a higher false negative rate.
[0126] When a quantity of packets exceeds λ.sub.3, the packet needs to enter second-layer filtering. A larger value of λ.sub.3 indicates a lower false positive rate and a higher false negative rate.
[0127] λ.sub.4 may be understood as a threshold used to mark a packet. A larger value of λ.sub.4 indicates a lower false positive rate and a higher false negative rate.
[0128] Therefore, this application provides two solutions for adjusting the third threshold λ.sub.4.
[0129] Solution 1: Initialize parameters: λ.sub.1=100, λ.sub.2=8, λ.sub.3=256, and λ.sub.4=n.
[0130] (1) It is assumed that a packet of a flow f of a service is known. f.sub.right is calculated after sampling and packet capturing are performed on the packet of the flow f.
[0131] (2) A threshold ε.sub.f∈(½−½λ.sub.2, ½+½λ.sub.2) is given.
[0132] If f.sub.right≥ε.sub.f, or λ.sub.4=n/2, λ.sub.4 remains unchanged, the adjustment ends, and λ.sub.4 is returned.
[0133] If f.sub.right<ε.sub.f, a value of λ.sub.4 may be updated to: λ.sub.4=m(n/2,(1−δ.sub.f)×λ.sub.4), where δ.sub.f∈(0,0.5).
[0134] Then, (1) continues to be performed. In other words, if f.sub.right<ε.sub.f, packet sampling, identification, and packet capture processing may be performed again by using a new parameter, to calculate a new f.sub.right, so as to determine λ.sub.4 that is updated again. When f.sub.right≥ε.sub.f, or λ.sub.4=n/2, the adjustment ends.
[0135] Solution 2: Initialize parameters: λ.sub.1=100, λ.sub.2=8, λ.sub.3=256, and λ.sub.4=n.
[0136] (1) It is assumed that a packet of a flow f of a service is known. f.sub.right is calculated after sampling and packet capturing are performed on the packet of the flow f.
[0137] (2) A threshold ε.sub.f∈(0,½λ.sub.2) is given.
[0138] If f.sub.right≤ε.sub.f, or λ.sub.4=n/2, λ.sub.4 remains unchanged, the adjustment ends, and λ.sub.4 is returned.
[0139] If f.sub.right>ε.sub.f, a value of λ.sub.4 may be updated to: λ.sub.4=m(n/2, (1−δ.sub.f)×λ.sub.4), where δ.sub.f∈(0,0.5).
[0140] Then, (1) continues to be performed. In other words, if f.sub.right<ε.sub.f, packet sampling, identification, and packet capture processing may be performed again by using a new parameter, to calculate a new f.sub.right, so as to determine λ.sub.4 that is updated again. When f.sub.right≥ε.sub.f, or λ.sub.4=n/2, the adjustment ends.
[0141] In some embodiments, in this application, alternatively, refer to Solution 1 or Solution 2, different training samples are input to the switch to train the parameter λ.sub.4, and an average value of all training results is used as a final λ.sub.4 value.
[0142] The following further describes the second method provided in this application, namely, the bitmap elephant flow identification method.
Embodiment 2
[0143] It should be understood that, in Embodiment 1 of this application, the elephant flow and the mice flow are identified in a manner of counting the packet of the target flow. The counter or the register in the switch generally performs counting by using a 16-bit or 32-bit data structure. This still occupies a specific amount of storage space of the register. It may be understood that, in this application, for a target flow, only whether the target flow is the elephant flow or the mice flow needs to be queried, and a specific flow size is not critical. Therefore, theoretically, whether the target flow is the elephant flow or the mice flow may be identified by using 1 bit. For example, 0 represents the mice flow, and 1 represents the elephant flow. Because the quantity of the packets of the target flow cannot be counted, a manner in which the 1 bit changes from 0 to 1 may be probability marking. To be specific, when each packet of the flow arrives at the communication device, whether the packet needs to be marked as the elephant flow packet may be determined according to a specific probability p. Statistically, if the flow is large enough and the quantity of the packets arriving at the communication device is large enough, the flow is certainly identified as the elephant flow.
[0144] Based on this theory, an embodiment of this application provides a method for identifying a flow. As shown in
[0145] 901: A communication device performs, starting from an initial packet of a target flow, N Hash calculations based on a 5-tuple in each received packet, to determine N bits in a bit sequence that are of results of the N Hash calculations performed on each packet, where the bit sequence is obtained through integration based on storage space of the communication device.
[0146] The bit sequence herein is obtained through integration based on the storage space of the communication device. It may be understood that the communication device integrates all storage space of the communication device into a continuous bit sequence.
[0147] For generating the continuous bit sequence, this application provides a generation method, but is not limited to the method.
[0148] In the method, it is assumed that the storage space includes X data structures, each data structure is Y bit, and a target is to generate a bit sequence of an X×Y size and may be used for operation query. For example, there is an entry space of a 4 K specification in the communication device, and an entry in the entry space occupies 64 bits. In this case, X=4,000 and Y=64 herein, to obtain a 4,000×64 continuous bit sequence.
[0149] Based on this, that a communication device performs N Hash calculations based on a 5-tuple in each received packet may be understood as follows: N Hash functions are preset in the communication device, for example, N is 3, 4, or 5, an input (key) of each Hash function is the 5-tuple in the packet, and a calculation result (value) obtained after the Hash calculation may be understood as a location in the bit sequence. In this way, N bits are obtained.
[0150] When receiving each packet in the target flow, the communication device may perform the N Hash calculations on the packet based on the N Hash functions, to obtain N Hash calculation results. Then, the communication device may determine N different bits in the bit sequence based on the N Hash calculation results.
[0151] In some embodiments, when that the N bits in the bit sequence that are of the results of the N Hash calculations performed on each packet is determined, if a value of the Hash calculation result obtained for each packet may be divided into two parts, a part of the value is mapped to 1 to X, to determine a data structure, and the other part of the value is mapped to 1 to Y, to determine a bit in the data structure.
[0152] The 5-tuple includes a source IP address, a destination IP address, a source port number, a destination port number, and a protocol number that are carried in the packet.
[0153] Corresponding to Step 901, refer to
[0154] 902: The communication device determines, based on probability calculation manners respectively corresponding to the N bits that are obtained by performing Hash mapping on each packet, bit values respectively corresponding to the N bits.
[0155] When the N bits that are of each packet and that correspond to the bit sequence are determined, it may be determined, based on an independent probability P of each bit, that the bit value of the bit is 0 or 1. If a value of any one of the N bits is already 1, the communication device may not perform any operation on the value of the bit. If a value of any one of the N bit is 0, the communication device may update the value of the bit from 0 to 1.
[0156] For example, if it is determined, in a probability calculation manner, that a value of the y.sup.th bit in the data structure needs to be updated from 0 to 1 (1≤y≤Y), the data structure may be marked as 2 raised to the power of y. For example, for the foregoing 64-bit entry, if a value of the 15.sup.th bit needs to be updated from 0 to 1, the entry needs to increases by 2.sup.15=32768 to meet a requirement.
[0157] Based on a process in
[0158] Then, whether the value obtained after probability calculation is 1 continues to be determined, that is, C.sub.i[g.sub.i (f)]=1? continues to be performed, if the value obtained after probability calculation is still 0, the packet is not marked. If the value obtained after probability calculation is still 1, calculation continues to be performed according to a next Hash function, that is, i=i+1 is performed.
[0159] In this case, i<m+1? is determined and indicates whether the N.sup.th Hash function has been executed, and m represents N. If the N.sup.th Hash function has been executed, if i<m+1, calculation of a next Hash function continues to be performed. If i≥m+1, and values of N bits are all 1, the packet is marked as an elephant flow packet.
[0160] 903: The communication device determines that bit values of N bits corresponding to a first packet of the target flow are all 1, and marks the first packet as the elephant flow packet. When determining that bit values of N bits corresponding to a second packet of the target flow are not all 1, the communication device does not mark the second packet.
[0161] For example, when N is 3, if values of three bits that are obtained after three Hash calculations are performed according to a 5-tuple of the first packet are all 1, the first packet is the elephant flow packet. Alternatively, if values of three bits that are obtained after three Hash calculations are performed according to a 5-tuple of the second packet are not all 1, the second packet is not marked.
[0162] 904: When determining that network congestion occurs, the communication device updates a forwarding rate of the first packet, where an updated forwarding rate is less than an un-updated forwarding rate. Alternatively, when determining that network congestion occurs, the communication device discards the first packet.
[0163] An implementation of this step is similar to that of Step 706.
[0164] Therefore, in this application, when the packet of the flow is marked in a bit sequence manner, the continuous bit sequence is constructed, an operation is performed on the bit sequence in an existing data structure, and elephant flow and mice flow identification is performed on the packet in a probability marking and multiple Hash manner. Compared with that in use of a Hash table, the entry space required for storage is more efficiently used, an upper limit of a quantity of flows that can be processed increases, and the efficiency of identifying the elephant flow and the mice flow is better.
[0165] To further describe beneficial effects brought by Embodiment 2, this application provides a marking ratio test model for description.
[0166] In this model, it is assumed that there are 400 elephant flows and 8,000 mice flows. A bit sequence in the communication device is 2 K×16 s bits=32,000 bits.
[0167] It is assumed that a probability P used to mark a packet is 0.005, and a quantity N of Hash times is 5. The communication device may count all packets in a received flow, and test a ratio of a packet that is marked as the elephant flow packet to the packets. Table 2 shows test results.
TABLE-US-00002 TABLE 2 Quantity of Quantity of packets Elephant received marked as an flow marking Flow class packets elephant flow ratio Elephant flow 26211 21628 82.51% 1 KB to 100 KB 13741 40 0.29% random mice flow
[0168] It can be learned from Table 2 that, in the elephant flow, a marking ratio of the elephant flow packets is greater than 80%, and in the mice flow, a marking ratio of the elephant flow packets is less than 0.3%.
[0169] Therefore, in this application, the elephant and mice flows may be distinguished in a bit sequence manner in case of a limited resource. In the identification of the 8,400 flows, space of only 32,000 bits is used. On average, space less than 4 bits is used for each flow, and the identification effect is good.
[0170] It may be understood that, to implement the foregoing functions, the communication device includes corresponding hardware and/or software modules for performing the functions. Algorithm steps in the examples described with reference to embodiments disclosed in this specification can be implemented by hardware or a combination of hardware and computer software in this application. Whether a function is performed by hardware or hardware driven by computer software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application with reference to embodiments, but it should not be considered that the implementation goes beyond the scope of this application.
[0171] In this embodiment, the communication device may be divided into function modules based on the foregoing method examples. For example, each function module may be obtained through division based on each corresponding function, or two or more functions may be integrated into one processing module. The foregoing integrated module may be implemented in a form of hardware. It should be noted that, in this embodiment, division into the modules is an example, and is merely a logical function division. In actual implementation, another division manner may be used.
[0172] When each function module is obtained through division based on each corresponding function,
[0173] The counting unit 1101 may be configured to support the communication device 110 in performing Step 701, Step 703, and the like, and/or another process of the technology described in this specification.
[0174] The marking unit 1102 may be configured to support the communication device 110 in performing Step 702, Step 704, and the like, and/or another process of the technology described in this specification.
[0175] The congestion processing unit 1103 may be configured to support the communication device 110 in performing Steps 705, 706, and the like, and/or another process of the technology described in this specification.
[0176] The updating unit 1104 may be configured to support the communication device 110 in performing Step 707 and the like, and/or another process of the technology described in this specification.
[0177] It should be noted that all related content of the steps in the foregoing method embodiments may be cited in function description of corresponding function modules.
[0178] The communication device 110 provided in this embodiment is configured to perform the foregoing method for identifying a flow, and therefore can achieve a same effect as the foregoing implementation method.
[0179] When an integrated unit is used, the communication device 110 may include a processing module, a storage module, and a communication module. The processing module may be configured to control and manage an action of the communication device 110, for example, may be configured to support the communication device 110 in performing the steps performed by the counting unit 1101, the marking unit 1102, the congestion processing unit 1103, and the updating unit 1104. The storage module may be configured to support the communication device 110 in storing program code, data, and the like. The communication module may be configured to support the communication device 110 in communicating with another device, for example, communicating with a terminal device, and receiving a packet from the terminal device.
[0180] The processing module may be a processor or a controller. The processor may implement or execute various example logical blocks, modules, and circuits described with reference to content disclosed in this application. The processor may alternatively be a combination of processors implementing a computing function, for example, a combination of one or more microprocessors or a combination of a digital signal processor (DSP) and a microprocessor. The storage module may be a memory. The communication module may be specifically a device that interacts with another electronic device, such as a radio frequency circuit, a Bluetooth chip, or a Wi-Fi chip.
[0181] In an embodiment, when the processing module is a processor, the storage module is a memory, and the communication module is a transceiver, the electronic device in this embodiment may be a switch having a structure shown in
[0182] When each function module is obtained through division based on each corresponding function,
[0183] The bit location determining unit 1301 may be configured to support the communication device 130 in performing Step 901, and/or another process of the technology described in this specification.
[0184] The bit value determining unit 1302 may be configured to support the communication device 130 in performing Step 902 and the like, and/or another process of the technology described in this specification.
[0185] The marking unit 1303 may be configured to support the communication device 130 in performing the Step 903 and the like, and/or another process of the technology described in this specification.
[0186] The congestion processing unit 1304 may be configured to support the communication device 130 in performing Step 904 and the like, and/or another process of the technology described in this specification.
[0187] It should be noted that all related content of the steps in the foregoing method embodiments may be cited in function description of corresponding function modules.
[0188] The communication device 130 provided in this embodiment is configured to perform the foregoing method for identifying a flow, and therefore can achieve a same effect as the foregoing implementation method.
[0189] When an integrated unit is used, the communication device 130 may include a processing module, a storage module, and a communication module. The processing module may be configured to control and manage an action of the communication device 130, for example, may be configured to support the communication device 130 in performing the steps performed by the bit location determining unit 1301, the bit value determining unit 1302, the marking unit 1303, and the congestion processing unit 1304. The storage module may be configured to support the communication device 130 in storing program code, data, and the like. The communication module may be configured to support the communication device 130 in communicating with another device, for example, communicating with a terminal device, and receiving a packet from the terminal device.
[0190] The processing module may be a processor or a controller. The processor may implement or execute various example logical blocks, modules, and circuits described with reference to content disclosed in this application. The processor may alternatively be a combination of processors implementing a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The storage module may be a memory. The communication module may be specifically a device that interacts with another electronic device, such as a radio frequency circuit, a Bluetooth chip, or a Wi-Fi chip.
[0191] In an embodiment, when the processing module is a processor, the storage module is a memory, and the communication module is a transceiver, the communication device 130 in this embodiment may also be a switch having a structure shown in
[0192] An embodiment of this application further provides an electronic device, including one or more processors and one or more memories. The one or more memories are coupled to the one or more processors. The one or more memories are configured to store computer program code. The computer program code includes computer instructions. When the one or more processors execute the computer instructions, the electronic device is enabled to perform the foregoing related method steps, to implement the method for identifying a flow in the foregoing embodiments.
[0193] An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores computer instructions. When the computer instructions are run on an electronic device, the electronic device is enabled to perform the foregoing related method steps, to implement the method for identifying a flow in the foregoing embodiments.
[0194] An embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer is enabled to perform the foregoing related steps, to implement the method for identifying a flow in the foregoing embodiments performed by an electronic device.
[0195] In addition, an embodiment of this application further provides an apparatus. The apparatus may be specifically a chip, a component, or a module. The apparatus may include a processor and a memory that are connected. The memory is configured to store computer-executable instructions. When the apparatus runs, the processor may execute the computer-executable instructions stored in the memory, to enable the chip to perform the method for identifying a flow in the foregoing method embodiments performed by an electronic device.
[0196] The communication device, the computer-readable storage medium, the computer program product, or the chip provided in the embodiments is configured to perform the corresponding method provided above. Therefore, for beneficial effects that can be achieved by the communication device, the computer-readable storage medium, the computer program product, or the chip, refer to the beneficial effects in the corresponding method provided above.
[0197] Based on the description of the foregoing implementations, it may be understood by a person skilled in the art that, for ease and brevity of description, division of the foregoing function modules is merely used as an example for description. In actual application, the foregoing functions may be allocated to different function modules for implementation according to a requirement, that is, an internal structure of the apparatus is divided into different function modules, to implement all or some of the functions described above.
[0198] In the several embodiments provided in this application, it should be understood that the disclosed apparatuses and methods may be implemented in other manners. For example, the described apparatus embodiment is merely an example. For example, division into modules or units is merely logical function division and may be other division during actual implementation. For example, a plurality of units or components may be combined or integrated into another apparatus, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electrical, mechanical, or other forms.
[0199] The units described as separate parts may or may not be physically separate, and parts displayed as units may be one or more physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of embodiments.
[0200] In addition, functional units in embodiments of this application may be integrated into one processing unit, each of the units may exist alone physically, or two or more units may be integrated into one unit. The foregoing integrated unit may be implemented in a form of hardware, or may be implemented in a form of a software functional unit.
[0201] When being implemented in the form of the software functional unit and sold or used as an independent product, the integrated unit may be stored in a readable storage medium. Based on such an understanding, the technical solutions in embodiments of this application essentially, or the part contributing to the conventional technology, or all or some of the technical solutions may be implemented in a form of a software product. The software product is stored in a storage medium, and includes a plurality of instructions for instructing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the method in embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
[0202] The foregoing content is merely specific implementations of this application, but is not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.