Data transmission method and device
11140082 · 2021-10-05
Assignee
Inventors
Cpc classification
H04L2101/663
ELECTRICITY
H04L47/196
ELECTRICITY
International classification
Abstract
This application provides a data transmission method. The method includes: calculating a first duration based on at least one to-be-sent data flow and a first time interval, where the first time interval is a preset value, and different data flows in the at least one to-be-sent data flow have different 5-tuples; and sending a first data flow, where the first data flow belongs to the at least one to-be-sent data flow; where a first set of packets of the first data flow are sent in a first time period, a second set of packets of the first data flow are sent in a second time period following a second time interval, a duration of the first time period and a duration of the second time period are equal to the first duration, and the second time interval is greater than or equal to the first time interval.
Claims
1. A data transmission method, comprising: calculating a first duration based on at least one to-be-sent data flow and a first time interval, wherein the first time interval is a preset value; and sending a first data flow included in the at least one to-be-sent data flow; wherein a first set of packets of the first data flow are sent in a first time period, a second set of packets of the first data flow are sent in a second time period following a second time interval, a duration of the first time period and a duration of the second time period are both equal to the first duration, and the second time interval is greater than or equal to the first time interval.
2. The method according to claim 1, wherein the at least one to-be-sent data flow includes multiple data flows, the second time interval is immediately after the first time period, and the method further comprises: sending, in the second time interval, at least one packet of one or more data flows in the at least one to-be-sent data flow other than the first data flow.
3. The method according to claim 2, wherein the at least one packet sent in the second time interval includes multiple packets belonging to different data flows in the at least one to-be-sent data flow.
4. The method according to claim 1, wherein the method further comprises: setting different user datagram protocol (UDP) source port numbers respectively for the first set of packets sent in the first time period and the second set of packets sent in the second time period.
5. The method according to claim 1, wherein the method further comprises: setting a same UDP source port number for packets sent in one time period having a duration equal to the first duration.
6. The method according to claim 1, wherein the at least one to-be-sent data flow are a plurality of to-be-sent data flows and the calculation of the first duration based on the at least one to-be-sent data flow and the first time interval comprises: determining a quantity of the plurality of to-be-sent data flows, wherein the first duration≥the first time interval/(the quantity of the plurality of to-be-sent data flows−1).
7. The method according to claim 6, wherein the determination of the quantity of the plurality of to-be-sent data flows comprises: after a data flow is sent or added, updating the quantity of the plurality of to-be-sent data flows.
8. The method according to claim 1, wherein the method further comprises: calculating, based on the first duration, a quantity of packets to be sent in one time period having a duration equal to the first duration; and wherein the sending a first data flow comprises: continuously sending the first set of packets of the first data flow, wherein a quantity of the first set of packets is the same as the calculated quantity of packets, and continuously sending the second set of packets of the first data flow after the second time interval.
9. The method according to claim 1, wherein the at least one to-be-sent data flow includes a RoCEv2 flow carried over a converged Ethernet.
10. A data transmission device, comprising: at least one processor, configured to calculate a first duration based on at least one to-be-sent data flow and a first time interval, wherein the first time interval is a preset value; and a transmitter, configured to send a first data flow included in the at least one to-be-sent data flow; wherein a first set of packets of the first data flow are sent in a first time period, a second set of packets of the first data flow are sent in a second time period following a second time interval, a duration of the first time period and a duration of the second time period are both equal to the first duration, and the second time interval is greater than or equal to the first time interval.
11. The device according to claim 10, wherein the second time interval is immediately after the first time period, and the at least one to-be-sent data flow includes multiple data flows and the transmitter is further configured to send, in the second time interval, at least one packet of one or more data flows in the at least one to-be-sent data flow other than the first data flow.
12. The device according to claim 11, wherein the at least one to-be-sent data flow includes multiple data flows and the at least one packet sent in the second time interval includes multiple packets belonging to different data flows in the at least one to-be-sent data flow.
13. The device according to claim 10, wherein the at least one processor is further configured to set different user datagram protocol (UDP) source port numbers respectively for the first set of packets sent in the first time period and the second set of packets sent in the second time period.
14. The device according to claim 10, wherein the at least one processor is further configured to set a same UDP source port number for packets sent in one time period having a duration equal to the first duration.
15. The device according to claim 10, wherein the at least one to-be-sent data flow are a plurality of to-be-sent data flows and the at least one processor is further configured to: determine a quantity of the plurality of to-be-sent data flows, wherein the first duration≥the first time interval/(the quantity of the plurality of to-be-sent data flows−1).
16. The device according to claim 15, wherein the at least one processor is further configured to, after a data flow is sent or added, update the quantity of the plurality of to-be-sent data flows.
17. The device according to claim 10, wherein the at least one processor is further configured to calculate, based on the first duration, a quantity of packets to be sent in one time period having a duration equal to the first duration; and the transmitter is further configured to continuously send the first set of packets of the first data flow, wherein a quantity of the first set of packets is the same as the calculated quantity of packets, and continuously send the second set of packets of the first data flow after the second time interval.
18. The device according to claim 10, wherein the at least one to-be-sent data flow includes a RoCEv2 flow carried over a converged Ethernet.
19. The method according to claim 1, wherein the at least one to-be-sent data flow includes multiple data flows having different 5-tuples, each of the 5-tuples representing a source internet protocol address (src IP), a destination internet protocol address (dst IP), an Internet protocol address protocol (IP protocol), a source port (src Port), and a destination port (dst Port).
20. The device according to claim 10, wherein the at least one to-be-sent data flow includes multiple data flows having different 5-tuples, each of the 5-tuples representing a source internet protocol address (src IP), a destination internet protocol address (dst IP), an Internet protocol address protocol (IP protocol), a source port (src Port), and a destination port (dst Port).
Description
BRIEF DESCRIPTION OF DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
DESCRIPTION OF EMBODIMENTS
(14) The following describes the technical solutions in embodiments of this application with reference to the accompanying drawings.
(15) It should be understood that a data transmission method in example embodiments of this application may be applied to a switch (for example, a TOR access switch in a CLOS network), or may be applied to a network adapter (for example, a remote direct memory access network adapter integrated into a server). Specifically, the method in example embodiments of this application may be implemented on a switch or a chip of a network adapter. The following uses a CLOS network as an example to describe a data center network that applies the data transmission method in embodiments of this application. This is not limited in embodiments of this application.
(16)
(17) In an embodiment, data transmission between the servers may be implemented by using the TOR access switches and the spine switches. For example, a server Server #1 sends a packet of a first data flow to Server #3. First, Server #1 sends the packet of the first data flow to TOR access switch #1. Then, TOR access switch #1 may send the packet of the first data flow to TOR access switch #3 by using SPINE switch #1, or TOR access switch #1 may send the packet of the first data flow to TOR access switch #3 by using SPINE switch #2. Finally, TOR access switch #3 sends the packet of the first data flow to Server #3. In this sending process of the first data flow, the data transmission method described in the embodiment of this application may be implemented at TOR access switch #1 and TOR access switch #3, or the data transmission method described in the embodiment of this application may be implemented at Server #1 and Server #3.
(18)
(19) In an embodiment, data transmission between the servers may be implemented by using the TOR switches, the AGG switches, and the spine switches. For example, a server Server #1 sends a packet of a first data flow to Server #2. First, Server #1 sends the packet of the first data flow to TOR access switch #1. Then, TOR access switch #1 may send the packet of the first data flow to SPINE switch #1 by using AGG switch #1, or TOR access switch #1 may send the packet of the first data flow to SPINE switch #2 by using AGG switch #1, or TOR access switch #1 may send the packet of the first data flow to SPINE switch #1 by using AGG switch #2, or TOR access switch #1 may send the packet of the first data flow to SPINE switch #2 by using AGG switch #2. Subsequently, when the packet of the first data flow is transmitted to SPINE switch #1, SPINE switch #1 may send the packet of the first data flow to TOR access switch #2 by using AGG switch #1, or SPINE switch #1 may send the packet of the first data flow to TOR access switch #2 by using AGG switch #2. When the packet of the first data flow is transmitted to SPINE switch #2, SPINE switch #2 may send the packet of the first data flow to TOR access switch #2 by using AGG switch #1, or SPINE switch #2 may send the packet of the first data flow to TOR access switch #2 by using AGG switch #2. Finally, TOR access switch #2 sends the packet of the first data flow to Server #2. For another example, the server Server #1 sends the packet of the first data flow to Server #2. First, Server #1 sends the packet of the first data flow to TOR access switch #1. Then, TOR access switch #1 may directly send the packet of the first data flow to TOR access switch #2 by using AGG switch #1, or TOR access switch #1 may directly send the packet of the first data flow to TOR access switch #2 by using AGG switch #2. Finally, TOR access switch #2 sends the packet of the first data flow to Server #2. 
In this sending process of the packet of the first data flow, the data transmission method described in the embodiment of this application may be implemented at TOR access switch #1 and TOR access switch #2, or the data transmission method described in the embodiment of this application may be implemented at Server #1 and Server #2.
(20) It should be understood that the data center network 100 and the data center network 200 shown in
(21) It should be further understood that this embodiment of this application may be further applied to another CLOS network, such as a four-level CLOS network or a higher-level CLOS network. This is not limited in embodiments of this application.
(22)
(23) S310. Calculate a first duration based on at least one to-be-sent data flow and a first time interval, where the first time interval is a preset value, and different data flows in the at least one to-be-sent data flow have different 5-tuples.
(24) Optionally, the at least one to-be-sent data flow is an RDMA over Converged Ethernet version 2 (RoCEv2) flow carried over a converged Ethernet.
(25) Optionally, the RoCEv2 data flow may be an ECMP load balancing flow based on a 5-tuple hash.
(26) Optionally, a 5-tuple consists of a source Internet Protocol address (src IP), a destination Internet Protocol address (dst IP), an IP protocol number (IP protocol), a source port (src Port), and a destination port (dst Port).
(27) Optionally, as shown in
(28) Optionally, different data flows in the at least one to-be-sent data flow have different 5-tuples.
(29) It should be understood that, if any element in a 5-tuple of one data flow is different from that of another data flow, the two data flows are different. For example, if a source port number in a 5-tuple of one data flow is different from that of another data flow, the two data flows are different.
(30) It should be further understood that a same result may be obtained when 5-tuple hash operations are performed on different data flows.
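The two observations above can be illustrated with a minimal sketch. It assumes that a switch picks one of several equal-cost paths by hashing the 5-tuple and taking the result modulo the path count. The `FiveTuple` type, the `ecmp_path` function, the CRC32 hash, and the example addresses are hypothetical choices made for illustration; real switches use their own hash functions.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    ip_protocol: int  # 17 is the IP protocol number of UDP
    src_port: int
    dst_port: int


def ecmp_path(flow: FiveTuple, num_paths: int) -> int:
    """Pick a path index from a hash of the 5-tuple (CRC32 is an assumption)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_paths


# Eight flows that differ only in source port are eight different flows.
flows = [FiveTuple("10.0.0.1", "10.0.0.3", 17, 49152 + i, 4791) for i in range(8)]

# With only two paths, some of these different flows must share a hash result.
paths = [ecmp_path(flow, num_paths=2) for flow in flows]
```

Because there are more flows than paths, at least two different flows necessarily map to the same path index, which is the situation paragraph (30) describes.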
(31) It should be understood that, when packets of the at least one data flow are transmitted on a plurality of paths, there may be a delay difference between different paths in the plurality of paths.
(32) Optionally, the first time interval is a preset value, and the first time interval is greater than or equal to a maximum path delay difference, which may be represented by Flowlet Gap.
(33) Optionally, the first duration ≥ the first time interval / (the quantity of to-be-sent data flows − 1). It should be understood that the quantity of to-be-sent data flows is greater than or equal to 2 in this case.
(34) Optionally, when the quantity of to-be-sent data flows is 1, the first duration is greater than or equal to the first time interval.
(35) Optionally, after a data flow is sent and/or a data flow is added, the quantity of to-be-sent data flows may be updated.
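The lower bound on the first duration described above can be sketched as follows. The function name `min_first_duration` is hypothetical, and the time unit is whatever unit the first time interval is expressed in. When a data flow finishes or a new one is added, the function would simply be called again with the updated flow count.

```python
def min_first_duration(first_time_interval: float, num_flows: int) -> float:
    """Return the smallest first duration allowed for the given flow count.

    With two or more flows the bound is interval / (num_flows - 1);
    with a single flow the bound is the interval itself.
    """
    if num_flows < 1:
        raise ValueError("at least one to-be-sent data flow is required")
    if num_flows == 1:
        return first_time_interval
    return first_time_interval / (num_flows - 1)


# Example: a first time interval of 60 time units.
bound_four_flows = min_first_duration(60.0, 4)  # 60 / (4 - 1) = 20.0
bound_one_flow = min_first_duration(60.0, 1)    # 60.0
```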
(36) S320. Send a first data flow, where the first data flow belongs to the at least one to-be-sent data flow.
(37) A plurality of packets of the first data flow are sent in a first time period, packets of the first data flow are sent in a second time period following a second time interval, a duration of the first time period and a duration of the second time period are equal to the first duration, and the second time interval is greater than or equal to the first time interval.
(38) It should be understood that after the first duration is determined, the RDMA network adapter periodically sends packets of the first data flow. Optionally, the RDMA network adapter may send successive sets of packets of the first data flow separated by the second time interval.
(39) It should be further understood that the first time period and the second time period are only any two adjacent time periods for sending the first data flow. This is not limited in this embodiment of this application.
(40) Optionally, the RDMA network adapter may send a same quantity of packets in each time period, for example, send five packets in a first time period, five packets in a second time period, . . . , and finally five packets in a last time period.
(41) Optionally, the RDMA network adapter may send a same quantity of packets in each time period other than the last time period, and may send, in the last time period, packets of a quantity that is less than that of packets sent in another time period, for example, send five packets in the first time period, five packets in the second time period, . . . , and two packets in the last time period (only two to-be-sent packets are left in the last time period).
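A minimal sketch of this grouping, assuming the packets of one data flow are already queued in order: each time period receives a fixed quantity of packets, and the last time period receives whatever is left. The function name `split_into_bursts` is hypothetical.

```python
from typing import List, Sequence


def split_into_bursts(packets: Sequence[bytes], packets_per_period: int) -> List[List[bytes]]:
    """Group queued packets into per-period sets; the last set may be smaller."""
    if packets_per_period < 1:
        raise ValueError("packets_per_period must be at least 1")
    return [
        list(packets[i:i + packets_per_period])
        for i in range(0, len(packets), packets_per_period)
    ]


# Twelve queued packets, five packets per time period.
packets = [f"pkt{i}".encode() for i in range(12)]
bursts = split_into_bursts(packets, 5)
sizes = [len(burst) for burst in bursts]  # [5, 5, 2]
```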
(42) For example, as shown in
(43) Optionally, no packet of the first data flow is sent in the second time interval. Optionally, in this case, some feedback frames, such as an ACK frame, may be sent in the second time interval, or no packet may be sent.
(44) Optionally, a packet of a data flow in the at least one to-be-sent data flow other than the first data flow is sent in the second time interval.
(45) For example, as shown in
(46) For another example, as shown in
(47) Optionally, packets of different data flows may be sent in the second time interval.
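One way to fill the second time interval with packets of other data flows is a round-robin schedule, sketched below. Round-robin ordering is an assumption made for illustration, as is the premise that every flow still has packets to send in every round; the function names are hypothetical. With n flows that each send for one first duration in turn, the gap between two consecutive time periods of the same flow is (n − 1) first durations, which is why a first duration of at least the first time interval / (n − 1) keeps that gap no shorter than the first time interval.

```python
from typing import List, Sequence, Tuple

Slot = Tuple[float, float, str]  # (start time, end time, flow id)


def round_robin_schedule(flow_ids: Sequence[str], first_duration: float, rounds: int) -> List[Slot]:
    """Give each flow one time period of length first_duration per round."""
    schedule: List[Slot] = []
    t = 0.0
    for _ in range(rounds):
        for flow_id in flow_ids:
            schedule.append((t, t + first_duration, flow_id))
            t += first_duration
    return schedule


def gaps_for_flow(schedule: Sequence[Slot], flow_id: str) -> List[float]:
    """Return the idle time between consecutive time periods of one flow."""
    slots = [(start, end) for start, end, fid in schedule if fid == flow_id]
    return [nxt[0] - prev[1] for prev, nxt in zip(slots, slots[1:])]


# Four flows, first duration 20 time units, first time interval 60 time units.
schedule = round_robin_schedule(["A", "B", "C", "D"], first_duration=20.0, rounds=3)
gaps_a = gaps_for_flow(schedule, "A")  # each gap is (4 - 1) * 20 = 60.0
```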
(48) For another example, as shown in
(49) It should be understood that, in the examples in
(50) Optionally, after the first duration is determined, a quantity of packets sent in one first duration may be calculated based on the first duration.
(51) Specifically, a quantity of packets sent in one first duration = the first duration × a port rate / (8 × maximum transmission unit). The port rate is in a unit of kbps, and the maximum transmission unit (MTU) is in a unit of byte. For example, an MTU in an Ethernet protocol may be 1500 bytes, and an MTU in a point to point protocol over Ethernet (PPPoE) may be 1492 bytes.
(52) It should be understood that the port rate may be a port rate at which an RDMA network adapter sends packets, or may be a port rate at which a TOR switch sends packets.
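The packet-quantity formula can be worked through in a short sketch. Two points are assumptions rather than statements of the text: the first duration is taken to be in milliseconds (so that milliseconds multiplied by kbps yields bits), and the result is rounded down to a whole number of packets. The function name `packets_per_duration` is hypothetical.

```python
def packets_per_duration(first_duration_ms: float, port_rate_kbps: float, mtu_bytes: int) -> int:
    """Quantity of MTU-sized packets that fit into one first duration.

    Assumes the duration is given in milliseconds, so that
    duration (ms) * rate (kbit/s) is a number of bits, and rounds down.
    """
    bits_in_duration = first_duration_ms * port_rate_kbps
    bits_per_packet = 8 * mtu_bytes
    return int(bits_in_duration // bits_per_packet)


# Example: 1 ms at 10 Gbps (10,000,000 kbps).
ethernet_count = packets_per_duration(1.0, 10_000_000, 1500)  # 833
pppoe_count = packets_per_duration(1.0, 10_000_000, 1492)     # 837
```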
(53) Optionally, after a quantity of packets sent in one first duration is obtained through calculation, the RDMA network adapter may continuously send a plurality of packets of the first data flow, wherein a quantity of the plurality of packets is the same as the quantity of packets obtained through calculation, and continuously send packets of the first data flow after the second time interval.
(54) Optionally, the RDMA network adapter sets different UDP source port numbers or TCP source port numbers separately for packets sent in two consecutive first durations.
(55) Setting a UDP source port number is used as an example below for description.
(56) For example, as shown in
(57) Optionally, the RDMA network adapter sets different UDP source port numbers or TCP source port numbers separately for packets that are sent consecutively and that belong to different data flows.
(58) Setting a UDP source port number is used as an example below for description.
(59) For example, as shown in
(60) For another example, as shown in
(61) Optionally, the RDMA network adapter sets a same UDP source port number or TCP source port number for packets sent in one first duration.
(62) It should be understood that a UDP port or a TCP port in this embodiment of this application is a logical port, and a port number range may be 0 to 65535.
(63) It should be further understood that, in the examples in
(64) Optionally, the RDMA network adapter may set a same UDP destination port number or TCP destination port number for the at least one to-be-sent data flow.
(65) Optionally, in the RoCEv2 protocol, a well-known port number may be used to represent a destination port number of the at least one to-be-sent data flow. For example, the UDP destination port number of the at least one to-be-sent data flow may be set to a well-known port number 4791.
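The port-numbering options above can be sketched as follows: every packet in one time period carries the same UDP source port number, consecutive time periods carry different source port numbers, and all packets carry the well-known RoCEv2 destination port number 4791. Stepping through the range 49152 to 65535 is an assumption chosen for illustration, and the function name `assign_ports` is hypothetical.

```python
from typing import List, Sequence, Tuple

ROCEV2_UDP_DST_PORT = 4791  # well-known RoCEv2 destination port number

TaggedPacket = Tuple[int, int, bytes]  # (src port, dst port, payload)


def assign_ports(
    bursts: Sequence[Sequence[bytes]],
    base_port: int = 49152,   # assumed start of the source port range
    port_count: int = 16384,  # assumed size of the range (up to 65535)
) -> List[TaggedPacket]:
    """Use one source port per time period and a new one for the next period."""
    if port_count < 2:
        raise ValueError("need at least two source ports to alternate")
    tagged: List[TaggedPacket] = []
    for burst_index, burst in enumerate(bursts):
        src_port = base_port + (burst_index % port_count)
        for payload in burst:
            tagged.append((src_port, ROCEV2_UDP_DST_PORT, payload))
    return tagged


# Three time periods holding two, one, and one packets.
tagged = assign_ports([[b"a", b"b"], [b"c"], [b"d"]])
src_ports = [src for src, _, _ in tagged]  # [49152, 49152, 49153, 49154]
```

Because the source port is part of the 5-tuple, each time period then looks like a separate flow to a switch that performs ECMP load balancing based on a 5-tuple hash.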
(66) Optionally, as shown in
(67) For example, a first data flow needs to be sent from Server #1 to Server #3. An RDMA network adapter integrated into Server #1 may send a plurality of packets of the first data flow in a first time period, send a plurality of packets of the first data flow in a second time period following one second time interval, and send a plurality of packets of the first data flow in a third time period following one second time interval, and so on. In this case, packets of the first data flow are sent periodically, to actively construct a sub-flow for the first data flow.
(68) For another example, a first data flow needs to be sent from Server #1 to Server #3. After receiving packets of the first data flow, TOR access switch #1 may send a plurality of packets of the first data flow in a first time period, send a plurality of packets of the first data flow in a second time period following one second time interval, and send a plurality of packets of the first data flow in a third time period following one second time interval, and so on. In this case, packets of the first data flow are sent periodically, to actively construct a sub-flow for the first data flow.
(69) For still another example, a first data flow needs to be sent from Server #1 to Server #3. After receiving packets of the first data flow, TOR access switch #1 may send a plurality of packets of the first data flow in a first time period, send a plurality of packets of the first data flow in a second time period following one second time interval, and send a plurality of packets of the first data flow in a third time period following one second time interval, and so on. In this case, packets of the first data flow are periodically sent. In addition, different UDP source port numbers or TCP source port numbers are set separately for packets sent in each time period, so that packets sent in different time periods may be sent by using different paths. As shown in
(70) Therefore, according to the data transmission method in this embodiment of this application, when at least one data flow is to be sent, a sub-flow is actively constructed for the data flow. This splits a data flow, eliminates continuous congestion on a port of a switched network, brings a good load balancing effect, and features easy implementation.
(71) Further, a different UDP source port number or TCP source port number is set for each sub-flow of one data flow, so that each sub-flow can be load balanced when a switch supports an ECMP load balancing function that is based on a 5-tuple hash. This addresses congestion and packet loss caused by network traffic load imbalance, and increases data transmission reliability.
(72)
(73) a processor 510, configured to calculate a first duration based on at least one to-be-sent data flow and a first time interval, where the first time interval is a preset value, and different data flows in the at least one to-be-sent data flow have different 5-tuples; and
(74) a transmitter 520, configured to send a first data flow, where the first data flow belongs to the at least one to-be-sent data flow; where
(75) a plurality of packets of the first data flow are sent in a first time period, packets of the first data flow are sent in a second time period following a second time interval, a duration of the first time period and a duration of the second time period are equal to the first duration, and the second time interval is greater than or equal to the first time interval.
(76) Optionally, the transmitter 520 is further configured to send a packet of a data flow in the at least one to-be-sent data flow other than the first data flow in the second time interval.
(77) Optionally, packets sent in the second time interval belong to different data flows.
(78) Optionally, the processor 510 is further configured to set different UDP source port numbers separately for the packets sent in the first time period and the second time period.
(79) Optionally, the processor 510 is further configured to set a same UDP source port number for packets sent in one first duration.
(80) Optionally, the processor 510 is further configured to determine a quantity of to-be-sent data flows, where
(81) the first duration ≥ the first time interval / (the quantity of to-be-sent data flows − 1).
(82) Optionally, the processor 510 is further configured to, after a data flow is sent and/or one data flow is added, update the quantity of to-be-sent data flows.
(83) Optionally, the processor 510 is further configured to calculate, based on the first duration, a quantity of packets sent in one first duration.
(84) The transmitter 520 is further configured to continuously send a plurality of packets of the first data flow, wherein a quantity of the plurality of packets is the same as the quantity of packets obtained through calculation, and continuously send packets of the first data flow after the second time interval.
(85) Optionally, the at least one to-be-sent data flow is a RoCEv2 flow carried over a converged Ethernet.
(86) It should be understood that the foregoing and other operations and/or functions of the units of the data transmission device 500 in this embodiment of this application are separately used to implement corresponding procedures of the RDMA network adapter or the TOR switch in the method 300 in
(87)
(88) a memory 610, configured to store a program, where the program includes code;
(89) a transceiver 620, configured to communicate with another device; and
(90) a processor 630, configured to execute the program code stored in the memory 610.
(91) Optionally, when the code is executed, the processor 630 may implement operations performed by the RDMA network adapter or the TOR switch in the method 300 in
(92) It should be understood that in this embodiment of this application, the processor 630 may be a central processing unit (CPU), or the processor 630 may be another general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or a transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor, or the like.
(93) The memory 610 may include a read-only memory and a random access memory, and provides the processor 630 with data and an instruction. A part of the memory 610 may further include a non-volatile random access memory. For example, the memory 610 may further store information of a device type.
(94) The transceiver 620 may be configured to implement a signal sending and receiving function, such as a frequency modulation and demodulation function or an up-conversion and down-conversion function.
(95) In an implementation process, at least one step of the foregoing method may be completed by using an integrated logic circuit of hardware in the processor 630, or the integrated logic circuit may complete the at least one step by using an instruction driver in a software form. Therefore, the data transmission device 600 may be a chip or a chip group. The steps of the method disclosed with reference to the embodiments of this application may be directly performed by a hardware processor, or may be performed by using a combination of hardware in the processor and a software module. The software module may be located in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory. The processor 630 reads information in the memory and completes the steps of the foregoing methods in combination with hardware of the processor. To avoid repetition, details are not described herein again.
(96) A person of ordinary skill in the art may be aware that the units and algorithm steps in the examples described with reference to the embodiments disclosed in this specification may be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on particular applications and design constraint conditions of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.
(97) It may be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the system, apparatus, and unit, refer to a corresponding process in the method embodiments. Details are not described herein again.
(98) In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiment is merely an example. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
(99) The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual needs to achieve the objectives of the solutions of the embodiments of this application.
(100) In addition, functional units in the embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit.
(101) All or some of the foregoing embodiments may be implemented by using software, hardware, firmware, or any combination thereof. When software is used to implement the embodiments, the embodiments may be implemented completely or partially in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the procedure or functions according to the embodiments of this application are all or partially generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
(102) The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.