REQUEST SCHEDULING
20230007102 · 2023-01-05
Abstract
A method for end-to-end request scheduling for a distributed system comprising a plurality of hosts via which one or more requests are transmitted. The method comprises: receiving the one or more requests; assigning global scheduling information to each of the one or more requests; transmitting, for each of the one or more requests, respectively assigned global scheduling information with the request, such that respective global scheduling information is made available to a local scheduling unit corresponding to a host via which each of the plurality of requests is transmitted; and determining, for each of the one or more requests received at each of the plurality of hosts, an order in which at least one of: a computation operation, a communication operation, and an input/output operation associated with the request is performed, based on the global scheduling information assigned to the respective request.
Claims
1. A method for end-to-end request scheduling for a distributed system, the distributed system comprising a plurality of hosts via which one or more requests are transmitted, the method comprising: receiving the one or more requests; assigning global scheduling information to each of the one or more requests; transmitting, for each of the one or more requests, respectively assigned global scheduling information with the respective request, such that respective global scheduling information is made available to a local scheduling unit corresponding to a host via which each of the plurality of requests is transmitted; and determining, for each of the one or more requests received at each of the plurality of hosts, an order in which at least one of: a computation operation, a communication operation, and an input/output operation associated with the respective request is performed, wherein the determination is based on the global scheduling information assigned to the respective request.
2. The method according to claim 1, wherein the method is implemented at a scheduling system comprising an entry-point and a plurality of local scheduling units each corresponding to one of the plurality of hosts in the distributed system, wherein the steps of receiving the one or more requests and assigning global scheduling information are performed at the entry-point, and the step of determining an order in which at least one of a computation operation, a communication operation, and an input/output operation associated with a request is performed by a local scheduling unit corresponding to the respective host at which the respective request is received.
3. The method according to claim 1, wherein each of the plurality of hosts is configured to host one or more applications, each of the one or more applications comprising one or more application components, the method further comprising: performing, by the one or more application components, at least one of a computation operation, a communication operation, and an input/output operation corresponding to each of the one or more requests received at the respective corresponding hosts in the order determined by the corresponding local scheduling unit.
4. The method according to claim 2, wherein the global scheduling information assigned to each of the one or more requests comprises an arrival time of the respective request at the entry-point.
5. The method according to claim 4, wherein an arrival time of a respective request at the entry-point is expressed as an 8-bit, 16-bit, 32-bit, or 64-bit integer.
6. The method according to claim 4, wherein the order in which at least one of a computation operation, a communication operation, and an input/output operation associated with the respective request is performed is determined based on a first-come-first-served scheduling policy or an earliest-deadline-first scheduling policy, by prioritizing requests associated with the earliest arrival times.
7-9. (canceled)
10. A computer program product comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to: receive one or more requests; assign global scheduling information to each of the one or more requests; transmit, for each of the one or more requests, respectively assigned global scheduling information with the respective request, such that respective global scheduling information is made available to a local scheduling unit corresponding to a host via which each of the plurality of requests is transmitted; and determine, for each of the one or more requests received at each of the plurality of hosts, an order in which at least one of: a computation operation, a communication operation, and an input/output operation associated with the respective request is performed, wherein the determination is based on the global scheduling information assigned to the respective request.
11. A scheduling system for performing end-to-end request scheduling at a distributed system comprising a plurality of hosts via which one or more requests are transmitted, wherein the scheduling system extends a network communication protocol implemented by the distributed system and comprises: an entry-point configured to receive the one or more requests and to assign global scheduling information to each of the one or more requests; and a plurality of local scheduling units, wherein each of the plurality of local scheduling units corresponds to one of the plurality of hosts in the distributed system, wherein the scheduling system is configured to extend the network communication protocol to perform the following: transmitting, for each of the one or more requests, respectively assigned global scheduling information with the respective request via a protocol extension associated with the network communication protocol, such that the respective global scheduling information is made available to a local scheduling unit corresponding to a host via which the respective request is transmitted, wherein each of the plurality of local scheduling units is configured to determine, for each of the one or more requests received at the corresponding host, an order in which at least one of a computation operation, a communication operation, and an input/output operation associated with the respective request is performed, wherein the determination is based on the global scheduling information assigned to the respective request.
12. The scheduling system according to claim 11, further comprising the plurality of hosts, wherein each of the plurality of hosts is configured to host one or more applications, each of the one or more applications comprising one or more application components, wherein, for each of the one or more applications, the one or more application components are configured to perform at least one of a computation operation, a communication operation, and an input/output operation corresponding to each of the one or more requests received at the respective corresponding hosts in the order determined by the corresponding local scheduling unit.
13. The scheduling system according to claim 11, wherein the global scheduling information assigned to each of the one or more requests comprises an arrival time of the respective request at the entry-point.
14. The scheduling system according to claim 13, wherein an arrival time of a respective request at the entry-point is expressed as an 8-bit, 16-bit, 32-bit, or 64-bit integer.
15. The scheduling system according to claim 13, wherein each of the plurality of local scheduling units is configured to determine the order in which at least one of a computation operation, a communication operation, and an input/output operation associated with the respective request is performed based on a first-come-first-served scheduling policy or an earliest-deadline-first scheduling policy, by prioritizing requests associated with the earliest arrival times.
16. The scheduling system according to claim 11, wherein each of the plurality of local scheduling units is configured to update, for each of the requests received at the respective corresponding host, the respective global scheduling information such that it comprises information associated with at least one of: an amount of time spent serving the respective request and an amount of network data transmitted on behalf of the respective request.
17. The scheduling system according to claim 16, wherein each of the plurality of local scheduling units is configured to determine the order in which at least one of a computation operation, a communication operation, and an input/output operation associated with the respective request is performed based on a least-attained-service scheduling policy, by prioritizing requests associated with at least one of: shorter amount of time serving the request and smaller amount of network data transmitted on behalf of the request.
18. The scheduling system according to claim 11, wherein each of the plurality of local scheduling units is implemented in at least one level of an execution stack, wherein the execution stack comprises at least one of: a runtime environment, an operating system, and a hypervisor.
19-21. (canceled)
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] For a better understanding of examples of the present invention, and to show more clearly how the examples may be carried into effect, reference will now be made, by way of example only, to the following drawings in which:
DETAILED DESCRIPTION
[0020] The scheduling system 100 comprises an entry-point 120 and a plurality of local scheduling units 130a, 130b. The entry-point 120 is configured to receive the one or more requests 140a, 140b, 140c, and to assign global scheduling information 150 to each of the one or more requests 140a, 140b, 140c. The global scheduling information can either be generated when the request arrives at the distributed system 110, or produced by a trusted entity outside the distributed system 110. In some embodiments, the entry-point 120 may be configured to discard, for each of the plurality of requests, information associated with an arrival time of the request. This discarding operation may be performed prior to the assigning of the global scheduling information. Thus, the entry-point 120 can provide security in that potentially malicious or suspicious data contained in the information associated with arrival times can be discarded.
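As an illustrative sketch only (not part of the claimed embodiments; the function and field names are hypothetical), the entry-point behaviour described above, discarding any externally supplied arrival-time data before stamping the request with trusted global scheduling information, might be expressed as follows:

```python
import time

def assign_global_scheduling_info(request: dict) -> dict:
    """Entry-point step: discard any externally supplied arrival-time
    metadata, then stamp the request with a trusted arrival time."""
    # Security measure described above: drop potentially malicious or
    # suspicious arrival-time data that arrived with the request.
    request.pop("arrival_time", None)
    # Assign trusted global scheduling information: the arrival time at
    # the entry-point, here in microseconds as a plain integer.
    request["arrival_time"] = time.time_ns() // 1_000
    return request

# A request arriving with spoofed arrival-time data is re-stamped.
req = assign_global_scheduling_info({"payload": "data", "arrival_time": "spoofed"})
```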
[0021] Each of the plurality of local scheduling units 130a, 130b corresponds to one of the plurality of hosts 112a, 112b in the distributed system 110. In the present embodiment, a first local scheduling unit 130a of the plurality of local scheduling units corresponds to the first host 112a, and a second local scheduling unit 130b of the plurality of local scheduling units corresponds to the second host 112b. In some embodiments, each of the plurality of local scheduling units 130a, 130b is implemented in at least one level of an execution stack. The execution stack may comprise at least one of the following levels: a runtime environment, an operating system, and a hypervisor. Moreover, in some embodiments each of the plurality of local scheduling units may be implemented in a single level of the execution stack, without requiring any support or modifications in the other levels of the execution stack. For example, in some embodiments each of the plurality of local scheduling units may be implemented in a single level of the execution stack without requiring further interface or functionality provided by a different level of the execution stack.
[0022] In some embodiments, the global scheduling information 150 assigned to each of the one or more requests 140 may comprise an arrival time of the respective request at the entry-point 120. This is illustrated in
[0023] Although not illustrated in
[0024] The scheduling system 100 is configured to extend the network communication protocol to transmit, for each of the one or more requests 140a, 140b, 140c, respectively assigned global scheduling information 150 with the respective request via a protocol extension associated with the network communication protocol, such that the respective global scheduling information 150 is made available to a local scheduling unit 130 corresponding to a host 112 via which the respective request 140 is transmitted.
[0025] As mentioned above, each of the plurality of local scheduling units 130a, 130b corresponds to one of the plurality of hosts 112a, 112b. According to the present embodiment, each of the plurality of local scheduling units 130a, 130b is configured to determine, for each of the one or more requests 140 received at the corresponding host 112, an order in which at least one of a computation operation, a communication operation, and an input/output operation associated with the respective request 140 is performed. The determination performed by the local scheduling unit 130 is based on the global scheduling information 150 assigned to the respective request 140.
[0026] In some embodiments, each of the plurality of local scheduling units 130a, 130b may be configured to determine the order in which at least one of a computation operation, a communication operation, and an input/output operation associated with the respective request 140 is performed based on a first-come-first-served scheduling policy or an earliest-deadline-first scheduling policy, by prioritising requests 140 associated with the earliest arrival times. Alternatively or in addition, in some embodiments, each of the plurality of local scheduling units 130a, 130b may be configured to determine the order in which at least one of a computation operation, a communication operation, and an input/output operation associated with the respective request 140 is performed based on a least-attained-service scheduling policy, by prioritising requests 140 associated with at least one of: a shorter amount of time spent serving the request and a smaller amount of network data transmitted on behalf of the request.
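A minimal sketch of the first-come-first-served ordering described above (class and field names are illustrative, not taken from the disclosure): the local scheduling unit keeps pending requests in a priority queue keyed on the globally assigned arrival time, so the request that arrived at the entry-point earliest is always served next, regardless of the order in which requests reached this host.

```python
import heapq

class LocalSchedulingUnit:
    """Illustrative local scheduling unit ordering pending operations
    first-come-first-served by global arrival time."""

    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker for requests with equal arrival times

    def enqueue(self, request: dict) -> None:
        # Priority is the arrival time assigned at the entry-point.
        heapq.heappush(self._queue, (request["arrival_time"], self._seq, request))
        self._seq += 1

    def next_request(self) -> dict:
        """Return the pending request with the earliest arrival time."""
        return heapq.heappop(self._queue)[2]

unit = LocalSchedulingUnit()
# Requests may reach this host out of global arrival order...
unit.enqueue({"id": "b", "arrival_time": 20})
unit.enqueue({"id": "a", "arrival_time": 10})
# ...but the earliest-arrived request is still served first.
```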
[0027] Furthermore, in some embodiments, each of the plurality of local scheduling units 130a, 130b may be configured to update, for each of the requests received at the respective corresponding host 112, the respective global scheduling information 150 such that it comprises information associated with at least one of: an amount of time spent serving the respective request and an amount of network data transmitted on behalf of the respective request. For example, as shown in
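As an illustrative sketch of the update described above (the field names are hypothetical), a local scheduling unit might accumulate attained service into the global scheduling information as the request passes through the host, so that downstream hosts see the total service received so far:

```python
def update_scheduling_info(info: dict, service_time_us: int, bytes_sent: int) -> dict:
    """Accumulate the time spent serving a request and the network data
    transmitted on its behalf into its global scheduling information."""
    info = dict(info)  # leave the caller's copy untouched
    info["service_time_us"] = info.get("service_time_us", 0) + service_time_us
    info["bytes_sent"] = info.get("bytes_sent", 0) + bytes_sent
    return info

# Two hosts each add their contribution as the request traverses them.
info = update_scheduling_info({}, service_time_us=500, bytes_sent=1024)
info = update_scheduling_info(info, service_time_us=250, bytes_sent=512)
```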
[0028] In the present embodiment as illustrated in
[0029] Those skilled in the art would appreciate that in alternative embodiments at least part of the distributed system 110 may not be part of the scheduling system 100. For example, in some embodiments, the plurality of hosts 112a, 112b may not be part of the scheduling system 100.
[0030] Although not shown in
[0031] It will be appreciated that
[0033] In more detail, in some embodiments computation, communication, and input/output operations issued by the application to the runtime 220 may be encapsulated in lightweight threads, such as events or go-routines. The runtime 220 may act as a local scheduling unit for the application. The runtime 220 may then issue system calls to the (potentially virtualized) operating system 240, said system calls being issued within one or more kernel threads. The operating system 240 kernel may act as a local scheduling unit ordering the kernel threads created by the runtime 220. Finally, the operating system 240 may run the operations on (potentially virtual) CPUs. If the CPUs are virtual, then a hypervisor may act as the local scheduling unit for ordering virtual CPU operations onto the hardware CPUs. Finally, in some embodiments the hardware CPU itself may schedule instructions onto the underlying arithmetic, memory or I/O units. The method according to the present disclosure may be implemented at one or more of these levels, whenever a local scheduling decision is involved. The place to implement ordering may be chosen depending on the information available from the upper execution level, on the level of congestion on the resources of the lower execution level, and/or on the network stack layer at which the global scheduling information is encapsulated. For example, in some embodiments global scheduling information may be transmitted via HTTP headers and local scheduling decisions may be performed in the runtime 220, whereas in other embodiments global scheduling information may be transmitted via IPv4 options and local scheduling decisions may be performed in the operating system 240 kernel.
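The HTTP-header variant mentioned above can be sketched as follows. The header name is hypothetical (the disclosure only states that HTTP headers are one possible carrier); the point is that the globally assigned scheduling information travels with the request between hosts and is recoverable at each receiving host's local scheduling unit.

```python
# Hypothetical header name; the disclosure does not specify one.
SCHED_HEADER = "X-Global-Scheduling-Info"

def attach_scheduling_header(headers: dict, arrival_time_us: int) -> dict:
    """Encode the global scheduling information as an HTTP header so it
    travels with the request to the next host."""
    headers = dict(headers)
    headers[SCHED_HEADER] = str(arrival_time_us)
    return headers

def extract_scheduling_header(headers: dict):
    """Recover the scheduling information at the receiving host, or
    None if the request carries no scheduling header."""
    value = headers.get(SCHED_HEADER)
    return int(value) if value is not None else None

sent = attach_scheduling_header({"Content-Type": "application/json"}, 1234567)
received = extract_scheduling_header(sent)
```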
[0035] With reference to
[0036] In some embodiments, global scheduling information assigned to each of the one or more requests at step 420 may comprise an arrival time of the respective request at the entry-point. An arrival time of a respective request at the entry-point may be expressed as an 8-bit, 16-bit, 32-bit, or 64-bit integer.
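As an illustrative sketch of the fixed-width encodings mentioned above (the wire format is an assumption; the disclosure only names the integer widths), an arrival time could be packed as a network-order unsigned integer of the chosen width before being placed in a protocol extension:

```python
import struct

def encode_arrival_time(arrival_time: int, bits: int = 64) -> bytes:
    """Pack an arrival time as a network-order unsigned integer of one
    of the widths named in the text: 8, 16, 32, or 64 bits."""
    fmt = {8: "!B", 16: "!H", 32: "!I", 64: "!Q"}[bits]
    return struct.pack(fmt, arrival_time)

def decode_arrival_time(data: bytes, bits: int = 64) -> int:
    """Inverse of encode_arrival_time."""
    fmt = {8: "!B", 16: "!H", 32: "!I", 64: "!Q"}[bits]
    return struct.unpack(fmt, data)[0]
```

Narrower widths cost fewer bytes per request but wrap around sooner, so the choice trades protocol overhead against timestamp range.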
[0037] Subsequent to global scheduling information being assigned, at step 430 for each of the one or more requests, respectively assigned global scheduling information 150 is transmitted with the respective request, such that respective global scheduling information 150 is made available to a local scheduling unit 130 corresponding to a host 112 via which each of the plurality of requests 140 is transmitted. For example, for the first request 140a in the plurality of requests, the global scheduling information of the first request 140a can be made available to each of the one or more hosts 112 via which the first request 140a is transmitted. In embodiments where the scheduling system 100 extends a network communication protocol implemented by the distributed system 110, the transmission of the one or more requests may be performed via the network communication protocol implemented by the distributed system 110, and the transmission of the respectively assigned global scheduling information may be performed via a protocol extension associated with the network communication protocol.
[0038] Then, at step 440, for each of the one or more requests received at each of the plurality of hosts 112, an order is determined in which at least one of: a computation operation, a communication operation, and an input/output operation associated with the respective request 140 is performed. This determination at step 440 is based on the global scheduling information 150 assigned to the respective request. In some embodiments, the determination may be performed by a respective local scheduling unit 130 corresponding to the host 112.
[0039] In some embodiments, determining the order at step 440 may be based on a first-come-first-served scheduling policy or earliest-deadline-first scheduling policy. In more detail, requests associated with earlier arrival times are prioritised, i.e. the earlier the arrival time, the more the respective request is prioritised in the determined order. For example, referring to the exemplary arrival times shown in
[0040] In some embodiments, determining the order at step 440 may be based on a least-attained-service scheduling policy. In more detail, requests associated with at least one of: a shorter amount of time spent serving the request and a smaller amount of network data transmitted on behalf of the request are prioritised, i.e. the shorter the amount of time spent serving the request, and/or the smaller the amount of network data transmitted on behalf of the request, the more the respective request is prioritised in the determined order. For example, referring to the exemplary service times shown in
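The least-attained-service ordering described above can be sketched as a sort over the attained-service fields carried in the global scheduling information (field names are illustrative, not from the disclosure): the request that has received the least service so far is scheduled first, with network data transmitted as a secondary key.

```python
def least_attained_service_order(requests):
    """Order pending requests by attained service: least time spent
    serving first, then least network data transmitted on their behalf."""
    return sorted(
        requests,
        key=lambda r: (r.get("service_time_us", 0), r.get("bytes_sent", 0)),
    )

pending = [
    {"id": "x", "service_time_us": 900, "bytes_sent": 100},
    {"id": "y", "service_time_us": 100, "bytes_sent": 500},
    {"id": "z", "service_time_us": 100, "bytes_sent": 200},
]
ordered = least_attained_service_order(pending)
```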
[0041] The method may include an optional method step 450 in which at least one of a computation operation, a communication operation, and an input/output communication is performed by one or more application components, in the order determined by the corresponding local scheduling unit 130 at step 440. The one or more application components may be part of one or more applications that are hosted by the plurality of hosts 112.
[0042] Although not illustrated in
[0043] Moreover, although not illustrated in
[0044] Those skilled in the art will appreciate that in some embodiments the method steps illustrated in steps 430 to 450 may be performed in a different order. For example, in some embodiments after determining an order in which at least one of: a computation operation, a communication operation, and an input/output operation associated with a respective request is performed (step 440), and performing such operation(s) in the determined order (step 450), the method may return to step 430 at which the respective request is transmitted to another host with the respectively assigned global scheduling information.
[0045] For at least some of the embodiments of the disclosure, by tracking the global scheduling information (e.g. arrival time) of each user request throughout the distributed system and enforcing an end-to-end servicing order of requests, performance loss due to modularisation in multi-tiered or micro-service-based applications can be reduced.
[0047] The graphs as shown in
[0048] To better understand the contribution of each of two aspects, in some experiments (i.e. the experiments associated with the graphs shown in
[0049] The experiments associated with the graphs shown in
[0050] In order to ensure that the results are reliable and unbiased, the vmtouch utility was used to hold the database files in-memory, thus avoiding variance due to disk latency. Furthermore, in order to ensure that the load was generated in the same way during each experiment, the httpmon workload generator in an open system model and the same sequence of exponentially distributed inter-arrival times were used. Also, no non-essential processes or cron scripts were running at the time of the experiments. For example, some core OS services that might interfere with the experiments were masked.
[0051] Since the experimental setup for all of these experiments included diverse application structures (non-threaded event-driven, non-threaded process pool, thread-pool), the evaluation results are relevant for a wide range of applications.
[0052] In the graphs as shown in
[0053] Referring to the graph shown in
[0054] Referring to the graph shown in
[0055] One of the advantages of implementing TailTamer is that the technique runs in the kernel. This means that, in contrast to application-level schedulers, TailTamer is insensitive to several processes being co-located on the same resources and can potentially better deal with self-interference, i.e. the undesirable phenomenon of components of an application causing performance interference among themselves due to co-location. To illustrate this, the experiment associated with the graph of
[0056] Referring to the graph of
[0057] Referring to the graph of
[0058] It is noted that the experiments associated with the graphs illustrated in
[0059] Given the fact that experiments over a real network imply more variability, 10 repetitions for each arrival rate were performed. The focus of the experiments is on the 99th percentile response time; each measurement, the mean of the measurements, and the 95% confidence intervals computed using the loess method were reported.
[0060] Referring to
[0061] In summary of the results presented in the graphs of
[0062] Thus, embodiments of the present disclosure provide a method for end-to-end request scheduling which can reduce tail response time and can be readily employed without requiring changes to the application source code, and they are particularly beneficial at high load or when the application components are co-located. This translates into lower infrastructure cost and, given the lack of energy-proportional hardware, higher energy efficiency while ensuring good user experience without resorting to computing capacity over-provisioning. The method reduces tail response time at the boundary of the distributed system instead of reducing tail response time for individual components. Moreover, the method is application agnostic, hence applications can directly benefit from it without source code modifications and without using the same communication framework.
[0063] Embodiments of the disclosure also provide a scheduling system for performing end-to-end request scheduling at a distributed system comprising a plurality of hosts. The scheduling system extends a network communication protocol implemented by the distributed system.
[0064] There is also provided a computer program product comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform the method or methods described herein. Thus, it will be appreciated that the disclosure also applies to computer programs, particularly computer programs on or in a carrier, adapted to put embodiments into practice. The program may be in the form of a source code, an object code, a code intermediate source and an object code such as in a partially compiled form, or in any other form suitable for use in the implementation of the method according to the embodiments described herein.
[0065] It will also be appreciated that such a program may have many different architectural designs. For example, a program code implementing the functionality of the method or system may be sub-divided into one or more sub-routines. Many different ways of distributing the functionality among these sub-routines will be apparent to the skilled person. The sub-routines may be stored together in one executable file to form a self-contained program. Such an executable file may comprise computer-executable instructions, for example, processor instructions and/or interpreter instructions (e.g. Java interpreter instructions). Alternatively, one or more or all of the sub-routines may be stored in at least one external library file and linked with a main program either statically or dynamically, e.g. at run-time. The main program contains at least one call to at least one of the sub-routines. The sub-routines may also comprise function calls to each other.
[0066] An embodiment relating to a computer program product comprises computer-executable instructions corresponding to each processing stage of at least one of the methods set forth herein. These instructions may be sub-divided into sub-routines and/or stored in one or more files that may be linked statically or dynamically. Another embodiment relating to a computer program product comprises computer-executable instructions corresponding to each means of at least one of the systems and/or products set forth herein. These instructions may be sub-divided into sub-routines and/or stored in one or more files that may be linked statically or dynamically.
[0067] The carrier of a computer program may be any entity or device capable of carrying the program. For example, the carrier may include a data storage, such as a ROM, for example, a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example, a hard disk. Furthermore, the carrier may be a transmissible carrier such as an electric or optical signal, which may be conveyed via electric or optical cable or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such a cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted to perform, or used in the performance of, the relevant method.
[0068] Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.
[0069] The above disclosure sets forth specific details, such as particular embodiments or examples for purposes of explanation and not limitation. It will be appreciated by one skilled in the art that other examples may be employed apart from these specific details.