ADAPTIVE IDLING OF VIRTUAL CENTRAL PROCESSING UNIT
20230229473 · 2023-07-20
Abstract
The performance of a computer system having a virtual machine executing an idling instruction therein is improved by: determining a state for controlling the execution of the idling instruction for a first virtual CPU; when the controlling state is a first state, executing the idling instruction natively in a physical CPU assigned to the first virtual CPU and resuming execution of instructions by the first virtual CPU when the physical CPU wakes up; and when the controlling state is a second state, emulating execution of the idling instruction, the emulated execution including the steps of configuring a wakeup event, descheduling the first virtual CPU, and selecting a second virtual CPU to resume execution of instructions, and in response to the wakeup event, rescheduling the second virtual CPU, performing a task switch from the first to the second virtual CPU, and resuming execution of instructions by the second virtual CPU.
Claims
1. A method of improving performance of a computer system having a virtual machine running therein and executing an idling instruction, the method comprising: determining by a virtualization software for the virtual machine, a state for controlling the execution of the idling instruction for a first virtual CPU; when the controlling state is a first state, executing the idling instruction natively in a physical CPU assigned to the first virtual CPU and resuming execution of instructions after the idling instruction by the first virtual CPU when the physical CPU wakes up; and when the controlling state is a second state, emulating execution of the idling instruction, the emulated execution including the steps of configuring a wakeup event, descheduling the first virtual CPU, and selecting a second virtual CPU to resume execution of the instructions after the idling instruction, and in response to the wakeup event, rescheduling the second virtual CPU, performing a task switch from the first virtual CPU to the second virtual CPU, and resuming execution of the instructions after the idling instruction by the second virtual CPU.
2. The method of claim 1, wherein when the controlling state is a third state, executing the idling instruction natively in a monitor for the virtual machine.
3. The method of claim 2, wherein when the controlling state is the second state, updating information about the execution of the idling instruction for the virtual CPU based on the emulated execution of the idling instruction, and when the controlling state is the third state, updating information about the execution of the idling instruction for the virtual CPU based on the execution of the idling instruction natively in the monitor.
4. The method of claim 3, wherein the information about the execution of the idling instruction includes a number of times the idling instruction has been executed for the virtual CPU and an average idle time of the virtual CPU.
5. The method of claim 4, wherein an initial state of the controlling state is the third state and the controlling state transitions from the third state to the first state or the second state based on at least the number of times the idling instruction has been executed for the virtual CPU and the average idle time of the virtual CPU.
6. The method of claim 5, wherein the controlling state transitions from the third state to the first state or the second state further based on a run queue that contains a list of virtual CPUs waiting to use the first physical CPU.
7. The method of claim 6, wherein the controlling state transitions from the first state to the third state when a time spent in the third state exceeds a maximum time.
8. The method of claim 6, wherein the controlling state transitions from the second state to the first state when the average idle time of the virtual CPU is greater than or equal to a minimum time and a size of the run queue for the first physical CPU is zero.
9. A computer system having a virtual machine running therein, said computer system comprising: one or more physical CPUs; and a virtualization software for the virtual machine including a kernel that maintains a run queue for each of the physical CPUs, wherein the virtualization software is configured to: determine a state for controlling the execution of an idling instruction for a virtual CPU of the virtual machine; when the controlling state is a first state, execute the idling instruction natively in a physical CPU assigned to the first virtual CPU and resume execution of instructions after the idling instruction by the first virtual CPU when the physical CPU wakes up; and when the controlling state is a second state, emulate execution of the idling instruction, the emulated execution including the steps of configuring a wakeup event, descheduling the first virtual CPU, and selecting a second virtual CPU to resume execution of the instructions after the idling instruction, and in response to the wakeup event, reschedule the second virtual CPU, perform a task switch from the first virtual CPU to the second virtual CPU, and resume execution of the instructions after the idling instruction by the second virtual CPU.
10. The computer system of claim 9, wherein the virtualization software is further configured to: when the controlling state is a third state, execute the idling instruction natively in a monitor for the virtual machine.
11. The computer system of claim 10, wherein the virtualization software is further configured to: when the controlling state is the second state, update information about the execution of the idling instruction for the virtual CPU based on the emulated execution of the idling instruction, and when the controlling state is the third state, update information about the execution of the idling instruction for the virtual CPU based on the execution of the idling instruction natively in the monitor.
12. The computer system of claim 11, wherein the information about the execution of the idling instruction includes a number of times the idling instruction has been executed for the virtual CPU and an average idle time of the virtual CPU.
13. The computer system of claim 12, wherein an initial state of the controlling state is the third state and the controlling state transitions from the third state to the first state or the second state based on at least the number of times the idling instruction has been executed for the virtual CPU and the average idle time of the virtual CPU.
14. The computer system of claim 13, wherein the controlling state transitions from the third state to the first state or the second state further based on a run queue that contains a list of virtual CPUs waiting to use the first physical CPU.
15. The computer system of claim 14, wherein the controlling state transitions from the first state to the third state when a time spent in the third state exceeds a maximum time.
16. The computer system of claim 14, wherein the controlling state transitions from the second state to the first state when the average idle time of the virtual CPU is greater than or equal to a minimum time and a size of the run queue for the first physical CPU is zero.
17. A non-transitory computer-readable medium comprising instructions that are executable in a computer system having a virtual machine running therein and executing an idling instruction, to cause the computer system to carry out a method that comprises the steps of: determining by a virtualization software for the virtual machine, a state for controlling the execution of the idling instruction for a first virtual CPU; when the controlling state is a first state, executing the idling instruction natively in a physical CPU assigned to the first virtual CPU and resuming execution of instructions after the idling instruction by the first virtual CPU when the physical CPU wakes up; and when the controlling state is a second state, emulating execution of the idling instruction, the emulated execution including the steps of configuring a wakeup event, descheduling the first virtual CPU, and selecting a second virtual CPU to resume execution of the instructions after the idling instruction, and in response to the wakeup event, rescheduling the second virtual CPU, performing a task switch from the first virtual CPU to the second virtual CPU, and resuming execution of the instructions after the idling instruction by the second virtual CPU.
18. The non-transitory computer-readable medium of claim 17, wherein the method further comprises the step of: when the controlling state is a third state, executing the idling instruction natively in a monitor for the virtual machine.
19. The non-transitory computer-readable medium of claim 18, wherein the method further comprises the steps of: when the controlling state is the second state, updating information about the execution of the idling instruction for the virtual CPU based on the emulated execution of the idling instruction, and when the controlling state is the third state, updating information about the execution of the idling instruction for the virtual CPU based on the execution of the idling instruction natively in the monitor.
20. The non-transitory computer-readable medium of claim 19, wherein the information about the execution of the idling instruction includes a number of times the idling instruction has been executed for the virtual CPU and an average idle time of the virtual CPU.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0006]
[0007]
[0008]
[0009]
[0010]
[0011]
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
DETAILED DESCRIPTION
[0019] One or more embodiments improve the performance of a computer system having a virtual machine that is executing an idling instruction, e.g., an mwait instruction, by adaptively executing the idling instruction according to one of several controlling states. The controlling states include the performance state, which improves wake-up latency; the throughput state, which improves CPU resource usage; and the learning state, during which data about the execution of the mwait instruction are collected for use in determining transitions between the controlling states.
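The relationship between the three controlling states and the handling of the idling instruction can be illustrated with a minimal sketch. The names ControlState and handle_mwait are hypothetical and do not appear in the embodiments; the returned strings only summarize the behavior recited for each state and are not an implementation of a hypervisor.

```python
from enum import Enum, auto


class ControlState(Enum):
    """Hypothetical names for the three controlling states."""
    LEARNING = auto()     # initial state; mwait data is collected
    PERFORMANCE = auto()  # favors wake-up latency
    THROUGHPUT = auto()   # favors CPU resource usage


def handle_mwait(state: ControlState) -> str:
    """Return a short description of how mwait is handled in a given state."""
    if state is ControlState.PERFORMANCE:
        # Native execution on the physical CPU assigned to the virtual CPU.
        return "execute natively on the assigned physical CPU"
    if state is ControlState.THROUGHPUT:
        # Emulation: a wakeup event is configured and the virtual CPU is
        # descheduled so the physical CPU can run other work.
        return "emulate: configure wakeup event and deschedule the virtual CPU"
    # Learning: execution in the monitor, with mwait data updated.
    return "execute in the monitor and update mwait data"
```

In this sketch the learning state is the fall-through case because it is the state that applies after initialization.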
[0020]
[0021] A virtualization software layer, referred to hereinafter as a hypervisor, is installed on top of hardware platform 102. Hypervisor 111 makes possible the concurrent instantiation and execution of one or more VMs 118_1-118_N. The interaction of a VM 118 with hypervisor 111 is facilitated by corresponding virtual machine monitors (VMMs) 134. Each VMM 134_1-134_N is assigned to and monitors a corresponding VM 118_1-118_N. In one embodiment, hypervisor 111 may be a hypervisor implemented as a commercial product in VMware's vSphere® virtualization product, available from VMware Inc. of Palo Alto, CA. In an alternative embodiment, hypervisor 111 runs on top of a host operating system which itself runs on hardware platform 102. In such an embodiment, hypervisor 111 operates above an abstraction level provided by the host operating system.
[0022] After instantiation, each VM 118_1-118_N encapsulates a physical computing machine platform that is executed under the control of hypervisor 111. Virtual devices of a VM 118 are embodied in a virtual hardware platform 120, which includes, but is not limited to, a virtual CPU (vCPU) 122, a virtual random access memory (vRAM) 124, a virtual network interface adapter (vNIC) 126, and virtual storage (vStorage) 128. Virtual hardware platform 120 supports the installation of a guest operating system (guest OS) 130, which is capable of executing applications 132. Examples of a guest OS 130 include any of the well-known commodity operating systems, such as the Microsoft Windows® operating system, the Linux® operating system, and the like.
[0023] It should be recognized that the various terms, layers, and categorizations used to describe the components in
[0024]
[0025]
[0026] The different controlling states are learning 202, performance 204, and throughput 206. Each of these states controls how an mwait instruction that is encountered in an instruction stream of a VM is to be executed. In the learning state, the mwait instruction is executed in the monitor (e.g., the VMM), and mwait data, which includes data about the execution of the mwait instruction, is updated. In the embodiments described herein, mwait data includes #mwaits (which counts the number of times the mwait instruction is executed for the VM) and currAve (which keeps track of the average idle time of a virtual CPU when the mwait instruction is executed by the virtual CPU). In one embodiment, currAve keeps track of an exponentially weighted moving average (EWMA) of the idle time of the virtual CPU when the mwait instruction is executed for the virtual CPU. In the throughput state, the execution of the mwait instruction is emulated, and the mwait data is updated. In the performance state, the mwait instruction is executed in a virtual CPU of the VM.
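The bookkeeping of the mwait data can be sketched as follows. This is an illustration only: the class name MwaitData, the method name record, the smoothing factor alpha, and its default value are assumptions that do not appear in the embodiments, which state only that currAve may be an exponentially weighted moving average of the idle time and that both values start at zero.

```python
from dataclasses import dataclass


@dataclass
class MwaitData:
    """Hypothetical container for the per-virtual-CPU mwait data."""
    num_mwaits: int = 0    # corresponds to "#mwaits" in the text
    curr_ave: float = 0.0  # corresponds to "currAve" (EWMA of idle time)

    def record(self, idle_time: float, alpha: float = 0.25) -> None:
        """Account for one mwait execution that idled for idle_time.

        alpha is an assumed smoothing factor in (0, 1]; larger values
        weight recent idle periods more heavily.
        """
        self.num_mwaits += 1
        # Standard EWMA update: blend the new sample with the old average.
        self.curr_ave = alpha * idle_time + (1.0 - alpha) * self.curr_ave
```

Because currAve starts at zero in this sketch, the average is biased low for the first few samples; whether and how an actual embodiment compensates for this is not stated.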
[0027] After initialization, the controlling state for executing the mwait instruction for the VM is the learning state. As part of the initialization, #mwaits and currAve are set to zero, and the monitor instruction that arms an address range of memory for specific events is executed. Transitions to the other states from the learning state are depicted as T1, T2, T3, T4, and T5 in
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
[0035] After control is passed to the kernel, the kernel deschedules the virtual CPU (step 502 in
[0036] After control is returned to the monitor, the monitor wakes up the virtual CPU (step 426 in
[0037]
[0038] If the average idle time (currAve) of the virtual CPU is less than a minimum time (minAve) as determined in step 1006 and the value of pcpu_load of the physical CPU to which the virtual CPU is assigned is equal to zero as determined in step 1008, the flow proceeds to step 1010, where it is checked if there is any monitor instruction in process. If there is none (monCleared=True), then the monitor transitions the controlling state from the learning state to the performance state in step 1012. This transition is depicted as T1 in
[0039] If the average idle time (currAve) of the virtual CPU is greater than or equal to the minimum time as determined in step 1006 or if the value of pcpu_load is greater than zero as determined in step 1008, then the monitor transitions the controlling state from the learning state to the throughput state in step 1016. This transition is depicted as T3 in
[0040] Thus, if the demand for the physical CPU to which the virtual CPU is assigned is low (pcpu_load=0) and the average idle time of the virtual CPU is low (currAve<minAve), then a transition to the performance state occurs, thereby improving wakeup latency of the virtual CPU executing the mwait instruction. On the other hand, if either the demand for the physical CPU to which the virtual CPU is assigned is high (pcpu_load>0) or the average idle time of the virtual CPU is high (currAve≥minAve), then a transition to the throughput state occurs, thereby improving physical CPU usage.
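The decision summarized above can be expressed as a short sketch. The function name next_state_from_learning and its parameter names are hypothetical; the comparison follows the conditions stated for steps 1006 and 1008. The check for a monitor instruction in process (step 1010) and any conditions evaluated before step 1006 are deliberately omitted, since their handling is not fully described here.

```python
def next_state_from_learning(curr_ave: float, min_ave: float, pcpu_load: int) -> str:
    """Pick the state to enter from the learning state.

    curr_ave  -- average idle time of the virtual CPU (currAve)
    min_ave   -- minimum time threshold (minAve)
    pcpu_load -- demand on the physical CPU assigned to the virtual CPU
    """
    # Low idle time and no competing demand: favor wake-up latency.
    if curr_ave < min_ave and pcpu_load == 0:
        return "performance"
    # Either the idle time is long or other work is waiting: favor CPU usage.
    return "throughput"
```

For example, with minAve equal to 10 time units, a virtual CPU with currAve of 5 on an otherwise idle physical CPU would move to the performance state, whereas the same virtual CPU on a physical CPU with pcpu_load greater than zero would move to the throughput state.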
[0041]
[0042]
[0043] Optionally, as depicted in dashed lines in
[0044] In yet another option, which is not depicted in
[0045] Certain embodiments as described above involve a hardware abstraction layer on top of a host computer. The hardware abstraction layer allows multiple contexts to share the hardware resource. These contexts are isolated from each other in one embodiment, each having at least a user application program running therein. The hardware abstraction layer thus provides benefits of resource isolation and allocation among the contexts. In the foregoing embodiments, virtual machines are used as an example for the contexts and hypervisors as an example for the hardware abstraction layer. As described above, each virtual machine includes a guest operating system in which at least one application program runs. It should be noted that these embodiments may also apply to other examples of contexts, such as containers not including a guest operating system, referred to herein as “OS-less containers” (see, e.g., www.docker.com). OS-less containers implement operating system-level virtualization, wherein an abstraction layer is provided on top of the kernel of an operating system on a host computer. The abstraction layer supports multiple OS-less containers, each including an application program and its dependencies. Each OS-less container runs as an isolated process in userspace on the host operating system and shares the kernel with other containers. The OS-less container relies on the kernel's functionality to make use of resource isolation (CPU, memory, block I/O, network, etc.) and separate namespaces and to completely isolate the application program's view of the operating environments. By using OS-less containers, resources can be isolated, services restricted, and processes provisioned to have a private view of the operating system with their own process ID space, file system structure, and network interfaces. Multiple containers can share the same kernel, but each container can be constrained only to use a defined amount of resources such as CPU, memory, and I/O.
[0046] Certain embodiments may be implemented in a host computer without a hardware abstraction layer or an OS-less container. For example, certain embodiments may be implemented in a host computer running a Linux® or Windows® operating system.
[0047] The various embodiments described herein may be practiced with other computer system configurations, including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
[0048] One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer-readable media. The term computer-readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer-readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer-readable medium include a hard drive, network-attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc) such as a CD-ROM, a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer-readable medium can also be distributed over a network-coupled computer system so that the computer-readable code is stored and executed in a distributed fashion.
[0049] Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation unless explicitly stated in the claims.
[0050] Plural instances may be provided for components, operations, or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).