POWER MANAGEMENT OF A COMPUTING SYSTEM
20230018342 · 2023-01-19
Inventors
Cpc classification
G06F1/3203
PHYSICS
G06F2009/4557
PHYSICS
G06F1/28
PHYSICS
Y02D10/00
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G06F1/263
PHYSICS
International classification
Abstract
A method for power management of a computing system having two or more physical servers for hosting virtual machines of a virtual system and one or more uninterruptible power supplies for supplying at least a subset of the physical servers with power, each of the one or more uninterruptible power supplies being connected to a phase of a multiple phase power supply, is disclosed. The method comprises receiving an action input for the computing system, which may impact the power consumption of the physical servers, processing the received action input with a predictive model of power consumption of the physical servers with regard to the battery autonomy of the one or more uninterruptible power supplies and/or the load balancing of the several phases of the multiple phase power supply, and optimizing the utilization of the physical servers based on the result of the processing.
Claims
1-10. (canceled)
11. A method for power management of a computing system, which comprises two or more physical servers for hosting virtual machines of a virtual system and one or more uninterruptible power supplies for supplying at least a subset of the physical servers with power, each of the one or more uninterruptible power supplies being connected to a phase of a multiple phase power supply, the method comprising: receiving an action input for the computing system, which may impact the power consumption of the physical servers (12); processing the received action input with a predictive model (100) of power consumption of the physical servers (12) regarding the battery autonomy of the one or more uninterruptible power supplies (16) and/or the load balancing of the several phases of the multiple phase power supply (20); and, optimizing the utilization of the physical servers (12) based on the result of the processing.
12. The method of claim 11, comprising: receiving measurements related to the operation of the physical servers; using an artificial intelligence or machine learning algorithm for learning the power consumption of one or more individual parts of the computing system depending on actions and the measurements; and, generating and/or improving the predictive model of power consumption of the physical servers (12) based on the output of the machine learning algorithm and the measurements.
13. The method of claim 12, wherein the measurements related to the operation of the physical servers comprises at least one of the following: total power consumption of the computing system; temperature of the environment of the computing system; virtual machines activity; power consumption of single physical servers; the processor activity of single physical servers; the mapping of virtual machines on the physical servers.
14. The method of claim 12, wherein the machine learning algorithm receives a training data set based on the received measurements and a validation data set based on the received measurements and processes the training data set and the validation data set to generate the predictive model.
15. The method of claim 11, wherein the optimizing of the utilization of the physical servers (12) based on the result of the processing comprises: receiving optimization constraints and optimization actions of the computing system; determining one or more actions from the optimization actions for fulfilling the optimization constraints; and, using the determined one or more actions for the power management of the computing system.
16. The method of claim 15, wherein the determining of one or more actions from the optimization actions for fulfilling the optimization constraints comprises determining a sequence of shutdown actions and/or shifting actions of virtual machines and/or physical servers depending on the remaining battery autonomy of the one or more uninterruptible power supplies and/or depending on the load balancing of the several phases of the multiple phase power supply.
17. A system for power management of a computing system, which comprises two or more physical servers for hosting virtual machines of a virtual system and one or more uninterruptible power supplies for supplying at least a subset of the physical servers with power, each of the one or more uninterruptible power supplies being connected to a phase of a multiple phase power supply, the power management system comprising: a predictive model of power consumption of the physical servers, the predictive model being provided to receive an action input for the computing system, which may impact the power consumption of the physical servers, and to process the received action input with regard to the battery autonomy of the one or more uninterruptible power supplies and/or the load balancing of the several phases of the multiple phase power supply; and, an optimizer being provided for optimizing the utilization of the physical servers based on the result of the processing by the predictive model.
18. The system of claim 17, wherein the optimizer is provided to: receive optimization constraints and optimization actions of the computing system; determine one or more actions from the optimization actions for fulfilling the optimization constraints; and, use the determined one or more actions for the power management of the computing system.
19. The system of claim 18, wherein the optimizer is provided to determine one or more actions from the optimization actions for fulfilling the optimization constraints by determining a sequence of shutdown actions of virtual machines and/or physical servers depending on the remaining battery autonomy of the one or more uninterruptible power supplies and/or depending on the load balancing of the several phases of the multiple phase power supply.
20. A non-transitory computer-readable storage device storing software comprising instructions executable by a processor of a computing device which, upon such execution, cause the computing device to perform the method of claim 11.
Description
DESCRIPTION OF DRAWINGS
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
DETAILED DESCRIPTION
[0027] In the following, functionally similar or identical elements may have the same reference numerals. Absolute values are shown below by way of example only and should not be construed as limiting.
[0028] The term “virtual machine” (VM) used herein describes an emulation of a particular computer system. In the context of the present invention, a VM is a special case of a computer program with an operating system. The solution also applies to “lightweight” VMs, also called “containers”. The term “physical server” (PS) used herein describes an entity comprising a physical computer. A PS may comprise hypervisor software, which configures the physical computer to host one or more virtual machines. In the context of the present invention, the PS is a special case of a computing device. The term “virtual system” used herein designates a system comprising two or more PSs, each hosting at least one VM, wherein at least two of the PSs are supplied by different single-phase electrical lines split off from a multi-phase power input line. The term “computing system” as used herein generally describes a system comprising software and hardware, as for example employed in a datacenter. In the context of the present invention, the virtual system is a special case of a computing system. A computing system may comprise one or more virtual systems.
[0029] For a datacenter, establishing a proper business continuity plan for managing power loss is key to avoiding critical data loss. Through the IPM (Intelligent Power Manager) software, some actions on the IT system equipment of a datacenter can be predefined and automated as soon as a power failure is detected. However, when IT actions such as a VM move, a VM shutdown, a VM placement, a PS shutdown, a VM start, a PS startup or booting, a NAS (Network Attached Storage) startup or booting, etc., are configured with the IPM software, the power impact of these actions is not known. It is also impossible to predict in advance whether the IT actions on non-critical loads will increase UPS autonomy sufficiently to keep critical VMs alive during an expected time frame. An IT action sequence configured with the IPM software is predefined and static.
[0030] So-called “green” IT mechanisms such as the above-mentioned Distributed Power Management (DPM) software are currently proposed to optimize datacenter power consumption during normal datacenter operation. These mechanisms are often based on the following scenario: concentrate the VM placement on a reduced set of servers and shut down the unnecessary servers. However, such mechanisms are not used in a power crisis context (a business continuity plan executed during a UPS autonomy), or are used regardless of datacenter multi-phase, particularly 3-phase, balance criteria, and/or are not used to participate in grid stability through an energy demand response mechanism. Consequently, a server shutdown initiated by “green” IT mechanisms can degrade the phase balance of the mains power supply in a datacenter and can also have a negative impact on power consumption.
[0031] The methods and systems described in this disclosure intend to predict and quantify how much each individual IT action, such as a VM move, a VM shutdown, a VM placement, a PS shutdown, a VM start, a PS startup or booting, etc., will impact, and in particular decrease, IT load consumption. The prediction as described herein may be applied particularly to the following use cases:
[0032] a UPS autonomy sequence;
[0033] a multi-phase, particularly a 3-phase, load balancing;
[0034] an energy demand response mechanism to contribute to stabilizing the power grid when needed.
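By way of illustration only, the effect of a single server shutdown on the multi-phase load balancing use case may be sketched as follows. The imbalance metric (largest deviation from the mean phase load, relative to that mean), the server names, the phase assignment and the power values are assumptions chosen for the example and are not taken from this disclosure.

```python
def phase_loads(server_power_kw, server_phase):
    """Sum the power drawn by the servers attached to each phase."""
    loads = {phase: 0.0 for phase in set(server_phase.values())}
    for server, phase in server_phase.items():
        loads[phase] += server_power_kw.get(server, 0.0)
    return loads


def phase_imbalance(loads):
    """Largest deviation from the mean phase load, relative to that mean."""
    mean = sum(loads.values()) / len(loads)
    if mean == 0:
        return 0.0
    return max(abs(load - mean) for load in loads.values()) / mean


# Hypothetical mapping of servers to phases and hypothetical power draws (kW).
server_phase = {"Server1": "L1", "Server2": "L2", "Server3": "L3"}
power_before = {"Server1": 2.0, "Server2": 2.0, "Server3": 2.0}
power_after = dict(power_before, Server3=0.0)  # Server3 has been shut down

imbalance_before = phase_imbalance(phase_loads(power_before, server_phase))
imbalance_after = phase_imbalance(phase_loads(power_after, server_phase))
```

In this example the three phases are perfectly balanced before the shutdown, whereas afterwards one phase carries no load at all, which is the kind of degradation a phase-aware prediction would be expected to flag before the action is executed.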
[0035] The prediction may be based on:
[0036] IT and power data acquisition (VM resource consumption, PS consumption, . . . );
[0037] an artificial intelligence (AI) model for power consumption prediction.
[0038] With the prediction, a load shedding sequence may be dynamically scheduled, particularly by an AI algorithm, to optimize runtime for critical VMs.
[0039] The methods and systems described in this disclosure may collect one or more datasets from existing assets, particularly UPSs, ePDUs, PSs, VMs, and use AI/ML (Machine Learning) techniques to continuously control and particularly optimize the utilization of IT system equipment or IT resources, which is for example employed in a datacenter.
[0040] The methods and systems described in this disclosure may reduce the energy-related costs of IT system equipment, particularly of a datacenter, and may provide “augmented intelligence” to human operators in case of a power crisis.
[0041]
[0042] Computing system 10 is shown in more detail in
[0043] In
[0044]
[0045]
[0046] The predictive model 100 is generated based on the output of an AI/machine learning (ML) algorithm 118 and measurements 116 related to the operation of the PSs 12 of the computing system 10. The measurements 116 may comprise measured outputs or inputs such as the total power consumption (kW), the temperature of the environment of the computing system 10 such as the temperature of the room in which the computing system 10 is operated, the VM activity on the computing system 10, the power consumption of one or more PSs of the computing system 10, the CPU activity of one or more PSs of the computing system 10, the VM mapping on the PSs of the computing system 10, etc.
[0047] From the measurements 116, a training data set 120 and a validation data set 122 are created, which are forwarded to the AI/machine learning algorithm 118 for processing to generate the predictive model 100.
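As a minimal sketch of how a training data set and a validation data set might be derived from the same measurements, the following example shuffles the records and holds a fraction back for validation. The function name, the 80/20 ratio, the fixed seed and the record layout are assumptions made for illustration; the disclosure does not prescribe a particular splitting strategy, and for time-ordered measurements a chronological split may be preferable.

```python
import random


def split_measurements(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples and return (training_set, validation_set)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical measurement records: (cpu_percent, disk_percent, power_watts).
measurements = [(10 * i, 5 * i, 100 + 12 * i) for i in range(10)]
training_set, validation_set = split_measurements(measurements)
```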
[0048] An optimizer 104 is provided for optimizing the utilization of the PSs 12 based on the result 106 of the processing by the predictive model 100. The optimizer 104 may be provided to receive optimization constraints 108, for example according to a Service Level Agreement (SLA) requiring a specific or minimum level of Quality of Service (QoS), and optimization actions 110 of the computing system 10, for example a workload consolidation (VM migration/shutdown), an idle server shutdown, energy-aware scheduling policies, power capping/DVFS, etc. The optimizer 104 may further determine one or more actions 112 from the optimization actions for fulfilling the optimization constraints, and use the determined one or more actions for the power management 114 of the computing system 10, particularly for obtaining optimized metrics such as the total energy consumption, ITEU (IT equipment utilization), PUE (Power Usage Effectiveness), QoS, etc.
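One conceivable way of determining actions from a set of optimization actions under a constraint is a greedy selection, sketched below. This is an illustrative assumption and not the optimizer of this disclosure: the `Action` fields, the notion of a numeric QoS cost (assumed to be positive), the QoS budget and all values are hypothetical, and the predicted savings would in practice be supplied by the predictive model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    predicted_saving_kw: float  # hypothetical output of the predictive model
    qos_cost: float             # hypothetical, positive QoS penalty


def select_actions(candidates, required_saving_kw, max_qos_cost):
    """Greedily pick actions with the best saving per unit of QoS cost."""
    ranked = sorted(
        candidates,
        key=lambda a: a.predicted_saving_kw / a.qos_cost,
        reverse=True,
    )
    chosen, saving, cost = [], 0.0, 0.0
    for action in ranked:
        if saving >= required_saving_kw:
            break  # target reached, no further action needed
        if cost + action.qos_cost > max_qos_cost:
            continue  # this action would violate the QoS budget
        chosen.append(action)
        saving += action.predicted_saving_kw
        cost += action.qos_cost
    return chosen, saving, cost


candidates = [
    Action("consolidate VMs", 2.0, 2.0),
    Action("shut down idle server", 1.3, 1.0),
    Action("power capping", 0.5, 1.0),
    Action("shut down priority 3 VMs", 0.7, 0.5),
]
chosen, saving, cost = select_actions(
    candidates, required_saving_kw=3.0, max_qos_cost=4.0
)
```

A greedy heuristic does not guarantee an optimal selection; it is used here only because it keeps the relationship between constraints, candidate actions and determined actions easy to follow.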
[0049] In the following, it is described by means of an example how an accurate prediction of the UPS autonomy during an IT safeguard policy (an IPM2 automation plan), which is triggered in case of a power outage, may be processed.
[0050] A system-specific machine learning algorithm, which is based on an estimation of the power saving of IT actions in a virtual system, particularly a virtualized datacenter, is provided. These power saving estimations may then be injected into an existing hard coded experimental UPS autonomy model to estimate the impact of these actions on the UPS autonomy before a power crisis happens.
[0051] An example of IT actions with their expected power benefits and the respective UPS autonomy increases is listed in the following:
[0052] 1. IT action: shut down 10 “priority 3” VMs; expected power benefit: 0.7 kW; UPS autonomy increase: 2 minutes.
[0053] 2. IT action: shut down 2 “priority 2” hypervisors; expected power benefit: 1.3 kW; UPS autonomy increase: 3 minutes.
[0054] 3. IT action: run consolidation algorithm (new energy-aware VM placement); expected power benefit: 2 kW; UPS autonomy increase: 6 minutes.
[0055] 4. IT action: shut down 6 outlets on an ePDU; expected power benefit: 1 kW; UPS autonomy increase: 2 minutes.
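The example figures listed above can be accumulated to show the running effect of applying the actions one after another. The sketch below assumes, purely for illustration, that the individual power benefits and autonomy increases are additive; in practice the autonomy gain of an action may depend on the load remaining after the preceding actions.

```python
# (IT action, expected power benefit in kW, UPS autonomy increase in minutes),
# values taken from the example list above.
IT_ACTIONS = [
    ("shut down 10 priority 3 VMs", 0.7, 2),
    ("shut down 2 priority 2 hypervisors", 1.3, 3),
    ("run consolidation algorithm", 2.0, 6),
    ("shut down 6 outlets on an ePDU", 1.0, 2),
]


def cumulative_effect(actions):
    """Return (action, cumulative kW saved, cumulative minutes gained) rows."""
    rows, total_kw, total_minutes = [], 0.0, 0
    for name, benefit_kw, autonomy_minutes in actions:
        total_kw += benefit_kw
        total_minutes += autonomy_minutes
        rows.append((name, round(total_kw, 3), total_minutes))
    return rows


rows = cumulative_effect(IT_ACTIONS)
```

Under the additivity assumption, the four actions together yield 5 kW of expected power benefit and 13 minutes of additional UPS autonomy.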
[0056]
[0057] The automation plan is described here as another illustrative example and comprises the following steps:
[0058] 1. Wait for the UPS battery capacity to fall below 75%.
[0059] 2. IT action: power off 7 VMs.
[0060] 3. Wait for the UPS battery capacity to fall below 50%.
[0061] 4. IT action: graceful shutdown of 2 VMs and 2 PSs.
[0062] 5. Wait for the UPS battery capacity to fall below 25%.
[0063] 6. IT action: graceful shutdown of 1 VM and 1 PS.
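The steps of this automation plan can be read as a sequence of threshold-triggered actions. The following sketch, in which the function name and the series of battery capacity readings are hypothetical, shows how such a static plan could be evaluated against successive capacity readings.

```python
# (battery capacity threshold in percent, IT action), from the plan above.
AUTOMATION_PLAN = [
    (75, "power off 7 VMs"),
    (50, "graceful shutdown of 2 VMs and 2 PSs"),
    (25, "graceful shutdown of 1 VM and 1 PS"),
]


def run_plan(capacity_readings, plan=AUTOMATION_PLAN):
    """Return the (capacity, action) pairs in the order they are triggered."""
    triggered = []
    next_step = 0
    for capacity in capacity_readings:
        # Several steps fire at once if the capacity drops past
        # more than one threshold between two readings.
        while next_step < len(plan) and capacity < plan[next_step][0]:
            triggered.append((capacity, plan[next_step][1]))
            next_step += 1
    return triggered


# Hypothetical battery capacity readings (percent) during an outage.
readings = [100, 90, 74, 60, 49, 30, 24]
events = run_plan(readings)
```

Because the thresholds and actions are fixed in advance, such a plan cannot adapt to the actual power impact of each action, which is the limitation the predictive approach is intended to address.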
[0064]
[0065] In
[0066] In
[0067] In
[0068] In
[0069] With the above described automation plan, the operation of the “priority 1” VMs and the PSs “Server1” and “Server2”, which host the “priority 1” “VMs”, is extended if possible, as can be seen in
[0070]
[0071] For the prediction model, a UPS battery autonomy model can be generated from the UPS output power monitoring, as shown in
[0072] The server power model for each server can be for example defined by the following simple equation:
$$P_{\mathrm{server}} = P_{\mathrm{idle}} + \theta_{1} \cdot \mathrm{CPU}_{\mathrm{server}} + \theta_{2} \cdot \mathrm{Disk}_{\mathrm{server}}$$
[0073] More complex server models and/or more accurate server models can also be used (e.g. neural network models).
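For the simple linear server power model, the parameters P_idle, θ1 and θ2 could for instance be estimated from a batch of recorded measurements by ordinary least squares. The sketch below is one possible realization under that assumption; the choice of least squares, the function name and the synthetic sample values are illustrative and not taken from this disclosure.

```python
def fit_server_power_model(samples):
    """Fit P = p_idle + theta_1 * cpu + theta_2 * disk by least squares.

    samples: iterable of (cpu, disk, measured_power) tuples.
    Returns (p_idle, theta_1, theta_2).
    """
    samples = list(samples)
    rows = [(1.0, cpu, disk) for cpu, disk, _ in samples]
    targets = [power for _, _, power in samples]
    n = 3
    # Augmented matrix of the normal equations (X^T X) w = X^T y.
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine the model uniquely")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return tuple(a[i][n] / a[i][i] for i in range(n))


# Synthetic, noise-free samples generated from P = 100 + 1.5*CPU + 0.4*Disk
# (CPU and disk activity in percent, power in watts; hypothetical values).
operating_points = [(0, 0), (50, 10), (100, 20), (20, 80), (70, 40)]
samples = [(c, d, 100 + 1.5 * c + 0.4 * d) for c, d in operating_points]
p_idle, theta_1, theta_2 = fit_server_power_model(samples)
```

With noise-free samples the fit recovers the generating parameters up to rounding error; with real measurements the residual on a validation data set would indicate whether a more complex model is warranted.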
[0074] One approach to find out a server's power model can comprise a classical, batch machine learning, as shown in
[0075] Another approach to find out a server's power model can be online machine learning, as shown in
[0076] In the following, the pros and cons of batch and online machine learning are compared:
TABLE-US-00001
Online machine learning | Batch machine learning
Low/no data storage (learn & drop) | Storage needed for the dataset
Dynamic model, adapts to long-term changes: temperature/humidity variations, hardware aging/wear-out | Static model, unless a new dataset is built to update the model
Model available immediately | No model available until the dataset is ready
Requires a stable data stream to learn over time (resource usage vs. power consumption) | Learns on the provided dataset in a single step/phase
Low control over the data may result in low quality of the model: learns on data as it comes, data may possibly be biased | Control of the quality of the model through the quality of the dataset
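The “learn & drop” behaviour of the online approach may be illustrated with the same linear server power model, updated one measurement at a time. The sketch below uses a least-mean-squares update rule, which is an assumption made for the example; the class name, the learning rate, the normalisation of CPU and disk activity to fractions between 0 and 1, and the synthetic data stream are likewise hypothetical.

```python
class OnlineServerPowerModel:
    """Linear server power model updated sample by sample (LMS rule)."""

    def __init__(self, learning_rate=0.5):
        self.weights = [0.0, 0.0, 0.0]  # p_idle, theta_1, theta_2
        self.learning_rate = learning_rate

    def predict(self, cpu, disk):
        p_idle, theta_1, theta_2 = self.weights
        return p_idle + theta_1 * cpu + theta_2 * disk

    def update(self, cpu, disk, measured_power):
        """Learn from one sample, which can then be dropped."""
        error = measured_power - self.predict(cpu, disk)
        features = (1.0, cpu, disk)
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        return error


def true_power(cpu, disk):
    """Hypothetical ground truth used to generate the data stream (watts)."""
    return 100.0 + 150.0 * cpu + 40.0 * disk


# Hypothetical noise-free stream; activity expressed as fractions in [0, 1].
operating_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
model = OnlineServerPowerModel()
for _ in range(1000):
    for cpu, disk in operating_points:
        model.update(cpu, disk, true_power(cpu, disk))
```

No dataset is stored at any point, and the model is usable from the first sample onwards, although its early predictions are poor; both properties correspond to the online column of the comparison above.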