TEMPERATURE PREDICTION SYSTEM AND METHOD FOR PREDICTING A TEMPERATURE OF A CHIP OF A PCIE CARD OF A SERVER
20220156171 · 2022-05-19
Inventors
Cpc classification
G06F11/3058
PHYSICS
G06F11/3031
PHYSICS
International classification
G06F11/14
PHYSICS
Abstract
To predict a temperature of a chip of a PCIe card of a server, use a gated recurrent unit of a recurrent neural network to define a temperature prediction model for the chip, collect training data of the temperature prediction model according to mutual response changes of control variables, use the training data to train the temperature prediction model to obtain a training result close to a measured temperature of the chip and evaluate the training result to obtain features that best reflect the temperature change of the chip, perform an error analysis on the training result to obtain a set of key features from the features, form a temperature predictor according to the set of key features and the temperature prediction model, and generate a predicted temperature of the chip by the temperature predictor.
Claims
1. A method for predicting a temperature of a chip of a PCIe card of a server comprising: using a gated recurrent unit of a recurrent neural network to define a temperature prediction model for the chip, the temperature prediction model comprising an input terminal and an output terminal; collecting training data of the temperature prediction model according to mutual response changes of a plurality of control variables; using the training data to train the temperature prediction model at the input terminal to obtain a training result close to a measured temperature of the chip from the output terminal, and evaluate the training result to obtain a plurality of features that best reflect the temperature change of the chip; performing an error analysis on the training result to obtain a set of key features from the plurality of features; forming a temperature predictor according to the set of key features and the temperature prediction model; and generating a predicted temperature of the chip by the temperature predictor.
2. The method of claim 1 wherein the plurality of control variables comprise: chip power of the PCIe card being in an on state or an off state; a utilization rate of a processor being in an idle state, 25% utilization rate, 50% utilization rate, 75% utilization rate or 100% utilization rate; a fan speed of the server being 30% of full speed, 40% of full speed, 50% of full speed, 60% of full speed, 70% of full speed, 80% of full speed, 90% of full speed or 100% of full speed; and an intake air temperature of the server being between 18° C. and 25° C.
3. The method of claim 2 wherein the training data comprises the utilization rate of the processor, the fan speed of the server, the chip power of the PCIe card and the measured temperature of the chip.
4. The method of claim 3 wherein the measured temperature is obtained from a thermocouple sensor disposed on the chip.
5. The method of claim 3 wherein the plurality of features comprise any combination of a group consisting of the utilization rate of the processor, the fan speed of the server, the chip power of the PCIe card, the measured temperature of the chip and the intake air temperature of the server, and the set of key features comprises the chip power of the PCIe card, the fan speed of the server, the temperature of the processor and the intake air temperature of the server.
6. The method of claim 1 wherein the error analysis is a root mean square error analysis.
7. The method of claim 1 further comprising controlling a fan speed of the server according to the predicted temperature of the chip.
8. A temperature prediction system comprising: a server comprising a PCIe card and a fan; a temperature predictor comprising: a temperature prediction model defined by a gated recurrent unit (GRU) of a recurrent neural network (RNN) for a chip of the PCIe card; and a set of key features that best reflect a temperature change of the chip; and a baseboard management controller configured to control a temperature prediction model to generate a predicted temperature of the chip of the PCIe card according to the set of key features, and control a fan speed of the server according to the predicted temperature.
9. The temperature prediction system of claim 8 wherein the set of key features comprises the chip power of the PCIe card, the fan speed of the server, the temperature of the processor and the intake air temperature of the server.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0009]
[0010]
[0011]
[0012]
[0013]
DETAILED DESCRIPTION
[0014]
[0015] The temperature prediction system 100 further comprises a temperature predictor. The temperature predictor comprises a temperature prediction model defined by a gated recurrent unit (GRU) of a recurrent neural network (RNN) for the chip of the PCIe card 12, and a set of key features that best reflect the temperature change of the chip of the PCIe card 12. The temperature prediction model and the set of key features can be stored in the memory 4 and executed by the central processing unit 2. The memory 4 and central processing unit 2 can be in any form.
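As an illustration of the gated recurrent unit named in paragraph [0015], the following is a minimal sketch of a single GRU cell with a linear read-out, written with NumPy. It is not the implementation of the embodiment. The class name `GRUCell`, the hidden size, the random initialization and the update convention h = (1 - z) * h_prev + z * h_cand are assumptions made for this example, the weights are untrained, and the input features are assumed to be normalized beforehand.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used by the GRU gates."""
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Hypothetical single GRU cell with a linear read-out (untrained)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)

        # One set of parameters per gate: update (z), reset (r), candidate (h).
        self.W = {g: init(n_hidden, n_features) for g in "zrh"}
        self.U = {g: init(n_hidden, n_hidden) for g in "zrh"}
        self.b = {g: np.zeros(n_hidden) for g in "zrh"}
        # Linear read-out that maps the hidden state to one temperature value.
        self.w_out = init(n_hidden)
        self.b_out = 0.0
        self.n_hidden = n_hidden

    def step(self, x, h_prev):
        """Advance the hidden state by one time step."""
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h_prev + self.b["z"])
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h_prev + self.b["r"])
        h_cand = np.tanh(
            self.W["h"] @ x + self.U["h"] @ (r * h_prev) + self.b["h"]
        )
        # Assumed convention: z close to 1 favors the new candidate state.
        return (1.0 - z) * h_prev + z * h_cand

    def predict(self, sequence):
        """Run a sequence of feature vectors and return one scalar output."""
        h = np.zeros(self.n_hidden)
        for x in sequence:
            h = self.step(np.asarray(x, dtype=float), h)
        return float(self.w_out @ h + self.b_out)


# Example: 5 time steps of 4 normalized features (for instance P, U, T_CPU, T_in).
model = GRUCell(n_features=4, n_hidden=8)
sequence = [[1.0, 0.5, 0.6, 0.3]] * 5
predicted = model.predict(sequence)
```

In practice the weights would be fitted to the collected training data, typically with a deep learning framework rather than hand-written NumPy.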
[0016] Please refer to
TABLE 1

Control variables          Control range   Control range adjustment
Chip of PCIe card          ON/OFF          ON, OFF
CPU utilization rate       0-100%          Idle, 25%, 50%, 75%, 100%
Fan speed                  30-100%         30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%
Server inlet temperature   18-25° C.       18-25° C.
[0017] The control range adjustments listed in Table 1 are for illustration only and do not limit the present invention. The control variables are used to generate input data for the temperature prediction model 200. The chip power P of the PCIe card 12 may be in one of two states: ON and OFF. The fan speed U is set by a pulse-width modulation (PWM) control signal, which may correspond to one of eight states: 30%, 40%, 50%, 60%, 70%, 80%, 90% and 100% of full speed. The utilization rate of the central processing unit (CPU) 2, which is the main heat source affecting the downstream PCIe card 12, may be in one of five states: idle, 25%, 50%, 75% and 100%. In the embodiment, the fan speed U, the chip power P of the PCIe card 12 and the utilization rate of the CPU 2 can be controlled by a program, while the intake air temperature T.sub.amb of the server 30, the temperature T.sub.CPU of the CPU 2 and the chip temperature T.sub.PCIE of the PCIe card 12 are detected, and these values are used to train the temperature prediction model 200. In the design stage of the server 30, a thermocouple sensor can be placed on the chip of the PCIe card 12 in advance to obtain the temperature of the chip. After the training is completed, the chip of the PCIe card 12 does not have a thermocouple sensor; instead, the temperature prediction model 200 of the embodiment is used to predict the change of the chip temperature T.sub.PCIE.
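The discrete settings of the program-controlled variables can be enumerated as a grid of operating points. The sketch below assumes a full-factorial sweep, which the description does not state explicitly; the dictionary keys and the function name `operating_points` are hypothetical. The intake air temperature is left out because it is measured within its 18-25° C. range rather than stepped.

```python
from itertools import product

# Discrete settings taken from Table 1.
CHIP_POWER = ("ON", "OFF")
CPU_UTILIZATION_PCT = (0, 25, 50, 75, 100)  # 0 stands for the idle state
FAN_SPEED_PCT = (30, 40, 50, 60, 70, 80, 90, 100)


def operating_points():
    """Return every combination of the three program-controlled variables."""
    return [
        {"chip_power": p, "cpu_utilization_pct": c, "fan_speed_pct": u}
        for p, c, u in product(CHIP_POWER, CPU_UTILIZATION_PCT, FAN_SPEED_PCT)
    ]


points = operating_points()
print(len(points))  # 2 * 5 * 8 = 80 combinations
```

At each operating point, the detected temperatures (T.sub.amb, T.sub.CPU and T.sub.PCIE) would be logged over time to form one training sequence.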
TABLE 2

        Input features                                  Errors
Group   T.sub.amb   T.sub.CPU   T.sub.in   P   U        RMSE    Greatest error
1       x           x           ∘          ∘   ∘        1.107   5.478
2       x           ∘           x          ∘   ∘        0.737   6.356
3       ∘           x           x          ∘   ∘        5.706   13.666
4       x           ∘           ∘          ∘   ∘        0.371   2.548
5       ∘           x           ∘          ∘   ∘        1.020   4.69
6       x           ∘           x          ∘   ∘        0.487   2.95
7       ∘           ∘           ∘          ∘   ∘        0.395   2.684
[0018] Table 2 is an error analysis of the training results under various combinations of input features. The error data illustrates experimental results according to the present invention and is not used to limit the present invention. In Table 2, ∘ indicates that a feature is used, and x indicates that a feature is not used. The chip power P of the PCIe card 12 and the fan speed U are both key features and are used in every group. According to the root mean square error (RMSE) analysis, adding T.sub.in and T.sub.CPU to these two features produces the smallest errors (the fourth group of input features, with an RMSE of 0.371 and a greatest error of 2.548). Therefore, the embodiment selects the chip power P of the PCIe card 12, the fan speed U, the temperature T.sub.CPU of the central processing unit 2, and the inlet temperature T.sub.in of the PCIe card 12 as the key features of the temperature predictor. However, the present invention is not limited to this. In another embodiment, the key features can include any combination of the features in Table 2.
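The selection described in paragraph [0018] can be sketched as follows. The RMSE and greatest-error figures are copied from Table 2; the helper names `rmse` and `greatest_error` and the rule "pick the group with the lowest RMSE" are assumptions for illustration, not a statement of how the embodiment computed its results.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def greatest_error(predicted, measured):
    """Largest absolute deviation between prediction and measurement."""
    return max(abs(p - m) for p, m in zip(predicted, measured))


# Group number -> (RMSE, greatest error), copied from Table 2.
TABLE_2 = {
    1: (1.107, 5.478),
    2: (0.737, 6.356),
    3: (5.706, 13.666),
    4: (0.371, 2.548),
    5: (1.020, 4.69),
    6: (0.487, 2.95),
    7: (0.395, 2.684),
}

# Assumed selection rule: the feature group with the lowest RMSE.
best_group = min(TABLE_2, key=lambda group: TABLE_2[group][0])
print(best_group)  # 4
```

With the figures of Table 2, the fourth group also has the smallest greatest error, so either criterion leads to the same set of key features.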
[0019]
[0020]
[0021] In summary, the embodiment discloses a temperature prediction system and method for a chip of a PCIe card of a server. The method defines the training data and output data of a temperature prediction model for the chip, uses the training data to train and test the temperature prediction model, and adjusts the temperature prediction model so that its output data is close to the measured value. The temperature prediction model and the key features together form a temperature predictor, which is used to predict the temperature of the chip of the PCIe card. In this way, the temperature change of the chip of the PCIe card can be predicted, solving the time delay problem of the fan speed response.
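One way a controller could turn a predicted chip temperature into a fan command is a clamped linear mapping, sketched below. The 30% and 100% duty limits follow the fan speed range in Table 1. The function name and the temperature thresholds `low_c` and `high_c` are hypothetical values chosen only for this example and do not come from the description.

```python
def fan_speed_from_prediction(predicted_temp_c, low_c=60.0, high_c=85.0,
                              min_pct=30.0, max_pct=100.0):
    """Map a predicted chip temperature to a fan PWM duty in percent.

    low_c and high_c are hypothetical thresholds: at or below low_c the fan
    runs at min_pct, at or above high_c it runs at max_pct, and in between
    the duty is interpolated linearly.
    """
    if high_c <= low_c:
        raise ValueError("high_c must be greater than low_c")
    fraction = (predicted_temp_c - low_c) / (high_c - low_c)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the valid range
    return min_pct + fraction * (max_pct - min_pct)


print(fan_speed_from_prediction(50.0))   # 30.0
print(fan_speed_from_prediction(72.5))   # 65.0
print(fan_speed_from_prediction(90.0))   # 100.0
```

Because the command is derived from the predicted temperature rather than a measured one, the fan can begin to respond before the chip temperature actually rises.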
[0022] In an embodiment of the present invention, the temperature predictor and method for the PCIe chip can be applied to a server. The server can be used in artificial intelligence (AI) operations and edge computing. The server can also be a 5G server, cloud server or car networking server.
[0023] Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.