THERMAL CONTROL FOR VEHICLE MOTOR

20220077810 · 2022-03-10

    Abstract

    The disclosed computer-implemented method optimizes thermal control of a vehicle motor, the vehicle including a cooling device including an actuator varying cooling capacity, the method including training a reinforcement learning algorithm including the iterative steps: 1) determining an action to control an actuator by applying a control function to a current state of the thermal system, and implementing the action; 2) determining a modified state of the thermal system after implementing the action; 3) calculating, by implementing a thermodynamic reward function of the motor, a reward value based on the modified state of the thermal system, and the action; 4) updating a function for estimating thermal performance based on the current state of the thermal system, the modified state of the thermal system, the action and the reward; and 5) modifying the control function based on the update of the function for estimating thermal performance.

    Claims

    1. A method for optimizing the thermal control of a vehicle motor, the vehicle comprising a device for cooling the motor comprising at least one actuator suitable for varying a capacity for cooling the motor by means of the cooling device, the method being implemented by a computer suitable for controlling said at least one actuator, the method comprising training of a reinforcement learning algorithm comprising iterative implementation of the following steps: 1) determining at least one action to control at least one actuator by applying a control function to a current state of the thermal system comprising the motor and the cooling device, and implementing said action, 2) determining a modified state of the thermal system after the implementation of said action, 3) calculating, by implementing a thermodynamic reward function of the motor, a reward value based on the modified state of the thermal system, and said action, 4) updating a function for estimating the thermal performance of the system based on the current state of the thermal system, the modified state of the thermal system, the action and the reward, and 5) modifying the control function based on the update of the function for estimating the thermal performance of the system.

    2. The method for optimizing the thermal control as claimed in claim 1, in which in step 1, an exploration noise is added to the determination of the control action or to the parameters of the control function.

    3. The method for optimizing the thermal control as claimed in claim 1, in which the thermodynamic reward function of the motor is configured to maximize the reward value when, based on the current state, the generation of thermodynamic irreversibilities brought about by the action is minimized.

    4. The method for optimizing the thermal control as claimed in claim 1, in which the modified state of the thermal system after an action comprises at least one parameter identifying at least one action preceding said action.

    5. The method for optimizing the thermal control as claimed in claim 1, in which the thermodynamic reward function of the motor is configured to penalize an action when the action causes the temperature of the motor to exceed a predetermined threshold.

    6. The method for optimizing the thermal control as claimed in claim 1, in which the thermodynamic reward function of the motor is configured to penalize an action when the action is implemented while the ambient temperature is greater than the temperature of the motor.

    7. The method for optimizing the thermal control as claimed in claim 1, in which the state of the thermal system is defined by at least one parameter from the following group: the air speed around the vehicle, the on or off state of the motor in the near future, one or more temperatures of the motor, one or more entropy values of the thermal system, and one or more actions implemented before the current state.

    8. The method for optimizing the thermal control as claimed in claim 1, wherein the motor is an electric motor.

    9. The method for optimizing the thermal control as claimed in claim 1, wherein the function for estimating the thermal performance takes the following form: $Q^{\pi}(s_t, u_t) = r_t + \mathbb{E}\left[\sum_{j=1}^{n-1} \gamma^j r_{t+j} + \gamma^n Q^{\pi}(s_{t+n}, \mu(s_{t+n})) \,\middle|\, s_t, u_t\right]$, where γ is a depreciation factor, π is the set of parameters of the control function, and n is a number of additional time steps taken into account for calculating the function for estimating the thermal performance.

    10. The method for optimizing the thermal control as claimed in claim 9, wherein the depreciation factor γ is between 0.8 and 1 inclusive.

    11. The method for optimizing the thermal control as claimed in claim 9, wherein the duration between two time steps of the training of the reinforcement learning algorithm is determined in correlation with the value n, and vice versa.

    12. A non-transitory computer-readable medium on which is stored a computer program containing coded instructions for implementing the method as claimed in claim 1, when the computer program is executed by a computer.

    13. A thermal control system for a vehicle motor comprising a computer that is suitable for implementing at least one action to control at least one actuator by applying a control function, said control function having been determined in advance by implementing the optimization method as claimed in claim 1.

    14. A thermal control system for a vehicle motor, comprising a computer suitable for implementing the method as claimed in claim 1.

    15. The method for optimizing the thermal control as claimed in claim 2, in which the thermodynamic reward function of the motor is configured to maximize the reward value when, based on the current state, the generation of thermodynamic irreversibilities brought about by the action is minimized.

    16. The method for optimizing the thermal control as claimed in claim 2, in which the modified state of the thermal system after an action comprises at least one parameter identifying at least one action preceding said action.

    17. The method for optimizing the thermal control as claimed in claim 3, in which the modified state of the thermal system after an action comprises at least one parameter identifying at least one action preceding said action.

    18. The method for optimizing the thermal control as claimed in claim 2, in which the thermodynamic reward function of the motor is configured to penalize an action when the action causes the temperature of the motor to exceed a predetermined threshold.

    19. The method for optimizing the thermal control as claimed in claim 3, in which the thermodynamic reward function of the motor is configured to penalize an action when the action causes the temperature of the motor to exceed a predetermined threshold.

    20. The method for optimizing the thermal control as claimed in claim 4, in which the thermodynamic reward function of the motor is configured to penalize an action when the action causes the temperature of the motor to exceed a predetermined threshold.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0031] Further features, aims and advantages of the invention will become apparent from the following description, which is purely illustrative and non-limiting, and which must be read with reference to the appended figures, in which:

    [0032] FIG. 1 shows the thermal system of a vehicle motor according to one embodiment of the invention.

    [0033] FIG. 2 shows the main steps of the method for optimizing the thermal control according to one embodiment of the invention.

    [0034] FIG. 3 shows the actor-critic architecture implemented by a computer according to one embodiment of the invention.

    DESCRIPTION OF THE PREFERRED EMBODIMENTS

    [0035] With reference to FIG. 2, a method for optimizing the thermal control of a motor of the vehicle according to one embodiment of the invention will now be described. This method makes it possible to manage the cooling of the motor of the vehicle so as to both keep the motor of the vehicle in an acceptable temperature range and reduce the electricity consumption of the cooling system of the vehicle as much as possible.

    [0036] In this regard, the optimization method is implemented on a thermal system of a vehicle motor 1, shown schematically in FIG. 1, comprising a motor 10, for example but not limited to an electric motor, and a device 30 for cooling the motor, comprising at least one actuator 50, suitable for varying a capacity for cooling the motor.

    [0037] In a preferred embodiment, the motor can be an electric motor. As electric motors must not be subjected to excessively sudden temperature stresses, which would impair the longevity and performance of the various electronic components that they contain, and as they impose additional constraints in terms of range, the invention can be applied particularly advantageously to this type of motor.

    [0038] The optimization method is implemented by a computer 20, also shown in FIG. 1, which is suitable for receiving information about the state of the motor and the cooling device, this information being measured by one or more sensors embedded in or on the vehicle, and for controlling the actuator(s) of the cooling device of the motor by applying actions u.sub.t to control each actuator. One action to control an actuator of the cooling device is suitable for varying a cooling capacity of the cooling device.

    [0039] The information acquired by the sensors and received by the computer 20, which forms part of the state of the system, will be described in greater detail below.

    [0040] With respect to the control action(s), they depend on the components of the cooling device, which can comprise at least one of the following: inverter, battery, pump, valve, louver, fan, radiator, flow pipes, coolant. In reality, the cooling device can consist of any type of element, alone or in combination, that makes it possible to cool a vehicle motor. As the optimization of the thermal control of a vehicle motor according to the invention is not specific to one cooling system, all combinations of cooling devices are considered.

    [0041] For example, if the cooling device is a pump that transfers a coolant towards the motor, a control action u.sub.t of the pump can be the modification of the flow rate of the pump.

    [0042] In another example, the cooling device is a valve opening to the outside and the control action u.sub.t thereof consists of opening or closing the valve by a certain angle.

    [0043] According to another embodiment, a control action can also consist of activating blades of a fan, at a predetermined speed, to cool the motor.

    [0044] It is also possible for a control action to consist of opening or closing a louver, to a position among a plurality of possible positions, in order to cool a radiator when it is discharging the heat from the motor.

    [0045] The invention does not however rule out the possibility of the control actions defined in the preceding examples being used at the same time or combined in one way or another. The fan, pump, valve, louver and radiator thus form an integral part of the cooling device covered by the invention and are not considered to be several separate cooling devices.

    [0046] The computer 20 can therefore determine a control action u.sub.t for each of a plurality of different elements of the same cooling device.
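    By way of illustration only, with a cooling device combining for example a pump, a valve and a fan, such a composite control action u.sub.t could be represented as a small structure of set-points; the field names, units and values below are hypothetical and are not taken from the present description:

```python
from dataclasses import dataclass

@dataclass
class ControlAction:
    """One control action u_t addressing several elements of the same cooling device."""
    pump_flow: float    # coolant flow rate set-point, e.g. in L/min
    valve_angle: float  # opening angle of the valve, in degrees
    fan_speed: float    # fan speed set-point, e.g. in rpm

# Illustrative action: moderate pump flow, partly open valve, low fan speed
u_t = ControlAction(pump_flow=6.0, valve_angle=35.0, fan_speed=1200.0)
```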

    [0047] In order to implement the optimization method, the computer 20 advantageously has an actor-critic architecture, described in the publication Natural Actor-Critic (Jan Peters, Sethu Vijayakumar and Stefan Schaal, 2008), the architecture of the computer 20 being shown schematically in FIG. 3. More specifically, the computer is advantageously configured to implement a DDPG (Deep Deterministic Policy Gradient) reinforcement learning algorithm, which is a specific type of algorithm based on an actor-critic architecture described in the publication by Lillicrap et al cited above. Hereinafter, the notations used in said publication are used to describe the same objects or functions.

    [0048] This architecture comprises a first block 21 representing the actor of the actor-critic architecture. This block 21 of the computer 20 receives the state s.sub.t of the thermal system and determines at least one control action u.sub.t to perform, by applying a control function π to the state s.sub.t. Advantageously, this block is implemented by an artificial neural network implementing the control function π.
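    A minimal sketch of such an actor block is given below, assuming a small fully connected network and hypothetical state and action dimensions; the present description does not specify the network architecture:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Control function pi: maps a state s_t of the thermal system to a control action u_t."""
    def __init__(self, state_dim: int = 7, action_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),  # normalized action in [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

actor = Actor()
u_t = actor(torch.zeros(1, 7))  # one control action for a (placeholder) state vector
```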

    [0049] The state of the thermal system is advantageously a vector comprising several parameters. According to one embodiment, the parameters of the state vector of the system comprise the control actions u.sub.t−1, u.sub.t−2, u.sub.t−3, . . . , u.sub.t−n preceding the control action determined by the block 21 for the state s.sub.t. Advantageously, the parameters also comprise all or some of the following parameters: the air speed around the vehicle, the state of the motor in the near future (on or off), one or more temperatures of the motor, the entropy of the system in its current state and/or in at least one of the preceding states. The state of the motor in the near future can be represented by a time before the motor is switched off, if the switching off of the motor is predicted or predictable, and failing this by parameters of the motor in the near future (t.sub.+1, t.sub.+2, . . . , t.sub.+n) such as, for example, but not limited to, the torque of the motor or its speed in revolutions per minute. The vehicle can therefore comprise one or more temperature sensors, one or more air speed sensors, or one or more accelerometers, for example. In reality, it can comprise a set of sensors that make it possible to retrieve various navigation data, which can be used to determine the state of the thermal system.

    [0050] In addition, the parameters of the state of the thermal system that are a function of time can have a complexity of order n, that is, they can have values for n states of the system preceding the time t. Thus for example, a state vector of the thermal system having a complexity of order 2 could be as follows:

    S.sub.t=[T, u.sub.t, u.sub.t−1, u.sub.t−2, Q.sub.BSG, BSG.sub.time, V.sub.air]
    where T is one or more temperatures of the motor,
    u.sub.t, u.sub.t−1, u.sub.t−2 are the control actions implemented respectively at t, t−1, and t−2,
    Q.sub.BSG is the quantity of heat of the thermal system at time t and/or the quantity of heat predicted in the near future,
    BSG.sub.time is a prediction of the state of the motor in the near future,
    V.sub.air is the air speed around the vehicle at time t and/or the air speed predicted in the near future.
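    For illustration only, such an order-2 state vector could be assembled as follows; the function name and the numerical values are placeholders, not values from the present description:

```python
import numpy as np

def build_state(T_motor, u_t, u_t1, u_t2, Q_bsg, bsg_time, v_air):
    """Assemble S_t = [T, u_t, u_{t-1}, u_{t-2}, Q_BSG, BSG_time, V_air]."""
    return np.array([T_motor, u_t, u_t1, u_t2, Q_bsg, bsg_time, v_air], dtype=np.float32)

# Placeholder sensor readings and past actions (illustrative values only)
s_t = build_state(T_motor=55.0, u_t=0.4, u_t1=0.3, u_t2=0.2,
                  Q_bsg=1.2e3, bsg_time=120.0, v_air=22.0)
```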

    [0051] A second block 22 evaluates the impact of the control action u.sub.t on the thermal system and determines the new state of the system s.sub.t+1 together with a reward value r.sub.t+1 associated with the state transition of the system observed from the given state s.sub.t to the modified state s.sub.t+1. To do this, the block 22 retrieves the information from the different sensors in the thermal system and evaluates the reward to be allocated to the control action u.sub.t as a function of the new state of the thermal system, as described in greater detail below. A third block 23 represents the critic of the actor-critic system. The critic block 23 implements and updates a function for estimating the thermal performance of the thermal system as a function of the reward values determined by the block 22, this function being the Q function in the publication cited above, and advantageously being implemented by an artificial neural network. To do this, the critic block 23 comprises four inputs, the first being the action u.sub.t, the second being the given state of the system s.sub.t, and the third and fourth inputs being respectively the reward value r.sub.t+1 and the new state of the system s.sub.t+1 after the implementation of the action u.sub.t. These inputs are denoted P.sub.t, t+1 in the figure. The critic block also comprises a memory that stores all of the inputs P.sub.t, t+1 at each time t. Given that the memory cannot be unlimited, the oldest information P.sub.t−n, t−n+1 is deleted as the memory of the computer 20 becomes full, by means of a First In First Out (FIFO) method.
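    A minimal sketch of the transition memory described above is given below; the bounded FIFO buffer drops the oldest entries first, and the capacity value is an assumption rather than a figure from the present description:

```python
import random
from collections import deque

class TransitionMemory:
    """Stores the tuples P_{t,t+1} = (s_t, u_t, r_{t+1}, s_{t+1})."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # when full, the oldest tuple is discarded (FIFO)

    def store(self, s_t, u_t, r_t1, s_t1):
        self.buffer.append((s_t, u_t, r_t1, s_t1))

    def sample(self, n: int):
        """Random sub-set of n stored transitions, used for the bootstrap update of Q."""
        return random.sample(list(self.buffer), n)
```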

    [0052] The implementation and updating of the function Q for estimating the thermal performance of the system are described in greater detail below.

    [0053] In FIG. 3, a fourth block 25 is used to show that the modified state of the system s.sub.t+1 then becomes the new current state of the system s.sub.t, a time step having elapsed.

    [0054] With reference to FIG. 2, the optimization method implemented by the computer described above comprises the iterative implementation of the following steps.

    [0055] In a first step 110, the block 21 of the computer determines, during a sub-step 111, at least one action u.sub.t to control at least one actuator 50 by applying the control function, in its current state, to the given state s.sub.t of the thermal system, and implements said action in a sub-step 113.

    [0056] During a second step 120, the block 22 of the computer determines a modified state of the thermal system after the implementation of said action u.sub.t.

    [0057] During a third step 130, the block 22 calculates a reward value based on the state transition of the thermal system observed from the state s.sub.t to the modified state s.sub.t+1, and said action u.sub.t. This calculation is implemented by a thermodynamic reward function.

    [0058] Advantageously, the thermodynamic reward function of the motor is configured to assign high reward values to the actions in which the thermal system of the vehicle motor 1 optimizes its thermal performance.

    [0059] In one advantageous embodiment, the thermodynamic reward function of the motor is configured to maximize the reward value when, based on a given state s.sub.t, the generation of thermodynamic irreversibilities brought about by the action u.sub.t is minimized. In other words, the thermodynamic reward function of the motor is also configured to minimize the destruction of exergy, that is, useful thermodynamic energy, which makes it possible in particular to minimize the electricity input on the part of the motor.

    [0060] Advantageously, the thermodynamic reward function is also configured to penalize the reward value when the control action u.sub.t causes the temperature of the motor to exceed a predetermined threshold. For example, if the motor must not exceed a maximum operating temperature of 70 degrees in order not to impair its thermal performance, then the thermal reward function of the motor is configured to penalize the reward value associated with an action if this action leads to this maximum temperature being exceeded.

    [0061] Advantageously, the thermodynamic reward function is also configured to penalize the reward value when a control action u.sub.t is implemented while the ambient temperature is greater than the motor temperature.

    [0062] In some embodiments, the thermodynamic reward function can also penalize a reward value when the corresponding control action is implemented while the motor is off.

    [0063] According to one exemplary embodiment, the thermodynamic reward function, which makes it possible to calculate the reward values, is defined as follows:

    [00002] $r = -\left( dS_{irr} + \dfrac{1}{1 + e^{k_0 (BSG_{time} - 1)}} + k_1 \ln\!\left(1 + e^{k_2 (T_{sys} - T_{max})}\right) + \dfrac{k_3}{1 + e^{k_4 (T_{sys} - T_{max})}} + \dfrac{2}{1 + e^{k_5 (T_{sys} - T_{amb})}} \right)$  [Math. 1]

    where r is the reward,
    dS.sub.irr is the generation of thermodynamic irreversibilities created by the transformation of the system,
    BSG.sub.time is a time before the motor is switched off,
    T.sub.sys is the temperature of the thermal system,
    T.sub.max is the maximum temperature of the motor before its thermal performance is reduced, and
    T.sub.amb is the ambient temperature around the vehicle.
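    As a minimal numerical sketch of this reward function, the expression [Math. 1] can be written as follows; the coefficients k.sub.0 to k.sub.5 and the example call values are illustrative placeholders, as the present description does not give their values:

```python
import numpy as np

def reward(dS_irr, bsg_time, T_sys, T_max, T_amb,
           k0=1.0, k1=1.0, k2=1.0, k3=1.0, k4=1.0, k5=1.0):
    """Thermodynamic reward [Math. 1]: the smaller the generated irreversibilities dS_irr
    and the temperature-related penalty terms, the larger (less negative) the reward."""
    return -(dS_irr
             + 1.0 / (1.0 + np.exp(k0 * (bsg_time - 1.0)))
             + k1 * np.log(1.0 + np.exp(k2 * (T_sys - T_max)))
             + k3 / (1.0 + np.exp(k4 * (T_sys - T_max)))
             + 2.0 / (1.0 + np.exp(k5 * (T_sys - T_amb))))

# Illustrative call: motor at 55 degrees, limit 70 degrees, ambient 20 degrees,
# 2 time units before the predicted shut-off of the motor
print(reward(dS_irr=0.25, bsg_time=2.0, T_sys=55.0, T_max=70.0, T_amb=20.0))
```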

    [0064] The determination of the thermodynamic irreversibilities created by the transformation of the system depends of course on the system. By way of non-limiting example, in the case of the cooling of an electric motor at the temperature T.sub.m by natural convection with air at a temperature T.sub.a, these thermodynamic irreversibilities can be calculated by dS.sub.i=Q.sub.exch*(1/T.sub.a−1/T.sub.m), where Q.sub.exch is the quantity of heat transferred. If this cooling is forced by using a cooling circuit provided with a pump, the thermodynamic irreversibilities include an additional term representing the dissipation of the pumping energy in the form of pressure loss: dS.sub.i=Q.sub.exch*(1/T.sub.a−1/T.sub.m)+A(P.sub.in−P.sub.out)/T.sub.a, where P.sub.in and P.sub.out are the pressures upstream and downstream of the pump respectively, and A is an experimental coefficient.
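    As an illustrative calculation under the assumptions of the examples above (temperatures expressed in kelvin; all numerical values are placeholders):

```python
def dS_irr_natural(Q_exch, T_air, T_motor):
    """Entropy generated when a quantity of heat Q_exch leaves the motor at T_motor
    and is absorbed by the air at T_air (natural convection case)."""
    return Q_exch * (1.0 / T_air - 1.0 / T_motor)

def dS_irr_forced(Q_exch, T_air, T_motor, A, P_in, P_out):
    """Forced cooling: adds the dissipation of the pumping energy as a pressure loss term."""
    return dS_irr_natural(Q_exch, T_air, T_motor) + A * (P_in - P_out) / T_air

# Example: 500 J exchanged between a motor at 343 K and air at 293 K
print(dS_irr_natural(500.0, 293.0, 343.0))  # about 0.249 J/K
```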

    [0065] During a fourth step 140, the block 23 associates the reward value r.sub.t+1 with said action u.sub.t and the state transition observed from the given state s.sub.t to the modified state s.sub.t+1, and stores this association P.sub.t, t+1 in a memory.

    [0066] During a fifth step 150, the block 23 updates the function for estimating the thermal performance. To do this, the block 23 firstly estimates the thermal performance of the system in the current state; to this end, it implements the function Q in the state t, based on the action u.sub.t implemented for the state s.sub.t, and knowing the control function implemented by the block 21 in its current state. The function Q calculates an expectation of the sum of the future rewards that can be obtained based on the current state s.sub.t of the system, knowing the function π, and depreciated by a depreciation factor γ of between 0 and 1, described in greater detail below. This function can be calculated according to equation (1) of the publication cited above, or recursively by the Bellman equation referenced in equation (2) of said publication.

    [0067] The updating of the function Q is then implemented by modifying the parameters of this function (that is, the weighting factors of the neural network implementing this function) so as to maximize the accuracy of this function for estimating the thermal performance of the system. Advantageously, this update is carried out by minimizing the difference between the value of the function Q calculated for the state s.sub.t, the action u.sub.t, and the current state of the control function π, and a function y.sub.t defined by:


    $y_t = r_{t+1} + \gamma\, Q(s_{t+1}, \mu(s_{t+1}) \mid \theta^{Q})$  [Math. 2]

    [0068] Where the function μ is defined by the control function π applied to the state s.sub.t+1, that is, the action u.sub.t+1. θ.sup.Q denotes the parameters of the function Q, that is, the matrix of weighting factors of the neural network implementing this function, and γ is the depreciation factor of between 0 and 1. y.sub.t is therefore the reward r.sub.t+1 at time t+1 plus the expectation of the sum of the rewards depreciated based on the state s.sub.t+1.

    [0069] Advantageously, this update is implemented by bootstrap by randomly taking a sub-set of N transitions P.sub.i, i+1 stored in the memory, and calculating a y.sub.i for each of these transitions, and minimizing the quadratic error L provided by:

    [00003] $L = \dfrac{1}{N} \sum_i \left( y_i - Q(s_i, u_i \mid \theta^{Q}) \right)^2$  [Math. 3]
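    A minimal numerical sketch of this minibatch error is given below, assuming that the targets y.sub.i and the critic estimates Q(s.sub.i, u.sub.i|θ.sup.Q) have already been computed:

```python
import numpy as np

def critic_loss(y, q):
    """Quadratic error L [Math. 3] between the targets y_i and the critic estimates Q(s_i, u_i)."""
    y = np.asarray(y, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return np.mean((y - q) ** 2)

print(critic_loss([1.0, 0.5, -0.2], [0.9, 0.7, 0.0]))  # 0.03
```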

    [0070] According to one variant embodiment, the so-called n-step return method is used, described for example in the publication Distributed Distributional Deterministic Policy Gradients (Barth-Maron et al, 2018), in order to take into account, in the error L, the n transitions following each transition used in the calculation of the error described above. In this case, the function y.sub.t becomes:

    [00004] $y_t = \sum_{j=0}^{n-1} \gamma^j r_{t+j} + \gamma^n Q(s_{t+n}, \mu(s_{t+n}) \mid \theta^{Q})$  [Math. 4]
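    For illustration, the n-step target above can be computed from a window of n stored rewards and the critic's bootstrap estimate at step t+n; this is a sketch in which the bootstrap value is passed in as an assumed input:

```python
def n_step_target(rewards, q_bootstrap, gamma):
    """y_t [Math. 4]: depreciated sum of the next n rewards plus gamma^n * Q(s_{t+n}, mu(s_{t+n}))."""
    n = len(rewards)
    y = sum(gamma ** j * r for j, r in enumerate(rewards))
    return y + gamma ** n * q_bootstrap

# Example with n = 4 rewards, gamma = 0.98 and an assumed bootstrap value of 10
print(n_step_target([-0.1, -0.2, -0.1, -0.3], q_bootstrap=10.0, gamma=0.98))
```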

    [0071] The expression of the function for estimating the thermal performance can then be reduced to:

    [00005] $Q^{\pi}(s_t, u_t) = r_t + \mathbb{E}\left[ \sum_{j=1}^{n-1} \gamma^j r_{t+j} + \gamma^n Q^{\pi}(s_{t+n}, \mu(s_{t+n})) \,\middle|\, s_t, u_t \right]$  [Math. 5]

    [0072] Depending on the value of the depreciation factor γ, the rewards subsequent to the reward r.sub.t at a time t are taken into account to a greater or lesser extent in the calculation of the new function Q.

    [0073] In one embodiment, the output of the neural network implemented by the block 23 is a scalar corresponding to the result of the function Q. Advantageously, a layer is added to the neural network, before the output of the scalar resulting from the function Q, that makes it possible to estimate the distribution of Q and, in practice, to calculate the expectation of Q using the Categorical method disclosed in the publication Distributed Distributional Deterministic Policy Gradients (Barth-Maron et al, 2018). The Categorical method allows the learning algorithm to converge more quickly and to be more efficient in the thermal control of the system.

    [0074] In one embodiment, the depreciation factor γ is between 0.80 and 1 inclusive. Advantageously, it is between 0.97 and 0.99 inclusive. As thermal systems have high inertia, learning is more efficient by taking the subsequent thermal performance heavily into account for calculating the new function for estimating the thermal performance.

    [0075] Likewise, still in order to take thermal inertia into account, the duration between two time steps of the learning algorithm matters. For example, the time step can be between 0.1 and 2 seconds inclusive. A compromise must be found between too small a time step, which does not take the thermal inertia of the system into account, and too great a time step, which does not allow the learning algorithm to converge. The number of return steps n in the so-called n-step return method is advantageously chosen so as to allow satisfactory convergence of the learning algorithm. If there are too many values in too short a time frame, the learning algorithm cannot converge. For example, a value of n of between 3 and 10 inclusive can be applied, to cover a total time frame (corresponding to n times the time step) of between 1 and 6 seconds inclusive. For example, the time step selected can be 0.5 seconds, and n can be 4. The value of the time step and the value n of the n-step return are therefore correlated, in order to ensure the convergence of the algorithm.

    [0076] Finally, during a step 160, the block 23 updates the control function of the block 21 of the computer based on the function for estimating the thermal performance. This step is implemented by the descent of the gradient of J, J being the expected value of the initial thermal performance of the system, which depends on the parameters of the control function π, and is defined by:

    [00006] $J = \mathbb{E}_{r_i, s_i \sim E,\ a_i \sim \pi}\left[ R_1 \right]$  [Math. 6]   where   $R_t = \sum_{i=t}^{T} \gamma^{(i-t)} r(s_i, u_i)$  [Math. 7]

    [0077] And where E is the environment.

    [0078] The gradient of J is defined in equation (6) of the publication by Lillicrap et al cited above. According to the expression of the gradient of J, this descent of the gradient allows the parameters of the control function to be updated, in this case the matrix of the weighting factors of the neural network implemented by the block 21, so as to maximize the expected thermal performance.
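    A minimal sketch of this actor update is given below, assuming actor and critic neural networks with the interfaces used in the earlier sketches (in particular a critic taking a state and an action as inputs); the present description does not detail these interfaces:

```python
import torch

def actor_update(actor, critic, states, actor_optimizer):
    """Update the parameters of the control function so as to maximize the expected
    thermal performance: gradient step on -Q(s, pi(s)), i.e. ascent on J."""
    actions = actor(states)                 # u = pi(s | theta_pi)
    loss = -critic(states, actions).mean()  # maximizing Q is minimizing -Q
    actor_optimizer.zero_grad()
    loss.backward()
    actor_optimizer.step()
    return loss.item()
```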

    [0079] In one embodiment, step 110 includes an additional sub-step 112 of adding an exploration noise to the control action u.sub.t determined by the computer 20 during sub-step 111, or directly to the parameters of the control function.

    [0080] Compared with step 110 without this sub-step, the added exploration noise makes it possible to improve the efficiency of the thermal system by implementing exploratory training that allows the algorithm to learn actions that further optimize the thermal system. The addition of the exploration noise is shown in FIG. 3 by a block 24 of the computer. In one embodiment, the exploration noise can be Gaussian white noise or a noise generated by an Ornstein-Uhlenbeck process.
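    A sketch of such an exploration noise is given below, here generated by an Ornstein-Uhlenbeck process; the parameters theta, sigma and the time step dt are illustrative assumptions:

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise added to the control action u_t."""
    def __init__(self, size=1, mu=0.0, theta=0.15, sigma=0.2, dt=0.5):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(size, mu, dtype=np.float64)

    def sample(self):
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape))
        self.x = self.x + dx
        return self.x

noise = OUNoise()
u_explored = 0.4 + noise.sample()  # noisy version of a (placeholder) control action 0.4
```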

    [0081] In one embodiment, once the control function has been trained using the method described above, the control function can be stored in a memory and then implemented directly by a second computer, separate from the first, this second computer being embedded in the vehicle. In this case, the second computer determines a control action as a function of a state of the thermal system using the control function previously trained.

    [0082] However, said control function can then no longer be updated, as this second computer does not contain the training method in its memory.

    [0083] In another embodiment, the computer 20 that implements the training of the control function can be a computer built into the target vehicle, and continue to update the control function as the vehicle is used.

    [0084] The proposed invention therefore makes it possible to carry out optimized control of the thermal system of a vehicle motor without having to model the full complexity of the system, at a lower inference calculation cost and also at a lower economic cost. In addition, it can be adapted to a large number of vehicle motor thermal systems, which makes the invention flexible.