Method and Device for Training an Energy Management System in an On-Board Energy Supply System Simulation
20220391700 · 2022-12-08
Inventors
CPC classification
H01M2010/4271
ELECTRICITY
B60R16/033
PERFORMING OPERATIONS; TRANSPORTING
H01M10/425
ELECTRICITY
H01M2220/20
ELECTRICITY
International classification
B60R16/033
PERFORMING OPERATIONS; TRANSPORTING
Abstract
A method and device for training an energy management system in an on-board energy supply system simulation. The method includes: simulating a driving cycle having defined recuperation; plotting state variables of the on-board energy supply system; calculating a recuperation power from a recuperation current and a battery voltage; producing input vectors for a neural network; producing a reward function; and training the neural network.
Claims
1.-10. (canceled)
11. A method for training an energy management system in a simulation of an on-board energy system of a motor vehicle, comprising: simulating a driving cycle with defined recuperation; recording state variables of the on-board energy system; calculating a recuperation power P_recu from a recuperation current I_recu and a battery voltage U_bat in accordance with the following formula:
P_recu = U_bat · I_recu;
generating input vectors of a neural network; generating a reward function; and training the neural network.
12. The method according to claim 11, wherein determining the recuperation current I_recu comprises: extracting all grid points of a battery current profile I_bat that are attributable to decisions of the energy management system and have not been impressed externally on the on-board energy system; smoothing the battery current profile I_bat between the remaining grid points (120); approximating the battery current profile I_bat by an approximated battery current profile I_approx between the remaining grid points; and calculating the recuperation current I_recu from the battery current I_bat and the approximated battery current I_approx in accordance with the following formula:
I_recu = I_bat − I_approx.
13. The method according to claim 11, wherein the recuperation current I_recu corresponds to the battery current I_bat.
14. The method according to claim 11, wherein generating the input vectors S of the neural network comprises: generating a state input vector S_normal of the neural network that has the following form:
15. The method according to claim 14, wherein generating the state vector S_expanded comprises: calculating recuperation energy values E_recu,x by integrating a recuperation power P_recu(t) over time t, from a current time t_0 within the driving cycle to a time t_0 + x · t_vs, wherein x is a percentage share of a look-ahead time t_vs for a limited future consideration of recuperation powers P_recu(t), in accordance with the following integral:
E_recu,x = ∫_{t_0}^{t_0 + x · t_vs} P_recu(t) dt.
16. The method according to claim 14, wherein generating the state vector S_expanded comprises: calculating a center of gravity t_sp of a power distribution and a predicted recuperation energy value E_recu,100% within a look-ahead time t_vs, wherein the center of gravity is the point at which the integral over the recuperation power within the look-ahead time t_vs reaches half the overall recuperation energy, in accordance with the following equation:
∫_{t_0}^{t_sp} P_recu(t) dt = ½ · E_recu,100%.
17. The method according to claim 14, wherein generating the state vector S_expanded comprises: calculating a weighted recuperation energy value E_recu,weighted by integrating a recuperation power P_recu(t) over time t from a current time t_0 within the driving cycle to the end of the driving cycle t_end, wherein the recuperation power P_recu(t) is temporally weighted with a weighting factor α(t), in accordance with the following integral:
E_recu,weighted = ∫_{t_0}^{t_end} α(t) · P_recu(t) dt.
S_expanded = [E_recu,weighted, …].
18. The method according to claim 11, wherein the reward function adopts a positive value when: (i) the battery state of charge is improved and does not leave a permissible range, (ii) a predicted recuperation energy is able to be stored without the permissible range of the battery state of charge being exceeded in the process, and (iii) a reflex has not intervened.
19. The method according to claim 11, wherein the neural network is trained in accordance with a Q-learning algorithm.
20. A device for training an energy management system in a simulation of an on-board energy supply system of a motor vehicle, comprising: a processor and associated memory configured to: simulate a driving cycle with defined recuperation; record state variables of the on-board energy system; calculate a recuperation power P_recu from a recuperation current I_recu and a battery voltage U_bat in accordance with the following formula:
P_recu = U_bat · I_recu;
generate input vectors of a neural network; generate a reward function; and train the neural network.
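The predicted-recuperation features named in claims 15 to 17 can be sketched numerically. The following is an illustration only and not part of the claims; the sampling scheme, the array names, and the use of NumPy are assumptions:

```python
import numpy as np

def recu_features(p_recu, dt, t_vs_steps, alpha=None):
    """Sketch of the three expanded-state features (assumed discretization).

    p_recu     : 1-D array of predicted recuperation power [W], starting at t_0
    dt         : sample spacing [s]
    t_vs_steps : number of samples in the look-ahead time t_vs
    alpha      : optional weighting factor alpha(t), same length as p_recu
    """
    # E_recu,100% = integral of P_recu from t_0 to t_0 + t_vs (claim 15, x = 100%)
    e_recu_100 = np.sum(p_recu[:t_vs_steps]) * dt

    # Center of gravity t_sp (claim 16): first time at which the cumulative
    # energy in the look-ahead window reaches half of E_recu,100%
    cum = np.cumsum(p_recu[:t_vs_steps]) * dt
    t_sp = np.argmax(cum >= 0.5 * e_recu_100) * dt

    # E_recu,weighted (claim 17): weighted integral up to the cycle end
    if alpha is None:
        alpha = np.ones_like(p_recu)
    e_recu_weighted = np.sum(alpha * p_recu) * dt

    return e_recu_100, t_sp, e_recu_weighted
```

For a constant 100 W profile over a 10 s look-ahead window the center of gravity falls at the middle of the window, as expected for a symmetric power distribution.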
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE DRAWINGS
[0033] The input variables are the generator state S_gen, the battery current I_bat and the battery voltage U_bat. In method step 110, grid points of the battery current profile that are influenced by the operating strategy of the energy management system are identified and extracted. In method step 120, further grid-point peaks are removed in order to smooth the battery current profile. Next, in method step 130, the battery current profile is approximated using the remaining grid points. From the approximated battery current profile I_approx, the recuperation current is calculated as I_recu = I_bat − I_approx and the recuperation power as P_recu = U_bat · I_recu.
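Method steps 110 to 130 can be sketched as follows. This is a minimal illustration; the boolean mask marking energy-management-induced samples and the choice of linear interpolation as the approximation method are assumptions, since the text does not fix a particular approximation:

```python
import numpy as np

def recuperation_power(t, i_bat, u_bat, ems_mask):
    """Sketch of method steps 110-130.

    t, i_bat, u_bat : sampled time, battery current and battery voltage profiles
    ems_mask        : True where the sample stems from an energy-management
                      decision (step 110: these grid points are extracted)
    """
    keep = ~ems_mask                        # remaining grid points (step 120)
    # Step 130: approximate I_bat between the remaining grid points
    # (linear interpolation assumed here)
    i_approx = np.interp(t, t[keep], i_bat[keep])
    i_recu = i_bat - i_approx               # I_recu = I_bat - I_approx
    p_recu = u_bat * i_recu                 # P_recu = U_bat * I_recu
    return i_recu, p_recu
```

On the extracted grid points the difference between the measured and the approximated profile isolates the current component attributed to recuperation.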
[0035] A prediction of recuperation 300 may be determined from sensor data 240 from the on-board system 400 and from route data from a route database, and be transmitted to the energy management system 250. The energy management system is thus capable of making strategic decisions on the basis of system state data 220 and a prediction of recuperation 230, for example through reinforcement learning.
[0037] A reflex 600 stabilizes and secures the energy management system by checking and potentially modifying all actions 550 proposed by a learning agent 510. Only an action 650 accepted and potentially modified by the reflex 600 is able to directly influence the state of an on-board energy system 700. The learning agent 510 then receives feedback, in the form of a reward 610 in accordance with a reward function, as to how its proposed action 550 has affected the on-board energy system. The operating strategy is thereby oriented to desired optimization targets on the basis of a system state 710 during a learning process. Intervention of the reflex 600 is taken into consideration in the reward function.
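The gating of proposed actions by the reflex 600 can be sketched as follows. The concrete guard conditions and the action encoding (positive values for charging, negative for discharging) are assumptions for illustration; the text does not specify them:

```python
def reflex(action, soc, soc_min=0.2, soc_max=0.9):
    """Sketch of the reflex: check a proposed action 550 against SOC limits.

    Returns (accepted action 650, intervened flag). The limits soc_min and
    soc_max are illustrative placeholder values.
    """
    if soc >= soc_max and action > 0:   # block charging a full battery
        return 0.0, True
    if soc <= soc_min and action < 0:   # block discharging an empty battery
        return 0.0, True
    return action, False                # pass the proposed action through
```

The returned flag can feed into the reward function, so that over the learning process the agent is steered away from actions the reflex would override.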
[0038] One exemplary embodiment for the development of a suitable reward function for training an energy management system is shown by the following algorithm.
IF reflex has intervened THEN
    R = 0
ELSE IF SOC > SOC_crit_max OR SOC < SOC_crit_min THEN
    IF SOC < SOC_crit_min THEN
        IF charge battery THEN R > 0 ELSE R = 0
    IF SOC > SOC_crit_max THEN
        IF discharge battery THEN R > 0 ELSE R = 0
ELSE
    IF SOC > SOC_target + Delta THEN
        IF discharge battery THEN R > 0 ELSE R = 0
    IF SOC < SOC_target − Delta THEN
        IF charge battery THEN R > 0 ELSE R = 0
    IF SOC_target − Delta < SOC < SOC_target + Delta THEN
        IF expected recuperation energy > E_threshold THEN
            IF discharge battery THEN R > 0 ELSE R = 0
        ELSE
            IF keep battery SOC THEN R > 0 ELSE R = 0
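The same algorithm, transcribed into Python for illustration. The discrete action names and the value 1.0 standing in for "R > 0" are assumptions of this sketch:

```python
def reward(action, soc, reflex_intervened, expected_recu_energy,
           soc_target=0.80, delta=0.02,
           soc_crit_min=0.10, soc_crit_max=0.95,
           e_threshold=0.0):
    """Reward sketch; action is one of 'charge', 'discharge', 'hold'.
    All default values are illustrative placeholders."""
    if reflex_intervened:
        return 0.0
    if soc < soc_crit_min:                       # critically low SOC
        return 1.0 if action == 'charge' else 0.0
    if soc > soc_crit_max:                       # critically high SOC
        return 1.0 if action == 'discharge' else 0.0
    if soc > soc_target + delta:                 # above the target band
        return 1.0 if action == 'discharge' else 0.0
    if soc < soc_target - delta:                 # below the target band
        return 1.0 if action == 'charge' else 0.0
    # SOC within the target band: make room only if recuperation is expected
    if expected_recu_energy > e_threshold:
        return 1.0 if action == 'discharge' else 0.0
    return 1.0 if action == 'hold' else 0.0
```

Within the target band the agent is thus rewarded for discharging only when enough recuperation energy is predicted, and for holding the state of charge otherwise.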
[0039] In this case, the constant Delta denotes a permitted deviation of the state of charge SOC from a desired target value. The deviation may for example be 2%. SOC denotes the current state of charge, and SOC_target denotes the desired optimum state of charge. This may for example be 80% of the maximum state of charge.
[0040] The constant E_threshold may be calculated as follows:
SOC + SOC_through_recu = SOC_target + Delta
SOC_through_recu = SOC_target − SOC + Delta
[0041] SOC: current SOC value
[0042] SOC_through_recu: SOC increase caused by recuperation
[0043] SOC_target: target SOC, for example 80%
[0044] Delta: permitted deviation of the SOC from the target SOC
[0045] This means that, in the case of expected recuperation energy, the battery should only be discharged if the required SOC range (SOC_target − Delta < SOC < SOC_target + Delta) would otherwise be exceeded without discharging.
E_threshold = SOC_through_recu * Q_battery * U_batt_average
[0046] E_threshold: energy threshold value
[0047] Q_battery: nominal capacity of the battery
[0048] U_batt_average: average battery voltage across the cycle
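A worked example of the threshold formula; the battery capacity and average voltage used below are illustrative assumptions, not values from the description:

```python
def e_threshold(soc, soc_target, delta, q_battery_ah, u_batt_avg_v):
    """Energy [Wh] that recuperation may still add before the SOC leaves
    the permitted band around SOC_target."""
    soc_through_recu = soc_target - soc + delta   # SOC headroom for recuperation
    return soc_through_recu * q_battery_ah * u_batt_avg_v

# Example (assumed values): SOC = 75 %, SOC_target = 80 %, Delta = 2 %,
# Q_battery = 70 Ah, U_batt_average = 14.4 V
# -> headroom 0.07, i.e. 0.07 * 70 Ah * 14.4 V = 70.56 Wh
```

If the predicted recuperation energy exceeds this value, the band would be left without prior discharging, which is exactly the condition under which the reward function favors discharging.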