Method and Device for Training an Energy Management System in an On-Board Energy Supply System Simulation

20220391700 · 2022-12-08

    Abstract

    A method and device for training an energy management system in an on-board energy supply system simulation, includes: simulating a driving cycle having defined recuperation; plotting state variables of the on-board energy supply system; calculating a recuperation power from a recuperation current and a battery voltage; producing input vectors for a neural network; producing a reward function; and training the neural network.

    Claims

    1.-10. (canceled)

    11. A method for training an energy management system in a simulation of an on-board energy system of a motor vehicle, comprising: simulating a driving cycle with defined recuperation; recording state variables of the on-board energy system; calculating a recuperation power P_recu from a recuperation current I_recu and a battery voltage U_bat in accordance with the following formula:
    P_recu = U_bat · I_recu; generating input vectors of a neural network; generating a reward function; and training the neural network.

    12. The method according to claim 11, wherein determining the recuperation current I_recu comprises: extracting all grid points of a battery current profile I_bat that are able to be attributed to decisions of the energy management system and have not been impressed externally on the on-board energy system; smoothing the battery current profile I_bat between remaining grid points (120); approximating the battery current profile I_bat through an approximated battery current profile I_approx between the remaining grid points; and calculating the recuperation current I_recu from the battery current I_bat and the approximated battery current I_approx in accordance with the following formula:
    I_recu = I_bat − I_approx.

    13. The method according to claim 11, wherein the recuperation current I_recu corresponds to the battery current I_bat.

    14. The method according to claim 11, wherein generating the input vectors S of the neural network comprises: generating a state input vector S_normal of the neural network that has the following form:
    S_normal = [generator degree of use, normalized battery current, SoC, battery temperature]^T
    and expanding the state input vector S_normal of the neural network with a state vector S_expanded, such that an overall vector S has the following form:
    S = [S_normal, S_expanded]^T.

    15. The method according to claim 14, wherein generating the state vector S_expanded comprises: calculating recuperation energy values E_recu,x by integrating a recuperation power P_recu(t) over time t, from a current time t_0 within the driving cycle to a time t_0 + x·t_vs, wherein x is a percentage share of a look-ahead time t_vs for a limited future consideration of recuperation powers P_recu(t), in accordance with the following integral:
    E_recu,x(t_0) = ∫_{t_0}^{t_0 + x·t_vs} P_recu(t) dt
    and generating a state vector S_expanded that comprises at least the recuperation energy values E_recu,25%, E_recu,50%, E_recu,75% and E_recu,100% and has the following form:
    S_expanded = [E_recu,25%, E_recu,50%, E_recu,75%, E_recu,100%]^T.
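    As a numerical illustration of claim 15, the four recuperation energy values can be approximated from a sampled power profile by trapezoidal integration. This is a minimal sketch: the function name, the sampling grid and the constant 1 kW test profile are illustrative assumptions, not part of the publication.

```python
import numpy as np

def _trapz(y, x):
    # simple trapezoidal integration
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def recuperation_energy_shares(t, p_recu, t0, t_vs, shares=(0.25, 0.50, 0.75, 1.00)):
    """E_recu,x(t0) = integral of P_recu from t0 to t0 + x*t_vs,
    evaluated for each look-ahead share x."""
    energies = []
    for x in shares:
        mask = (t >= t0) & (t <= t0 + x * t_vs)
        energies.append(_trapz(p_recu[mask], t[mask]))
    return energies  # components of the state vector S_expanded

# hypothetical profile: constant 1 kW of recuperation power for 10 s
t = np.linspace(0.0, 10.0, 1001)
p = np.full_like(t, 1000.0)
print(recuperation_energy_shares(t, p, t0=0.0, t_vs=10.0))
```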

    16. The method according to claim 14, wherein generating the state vector S_expanded comprises: calculating a center of gravity t_sp of a power distribution and a predicted recuperation energy value E_recu,100% within a look-ahead time t_vs, wherein the center of gravity is that point at which the integral over the recuperation power within the look-ahead time t_vs takes on half the overall recuperation energy, in accordance with the following equation:
    ∫_{t_0}^{t_0 + t_sp} P_recu(t) dt = ∫_{t_0 + t_sp}^{t_0 + t_vs} P_recu(t) dt
    and generating a state vector S_expanded that comprises the predicted recuperation energy value E_recu,100% and the center of gravity t_sp of the power distribution and has the following form:
    S_expanded = [E_recu,100%, t_sp]^T.
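    The center-of-gravity condition of claim 16 can be sketched numerically by accumulating the energy over the look-ahead window and locating the sample at which half of the total has been recovered. The function name and the constant 500 W test profile are illustrative assumptions.

```python
import numpy as np

def recuperation_centroid(t, p_recu, t0, t_vs):
    """Centre of gravity t_sp of the power distribution: the offset at which
    the energy recovered in [t0, t0 + t_sp] equals the energy recovered in
    [t0 + t_sp, t0 + t_vs]."""
    mask = (t >= t0) & (t <= t0 + t_vs)
    tw, pw = t[mask], p_recu[mask]
    # cumulative trapezoidal energy over the look-ahead window
    e_cum = np.concatenate(([0.0], np.cumsum(0.5 * (pw[1:] + pw[:-1]) * np.diff(tw))))
    e_total = float(e_cum[-1])
    # first sample at which half of the total energy has been recovered
    idx = int(np.searchsorted(e_cum, 0.5 * e_total))
    return float(tw[idx] - t0), e_total  # (t_sp, E_recu,100%)

# hypothetical test profile: constant 500 W over a 10 s look-ahead window
t = np.linspace(0.0, 10.0, 1001)
p = np.full_like(t, 500.0)
t_sp, e_total = recuperation_centroid(t, p, t0=0.0, t_vs=10.0)
print(t_sp, e_total)
```

    For a constant power profile the centroid falls at the middle of the look-ahead window, which makes the sketch easy to check.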

    17. The method according to claim 14, wherein generating the state vector S_expanded comprises: calculating a weighted recuperation energy value E_recu,weighted by integrating a recuperation power P_recu(t) over time t from a current time t_0 within the driving cycle to the end of the driving cycle t_end, wherein the recuperation power P_recu(t) is temporally weighted with a weighting factor α(t), in accordance with the following integral:
    E_recu,weighted(t_0) = ∫_{t_0}^{t_end} α(t) · P_recu(t) dt
    and generating a state vector S_expanded that comprises the weighted recuperation energy value E_recu,weighted and has the following form:
    S_expanded = [E_recu,weighted].
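    Claim 17 leaves the weighting factor α(t) open. The sketch below assumes an exponential decay that discounts recuperation expected far in the future; both that choice and the constant test profile are illustrative assumptions.

```python
import numpy as np

def weighted_recuperation_energy(t, p_recu, t0, t_end, tau=30.0):
    """E_recu,weighted(t0) = integral of alpha(t) * P_recu(t) from t0 to t_end.
    alpha(t) = exp(-(t - t0)/tau) is an assumed example weighting; the
    publication does not fix the form of the weighting factor."""
    mask = (t >= t0) & (t <= t_end)
    tw, pw = t[mask], p_recu[mask]
    alpha = np.exp(-(tw - t0) / tau)
    y = alpha * pw
    # trapezoidal integration
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(tw)))

# hypothetical profile: constant 1 kW over a 60 s driving-cycle remainder
t = np.linspace(0.0, 60.0, 6001)
p = np.full_like(t, 1000.0)
print(weighted_recuperation_energy(t, p, t0=0.0, t_end=60.0))
```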

    18. The method according to claim 11, wherein the reward function adopts a positive value when: (i) the battery state of charge is improved and does not exceed a permissible range, (ii) a predicted recuperation energy is able to be stored without the permissible range of the battery state of charge being exceeded in the process, and (iii) a reflex has not intervened.

    19. The method according to claim 11, wherein the neural network is trained in accordance with a Q-learning algorithm.

    20. A device for training an energy management system in a simulation of an on-board energy supply system of a motor vehicle, comprising: a processor and associated memory configured to: simulate a driving cycle with defined recuperation; record state variables of the on-board energy system; calculate a recuperation power P_recu from a recuperation current I_recu and a battery voltage U_bat in accordance with the following formula:
    P_recu = U_bat · I_recu; generate input vectors of a neural network; generate a reward function; and train the neural network.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0029] FIG. 1 shows one exemplary embodiment of a method for calculating a recuperation power in an on-board energy system simulation;

    [0030] FIG. 2 shows one exemplary embodiment of a method for integrating a prediction of recuperation in an energy management system; and

    [0031] FIG. 3 shows one exemplary embodiment of a reflex-augmented reinforcement learning method in an on-board energy system simulation.

    DETAILED DESCRIPTION OF THE DRAWINGS

    [0032] FIG. 1 shows one exemplary embodiment of a method 100 for calculating a recuperation power P_recu in an on-board energy system simulation.

    [0033] The input variables are the generator state S_gen, the battery current I_bat and the battery voltage U_bat. In a method step 110, grid points of the battery current profile that are influenced by the operating strategy of the energy management system are identified and extracted. Further grid point peaks are removed in method step 120 in order to smooth the battery current profile. Next, in method step 130, the battery current profile is approximated with the remaining grid points. Using the approximated battery current profile I_approx, the recuperation current I_recu is calculated in accordance with I_recu = I_bat − I_approx, and the recuperation power P_recu in accordance with P_recu = U_bat · I_recu.
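    The pipeline of method 100 can be sketched as follows. The boolean mask identifying grid points attributable to energy-management decisions, the moving-average smoothing kernel, and all numeric values are illustrative assumptions; the publication does not specify how steps 110–130 are implemented.

```python
import numpy as np

def recuperation_power(t, i_bat, u_bat, ems_controlled):
    """Sketch of method 100.  `ems_controlled` is an assumed boolean mask
    flagging grid points attributable to EMS decisions."""
    keep = ~ems_controlled                       # step 110: extract EMS points
    kernel = np.ones(5) / 5.0
    i_smooth = np.convolve(i_bat[keep], kernel, mode="same")   # step 120: smooth
    i_approx = np.interp(t, t[keep], i_smooth)   # step 130: approximate profile
    i_recu = i_bat - i_approx                    # I_recu = I_bat - I_approx
    p_recu = u_bat * i_recu                      # P_recu = U_bat * I_recu
    return i_recu, p_recu

# hypothetical test profile: steady -5 A charge current with a -20 A burst
t = np.linspace(0.0, 10.0, 101)
i_bat = np.full_like(t, -5.0)
i_bat[40:60] -= 20.0
u_bat = np.full_like(t, 14.0)
ems = np.zeros_like(t, dtype=bool)
ems[40:60] = True          # the burst samples stem from EMS decisions
i_recu, p_recu = recuperation_power(t, i_bat, u_bat, ems)
```

    With the burst samples removed, the interpolated baseline stays near −5 A, so the difference isolates the −20 A recuperation share.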

    [0034] FIG. 2 shows one exemplary embodiment of a method 200 for integrating a prediction of recuperation in an energy management system.

    [0035] A prediction of recuperation 300 may be determined from sensor data 240 from the on-board system 400 and from route data from a route database, and may be transmitted to the energy management system 250. The energy management system 250 is then capable of making strategic decisions on the basis of system state data 220 and a prediction of recuperation 230, for example through reinforcement learning.

    [0036] FIG. 3 shows one exemplary embodiment of a reflex-augmented reinforcement learning method 500 in an on-board energy system simulation.

    [0037] A reflex 600 stabilizes and secures the energy management system by checking, and potentially modifying, all actions 550 proposed by a learning agent 510. Only an action 650 that has been accepted, and potentially modified, by the reflex 600 is able to directly influence the state of an on-board energy system 700. The learning agent 510 then receives feedback, in the form of a reward 610 computed in accordance with a reward function, as to how its proposed action 550 has affected the on-board energy system. The operating strategy is thereby oriented to the desired optimization targets on the basis of a system state 710 during the learning process. Interventions of the reflex 600 are taken into consideration in the reward function.
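    One interaction step of this reflex-augmented loop can be sketched in code. The clamping rule used as the reflex, the SOC limits, the random agent, and the 0/1 reward are all illustrative assumptions standing in for components 510, 600, 610, 650 and 710.

```python
import random

def reflex_check(delta_soc, soc, soc_min=0.2, soc_max=0.9):
    """Reflex 600 as a sketch: clamp the agent's proposed SOC change so the
    state of charge never leaves an assumed permissible range."""
    clamped = max(min(delta_soc, soc_max - soc), soc_min - soc)
    return clamped, clamped != delta_soc

def run_episode(steps=200, seed=0):
    rng = random.Random(seed)
    soc, rewards = 0.5, []
    for _ in range(steps):
        proposed = rng.uniform(-0.1, 0.1)                  # action 550
        applied, intervened = reflex_check(proposed, soc)  # action 650
        soc += applied                                     # system state 710
        rewards.append(0.0 if intervened else 1.0)         # reward 610
    return soc, rewards

soc, rewards = run_episode()
```

    However the agent behaves, the accepted action keeps the state of charge inside the permissible range, and every intervention is reflected as a zero reward.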

    [0038] One exemplary embodiment for the development of a suitable reward function for training an energy management system is shown by the following algorithm.

    IF reflex has intervened THEN
        R = 0
    ELSE
        IF SOC > SOC_crit_max OR SOC < SOC_crit_min THEN
            IF SOC < SOC_crit_min THEN
                IF charge battery THEN
                    R > 0
                ELSE
                    R = 0
            IF SOC > SOC_crit_max THEN
                IF discharge battery THEN
                    R > 0
                ELSE
                    R = 0
        ELSE
            IF SOC > SOC_target + Delta THEN
                IF discharge battery THEN
                    R > 0
                ELSE
                    R = 0
            IF SOC < SOC_target − Delta THEN
                IF charge battery THEN
                    R > 0
                ELSE
                    R = 0
            IF SOC_target − Delta < SOC < SOC_target + Delta THEN
                IF expected recuperation energy > E_threshold THEN
                    IF discharge battery THEN
                        R > 0
                    ELSE
                        R = 0
                ELSE
                    IF keep battery SOC THEN
                        R > 0
                    ELSE
                        R = 0
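    The reward logic above can be written as a single function. Returning 1.0 for the "R > 0" branches, the action encoding 'charge'/'discharge'/'hold' and all numeric limits are illustrative assumptions.

```python
def reward(soc, action, expected_recu_energy, reflex_intervened,
           soc_target=0.80, delta=0.02,
           soc_crit_min=0.10, soc_crit_max=0.95, e_threshold=0.0):
    """Reward table as a function: 1.0 stands for 'R > 0', else 0.0."""
    if reflex_intervened:
        return 0.0
    if soc < soc_crit_min:                 # critically low: reward charging
        return 1.0 if action == "charge" else 0.0
    if soc > soc_crit_max:                 # critically high: reward discharging
        return 1.0 if action == "discharge" else 0.0
    if soc > soc_target + delta:           # above the target band
        return 1.0 if action == "discharge" else 0.0
    if soc < soc_target - delta:           # below the target band
        return 1.0 if action == "charge" else 0.0
    # inside the target band: make room only if recuperation is expected
    if expected_recu_energy > e_threshold:
        return 1.0 if action == "discharge" else 0.0
    return 1.0 if action == "hold" else 0.0
```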

    [0039] In this case, the constant Delta denotes a permissible deviation of the state of charge SOC from a desired target value; it may for example be 2%. SOC denotes the current state of charge, and SOC_target denotes the desired optimum state of charge, which may for example be 80% of the maximum state of charge.

    [0040] The constant E_threshold may be calculated as follows:


    SOC + SOC_through_recu = SOC_target + Delta


    SOC_through_recu = SOC_target − SOC + Delta

    [0041] SOC: current SOC value
    [0042] SOC_through_recu: SOC increase caused by recuperation
    [0043] SOC_target: target SOC, for example 80%
    [0044] Delta: permissible deviation of the SOC from the target SOC

    [0045] This means that, in the case of expected recuperation energy, the battery should only be discharged if the required SOC range (SOC_target − Delta < SOC < SOC_target + Delta) would otherwise be exceeded without discharging.


    E_threshold = SOC_through_recu · Q_battery · U_batt_average

    [0046] E_threshold: energy threshold value
    [0047] Q_battery: nominal capacity of the battery
    [0048] U_batt_average: average battery voltage across the cycle
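    The threshold calculation is plain arithmetic and can be checked with a short function. The 60 Ah nominal capacity and 13.5 V average battery voltage are illustrative assumptions; with SOC as a fraction and those units, the result is in watt-hours.

```python
def e_threshold(soc, soc_target=0.80, delta=0.02,
                q_battery_ah=60.0, u_batt_avg=13.5):
    """E_threshold = SOC_through_recu * Q_battery * U_batt_average.
    Capacity (Ah) and average voltage (V) are assumed example values."""
    soc_through_recu = soc_target - soc + delta   # admissible SOC increase
    return soc_through_recu * q_battery_ah * u_batt_avg

# e.g. at SOC = 75%: 0.07 * 60 Ah * 13.5 V
print(e_threshold(soc=0.75))
```

    At the upper edge of the band (SOC = SOC_target + Delta) the threshold drops to zero, so any expected recuperation energy justifies discharging.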