Deep reinforcement learning for air handling and fuel system referencing
10746123 · 2020-08-18
Assignee
Inventors
- Kartavya Neema (Columbus, IN, US)
- Vikas Narang (Columbus, IN, US)
- Govindarajan Kothandaraman (Columbus, IN, US)
- Shashank Tamaskar (West Lafayette, IN, US)
CPC classification
All of the following codes fall under CPC Section F (Mechanical engineering; lighting; heating; weapons; blasting), except Y02T10/40, which falls under Section Y (General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests):
- F02D41/28
- F02D41/2441
- F02D41/2438
- F02D41/1406
- F02D41/1479
- F02D2041/281
- F02D41/2451
- F02D41/2454
- F02D2200/0406
- F02D41/18
- F02D41/2445
- F02D41/0062
- Y02T10/40
- F02D41/1441
International classification
All of the following codes fall under Section F (Mechanical engineering; lighting; heating; weapons; blasting):
- F02D41/24
- F02D41/14
- F02D41/18
- F02D41/28
- F02D41/00
Abstract
An engine system includes an air handling and fuel system whose states are managed by a reference managing unit. The engine system has a plurality of sensors whose sensor signals at least partially define a current state of the engine system. The reference managing unit includes a controller which controls the air handling and fuel system of the engine system as well as a processing unit coupled to the sensors and the controller. The processing unit includes an agent which learns a policy function that is trained to process the current state, determines air handling references and fuel system references by using the policy function after receiving the current state as an input, and outputs the air handling references and fuel system references to the controller. Then, the agent receives a next state and a reward value from the processing unit and updates the policy function using a policy evaluation algorithm and a policy improvement algorithm based on the received reward value. Subsequently, the controller controls the air handling and fuel system of the engine in response to receiving the air handling references and the fuel system references.
Claims
1. A reference managing unit for air handling and fuel system of an engine system, comprising: a plurality of sensors coupled to the engine system, wherein sensor signals from the plurality of sensors at least partially define a current state; a controller operable to control the air handling and fuel system of the engine system; and a processing unit coupled to the sensors and the controller, the processing unit comprising an agent operative to: learn a policy function trained to process the current state, determine air handling references and fuel system references by using the policy function after receiving the current state as an input, output the air handling references and fuel system references to the controller, receive a next state and a reward value comprising a weighted summation of at least two of: a smoke value, an emission value, a torque response value, or fuel concentration in the engine, from the processing unit, and update the policy function using a policy evaluation algorithm and a policy improvement algorithm based on the reward value, wherein the controller controls the air handling and fuel system of the engine in response to the air handling references and fuel system references.
2. The reference managing unit of claim 1, wherein the current state comprises one or more of: a speed value, a load value, air handling states, and fuel system states of the engine system.
3. The reference managing unit of claim 2, wherein: the air handling states comprise one or more of: a charge flow value, exhaust gas recirculation (EGR) fraction values, EGR flow commands, a fresh air flow command, and an intake manifold pressure command; and the fuel system states comprise one or more of: fuel concentration values, a rail pressure value, and start of injection (SOI) values.
4. The reference managing unit of claim 1, wherein: the air handling references comprise one or more of: a charge flow command, EGR fraction commands, EGR flow commands, a fresh air flow command, and an intake manifold pressure command; and the fuel system references comprise one or more of: a fueling command, a rail pressure command, and a SOI command.
5. The reference managing unit of claim 1, wherein the agent comprises a plurality of function approximators, the function approximators comprising one or more of: deep neural networks, support vector machines (SVM), regression based methods, and decision trees.
6. The reference managing unit of claim 5, wherein the deep neural networks comprise one or more of: long short-term memory (LSTM) networks and convolution neural networks.
7. The reference managing unit of claim 5, wherein the plurality of deep neural networks are trained with (a) steady state data and (b) transient state data of the engine system, by using an optimization technique comprising one or more of: q-learning and policy gradients.
8. A method for managing air handling and fuel system references for an engine, comprising: learning a policy function trained to process a current state; determining air handling references and fuel system references by using the policy function after receiving the current state as an input; receiving a next state and a reward value comprising a weighted summation of at least two of: a smoke value, an emission value, a torque response value, or fuel concentration in the engine; updating the policy function using a policy evaluation algorithm and a policy improvement algorithm based on the reward value; and controlling air handling and fuel system of the engine in response to the determined air handling references and fuel system references.
9. The method of claim 8, wherein the current state comprises one or more of: a speed value, a load value, air handling states, and fuel system states of the engine system.
10. The method of claim 9, wherein: the air handling states comprise one or more of: a charge flow value, exhaust gas recirculation (EGR) fraction values, EGR flow commands, a fresh air flow command, and an intake manifold pressure command; and the fuel system states comprise one or more of: a fuel concentration value, a rail pressure value, and start of injection (SOI) values.
11. The method of claim 8, wherein: the air handling references comprise one or more of: a charge flow command, EGR fraction commands, EGR flow commands, a fresh air flow command, and an intake manifold pressure command; and the fuel system references comprise one or more of: a fueling command, a rail pressure command, and a SOI command.
12. The method of claim 8, wherein the agent comprises a plurality of function approximators, the function approximators comprising one or more of: deep neural networks, support vector machines (SVM), regression based methods, and decision trees.
13. The method of claim 12, wherein the deep neural networks comprise one or more of: long short-term memory (LSTM) networks and convolution neural networks.
14. The method of claim 12, wherein the plurality of deep neural networks are trained with (a) steady state data and (b) transient state data of the engine system by using an optimization technique comprising one or more of: q-learning and policy gradients.
15. An internal combustion engine system, comprising: an internal combustion engine; at least one exhaust gas recirculation actuator; at least one turbocharger; an exhaust conduit; a plurality of sensors coupled to the at least one exhaust gas recirculation actuator, the internal combustion engine, and the exhaust conduit, the sensor signals from the plurality of sensors at least partially defining current air handling states and current combustion states; a controller operable to control the at least one exhaust gas recirculation actuator and the at least one turbocharger; and a reference managing unit for air handling and combustion coupled to the sensors and the controller, the reference managing unit comprising a plurality of function approximators trained with (a) steady state data and (b) transient state data of the engine system by using an optimization technique, wherein the plurality of deep neural networks are operative to interact with the engine system to: learn a policy function trained to process the current air handling state and the current combustion state, determine air handling references and combustion references by using the policy function after receiving the current air handling state and the current combustion state as an input, output the air handling references and combustion references to the controller, receive a next state and a reward value comprising a weighted summation of at least two of: a smoke value, an emission value, a torque response value, or fuel concentration in the engine from the processing unit, and update the policy function using a policy evaluation algorithm and a policy improvement algorithm based on the reward value, wherein the controller controls the at least one exhaust gas recirculation actuator and the at least one turbocharger in response to the air handling references and combustion references.
16. The internal combustion engine system of claim 15, wherein the current state comprises a speed value, a load value, air handling states, and combustion states of the engine system.
17. The reference managing unit of claim 15, wherein: the air handling states comprise one or more of: a charge flow value, exhaust gas recirculation (EGR) fraction values, EGR flow commands, a fresh air flow command, and an intake manifold pressure command; the combustion states comprise one or more of: fuel concentration values, a rail pressure value, and start of injection (SOI) values, the air handling references comprise one or more of: a charge flow command, EGR fraction commands, EGR flow commands, a fresh air flow command, and an intake manifold pressure command, and the combustion references comprise one or more of: a fueling command, a rail pressure command, and a SOI command.
18. The reference managing unit of claim 15, wherein the function approximators comprise one or more of: deep neural networks, support vector machines (SVM), regression based methods, and decision trees.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The embodiments will be more readily understood in view of the following description when accompanied by the below figures and wherein like reference numerals represent like elements. These depicted embodiments are to be understood as illustrative of the disclosure and not as limiting in any way.
(7) While the present disclosure is amenable to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and are described in detail below. The intention, however, is not to limit the present disclosure to the particular embodiments described. On the contrary, the present disclosure is intended to cover all modifications, equivalents, and alternatives falling within the scope of the present disclosure as defined by the appended claims.
DETAILED DESCRIPTION OF THE DISCLOSURE
(8) In the following detailed description, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the present disclosure is practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the present disclosure, and it is to be understood that other embodiments can be utilized and that structural changes can be made without departing from the scope of the present disclosure. Therefore, the following detailed description is not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims and their equivalents.
(9) Reference throughout this specification to one embodiment, an embodiment, or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Appearances of the phrases in one embodiment, in an embodiment, and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment. Similarly, the use of the term implementation means an implementation having a particular feature, structure, or characteristic described in connection with one or more embodiments of the present disclosure, however, absent an express correlation to indicate otherwise, an implementation may be associated with one or more embodiments. Furthermore, the described features, structures, or characteristics of the subject matter described herein may be combined in any suitable manner in one or more embodiments.
(12) Furthermore, the engine system 300 incorporates a high-pressure EGR system in which the EGR actuator 308 recirculates exhaust gas between the two high-pressure points, i.e. the exhaust manifold and the inlet manifold. In another embodiment shown in
(13) Activation of the EGR actuator 308, fuel system 306, and turbine 322 helps to increase the speed of the engine, but these components must be controlled to achieve optimal efficiency within the system. In other words, it is desirable for the engine to keep some of these components deactivated when there is no need to increase the engine speed; for example, if the engine system is incorporated into a car, when the user is driving on a road with a lower speed limit than a freeway, or when the driving style of the user indicates that he or she tends to drive at a more moderate speed. As such, a current state of the engine system may be used in determining whether such activation is necessary. In the reinforcement learning technique of
(14) Measurements from the sensors 340, 342, 344, and 346 are sent as sensor signals 341, 343, 345, and 347, respectively, to the processing unit 326 which uses these data to determine the next actions to be taken by the controller 328. The processing unit 326 includes the agent 202 from
(15) The function approximators act to approximate how the engine behaves under different conditions using a reinforcement learning technique as explained in
(16) In one example, the states of the internal combustion engine system include one or more of: the engine speed, the engine load (torque output of the engine), the air handling states, and the combustion states. The air handling states include one or more of: the charge flow of the engine (the sum of air flow into the intake manifold of the engine) and the EGR fraction (the fraction of charge flow attributable to recirculated exhaust gas from the engine). Additionally, the air handling states also include one or more of: prior EGR flow commands, fresh air flow command, and intake manifold pressure command as previously sent by the controller. The fuel system states include one or more of: the fuel concentration in the engine, the rail pressure in the fuel injection system, and the start-of-injection (SOI), or injection timing of the fuel into the engine. Furthermore, although
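The engine states described in this paragraph can be collected into a single input vector for the agent's policy. The following Python sketch is purely illustrative; the class name, field names, and units are assumptions, not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class EngineState:
    """Hypothetical container for the states listed above (names/units assumed)."""
    speed: float               # engine speed (rpm)
    load: float                # engine load, i.e. torque output (N*m)
    charge_flow: float         # sum of air flow into the intake manifold
    egr_fraction: float        # fraction of charge flow from recirculated exhaust
    fuel_concentration: float  # fuel concentration in the engine
    rail_pressure: float       # rail pressure in the fuel injection system (bar)
    soi: float                 # start of injection timing (crank-angle degrees)

    def as_vector(self):
        """Flatten the state into the vector x_t fed to the policy function."""
        return [self.speed, self.load, self.charge_flow, self.egr_fraction,
                self.fuel_concentration, self.rail_pressure, self.soi]
```

In practice additional fields (prior EGR flow commands, fresh air flow command, intake manifold pressure command) would be appended to the same vector.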
(18) A detailed explanation of the reinforcement learning technique is described below in view of the engine systems illustrated in
(19) The agent 202 has a policy π, which is the starting function for the learning technique. A policy is a function which takes the current state x_t and outputs a corresponding action u_t, expressed as u_t = π(x_t). As such, the policy determines the initial action u_0 and sends a command signal 348 to the controller 328. The controller 328 then sends the appropriate command signals 350, 352, and 354 to the fuel system 306, the EGR actuator 308, and the turbine 322 of the turbocharger 310, respectively, based on the command signal 348, which includes the air handling references and the fuel system references. The air handling references can include commands regarding the charge flow, the EGR fraction, the EGR flow, the fresh air flow, and the intake manifold pressure. For example, the air handling references decide how much air should be brought into the system and how fast this should be done, as well as how much of the exhaust gas should be recirculated back into the engine and how much pressure should be in the intake manifold. The fuel system references can include commands regarding the fueling, the rail pressure, and the SOI. For example, the fuel system references decide how much fuel needs to be injected into the engine and at what speed, as well as the rail pressure necessary to achieve such fuel injection and the timing of the fuel injection.
(20) After the command signals are applied, the engine system (i.e., the environment) enters the next state, after which the sensors provide new measurements to the processing unit 326, which uses these updated sensor signals to calculate the new current state x_1 of the environment and sends the data to the agent 202, along with a first reward r_0, which is a scalar value. The processing unit 326 stores a program which calculates the reward, i.e., a reward function R such that r_t = R(u_t, x_t, x_{t+1}), to send to the agent 202. For example, the reward is an approximate function derived from the smoke value and its surrogates (for example, the air-to-fuel ratio and the in-cylinder oxygen content), the emission value (calculated using, for example, the NO_x value and the particulate matter value as measured by the sensors 346 connected to the exhaust conduit 338), a torque response value of the engine, and the fueling amount from the fuel system 306. In another example, the reward is a weighted summation of the above parameters as outputted by the engine system, such that more weight can be placed on some features than others. Once the agent 202 receives the first reward r_0, the agent 202 determines the next action u_1 by using the policy based on the current state x_1, i.e., u_1 = π(x_1).
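A weighted-summation reward of the kind described above can be sketched as follows. The weights and the sign conventions (penalizing smoke, emissions, and fueling while rewarding torque response) are assumptions for illustration; the disclosure does not fix them.

```python
def reward(smoke, emission, torque_response, fueling,
           weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted summation r_t of engine outputs (weights and signs assumed).

    Lower smoke, emissions, and fueling score higher; a better torque
    response scores higher. More weight can be placed on some features
    than others by changing `weights`.
    """
    w_s, w_e, w_t, w_f = weights
    return -w_s * smoke - w_e * emission + w_t * torque_response - w_f * fueling
```

For example, `reward(0.0, 0.0, 1.0, 0.0)` returns 0.25 under the default equal weights.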
(21) To evaluate the quality of the policy π, a value function V is calculated such that

V^π(x_0) = Σ_{t=0}^{N} γ^t r_t  (1)

for a time horizon from t=0 to t=N. When N approaches infinity (i.e., the system runs for a prolonged period of time), the value function V can represent a receding horizon problem, which is useful in understanding the global stability properties of any local optimization determined by the policy. In the function, γ is the discount factor between 0 and 1, which denotes how much weight is placed on future rewards in comparison with immediate rewards. The discount factor is necessary to make the sum of rewards converge, and it denotes that future rewards are valued at a discounted rate with respect to the immediate reward. The policy must act so as to gain as much reward as possible; therefore, the goal of the agent 202 is to find a policy π that maximizes the sum of rewards over the time horizon, i.e., max_π Σ_{t=0}^{N} γ^t r_t.
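Equation (1) is a plain discounted sum and can be computed directly; this Python sketch just illustrates the role of the discount factor γ (`gamma`).

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards discounted by gamma: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

With a small discount factor, later rewards contribute little: for rewards [1, 1, 1] and gamma = 0.5, the return is 1 + 0.5 + 0.25 = 1.75.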
(22) The policy is also continually improved using policy evaluation and policy improvement algorithms. During a policy evaluation process, the value function V^π(x) is calculated for some, or all, of the states x based on a fixed policy π. Then, during the policy improvement process which follows, the policy is improved by using the value function V^π obtained in the policy evaluation step, such that the value function calculated using the new policy is greater than or equal to the value function calculated using the original policy π. These two processes are repeated one after another until (a) the policy remains unchanged, (b) the processes have continued for more than a predetermined period of time, or (c) the change to the value function V is less than a predetermined threshold. In one embodiment, the agent is trained using the steady state data and the transient state data of the engine system. That is, the agent learns to start the engine system while the engine is in a steady state, i.e., not turned on, and during the transient state of the engine system, i.e., while the engine is running. By training the system in both settings, the engine system can start and control the air handling and fuel injection system effectively and efficiently.
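The alternation of policy evaluation and policy improvement described above is classical policy iteration. The following sketch assumes a small, deterministic environment (a `step(state, action) -> (next_state, reward)` function) purely for illustration; the engine application would use function approximators rather than tables.

```python
def policy_iteration(n_states, n_actions, step, gamma=0.9, sweeps=200):
    """Alternate policy evaluation and improvement until the policy is unchanged.

    `step(s, a) -> (next_state, reward)` is an assumed deterministic model.
    """
    policy = [0] * n_states
    V = [0.0] * n_states
    while True:
        # Policy evaluation: iterate V for the fixed policy.
        for _ in range(sweeps):
            for s in range(n_states):
                s2, r = step(s, policy[s])
                V[s] = r + gamma * V[s2]
        # Policy improvement: act greedily with respect to V.
        new_policy = [
            max(range(n_actions),
                key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
            for s in range(n_states)
        ]
        if new_policy == policy:  # stopping criterion (a): policy unchanged
            return policy, V
        policy = new_policy
```

The loop terminates here via criterion (a); criteria (b) and (c) from the text would add a time budget and a value-change threshold.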
(23) Numerous different approaches can be taken to achieve the goal of maximizing the sum of rewards. Some approaches are model-based (an explicit model of the environment is estimated, and an optimal policy is computed for the estimated model), while others are model-free (the optimal policy is learned without first learning an explicit model, such as value-function-based learning related to dynamic programming principles). One example of a model-free approach is an optimization technique known as q-learning. The q-learning technique develops and updates a map Q(x_t, u_t), similar to a value function, which gives an estimated sum of rewards r_t for a given state x_t and action u_t. This map is initialized with a starting value and successively updated by observing the reward using an update function, as explained below. The map function is described by the following equation:

Q(x_t, u_t) ← (1 − α) Q(x_t, u_t) + α (r_t + γ max_u Q(x_{t+1}, u))  (2)

where Q(x_t, u_t) is the old value, α is a learning rate between 0 and 1, max_u Q(x_{t+1}, u) is the estimate of the optimal future value, and (r_t + γ max_u Q(x_{t+1}, u)) is the learned value. As such, the old value is replaced by a new value, which is the old value transformed using the learning rate and the learned value as shown in equation (2). Q-learning is an off-policy, value-based learning technique, in which the value of the optimal policy is learned independently of the agent's actions chosen for the next state, in contrast to on-policy learning techniques like policy gradient, which can also be used as a learning technique for the engine system as described herein. Advantages of the q-learning technique include being more successful at finding a global optimum solution rather than just a local maximum.
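A single update of equation (2) can be sketched in Python as follows, with a table standing in for the map Q; the state and action labels are illustrative.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply equation (2): blend the old value with the learned value.

    Q is a mapping from (state, action) pairs to estimated sums of rewards.
    """
    best_next = max(Q[(s_next, u)] for u in actions)  # optimal future value estimate
    learned = r + gamma * best_next                   # the learned value
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * learned
    return Q[(s, a)]
```

Starting from an all-zero table, observing a reward of 1.0 moves Q for that pair to alpha * 1.0 = 0.1.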
(24) A policy gradient technique is a direct policy method which starts by learning a map from state to action, and adjusts the weights of each action by using gradient ascent with feedback from the environment. For an expected return function J(θ), the policy gradient technique searches for a local maximum in J(θ), so the expected return function

J(θ) = E{ Σ_{k=0}^{H} a_k r_k }  (3)

is optimized, where a_k denotes time-step-dependent weighting factors, often set to a_k = γ^k for discounted reinforcement learning, by ascending the gradient of the policy with respect to the parameter θ, i.e.,

θ_{h+1} = θ_h + α ∇_θ J(θ)  (4)

where ∇_θ J(θ) is the policy gradient and α is a step-size parameter.

(25) The policy gradient can then be calculated or approximated using methods such as finite-difference methods and likelihood-ratio methods. As such, the policy gradient technique guarantees that the system will converge to a local maximum of the expected returns. Furthermore, other model-free algorithms, such as the state-action-reward-state-action (SARSA) algorithm, the deep Q network (DQN) algorithm, the deep deterministic policy gradient (DDPG) algorithm, the trust region policy optimization (TRPO) algorithm, and the proximal policy optimization (PPO) algorithm, can also be used.
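The finite-difference method mentioned above estimates the policy gradient by perturbing each parameter and measuring the change in expected return; the update then follows equation (4). The return function, step size, and iteration count below are illustrative assumptions.

```python
def finite_difference_gradient(J, theta, eps=1e-4):
    """Central-difference estimate of the gradient of J at theta."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps   # perturb parameter i upward
        down[i] -= eps  # and downward
        grad.append((J(up) - J(down)) / (2 * eps))
    return grad

def gradient_ascent(J, theta, alpha=0.1, steps=100):
    """Equation (4): theta <- theta + alpha * grad_theta J(theta), repeated."""
    for _ in range(steps):
        g = finite_difference_gradient(J, theta)
        theta = [t + alpha * gi for t, gi in zip(theta, g)]
    return theta
```

On a toy concave return such as J(θ) = −(θ − 2)², repeated ascent converges to the local (here also global) maximum at θ = 2.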
(26) Because reinforcement learning shares a structure similar to a traditional control system, advantages of using such a technique include the ability to capture the non-linearity of the model to a high precision, resulting in improved performance. Current calibration techniques model engine behaviors in steady state and involve having technicians perform the calibration and optimization off-line and plug in the values for engine operation. Reinforcement learning can reduce such calibration effort because the reinforcement learning technique can optimally meet all performance indexes for the engine system. Furthermore, reinforcement learning utilizes on-line optimization with data collected in real conditions (such as when the engine is in operation) to calibrate and optimize the parameters within the engine, which allows the engine to adapt to changes in operating conditions without needing to be recalibrated. As such, due to the adaptive nature of the reinforcement learning technique, even when the engine is running in a non-calibrated condition, the engine can learn or calibrate relevant parameters on its own to deliver similar levels of performance.
(27) Furthermore, the aforementioned air handling and fueling system can be used for other types of engines besides the internal combustion engines described above. For example, a plug-in hybrid electric vehicle combines a gasoline or diesel engine with an electric motor and a rechargeable battery, such that the battery initially drives the car and the conventional engine takes over when the battery runs out. In such an instance, the air handling and fueling system can be programmed to activate after the vehicle switches its power source from the battery to the conventional engine. Electric vehicles, which use only electric motors or traction motors for propulsion, can also include such air handling and fueling systems. In this case, the fueling system is replaced with the battery and the DC controller, which delivers varying levels of power according to the potentiometer installed in the car, and the air handling system for the engine is replaced by a climate control system for the interior of the car, such as a power heating and air conditioning system.
(28) The present subject matter may be embodied in other specific forms without departing from the scope of the present disclosure. The described embodiments are to be considered in all respects only as illustrative and not restrictive. Those skilled in the art will recognize that other implementations consistent with the disclosed embodiments are possible.