Apparatus and method for control with data-driven model adaptation
11840224 · 2023-12-12
Assignee
Inventors
Cpc classification
B60W30/16
PERFORMING OPERATIONS; TRANSPORTING
B60W30/0956
PERFORMING OPERATIONS; TRANSPORTING
G05B17/00
PHYSICS
G06V20/58
PHYSICS
International classification
B60W30/095
PERFORMING OPERATIONS; TRANSPORTING
B60W30/16
PERFORMING OPERATIONS; TRANSPORTING
G05B17/00
PHYSICS
Abstract
An apparatus for controlling an operation of a system is provided. The apparatus comprises an input interface configured to receive a state trajectory of the system, and a memory configured to store a model of dynamics of the system including a combination of at least one differential equation and a closure model. The apparatus further comprises a processor configured to update the closure model using reinforcement learning (RL) having a value function reducing a difference between a shape of the received state trajectory and a shape of state trajectory estimated using the model with the updated closure model, and determine a control command based on the model with the updated closure model. Further, the apparatus comprises an output interface configured to transmit the control command to an actuator of the system to control the operation of the system.
Claims
1. An apparatus for controlling an operation of a system, comprising: an input interface configured to receive a state trajectory of the system; a memory configured to store a model of dynamics of the system including a combination of at least one differential equation and a closure model; a processor configured to: update the closure model using reinforcement learning (RL) having a value function that reduces a difference between a shape of the received state trajectory and a shape of state trajectory estimated using the model with the updated closure model; and determine a control command based on the model with the updated closure model; and an output interface configured to transmit the control command to an actuator of the system to control the operation of the system.
2. The apparatus of claim 1, wherein the differential equation of the model defines a reduced order model of the system having a number of parameters less than a physical model of the system according to a partial differential equation (PDE), and wherein the reduced order model is an ordinary differential equation (ODE), wherein the updated closure model is a nonlinear function of a state of the system capturing a difference in behavior of the system according to the ODE and the PDE.
3. The apparatus of claim 2, wherein the partial differential equation (PDE) is a Boussinesq equation.
4. The apparatus of claim 1, wherein the processor is configured to initialize the closure model with a linear function of the state of the system and update the closure model iteratively with the RL until a termination condition is met.
5. The apparatus of claim 1, wherein the updated closure model includes a gain, wherein the processor is configured to determine the gain reducing an error between a state of the system estimated with the model having the updated closure model with the updated gain and an actual state of the system.
6. The apparatus of claim 5, wherein the actual state of the system is a measured state.
7. The apparatus of claim 5, wherein the actual state of the system is a state estimated with a partial differential equation (PDE) describing dynamics of the system.
8. The apparatus of claim 5, wherein the processor updates the gain using an extremum seeking.
9. The apparatus of claim 5, wherein the processor updates the gain using a Gaussian process-based optimization.
10. The apparatus of claim 1, wherein the operation of the system is subject to constraints, wherein the RL updates the closure model without considering the constraints, and wherein the processor determines the control command using the model with the updated closure model subject to the constraints.
11. The apparatus of claim 10, wherein the constraints include state constraints in continuous state space of the system and control input constraints in continuous control input space of the system.
12. The apparatus of claim 10, wherein the processor uses a predictive model based control to determine the control command while enforcing the constraints.
13. The apparatus of claim 11, wherein the system is a vehicle controlled to perform one or combination of a lane keeping, a cruise control, and an obstacle avoidance operation, wherein the state of the vehicle includes one or combination of a position, an orientation, and a longitudinal velocity, and a lateral velocity of the vehicle, wherein the control inputs include one or combination of a lateral acceleration, a longitudinal acceleration, a steering angle, an engine torque, and a brake torque, wherein the state constraints include one or combination of velocity constraints, lane keeping constraints, and obstacle avoidance constraints, and wherein the control input constraints include one or combination of steering angle constraints, and acceleration constraints.
14. The apparatus of claim 11, wherein the system is an induction motor controlled to perform a task, wherein the state of the motor includes one or combination of a stator flux, a line current, and a rotor speed, wherein the control inputs include values of excitation voltage, wherein the state constraints include constraints on values of one or combination of the stator flux, the line current, and the rotor speed, wherein the control input constraints include a constraint on the excitation voltage.
15. The apparatus of claim 1, wherein the system is an air-conditioning system generating airflow in a conditioned environment, wherein the model is a model of airflow dynamics connecting values of flow and temperature of air conditioned during the operation of the air-conditioning system.
16. The apparatus of claim 1, wherein the RL uses a neural network trained to minimize the value function.
17. A method for controlling an operation of a system, wherein the method uses a processor coupled to a memory storing a model of dynamics of the system including a combination of at least one differential equation and a closure model, the processor is coupled with stored instructions when executed by the processor carry out steps of the method, comprising: receiving a state trajectory of the system; updating the closure model using reinforcement learning (RL) having a value function that reduces a difference between a shape of the received state trajectory and a shape of state trajectory estimated using the model with the updated closure model; determining a control command based on the model with the updated closure model; and transmitting the control command to an actuator of the system to control the operation of the system.
18. The method of claim 17, wherein the differential equation of the model defines a reduced order model of the system having a number of parameters less than a physical model of the system according to a Boussinesq equation, wherein the Boussinesq equation is a partial differential equation (PDE), and wherein the reduced order model is an ordinary differential equation (ODE), wherein the updated closure model is a nonlinear function of a state of the system capturing a difference in behavior of the system according to the ODE and the PDE.
19. The method of claim 17, wherein the updated closure model includes a gain, wherein the method further comprising determining the gain reducing an error between a state of the system estimated with the model having the updated closure model with the updated gain and an actual state of the system.
20. The method of claim 17, wherein the operation of the system is subject to constraints, wherein the RL updates the closure model without considering the constraints, and wherein the method further comprising determining the control command using the model with the updated closure model subject to the constraints.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The presently disclosed embodiments will be further explained with reference to the attached drawings. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)
(19)
DETAILED DESCRIPTION
(20) In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure may be practiced without these specific details. In other instances, apparatuses and methods are shown in block diagram form only in order to avoid obscuring the present disclosure.
(21) As used in this specification and claims, the terms “for example,” “for instance,” and “such as,” and the verbs “comprising,” “having,” “including,” and their other verb forms, when used in conjunction with a listing of one or more components or other items, are each to be construed as open ended, meaning that the listing is not to be considered as excluding other, additional components or items. The term “based on” means at least partially based on. Further, it is to be understood that the phraseology and terminology employed herein are for the purpose of the description and should not be regarded as limiting. Any heading utilized within this description is for convenience only and has no legal or limiting effect.
(22)
(23) In some embodiments, the apparatus 100 uses model-based and/or optimization-based control and estimation techniques, such as model predictive control (MPC), to develop the control commands 106 for the system 102. The model-based techniques can be advantageous for control of dynamic systems. For example, the MPC allows a model-based design framework in which the dynamics and constraints of the system 102 can be taken into account directly. The MPC develops the control commands 106 based on the model 104 of the system 102. The model 104 of the system 102 refers to dynamics of the system 102 described using differential equations. In some embodiments, the model 104 is nonlinear and can be difficult to design and/or difficult to use in real time. For instance, even if the nonlinear model is exactly available, estimating the optimal control commands 106 is a challenging task, since a partial differential equation (PDE) associated with the dynamics of the system 102, named the Hamilton-Jacobi-Bellman (HJB) equation, needs to be solved, which is computationally challenging.
(24) Some embodiments use data-driven control techniques to design the model 104. The data-driven techniques exploit operational data generated by the system 102 in order to construct a feedback control policy that stabilizes the system 102. For instance, each state of the system 102 measured during the operation of the system 102 may be given as the feedback to control the system 102. In general, the use of operational data to design the control policies and/or commands 106 is called data-driven control. The objective of data-driven control is to design a control policy from data and to use the data-driven control policy to control a system. In contrast with such data-driven control approaches, some embodiments use operational data to design a model, e.g., a model 104, of the control system and then use the data-driven model to control the system using various model-based control methods. It should be noted that the objective of some embodiments is to determine an actual model of the system from data, i.e., a model that can be used to estimate the behavior of the system. For example, it is an object of some embodiments to determine the model of a system from data that capture dynamics of the system using differential equations. Additionally, or alternatively, it is an object of some embodiments to learn from data a model having physics-based PDE model accuracy.
(25) To simplify the computation, some embodiments formulate an ordinary differential equation (ODE) 108a to describe the dynamics of the system 102. In some embodiments, the ODE 108a may be formulated using model reduction techniques. For example, the ODE 108a may be a reduced-dimension form of the PDE. To that end, the ODE 108a can be a part of the PDE. However, in some embodiments, the ODE 108a fails to reproduce the actual dynamics (i.e., the dynamics described by the PDE) of the system 102 under uncertainty conditions. Examples of the uncertainty conditions include the case where boundary conditions of the PDE change over time and the case where one of the coefficients involved in the PDE changes.
(26) To that end, some embodiments formulate a closure model 108b that reduces the PDE, while covering the cases of the uncertainty conditions. In some embodiments, the closure model 108b may be a nonlinear function of a state of the system 102 capturing a difference in behavior (for instance, the dynamics) of the system 102 according to the ODE and the PDE. The closure model 108b may be formulated using reinforcement learning (RL). In other words, the PDE model of the system 102 is approximated by a combination of ODE 108a and a closure model 108b, and the closure model 108b is learned from data using RL. In such a manner, the model approaching the accuracy of PDE is learned from data.
(27) In some embodiments, the RL learns a state trajectory of the system 102 that defines the behavior of the system 102, rather than learning individual states of the system 102. The state trajectory may be a sequence of states of the system 102. Some embodiments are based on the realization that a model 108 comprising the ODE 108a and the closure model 108b reproduces a pattern of the behavior of the system 102, rather than the actual behavior values (for instance, the states) of the system 102. The pattern of the behavior of the system 102 may represent a shape of the state trajectory, for instance, a series of states of the system as a function of time. The pattern of the behavior of the system 102 may also represent a high-level characteristic of the model, for example boundedness of its solutions over time, or decay of its solutions over time. However, such a model does not optimally reproduce the dynamics of the system.
(28) To that end, some embodiments determine a gain and include the gain in the closure model 108b to optimally reproduce the dynamics of the system 102. In some embodiments, the gain may be updated using optimization algorithms. The model 108 comprising the ODE 108a and the closure model 108b with the updated gain reproduces the dynamics of the system 102. Therefore, the model 108 optimally reproduces the dynamics of the system 102. Some embodiments are based on the realization that the model 108 comprises fewer parameters than the PDE. To that end, the model 108 is computationally less complex than the PDE that describes the physical model of the system 102. In some embodiments, the control policies 106 are determined using the model 108. The control policies 106 directly map the states of the system 102 to control commands to control the operations of the system 102. Therefore, the reduced model 108 is used to design control for the system 102 in an efficient manner.
(29)
(30) The state trajectory 216 may be a plurality of states of the system 102 that defines an actual behavior of dynamics of the system 102. For instance, the state trajectory 216 acts as a reference continuous state space for controlling the system 102. In some embodiments, the state trajectory 216 may be received from real-time measurements of parts of the system 102 states. In some other embodiments, the state trajectory 216 may be simulated using the PDE that describes the dynamics of the system 102. In some embodiments, a shape may be determined for the received state trajectory as a function of time. The shape of the state trajectory may represent an actual pattern of behavior of the system 102.
(31) The apparatus 200 further includes a processor 204 and a memory 206 that stores instructions that are executable by the processor 204. The processor 204 may be a single core processor, a multi-core processor, a computing cluster, or any number of other configurations. The memory 206 may include random access memory (RAM), read only memory (ROM), flash memory, or any other suitable memory system. The processor 204 is connected through the bus 210 to one or more input and output devices. The stored instructions implement a method for controlling the operations of the system 102.
(32) The memory 206 may be further extended to include storage 208. The storage 208 may be configured to store a model 208a, a controller 208b, an updating module 208c, and a control command module 208d. In some embodiments, the model 208a may be the model describing the dynamics of the system 102, which includes a combination of at least one differential equation and a closure model. The differential equation of the model 208a may be the ordinary differential equation (ODE) 108a. The closure model of the model 208a may be a linear function or a nonlinear function of the state of the system 102. The closure model may be learnt using the RL to mimic the behavior of the system 102. As should be understood, once the closure model is learnt the closure model may be the closure 108b as illustrated in
(33) The controller 208b may be configured to store instructions that, upon execution by the processor 204, execute one or more modules in the storage 208. Some embodiments are based on the realization that the controller 208b administers each module of the storage 208 to control the system 102.
(34) The updating module 208c may be configured to update the closure model of the model 208a using the reinforcement learning (RL) having a value function reducing a difference between the shape of the received state trajectory and a shape of state trajectory estimated using the model 208a with the updated closure model. In some embodiments, the updating module 208c may be configured to update the closure model iteratively with the RL until a termination condition is met. The updated closure model is the nonlinear function of the state of the system capturing a difference in behavior of the system according to the ODE and the PDE.
(35) Further, in some embodiments, the updating module 208c may be configured to update a gain for the updated closure model. To that end, some embodiments determine the gain reducing an error between the state of the system 102 estimated with the model 208a having the updated closure model with the updated gain and an actual state of the system. In some embodiments, the actual state of the system may be a measured state. In some other embodiments, the actual state of the system may be a state estimated with the PDE describing the dynamics of the system 102. In some embodiments, the updating module 208c may update the gain using an extremum seeking. In some other embodiments, the updating module 208c may update the gain using a Gaussian process-based optimization.
(36) The control command module 208d may be configured to determine a control command based on the model 208a with the updated closure model. The control command may control the operation of the system. In some embodiments, the operation of the system may be subject to constraints. To that end, the control command module 208d uses a predictive model based control to determine the control command while enforcing constraints. The constraints include state constraints in continuous state space of the system 102 and control input constraints in continuous control input space of the system 102.
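As a minimal sketch of predictive model based control under constraints, the following Python example enumerates short input sequences, predicts the state with a model, discards sequences that violate a state constraint, and applies the first input of the cheapest feasible sequence. The scalar integrator model, the reference value 5.0, the candidate input set, and the names `predictive_control`, `step`, `stage_cost`, and `state_ok` are assumptions made for illustration; the embodiments would instead predict with the model 208a and its updated closure model, and a practical controller would use a numerical optimizer rather than enumeration.

```python
from itertools import product

def predictive_control(x0, step, stage_cost, inputs, horizon, state_ok):
    """Return the first input of the cheapest feasible input sequence."""
    best_cost, best_first = float("inf"), None
    # The finite candidate set `inputs` encodes the control input constraints.
    for seq in product(inputs, repeat=horizon):
        x, cost, feasible = x0, 0.0, True
        for u in seq:
            x = step(x, u)                # predict the next state with the model
            if not state_ok(x):           # enforce the state constraint
                feasible = False
                break
            cost += stage_cost(x, u)
        if feasible and cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

# Hypothetical toy model: a scalar integrator x_next = x + 0.1 * u.
step = lambda x, u: x + 0.1 * u
stage_cost = lambda x, u: (x - 5.0) ** 2   # track the reference value 5.0
inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]       # input constraint |u| <= 1

# Without a state constraint the controller pushes as hard as allowed.
u_free = predictive_control(0.0, step, stage_cost, inputs, 3, lambda x: True)
# With the state constraint x <= 0.07 it must settle for a smaller input.
u_limited = predictive_control(0.0, step, stage_cost, inputs, 3, lambda x: x <= 0.07)
```

Only the first input of the selected sequence is applied, after which the optimization is repeated from the newly observed state, which is the usual receding-horizon pattern.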
(37) The output interface 218 is configured to transmit the control command to an actuator 220 of the system 102 to control the operation of the system. Some examples of the output interface 218 may include a control interface that submits the control command to control the system 102.
(38)
$$\vec{u}_t = \mu \Delta \vec{u} - (\vec{u} \cdot \nabla)\vec{u} - \nabla p - T$$
$$\nabla \cdot \vec{u} = 0$$
$$T_t = k \Delta T - \vec{u} \cdot \nabla T$$
where $T$ is a temperature scalar variable, $\vec{u}$ is a velocity vector in three dimensions, $\mu$ is a viscosity and the reciprocal of the Reynolds number, $k$ is a heat diffusion coefficient, and $p$ is a pressure scalar variable.
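(39) The operators Δ and ∇ are defined as:
(40) $$\Delta = \nabla^2, \qquad \nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)^T$$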
(41) Some embodiments are based on the realization that the physics-based high dimension model of the system 102 needs to be resolved to control the operations of the system 102 in real time. For instance, in the case of the HVAC system, the Boussinesq equation needs to be resolved to control the airflow dynamics and the temperature in the room. Some embodiments are based on the recognition that the physics-based high dimension model of the system 102 comprises a large number of equations and variables, which are complicated to resolve. For instance, large computation power is required to resolve the physics-based high dimension model in real time. To that end, it is an objective of some embodiments to simplify the physics-based high dimension model.
(42) At step 304, the apparatus 200 is provided to generate a reduced order model to reproduce the dynamics of the system 102 such that the apparatus 200 controls the system 102 in an efficient manner. In some embodiments, the apparatus 200 may simplify the physics-based high dimension model using model reduction techniques to generate the reduced order model. Some embodiments are based on the realization that the model reduction techniques reduce the dimensionality of the physics-based high dimension model (for instance, the variables of the PDE), such that the reduced order model may be used in real time for prediction and control of the system 102. Further, the generation of reduced order model for controlling the system 102 is explained in detail with reference to
(43)
(44) In all these scenarios, the model reduction techniques fail to provide a unified approach to obtain the reduced order (or reduced dimension) model 406 of the dynamics of the system 102 covering all the above scenarios, i.e., parametric uncertainties as well as boundary condition uncertainties.
(45) It is an objective of some embodiments to generate the ROM 406 that reduces the PDE in the cases of changing boundary conditions and/or changing parameters. To that end, some embodiments use an adaptive model reduction method, a regime detection method, and the like.
(46) For instance, in one embodiment of the invention the reduced order model 406 has the quadratic form:
$$\dot{x}_r = b + A x_r + x_r^T B x_r$$
where b, A, B are constants related to the constants of the PDE and to the type of model reduction algorithm used, and $x_r$ is of a reduced dimension r and represents the vector of the reduced order states. The original states of the system x can be recovered from $x_r$ using the following simple algebraic equation
$$x(t) \approx \Phi x_r(t)$$
where x is usually a vector of high dimension n>>r, containing the n states obtained from the spatial discretization of the PDE, and Φ is a matrix formed by concatenating given vectors called modes or basis vectors of the ROM 406. These modes differ depending on which model reduction method is used. Examples of the model reduction methods include proper orthogonal decomposition (POD), dynamic mode decomposition (DMD), and the like.
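As an illustration of how a matrix of modes may be obtained, the following Python sketch computes a POD basis from a snapshot matrix by singular value decomposition, projects a full state onto the reduced coordinates, and recovers the full state through the algebraic relation between x and the reduced order states. The synthetic snapshot data and the names `pod_basis`, `X`, `Phi`, `x_r`, and `x_rec` are assumptions for illustration only; in practice the snapshots would come from a simulation of the PDE or from measurements.

```python
import numpy as np

def pod_basis(snapshots, r):
    """Return the first r left singular vectors of an (n x m) snapshot matrix."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r]

rng = np.random.default_rng(0)
n, m, r = 50, 20, 2
# Synthetic snapshots of exact rank r (an assumption made to keep the example small).
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

Phi = pod_basis(X, r)     # modes, shape (n, r)
x = X[:, 0]               # one full-order state
x_r = Phi.T @ x           # reduced order state of dimension r
x_rec = Phi @ x_r         # reconstruction of the full state from x_r
```

Because the synthetic data have exact rank r, the reconstruction is exact up to rounding; for real data the reconstruction error depends on how fast the singular values decay.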
(47) However, the solution of the ROM 406 can be unstable (divergent over a finite time support), which does not reproduce the physics of the original PDE models, whose viscous term makes the solutions always stable, i.e., bounded over a bounded time support. For instance, the ODE may lose intrinsic characteristics of actual solutions of the physics-based high dimension model during the model reduction. To that end, the ODE may lose boundedness of the actual solutions of the physics-based high dimension model in space and time.
(48) Accordingly, some embodiments modify the ROM 406 by adding a closure model 404 representing a difference between the ODE and the PDE. For instance, the closure model 404 captures the lost intrinsic characteristics of the actual solutions of the PDE and acts like a stabilizing factor. Some embodiments allow updating only the closure model 404 to reduce the difference between the ODE and PDE.
(49) For instance, in some embodiments, the ROM 406 can be mathematically represented as:
$$\dot{x}_r = b + A x_r + x_r^T B x_r + F(K, x)$$
(50) The function F is the closure model 404, which is added to stabilize the solutions of the ROM 406. The terms $b + A x_r + x_r^T B x_r$ represent the ODE. The term K represents a vector of coefficients that should be tuned to ensure stability and to ensure that the ROM 406 reproduces the dynamics or solutions of the original PDE model. In some embodiments, the closure model 404 is the linear function of the state of the system 102. In some other embodiments, the closure model 404 may be the nonlinear function of the state of the system 102. In some embodiments, the reinforcement learning (RL)-based data-driven method may be used to compute the closure model 404. Further, the computation of the closure model 404 using the reinforcement learning (RL) is explained in detail with reference to
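To make the stabilizing role of the closure term concrete, the following Python sketch integrates a quadratic ROM of this form with and without a closure term. Several choices are assumptions for illustration: the quadratic term is stored as a three-index array so that each component is a quadratic form, the closure is the linear function F = -K x_r with a hand-picked K, the integration uses forward Euler, and the coefficient values are invented so that the uncorrected ODE slowly diverges.

```python
import numpy as np

def rom_rhs(x_r, b, A, B, closure=None):
    """Evaluate b + A x_r + x_r^T B x_r, plus the closure term F if one is given."""
    # B has shape (r, r, r): component i of the quadratic term is x_r^T B[i] x_r.
    quad = np.einsum("j,ijk,k->i", x_r, B, x_r)
    dx = b + A @ x_r + quad
    if closure is not None:
        dx = dx + closure(x_r)
    return dx

def simulate(x0, b, A, B, closure=None, dt=0.01, steps=1000):
    """Integrate the ROM with forward Euler and return the state trajectory."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(traj[-1] + dt * rom_rhs(traj[-1], b, A, B, closure))
    return np.array(traj)

# Hypothetical coefficients: a slowly diverging spiral with no quadratic term.
b = np.zeros(2)
A = np.array([[0.1, 1.0], [-1.0, 0.1]])
B = np.zeros((2, 2, 2))
K = 0.5 * np.eye(2)                      # assumed closure coefficients

open_loop = simulate([1.0, 0.0], b, A, B)
with_closure = simulate([1.0, 0.0], b, A, B, closure=lambda x: -K @ x)
```

In this toy case the trajectory without the closure grows in norm while the trajectory with the closure decays, which mirrors the boundedness argument made for the closure model 404.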
(51)
(52)
(53) At step 508, the apparatus 200 is configured to update the cumulative reward function using the collected data. In some embodiments, the apparatus 200 updates the cumulative reward function (i.e. the value function) to indicate the difference between the shape of the received state trajectory and the shape of state trajectory estimated using the ROM 406 with the current closure model (for instance, the initialized closure model).
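The text does not fix a particular measure of the difference between trajectory shapes. One possible measure, sketched below in Python purely as an assumption, rescales each trajectory by its peak magnitude before comparing them, so that two trajectories that differ only by a constant factor have zero shape difference. The function name `shape_difference` and the test signals are hypothetical.

```python
import numpy as np

def shape_difference(traj_ref, traj_est):
    """Mean squared difference between two trajectories after peak normalization."""
    def unit(y):
        y = np.asarray(y, dtype=float)
        peak = np.max(np.abs(y))
        return y / peak if peak > 0 else y   # leave an all-zero trajectory unchanged
    return float(np.mean((unit(traj_ref) - unit(traj_est)) ** 2))

t = np.linspace(0.0, 2.0 * np.pi, 200)
decay = np.exp(-t)

same_shape = shape_difference(decay, 3.0 * decay)     # scaled copy of the reference
other_shape = shape_difference(decay, np.sin(t))      # qualitatively different signal
```

A measure of this kind is insensitive to amplitude, which is consistent with the later observation that a separate gain is needed to close the quantitative gap between estimated and actual behavior.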
(54) Some embodiments are based on realization that the RL uses a neural network trained to minimize the value function. To that end, at step 510, the apparatus 200 is configured to update the current closure model policy using the collected data and/or the updated cumulative reward function, such that the value function is minimized.
(55) In some embodiments, the apparatus 200 is configured to repeat the steps 506, 508, and 510 until a termination condition is met. To that end, at step 512, the apparatus 200 is configured to determine whether the learning has converged. For instance, the apparatus 200 determines whether the learning cumulative reward function is below a threshold limit or whether two consecutive learning cumulative reward functions are within a small threshold limit of each other. If the learning has converged, the apparatus 200 proceeds with step 516; otherwise, the apparatus 200 proceeds with step 514. At step 514, the apparatus 200 is configured to replace the closure model with the updated closure model and iterate the updating procedure until the termination condition is met. In some embodiments, the apparatus 200 iterates the updating procedure until the learning has converged. At step 516, the apparatus 200 is configured to stop the closure model learning and use the last updated closure model policy as the optimal closure model for the ROM 406.
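The control flow of this iteration, with its two termination tests, can be sketched as follows. This is only a skeleton of the loop and not the RL update itself: `evaluate` and `improve` are hypothetical placeholders for the cost evaluation and the closure model policy update, and the toy problem at the end uses a plain gradient step on a scalar coefficient.

```python
def learn_until_converged(policy, evaluate, improve,
                          cost_tol=1e-6, change_tol=1e-9, max_iters=1000):
    """Iterate evaluate/improve until the cost is small or stops changing."""
    previous = None
    for iteration in range(max_iters):
        cost = evaluate(policy)                  # evaluate the current policy
        small = cost < cost_tol                  # cost below a threshold
        stalled = previous is not None and abs(previous - cost) < change_tol
        if small or stalled:                     # learning has converged
            return policy, cost, iteration
        policy = improve(policy)                 # replace with the updated policy
        previous = cost
    return policy, evaluate(policy), max_iters

# Hypothetical toy problem: a scalar coefficient k with cost (k - 2)^2.
evaluate = lambda k: (k - 2.0) ** 2
improve = lambda k: k - 0.2 * 2.0 * (k - 2.0)    # one gradient step on the cost

k_final, cost_final, iterations = learn_until_converged(0.0, evaluate, improve)
```

The iteration cap `max_iters` is an added safeguard that the text does not mention.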
(56) For instance, given a closure model policy u(x), some embodiments define an infinite horizon cumulative reward functional, given an initial state $x_0 \in \mathbb{R}^{n_x}$, as
(57) $$\mathcal{J}(x_0, u) = \sum_{t=0}^{\infty} \gamma^{t}\, \mathcal{U}(x_t, u(x_t))$$
where $\mathcal{U}$ is a positive definite value function with $\mathcal{U}(0,0)=0$ and $\{x_t\}$ denotes the sequence of states generated by the closed loop system:
$$x_{t+1} = A x_t + B u(x_t) + G \phi(C_q x_t)$$
(58) In some embodiments, the scalar γ∈(0,1] is a forgetting/discount factor intended to enable the cost to be emphasized more by current state and control actions and lend less credence to the past.
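(59) A continuous control policy $u(\cdot): \mathbb{R}^{n_x} \to \mathbb{R}^{n_u}$ is said to be admissible on $X \subset \mathbb{R}^{n_x}$ if the cumulative reward functional $\mathcal{J}(x_0, u)$ is finite for any initial state $x_0$ in X. An optimal control policy may be designed that achieves the optimal cumulative reward
(60) $$\mathcal{J}_{\infty}(x_0) = \inf_{u \in \mathbb{U}_0} \mathcal{J}(x_0, u)$$
for any initial state $x_0$ in X. Here, $\mathbb{U}_0$ denotes the set of all admissible control policies. In other words, an optimal control policy may be computed as:
(61) $$u_{\infty} = \arg\inf_{u \in \mathbb{U}_0} \mathcal{J}(x_0, u)$$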
(62) Directly constructing such an optimal controller is very challenging for general nonlinear systems; this is further exacerbated because the system contains uncertain dynamics. Therefore, some embodiments use adaptive/approximate dynamic programming (ADP): a class of iterative, data-driven algorithms that generate a convergent sequence of control policies whose limit is mathematically proven to be the optimal control policy u.sub.∞(x).
(63) From the Bellman optimality principle, the discrete-time Hamilton-Jacobi-Bellman equations are given by
(64)
ADP methods typically involve performing iterations over cumulative reward functions and closure model policies in order to ultimately converge to the optimal value function and optimal closure model policy. The key operations in ADP methods involve setting an admissible closure model policy u.sub.0(x) and then iterating the policy evaluation step until convergence.
(65)
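For orientation, the equations referred to in this passage are commonly written in the following textbook form for a discounted problem. This is a hedged sketch: the symbols $\mathcal{J}$ (cumulative reward), $\mathcal{U}$ (per-step value function), $\mathcal{J}_{\infty}$, $u_{\infty}$, and the iteration index k are notation assumed here for illustration, and the expressions used in a given embodiment may differ in detail.

```latex
% Discrete-time Bellman (HJB) optimality conditions with discount factor \gamma
\begin{align}
\mathcal{J}_{\infty}(x_t) &= \inf_{u}\left[\,\mathcal{U}(x_t, u(x_t)) + \gamma\,\mathcal{J}_{\infty}(x_{t+1})\right] \\
u_{\infty}(x_t) &= \arg\inf_{u}\left[\,\mathcal{U}(x_t, u(x_t)) + \gamma\,\mathcal{J}_{\infty}(x_{t+1})\right]
\end{align}
% Policy evaluation for the current policy u_k, iterated until convergence
\begin{equation}
\mathcal{J}_{k+1}(x_t) = \mathcal{U}(x_t, u_k(x_t)) + \gamma\,\mathcal{J}_{k+1}(x_{t+1})
\end{equation}
% Policy improvement using the evaluated cumulative reward
\begin{equation}
u_{k+1}(x_t) = \arg\min_{u}\left[\,\mathcal{U}(x_t, u) + \gamma\,\mathcal{J}_{k+1}(x_{t+1})\right]
\end{equation}
```

Alternating the evaluation and improvement steps is the policy iteration pattern that ADP methods approximate from data.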
(66) For instance, according to some embodiments, $F = K_0 x$ is an admissible closure model policy and the learning cumulative reward function approximator is: $$\mathcal{J}_k(x) = \omega_k^T \psi(x)$$
where ψ(x) are a set of differentiable basis functions (equivalently, hidden layer neuron activations) and ω.sub.k is the corresponding column vector of basis coefficients (equivalently, neural network weights). The initial weight vector is, therefore, ω.sub.0.
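A minimal sketch of how the weights of such an approximator might be fitted for a fixed policy is given below. It solves the policy evaluation equation in the least-squares sense from sampled transitions. The scalar linear system, the policy gain, the quadratic stage cost, the single basis function ψ(x) = x², and the function name `evaluate_policy` are all assumptions chosen so that the exact answer is known in closed form; they are not taken from the embodiments.

```python
import numpy as np

def evaluate_policy(xs, xs_next, stage_costs, psi, gamma):
    """Fit weights w so that w^T psi(x_t) = cost_t + gamma * w^T psi(x_{t+1})."""
    Phi = np.array([psi(x) for x in xs])
    Phi_next = np.array([psi(x) for x in xs_next])
    # Least-squares solution of (Phi - gamma * Phi_next) w = stage_costs.
    w, *_ = np.linalg.lstsq(Phi - gamma * Phi_next,
                            np.asarray(stage_costs, dtype=float), rcond=None)
    return w

# Hypothetical scalar system x_next = a x + b u under the fixed policy u = -k x.
a, b, k = 0.9, 0.5, 0.4
q, r, gamma = 1.0, 0.1, 0.95
closed_loop = a - b * k

xs = np.linspace(-1.0, 1.0, 21)
xs_next = closed_loop * xs
stage_costs = q * xs**2 + r * (k * xs) ** 2   # quadratic stage cost along the data

psi = lambda x: np.array([x**2])              # a single quadratic basis function
w = evaluate_policy(xs, xs_next, stage_costs, psi, gamma)

# Closed-form cumulative reward coefficient for this toy problem.
p_exact = (q + r * k**2) / (1.0 - gamma * closed_loop**2)
```

For this toy problem the cumulative reward is exactly quadratic, so the fitted weight matches `p_exact`; with a richer basis the same least-squares step would return one weight per basis function.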
(67) In one embodiment, the goal of the ROM 406 is to generate solutions that minimize the quadratic value function: $$\mathcal{U}(x_t, u_t) = x_t^T Q x_t + u_t^T R u_t$$
where R and Q are two user-defined positive weight matrices.
(68) Then the closure model policy improvement step is given by
(69)
(70) Some embodiments are based on the recognition that the generated ROM 406 (for instance, an optimal ROM) comprising the ODE 402 and the optimal closure model mimics the pattern of the actual behavior of the system 102, but not the actual values of the behavior. In other words, the ODE 402 with the optimal closure model is a function proportional to the actual physical dynamics of the system 102. For instance, the behavior (i.e. an estimated behavior) of the optimal ROM 406 may be qualitatively similar to the actual behavior of the system 102, but there may exist a quantitative gap between the actual behavior of the system 102 and the estimated behavior. Further, a difference between the actual behavior and the estimated behavior is explained in detail with reference to
(71)
(72) To that end, it is an objective of some embodiments to include a gain in the optimal closure model, such that the gap 606 between the actual behavior 602 and the estimated behavior 604 is reduced. For instance, in some embodiments, the closure model may be represented as:
(73)
where θ is a positive gain that needs to be optimally tuned to minimize a learning cost function Q, such that the gap 606 between the actual behavior 602 and the estimated behavior 604 is reduced. Further, the determination of the gain by the apparatus 200 for reducing the gap 606 is explained in detail with reference to
(74)
(75) In an embodiment, the apparatus 200 uses a physics-based high dimension model behavior 702 (i.e. the actual behavior 602) to tune the gain of the optimal closure model. In some example embodiments, the apparatus 200 computes an error 706 between an estimated behavior 704 corresponding to the optimal ROM 406 and the behavior 702. Further, the apparatus 200 determines the gain that reduces the error 706. Some embodiments are based on realization that the apparatus 200 determines the gain that reduces the error 706 between the state of the system 102 estimated with the optimal ROM 406 (i.e. the estimated behavior 704) and the actual state of the system 102 estimated with the PDE (i.e. the behavior 702). In some embodiments, the apparatus 200 updates the determined gain in the optimal closure model to include the determined gain.
(76) Some embodiments are based on realization that the apparatus 200 uses optimization algorithms to update the gain. In one embodiment, the optimization algorithm may be an extremum seeking (ES) 710, as exemplarily illustrated in FIG. 7B. In another embodiment, the optimization algorithm may be a Gaussian process-based optimization 712, as exemplarily illustrated in FIG. 7C.
(77)
(78) In an embodiment, the apparatus 200 uses real-time measurements of parts of the system 102 states 802 (i.e. the actual behavior 602) to tune the gain of the optimal closure model. In some example embodiments, the apparatus 200 computes an error 806 between an estimated behavior 804 corresponding to the optimal ROM 406 and the actual behavior 602 (for instance, the real-time measured states 802 of the system 102). Further, the apparatus 200 determines the gain that reduces the error 806. Some embodiments are based on realization that the apparatus 200 determines the gain that reduces the error 806 between the state of the system 102 estimated with the optimal ROM 406 (i.e. the estimated behavior 804) and the actual state of the system 102 (i.e. the real-time measured state 802). In some embodiments, the apparatus 200 updates the optimal closure model to include the determined gain.
(79) Some embodiments are based on realization that the apparatus 200 uses optimization algorithms to update the gain. In one embodiment, the optimization algorithm may be an extremum seeking (ES) 810, as exemplarily illustrated in FIG. 8B. In another embodiment, the optimization algorithm may be a Gaussian process-based optimization 812, as exemplarily illustrated in FIG. 8C.
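As a hedged illustration of Gaussian process-based tuning of the gain, the following sketch fits a Gaussian process to the costs observed so far and picks the next gain by minimizing a lower confidence bound. The source does not specify the kernel, the acquisition rule, or the search range; the squared-exponential kernel, its length scale, the weight kappa, the grid of candidate gains, and the quadratic toy_cost are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two 1-D arrays of gains."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.sum(k_qo * np.linalg.solve(k_oo, k_qo.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def tune_gain(cost, lower=0.0, upper=2.0, iterations=12, kappa=2.0):
    """Choose gains by minimizing a lower confidence bound on the cost."""
    grid = np.linspace(lower, upper, 201)
    x_obs = np.array([lower, upper])            # two initial evaluations
    y_obs = np.array([cost(x) for x in x_obs])
    for _ in range(iterations):
        # Center the observations so the zero-mean prior is reasonable.
        mean, std = gp_posterior(x_obs, y_obs - y_obs.mean(), grid)
        candidate = grid[np.argmin(mean - kappa * std)]
        x_obs = np.append(x_obs, candidate)
        y_obs = np.append(y_obs, cost(candidate))
    best = int(np.argmin(y_obs))
    return float(x_obs[best]), float(y_obs[best])

def toy_cost(theta):
    """Hypothetical learning cost with its minimum at theta = 0.7."""
    return (theta - 0.7) ** 2 + 0.1
```

Calling tune_gain(toy_cost) returns the best gain evaluated so far together with its cost. In practice the cost would be the error 706 or 806 rather than a closed-form function, and each evaluation would require a model rollout or a measurement.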
(80)
(81) At step 902a, the ES algorithm 900 may perturb the control parameter of the optimal closure model. For instance, the ES algorithm 900 may use the perturbation signal to perturb the control parameter. In some embodiments, the perturbation signal may be a previously updated perturbation signal. At step 904a, the ES algorithm 900 may determine the cost function Q for the closure model performance in response to perturbing the control parameter. At step 906a, the ES algorithm 900 may determine a gradient of the cost function by modifying the cost function with the perturbation signal. For instance, the gradient of the cost function is determined as a product of the cost function, the perturbation signal and a gain of the ES algorithm 900. At step 908a, the ES algorithm 900 may integrate the perturbation signal with the determined gradient to update the perturbation signal for the next iteration. The iteration of the ES algorithm 900 can be repeated until the termination condition is met.
(82)
(83) At step 906b, the ES algorithm 900 may multiply the determined cost function with a first periodic signal 906b-0 of time to produce a perturbed cost function 906b-1. At step 908b, the ES algorithm 900 may subtract from the perturbed cost function 906b-1 a second periodic signal 908b-0 having a ninety degree quadrature phase shift with respect to a phase of the first periodic signal 906b-0 to produce a derivative of the cost function 908b-1. At step 910b, the ES algorithm 900 may integrate the derivative of the cost function 908b-1 over time to produce control parameter values 910b-0 as a function of time.
(84)
(85)
(86) Further, each of the n control parameters $\theta_i$ 1102 can be updated as explained in detail description of
$$\dot{\xi}_i = a_i\, l \sin(\omega_i t)\, Q(\theta)$$
$$\theta_i = \xi_i + a_i \sin(\omega_i t)$$
where the perturbation frequencies $\omega_i$ are such that $\omega_i \neq \omega_j$, $\omega_i + \omega_j \neq \omega_k$, $i, j, k \in \{1, 2, \ldots, n\}$, and $\omega_i > \omega^*$, with $\omega^*$ large enough to ensure convergence. In some embodiments, when the parameters $a_i$, $\omega_i$, and $l$ are properly selected, the cost function $Q(\theta)$ 1106 converges to a neighborhood of an optimal cost $Q(\theta^*)$.
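A candidate set of perturbation frequencies can be screened before use. The sketch below assumes the frequency conditions are read as distinctness requirements, as is usual for multi-parameter extremum seeking: every frequency exceeds a lower bound, no two frequencies coincide, and no sum of two frequencies equals a third. The function name, the tolerance, and the lower bound argument are hypothetical.

```python
from itertools import product

def frequencies_admissible(omegas, omega_min, tol=1e-9):
    """Check the assumed dither-frequency conditions for multi-parameter ES.

    Returns True only if every frequency exceeds omega_min, all
    frequencies are pairwise distinct, and omega_i + omega_j differs
    from every omega_k (including the case i == j).
    """
    n = len(omegas)
    if any(w <= omega_min for w in omegas):
        return False
    for i, j in product(range(n), repeat=2):
        if i != j and abs(omegas[i] - omegas[j]) <= tol:
            return False
        pair_sum = omegas[i] + omegas[j]
        if any(abs(pair_sum - omegas[k]) <= tol for k in range(n)):
            return False
    return True
```

For example, the pair [0.5, 0.8] passes with a lower bound of 0.1, whereas [1.0, 2.0, 3.0] fails because 1.0 + 2.0 equals 3.0.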
(87) In order to implement the multi-parameter ES controller 1100 in the real-time embedded system 102, a discrete version of the multi-parameter ES controller 1100 is advantageous. For instance, the discrete version of the multi-parameter ES controller 1100 may be mathematically represented as:
$$\xi_i(k+1) = \xi_i(k) + a_i\, l\, \Delta T \sin(\omega_i k)\, Q(\theta(k))$$
$$\theta_i(k+1) = \xi_i(k+1) + a_i \sin(\omega_i k)$$
where k is the time step and ΔT is the sampling time.
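The discrete update above may be sketched in a few lines. This is an illustration under stated assumptions, not the implementation of the embodiments: the quadratic toy_cost, its minimizer TARGET, and all numerical values are hypothetical. In particular, the gain l is chosen negative here so that the averaged update descends the cost; the source only states that l must be properly selected. The frequencies are taken in radians per time step, matching the argument of the sine in the discrete equations.

```python
import numpy as np

def discrete_es(cost, xi0, amplitudes, omegas, gain, dt, steps):
    """Iterate the discrete multi-parameter ES update for all parameters.

    xi_i(k+1)    = xi_i(k) + a_i * l * dt * sin(w_i * k) * Q(theta(k))
    theta_i(k+1) = xi_i(k+1) + a_i * sin(w_i * k)
    """
    a = np.asarray(amplitudes, dtype=float)
    w = np.asarray(omegas, dtype=float)
    xi = np.asarray(xi0, dtype=float).copy()
    theta = xi.copy()
    for k in range(steps):
        dither = np.sin(w * k)
        xi = xi + a * gain * dt * dither * cost(theta)
        theta = xi + a * dither
    return xi

# Hypothetical cost with its minimum at TARGET.
TARGET = np.array([1.0, -0.5])

def toy_cost(theta):
    return float(np.sum((theta - TARGET) ** 2))

tuned = discrete_es(
    toy_cost,
    xi0=[0.0, 0.0],
    amplitudes=[0.2, 0.2],
    omegas=[0.5, 0.8],   # distinct, and no pairwise sum equals a frequency
    gain=-2.0,           # assumed negative so the iteration minimizes Q
    dt=0.1,
    steps=3000,
)
```

With these values the estimate settles in a small neighborhood of TARGET rather than exactly on it, consistent with convergence to a neighborhood of the optimal cost; the residual oscillation comes from the persistent perturbation.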
(88) As should be understood, once the control parameter θ (i.e. the positive gain) in the optimal closure model is updated using the ES algorithm or the Gaussian process-based optimization, the optimal closure model in combination with the ODE 402 mimics the actual behavior 602 of the system 102. For instance, the estimated behavior 604 may be qualitatively and quantitatively similar to the actual behavior 602 without the gap 606.
(89) To that end, the optimal reduced model 406 comprising the ODE and the optimal closure model with the updated gain may be used to determine the control command. In some embodiments, the optimal reduced model 406 comprising the ODE and the optimal closure model with the updated gain may be used to develop the control policies 106 for the system 102. The control policies 106 may directly map the states of the system 102 to the control commands to control the operation of the system 102. Examples of the control command include, in case the system 102 is the HVAC system, positions of valves, speed of a compressor, parameters of an evaporator, and the like. Examples of the control command include, in case the system 102 is a rotor, speed of the rotor, temperature of a motor, and the like. Further, the control command may be transmitted, via the output interface 218, to actuators of the system 102 to control the system 102. Some embodiments are based on recognition that the operation of the system 102 is subject to constraints. The constraints may include state constraints in continuous state space of the system 102 and control input constraints in continuous control input space of the system 102. Further, the apparatus 200 for controlling the operation subject to constraints is explained in detail description of
(90)
(91) To that end, some embodiments use the RL-based model (for instance, the optimal reduced order model 406) of the system 102 determined by the data-driven adaptation in various predictive model based algorithms. In some embodiments, an optimizer 1202 is formulated to consider the constraints to control the system 102. Some embodiments are based on realization that the optimizer 1202 may be a model predictive control (MPC) algorithm. The MPC is a control method that is used to control the system 102 while enforcing the constraints. To that end, some embodiments take advantage of the MPC to consider the constraints in control of the system 102. Further, the real-time implementation of the apparatus 200 to control the system 102 is explained in detailed description of
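The way an MPC enforces state and control input constraints can be illustrated with a minimal receding-horizon sketch. It is not the optimizer 1202: the scalar linear model, the bounds, the horizon, the cost weights, and the exhaustive search over a grid of candidate inputs are all assumptions chosen so the example stays self-contained. A practical implementation would use the reduced order model and a numerical solver.

```python
from itertools import product
import numpy as np

A, B = 0.9, 0.5             # hypothetical model x[k+1] = A*x[k] + B*u[k]
U_MAX, X_MAX = 1.0, 5.0     # assumed input and state bounds
HORIZON = 3
# Candidate inputs lie inside the input constraint by construction.
U_GRID = np.linspace(-U_MAX, U_MAX, 11)

def mpc_step(x0, reference, input_weight=0.01):
    """Return the first input of the cheapest feasible input sequence."""
    best_cost, best_u = np.inf, 0.0
    for sequence in product(U_GRID, repeat=HORIZON):
        x, cost, feasible = x0, 0.0, True
        for u in sequence:
            x = A * x + B * u
            if abs(x) > X_MAX:           # state constraint violated
                feasible = False
                break
            cost += (x - reference) ** 2 + input_weight * u ** 2
        if feasible and cost < best_cost:
            best_cost, best_u = cost, sequence[0]
    return float(best_u)

def run_closed_loop(x0=0.0, reference=2.0, steps=20):
    """Receding-horizon loop: apply the first input, then re-plan."""
    states, inputs = [x0], []
    for _ in range(steps):
        u = mpc_step(states[-1], reference)
        inputs.append(u)
        states.append(A * states[-1] + B * u)
    return np.array(states), np.array(inputs)
```

Calling run_closed_loop() drives the toy state toward the reference while every applied input stays within the assumed bound, which is the property that motivates using an MPC when the operation is subject to constraints.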
(92)
(93) Some embodiments are based on recognition that the air-conditioning system 102 can be described by the physics-based model called the Boussinesq equation, as exemplarily illustrated in
(94)
(95) In some embodiments, the vehicle may include an engine 1410, which can be controlled by the controller 1402 or by other components of the vehicle 1400. In some embodiments, the vehicle may include an electric motor in place of the engine 1410, which can be controlled by the controller 1402 or by other components of the vehicle 1400. The vehicle can also include one or more sensors 1406 to sense the surrounding environment. Examples of the sensors 1406 include distance range finders, such as radars. In some embodiments, the vehicle 1400 includes one or more sensors 1408 to sense its current motion parameters and internal status. Examples of the one or more sensors 1408 include global positioning system (GPS), accelerometers, inertial measurement units, gyroscopes, shaft rotational sensors, torque sensors, deflection sensors, pressure sensors, and flow sensors. The sensors provide information to the controller 1402. The vehicle may be equipped with a transceiver 1412 enabling communication capabilities of the controller 1402 through wired or wireless communication channels with the apparatus 200 of some embodiments. For example, through the transceiver 1412, the controller 1402 receives the control commands from the apparatus 200. Further, the controller 1402 outputs the received control command to one or more actuators of the vehicle 1400, such as the steering wheel and/or the brakes of the vehicle, in order to control the motion of the vehicle.
(96)
(97) In some embodiments, the control inputs include one or a combination of a lateral acceleration, a longitudinal acceleration, a steering angle, an engine torque, and a brake torque. The control input constraints include one or a combination of steering angle constraints and acceleration constraints.
(98)
(99) The above description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing one or more exemplary embodiments. Contemplated are various changes that may be made in the function and arrangement of elements without departing from the spirit and scope of the subject matter disclosed as set forth in the appended claims.
(100) Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it is understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, systems, processes, and other elements in the subject matter disclosed may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments. Further, like reference numbers and designations in the various drawings indicate like elements.
(101) Also, individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may be terminated when its operations are completed, but may have additional steps not discussed or included in a figure. Furthermore, not all operations in any particularly described process may occur in all embodiments. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, the function's termination can correspond to a return of the function to the calling function or the main function.
(102) Furthermore, embodiments of the subject matter disclosed may be implemented, at least in part, either manually or automatically. Manual or automatic implementations may be executed, or at least assisted, through the use of machines, hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine readable medium. A processor(s) may perform the necessary tasks.
(103) Various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
(104) Embodiments of the present disclosure may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts concurrently, even though shown as sequential acts in illustrative embodiments.
(105) Although the present disclosure has been described with reference to certain preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the present disclosure. Therefore, it is the aspect of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the present disclosure.