METHOD AND DEVICE FOR DETERMINING MODEL PARAMETERS FOR A CONTROL STRATEGY FOR A TECHNICAL SYSTEM WITH THE AID OF A BAYESIAN OPTIMIZATION METHOD

20220236698 · 2022-07-28

    Abstract

    Methods for ascertaining a control strategy for a technical system using a Bayesian optimization method. The control strategy is created based on model parameters of a control model and is executable. The method includes providing a quality function whose shape corresponds to a regression function and that evaluates a quality of a controlling of the technical system based on model parameters; carrying out a Bayesian optimization method based on the quality function in order to iteratively ascertain a model parameter set having model parameters within a model parameter domain that indicates the permissible value ranges for the model parameters; and determining the model parameter domain for at least one of the model parameters as a function of an associated maximum a posteriori estimated value of the quality function.

    Claims

    1-11. (canceled)

    12. A method for ascertaining a control strategy for a technical system using a Bayesian optimization method, the control strategy being created based on model parameters of a control model and being executable, the method comprising the following steps for optimization of the controlling: providing a quality function whose shape corresponds to a regression function and that evaluates a quality of a controlling of the technical system based on the model parameters; carrying out a Bayesian optimization method based on the quality function to iteratively ascertain a model parameter set having the model parameters within a model parameter domain that indicates permissible value ranges for the model parameters; and determining the model parameter domain for at least one of the model parameters as a function of an associated maximum a posteriori estimated value of the quality function.

    13. The method as recited in claim 12, wherein a parametric regression model maps an input variable vector and a system state of the technical system onto a subsequent system state, the parametric regression model being correspondingly trained in order to obtain a weighting matrix.

    14. The method as recited in claim 12, wherein the model parameters are ascertained using an AB learning method for a linear quadratic regulator (LQR) controller, an uncertainty measure being determined in each case for the at least one maximum a posteriori estimated value, the value range of the at least one model parameter being defined around the maximum a posteriori estimated value.

    15. The method as recited in claim 14, wherein the value range of the at least one model parameter is determined around the maximum a posteriori estimated value with specification of an uncertainty of an expected value.

    16. The method as recited in claim 12, wherein the model parameters are ascertained using a K-Learning method for a linear quadratic regulator (LQR) controller, the value range of the at least one model parameter being defined around the maximum a posteriori estimated value.

    17. The method as recited in claim 16, wherein the value range of the at least one model parameter is determined around the maximum a posteriori estimated value with a measure that is determined as the product of a specified factor between 0 and 1 and the maximum a posteriori estimated value.

    18. The method as recited in claim 12, wherein the optimization method is started with initial model parameters that result from a minimization of a prior mean value function, a non-parametric approximation model of the technical system being trained in order to obtain the prior mean value function.

    19. A device configured to ascertain a control strategy for a technical system using a Bayesian optimization method, the control strategy being created based on model parameters of a control model and being executable, the device being configured to carry out the following steps for optimization of the controlling: providing a quality function whose shape corresponds to a regression function and that evaluates a quality of a controlling of the technical system based on the model parameters; carrying out a Bayesian optimization method based on the quality function to iteratively ascertain a model parameter set having the model parameters within a model parameter domain that indicates permissible value ranges for the model parameters; and determining the model parameter domain for at least one of the model parameters as a function of an associated maximum a posteriori estimated value of the quality function.

    20. A control system, comprising: a technical system; and a control unit configured to control the technical system, a control model being implemented in the control unit and being configured to provide an input variable vector as a function of state variables of the technical system, a model creation block being provided which is configured to ascertain model parameters for the control model based on a Bayesian optimization method carried out in an optimization block, the control strategy being created based on model parameters of a control model and being executable, the controlling being optimized by: providing a quality function whose shape corresponds to a regression function and that evaluates a quality of a controlling of the technical system based on the model parameters; carrying out a Bayesian optimization method based on the quality function to iteratively ascertain a model parameter set having the model parameters within a model parameter domain that indicates permissible value ranges for the model parameters; and determining the model parameter domain for at least one of the model parameters as a function of an associated maximum a posteriori estimated value of the quality function.

    21. A non-transitory machine-readable storage medium on which is stored a computer program for ascertaining a control strategy for a technical system using a Bayesian optimization method, the control strategy being created based on model parameters of a control model and being executable, the computer program, when executed by a computer, causing the computer to perform the following steps for optimization of the controlling: providing a quality function whose shape corresponds to a regression function and that evaluates a quality of a controlling of the technical system based on the model parameters; carrying out a Bayesian optimization method based on the quality function to iteratively ascertain a model parameter set having the model parameters within a model parameter domain that indicates permissible value ranges for the model parameters; and determining the model parameter domain for at least one of the model parameters as a function of an associated maximum a posteriori estimated value of the quality function.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0039] Below, specific embodiments of the present invention are explained in more detail based on the figures.

    [0040] FIG. 1 shows a schematic representation of a control system having a control unit and a technical system to be controlled, in accordance with an example embodiment of the present invention.

    [0041] FIG. 2 shows a flow diagram illustrating a method for creating a control model using a reinforcement learning method, in accordance with an example embodiment of the present invention.

    DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

    [0042] FIG. 1 shows a schematic representation of a self-adapting control system 1 that is designed for the controlling of a technical system 2. Technical system 2 can be, for example, an internal combustion engine of a motor vehicle, or a subsystem thereof. A control unit 3 controls technical system 2 with a sequence of input variables u as manipulated variables that cause particular operating points of technical system 2. As a rule, a plurality of input variables are combined in an input variable vector u ∈ ℝ^d. In addition, for each of the input variables (elements of input variable vector u) there is a permissible value range. The controlling of technical system 2 further results in one or more state variables that are measured for an applied input variable vector u and that are represented in the form of a state variable vector x.

    [0043] Using one or more sensors 21 that are part of technical system 2, courses of one or more measurement variables x_1 . . . x_D can be acquired that represent corresponding state variables x_1(t) . . . x_D(t), which indicate the system states x of technical system 2. Here D corresponds to the number of state variables. In this way, the system state of technical system 2 is acquired using the one or more sensors 21, and is communicated to control unit 3 as state variables of a state variable vector x.

    [0044] Input variables u correspond to manipulated variables of control unit 3 that are ascertained based on state variables x and the control strategy π.sub.θ(x). The operation of technical system 2 takes place as a function of the input variables u(t) using one or more actuators 22 of technical system 2. For example, in this way a movement of a robot or vehicle can be controlled, or a controlling can take place of a drive unit or of a driver assistance system of a vehicle. An input variable u can for example correspond to an electrical voltage that is applied to an electromechanical positioner as actuator 22. Actuator 22 is controlled corresponding to the one or more input variables u, and carries out a corresponding action. Here, actuator 22 can include a control logic system (not necessarily constructively integrated) that ascertains, from input variables u, a control variable with which the relevant actuator 22 is controlled.

    [0045] In an exemplary embodiment, control unit 3 is used to control an internal combustion engine as technical system. For this purpose, a throttle valve position, a fuel supply, and/or the like can be specified as input variables to the throttle valve positioner or to the controlling for injection valves, and corresponding state variables, such as a rotational speed, a load, an engine temperature, can be received.

    [0046] In an exemplary embodiment, control unit 3 is used for the controlling of an at least partly autonomous robot, in particular an at least partly autonomous motor vehicle, as technical system 2. Sensor 21 can be for example one or more video sensors, preferably situated in the motor vehicle, and/or one or more radar sensors and/or one or more ultrasound sensors and/or one or more lidar sensors and/or one or more position sensors (GPS). Alternatively or in addition, sensor 21 can also include an information system that ascertains an item of information about a state of the technical system (motor vehicle), such as a weather information system that ascertains a current or future state of the weather in an environment of the motor vehicle.

    [0047] In a further exemplary embodiment of the present invention, control unit 3 is used for the controlling of a function in a motor vehicle as technical system. For this purpose, a gas pedal position, a steering intervention in the form of a wrist torque or a steering position, environmental information such as positions of objects in the environment, a braking action, and/or the like can be specified as input variables, and corresponding state variables can be received that indicate the driving behavior of the motor vehicle, such as vehicle speed, curve position, distance from objects in the environment, and the like.

    [0048] Control unit 3 can, with the plurality of measurement variables x_1 . . . x_D, for example detect states or state curves of the at least partly autonomous robot, such as an engine rotational speed, a vehicle speed, a fuel consumption, an engine temperature, a longitudinal speed and/or transverse speed, a steering angle, a yaw rate, and the like. Actuator 22, preferably situated in the motor vehicle, can be for example a brake, a drive, or a steering mechanism of the motor vehicle.

    [0049] Alternatively, the at least partly autonomous robot can also be some other mobile robot (not shown), for example one that moves by flying, swimming, immersion, or stepping. The mobile robot can for example also be an at least partly autonomous lawnmower, or an at least partly autonomous cleaning robot.

    [0050] In still other alternatives, the at least partly autonomous robot can also be a household device (not shown), in particular a washing machine, a stove, an oven, a microwave, or a dishwashing machine. With sensor 21, for example an optical sensor, a state of an object treated using the household device can be acquired such as, in the case of the washing machine, a state of laundry in the washing machine. Using control unit 3, a type or a state of this object can then be ascertained and can be characterized by measurement variables x.sub.1 . . . x.sub.D. The input variables can then be ascertained in such a way that the household device is controlled as a function of the ascertained type or ascertained state of the object. For example, in the case of the washing machine this machine can be controlled as a function of the material of which the laundry situated therein is made. Input variables u(t) can then be selected as a function of the ascertained material of the laundry.

    [0051] In a further specific embodiment of the present invention, control unit 3 can be used for the controlling of a manufacturing machine (technical system 2) of a manufacturing system, by controlling an actuator 22 that controls this manufacturing machine using input variables. The manufacturing machine can for example be a machine for stamping, sawing, drilling, milling, lathing, and/or cutting.

    [0052] Sensor 21 can then for example be an optical sensor that acquires properties of manufactured products. It is possible for actuator 22 that controls the manufacturing machine to be controlled as a function of the ascertained properties of the manufactured product, so that the manufacturing machine correspondingly carries out a subsequent processing step on this manufactured product. It is also possible for sensor 21 to ascertain the properties of the manufactured product processed by the manufacturing machine, and, as a function thereof, to adapt a controlling of the manufacturing machine for a subsequent manufactured product.

    [0053] The controlling of control unit 3 follows a control strategy. Through a dynamic process, the control strategy is to be adapted so that the system behavior becomes optimal with respect to a quality function. For this purpose, an optimization method is carried out that optimizes model parameters of the control model forming the basis of the control strategy in such a way that the performance of controlled technical system 2 is optimized. For this purpose, a control model (dynamic model) is created in a model creation block 4, which is the basis for the control strategy of control unit 3. Model creation block 4 ascertains the model parameters for the control model on the basis of a Bayesian optimization method carried out in an optimization block 5. This takes place based on a specified quality function that is determined or specified in a quality function block 6.

    [0054] In further preferred specific embodiments, control unit 3, model creation block 4, optimization block 5, and quality function block 6 are implemented in a computing unit. The computing unit includes one or more processors and at least one machine-readable storage medium on which instructions are stored that, when executed on the processors, cause the computing unit to carry out the method according to the present invention.

    [0055] Technical system 2 corresponds to a dynamic system that, using a control unit 3, is controlled in an optimal manner with a suitable control strategy that is to be correspondingly created using a Bayesian optimization method. The Bayesian optimization method is used to ascertain the control model by, during the optimization method, iteratively applying various test model parameter sets for the controlling of technical system 2, and adapting the model parameters based on the resulting state variables. Here, a quality function is modeled using a Gauss process regression by which the performance of the controlling of the technical system is defined as a function of the model parameters. The performance of the controlling results from a specified quality criterion that assigns a quality of the controlling to the resulting state variables. The state variables have tolerances, so that the quality function is preferably created using a Gauss process regression.

    [0056] Fundamentally, the problem is that of finding a control strategy that maps a system state x onto an input variable vector u = π_θ(x) with π_θ: ℝ^(n_x) → ℝ^(n_u), where θ ∈ Θ ⊂ ℝ^(n_θ) represents the model parameters of the control strategy in the model parameter domain Θ. A quality function J that is a function of the model parameters θ is specified over a predetermined time horizon t = 0 . . . T based on the state vectors x and input variable vectors u; here the model parameters θ are to be optimized by the optimization method

    [00001]  min_θ J(θ) = min_θ Σ_{t=0}^{T} 𝔼[c(x_t, π_θ(x_t))]   s.t.   x_{t+1} = f(x_t, π_θ(x_t)) + v

    where 𝔼[·] corresponds to the expected value, c(x_t, u_t) represents the costs of the state indicated by state vector x_t for input variable vector u_t, and f: ℝ^(n_x) × ℝ^(n_u) → ℝ^(n_x) represents the state transition model that describes the dynamics of technical system 2 and that is additionally subject to the noise variable v ~ N(0, Σ_v).
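    As an illustration of the quantities above, the quality J(θ) of a candidate model parameter set can be estimated by rolling out the linear state control strategy over the horizon t = 0 . . . T. The dynamics f, the stage costs c, and the direct filling of the feedback matrix from θ below are illustrative placeholders, not elements prescribed by the present method:

```python
import numpy as np

def evaluate_quality(theta, f, c, x0, T, noise_cov, rng=None):
    """Estimate the costs J(theta) by rolling out the linear policy
    u = -K(theta) x over a horizon of T steps."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    K = np.asarray(theta, dtype=float).reshape(1, -1)  # illustrative: theta fills K directly (K-Learning)
    J = 0.0
    for _ in range(T):
        u = -K @ x                      # linear state control strategy pi_theta(x) = -K x
        J += c(x, u)                    # accumulate stage costs c(x_t, u_t)
        v = rng.multivariate_normal(np.zeros(x.shape[0]), noise_cov)
        x = f(x, u) + v                 # state transition with noise v ~ N(0, Sigma_v)
    return J
```

    In the method of the text, one such rollout corresponds to one measurement process on the real technical system; here a simulated f stands in for it.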

    [0057] The Bayesian optimization method is used to find the optimized model parameters θ* of a control strategy through a minimization of the quality function. The goal is for the control strategy to enable a controlling that is as optimal as possible of technical system 2 with control unit 3, where “optimal” refers to a minimization of the costs determined by the quality function, in relation to a specified performance of the overall system of control unit 3 and technical system 2.

    [0058] The quality function thus provides the deviation of the behavior of real technical system 2 during the time window t = 0 . . . T in relation to a specified performance, expressed as costs J. The evaluation of the quality function thus requires the operation of technical system 2 in the real environment in a measurement process. Due to the necessity of actually operating the control system that includes technical system 2 and control unit 3, the evaluation of the costs J becomes very costly, so that the number of measurement processes at the real technical system 2 for the evaluation of a particular control strategy is to be minimized to the greatest possible extent.

    [0059] The creation of a controlling can be carried out using an LQR controller that requires a linear dynamic model of the system to be controlled. This LQR controller can be described by a feedback matrix K, whose elements can be completely or partly adapted as model parameters for the optimization.

    [0060] In the following, it is assumed that the control strategy corresponds to a linear state control strategy of π.sub.θ(x)=−K(θ)x.

    [0061] Linear control strategies have the advantage that they have a low dimensionality compared to other control models. In addition, the linear control strategy can easily be implemented in controllers, thus increasing the efficiency of the Bayesian optimization.

    [0062] In connection with the Bayesian optimization, a linear quadratic regulator, a so-called LQR controller, can be used, as is known in the field of creation of control strategies. In the LQR controller, the system behavior and the interaction with the environment during measuring processes are ascertained through a controlled operation with a set of varying input variable vectors and the acquisition of resulting state variable vectors. Here, the system dynamics is linearized according to


    f(x_t, u_t) ≈ A x_t + B u_t

    and the costs are correspondingly approximated quadratically


    c(x_t, u_t) ≈ x_tᵀ Q x_t + u_tᵀ R u_t

    [0063] Through these approximations, in model creation block 4 an LQR feedback matrix can be created that represents the dynamic model and that is generally designated K=dlqr(A,B,Q,R). The control strategy optimization is carried out by directly adapting the feedback matrix (K-Learning), some of the entries, or each entry, of the feedback matrix corresponding to a model parameter for the optimization. In addition, it is also possible for only the components of feedback matrix K that correspond to system matrices A and B to be assumed as model parameters to be optimized, each entry of matrices A and B corresponding to a model parameter.
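    The computation behind the K = dlqr(A, B, Q, R) notation can be sketched as follows; routing it through SciPy's discrete algebraic Riccati solver is an illustrative implementation choice, not part of the text:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def dlqr(A, B, Q, R):
    """Discrete-time LQR: solve the DARE for P and return the feedback
    matrix K = (R + B^T P B)^{-1} B^T P A."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

    For a stabilizable (A, B), the resulting closed-loop matrix A − B K has all eigenvalues inside the unit circle.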

    [0064] Alternatively, instead of feedback matrix K, the matrices Q and R (weighting matrices) can be assumed as the model parameters to be optimized. Here it is sufficient to adapt only the diagonal entries of the weighting matrices, which have the following form:

    [00002]  K_QR(θ) = dlqr(A, B, Q(θ), R(θ))   with   Q(θ) = diag(10^(θ_1), . . . , 10^(θ_(n_x)))   and   R(θ) = diag(10^(θ_(n_x+1)), . . . , 10^(θ_(n_x+n_u)))

    [0065] Depending on which of the above matrices the model parameters belong to, these methods are called K-Learning, AB-Learning, and QR-Learning. In Bayesian optimization, the parameter space has to be adequately covered with respect to the length scales of the quality function in order to find a good estimation of the optimized model parameters.
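    The QR-Learning parametrization of equation [00002] can be sketched as follows: the optimizer varies only the exponents θ, while Q(θ) and R(θ) are diagonal matrices on a log10 scale. The delegation of dlqr to SciPy's Riccati solver is again an illustrative choice:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def dlqr(A, B, Q, R):
    """K = (R + B^T P B)^{-1} B^T P A with P from the DARE."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def K_QR(theta, A, B):
    """QR-Learning: K_QR(theta) = dlqr(A, B, Q(theta), R(theta)) with
    diagonal weighting matrices Q = diag(10**theta_i), R = diag(10**theta_j)."""
    n_x, n_u = A.shape[0], B.shape[1]
    Q = np.diag(10.0 ** np.asarray(theta[:n_x], dtype=float))
    R = np.diag(10.0 ** np.asarray(theta[n_x:n_x + n_u], dtype=float))
    return dlqr(A, B, Q, R)
```

    With θ = 0 this reduces to Q = I and R = I, i.e., the unweighted LQR gain; the log scale lets the Bayesian optimization cover several orders of magnitude of weighting with a bounded value range.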

    [0066] However, without previous knowledge it is difficult to select the value ranges for the individual model parameters, i.e., the model parameter domain, for the optimization. Yet this selection is essential for an efficient optimization method without an excessive number of measurement processes. The above method therefore provides the selection of a suitable search region for the model parameters, so that the number of measurement processes can be reduced. This is required in particular in cases of high dimensionality, because a manual setting of the value ranges of each of the model parameters is then not easily possible. Therefore, the value ranges of the model parameters are first ascertained by learning a distribution over dynamic models, and subsequently using this distribution to select the value ranges for each of the model parameters. The distribution is obtained using Bayesian linear regression on recorded data values of the state variable vectors and input variable vectors, in order to obtain an approximate linear model of the system dynamics. This results in a Gaussian distribution over the models


    p(vec(A, B) | Data) = N(vec(A, B) | μ^AB, Σ^AB)

    where μ^AB indicates the maximum a posteriori (MAP) estimated value, the notation vec(·, ·) indicating that the matrices A and B are transformed into a vector.
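    A sketch of this Bayesian linear regression over the model matrices, fitting x_{t+1} = [A B] z_t + v with z_t = [x_t; u_t] from recorded data. The isotropic Gaussian prior on the rows of [A B] and the fixed noise variance are illustrative assumptions, not values from the text:

```python
import numpy as np

def fit_ab_posterior(X, U, X_next, prior_prec=1e-6, noise_var=1e-2):
    """Bayesian linear regression for x_{t+1} = [A B] z_t + v.

    X, U, X_next stack x_t^T, u_t^T, x_{t+1}^T as rows. Returns the MAP
    estimate mu_AB (stacked [A B]) and the shared posterior covariance
    of each row of [A B]."""
    Z = np.hstack([X, U])                             # rows are the regressors z_t^T
    S_inv = prior_prec * np.eye(Z.shape[1]) + Z.T @ Z / noise_var
    Sigma = np.linalg.inv(S_inv)                      # posterior covariance per row of [A B]
    W_map = (Sigma @ Z.T @ X_next / noise_var).T      # MAP / posterior mean, stacked [A B]
    return W_map, Sigma
```

    The MAP mean W_map plays the role of μ^AB above, and Sigma supplies the uncertainty measure from which per-parameter value ranges can be derived.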

    [0067] After the value ranges of the model parameters are selected, it is possible that in certain dimensions the value ranges of the model parameters have been selected too conservatively. This can occur for example when the scaling parameter β is too small, or due to a model deviation. As a result, the optimal model parameter vector may not lie within the selected model parameter domain. Therefore, the model parameter domain can be dynamically adapted during the optimization.

    [0068] During the running of the Bayesian optimization, an estimated value of the optimum of the model parameter values is present, i.e., the minimum of the approximated quality function in the current model parameter domain. If the Bayesian optimization yields the result that the location of the estimated optimum is at a boundary of the model parameter domain, then it is probable that better model parameters lie outside the current model parameter domain. Therefore, it is proposed to expand the value range of the model parameter whose value lies at the boundary of the model parameter domain. This dynamic adaptation of the model parameter domain can be carried out in various ways.
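    One simple way to carry out such a dynamic adaptation is to widen the value range of exactly those dimensions whose estimated optimum lies on a domain boundary; the expansion factor and tolerance below are illustrative choices:

```python
import numpy as np

def expand_domain_at_boundary(theta_opt, lower, upper, factor=0.5, tol=1e-6):
    """Widen the value range of every model parameter whose estimated
    optimum theta_opt lies on a boundary of the current domain."""
    theta_opt = np.asarray(theta_opt, dtype=float)
    lower = np.array(lower, dtype=float)
    upper = np.array(upper, dtype=float)
    width = upper - lower
    at_lower = np.abs(theta_opt - lower) < tol    # optimum sits on the lower boundary
    at_upper = np.abs(upper - theta_opt) < tol    # optimum sits on the upper boundary
    lower[at_lower] -= factor * width[at_lower]   # push the affected boundaries outward
    upper[at_upper] += factor * width[at_upper]
    return lower, upper
```

    Only the affected dimensions grow, so the parameter space stays as small as possible for the remaining dimensions.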

    [0069] The adaptation of the value ranges for model parameters makes it possible, starting from a limited value range, to dynamically adapt this range during the optimization only for those dimensions of the model parameter vector in which the optimization relates to a border area of the model parameter domain. In this way, the optimization can be carried out more efficiently overall, so that the convergence is significantly improved. In addition, potential model errors can be better compensated, so that the optimized system model shows better performance. Through the improvement of the efficiency, it is possible to scale the Bayesian optimization to high-dimensional control strategies.

    [0070] In order to create the quality function for performance of the control system with respect to model parameter sets, first of all data are provided.


    D = {(θ_i, J(θ_i))} with i = 1 . . . n

    [0071] This is used to train an initial Gauss process model as the quality function that maps the test model parameters onto costs.


    μ(θ*) = k K⁻¹ Jᵀ,  σ²(θ*) = k(θ*, θ*) − k K⁻¹ kᵀ

    where K corresponds to the covariance matrix, with


    K_ij = k(θ_i, θ_j),  k = [k(θ_1, θ*), . . . , k(θ_n, θ*)]  and  J = [J(θ_1), . . . , J(θ_n)]

    [0072] In this way, the Gauss process model supplies both the expected value, i.e., the costs J, and also the uncertainty of this expected value.
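    A compact sketch of such a Gauss process model, returning both the expected costs and the uncertainty of this expected value at a query point θ*. The squared-exponential kernel and its hyperparameters are illustrative choices:

```python
import numpy as np

def gp_posterior(Theta, J, theta_star, lengthscale=1.0, signal_var=1.0, noise_var=1e-4):
    """Gauss process regression: posterior mean (expected costs) and
    variance (uncertainty) at theta_star, given training pairs (Theta, J)."""
    def k(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)
    K = k(Theta, Theta) + noise_var * np.eye(len(Theta))  # covariance matrix K_ij = k(theta_i, theta_j)
    k_s = k(theta_star[None, :], Theta)                   # k = [k(theta*, theta_1), ..., k(theta*, theta_n)]
    alpha = np.linalg.solve(K, J)
    mu = (k_s @ alpha)[0]                                 # mu(theta*) = k K^-1 J^T
    var = signal_var - (k_s @ np.linalg.solve(K, k_s.T))[0, 0]
    return mu, var
```

    At points far from all training data, var approaches the signal variance, i.e., the model reports high uncertainty there.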

    [0073] From this, the initial model parameter domain can now be ascertained, because without previous knowledge it is difficult to select the value ranges for the individual model parameters, i.e., the model parameter domain, for the optimization. The selection of a suitable model parameter domain is essential for an efficient optimization method that does not require an excessive number of measurement processes. Therefore, in step S3 a suitable search area for the model parameters is selected, so that the number of measurement processes can be reduced. This is required in particular given high dimensionality, because a manual setting of the value ranges of each of the model parameters is then not easily possible. Therefore, the value ranges of the model parameters are ascertained based on the previously trained Gauss process model, and subsequently this distribution is used to select the value ranges for each of the model parameters.

    [0074] The distribution is obtained through Bayesian linear regression, using recorded data values of the state variable vectors and input variable vectors in order to obtain an approximated linear model of the system dynamics. This results in a Gauss distribution over the models


    p(vec(A, B) | Data) = N(vec(A, B) | μ^AB, Σ^AB)

    where μ^AB indicates the maximum a posteriori (MAP) estimated value, and the notation vec(·, ·) indicates that the matrices A and B are transformed into a vector.

    [0075] System 1 of FIG. 1 further includes an optimization unit 22. Optimization unit 22 can be provided in control device 2 or, alternatively, separately therefrom. Optimization unit 22 serves to ascertain, for controller 21, a control model by which dynamic technical system 3 can be controlled. For this purpose, optimization unit 22 carries out an iterative optimization method by which the control model is created by minimizing a quality function. The minimization of the quality function can be expressed as:

    [00003]  J = lim_{T→∞} min_{u_{0:T}} 𝔼[ (1/T) Σ_{t=0}^{T} (x_tᵀ Q x_t + u_tᵀ R u_t) ]   s.t.   x_{t+1} = f(x_t, u_t) + v,  v ~ N(0, Σ_v)

    where the initial condition is specified by x_0. Here x_t corresponds to a state vector for a system state at time t, and u_t corresponds to an input variable vector at time t. Cost matrices Q and R are assumed to be positive semi-definite and positive definite, respectively. Based on the linear approximation of the dynamic behavior f(x_t, u_t) = A x_t + B u_t, and under the assumption of a linear state feedback controller u_t = π(x_t) = −K x_t with a control strategy π, there results an approximate stationary solution of the above minimization problem, with


    K = (R + Bᵀ P B)⁻¹ Bᵀ P A


    and


    0 = Aᵀ (P⁻¹ + B R⁻¹ Bᵀ)⁻¹ A − P + Q

    where the latter equation corresponds to the time-discrete algebraic Riccati equation (DARE), which can be solved efficiently for P using the Kleinman method. The linear state feedback controller is described in the following using the abbreviated notation dlqr (A, B, Q, R).
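    The policy-iteration scheme attributed in the text to Kleinman can be sketched as follows: each iteration evaluates the current gain via a discrete Lyapunov solve and then improves it with the LQR gain update. The use of SciPy's Lyapunov solver and the fixed iteration count are illustrative; a stabilizing initial gain K0 must be supplied:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def dare_policy_iteration(A, B, Q, R, K0, iters=50):
    """Solve the time-discrete algebraic Riccati equation (DARE) by
    alternating policy evaluation and the LQR gain update.
    K0 must stabilize A - B K0."""
    K = np.asarray(K0, dtype=float)
    for _ in range(iters):
        A_cl = A - B @ K
        # policy evaluation: P = A_cl^T P A_cl + Q + K^T R K
        P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
        # policy improvement: K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K
```

    The iteration converges to the same P as a direct DARE solver, which makes it a convenient cross-check.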

    [0076] In control theory, the feedback matrix resulting from the solution of the latter equation is known as an LQR (linear quadratic regulator) controller. Frequently, the modeling of an LQR controller leads to unsatisfactory results due to the linear approximation of the system dynamics, which is often an adequately accurate approximation only in the immediate vicinity of the operating point.

    [0077] In the following, a control strategy search is to be carried out based on a Bayesian optimization method. This method provides a Gauss process regression. A Gauss process regression is a non-parametric method for modeling an a priori unknown function J(θ): Θ → ℝ. Using the Gauss process regression, given noisy observations of the system behavior, both the curve of the functional values and the uncertainty of the prediction of each of the functional values can be determined. The Gauss process can be understood as a distribution over functions, and is defined by a prior mean value function m(θ) and a covariance function k(θ, θ′). The mean value function indicates the a priori knowledge about the quality function J(θ) to be mapped, and is often assumed to be zero. The covariance function is also called the kernel, and defines the correlation between any two functional values J(θ) and J(θ′), with θ, θ′ ∈ Θ.

    [0078] Under the assumption of n noisy observed values D_n = {(θ_i, Ĵ(θ_i))} with Ĵ(θ) = J(θ) + ω, ω ~ N(0, σ_ω²), the prior distribution over the predictions for the measured data can be created in order to obtain the posterior prediction of the quality function at each point θ* ∈ Θ. The posterior mean value and covariance are given by


    μ_n(θ*) = m(θ*) + k_n(θ*) K_n⁻¹ ŷ_n,  σ_n²(θ*) = k(θ*, θ*) − k_n(θ*) K_n⁻¹ k_nᵀ(θ*)


    where


    ŷ_n = [Ĵ(θ_1) − m(θ_1), . . . , Ĵ(θ_n) − m(θ_n)]ᵀ,  k_n(θ*) = [k(θ*, θ_1), . . . , k(θ*, θ_n)]

    and the symmetrical Gram matrix K_n ∈ ℝ^(n×n) includes the entries


    [K_n]_(i,j) = k(θ_i, θ_j) + δ_(i,j) σ_ω²

    [0079] The Gauss process regression is used to model the behavior of technical system 2 and to evaluate its optimality. The behavior of the system made up of the controller and technical system 2 is represented by a quality function that represents a functional relation between the model parameters and the resulting costs of the technical system controlled based on a control strategy defined by the model parameters. The Bayesian optimization method is then applied in order to optimize the quality function. This is not possible analytically, and therefore has to be done iteratively. The outlay for the optimization should therefore be limited to the smallest possible number of iterations.

    [0080] An iteration corresponds to a measurement process of an application of a control strategy, defined by model parameters θ* that are to be considered, to real technical system 2. From this there results a new data pair (θ_{n+1}, Ĵ(θ_{n+1})) that is added to the training data for the Gauss process.


    𝒟_{n+1} = 𝒟_n ∪ {(θ_{n+1}, Ĵ(θ_{n+1}))}

    [0081] After each measurement process, a new evaluation point is selected by maximizing an acquisition function α(θ, 𝒟_n), which can be carried out efficiently using numerical optimization techniques such as L-BFGS. Various acquisition functions may be used, for example probability of improvement (PI), expected improvement (EI), and upper confidence bound (UCB). All of these functions offer a trade-off between exploration, i.e. preferring regions of the input variable vectors in which the quality function has not yet been evaluated, and exploitation, i.e. preferring a region in which an estimated optimum (minimum) of the quality function lies.
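For cost minimization, the EI and UCB criteria named above can be written down directly; the grid-based candidate selection is a simple stand-in for the L-BFGS maximization, and the value β = 2 is an illustrative assumption:

```python
import math
import numpy as np

def expected_improvement(mu, sigma, best_cost):
    # EI for cost minimization: (J_best − μ)Φ(z) + σφ(z) with z = (J_best − μ)/σ.
    sigma = np.maximum(sigma, 1e-12)
    z = (best_cost - mu) / sigma
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return (best_cost - mu) * cdf + sigma * pdf

def next_candidate(candidates, mu, sigma, beta=2.0):
    # UCB-style rule for minimization: pick the point with the lowest μ − βσ,
    # trading off exploitation (low μ) against exploration (high σ).
    return candidates[np.argmin(mu - beta * sigma)]
```

Points with low predicted cost or high uncertainty score best under both criteria.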

    [0082] For the Bayesian optimization, only the overall behavior of a technical system operated through a measurement process is evaluated, while the trajectory data, or the curve data, are usually discarded. For the system identification using a Bayesian linear regression, the state variables can be sampled with a high frequency in order to obtain an uncertainty measure for the model to be estimated.

    [0083] In classical parametric regression, a model y(x,w) is assumed having an input x and weights or parameters w that are used to estimate a noisy target variable t.

    [0084] Under the assumption that the noise follows a Gauss distribution with precision (inverse variance) γ, the posterior distribution over the weights, for n measurements that are indicated in a matrix X_n = [x_1ᵀ, …, x_nᵀ]ᵀ and respective target values that are indicated in a vector t_n = [t_1, …, t_n]ᵀ, is:


    p(w|t) = N(w|m_n, S_n),  w^MAP = m_n = S_n (S_0⁻¹ m_0 + γ Xᵀ t),  S_n⁻¹ = S_0⁻¹ + γ Xᵀ X

    where n corresponds to the number of data points, γ corresponds to a specified constant that describes the noise in the data, m_0 corresponds to an a priori mean value of the model parameters to be estimated (the mean values of the Gaussian prior for the model parameters), S_0 corresponds to an a priori covariance of the model parameters to be estimated (the covariances of the Gaussian prior for the model parameters), X corresponds to the locations of the data points (combined in a matrix), t corresponds to the functional values of the data points (combined in a vector), m_n corresponds to the a posteriori expected values (mean values) of the model parameters to be estimated, and S_n corresponds to the a posteriori covariance of the model parameters to be estimated.
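The posterior quantities m_n and S_n can be computed directly from these formulas; a short numpy sketch in the same notation (the function name and prior values in the usage are illustrative):

```python
import numpy as np

def blr_posterior(X, t, gamma, m0, S0):
    # Bayesian linear regression posterior:
    #   S_n⁻¹ = S_0⁻¹ + γ XᵀX,   m_n = S_n (S_0⁻¹ m_0 + γ Xᵀ t)
    S0_inv = np.linalg.inv(S0)
    Sn = np.linalg.inv(S0_inv + gamma * X.T @ X)
    mn = Sn @ (S0_inv @ m0 + gamma * X.T @ t)
    return mn, Sn
```

With high precision γ and a broad prior, m_n approaches the ordinary least-squares solution.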

    [0085] The maximum a posteriori estimated value (MAP estimated value) of the weights w corresponds to the mean value of the posterior distribution, i.e. w^MAP = m_n, and its covariance corresponds to S_n.

    [0086] The control model is ascertained through reinforcement learning, in order to determine a parameterization for a linear state feedback controller using a Bayesian optimization method.

    [0087] In the following, measures for accelerating the convergence of the optimization method are proposed.

    [0088] As an initial assumption, it is assumed that nonlinear technical system 2 is approximated by a linear model, and the Bayesian optimization method is used to optimize the entries in system matrices A, B. The resulting control model K^AB can then be written as


    π^AB(x_t; θ) = −K^AB(θ) x_t,  K^AB(θ) = dlqr(A(θ), B(θ), Q, R)

    [0089] This method is called the AB learning method.
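The dlqr(·) operation above denotes the discrete-time LQR gain. A minimal stand-in for a library routine (such as MATLAB's dlqr) iterates the discrete Riccati equation to a fixed point; the iteration count is an illustrative assumption:

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    # Discrete-time LQR gain K = (R + BᵀPB)⁻¹ BᵀPA, with P obtained by
    # fixed-point iteration of the discrete Riccati equation
    # P ← Q + AᵀP(A − BK).
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K
```

For a stabilizable pair (A, B), the resulting closed-loop matrix A − BK is Schur stable (all eigenvalue magnitudes below one).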

    [0090] An alternative learning method is the so-called K-learning method, in which the optimization takes place directly in the entries of the feedback gain matrix. Here, the control model corresponds to π^K(x_t; θ) = −K^K(θ) x_t, which, in contrast to the approach described above, does not use the estimated system matrices A, B and is therefore a model-free approach.

    [0091] On the basis of the flow diagram of FIG. 2, the method for creating a control model using a Bayesian optimization method is now described.

    [0092] In step S1, first the MAP weighting matrices A^MAP, B^MAP are ascertained. This can be done without previous knowledge about the control model. In the identification process for technical systems, a goal is to ascertain the successor state x_{t+1} of technical system 2 based on a given current state x_t and an input variable vector u_t. Thus, the following holds:


    y(x, w) ≙ A x_t + B u_t,  x ≙ (x_t, u_t),  w ≙ (A, B),  t ≙ x_{t+1}

    [0093] The maximum a posteriori estimated value (MAP estimated value) of the weights corresponds to θ^MAP.

    [0094] The advantage of the use of a Bayesian linear regression is that it yields not only the MAP estimated value of the mean value θ^MAP, but also an estimation of the uncertainty σ^MAP. The latter corresponds to the diagonal entries of the matrix S_n.
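Applied to the identification task y(x, w) ≙ Ax_t + Bu_t, the regression can be sketched as follows. The "true" system, noise level, trajectory length, and prior used here are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])   # illustrative "true" system
B_true = np.array([[0.0], [0.5]])

# Roll out one noisy trajectory: rows of Phi are (x_t, u_t), targets are x_{t+1}.
x = np.zeros(2)
Phi, targets = [], []
for _ in range(200):
    u = rng.normal(size=1)
    x_next = A_true @ x + B_true @ u + 1e-3 * rng.normal(size=2)
    Phi.append(np.concatenate([x, u]))
    targets.append(x_next)
    x = x_next
Phi, targets = np.array(Phi), np.array(targets)

# One shared Bayesian linear regression for all state dimensions: the MAP
# weights stack into rows of [A^MAP | B^MAP]; diag(S_n) gives the uncertainties.
gamma = 1.0 / 1e-3 ** 2                       # precision of the assumed noise
S0_inv = np.eye(3) / 10.0                     # broad Gaussian prior, m_0 = 0
Sn = np.linalg.inv(S0_inv + gamma * Phi.T @ Phi)
W_map = (Sn @ (gamma * Phi.T @ targets)).T    # shape (2, 3) = [A^MAP | B^MAP]
A_map, B_map = W_map[:, :2], W_map[:, 2:]
sigma_map = np.sqrt(np.diag(Sn))              # per-parameter uncertainty σ^MAP
```

With an exciting input signal, the MAP estimates recover the true system matrices closely, and sigma_map quantifies how well each parameter was identified.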

    [0095] In step S2, the value ranges for the model parameters to be determined for the control model are defined. The Bayesian optimization has to cover the value ranges of the model parameters as adequately as possible, including with respect to the length scales of the quality function. It is often difficult to determine a priori in which value ranges the model parameters, i.e. the entries in matrix K, have to be optimized in order to achieve a good controlling behavior.

    [0096] For this purpose, with the aid of the MAP estimated values θ^MAP, σ^MAP from the system identification of step S1, appropriate value ranges, in which the model parameters are subsequently optimized, are selected for each of the model parameters.

    [0097] For the AB learning method, the value ranges Θ (model parameter domain) are chosen starting from the MAP estimated values θ^MAP and the uncertainties σ^MAP of the parameter estimates. Thus, an interval of ±iσ around the MAP estimated values can be assumed as the value range for each parameter, so that the following holds:


    Θ^AB = [θ_1^MAP − iσ_1, θ_1^MAP + iσ_1] × … × [θ_{n_θ}^MAP − iσ_{n_θ}, θ_{n_θ}^MAP + iσ_{n_θ}]

    where i can preferably be assumed to be between 1 and 4, in particular 2.

    [0098] In this way, model parameters that have a higher degree of uncertainty are assigned a larger value range for the optimization during the Bayesian optimization method, and, conversely, well-identified model parameters, i.e. model parameters having low uncertainty, are assigned a smaller value range.
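Constructing the box Θ^AB from the identification results then takes one line per parameter; the helper name below is illustrative:

```python
import numpy as np

def ab_domain(theta_map, sigma_map, i=2):
    # Per-parameter interval [θ^MAP − iσ, θ^MAP + iσ]: uncertain parameters
    # receive wider search ranges, well-identified ones narrower ranges.
    theta_map, sigma_map = np.asarray(theta_map), np.asarray(sigma_map)
    return np.stack([theta_map - i * sigma_map, theta_map + i * sigma_map], axis=1)
```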

    [0099] In the K-learning method, the value ranges are constructed using the LQR controller for the estimated system:


    K^MAP = dlqr(A^MAP, B^MAP, Q, R)

    [0100] The value ranges of the model parameters of the control model can here be provided between 0 and twice the MAP estimated value of the respective model parameter.


    Θ^K = [0, 2θ_1^MAP] × … × [0, 2θ_{n_θ}^MAP]

    [0101] In general, the value ranges of the model parameters of the control model can be specified as:


    Θ^K = [θ_1^MAP − k, θ_1^MAP + k] × … × [θ_{n_θ}^MAP − k, θ_{n_θ}^MAP + k]

    with 0 < k ≤ θ_{n_θ}^MAP

    [0102] In step S3, a prior mean value function is ascertained. For this, the approximation model of technical system 2 is used as a simple simulator of technical system 2, with which the shape of the quality function is estimated. The approximation model of technical system 2 is determined for example using a Gauss process based on a small number of measurement points; the number of measurement points used for this can be between 10 and 100. Subsequently, an approximated prior mean value function of the costs can be constructed, e.g. through the Gauss process regression.

    [0103] Subsequently, initial model parameters θ_0 are determined in step S4 by minimizing the prior mean value function.

    [0104] Using the initial model parameters θ_0, in step S5 the control model thereby defined can be evaluated in a measurement process, and corresponding minimum costs Ĵ(θ_0) can be determined according to the quality function.

    [0105] Subsequently, in step S6 the next model parameters θ_{n+1} are determined for the next iteration of the optimization method from the maximization of the acquisition function α(θ, 𝒟_n), taking into account the last-determined data pair (θ_n, Ĵ(θ_n)).

    [0106] Using the current model parameters θ_{n+1}, in step S7 the control model thereby defined can be evaluated in a next measurement process, and corresponding minimum costs Ĵ(θ_{n+1}) can be determined.

    [0107] Subsequently, in step S8 the last-ascertained data pair is added to the training data.


    𝒟_{n+1} = 𝒟_n ∪ {(θ_{n+1}, Ĵ(θ_{n+1}))}

    [0108] Subsequently, in step S9 a stop criterion is checked that for example indicates whether an adequate performance of the control model has been achieved, or whether an adequate convergence is present. If the stop criterion is met (alternative: yes), then the method is ended with step S10; otherwise (alternative: no) a jump takes place back to step S6.

    [0109] In step S10, the last-ascertained model parameters for the control strategy are applied.
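Steps S4 through S10 can be sketched end-to-end on a one-dimensional toy quality function. The quadratic cost, the kernel length scale, the grid stand-in for the acquisition maximization, and the fixed iteration budget as stop criterion are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(-1.0, 1.0, 201)           # model parameter domain Θ

def measure(theta):
    # Stand-in for a measurement process returning noisy costs Ĵ(θ).
    return (theta - 0.3) ** 2 + 1e-3 * rng.normal()

def kernel(a, b, ls=0.3):
    # Squared-exponential kernel with unit prior variance.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

thetas = [0.0]                               # S4: initial parameters θ_0
costs = [measure(thetas[0])]                 # S5: first measurement
for _ in range(15):                          # S6-S9: iterate until the budget is spent
    T, y = np.array(thetas), np.array(costs)
    K = kernel(T, T) + 1e-4 * np.eye(len(T))
    k_star = kernel(grid, T)
    mu = k_star @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    lcb = mu - 2.0 * np.sqrt(np.maximum(var, 0.0))
    theta_next = grid[np.argmin(lcb)]        # S6: maximize the acquisition function
    thetas.append(theta_next)                # S7/S8: measure and extend the data set
    costs.append(measure(theta_next))

theta_best = thetas[int(np.argmin(costs))]   # S10: apply the best-found parameters
```

The loop first explores the domain where the posterior variance is large, then concentrates measurements near the cost minimum.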