Method and device for setting at least one parameter of an actuator control system and actuator control system

Abstract

The invention relates to a method for automatically setting at least one parameter of an actuator control system which is designed to control a control variable of an actuator to a predefinable setpoint. The actuator control system is designed to generate a control variable depending on the at least one parameter, the setpoint and the control variable and to actuate the actuator depending on this control variable. A new value of the at least one parameter is determined depending on a stationary probability distribution of the control variable, and the parameter is then set to this new value.

Claims

1. A method for automatically setting at least one parameter of an actuator control system which is designed to control a control variable of an actuator to a predefined target variable, wherein the actuator control system is designed to generate a manipulated variable (u) depending on the at least one parameter, the target variable and the control variable and to actuate the actuator depending on said manipulated variable, wherein a new value of the at least one parameter is determined depending on a stationary probability distribution of the control variable, and the at least one parameter is then set to this new value; and wherein the stationary probability distribution is a probability distribution towards which a probability distribution of the control variable converges during use of a control strategy of the actuator control system that depends on the at least one parameter.

2. The method of claim 1, wherein the stationary probability distribution is determined depending on a model, in particular on a Gaussian process, advantageously on a sparse Gaussian process, of the actuator.

3. The method according to claim 2, wherein the model is adapted depending on the manipulated variable, which is supplied to the actuator when the actuator is controlled using the actuator control system, and the resulting control variable, wherein, after the model is adapted, a new value of the at least one parameter is redetermined depending on the stationary probability distribution of the control variable of the actuator, wherein the redetermination of the new value of the at least one parameter is determined depending on the now adapted model.

4. The method according to claim 1, wherein the stationary probability distribution of the control variable is determined by an approximation of an integration using possible values of the control variable, wherein said approximation is carried out using numerical quadrature.

5. The method according to claim 4, wherein a density of support points is determined depending on a determined temporal evolution of the control variable, which evolution is in particular determined by means of the model and/or the actuator control system, proceeding from an initial value of the control variable that was randomly determined from an initial probability distribution.

6. The method according to claim 5, wherein the density of the support points is also determined depending on a determined temporal evolution of the control variable, which is in particular determined by means of the model and/or the actuator control system, proceeding from the predefined target variable as the initial value of the control variable.

7. The method according to claim 5, wherein a density of the support points is selected depending on a variable (Var) which characterizes a smoothness of the model at at least one value of the control variable in the determined temporal evolution(s) of the control variable.

8. The method according to claim 7, wherein the density of support points in a region is selected depending on a smallest value, wherein the smallest value is the smallest value of the variable which characterize a smoothness of the model at the values of the control variable which are in this range.

9. The method according to claim 8, wherein the density of support points is increased if a quotient of the average density of support points and the smallest value falls below a predefinable threshold value.

10. The method according to claim 5, wherein the density of the support points in a region can also be selected depending on an average density of the support points in said region.

11. The method according to claim 4, wherein a result of the numeric quadrature is determined depending on a dominant eigenvector of a matrix which is given by a product of a diagonal matrix of support weights with a transition matrix, wherein the components of the transition matrix each characterize a probability of a transition of the control variable from a first support point to a second support point.

12. The method according to claim 1, wherein a long-term cost function is selected depending on a local cost function, wherein the local cost function is selected depending on a Gaussian function and/or a polynomial function which is dependent on a difference between the manipulated variable and the predefined target value.

13. The method according to claim 1, wherein the manipulated variable is limited to values within a predefinable manipulated variable range by a limiting function.

14. A learning system for automatically setting the at least one parameter of the actuator control system which is designed to control the control variable of the actuator to the predefined target variable, wherein the learning system is designed to carry out the method according to claim 1.

15. The learning system according to claim 14, wherein the learning system is designed to determine the stationary probability distribution of the control variable by an approximation of an integration using possible values of the control variable, wherein said approximation is carried out using numerical quadrature, and determine, by means of a GPU, a result of the numeric quadrature depending on a dominant eigenvector of a matrix which is given by a product of a diagonal matrix of support weights with a transition matrix, wherein the components of the transition matrix each characterize a probability of a transition of the control variable from a first support point to a second support point.

16. A non-transitory machine-readable storage medium containing computer program instructions for carrying out the method of claim 1.

Description

(1) Embodiments of the invention are described below in greater detail and with reference to the accompanying drawings, in which:

(2) FIG. 1 schematically shows an interaction between the learning system and the actuator;

(3) FIG. 2 schematically shows a structure of parts of the learning system;

(4) FIG. 3 schematically shows an interaction between the actuator control system and the actuator;

(5) FIG. 4 schematically shows in a flowchart an embodiment of the method for training the actuator control system;

(6) FIG. 5 schematically shows in a flowchart an embodiment of the method for determining the new optimal parameter;

(7) FIG. 6 schematically shows in a flowchart an embodiment of the method for determining the support points.

DESCRIPTION OF THE EMBODIMENTS

(8) FIG. 1 shows the actuator 10 in the surroundings 20 thereof, in interaction with the learning system 40. The actuator 10 and surroundings 20 are subsequently also jointly referred to as the actuator system. A state of the actuator system is detected using a sensor 30, which can also be provided using a plurality of sensors. An output signal S of the sensor 30 is transmitted to the learning system 40. From said signal, the learning system 40 determines a control signal A which the actuator 10 receives.

(9) The actuator 10 can be, for example, a (partially) autonomous robot, for example a (partially) autonomous motor vehicle or a (partially) autonomous lawnmower. It can also be an actuation means of an actuating member of a motor vehicle, for example a throttle valve or a bypass actuator for idling control. It can also be a heating system or a part of a heating system, such as a valve actuator. The actuator 10 can in particular be larger systems, such as a combustion engine or an (optionally hybridized) drive train of a motor vehicle, for example, or also a braking system.

(10) The sensor 30 can be, for example, one or more video sensors and/or one or more radar sensors and/or one or more ultrasound sensors and/or one or more position sensors (for example GPS). Other sensors are also conceivable, for example a temperature sensor.

(11) In another embodiment the actuator 10 can be a manufacturing robot, and the sensor can be an optical sensor 30, for example, which detects the properties of manufactured articles of the manufacturing robot.

(12) The learning system 40 receives the output signal S from the sensor in an optional receiving unit 50 which converts the output signal S into a control variable x (alternatively, the output signal S can also be directly adopted as the control variable x). The control variable x can be a portion or further processing of the output signal S, for example. The control variable x is supplied to a controller 60, in which a control strategy π is implemented.

(13) Parameters θ which are supplied to the controller 60 are stored in a parameter memory 70. The parameters θ parameterize the control strategy π. The parameters θ can be a singular parameter or a plurality of parameters.

(14) A block 90 supplies the predefinable target variable xd to the controller 60. The block 90 can generate the predefinable target variable xd, for example depending on a sensor signal that is predefined to the block 90. It is also possible that the block 90 reads out the target variable xd from a dedicated storage region in which the variable is stored.

(15) Depending on the control strategy π(θ) (and therefore depending on the parameters θ) of the target variable xd and the control variable x, the controller 60 generates a manipulated variable u. This manipulated variable can for example be determined depending on a difference x-xd between the control variable x and the target variable xd.

(16) The controller 60 transmits the manipulated variable u to an output unit 80 which determines the control signal A from said variable. It is possible, for example, that the output unit first checks whether the manipulated variable u is in a predefinable value range. If this is the case, the control signal A is determined depending on the manipulated variable u, for example, by an associated control signal A being read out from a characteristic diagram depending on the manipulated variable u. This is the norm. If it is determined, however, that the manipulated variable u is not in the predefinable value range, the control signal A can therefore be designed such that it switches the actuator A into a protected mode.

(17) The receiving unit 50 transmits the control variable x to a block 100. The controller 60 also transmits the corresponding manipulated variable u to the block 100. The block 100 stores the time series of the control variable x, which is received at a sequence of time points, and of the relevant corresponding manipulated variable u. The block 100 can then adapt model parameters Λ, σ.sub.n, σ.sub.f of the model g depending on said time series. The model parameters Λ, σ.sub.n, σ.sub.f are supplied to a block 110 which, for example, stores said parameters in a dedicated storage region. This is described below in FIG. 4, step 1030.

(18) The learning system 40 comprises, in one embodiment, a computer 41 which has a machine-readable storage medium 42 on which a computer program is stored that, when carried out by the computer 41, prompts the computer to carry out the described functions of the learning system 40. In the embodiment, the computer 41 comprises a GPU 43.

(19) The model g can be used to optimize the parameters θ of the control strategy π. This is illustrated schematically in FIG. 2.

(20) The block 120 transmits the model parameters Λ, σ.sub.n, σ.sub.f to a block 140 and a block 150. A block 130 determines a noise variance τ.sub.∈, and a maximum partitioning depth Lmax (for example by these values being predefined and being read out from dedicated storage regions in the memory), and transmits them to the block 140. The parameter memory 70 transmits parameters θ to the block 140, and the block 90 transmits the target value xd to the block 140.

(21) The block 140 determines support points ξ.sub.i and associated support weights w.sub.i from said values. One embodiment of the algorithm of said determination is illustrated in FIG. 6. The determined support points ξ.sub.i and associated support weights w.sub.i are conveyed to the Block 150.

(22) The block 150 determines new parameters θ* from said points and weights. This is described in FIG. 4, step 1050. The new parameters θ* are conveyed to the parameter memory 70, where the values of the parameters θ are replaced with the respective corresponding values of the new parameters θ*.

(23) The blocks shown in FIG. 2 can be part of the learning system 40, and there, as described in conjunction with FIG. 1, the blocks can be implemented as part of a computer program and stored on the machine-readable storage medium 42.

(24) FIG. 3 illustrates the interaction of the actuator control system 45 with the actuator 10. The structure of the actuator control system 45 and the interaction thereof with the actuator 10 and the sensor 30 is largely the same as the structure of the learning system 40, for which reason only the differences are described here. In contrast to the learning system 40, the actuator control system 45 does not have a block 100 and also does not have a block 110. The transmission of variables to the block 100 is therefore not applicable. Parameters θ which have been determined using the method according to the invention, as illustrated in FIG. 4 for example, are stored in the parameter memory 70 of the actuator control system 45.

(25) FIG. 4 illustrates an embodiment of the method according to the invention. Firstly (1000), the parameters θ are set at initial values. In this case the parameters θ can be randomly initialized, but they can also be fixedly predefined.

(26) The controller 60 then (1010) generates manipulated variables u depending on the control strategy π(θ), as described in FIG. 1, using which variables the actuator 10 is controlled, as described in FIG. 1. The actuator 10 interacts with the sensor 30 via the surroundings 20, of which sensor the sensor signal S is directly or indirectly received as a control variable x by the controller 60.

(27) The block 100 receives and aggregates (1020) the time series of the manipulated variable u and control variable x which together in each case form a pair z comprising a control variable x and a manipulated variable x, z=(x.sup.1, . . . , x.sup.D, u.sup.1 . . . u.sup.F).sup.T.

(28) In this case D is the dimensionality of the control variable x, and F is the dimensionality of the manipulated variable u, i.e., x∈ custom character .sup.D, u∈.sup.F.

(29) Depending on this state trajectory, a Gaussian process g is then (1030) adapted such that between successive time points t, t+1 the following applies:
x.sub.t+1=x.sub.t+g(x.sub.t,u.sub.t). (1)
In this case
u.sub.t=π.sub.θ(x.sub.t). (1′)

(30) A covariance function k of the Gaussian process g is, for example, given by
k(z,w)=σ.sub.f.sup.2 exp(−½(z−w).sup.TΛ.sup.−1(z−w)). (2)

(31) The parameter σ.sub.f.sup.2 is in this case a signal variance, and Λ=diag(l.sub.1.sup.2 . . . l.sub.D+f.sup.2) is a collection of squared length scales l.sub.1.sup.2 . . . l.sub.D+F.sup.2 for each of the D+F input dimensions.

(32) A covariance matrix K is defined by
K(Z,Z).sub.i,j=k(z.sup.i,z.sup.j). (3)
The Gaussian process g is then characterized by two functions: by an average value p and a variance Var which are given by
μ(z.sub.*)=k(z.sub.*,Z)(K(Z,Z)+σ.sub.n.sup.2I).sup.−1y, (4)
Var(z.sub.*)=k(z.sub.*,z.sub.*)−k(z.sub.*,Z)(K(Z,Z)+σ.sub.n.sup.2I).sup.−1k(Z,z.sub.*). (5)
y is in this case given in the usual manner by y.sup.i=f(z.sup.i)+∈.sup.i, with white noise being ∈.sup.i.

(33) The parameters Λ, σ.sub.n, σ.sub.f are then adapted to the pair (z.sup.i,y.sup.i) in the known manner, by a logarithmic marginal likelihood-function being maximized.

(34) Support points ξ.sub.i and associated support weights w.sub.i are then (1040) determined (as described in FIG. 6 for example). The initial vector a.sub.0 which has N components is, for example, initialized to a randomly selected value and normalized to a length of 1.

(35) A new optimal parameter θ* is then (1050) determined (as described in FIG. 5 for example).

(36) The new optimal parameter θ* which is determined in this manner at least approximately solves the equation

(37) $\begin{matrix} θ^{*} = \max_{θ} R_{π}^{\infty} (θ) = \max_{θ} \int r (x) p_{*, θ} (x) dx . & (6) \end{matrix}$

(38) In this case p.sub.*,θ denotes a stationary probability distribution toward which the system (illustrated in FIG. 1) of continued use of the control strategy π.sub.θ converges, r(x) denotes a local cost function which can be defined by a characteristic diagram or a mathematic function, for example.

(39) The solution of equation (6) requires a solution to the equation
p.sub.*,θ(x.sub.t+1)=∫.sub.p(x.sub.t+1|x.sub.t,π.sub.θ(x.sub.t))p.sub.*,θ(x.sub.t)dx.sub.t. (7)

(40) Due to the form of the integral kernel, this equation cannot be solved in a closed form.

(41) The solution of this equation therefore has to be achieved by numeric approximation methods. This requires a sufficient precision be achieved without becoming computationally intensive. As a result, the method described in FIG. 5 corresponds to a numeric quadrature with the support points ξ.sub.i and associated support weights w.sub.i using
P.sub.*,θ(x.sub.t+1)≈Σ.sub.i=1.sup.Nw.sub.i.Math.p(x.sub.t+1|x.sub.t=ξ.sub.i,π(x.sub.t=ξ.sub.i))p.sub.*,θ(ξ.sub.i) (8)
and surprisingly achieves this aim.

(42) The parameter θ is then (1060) replaced with the new parameter θ*.

(43) It is then (1070) optionally checked whether the method of determining the parameter θ has converged. If this is not the case (“n”) there is a jump back to step 1010. If this is the case (“j”), however, optimal parameters θ have been found and the method is completed (1080). The method can naturally also be completed after a single iteration.

(44) FIG. 5 illustrates the method for determining the new optimal parameter θ* according to a possible embodiment.

(45) First (1500), basic functions ϕ.sub.i(x) are determined as ϕ.sub.i=g(ξ.sub.i, π.sub.θ(ξ.sub.i)) for each value of the index variables i=1 . . . N by means of the Gaussian process g and the support points ξ.sub.i. Matrix entries Φ.sub.i,j=ϕ.sub.j(ξ.sub.i) are then determined for all index variables i,j=1 . . . N. That is to say, the matrix entries Φ.sub.i,j together form a transition matrix (Φ), each matrix entry Φ.sub.i,j characterizing each probability given during Gaussian process g, such that the control variable x transitions from state x=ξ.sub.j into state x=ξ.sub.i.

(46) Now (1510) the matrix
M=diag(w)Φ (8)
is determined by
diag(w).sub.i,j=w.sub.iδ.sub.i,j.

(47) In this case the columns can also be normalized by replacing the entries of the matrix M.sub.i,j with M.sub.i,j/Σ.sub.k=1.sup.N w.sub.kϕ.sub.j(ξ.sub.i).

(48) Proceeding from the initial vector α.sub.0 the weight vectors α.sub.1, α.sub.2 . . . are then (1520) iteratively generated using
α.sub.t+1=Mα.sub.t, (9)
and are generated until the weight vectors which are generated in this manner converge, i.e., meet a predefinable convergence criteria, for example ∥α.sub.t+1−α.sub.t∥<ε∈ for a fixedly predefinable value ∈. The weight vector α.sub.t+1 which was last generated is the dominant eigenvector α.sub.θ.sup.∞ of the matrix M defined in the equation (8).

(49) It has been specifically recognized that the matrix M is positive and stochastic (“stochastic” in this context meaning that the elements of each line summate to the value one), and that according to the Perron-Frobenius theorem exactly one eigenvector exists for a largest possible eigenvalue λ=1, such that the described method (in terms of numerical precision) always clearly converges.

(50) The dominant eigenvector α.sub.θ.sup.∞ therefore characterizes by means of p.sub.*,θ(x)≈Φ.sup.Tα.sub.θ.sup.∞, i.e., the eigenvector characterizes the representation of the stationary probability distribution p.sub.*,θ by means of the basic functions ϕ.sub.i(x).

(51) As the dominant eigenvector of the positive matrix M, α.sub.θ.sup.∞ can be differentiated by means of the parameters θ which parameterize the matrix M. A partial derivative

(52) $\frac{\partial α_{θ}^{\infty}}{\partial θ}$
is therefore now (1540) estimated. This can, for example, be achieved by determining the corresponding dominant eigenvector α.sub.θ.sub.Δ.sup.∞ for various parameters θ.sub.Δ in the region of the parameter θ by means of steps (1500)-(1520), and estimating the partial derivative using differences, for example according to

(53) $\frac{\partial α_{θ}^{\infty}}{\partial θ} = \frac{α_{θ_{Δ}}^{\infty} - α_{θ}^{\infty}}{θ - θ_{Δ}}$

(54) The initial vector α.sub.0 can be set equal to the dominant eigenvector α.sub.θ.sup.∞ before step (1540), optionally in one step (1530), in order to improve convergence.

(55) A gradient ascent method is then (1550) preferably used in order to vary θ in the direction of the parameter θ in the direction of a maximum value of R.sub.π.sup.∞(θ) according to formula (6), using the determined estimated value of the partial derivative

(56) $\frac{\partial α_{θ}^{\infty}}{\partial θ}$
of the dominant eigenvector α.sub.θ.sup.∞. This is preferably carried out by means of the approximate equation
R.sub.π.sup.∞(θ)≈Σ.sub.i=1.sup.Nα.sub.i,θ.sup.∞ custom character .sub.x˜ϕ.sub.i[r(x)], (10)
where α.sub.i,θ.sup.∞ denotes the components of the dominant eigenvector α.sub.θ.sup.∞.

(57) It is then (1560) checked whether the method for determining the parameter θ has converged, for example, by checking whether the change of the parameter θ in step (1550) has fallen below a predefinable threshold value. If this is the case the method (1570) is completed. Otherwise, a new iteration begins in step (1500).

(58) FIG. 6 illustrates the method for determining the support points ξ.sub.i and the associated support weights w.sub.i.

(59) A partitioning of a state space X of all possible values of the control variable is first initialized. The partitioning can initially be selected as the trivial partitioning of the state space X, for example, i.e., the state space X is not divided, but is given by the whole state space X.

(60) A counter s is initialized to the value s=1. The support points ξ.sub.i are determined for the state space X according to a numerical quadrature rule (such as Kepler's barrel rule, the trapezoidal rule, Simpson's rule or the Gaussian quadrature, for example), so too are the associated support weights w.sub.i.

(61) It is then (2010) checked whether the counters has reached the maximum partitioning depth Lmax. If this is the case the method is completed (2100) in the step.

(62) Otherwise, the target value xd is assumed as value τ.sub.0′ for the control variable x, and a temporal evolution τ.sub.0′, τ.sub.1′ . . . τ.sub.T′ is determined (2020) using formula (1), (1′).

(63) A further value τ.sub.0 is then optionally also randomly selected for the control variable x according to the initial probability distribution p(x.sub.0), and a further temporal evolution τ.sub.0, τ.sub.1, . . . τ.sub.T is determined (2030) analogously to step 2020 using formula (1), (1′).

(64) A further counter l is then (2040) initialized to the value l=1, and it is checked (2050) whether the further counter l has reached the value of the counter s. If this is the case, step 2060 follows, in which the counter s is incremented by one, and there is a jump back to step 2010. If this is not the case, the variable ρ.sub.l(τ) is determined (2070) which characterizes whether the density of the support points ξ.sub.i is adequate. The density can be determined, for example, to

(65) $\begin{matrix} ρ_{l} (τ) = \frac{vol (X_{l})}{N_{l} \min_{τ_{l} τ_{l}^{'} \in X_{l}} Var (τ_{l})} . & (11) \end{matrix}$

(66) In this case X.sub.l is the I-th partial volume element of the partitioning of the state space X, Vol(X.sub.l) is the volume thereof and N.sub.l is the number of support points ξ.sub.i therein. It is then checked (2070) whether this variable is ρ.sub.l(τ)<1, other threshold values than the value “1” also being possible.

(67) If this is the case (“j”), a partial volume element X.sub.l is split (2080) into a plurality of smaller partial volume elements, for example, by the partial volume element X.sub.l being halved along one or along all of the dimensions thereof. The support points ξ.sub.i, which are associated with the partial volume element X.sub.l and associated support weights w.sub.i are then removed, and support points ξ.sub.i and associated support weights w.sub.i are added for each of the smaller, newly generated partial volume elements. Step 2090 then follows, in which the further counter l is incremented by one. There is then a jump back to step 2050.

(68) If the check in step 2070 reveals that the requirement has not been met (“n”), step 2090 follows immediately.

Method and device for setting at least one parameter of an actuator control system and actuator control system

Assignee

Inventors

Cpc classification

Classification Explorer

G06N7/01

PHYSICS

Classification Explorer

G05B13/0265

PHYSICS

Classification Explorer

G06N20/00

PHYSICS

Classification Explorer

G05B13/042

PHYSICS

International classification

Classification Explorer

G05B13/04

PHYSICS

Classification Explorer

G06N20/00

PHYSICS

Classification Explorer

G06N7/00

PHYSICS

Classification Explorer

G05B13/02

PHYSICS

Abstract

Claims

Description