METHOD FOR ASCERTAINING A TIME CHARACTERISTIC OF A MEASURED VARIABLE, PREDICTION SYSTEM, ACTUATOR CONTROL SYSTEM, METHOD FOR TRAINING THE ACTUATOR CONTROL SYSTEM, TRAINING SYSTEM, COMPUTER PROGRAM, AND MACHINE-READABLE STORAGE MEDIUM
20210011447 · 2021-01-14
Inventors
- The Duy Nguyen-Tuong (Leonberg, DE)
- Christian Daniel (Leonberg, DE)
- Sebastian Trimpe (Tübingen, DE)
- Martin Schiegg (Korntal-Münchingen, DE)
- Andreas Doerr (Stuttgart, DE)
Cpc classification
F02D41/28
MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
G06N7/01
PHYSICS
H10N30/802
ELECTRICITY
International classification
Abstract
A method for ascertaining a time characteristic of a measured variable adjustable by an actuator, wherein a time characteristic of a control variable is applied to the actuator, wherein the ascertaining is effected by means of a Gaussian process state model of the behavior of the actuator, wherein the time characteristic of the measured variable of the actuator is ascertained on the basis of a parameterizable family of functions, wherein in the parameterizable family of functions a time dependency of a later latent state, in particular ascertained using a transfer function, of the actuator on an earlier latent state of the actuator and an earlier control variable of the actuator is the same as the applicable dependency of the Gaussian process state model.
Claims
1-18. (canceled)
19. Method for ascertaining a time characteristic of a measured variable adjustable by an actuator, wherein a time characteristic of a control variable is applied to the actuator, wherein the ascertaining is effected by means of a Gaussian process state model of the behavior of the actuator, wherein the time characteristic of the measured variable of the actuator is ascertained on the basis of a parameterizable family of functions, wherein in the parameterizable family of functions a time dependency of a later latent state, in particular ascertained using a transfer function, of the actuator on an earlier latent state of the actuator and an earlier control variable of the actuator is the same as the applicable dependency of the Gaussian process state model.
20. Method according to claim 19, wherein the parameterizable family of functions is set up to approximate an a-posteriori probability distribution of at least time characteristics of at least the latent state of the actuator and the transfer function as well as possible, given a time characteristic of the measured variable for an ascertained training data record.
21. Method according to claim 20, wherein the dependency of the parameterizable family of functions on an initial latent state of the actuator is given by a factor, which depends on this initial latent state, this factor being given by a parameterizable variation function, in particular by a normal distribution.
22. Method according to claim 19, wherein the Gaussian process state model is a sparse Gaussian process state model with inducing Gaussian process targets at pre-determinable pseudo input points.
23. Method according to claim 22, wherein the dependency of the parameterizable family of functions on Gaussian process targets is given by a second factor, wherein this second factor is a second parameterizable variation function, which has the respective Gaussian process target as an argument.
24. Method according to claim 23, wherein the second parameterizable variation function is given by a normal distribution function.
25. Method according to claim 24, wherein a predicted time trend of the latent state of the actuator is ascertained by recursively ascertaining a sample of the predicted time trend of the latent state at a later point in time from the parameterizable variation function of the predicted latent state at the later point in time given the predicted latent state at an earlier point in time, the time characteristic of the measured variable of the actuator being chosen on the basis of the predicted time trend of the latent state.
26. Method according to claim 25, wherein an initial latent state of the predicted time trend of the latent state is predetermined, in particular given randomly.
27. Method according to claim 25, wherein an initial latent state from the parameterizable variation function is ascertained by a distribution function of the initial state given the ascertained training data record, the characterizing parameters of which can be trained by backpropagation.
28. Method according to claim 19, wherein an optimal control variable is ascertained on the basis of a characteristic of the measured variable ascertained by means of the method according to claim 19.
29. Method according to claim 28, wherein the actuator is controlled by means of the optimal control variable.
30. Method for ascertaining at least one optimal parameter which characterizes a control strategy of an actuator control system, which is set up to control an actuator with a control variable on the basis of this control strategy, wherein, when using the control strategy, the time characteristic of a measured variable that is adjustable by the actuator is ascertained by means of the method according to claim 19, and on the basis of the characteristic of the measured variable thus ascertained, the at least one optimal parameter is ascertained.
31. Prediction system set up to carry out the method according to claim 19.
32. Actuator control system which is set up to control an actuator by means of the method according to claim 29.
33. Method for training the actuator control system according to claim 32, wherein parameters of the parameterizable families of functions and/or deterministic parameters are adapted such that they approximate an a-posteriori probability distribution of at least time characteristics of at least the latent state of the actuator and the transfer function as well as possible, given a time characteristic of the measured variable for an ascertained training data record.
Description
[0045] Hereinafter, embodiments of the invention will be explained in more detail with reference to the accompanying drawings.
DESCRIPTION OF THE EXEMPLARY EMBODIMENTS
[0052] The actuator 10 can be, for example, a (partially) autonomous robot, for example a (partially) autonomous motor vehicle, or a robot that combats weeds in a field in a targeted manner, for example by tearing them out or spraying them with suitable chemicals.
[0053] The sensor 30 may be, for example, one or a plurality of video sensors and/or one or a plurality of radar sensors and/or one or a plurality of ultrasonic sensors and/or one or a plurality of position sensors (for example GPS). Alternatively or additionally, the sensor 30 can also include an information system that ascertains information about a state of the actuator system, such as a weather information system that determines a current or future state of the weather in the environment 20.
[0054] In another exemplary embodiment, the actuator 10 may be a manufacturing robot, and the sensor 30 may then be, for example, an optical sensor that detects characteristics of manufacturing products of the manufacturing robot.
[0055] In a further exemplary embodiment, the actuator 10 can be a release system which is set up to enable or not to enable the activity of a device. The sensor 30 can be, for example, an optical sensor (for example for recording image or video data) which is set up to detect a face. Depending on the sequence of control signals A, the actuator 10 ascertains an enable signal which can be used to enable the device. The device can, for example, be a physical or logical access control. Depending on the value of the control signal A, the access control can then grant or deny access.
[0056] In a further exemplary embodiment, the actuator 10 can be part of a building control system, for example a controller of a heating system.
[0057] The actuator control system 40 receives the sequence of sensor signals S from the sensor in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of measured variables y (alternatively, the sensor signal S can also be adopted directly as the measured variable y). The measured variable y can be, for example, an excerpt from or a further processing of the sensor signal S. The measured variable y is fed to a machine learning system 60, the functioning of which is explained in more detail below.
[0058] The machine learning system 60 ascertains a control variable u from the measured variables y. This ascertainment is made on the basis of parameters θ, which are stored in a parameter memory P. These parameters can in particular include parameters θ_opt which characterize a control strategy of the actuator control system 40. The parameter memory P can be integrated in the actuator control system 40, but it can also be spatially separate from the actuator control system 40 and connected to it, for example, via a network connection. The control variable u is fed to an optional forming unit 80, which ascertains therefrom control signals A that are fed to the actuator 10.
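The signal chain just described (sensor signal S → measured variable y → control variable u → control signal A) can be sketched as follows; all four callables are hypothetical placeholders for the units 50, 60 and 80 described above, not implementations from this document:

```python
def control_step(S, to_measured, ml_system, to_control_signal):
    """One pass through the actuator control system 40:
    receiving unit 50 -> machine learning system 60 -> forming unit 80."""
    y = to_measured(S)        # optional receiving unit 50: S -> y
    u = ml_system(y)          # machine learning system 60: y -> u
    A = to_control_signal(u)  # optional forming unit 80: u -> A
    return A
```

Both the receiving unit 50 and the forming unit 80 are optional in the embodiment above, so either placeholder may simply be the identity.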
[0059] In further embodiments, the actuator control system 40 comprises the actuator 10.
[0060] In further preferred embodiments, the actuator control system 40 comprises a single or a plurality of processors 45 and at least one machine-readable storage medium 46 having stored thereon instructions which, when executed on the processors 45, cause the actuator control system 40 to execute the method for controlling the actuator 10.
[0062] A measurement apparatus 150 ascertains a training data record y_mess, which comprises both control variables u and associated measured variables y. These can be ascertained, for example, by actuating the actuator 10 by means of the control variables u and ascertaining the resulting measured variables y, and can be stored on a data carrier (not shown), which can be part of the measurement apparatus 150. For the ascertainment of the training data record y_mess, the measurement apparatus 150 can read it out from the data carrier.
[0063] The training data record y_mess is fed to a training block 190 which, on the basis of the parameters θ stored in the parameter memory P, ascertains optimized parameters by means of the training method described below.
[0064] Alternatively or additionally, by means of the method described below, at least one optimal parameter θ_opt characterizing the control strategy can be ascertained.
[0065] In other preferred embodiments, the training system 140 comprises one or a plurality of processors 200 and at least one machine-readable storage medium 210 having stored thereon instructions which, when executed on the processors 200, cause the training system 140 to carry out the method for training the machine learning system 60.
[0067] Subsequently (1100), these time characteristics are optionally divided into sub-characteristics of predeterminable length T_sub.
[0068] Thereafter, for the characteristic or for one or a plurality of the sub-characteristics, in each case one or a plurality of associated trajectories of predicted latent variables x̃ is ascertained. For this purpose, an initial predicted latent state x̃_1 is first ascertained, for example drawn from the parameterized distribution function q(x_1). The parameters of this distribution function are then preferably also part of the parameters to be optimized, since errors caused by the initial latent state might not decay sufficiently quickly, particularly in the case of short time characteristics. Thereafter, depending on the length of the time characteristic, the further predicted latent states x̃_t are ascertained recursively.
[0069] Subsequently, samples x̃_t are drawn from the distribution function q(x_t). For this purpose, samples ε_{t,d} ~ N(0,1) are drawn, and the predicted latent states for all dimensions d and all points in time t > 1 are then obtained from

x̃_{t+1,d} = μ_d(x̂_t) + √(σ_d²(x̂_t) + σ_{x,d}²) · ε_{t,d}   (11)

[0070] Herein, x̂_t = (x̃_t, u_t).
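A minimal sketch of this recursive sampling, assuming generic callables mean_fn and var_fn for the posterior mean μ_d and variance σ_d² of the transfer function (both hypothetical placeholders, as is noise_var for the process noise σ_{x,d}²):

```python
import numpy as np

def sample_latent_trajectory(x1, controls, mean_fn, var_fn, noise_var, rng):
    """Recursively draw one predicted latent trajectory per equation (11)."""
    T, D = len(controls), x1.shape[0]
    traj = np.empty((T + 1, D))
    traj[0] = x1                                  # initial latent state x~_1
    for t in range(T):
        x_hat = np.concatenate([traj[t], controls[t]])  # x^_t = (x~_t, u_t)
        eps = rng.standard_normal(D)                    # eps_{t,d} ~ N(0, 1)
        traj[t + 1] = mean_fn(x_hat) + np.sqrt(var_fn(x_hat) + noise_var) * eps
    return traj
```

With the posterior variance and process noise set to zero, the recursion becomes a deterministic rollout of the mean dynamics, which is convenient for checking the recursion itself.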
[0071] Thereafter, the parameters should be adjusted in such a way that the Kullback-Leibler divergence KL(q(x_{1:T}, f_{2:T}, z) ∥ p(x_{1:T}, f_{2:T}, z | y_{1:T})) is minimized, the length T naturally being replaced by T_sub in the case of subdivision into sub-characteristics. As usual with the evidence lower bound (in short: ELBO), minimizing this KL divergence is equivalent to maximizing the ELBO given in equation (13).
[0072] Therefore (1200), the ELBO is now estimated according to equation (13). For this purpose, the first term on the right-hand side of equation (13) is estimated as an average over the predicted time characteristics of the latent variable x,
[0073] wherein N designates the number of predicted time characteristics of the latent variable x generated in step 1100.
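Since equation (13) itself is not reproduced here, the following sketch only illustrates the Monte Carlo averaging of such a data-fit term under an assumed Gaussian observation model; obs_fn and obs_var are hypothetical stand-ins for the model's observation mapping and observation noise variance:

```python
import numpy as np

def data_fit_term(y, traj_samples, obs_fn, obs_var):
    """Average the Gaussian log-likelihood of the observed characteristic y
    over the N predicted latent trajectories generated in step 1100."""
    total = 0.0
    for traj in traj_samples:
        mean = obs_fn(traj)  # predicted measured variable for this sample
        total += np.sum(-0.5 * np.log(2 * np.pi * obs_var)
                        - 0.5 * (y - mean) ** 2 / obs_var)
    return total / len(traj_samples)  # Monte Carlo average over N samples
```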
[0074] On the basis of this stochastic ascertainment of the ELBO, gradients of the function ℒ_GP-SSM are ascertained, and a stochastic gradient ascent on the parameters is carried out in order to determine new parameters (1300).
[0075] Now (1400), it is checked whether a convergence criterion is satisfied. If this is the case (1500), the new parameters replace the parameters stored in the parameter memory P, and the method ends. Otherwise, the process branches back to step 1150.
[0077] For this purpose, a time characteristic of the control variable u is first generated (2010). Subsequently (2020), the initial latent state x̃_1 is ascertained, for example chosen randomly or set equal to 0. This is possible because, for stable transient dynamics, transient effects caused by an incorrectly chosen initial latent state x̃_1 decay exponentially. The time characteristic of the latent state x̃_{1:T} is then ascertained.
[0078] Thereafter (2030), a cost function is determined depending on the ascertained characteristic of the measured variable y_{1:T}.
[0079] Subsequently (2040), it is checked whether a convergence criterion of the cost function has been reached. If this is the case (2050), the currently ascertained characteristic of the control variable u is adopted as the optimal control variable u_opt, and the actuator 10 is controlled according to the characteristic of the optimal control variable u_opt.
[0080] If this is not the case (2060), the characteristic of the control variable u is varied. For example, a gradient descent method can be used, the gradients being ascertainable numerically, for example with evaluation steps analogous to step (2020), or being predeterminable analytically. Subsequently, with the changed characteristic of the control variable u, the process branches back to step 2020.
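The loop of steps 2010 to 2060 can be sketched as a numerical gradient descent over the control characteristic; predict_y stands in for the model rollout of step 2020 and cost for the cost function of step 2030 (both hypothetical placeholders):

```python
import numpy as np

def optimize_control(u, predict_y, cost, lr=0.1, h=1e-5, tol=1e-8, max_iter=200):
    """Vary the control characteristic u until the cost converges (2040),
    then return it as the optimal control variable u_opt (2050)."""
    prev = np.inf
    for _ in range(max_iter):
        grad = np.empty_like(u)
        for i in range(u.size):                  # numerical gradient (2060)
            e = np.zeros_like(u)
            e.flat[i] = h
            grad.flat[i] = (cost(predict_y(u + e)) - cost(predict_y(u - e))) / (2 * h)
        u = u - lr * grad                        # vary u, branch back to 2020
        c = cost(predict_y(u))
        if abs(prev - c) < tol:                  # convergence criterion (2040)
            return u
        prev = c
    return u
```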
[0082] Subsequently (3010), an initial value of the control variable u and an initial value of the parameter θ are generated. An initial value of the latent state x is also ascertained analogously to step (2020). Subsequently (3020), by means of equations (5) and (11) and the current control strategy characterized by the parameter θ, a time characteristic of the latent state x, of the measured variable y, and of the control variable u is ascertained. Thereafter, a cost function is ascertained (3030) depending on the ascertained characteristic of the measured variable.
[0083] Subsequently (3040), it is checked whether a convergence criterion of the cost function has been reached. If this is the case (3050), the currently ascertained parameter θ is adopted as the optimal parameter θ_opt.
[0084] If this is not the case (3060), the parameter θ is varied. For example, a gradient descent method can be used. Subsequently, with the changed parameter θ, the process branches back to step 3020.
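Analogously, steps 3010 to 3060 can be sketched with a simple linear control strategy u_t = −θ·x_t (a hypothetical policy form chosen only for illustration; transition stands in for the dynamics of equations (5) and (11)):

```python
def rollout_cost(theta, x1, transition, T, target=0.0):
    """Steps 3020/3030: closed-loop rollout under the control strategy
    characterized by theta, followed by a quadratic cost on the state."""
    x, c = x1, 0.0
    for _ in range(T):
        u = -theta * x        # control strategy characterized by theta
        x = transition(x, u)  # stands in for the model dynamics
        c += (x - target) ** 2
    return c

def optimize_policy(theta, cost_fn, lr=0.1, h=1e-5, tol=1e-10, max_iter=300):
    """Steps 3040-3060: vary theta by numerical gradient descent until the
    cost converges; theta is then adopted as theta_opt (3050)."""
    prev = float("inf")
    for _ in range(max_iter):
        grad = (cost_fn(theta + h) - cost_fn(theta - h)) / (2 * h)
        theta -= lr * grad    # step 3060: vary theta, branch back to 3020
        c = cost_fn(theta)
        if abs(prev - c) < tol:
            return theta
        prev = c
    return theta
```

For the toy dynamics x_{t+1} = x_t + u_t, the cost is minimized at θ = 1, which drives the state to the target in one step.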
[0085] Of course, all methods can be implemented not only in software, but also in hardware, or in a mixed form of hardware and software.