Method for the safe training of a dynamic model

Abstract

A computer-implemented method for the safe, active training of a computer-aided model for modeling time series of a physical system using Gaussian processes, including the steps of establishing a safety threshold value α; initializing by implementing safe initial curves as input values on the system, creating an initial regression model and an initial safety model; repeatedly carrying out the steps of updating the regression model; updating the safety model; determining a new curve section; implementing the determined new curve section on the physical system and measuring output variables; incorporating the new output values in the regression model and the safety model until N passes have been carried out; and outputting the regression model and the safety model.

Claims

1. A computer-implemented method for the safe, active training of a time series-based model of a physical system using Gaussian processes, the method comprising: establishing a safety threshold value; initializing by implementing safe initial curves as input values on the system, creating an initial regression model and an initial safety model; repeatedly carrying out for a plurality of passes, the following steps: updating the regression model; updating the safety model; determining a new curve section; implementing a determined new curve on the physical system and measuring new output values; and incorporating the new output values in the regression model and the safety model; and updating and outputting the regression model and the safety model.

2. The method as recited in claim 1, wherein the new curve section is determined in such a way that information gain is maximized while meeting safety criteria of the safety model.

3. The method as recited in claim 1, wherein the new curve section is determined while using a covariance matrix.

4. The method as recited in claim 1, wherein the new curve section is determined via optimization under a secondary condition.

5. The method as recited in claim 1, wherein the system is a test bench for internal combustion engines, a robot controller, a physical sensor, or a chemical reaction.

6. The method as recited in claim 1, wherein the safety model includes values of the system, the values of the system including: pressure values, and/or exhaust values, and/or consumption values, and/or power values, and/or joint position values, and/or movement limits, and/or sensor values, and/or temperature values, and/or acidity values.

7. A non-transitory computer-readable memory medium on which is stored a computer-aided model of a physical system which is trained using Gaussian processes, the model being trained by: establishing a safety threshold value; initializing by implementing safe initial curves as input values on the system, creating an initial regression model and an initial safety model; repeatedly carrying out, for a plurality of passes, the following steps: updating the regression model; updating the safety model; determining a new curve section; implementing a determined new curve on the physical system and measuring new output values; and incorporating the new output values in the regression model and the safety model; and updating and outputting the regression model and the safety model.

8. A non-transitory machine-readable memory medium on which is stored a computer program for active training of a time series-based model of a physical system using Gaussian processes, the computer program, when executed by a computer, causing the computer to perform: establishing a safety threshold value; initializing by implementing safe initial curves as input values on the system, creating an initial regression model and an initial safety model; repeatedly carrying out, for a plurality of passes, the following steps: updating the regression model; updating the safety model; determining a new curve section; implementing a determined new curve on the physical system and measuring new output values; and incorporating the new output values in the regression model and the safety model; and updating and outputting the regression model and the safety model.

9. A device configured to actively train a time series-based model of a physical system using Gaussian processes, the device comprising a processor configured to: establish a safety threshold value; initialize by implementing safe initial curves as input values on the system, creating an initial regression model and an initial safety model; repeatedly carry out, for a plurality of passes, the following: updating the regression model; updating the safety model; determining a new curve section; implementing a determined new curve on the physical system and measuring new output values; and incorporating the new output values in the regression model and the safety model; and update and output the regression model and the safety model.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Exemplary embodiments of the present invention are shown in the figures and are explained in greater detail below.

(2) FIG. 1 shows the sequence 100 of the method for the safe training of a computer-aided model.

(3) FIG. 2 shows the sequence 200 of the method for the safe training of a computer-aided model.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

(4) The approximation of an unknown function $f: X \subset \mathbb{R}^d \to Y \subset \mathbb{R}$ is to be achieved. In the case of time series models such as, for example, the well-known non-linear exogenous (NX) model, the input space is made up of discretized values, the so-called manipulated variables.

(5) For the point in time k, $x_k = (u_k, u_{k-1}, \ldots, u_{k-\tilde{d}+1})$ is applicable, with $(u_k)_k$, $u_k \in \pi \subset \mathbb{R}^{\bar{d}}$, representing the discretized curve. In this case, $\bar{d}$ is the dimension of the input space $\pi$ of the system, $\tilde{d}$ the dimension of the NX structure, and $d = \bar{d} \cdot \tilde{d}$ the dimension of $X$.
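The NX input structure described above may be sketched as follows. This is a minimal illustration only; the function name `nx_inputs` and the variable names `d_tilde` (dimension of the NX structure) and `d_bar` (dimension of the input space π) are assumptions made for this example and do not appear in the original.

```python
def nx_inputs(u, d_tilde):
    """Stack the current and the d_tilde - 1 previous manipulated variables.

    u is a list of input vectors u_k (each a list of length d_bar).
    Returns one regressor x_k of length d_bar * d_tilde for every k that
    has a full history, i.e. k = d_tilde - 1, ..., len(u) - 1.
    """
    xs = []
    for k in range(d_tilde - 1, len(u)):
        x_k = []
        for lag in range(d_tilde):  # u_k, u_{k-1}, ..., u_{k-d_tilde+1}
            x_k.extend(u[k - lag])
        xs.append(x_k)
    return xs


# Two manipulated variables (d_bar = 2) and a history length d_tilde = 3.
u = [[0.0, 1.0], [0.1, 1.1], [0.2, 1.2], [0.3, 1.3]]
x = nx_inputs(u, d_tilde=3)
```

Each resulting point has dimension 2 · 3 = 6, and the first point with a complete history corresponds to k = 2.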

(6) The elements $u_k$ are measured by the physical system and need not be equidistant. For reasons of simpler notation, equidistant sampling is assumed by way of example. In general, the control curves are continuous signals and may be explicitly controlled.

(7) Data in the form of n successive curve sections $D_n^f = \{\tau_i, \rho_i\}_{i=1}^{n}$ are observed in the learning environment of the model. The input curve $\tau_i$ is a matrix made up of m input points of dimension d, i.e., $\tau_i = (x_1^i, \ldots, x_m^i) \in \mathbb{R}^{d \times m}$. The output curve $\rho_i$ includes the m corresponding output measurements, i.e., $\rho_i = (y_1^i, \ldots, y_m^i) \in \mathbb{R}^m$.

(8) The next curve section $\tau_{n+1}$ to be input as stimulation into the physical system is to be determined in such a way that the information gain of $D_{n+1}^f$ with respect to the modeling of f is increased while safety conditions are taken into consideration.

(9) As an approximation of the function f, a Gaussian process (hereinafter abbreviated as GP) is used, which is defined by its mean value function $\mu(x)$ and its covariance function $k(x_i, x_j)$, i.e., $f(x_i) \sim \mathcal{GP}(\mu(x_i), k(x_i, x_j))$.

(10) Assuming noisy observations of the input and output curves, the joint distribution according to the Gaussian process is given as $p(P_n \mid T_n) = \mathcal{N}(P_n \mid 0, K_n + \sigma^2 I)$, $P_n \in \mathbb{R}^{n \cdot m}$ being a vector that concatenates the output curves, and $T_n \in \mathbb{R}^{n \cdot m \times d}$ being a matrix containing the input curves. The covariance matrix is represented by $K_n \in \mathbb{R}^{n \cdot m \times n \cdot m}$. As an illustration, a Gaussian kernel is used as the covariance function, i.e., $k(x_i, x_j) = \sigma_f^2 \exp(-\tfrac{1}{2}(x_i - x_j)^T \Lambda_f^2 (x_i - x_j))$, which is parameterized by $\theta_f = (\sigma_f^2, \Lambda_f^2)$. Furthermore, a zero vector $0 \in \mathbb{R}^{n \cdot m}$ is assumed as the mean value, $I$ is the $n \cdot m$-dimensional identity matrix, and $\sigma^2$ is the output noise variance.
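By way of illustration, the Gaussian kernel and the noisy covariance matrix may be sketched in Python with NumPy. The sketch assumes a diagonal matrix Λ_f² (passed as the vector `lam2`) and stores each curve with one input point per row, i.e., transposed relative to the d × m convention used above; the function name `gaussian_kernel` and all numerical values are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, sigma_f2, lam2):
    """k(a, b) = sigma_f2 * exp(-0.5 * (a - b)^T diag(lam2) (a - b)).

    A has shape (p, d), B has shape (q, d); lam2 holds the diagonal of
    Lambda_f^2 (assumed diagonal in this sketch). Returns a (p, q) matrix.
    """
    diff = A[:, None, :] - B[None, :, :]                  # shape (p, q, d)
    quad = np.einsum("pqd,d,pqd->pq", diff, lam2, diff)   # weighted squared distance
    return sigma_f2 * np.exp(-0.5 * quad)


# n = 2 curve sections with m = 2 points each and d = 1.
curves = [np.array([[0.0], [0.1]]), np.array([[0.2], [0.3]])]
T_n = np.vstack(curves)                                   # shape (n*m, d)
K_n = gaussian_kernel(T_n, T_n, sigma_f2=1.0, lam2=np.array([4.0]))
K_noisy = K_n + 0.01 * np.eye(len(T_n))                   # K_n + sigma^2 I
```

Because all n·m points are stacked into one matrix, the resulting covariance matrix also contains the correlations between points of different curve sections.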

(11) Under the given joint distribution, the predictive distribution $p(\rho^* \mid \tau^*, D_n^f)$ may be expressed for a new curve section $\tau^*$ as
$$p(\rho^* \mid \tau^*, D_n^f) = \mathcal{N}(\rho^* \mid \mu(\tau^*), \Sigma(\tau^*)), \tag{A}$$
with
$$\mu(\tau^*) = k(\tau^*, T_n)^T (K_n + \sigma^2 I)^{-1} P_n, \tag{B}$$
$$\Sigma(\tau^*) = k^{**}(\tau^*, \tau^*) - k(\tau^*, T_n)^T (K_n + \sigma^2 I)^{-1} k(\tau^*, T_n), \tag{C}$$

(12) $k^{**} \in \mathbb{R}^{m \times m}$ being a matrix with $k^{**}_{ij} = k(x_i, x_j)$. The matrix $k(\tau^*, T_n)$ further contains the kernel evaluations between the m points of $\tau^*$ and the $n \cdot m$ points of the previous n input curves. Since the covariance matrix is fully populated, input points x are correlated both within a curve section and across different curves, and these correlations are utilized for planning the next curve. Since the matrix $K_n + \sigma^2 I$ potentially has a high dimension $n \cdot m$, its inversion may be time-consuming, so that GP approximation techniques may be used.
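Equations B and C may be evaluated, for example, via a Cholesky factorization instead of an explicit inversion. The following is a minimal sketch under simplifying assumptions: an isotropic Gaussian kernel (scalar `lam2`), a zero mean, and input points stored one per row. The names `kernel` and `gp_predict` and the toy data are hypothetical.

```python
import numpy as np


def kernel(A, B, sigma_f2=1.0, lam2=1.0):
    """Isotropic Gaussian kernel between the rows of A (p, d) and B (q, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f2 * np.exp(-0.5 * lam2 * d2)


def gp_predict(T_n, P_n, tau_star, noise_var, **kw):
    """Mean (m,) and covariance (m, m) of p(rho* | tau*, D_n), equations B and C."""
    K = kernel(T_n, T_n, **kw) + noise_var * np.eye(len(T_n))
    k_star = kernel(T_n, tau_star, **kw)              # shape (n*m, m)
    L = np.linalg.cholesky(K)                         # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, P_n))
    v = np.linalg.solve(L, k_star)                    # v^T v = k*^T K^{-1} k*
    mu = k_star.T @ alpha
    Sigma = kernel(tau_star, tau_star, **kw) - v.T @ v
    return mu, Sigma


# Toy data: six observed points of a sine curve, prediction for a 2-point section.
T_n = np.linspace(0.0, 1.0, 6)[:, None]
P_n = np.sin(2.0 * np.pi * T_n[:, 0])
tau_star = np.array([[0.25], [0.5]])
mu, Sigma = gp_predict(T_n, P_n, tau_star, noise_var=1e-4, lam2=25.0)
```

The returned covariance is a full m × m matrix, so the correlations between the points of the new section are retained for the later planning step.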

(13) The safety status of the system is described by an unknown function g, with $g: X \subset \mathbb{R}^d \to Z \subset \mathbb{R}$, which assigns to each input point x a safety value z that serves as a safety indicator. The values z are determined using pieces of information from the system and are configured in such a way that for all values of z that are greater than or equal to zero, the corresponding input point x is considered safe.

(14) Such safety values z are a function of the respective system and may, as explained above, embody system-dependent values for safe or unsafe pressure values, exhaust values, consumption values, power values, joint position values, movement limits, sensor values, temperature values, acidity values or the like.

(15) The values of z are generally continuous and indicate the distance of a given point x from the unknown safety limit in the input space. Thus, the safety level for a curve τ may be ascertained with the given function g or with an estimate thereof. A curve is classified as safe if the probability that its safety values z are greater than or equal to zero is sufficiently high, i.e., $\int_{z_1, \ldots, z_m \geq 0} p(z_1, \ldots, z_m \mid \tau)\, dz_1 \cdots dz_m > 1 - \alpha$, with $\alpha \in [0, 1]$ representing the threshold value for considering τ unsafe. With the given data $D_n^g = \{\tau_i, \zeta_i\}_{i=1}^{n}$, with $\zeta_i = (z_1^i, \ldots, z_m^i) \in \mathbb{R}^m$, a GP may be used in order to approximate the function g. The predictive distribution $p(\zeta^* \mid \tau^*, D_n^g)$ for a given curve section $\tau^*$ is calculated as
$$p(\zeta^* \mid \tau^*, D_n^g) = \mathcal{N}(\zeta^* \mid \mu_g(\tau^*), \Sigma_g(\tau^*)), \tag{D}$$

(16) $\mu_g(\tau^*)$ and $\Sigma_g(\tau^*)$ being the corresponding mean value and covariance. The variables $\mu_g$ and $\Sigma_g$ are calculated as shown in equations B and C, however, with $Z_n \in \mathbb{R}^{n \cdot m}$ as the target vector, which concatenates all $\zeta_i$. By using a GP for approximating g, the safety condition $\xi(\tau)$ may be calculated for a curve τ as follows:
$$\xi(\tau) = \int_{z_1, \ldots, z_m \geq 0} \mathcal{N}(\zeta \mid \mu_g(\tau), \Sigma_g(\tau))\, dz_1 \cdots dz_m > 1 - \alpha. \tag{E}$$

(17) The integral in ξ(τ) is generally difficult to solve analytically, and thus an approximation may be used such as, for example, a Monte Carlo simulation or expectation propagation.
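A Monte Carlo approximation of the safety condition may, for example, be sketched as follows. The sketch draws samples from the predictive distribution of the safety model and counts the fraction in which all m safety values are non-negative. The function names, the sample count, and the fixed seed are assumptions made for this illustration.

```python
import numpy as np


def safety_probability(mu_g, Sigma_g, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(z_1 >= 0, ..., z_m >= 0) under N(mu_g, Sigma_g)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mu_g, Sigma_g, size=n_samples)  # shape (n_samples, m)
    return float(np.mean(np.all(z >= 0.0, axis=1)))


def is_safe(mu_g, Sigma_g, alpha, **kw):
    """Check the condition xi(tau) > 1 - alpha for one curve section."""
    return safety_probability(mu_g, Sigma_g, **kw) > 1.0 - alpha


# A section well inside the safe region and one on the estimated safety limit.
p_inside = safety_probability(np.array([3.0, 3.0]), 0.25 * np.eye(2))
p_limit = safety_probability(np.array([0.0, 0.0]), np.eye(2))
```

For two independent safety values with zero mean, the exact probability is 0.25, which the estimate `p_limit` approaches as the number of samples grows.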

(18) For the efficient selection of an optimal τ, the curve must be parameterized in a suitable manner. One possibility is to perform the parameterization directly in the input space. The curve may be parameterized, for example, as ramp functions or step functions.
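A ramp parameterization may, for example, look as follows. In this sketch, which is only one conceivable choice, the parameter of the curve section is taken to be the end point of the ramp, and the section starts at the end point of the previous section; the function name `ramp_section` is hypothetical.

```python
def ramp_section(x_start, x_end, m):
    """Return m input points moving linearly from x_start (exclusive) to x_end (inclusive)."""
    d = len(x_start)
    return [
        [x_start[j] + (i / m) * (x_end[j] - x_start[j]) for j in range(d)]
        for i in range(1, m + 1)
    ]


# Ramp with m = 4 points in a 2-dimensional input space.
section = ramp_section([0.0, 0.0], [1.0, 2.0], m=4)
```

With such a parameterization, the search over whole curve sections reduces to a search over the end point.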

(19) For a curve parameterization with a predictive distribution according to equation A and safety conditions according to equation E, the next curve section $\tau_{n+1}(\eta^*)$ may be obtained by solving the following optimization problem with secondary conditions (constraints):
$$\eta^* = \arg\max_{\eta \in \pi} J(\Sigma(\eta)) \tag{F}$$
$$\text{subject to } \xi(\eta) > 1 - \alpha, \tag{G}$$

(20) $\eta \in \pi$ representing the curve parameterization and J an optimality criterion.
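The optimization according to equations F and G may be illustrated, in strongly simplified form, by a search over a finite set of candidate parameterizations. The text above does not prescribe a particular solver; the candidate search, the function name `select_next_section`, and the toy functions used below are assumptions made for this sketch.

```python
def select_next_section(candidates, information, safety, alpha):
    """Return the candidate eta with the largest information(eta) among those
    with safety(eta) > 1 - alpha, or None if no candidate is safe."""
    best_eta, best_val = None, float("-inf")
    for eta in candidates:
        if safety(eta) <= 1.0 - alpha:   # secondary condition (equation G) violated
            continue
        val = information(eta)           # J(Sigma(eta)) in equation F
        if val > best_val:
            best_eta, best_val = eta, val
    return best_eta


# Toy stand-ins: information grows with eta, estimated safety decreases with eta.
candidates = [0.0, 0.5, 1.0, 1.5]
eta_star = select_next_section(
    candidates,
    information=lambda eta: eta,
    safety=lambda eta: 1.0 - 0.4 * eta,
    alpha=0.3,
)
```

In this toy case only the candidates 0.0 and 0.5 satisfy the secondary condition, so 0.5 is selected although larger values would be more informative.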

(21) According to equation F, the predictive covariance Σ from equation A is used for the exploration. This is a covariance matrix, which is mapped onto a real number by the optimality criterion J, as shown in equation F. Different optimality criteria may be used for J depending on the system. Thus, J may, for example, be the determinant, i.e., equivalent to maximizing the volume of the predictive confidence ellipsoid of the multivariate normal distribution; the trace, i.e., equivalent to maximizing the average predictive variance; or the maximum eigenvalue, i.e., equivalent to maximizing the largest axis of the predictive confidence ellipsoid. However, other optimality criteria are also conceivable.
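The three optimality criteria named above may be computed, for example, as follows. The function name `criterion` and the keyword values are hypothetical; the sketch assumes a symmetric covariance matrix.

```python
import numpy as np


def criterion(Sigma, kind="det"):
    """Map a predictive covariance matrix onto a real number."""
    if kind == "det":        # volume of the confidence ellipsoid
        return float(np.linalg.det(Sigma))
    if kind == "trace":      # sum of the predictive variances
        return float(np.trace(Sigma))
    if kind == "max_eig":    # largest axis of the confidence ellipsoid
        return float(np.linalg.eigvalsh(Sigma)[-1])  # eigvalsh sorts ascending
    raise ValueError(f"unknown criterion: {kind}")


Sigma = np.diag([1.0, 4.0])
values = {kind: criterion(Sigma, kind) for kind in ("det", "trace", "max_eig")}
```

For high-dimensional sections, the logarithm of the determinant (for example via `np.linalg.slogdet`) may be numerically preferable, since it yields the same maximizer for positive definite matrices.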

(22) Referring to FIG. 1, an initialization is carried out in step 120 by implementing $n_0$ safe initial curves. Regression and safety processes (Gaussian processes) are also created in the process. The initial curves are located in a small safe area in which the exploration begins. This small safe area is selected in advance on the basis of prior knowledge about the system. The initial curves yield the data $D_0^{f,g} = \{\tau_i, \rho_i, \zeta_i\}_{i=1}^{n}$ with $n = n_0$.

(23) A new curve section $\tau_{n+1}$ is subsequently determined in step 160 according to equations F and G by optimizing η.

(24) The determined curve section $\tau_{n+1}$ is subsequently applied as input to the physical system in step 170, and the corresponding $\rho_{n+1}$ and $\zeta_{n+1}$ are measured on the physical system over this section.

(25) The regression and safety processes are then updated in step 150. Regression model f is updated according to equation A using $D_n^f = \{\tau_i, \rho_i\}_{i=1}^{n}$, and safety model g is updated according to equation D using $D_n^g = \{\tau_i, \zeta_i\}_{i=1}^{n}$.

(26) Steps 150 through 170 are passed through N times in this case. In addition to a previously established number of passes, an automatic ending after reaching a termination condition is also possible. This could be based, for example, on training errors (an error metric between model prediction and system response) or on the remaining potential information gain (if the optimality criterion becomes too small).

(27) Subsequently, the regression model and the safety model are output in step 190.

(28) Referring to FIG. 2, an implementation 200 of the method is explained. In step 210, a safety threshold value is established by selecting a value between 0 and 1 for α. An initialization is then carried out in step 220 by implementing $n_0$ safe initial curves, $D_0^{f,g} = \{\tau_i, \rho_i, \zeta_i\}_{i=1}^{n}$ with $n = n_0$. In this way, regression and safety processes (Gaussian processes) are also created. The initial curves are located in a small safe area in which the exploration begins. This small safe area is selected in advance on the basis of prior knowledge about the system.

(29) The part of the method encompassing steps 240 through 280 is carried out N times, k being the control variable, i.e., indicating the instantaneous pass. As in FIG. 1, an automatic ending after reaching a termination condition is conceivable, in addition to a previously established number of passes. This could, for example, be based on training errors (an error metric between model prediction and system response) or on the remaining potential information gain (if the optimality criterion becomes too small).

(30) Regression model f according to equation A is first updated in step 240 using $D_k^f = \{\tau_i, \rho_i\}_{i=1}^{n}$. In step 250, safety model g according to equation D is updated using $D_k^g = \{\tau_i, \zeta_i\}_{i=1}^{n}$. In the first pass of steps 240 through 280, steps 240 and 250 may be omitted.

(31) A new curve section $\tau_{n+1}$ is subsequently determined in step 260 according to equations F and G by optimizing η.

(32) The determined curve section $\tau_{n+1}$ is subsequently applied as input to the physical system in step 270, and the corresponding $\rho_{n+1}$ and $\zeta_{n+1}$ are measured on the physical system over this section.

(33) The input and output curves processed in the preceding steps are then added to $D_{k-1}^f$ and $D_{k-1}^g$ in step 280.

(34) After the repetitions of steps 240 through 280 are completed, step 290 follows, in which the regression and safety model are updated and output.
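The pass structure of FIG. 2 may be summarized by the following skeleton. It is a sketch only: model updating and planning (steps 240 through 260) are hidden behind a caller-supplied function, and the toy planner and toy system used here are hypothetical stand-ins that do not use Gaussian processes. The early exit on `None` illustrates the optional termination condition.

```python
def safe_active_learning(initial_data, plan_section, run_on_system, n_passes):
    """Skeleton of steps 240 through 280, repeated for at most n_passes passes.

    plan_section(data) -> next input curve section, or None to stop early
    run_on_system(tau) -> (rho, zeta) measured on the system
    """
    data = list(initial_data)                 # D_0 from the initialization (step 220)
    for _ in range(n_passes):
        tau_next = plan_section(data)         # steps 240-260: update models, plan
        if tau_next is None:                  # optional termination condition
            break
        rho, zeta = run_on_system(tau_next)   # step 270: apply and measure
        data.append((tau_next, rho, zeta))    # step 280: extend the data sets
    return data                               # basis for the final update (step 290)


def toy_system(tau):
    """Hypothetical 1-D system: output x^2, safety value 1 - |x|."""
    return [x * x for x in tau], [1.0 - abs(x) for x in tau]


def toy_planner(data):
    """Hypothetical planner: continue from the last end point in steps of 0.1."""
    start = data[-1][0][-1]
    end = start + 0.1
    if end > 0.8:                             # stay clear of the safety limit at 1.0
        return None
    return [start + (end - start) * i / 2 for i in (1, 2)]


initial = [([0.0, 0.0], *toy_system([0.0, 0.0]))]
result = safe_active_learning(initial, toy_planner, toy_system, n_passes=5)
```

After five passes, `result` holds the initial section plus five newly measured sections, each with its output and safety values.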

(35) The incremental updating of the GP model for new data, i.e., steps 150, respectively 240 and 250, may be efficiently carried out, for example, by a rank-one update of the matrix. An NX structure is shown by way of example here in combination with the GP model for time series modeling; however, the general non-linear auto-regressive exogenous case may also be used, i.e., a GP with NARX input structure, where $x_k = (y_k, y_{k-1}, \ldots, y_{k-q}, u_k, u_{k-1}, \ldots, u_{k-d})$. In this case, the predictive mean value of $p(\rho \mid \tau, D_n^f)$, for example, may be used as a substitute for $y_k$ for optimizing and for planning the next curve section. The input stimulation of the system is nevertheless carried out via the manipulated variable $u_k$ in the case of NARX.
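One way to avoid refactorizing the full covariance matrix when a new curve section is added is to extend an existing Cholesky factor by the new block. This block extension is a related incremental technique chosen for this sketch, not necessarily the update intended above; the function name `extend_cholesky` and the random test matrix are assumptions.

```python
import numpy as np


def extend_cholesky(L, B, C):
    """Extend the Cholesky factor L of A to the factor of [[A, B], [B^T, C]].

    L: (n_old, n_old) lower-triangular factor of the old matrix A
    B: (n_old, m_new) covariances between old and new points
    C: (m_new, m_new) covariances among the new points (including noise)
    """
    n_old, m_new = B.shape
    S = np.linalg.solve(L, B)                   # L S = B
    L_c = np.linalg.cholesky(C - S.T @ S)       # factor of the Schur complement
    L_new = np.zeros((n_old + m_new, n_old + m_new))
    L_new[:n_old, :n_old] = L
    L_new[n_old:, :n_old] = S.T
    L_new[n_old:, n_old:] = L_c
    return L_new


# Random symmetric positive definite matrix split into an old and a new part.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
full = M @ M.T + 5.0 * np.eye(5)
L_old = np.linalg.cholesky(full[:3, :3])
L_ext = extend_cholesky(L_old, full[:3, 3:], full[3:, 3:])
```

The extended factor agrees with a factorization of the full matrix, while only the rows belonging to the new points have to be computed.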

CPC classification

G06N20/10, G06N7/01, G06F18/217, G06F17/16, G06N5/01, G06F18/256 (section: PHYSICS)

International classification

G06K9/62, G06F17/16, G06V10/75 (section: PHYSICS)
