Method and device for providing a sparse Gaussian process model for calculation in an engine control unit
09934197 · 2018-04-03
Assignee
Inventors
- Ernst Kloppenburg (Ditzingen, DE)
- Michael Hanselmann (Korntal, DE)
- Heiner Markert (Stuttgart, DE)
- Felix Streichert (Yokohama, JP)
CPC classification
G06F17/16
PHYSICS
International classification
Abstract
A method for determining a sparse Gaussian process model to be carried out in a solely hardware-based model calculation unit includes: providing supporting point data points, a parameter vector based thereon, and corresponding hyperparameters; determining or providing virtual supporting point data points for the sparse Gaussian process model; and determining a parameter vector Q_y* for the sparse Gaussian process model with the aid of a Cholesky decomposition of a covariance matrix K_M between the virtual supporting point data points and as a function of the supporting point data points, the parameter vector based thereon, and the corresponding hyperparameters, which define the sparse Gaussian process model.
Claims
1. A method for determining a sparse Gaussian process model, comprising: providing an internal combustion engine in a test stand; recording supporting point data points from the test stand into a storage unit, the supporting point data points describing a behavior of the internal combustion engine; performing, by a hardware-based model calculation unit, the following: providing the supporting point data points, a parameter vector based on the supporting point data points, and corresponding hyperparameters; determining or providing virtual supporting point data points for the sparse Gaussian process model, wherein the virtual supporting point data points are artificially generated supporting point data points; determining a parameter vector Q_y* for the sparse Gaussian process model by performing a Cholesky decomposition of a covariance matrix K_M between the virtual supporting point data points and as a function of the supporting point data points, the parameter vector based thereon, and the corresponding hyperparameters; ascertaining a covariance matrix K_N, the covariance matrix K_M, and a covariance matrix K_MN, wherein K_N is ascertained by determining a covariance between the supporting point data points, wherein K_M is ascertained by determining a covariance between the virtual supporting point data points, and wherein K_MN is ascertained by determining a covariance between the supporting point data points and the virtual supporting point data points; determining a diagonal matrix Λ from K_MN^T K_M^-1 K_MN, using the Cholesky decomposition of the covariance matrix K_M between the virtual supporting point data points; determining the parameter vector Q_y* based on the hyperparameters for the sparse Gaussian process model based on the diagonal matrix; determining an intermediate variable Q_M = K_M + K_MN (Λ + σ_n^2 I)^-1 K_MN^T from the diagonal matrix while using a Cholesky decomposition of the covariance matrix K_MN between the conventional and the virtual supporting point data points; and determining the parameter vector Q_y* based on the hyperparameters for the sparse Gaussian process model based on the intermediate variable Q_M, wherein the sparse Gaussian process model is determined based on the parameter vector Q_y*; storing the virtual supporting point data points and the hyperparameters for the sparse Gaussian process model on a control unit of a further internal combustion engine; and operating the further internal combustion engine, by the control unit, using the sparse Gaussian process model, the virtual supporting point data points, and the hyperparameters for the sparse Gaussian process model.
2. The method of claim 1, wherein the parameter vector Q_y* for the sparse Gaussian process model is ascertained as Q_y* = L_M^-T L_M^-1 K_MN (Λ + σ_n^2 I)^-1 Y, L_M corresponding to the Cholesky decomposition of the intermediate variable Q_M.
3. The method of claim 1, wherein a jitter is applied to the intermediate variable Q_M for the sparse Gaussian process model.
4. A non-transitory computer readable medium having a computer program, which is executable by a processor, comprising: a program code arrangement having program code for determining a sparse Gaussian process model, which is performed in a hardware-based model calculation unit, by performing the following: reading supporting point data points from a storage unit, the supporting point data points describing a behavior of an internal combustion engine in a test stand, the test stand providing the supporting point data points to the storage unit; providing the supporting point data points, a parameter vector based on the supporting point data points, and corresponding hyperparameters; determining or providing virtual supporting point data points for the sparse Gaussian process model, wherein the virtual supporting point data points are artificially generated supporting point data points; determining a parameter vector Q_y* for the sparse Gaussian process model by performing a Cholesky decomposition of a covariance matrix K_M between the virtual supporting point data points and as a function of the supporting point data points, the parameter vector based on the supporting point data points, and the corresponding hyperparameters; ascertaining a covariance matrix K_N, the covariance matrix K_M, and a covariance matrix K_MN, wherein K_N is ascertained by determining a covariance between the supporting point data points, wherein K_M is ascertained by determining a covariance between the virtual supporting point data points, and wherein K_MN is ascertained by determining a covariance between the supporting point data points and the virtual supporting point data points; determining a diagonal matrix Λ from K_MN^T K_M^-1 K_MN, using the Cholesky decomposition of the covariance matrix K_M between the virtual supporting point data points; and determining the parameter vector Q_y* based on the hyperparameters for the sparse Gaussian process model based on the diagonal matrix; determining an intermediate variable Q_M = K_M + K_MN (Λ + σ_n^2 I)^-1 K_MN^T from the diagonal matrix while using a Cholesky decomposition of the covariance matrix K_MN between the conventional and the virtual supporting point data points; and determining the parameter vector Q_y* based on the hyperparameters for the sparse Gaussian process model based on the intermediate variable Q_M, wherein the sparse Gaussian process model is determined based on the parameter vector Q_y*; storing the virtual supporting point data points and the hyperparameters for the sparse Gaussian process model on a control unit of a further internal combustion engine; and operating the further internal combustion engine, by the control unit, using the sparse Gaussian process model, the virtual supporting point data points, and the hyperparameters for the sparse Gaussian process model.
5. The non-transitory computer-readable medium as recited in claim 4, wherein the parameter vector Q_y* for the sparse Gaussian process model is ascertained as Q_y* = L_M^-T L_M^-1 K_MN (Λ + σ_n^2 I)^-1 Y, L_M corresponding to the Cholesky decomposition of the intermediate variable Q_M.
6. The non-transitory computer-readable medium as recited in claim 4, wherein a jitter is applied to the intermediate variable Q_M for the sparse Gaussian process model.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(4) The use of nonparametric, data-based function models is based on a Bayesian regression method. The fundamentals of Bayesian regression are described, for example, in C. E. Rasmussen et al., Gaussian Processes for Machine Learning, MIT Press 2006. Bayesian regression is a data-based method which is based on a model. To prepare the model, measuring points of training data and associated output data of an output variable to be modeled are required. The preparation of the model is carried out based on the use of supporting point data, which entirely or partially correspond to the training data or are generated therefrom. Furthermore, abstract hyperparameters are determined, which parameterize the space of the model functions and effectively weight the influence of the individual measuring points of the training data on the later model prediction.
(5) The abstract hyperparameters are determined by an optimization method. One possibility for such an optimization method is an optimization of the marginal likelihood p(Y|H, X). The marginal likelihood p(Y|H, X) describes the plausibility of the model parameters H, given the measured y values of the training data, represented as vector Y, and the x values of the training data, represented as matrix X. In model training, p(Y|H, X) is maximized by searching for suitable hyperparameters which result in a curve of the model function, determined by the hyperparameters and the training data, that reproduces the training data as precisely as possible. To simplify the calculation, the logarithm of p(Y|H, X) is maximized, since taking the logarithm does not change the position of the maximum of the plausibility function.
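For illustration only, the following is a minimal Python/NumPy sketch of how log p(Y|H, X) could be evaluated for a squared exponential kernel, following the standard formulation in Rasmussen et al.; the function names, the NumPy implementation, and the grouping of the hyperparameters H into length scales, an amplitude factor, and a noise standard deviation are assumptions made for this example, not material taken from the patent.

```python
import numpy as np

def se_kernel(X1, X2, length_scales, sigma_f):
    # Squared exponential kernel with one length scale per input dimension.
    d = (X1[:, None, :] - X2[None, :, :]) / length_scales
    return sigma_f**2 * np.exp(-0.5 * np.sum(d**2, axis=-1))

def log_marginal_likelihood(X, Y, length_scales, sigma_f, sigma_n):
    # log p(Y | H, X) for hyperparameters H = (length_scales, sigma_f, sigma_n).
    N = X.shape[0]
    K = se_kernel(X, X, length_scales, sigma_f) + sigma_n**2 * np.eye(N)
    L = np.linalg.cholesky(K)                     # K = L L^T, L lower triangular
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return (-0.5 * Y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * N * np.log(2.0 * np.pi))
```

Model training would then maximize this quantity over the hyperparameters, for example with a generic numerical optimizer.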
(6) The calculation of the Gaussian process model takes place according to the calculation specification below. Input values x̃_d for a test point x (input variable vector) are first scaled and centered, specifically according to the following formula:
(7) x_d = (x̃_d - (m_x)_d) / (s_x)_d, d = 1, . . . , D.
(8) In this formula, m_x corresponds to the mean value function with respect to a mean value of the input values of the supporting point data, s_x corresponds to the variance of the input values of the supporting point data, and d corresponds to the index for dimension D of test point x.
(9) The following equation is obtained as the result of the preparation of the nonparametric, data-based function model:
(10) v = σ_f^2 Σ_{i=1..N} (Q_y)_i exp(-(1/2) Σ_{d=1..D} (x_d - x_{i,d})^2 / l_d^2)
(11) Model value v thus ascertained is scaled with the aid of an output scaling, specifically according to the following formula:
ṽ = v s_y + m_y.
(12) In this formula, v corresponds to a scaled model value (output value) at a scaled test point x (input variable vector of dimension D), ṽ corresponds to a (non-scaled) model value (output value) at a (non-scaled) test point (input variable vector of dimension D), x_i corresponds to a supporting point of the supporting point data, N corresponds to the number of the supporting points of the supporting point data, D corresponds to the dimension of the input data/training data/supporting point data space, and l_d and σ_f correspond to the hyperparameters from the model training, namely the length scale and the amplitude factor. Vector Q_y is a variable calculated from the hyperparameters and the training data. Furthermore, m_y corresponds to the mean value function with respect to a mean value of the output values of the supporting point data, and s_y corresponds to the variance of the output values of the supporting point data.
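As an illustration of the calculation specification described above (input scaling, squared exponential kernel sum over the N supporting points with parameter vector Q_y, output scaling), a minimal Python sketch follows; the function name, the NumPy implementation, and the exact exponent convention for the length scales l_d are assumptions, not the patent's reference implementation.

```python
import numpy as np

def evaluate_gp_model(x_tilde, X_s, Q_y, length_scales, sigma_f, m_x, s_x, m_y, s_y):
    # X_s: (N, D) scaled supporting point inputs, Q_y: (N,) parameter vector,
    # length_scales: (D,) hyperparameters l_d, sigma_f: amplitude factor,
    # m_x, s_x: input mean/scaling, m_y, s_y: output mean/scaling.
    x = (x_tilde - m_x) / s_x                           # scale and center the test point
    sq = np.sum(((X_s - x) / length_scales) ** 2, axis=1)
    k_x = sigma_f**2 * np.exp(-0.5 * sq)                # squared exponential kernel vector
    v = k_x @ Q_y                                       # scaled model value
    return v * s_y + m_y                                # output scaling
```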
(13) Modeling system 2 furthermore carries out a method for processing the ascertained or provided training data, to provide the data-based function model with the aid of hyperparameters and supporting point data, which represent a subset of the training data. In this way, a so-called sparse Gaussian process model is prepared.
(14) These supporting point data and hyperparameters are transferred into a control unit 4 and stored therein. Control unit 4 is connected to a physical system 3, for example, an internal combustion engine, which is operated with the aid of the data-based function model.
(16) Main computing unit 42, which is provided as a microcontroller, is configured to calculate function values of the provided data-based function model with the aid of a software-determined algorithm. To accelerate the calculation and to relieve microcontroller 42, it is provided that model calculation unit 43 is used. Model calculation unit 43 is implemented completely in hardware and is capable only of carrying out a certain calculation specification, which is essentially based on repeated calculations of an addition function, a multiplication function, and an exponential function. Model calculation unit 43 is thus essentially hardwired and, in contrast to main computing unit 42, is accordingly not configured to execute a software code.
(17) Alternatively, an approach is possible in which model calculation unit 43 provides a restricted, highly specialized command set for calculating the data-based function model. However, in no specific embodiment is a processor provided in model calculation unit 43. This enables a resource-optimized implementation of such a model calculation unit 43, or an area-optimized design in an integrated construction.
(18) In such a control unit 4, in addition to conventional Gaussian process models, sparse Gaussian process models may also be calculated. Since, in the case of sparse Gaussian process models, the quantity of supporting point data is significantly smaller than in conventional Gaussian process models, the storage capacity that must be provided in storage unit 41 for storing the supporting point data may be reduced, or multiple data sets of training data for multiple sparse Gaussian process models may be stored in storage unit 41.
(19) A conventional Gaussian process regression uses the given supporting point data points/training data points for calculating the covariance matrix. The model prediction is obtained in the form
(20) y = k_x^T Q_y,
where k_x, Q_y ∈ R^N applies. It is to be emphasized that k_x^T represents the covariance vector between query point x and the supporting point data points. This is calculated by the squared exponential kernel as
(21) (k_x)_i = σ_f^2 exp(-(1/2) Σ_{d=1..D} (x_d - x_{i,d})^2 / l_d^2), i = 1, . . . , N.
(22) In the case of sparse Gaussian process models, the essential idea is to replace the given supporting point data, which are formed by the real supporting point data points, with virtual, i.e., artificially generated, supporting point data points. M artificial points are generated and suitably positioned by an optimizer in such a way that the model prediction of a sparse Gaussian process model using the virtual supporting point data points corresponds as exactly as possible to that of the Gaussian process model using the original supporting point data points. By integrating out the artificial y data, it is only necessary to optimize the M virtual X positions.
(23) The model prediction for the sparse Gaussian process model results as
y = k_*^T Q_M^-1 K_MN (Λ + σ_n^2 I)^-1 Y,
where k_* ∈ R^M, Q_M ∈ R^(M×M), K_MN ∈ R^(M×N), Λ is an N-dimensional diagonal matrix, and Y is the vector of the y values of the original supporting point data points.
(24) In the formula, k_*^T is again the covariance vector, but calculated this time between query point x and the M-dimensional vector of virtual supporting point data points. Parameter vector Q_y* is defined as
Q_y* = Q_M^-1 K_MN (Λ + σ_n^2 I)^-1 Y.
(25) The same form as for the prediction of conventional Gaussian processes is thus obtained:
(26) y = k_*^T Q_y*,
if suitable values are used for parameter vector Q_y* and the virtual supporting point data points.
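For illustration, a sketch of this prediction form is given below, assuming that the M virtual supporting point data points, the hyperparameters, and the parameter vector Q_y* have already been determined and stored (for example, on control unit 4); the function name and the NumPy implementation are assumptions.

```python
import numpy as np

def evaluate_sparse_gp(x, X_virtual, Q_y_star, length_scales, sigma_f):
    # Sparse GP prediction y = k_*^T Q_y* over the M virtual supporting points.
    sq = np.sum(((X_virtual - x) / length_scales) ** 2, axis=1)
    k_star = sigma_f**2 * np.exp(-0.5 * sq)     # covariance vector to the virtual points
    return k_star @ Q_y_star
```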
(28) The essential step in the preparation of the sparse Gaussian process model, in the form of the algorithm available on model calculation unit 43, is the calculation of vector Q_y*. Multiple possibilities exist for this purpose; before they are described, however, some notation must first be introduced.
(29) TABLE-US-00001
  Variable   Meaning
  N          Number of the supporting point data points in the original Gaussian process model
  M          Number of the virtual supporting point data points
(30) In addition, the Cholesky method for solving equation systems having a positive definite square matrix is also used.
(31) For a positive definite square matrix K, a Cholesky decomposition L may always be calculated, so that L is an upper triangular matrix with the property
L^T L = K.
(32) To solve the equation system K·x = v, the expression K^-1 v must be calculated. This is carried out with the aid of the Cholesky decomposition as follows:
K^-1 v = (L^T L)^-1 v = L^-1 L^-T v.
(33) In this formula, L^-T = (L^-1)^T denotes the transposed inverse. Since L is an upper triangular matrix, the expression may be calculated by a forward substitution and a reverse substitution.
(34) Expressions of the form v^T K^-1 v for a positive definite matrix K and a vector v may be represented with the aid of the Cholesky decomposition as follows:
v^T K^-1 v = v^T (L L^T)^-1 v = (L^-1 v)^T (L^-1 v) = ||L^-1 v||_2^2.
(35) In conjunction with Gaussian processes, K is typically a covariance matrix and therefore square and positive semi-definite. For the positive definite case, the above equations may thus be used. If the matrix is only positive semi-definite, a jitter (for example, a value of 10^-6) is typically added to the diagonal of matrix K in order to obtain a positive definite matrix.
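A minimal sketch of such a Cholesky-based solve, including the jitter, might look as follows; it uses the lower-triangular convention of SciPy (K = L L^T) rather than the upper-triangular convention used in the text, and the jitter value of 10^-6 is only an example.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def cholesky_solve(K, v, jitter=1e-6):
    # Solve K x = v for a (semi-)positive definite K via Cholesky decomposition.
    K_j = K + jitter * np.eye(K.shape[0])          # add jitter to the diagonal
    L = cholesky(K_j, lower=True)                  # K_j = L L^T
    y = solve_triangular(L, v, lower=True)         # forward substitution: L y = v
    return solve_triangular(L.T, y, lower=False)   # reverse substitution: L^T x = y
```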
(36) Two methods for determining vector Q_y* will be explained hereafter.
(37) 1) Direct Method
(38) The direct evaluation of Q_y* = Q_M^-1 K_MN (Λ + σ_n^2 I)^-1 Y is one possible procedure. If possible, the Cholesky decomposition is used to avoid direct calculations of inverse matrices. The calculation of Q_y* is carried out according to the following steps.
(39) In step S1, matrices K_M, K_N, and K_MN are calculated.
(40) Subsequently, in step S2, Λ = diag(K_MN^T K_M^-1 K_MN) is determined using the Cholesky decomposition of K_M (with a jitter).
(41) In step S3, (Λ + σ_n^2 I)^-1 is calculated, Λ + σ_n^2 I corresponding to a diagonal matrix which may simply be inverted element by element.
(42) In step S4, Q_M = K_M + K_MN (Λ + σ_n^2 I)^-1 K_MN^T is determined.
(43) In step S5, the Cholesky decomposition L_M = chol(Q_M) of Q_M is calculated. In this case, as in step S2, a jitter is added to Q_M. This corresponds to the procedure as if matrix K_M were provided with a jitter and then used for calculating Q_M.
(44) Q_y* = L_M^-T L_M^-1 K_MN (Λ + σ_n^2 I)^-1 Y then results, a forward substitution and a reverse substitution being necessary. Y is the vector of the y values of the original training data, i.e., the same y values as are used for the normal training of the Gaussian process model. (The reduction of the dimension takes place with the multiplication by K_MN from the left.)
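A possible offline implementation of steps S1 through S5 is sketched below. Here, kernel is assumed to be a two-argument covariance function (for example, the squared exponential kernel parameterized by l_d and σ_f), σ_n is assumed to denote the noise standard deviation, the lower-triangular Cholesky convention of SciPy is used, and the jitter value is only an example; this is a sketch of the described procedure, not the patent's reference implementation.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def direct_q_y_star(X, Y, X_virtual, kernel, sigma_n, jitter=1e-6):
    # Direct computation of Q_y* = Q_M^-1 K_MN (Lambda + sigma_n^2 I)^-1 Y.
    # S1: covariance matrices.
    K_M = kernel(X_virtual, X_virtual)                       # (M, M)
    K_MN = kernel(X_virtual, X)                              # (M, N)
    # S2: Lambda = diag(K_MN^T K_M^-1 K_MN) via the Cholesky decomposition of K_M (with jitter).
    L_KM = cholesky(K_M + jitter * np.eye(K_M.shape[0]), lower=True)
    W = solve_triangular(L_KM, K_MN, lower=True)             # W = L_KM^-1 K_MN
    lam = np.sum(W * W, axis=0)                              # diagonal entries of Lambda
    # S3: (Lambda + sigma_n^2 I)^-1, inverted element by element.
    d_inv = 1.0 / (lam + sigma_n**2)
    # S4: Q_M = K_M + K_MN (Lambda + sigma_n^2 I)^-1 K_MN^T.
    Q_M = K_M + (K_MN * d_inv) @ K_MN.T
    # S5: Cholesky decomposition of Q_M (again with jitter), then forward/reverse substitution.
    L_M = cholesky(Q_M + jitter * np.eye(Q_M.shape[0]), lower=True)
    rhs = K_MN @ (d_inv * Y)
    z = solve_triangular(L_M, rhs, lower=True)
    return solve_triangular(L_M.T, z, lower=False)           # Q_y*
```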
(45) 2) Matrix Factorization
(46) A second procedure includes the use of a matrix factorization.
(47) Firstly, new variables are introduced:
(48) L = chol(K_M)^T
(49) V_1 = L^-1 K_MN
(50) V_2 = σ_n V_1 (√(Λ + σ_n^2 I))^-1
(51) y_2 = σ_n (√(Λ + σ_n^2 I))^-1 Y
(52) L_m = chol(σ_n^2 I + V_2 V_2^T)^T
(53) l_st = L^-1 k_*
(54) l_mst = L_m^-1 l_st = L_m^-1 L^-1 k_*
(55) β = L_m^-1 (V_2 y_2)
(56) Since Λ is a diagonal matrix, √(Λ + σ_n^2 I) is the Cholesky decomposition of Λ + σ_n^2 I.
(57) Matrix Q_M may be represented as
(58) Q_M = L (I + V_1 (Λ + σ_n^2 I)^-1 V_1^T) L^T.
(59) Therefore, Q_M^-1 results as
Q_M^-1 = L^-T (I + V_1 (Λ + σ_n^2 I)^-1 V_1^T)^-1 L^-1.  (Formula 1)
(60) Under the consideration that Λ + σ_n^2 I is a diagonal matrix, it follows that
(61) I + V_1 (Λ + σ_n^2 I)^-1 V_1^T = σ_n^-2 (σ_n^2 I + V_2 V_2^T);
with formula 1, it results that
Q_M^-1 = σ_n^2 L^-T (σ_n^2 I + V_2 V_2^T)^-1 L^-1 = σ_n^2 L^-T (L_m L_m^T)^-1 L^-1.  (Formula 2)
(62) For the further derivation, the expression V_2 y_2 must still be considered. In the rearrangement, the fact is again utilized that Λ + σ_n^2 I is a diagonal matrix:
(63) V_2 y_2 = σ_n^2 V_1 (Λ + σ_n^2 I)^-1 Y = σ_n^2 L^-1 K_MN (Λ + σ_n^2 I)^-1 Y.  (Formula 3)
(64) The model prediction then results as
y = k_*^T Q_M^-1 K_MN (Λ + σ_n^2 I)^-1 Y.
(65) By inserting formula 2, the following formula results
y = k_*^T L^-T (L_m L_m^T)^-1 L^-1 σ_n^2 K_MN (Λ + σ_n^2 I)^-1 Y.
(66) By inserting formula 3, the following formulas result
(67) y = k_*^T L^-T L_m^-T L_m^-1 V_2 y_2 = (L_m^-1 L^-1 k_*)^T L_m^-1 (V_2 y_2) = l_mst^T β.  (Formula 4)
(68) In the model analysis, expression l_mst may be determined. β^T is calculated beforehand off-line and stored. To determine l_mst, two forward substitutions are to be calculated, which is relatively time-consuming and therefore not possible on model calculation unit 43.
(69) The only possibility for calculating this form of the model analysis using the process provided on model calculation unit 43 is according to Formula 4. With the proviso
Q_y* = L^-T L_m^-T L_m^-1 V_2 y_2,
the model prediction may be carried out according to the formula
(70) y = k_*^T Q_y*,
which is implemented on model calculation unit 43.
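The matrix factorization variant can be sketched in the same way. The definitions of V_2 and y_2 below reflect one reading of the variables introduced above (in particular the placement of the σ_n factors), which is an assumption; kernel is again a hypothetical two-argument covariance function, and the lower-triangular Cholesky convention of SciPy is used.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def factorized_q_y_star(X, Y, X_virtual, kernel, sigma_n, jitter=1e-6):
    # Q_y* via the factorization Q_y* = L^-T L_m^-T L_m^-1 (V_2 y_2).
    M = X_virtual.shape[0]
    K_M = kernel(X_virtual, X_virtual)                       # (M, M)
    K_MN = kernel(X_virtual, X)                              # (M, N)
    L = cholesky(K_M + jitter * np.eye(M), lower=True)       # K_M = L L^T
    V1 = solve_triangular(L, K_MN, lower=True)               # V_1 = L^-1 K_MN
    lam = np.sum(V1 * V1, axis=0)                            # diagonal entries of Lambda
    scale = 1.0 / np.sqrt(lam + sigma_n**2)                  # (Lambda + sigma_n^2 I)^(-1/2)
    V2 = sigma_n * V1 * scale                                # V_2
    y2 = sigma_n * scale * Y                                 # y_2
    L_m = cholesky(sigma_n**2 * np.eye(M) + V2 @ V2.T, lower=True)
    beta = solve_triangular(L_m, V2 @ y2, lower=True)        # beta = L_m^-1 (V_2 y_2)
    w = solve_triangular(L_m.T, beta, lower=False)           # L_m^-T beta
    return solve_triangular(L.T, w, lower=False)             # Q_y* = L^-T L_m^-T beta
```

On test data, the Q_y* returned here should agree with the result of the direct method sketch up to numerical precision, which provides a convenient consistency check.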