Fuzzy curve analysis based soft sensor modeling method using time difference Gaussian process regression
11164095 · 2021-11-02
Abstract
The invention provides a fuzzy curve analysis based soft sensor modeling method using time difference Gaussian process regression, which is suitable for application in chemical processes with time delay characteristics. The method extracts stable delay information from the historical database of the process and introduces data sequences that are more relevant to the dominant variable sequence into the model. First, fuzzy curve analysis (FCA) is used to judge intuitively the importance of each input sequence to the output sequence and to estimate the time-delay parameters of the process; this offline time-delay parameter set is then used to restructure the modeling data. For new input data, the current value of the dominant variable is predicted by a time difference Gaussian process regression (TDGPR) model from the historical variable values recorded a certain time earlier. The method avoids the problem of model updating and can effectively track the drift between input and output data. Compared with steady-state modeling methods, the invention achieves more accurate predictions of the key variable, thus improving product quality and reducing production costs.
Claims
1. A method comprising: collecting values of a plurality of first variables of a reaction vessel, as functions of time; collecting values of a second variable of the reaction vessel as a function of time; generating a plurality of new expanded variables for each variable in the plurality of first variables by introducing time delay information based on the collected values of the plurality of first variables; determining a variable time delay value of the first variables by making centroid defuzzification of each new expanded variable using fuzzy curve analysis (FCA) based on the values of the second variable; establishing a Gaussian process model based on the values of the first variables delayed by their respective variable time delay values and the values of the second variable; determining a prediction of the second variable by a time difference Gaussian process regression (TDGPR) using the Gaussian process model; controlling the reaction vessel based on the prediction of the second variable; wherein the second variable is a bottom butane concentration in the reaction vessel.
2. The method of claim 1, wherein the reaction vessel is a debutanizer.
3. The method of claim 2, wherein the plurality of first variables are selected from the group consisting of a top temperature of the debutanizer, a top pressure of the debutanizer, a reflux flow of the debutanizer, a top product outflow of the debutanizer, a tray temperature of the debutanizer, a bottom temperature of the debutanizer, and combinations thereof.
Description
BRIEF DESCRIPTION OF DRAWINGS
EXAMPLES
(6) The modeling flow chart, which is shown in
(7) Taking an actual chemical process as an example, the debutanizer is an important part of the naphtha desulfurization and separation unit in the oil refining process, and one of the dominant variables that must be controlled in this process is the bottom butane (C4) concentration. The schematic diagram of the process is shown in
(8) Step 1: Collect historical input and output data to form a training database which contains N continuous samples. Assume that the data are expressed as {X(t),y(t)}, t=1, 2, . . . , N, and have been preprocessed, with the two bottom temperature variables averaged into one auxiliary variable; then X(t)=[x.sub.1(t),x.sub.2(t),x.sub.3(t),x.sub.4(t),x.sub.5(t),x.sub.6(t)].sup.T. The maximum time delay T.sub.max of the 6 variables is set to 19.
(9) Step 2: Each original variable x.sub.i, i∈{1, 2, . . . , 6}, is extended to a set of input variables with time delay {x.sub.i(t−λ), λ=0, 1, . . . , T.sub.max} by formula (1), so that a set of 6×(19+1)=120 delay variables is obtained for subsequent analysis.
(10)
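The delay expansion of Step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `expand_with_delays`, the column ordering, and the choice to drop the first T.sub.max rows (whose delayed values would precede the record) are assumptions made for the example.

```python
import numpy as np

def expand_with_delays(X, t_max):
    """Stack x_i(t - lam) for lam = 0..t_max, for every variable i.

    X has shape (N, m). The result has shape (N - t_max, m * (t_max + 1)).
    Column i * (t_max + 1) + lam holds x_i(t - lam).
    The first t_max rows are dropped (an assumption of this sketch),
    because their delayed values would fall before the start of the record.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_vars = X.shape
    if n_samples <= t_max:
        raise ValueError("need more than t_max samples")
    columns = []
    for i in range(n_vars):
        for lam in range(t_max + 1):
            # Rows t = t_max .. N-1, shifted back by lam samples.
            columns.append(X[t_max - lam : n_samples - lam, i])
    return np.column_stack(columns)

# Toy data: 100 samples of 6 variables, T_max = 19 as in the example.
X = np.arange(600, dtype=float).reshape(100, 6)
X_expanded = expand_with_delays(X, t_max=19)
print(X_expanded.shape)  # (81, 120)
```

With 6 variables and T.sub.max=19 the expanded set has 120 columns, matching the dimension stated above.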
(11) Step 3: Determine the importance of each variable in the time delay input variable set by FCA. For (x.sub.i(t),y(t)), the fuzzy membership function of variable x.sub.i is defined as:
(12)
(13) For each x.sub.i, {Φ.sub.it, y(t)} provides a fuzzy rule described as {if x.sub.i is Φ.sub.it(x), then y is y(t)}, where Φ.sub.it is the fuzzy membership function of input variable x.sub.i at the t-th data point. In formula (2) a Gaussian fuzzy membership function is selected, and b is set to 20% of the range of variable x.sub.i. As a result, for N training samples, each variable has N fuzzy rules, one per sample. In the fuzzy membership function, Φ.sub.it=1 holds at each point {x.sub.i(t), y(t)}.
(14) For a time delay process, introducing time delay information makes the original variable x.sub.i (T.sub.max+1)-dimensional, expressed as x.sub.i(t−λ), λ=0, 1, . . . , T.sub.max, where λ is the candidate delay value. The fuzzy curve C.sub.i,λ for the i-th variable at delay λ is obtained by centroid defuzzification of each new expanded variable using formula (3). As shown in formula (4), d.sub.i is the value of λ that maximizes the coverage of the fuzzy curve C.sub.i,λ, where C.sub.i,λ(λ).sub.max is the maximum value of the fuzzy curve points and C.sub.i,λ(λ).sub.min is the minimum value.
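As an illustration of the membership function described here, the sketch below assumes the common Gaussian form exp(−((x.sub.i(t)−x)/b).sup.2), which equals 1 at each sample point as the text requires. The exact expression of formula (2) is not reproduced in this text, so both the form and the name `gaussian_membership` are assumptions.

```python
import numpy as np

def gaussian_membership(x_query, x_samples, width_fraction=0.2):
    """Membership of x_query in the fuzzy set centred on each sample.

    Assumed form: phi_t(x) = exp(-((x_samples[t] - x) / b) ** 2),
    with b = width_fraction * (max - min) of the samples (20% by default).
    """
    x_samples = np.asarray(x_samples, dtype=float)
    b = width_fraction * (x_samples.max() - x_samples.min())
    if b == 0:
        raise ValueError("variable has zero range")
    return np.exp(-(((x_samples - x_query) / b) ** 2))

x_i = np.array([0.0, 1.0, 2.0, 5.0])   # toy samples of one variable
phi = gaussian_membership(1.0, x_i)     # one membership value per fuzzy rule
```

Here the range is 5, so b = 1; the rule centred on the sample at 1.0 fires fully, and the rules centred on more distant samples fire progressively less.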
(15)
(16) The closer the range of C.sub.i,λ(λ) is to the range of y, the more important the input variable x.sub.i(t−λ) is. The importance of each variable can therefore be determined by sorting the coverage of C.sub.i,λ(λ). Finally, the optimal delay parameter d.sub.i and the time delay variable x.sub.i(t−d.sub.i) are obtained by the FCA method and are later used to reconstruct the soft sensor modeling data.
(17) Step 4: Based on the previous step, the time delay parameters d.sub.1, d.sub.2, . . . , d.sub.m are used to reconstruct the training input sample set for on-line modeling. The reconstructed input dataset is denoted as X.sub.d(t), with X.sub.d(t)=[x.sub.1(t−d.sub.1),x.sub.2(t−d.sub.2),x.sub.3(t−d.sub.3),x.sub.4(t−d.sub.4),x.sub.5(t−d.sub.5),x.sub.6(t−d.sub.6)]. If a new input sample X(t+1) arrives, the delayed input set is restructured from the historical database samples with the same parameters and the method proceeds to Step 5; otherwise, it waits for the arrival of new data.
(18) Step 5: After the reorganization procedure, the training set and the new data are subjected to j-th order time differencing (the value of j can be determined according to the sampling period and the properties of the dominant variable):
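The delay selection can be sketched as below. The sketch assumes the standard centroid form of a fuzzy curve, C(x)=Σ.sub.t Φ.sub.t(x)·y(t)/Σ.sub.t Φ.sub.t(x), evaluated at the sample points, and takes the coverage of a curve as its maximum minus its minimum. The function names are hypothetical, and the N×N membership matrix is only practical for moderate N.

```python
import numpy as np

def fuzzy_curve_range(x, y, width_fraction=0.2):
    """Range (max - min) of the fuzzy curve of y against x.

    The curve is evaluated at the sample points themselves:
    C(x_k) = sum_t phi_t(x_k) * y(t) / sum_t phi_t(x_k).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = width_fraction * (x.max() - x.min())
    # phi[k, t]: membership of point x[k] in the rule centred on x[t].
    phi = np.exp(-(((x[None, :] - x[:, None]) / b) ** 2))
    curve = phi @ y / phi.sum(axis=1)
    return curve.max() - curve.min()

def select_delay(x, y, t_max):
    """Return the delay in 0..t_max whose fuzzy curve covers the widest range."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    ranges = [
        fuzzy_curve_range(x[t_max - lam : n - lam], y[t_max:])
        for lam in range(t_max + 1)
    ]
    return int(np.argmax(ranges)), ranges

# Synthetic check: the output is the input delayed by 3 samples.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=300)
y = np.roll(x, 3)  # y(t) = x(t - 3); the wrapped first values are not used
d, ranges = select_delay(x, y, t_max=5)
```

For the synthetic series, only the curve at delay 3 follows the output, so it has by far the widest range; curves at the other delays are nearly flat because the shifted input carries no information about y.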
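A minimal sketch of the reconstruction in Step 4, assuming the delays have already been selected. The name `reconstruct_inputs` and the decision to start at t = max(d.sub.i), where every delayed value is available, are choices made for this example.

```python
import numpy as np

def reconstruct_inputs(X, y, delays):
    """Build X_d(t) = [x_1(t - d_1), ..., x_m(t - d_m)] and the aligned y(t).

    Rows start at t = max(delays), the first time at which every delayed
    value exists in the record.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    delays = np.asarray(delays, dtype=int)
    n_samples, n_vars = X.shape
    if len(delays) != n_vars:
        raise ValueError("one delay per input variable is required")
    start = int(delays.max())
    X_d = np.column_stack(
        [X[start - d : n_samples - d, i] for i, d in enumerate(delays)]
    )
    return X_d, y[start:]

X = np.arange(20, dtype=float).reshape(10, 2)  # toy record, 2 variables
y = np.arange(10, dtype=float)
X_d, y_d = reconstruct_inputs(X, y, delays=[0, 3])
```

In the toy call, the first reconstructed row pairs x.sub.1(3) with x.sub.2(0) and the output y(3).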
ΔX.sub.d,j(t)=X.sub.d(t)−X.sub.d(t−j)
Δy.sub.j(t)=y(t)−y(t−j) (5)
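Formula (5) amounts to subtracting from each row the row j samples earlier. A short sketch, with the hypothetical name `time_difference`:

```python
import numpy as np

def time_difference(X_d, y, j):
    """Return dX(t) = X_d(t) - X_d(t - j) and dy(t) = y(t) - y(t - j)."""
    X_d = np.asarray(X_d, dtype=float)
    y = np.asarray(y, dtype=float)
    if not 0 < j < len(y):
        raise ValueError("j must satisfy 0 < j < number of samples")
    # Row k of the result corresponds to time t = k + j.
    return X_d[j:] - X_d[:-j], y[j:] - y[:-j]

t = np.arange(10, dtype=float)
X_d = np.column_stack([t, t ** 2])   # toy reconstructed inputs
y = 2.0 * t + 7.0                    # toy output with a constant offset
dX, dy = time_difference(X_d, y, j=2)
```

The constant offset 7.0 cancels in `dy`, which is why a model trained on differences is insensitive to slow drift in the output level.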
(19) Next, the relationship between ΔX.sub.d,j(t) and Δy.sub.j(t) is regressed by GPR, which satisfies Δy(t)=f(ΔX(t))+e(t). The GPR method obtains the mapping from the given training input and output samples. Given new input data, it yields both a predictive value and its degree of uncertainty, which means the result is probabilistic. The GPR algorithm is as follows.
(20) In general, the relationship between the observed output value y and noise ε satisfies:
y.sub.i=f(x.sub.i)+ε
ε˜N(0,σ.sub.n.sup.2) (6)
(21) Once the mean function and covariance function are determined, the distribution of the Gaussian process is fully determined. For simplicity, the mean function is usually set to 0 through preprocessing. The covariance function expresses the correlation of the output data as a function of the input data. Because similar inputs produce similar outputs, the covariance function can be selected according to the characteristics of the sample distribution. One condition that must be satisfied is that the closer two samples are, the more correlated they are, and vice versa. The covariance function used in this invention is shown in formula (7):
(22)
(23) In the formula, x.sub.p, x.sub.q∈R.sup.D; v controls the magnitude of the covariance function, and π.sup.d describes the relative importance of each input attribute x.sup.d. The hyper-parameter Θ.sub.gp=(v, π.sub.1, . . . , π.sub.D, σ.sub.n.sup.2) of the Gaussian process is generally estimated by maximum likelihood estimation (MLE), and the parameters can be optimized with the conjugate gradient method. Based on the test sample and the training data, the posterior distribution of the test data x* can be calculated, and its predictive value obeys the joint Gaussian distribution described in formula (9), where K(X,X) is the n-dimensional covariance matrix of the training samples, k(x*,X) is the covariance vector between the test sample and the training samples, k(x*,x*) is the autocovariance of the test sample, and f.sub.gp is the predictive value of GPR.
(24)
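The GPR prediction can be sketched with plain NumPy as below. The covariance form v·exp(−½Σ.sub.d π.sub.d(x.sub.p.sup.d−x.sub.q.sup.d).sup.2) is an assumption consistent with the roles of v and π described above, since formula (7) itself is not reproduced in this text. The hyper-parameters are fixed by hand here rather than estimated by MLE, and the function names are hypothetical.

```python
import numpy as np

def ard_kernel(A, B, v, pi):
    """Assumed covariance: v * exp(-0.5 * sum_d pi[d] * (a_d - b_d)**2)."""
    diff = A[:, None, :] - B[None, :, :]
    return v * np.exp(-0.5 * np.sum(pi * diff ** 2, axis=2))

def gpr_predict(X_train, y_train, X_test, v, pi, noise_var):
    """Zero-mean GP posterior mean and variance of f at X_test."""
    # K(X, X) plus the noise variance on the diagonal.
    K = ard_kernel(X_train, X_train, v, pi) + noise_var * np.eye(len(X_train))
    K_s = ard_kernel(X_test, X_train, v, pi)           # k(x*, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    V = np.linalg.solve(L, K_s.T)
    # k(x*, x*) = v for this kernel; clip tiny negative round-off.
    var = np.maximum(v - np.sum(V ** 2, axis=0), 0.0)
    return mean, var

X_train = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
y_train = np.sin(X_train[:, 0])
X_test = np.array([[1.5], [100.0]])   # one interior point, one far away
mean, var = gpr_predict(X_train, y_train, X_test,
                        v=1.0, pi=np.array([1.0]), noise_var=1e-4)
```

At the interior test point the mean is close to sin(1.5) with small variance, while far from the data the prediction falls back to the zero prior mean with variance v, which illustrates the probabilistic nature of the result.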
(25) When new input data arrive at time t+1, the predictive value y.sub.j,pred(t+1) is calculated with the TDGPR method as:
ΔX.sub.j(t+1)=X(t+1)−X(t+1−j)
Δy.sub.j,pred(t+1)=f.sub.GPR(ΔX.sub.j(t+1))
y.sub.j,pred(t+1)=y.sub.j(t+1−j)+Δy.sub.j,pred(t+1) (10)
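The prediction step of formula (10) can be sketched independently of how the difference model was fitted. In the example below, `toy_delta_model` is a hand-written stand-in for the fitted f.sub.GPR, not a Gaussian process, and all names are hypothetical.

```python
import numpy as np

def tdgpr_predict(delta_model, x_new, x_past, y_past):
    """y_pred(t+1) = y(t+1-j) + f(X(t+1) - X(t+1-j))."""
    delta_x = np.asarray(x_new, dtype=float) - np.asarray(x_past, dtype=float)
    return y_past + delta_model(delta_x)

# Stand-in for the fitted difference model f_GPR (an assumption for this demo).
def toy_delta_model(delta_x):
    return 2.0 * float(delta_x[0])

# Toy process y = 2 * x + offset, where the offset drifts slowly and is
# therefore the same at times t+1-j and t+1.
offset = 5.0
x_past, x_new = np.array([1.0]), np.array([3.0])
y_past = 2.0 * x_past[0] + offset
y_pred = tdgpr_predict(toy_delta_model, x_new, x_past, y_past)
```

The prediction equals 2·3+5 even though the difference model never sees the offset: the drift enters only through the measured y(t+1−j).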
(26) In an actual industrial process, instruments may be damaged or laboratory analysis may be delayed, so that values of the dominant variable are obtained at long intervals and in small quantity, or analysis values are missing from the database. Thus, shown in
(27) As shown in
(28)
(29) From
(30) Although the accuracy of both methods declines as the time difference increases, the predictions of the present invention remain closer to the true butane concentration than those of the TDGPR method that does not consider the time delay. This suggests that the extracted delay information is in line with the actual causal relationship of the process, and that the soft sensor model with variable time delay estimation is more accurate.
(31) After the fuzzy curve analysis method determines the optimal delay parameters, the reconstructed data are shown to enhance the accuracy of the online model significantly by introducing auxiliary variable sequences that contribute more to the dominant variable sequence. This also indicates that the GPR method explains the dynamic changes of the process well. The online soft sensor model based on the TDGPR method can adaptively estimate the real-time butane concentration from historical variables collected j sampling periods earlier.
(32)
(33) While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention. All figures, tables, appendices, patents, patent applications and publications, referred to above, are hereby incorporated by reference.