Fuzzy curve analysis based soft sensor modeling method using time difference Gaussian process regression

11164095 · 2021-11-02

Assignee

Jiangnan University (Wuxi, CN)

Abstract

The invention provides a fuzzy curve analysis based soft sensor modeling method using time difference Gaussian process regression, suitable for chemical processes with time-delay characteristics. The method extracts stable delay information from the historical process database and introduces more relevant modeling data sequences for the dominant variable sequence. First, fuzzy curve analysis (FCA) is used to judge intuitively the importance of each input sequence to the output sequence and to estimate the time-delay parameters of the process; this offline time-delay parameter set is then used to restructure the modeling data. For new input data, the current value of the dominant variable is predicted by a time difference Gaussian process regression (TDGPR) model from the historical variable values recorded a certain time earlier. The method avoids the problem of model updating and can effectively track the drift between input and output data. Compared with steady-state modeling methods, the invention achieves more accurate predictions of the key variable, thus improving product quality and reducing production costs.

Claims

1. A method comprising: collecting values of a plurality of first variables of a reaction vessel, as functions of time; collecting values of a second variable of the reaction vessel as a function of time; generating a plurality of new expanded variables for each variable in the plurality of first variables by introducing time delay information based on the collected values of the plurality of first variables; determining a variable time delay value of the first variables by making centroid defuzzification of each new expanded variable using fuzzy curve analysis (FCA) based on the values of the second variable; establishing a Gaussian process model based on the values of the first variables delayed by their respective variable time delay values and the values of the second variable; determining a prediction of the second variable by a time difference Gaussian process regression (TDGPR) using the Gaussian process model; controlling the reaction vessel based on the prediction of the second variable; wherein the second variable is a bottom butane concentration in the reaction vessel.

2. The method of claim 1, wherein the reaction vessel is a debutanizer.

3. The method of claim 2, wherein the plurality of first variables are selected from the group consisting of a top temperature of the debutanizer, a top pressure of the debutanizer, a reflux flow of the debutanizer, a top product outflow of the debutanizer, a tray temperature of the debutanizer, a bottom temperature of the debutanizer, and combinations thereof.

Description

BRIEF DESCRIPTION OF DRAWINGS

(1) FIG. 1 is a flow chart of online soft sensor method based on FCA-TDGPR;

(2) FIG. 2 is a schematic diagram of debutanizer process;

(3) FIG. 3 is a schematic diagram of TDGPR modeling approach;

(4) FIG. 4 contains fuzzy curve distribution diagrams of the original variables and optimal time delay variables;

(5) FIG. 5 contains scatterplots of butane concentration predictions with different j values.

EXAMPLES

(6) The modeling flow chart shown in FIG. 1 is described in detail below:

(7) Taking an actual chemical process as an example, the debutanizer is an important part of the naphtha desulfurization and separation unit in the oil refining process, and one of the dominant variables to be controlled in this process is the bottom butane (C4) concentration. The schematic diagram of the process is shown in FIG. 2. Because the C4 concentration cannot be measured directly, its values are obtained by analysis with a delay. At the same time, different auxiliary variables show different degrees of time delay. The experimental data are derived from the actual industrial process and contain 2394 samples with a total of 7 auxiliary variables. As shown in FIG. 2, x.sub.1 is the top temperature; x.sub.2 is the top pressure; x.sub.3 is the reflux flow; x.sub.4 is the top product outflow; x.sub.5 is the 6th tray temperature; x.sub.6 is the bottom temperature 1; x.sub.7 is the bottom temperature 2. The single dominant variable is the bottom butane concentration, which in this invention is predicted as the key variable of the process.

(8) Step 1: Collect historical input and output data to form a training database containing N consecutive samples. The data are expressed as {X(t),y(t)}, t=1, 2, . . . , N and are preprocessed; the 2 bottom temperature variables are averaged into 1 auxiliary variable, so that X(t)=[x.sub.1(t),x.sub.2(t),x.sub.3(t),x.sub.4(t),x.sub.5(t),x.sub.6(t)].sup.T. The maximum time delay T.sub.max of the 6 variables is set to 19.

(9) Step 2: Each original variable x.sub.i, i∈{1, 2, . . . , 6}, is extended to the set of delayed input variables {x.sub.i(t−λ), λ=0, 1, . . . , T.sub.max} by formula (1), giving a set of 120 delay variables (6 variables × 20 delay values) for subsequent analysis:

(10) $\begin{matrix} \underbrace{x_{i}(t),\ x_{i}(t-1),\ \ldots,\ x_{i}(t-d_{i}),\ \ldots,\ x_{i}(t-T_{max})}_{i = 1, 2, \ldots, m \quad (m \text{ sets})} & (1) \end{matrix}$
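The expansion in formula (1) can be sketched in a few lines of NumPy. This is only an illustration: the function name `expand_with_delays`, the column ordering, and the choice to drop the first T.sub.max rows (which lack a full delay history) are assumptions, not details given in the text.

```python
import numpy as np

def expand_with_delays(X, t_max):
    """Build the lagged matrix with columns x_i(t - lam), lam = 0..t_max.

    X has shape (N, m). Rows with t < t_max have no complete history and are
    dropped (an assumption), so the result has shape (N - t_max, m * (t_max + 1)).
    Columns are ordered variable by variable, delay increasing within a variable.
    """
    X = np.asarray(X, dtype=float)
    N, m = X.shape
    cols = []
    for i in range(m):
        for lam in range(t_max + 1):
            # x_i(t - lam) for t = t_max, ..., N - 1
            cols.append(X[t_max - lam : N - lam, i])
    return np.column_stack(cols)

# Toy data: 50 samples of 6 variables, maximum delay 19 as in Step 1.
X_demo = np.arange(50 * 6, dtype=float).reshape(50, 6)
expanded = expand_with_delays(X_demo, t_max=19)
print(expanded.shape)  # (31, 120)
```

With 6 variables and T.sub.max = 19 the sketch yields the 120 delay variables mentioned in Step 2.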

(11) Step 3: Determine the importance of each variable in the time-delay input variable set by FCA. For (x.sub.i(t),y(t)), the fuzzy membership function of variable x.sub.i is defined as:

(12) $\begin{matrix} \Phi_{it}(x_{i}) = \exp\left[-\left(\frac{x_{i}(t) - x_{i}}{b}\right)^{2}\right] & (2) \end{matrix}$

(13) For each x.sub.i, {Φ.sub.it, y(t)} provides a fuzzy rule described as {if x.sub.i is Φ.sub.it(x.sub.i), then y is y(t)}, where Φ.sub.it is the fuzzy membership function of input variable x.sub.i at the t-th data point. In formula (2) a Gaussian fuzzy membership function is selected, and b is set to 20% of the range of variable x.sub.i. As a result, for N training samples, each variable has N fuzzy rules, one per sample. For each membership function, Φ.sub.it=1 holds at the point {x.sub.i(t), y(t)}.
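As a minimal sketch of formula (2), the membership values of all N rules can be evaluated at once with broadcasting. The function name `gaussian_membership` and the parameter name `b_frac` are illustrative assumptions; only the Gaussian form and the 20% width come from the text.

```python
import numpy as np

def gaussian_membership(x_eval, x_samples, b_frac=0.2):
    """Phi[t, k] = exp(-((x_samples[t] - x_eval[k]) / b)**2), as in formula (2).

    b is b_frac times the range of the samples (20% by default).
    """
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    x_samples = np.asarray(x_samples, dtype=float)
    b = b_frac * (x_samples.max() - x_samples.min())
    diff = x_samples[:, None] - x_eval[None, :]
    return np.exp(-(diff / b) ** 2)

x = np.array([0.0, 1.0, 2.0, 5.0])   # toy samples of one variable, range 5 -> b = 1
Phi = gaussian_membership(x, x)      # rule t evaluated at sample point k
print(np.diag(Phi))                  # every rule has membership 1 at its own sample
```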

(14) For a time-delay process, introducing time delay information makes the original variable x.sub.i (T.sub.max+1)-dimensional, expressed as x.sub.i(t−λ), λ=0, 1, . . . , T.sub.max, where λ is the candidate delay value. The fuzzy curve C.sub.i,λ for the i-th variable with delay λ is obtained by centroid defuzzification of each new expanded variable using formula (3). As shown in formula (4), d.sub.i is the λ that maximizes the coverage of the fuzzy curve C.sub.i,λ, where C.sub.i,λ(λ).sub.max is the maximum value of the fuzzy curve points and C.sub.i,λ(λ).sub.min is the minimum value:

(15) $\begin{matrix} C_{i,\lambda}(\lambda) = \frac{\sum_{t=1}^{N} \Phi_{it}[x_{i}(t-\lambda)] \cdot y(t)}{\sum_{t=1}^{N} \Phi_{it}[x_{i}(t-\lambda)]} & (3) \\ d_{i} = \underset{\lambda}{\arg\max}\, [{C_{i,\lambda}(\lambda)}_{max} - {C_{i,\lambda}(\lambda)}_{min}] & (4) \end{matrix}$

(16) The closer the range of C.sub.i,λ(λ) is to that of y, the more important the input variable x.sub.i(t−λ) is. The importance of each variable can therefore be determined by sorting the coverage of C.sub.i,λ(λ). Finally, the optimal delay parameter d.sub.i and the time delay variable x.sub.i(t−d.sub.i) are obtained by the FCA method and are later used to reconstruct the soft sensor modeling data.
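The delay selection in formulas (3) and (4) might be sketched as follows for a single variable. Several choices here are assumptions rather than details from the text: the fuzzy curve is evaluated at the observed sample points, all candidate delays are scored on the same output samples y(t), t ≥ T.sub.max, and the names `fuzzy_curve_range` and `estimate_delay` are hypothetical. The synthetic series is built so that y(t) depends on x(t−3).

```python
import numpy as np

def fuzzy_curve_range(x, y, b_frac=0.2):
    """Range (max - min) of the fuzzy curve of y over x, per formula (3).

    Assumes x is not constant, so that b > 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = b_frac * (x.max() - x.min())                      # 20% of the range of x
    phi = np.exp(-((x[:, None] - x[None, :]) / b) ** 2)   # phi[t, k]: rule t at point k
    curve = (phi * y[:, None]).sum(axis=0) / phi.sum(axis=0)  # centroid defuzzification
    return curve.max() - curve.min()

def estimate_delay(x, y, t_max):
    """Return the delay maximizing the fuzzy-curve range (formula (4)) and all ranges."""
    N = len(y)
    ranges = np.array([
        fuzzy_curve_range(x[t_max - lam : N - lam], y[t_max:])
        for lam in range(t_max + 1)
    ])
    return int(np.argmax(ranges)), ranges

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=400)
y = 2.0 * np.roll(x, 3)        # y(t) = 2 * x(t - 3); the wrapped first samples are not used
d_hat, ranges = estimate_delay(x, y, t_max=6)
print(d_hat, np.round(ranges, 3))
```

On this toy series the range at delay 3 dominates the others, because the fuzzy curve for an unrelated delay is nearly flat around the mean of y.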

(17) Step 4: The time delay parameters d.sub.1, d.sub.2, . . . , d.sub.m from the previous step are used to reconstruct the training input sample set for on-line modeling. The reconstructed input dataset is denoted X.sub.d(t), where X.sub.d(t)=[x.sub.1(t−d.sub.1),x.sub.2(t−d.sub.2),x.sub.3(t−d.sub.3),x.sub.4(t−d.sub.4),x.sub.5(t−d.sub.5),x.sub.6(t−d.sub.6)]. If a new input sample X(t+1) arrives, its delayed input set is restructured from the historical database samples with the same parameters and the method proceeds to Step 5; otherwise, it waits for the arrival of new data.
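The reconstruction of X.sub.d(t) can be illustrated with a small NumPy helper. The name `reconstruct_inputs` is hypothetical, and trimming by the largest selected delay (rather than by T.sub.max) is an assumption made to keep the toy example short.

```python
import numpy as np

def reconstruct_inputs(X, delays):
    """Return X_d with columns x_i(t - d_i), plus the number of leading rows dropped.

    Row r of the result corresponds to time index t = d_max + r (0-based).
    """
    X = np.asarray(X, dtype=float)
    delays = [int(d) for d in delays]
    N, m = X.shape
    d_max = max(delays)
    cols = [X[d_max - d : N - d, i] for i, d in enumerate(delays)]
    return np.column_stack(cols), d_max

# Toy data: x_1(t) = 2t and x_2(t) = 2t + 1 for t = 0..9, with delays d_1 = 0, d_2 = 3.
X_demo = np.arange(20, dtype=float).reshape(10, 2)
X_d, d_max = reconstruct_inputs(X_demo, delays=[0, 3])
print(X_d[0])  # [6. 1.]  i.e. x_1(3) and x_2(0)
```

The matching output samples are `y[d_max:]`, so that each reconstructed row stays aligned with its y(t).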

(18) Step 5: After the reorganization procedure, the training set and the new data are processed by a j-order time difference (the value of j can be determined according to the sampling period and the properties of the dominant variable):
ΔX.sub.d,j(t)=X.sub.d(t)−X.sub.d(t−j)
Δy.sub.j(t)=y(t)−y(t−j) (5)
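Formula (5) amounts to subtracting each sample from the one j steps later. A minimal sketch, with the hypothetical helper name `time_difference` and assuming j ≥ 1:

```python
import numpy as np

def time_difference(X_d, y, j):
    """Return dX(t) = X_d(t) - X_d(t - j) and dy(t) = y(t) - y(t - j) for t >= j (j >= 1)."""
    X_d = np.asarray(X_d, dtype=float)
    y = np.asarray(y, dtype=float)
    return X_d[j:] - X_d[:-j], y[j:] - y[:-j]

X_d = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
dX, dy = time_difference(X_d, y, j=2)
print(dX.ravel(), dy)  # [2. 2. 2.] [3. 5. 7.]
```

Each differenced set is j samples shorter than the original, since the first j samples have no reference point.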

(19) Next, the relationship between ΔX.sub.d,j(t) and Δy.sub.j(t) is regressed by GPR, which satisfies Δy(t)=f(ΔX(t))+e(t). GPR obtains the mapping from the given training input and output samples. Given new input data, it then yields both the predictive value and its degree of uncertainty, which means the result is probabilistic. The GPR algorithm is described below.

(20) In general, the observed output value y and the noise ε satisfy:
y.sub.i=f(x.sub.i)+ε
ε˜N(0,σ.sub.n.sup.2) (6)

(21) Once the mean function and covariance function are determined, the Gaussian process is fully determined. For simplicity, the data are usually preprocessed so that the mean function is 0. The covariance function expresses the correlation of the output data as a function of the input data. As similar inputs produce similar outputs, the covariance function can be selected according to the characteristics of the sample distribution. One condition that must be satisfied is that the closer two samples are, the more correlated they are, and vice versa. The covariance function used in this invention is shown in formula (7):

(22) $\begin{matrix} k(x_{p}, x_{q}) = v \exp\left[-\frac{1}{2} \sum_{d=1}^{D} \pi_{d} {(x_{p}^{d} - x_{q}^{d})}^{2}\right] & (7) \end{matrix}$

(23) In the formula, x.sub.p, x.sub.q∈R.sup.D, v controls the magnitude of the covariance function, and π.sub.d describes the relative importance of each input attribute x.sup.d. The hyper-parameter Θ.sub.gp=(v, π.sub.1, . . . , π.sub.D, σ.sub.n.sup.2) of the Gaussian process is generally estimated by maximum likelihood estimation (MLE), that is, by maximizing the log-likelihood in formula (8); the optimization can be carried out with the conjugate gradient method. Based on the test sample and training data, the posterior distribution for test data x* can be calculated, and its predictive value obeys the joint Gaussian distribution described in formula (9), where K(X,X) is the n×n covariance matrix of the training samples; k(x*,X) is the covariance vector between the test sample and the training samples; k(x*,x*) is the autocovariance of the test sample; and f.sub.gp is the predictive value of GPR.

(24) $\begin{matrix} L(\Theta_{gp}) = -\frac{1}{2} y^{T} {[K(X, X) + \sigma_{n}^{2} I]}^{-1} y - \frac{1}{2} \log\det[K(X, X) + \sigma_{n}^{2} I] - \frac{n}{2} \log 2\pi & (8) \\ f_{gp} \mid X, y, x_{*} \sim N(\overline{f_{gp}}, cov(f_{gp})) \quad s.t. \quad \overline{f_{gp}} = k(x_{*}, X) {[K(X, X) + \sigma_{n}^{2} I_{n}]}^{-1} y, \quad cov(f_{gp}) = k(x_{*}, x_{*}) - {k(X, x_{*})}^{T} {[K(X, X) + \sigma_{n}^{2} I_{n}]}^{-1} \cdot k(X, x_{*}) & (9) \end{matrix}$
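Formulas (7) to (9) can be illustrated with a compact NumPy sketch. It is not the implementation used in the invention: the hyper-parameters below are picked by hand rather than fitted by MLE with conjugate gradients, the matrix inverse is replaced by a Cholesky solve for numerical stability, and all function names are hypothetical.

```python
import numpy as np

def ard_kernel(A, B, v, pi_d):
    """Formula (7): k(x_p, x_q) = v * exp(-0.5 * sum_d pi_d * (x_p^d - x_q^d)^2)."""
    diff = A[:, None, :] - B[None, :, :]            # shape (len(A), len(B), D)
    return v * np.exp(-0.5 * np.sum(pi_d * diff ** 2, axis=2))

def _factorize(X, y, v, pi_d, noise_var):
    """Cholesky factor L of K(X, X) + noise_var * I, and alpha = [K + noise_var * I]^-1 y."""
    K_y = ard_kernel(X, X, v, pi_d) + noise_var * np.eye(len(y))
    L = np.linalg.cholesky(K_y)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha

def log_likelihood(X, y, v, pi_d, noise_var):
    """Formula (8), using log det(K_y) = 2 * sum(log(diag(L)))."""
    L, alpha = _factorize(X, y, v, pi_d, noise_var)
    n = len(y)
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

def gp_predict(X, y, X_star, v, pi_d, noise_var):
    """Formula (9): predictive mean and variance at each row of X_star."""
    L, alpha = _factorize(X, y, v, pi_d, noise_var)
    K_star = ard_kernel(X_star, X, v, pi_d)         # k(x*, X), one row per test point
    mean = K_star @ alpha
    V = np.linalg.solve(L, K_star.T)
    var = v - np.sum(V ** 2, axis=0)                # k(x*, x*) = v for this kernel
    return mean, var

# Toy 1-D data; theta is hand-picked for illustration, not estimated.
X_train = np.linspace(0.0, 3.0, 8)[:, None]
y_train = np.sin(X_train[:, 0])
theta = dict(v=1.0, pi_d=np.array([4.0]), noise_var=1e-6)

mean, var = gp_predict(X_train, y_train, np.array([[1.5], [10.0]]), **theta)
print(mean, var)
print(log_likelihood(X_train, y_train, **theta))
```

Far from the training data (the test point at 10.0) the predictive mean falls back to the zero prior mean and the variance approaches v, which is how the model expresses the uncertainty mentioned above.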

(25) When new input data arrive at time t+1, the predictive value y.sub.j,pred(t+1) of the TDGPR method is calculated as:
ΔX.sub.j(t+1)=X(t+1)−X(t+1−j)
Δy.sub.j,pred(t+1)=f.sub.GPR(ΔX.sub.j(t+1))
y.sub.j,pred(t+1)=y.sub.j(t+1−j)+Δy.sub.j,pred(t+1) (10)
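The update in formula (10) can be sketched independently of the regression model. In the example below, `f_delta` is a stand-in for the trained GPR mean (here simply the exact linear map of a toy process), and `td_predict` is a hypothetical helper name. The toy process carries a constant offset of 3.0 that the model does not know; differencing cancels it.

```python
import numpy as np

def td_predict(f_delta, X_hist, y_hist, x_new, j):
    """Formula (10): y_pred(t+1) = y(t+1-j) + f(X(t+1) - X(t+1-j)).

    X_hist and y_hist hold samples up to time t, so index -j is time t+1-j.
    """
    delta_x = np.asarray(x_new, dtype=float) - X_hist[-j]
    return y_hist[-j] + f_delta(delta_x)

rng = np.random.default_rng(1)
w = np.array([0.5, -1.0])
X = rng.normal(size=(30, 2))
y = X @ w + 3.0                      # unknown constant offset on the output

f_delta = lambda dx: dx @ w          # stand-in for the fitted GPR mean (assumption)
y_pred = td_predict(f_delta, X[:-1], y[:-1], X[-1], j=2)
print(y_pred, y[-1])                 # the two values agree up to rounding
```

A static model `X[-1] @ w` would miss the true value by the full offset, whereas the time-difference form only needs the stored value y(t+1−j).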

(26) In an actual industrial process, instruments may be damaged or laboratory analysis may be delayed, so that dominant variable values are obtained at long intervals and in small quantity, or analysis values of the dominant variable are missing from the database. Thus, as shown in FIG. 3, for new incoming test data X(t+1), the predictive value of the dominant variable at time t+1 can be obtained from y.sub.j(t+1−j), stored in the database j time steps earlier. The predicted output y.sub.j,pred(t+1) of the online model is calculated by formula (10), giving the predicted bottom butane concentration.

(27) As shown in FIG. 4, compared with the original variables without time delay, the 6 reconstructed variables contribute more to the dominant variable and thus introduce more relevant modeling data for online modeling. To verify the effectiveness of this invention for on-line estimation, the first 1519 of the 2394 samples are used to reconstruct 1500 training samples. The final 875 samples are used as test samples, and a soft sensor is established for on-line prediction of butane concentration.
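The sample counts are consistent with dropping the first T.sub.max = 19 samples, which have no complete delay history. That reading is an assumption, since the text does not state why 19 samples are lost; the arithmetic itself is:

```python
n_total, n_head, t_max = 2394, 1519, 19

n_train = n_head - t_max    # rows of the first block with a full delay history
n_test = n_total - n_head   # remaining samples used for testing
print(n_train, n_test)      # 1500 875
```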

(28) FIG. 5 contains scatterplots of the butane concentration predictions of two methods: the FCA-TDGPR method (the present method), which involves delay estimation, and the t-TDGPR method, which does not.

(29) FIG. 5 shows that as the time difference order j increases from 1 to 10, the time interval over which the prediction relies on the historical database gradually increases and the prediction accuracy declines. This is because the more recent the analysis value is, the better the model tracks the current process dynamics.

(30) Although the accuracy of both methods declines, the predictions of the present invention stay closer to the true butane concentration than those of the TDGPR method without time delay estimation as the time difference increases. This suggests that the extracted delay information is consistent with the actual causal relationships of the process, and that the soft sensor model with variable time delay estimation is more accurate.

(31) After the fuzzy curve analysis method determines the optimal delay parameters, the reconstructed data are shown to enhance the accuracy of the online model significantly by introducing auxiliary variables that contribute more to the dominant variable sequence. The results also indicate that the GPR method explains the dynamic changes of the process well. The online soft sensor model based on the TDGPR method can adaptively estimate the real-time butane concentration from historical variables collected j time steps earlier.

(32) FIGS. 4 and 5 jointly validate that the fuzzy curve analysis based time difference Gaussian process regression soft sensor modeling method has good accuracy for on-line prediction of the bottom butane concentration.

(33) While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention. All figures, tables, appendices, patents, patent applications and publications, referred to above, are hereby incorporated by reference.

Cpc classification

G06N5/048 (PHYSICS)

F23N2223/52 (MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING)

International classification

G06N5/04 (PHYSICS)
