A Visualization Method for Process Monitoring Based on Bi-kernel T-distributed Stochastic Neighbor Embedding
20220317672 · 2022-10-06
Inventors
- Pu Wang (BEIJING, CN)
- Haili Zhang (BEIJING, CN)
- Xuejin Gao (BEIJING, CN)
- Huihui Gao (BEIJING, CN)
- Huayun Han (BEIJING, CN)
CPC classification
Y02P90/02
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G05B23/024
PHYSICS
Abstract
The invention discloses a visualization method for process monitoring based on bi-kernel t-distributed stochastic neighbor embedding (t-SNE). The method comprises two stages: offline modeling and online monitoring. In offline modeling, the standard t-SNE method is used to reduce the dimension of historical normal data, and the mapping parameter matrix from the input kernel matrix to the feature kernel matrix is calculated. PCA is used to reduce the feature kernel matrix to two dimensions, and the square Mahalanobis distance is then calculated as the monitoring statistic and its control limit is determined. In online monitoring, the kernel function between the newly collected data and the modeling data is calculated, and the resulting kernel vector is multiplied by the mapping parameter matrix to obtain the mapped feature kernel vector. PCA is used to reduce the mapped feature kernel vector to two-dimensional features for visualization. The features are drawn in a scatter diagram and checked against the ellipse control limit. Compared with the prior art, the present invention retains the dimension-reduction advantage of the standard t-SNE method while applying it to the visualization of industrial process fault monitoring, reducing the false alarm rate and the missed detection rate of industrial process monitoring.
Claims
1. A monitoring visualization method for processes based on bi-kernel t-distributed stochastic neighbor embedding, which is characterized in that: the t-SNE method is used to reduce the dimension of high-dimensional industrial process data, bi-kernel mapping is used to realize the online extension of out-of-sample mapping, and the mapped kernel matrix is reduced to two dimensions by principal component analysis (PCA); the two-dimensional features and the ellipse control limit are drawn directly in a two-dimensional rectangular coordinate system, providing a simple and intuitive way to visualize fault monitoring and improving monitoring performance; the specific steps are as follows: A. offline modeling stage: historical data $X = (x_1, x_2, \ldots, x_n)$ are obtained and standardized, where n is the number of variables, and the standardized calculation formula is as follows:
$$W = (K_x^T K_x)^{-1} K_x^T K_y \quad (4)$$
5) the matrix $K_y$ is transformed into the final required two-dimensional feature $Y$ by PCA;
$$Y = K_y P \quad (5)$$
where $P$ is load matrix; 6) design statistics and control limits: the square Mahalanobis distance is introduced as a statistic, and δ, the 95% confidence limit of the square Mahalanobis distance, is calculated as the fault monitoring control limit using the kernel density estimation; the statistical calculation formula is as follows:
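$$T_i^2 = (y_i - \bar{y})\, S^{-1}\, (y_i - \bar{y})^T$$
$$(y - \bar{y})\, S^{-1}\, (y - \bar{y})^T = \delta$$
where $\bar{y}$ and $S$ are the mean and the covariance matrix of the two-dimensional features $Y$.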
Description
EXEMPLARY EMBODIMENT
[0029] The Tennessee Eastman (TE) process is a simulation of an actual chemical process proposed by J. J. Downs and E. F. Vogel of the Tennessee Eastman Chemical Company, USA, and is widely used in process control research. Four main gaseous reactants, A, C, D and E, take part in the reaction, yielding two products, G and H, and a by-product, F. The feed also contains a small amount of inert gas B. A total of 52 variables are collected at a sampling interval of 3 minutes. The normal training data set covers 25 hours of operation, and each test data set covers 48 hours. In each fault test set, the process is normal for the first 8 hours and the fault is introduced in the 9th hour.
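The sample counts implied by these durations follow directly from the 3-minute sampling interval. The short sketch below only does that arithmetic; the variable names are illustrative and not part of the original text.

```python
# Sample counts implied by a 3-minute sampling interval.
SAMPLE_INTERVAL_MIN = 3

def samples_in(hours):
    """Number of samples collected in the given number of hours."""
    return hours * 60 // SAMPLE_INTERVAL_MIN

n_train = samples_in(25)              # normal training set
n_test = samples_in(48)               # each test set
n_normal_prefix = samples_in(8)       # fault-free start of each fault test set
n_fault = n_test - n_normal_prefix    # samples recorded after the fault is introduced

print(n_train, n_test, n_normal_prefix, n_fault)
```

The resulting 800 post-fault samples per test set agree with the figure quoted later in the description.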
[0030] Both the training data and the test data comprise 1 set of normal data and 21 sets of fault data. The fault locations and their descriptions are listed in Table 1.
TABLE 1: 21 faults in the TE process

Faults    Description                                                   Type
IDV(1)    Feed flow ratio of A/C changes, content of B does not change  Phase Step
IDV(2)    Content of B changes, feed flow ratio of A/C does not change  Phase Step
IDV(3)    Temperature of material D changes                             Phase Step
IDV(4)    Temperature of reactor cooling water inlet changes            Phase Step
IDV(5)    Temperature of condenser cooling water inlet changes          Phase Step
IDV(6)    Material A losses                                             Phase Step
IDV(7)    Pressure head of material C losses                            Phase Step
IDV(8)    Composition of material A, B and C changes                    Random
IDV(9)    Temperature of material D changes                             Random
IDV(10)   Temperature of material C changes                             Random
IDV(11)   Temperature of reactor cooling water inlet changes            Random
IDV(12)   Temperature of condenser cooling water inlet changes          Random
IDV(13)   Dynamics constants of reactor change                          Slow drift
IDV(14)   Reactor cooling water valve                                   Valve sticks
IDV(15)   Condenser cooling water valve                                 Valve sticks
IDV(16)   Unknown                                                       Unknown perturbance
IDV(17)   Unknown                                                       Unknown perturbance
IDV(18)   Unknown                                                       Unknown perturbance
IDV(19)   Unknown                                                       Unknown perturbance
IDV(20)   Unknown                                                       Unknown perturbance
IDV(21)   Valve for flow 4 is fixed in steady state position            Constant position
[0031] Based on the above, the technical scheme of the invention is applied to the TE process simulation data, with the following implementation steps:
A. Offline Modeling Stage
[0032] 1) Obtain normal historical data X as training data, and standardize each variable to obtain X′;
[0033] 2) Calculate the low-dimensional feature $Y_{tSNE}$ of $X'$ by standard t-SNE;
[0034] 3) Calculate the kernel matrices $K_x$ and $K_y$ of $X'$ and $Y_{tSNE}$, respectively, according to equations (2) and (3). In this experiment, the kernel parameters are set to $\sigma_x = 2$ and $\sigma_y = 6$;
[0035] 4) Calculate the mapping parameter matrix $W$ between the kernel matrices by equation (4);
[0036] 5) Transform the matrix $K_y$ into the final required two-dimensional feature $Y$ by PCA;
[0037] 6) Calculate the square Mahalanobis distance as the monitoring statistic, and use kernel density estimation to calculate δ, its 95% confidence limit, as the fault monitoring control limit;
[0038] 7) Draw the scatter diagram and the ellipse control limit of two-dimensional features.
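The offline steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: equations (1) to (3) are not reproduced in this text, so z-score standardization and a Gaussian kernel of the form exp(-||a - b||^2 / (2σ^2)) are assumed; the PCA step centers the columns of $K_y$ before projection; and the KDE bandwidth follows a normal-reference rule of thumb. The t-SNE embedding is taken as an input (it could come from, for example, scikit-learn's `TSNE`), and all function and key names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Assumed kernel form: exp(-||a - b||^2 / (2 * sigma^2)) for every row pair.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kde_limit(stat, alpha=0.95, grid_size=2000):
    # Gaussian KDE of the statistic; the bandwidth rule is an assumption.
    stat = np.asarray(stat, dtype=float)
    h = 1.06 * stat.std(ddof=1) * len(stat) ** (-0.2)
    grid = np.linspace(stat.min() - 3 * h, stat.max() + 3 * h, grid_size)
    pdf = np.exp(-0.5 * ((grid[:, None] - stat[None, :]) / h) ** 2).sum(axis=1)
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]
    # First grid point at which the estimated CDF reaches alpha.
    return float(grid[np.searchsorted(cdf, alpha)])

def fit_offline(X, Y_tsne, sigma_x=2.0, sigma_y=6.0):
    # Step 1: standardize each variable (z-score with sample std, assumed).
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd
    # Step 3: kernel matrices of the standardized data and the t-SNE features.
    Kx = gaussian_kernel(Xs, Xs, sigma_x)
    Ky = gaussian_kernel(Y_tsne, Y_tsne, sigma_y)
    # Step 4: least-squares W with Kx @ W ~= Ky; this coincides with
    # equation (4) whenever Kx^T Kx is invertible.
    W = np.linalg.lstsq(Kx, Ky, rcond=None)[0]
    # Step 5: PCA of Ky via SVD; column centering is an assumption.
    ky_mean = Ky.mean(axis=0)
    _, _, Vt = np.linalg.svd(Ky - ky_mean, full_matrices=False)
    P = Vt[:2].T
    Y = (Ky - ky_mean) @ P
    # Step 6: square Mahalanobis distance and its 95% KDE control limit.
    y_bar = Y.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Y, rowvar=False))
    D = Y - y_bar
    T2 = np.einsum("ij,jk,ik->i", D, S_inv, D)
    delta = kde_limit(T2)
    return {"mu": mu, "sd": sd, "Xs": Xs, "sigma_x": sigma_x, "W": W,
            "ky_mean": ky_mean, "P": P, "y_bar": y_bar, "S_inv": S_inv,
            "delta": delta, "Y": Y, "T2": T2}
```

The returned dictionary holds everything the online stage needs: the standardization parameters, the training kernel inputs, $W$, $P$, and the control limit δ. Points with `T2 <= delta` lie inside the ellipse control limit drawn in step 7.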
B. Online Monitoring Stage
[0039] 1) Collect the data of all variables at the current time i to obtain $x_{new,i}$, and standardize it using the mean and variance of each variable obtained offline to obtain $x'_{new,i}$;
[0040] 2) Calculate the kernel function between $x'_{new,i}$ and all normal training data X to obtain the kernel vector $k_{x,i}$;
[0041] 3) Obtain the kernel vector of the feature by bi-kernel mapping: $k_{y,i} = k_{x,i} W$;
[0042] 4) Reduce $k_{y,i}$ to two dimensions by PCA: $y_i = k_{y,i} P$;
[0043] 5) Plot the feature $y_i$ as a point in the scatter diagram to visualize fault monitoring, and observe whether the point falls outside the ellipse control limit to judge whether a fault has occurred. In addition, the value of the statistic can be calculated by equation (5) and compared with the control limit δ to judge quantitatively whether a fault has occurred.
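A minimal sketch of the online steps for a single sample is given below. It assumes the quantities produced offline are stored in a dictionary `m` whose keys are hypothetical, uses the same assumed Gaussian kernel form (equations (2) and (3) are not reproduced in this text), and treats the kernel vector as a row vector so that the product with $W$ is consistent with equation (4). The column mean `ky_mean` is subtracted before projection on the assumption that the offline PCA centered $K_y$.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    # Assumed kernel form: exp(-||a - b||^2 / (2 * sigma^2)) for every row pair.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def monitor_sample(x_new, m):
    """Map one raw sample to its 2-D feature, statistic, and fault flag."""
    # Step 1: standardize with the offline mean and standard deviation.
    x_std = (np.asarray(x_new, dtype=float) - m["mu"]) / m["sd"]
    # Step 2: kernel vector against all standardized training samples, shape (1, n).
    k_x = gaussian_kernel(x_std[None, :], m["Xs"], m["sigma_x"])
    # Step 3: bi-kernel mapping to the feature kernel vector.
    k_y = k_x @ m["W"]
    # Step 4: project to two dimensions with the offline loading matrix P.
    y = ((k_y - m["ky_mean"]) @ m["P"])[0]
    # Step 5: square Mahalanobis distance compared with the control limit.
    d = y - m["y_bar"]
    t2 = float(d @ m["S_inv"] @ d)
    return y, t2, bool(t2 > m["delta"])
```

A sample is flagged as faulty exactly when its point falls outside the ellipse, since the ellipse is the level set of the statistic at δ.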
[0044] To verify the accuracy and effectiveness of the proposed method for fault monitoring, faults 1, 4 and 14 of the TE process were tested, and the results were compared with those of the PCA, LPP and NPE methods. Two-dimensional features are retained in all three comparison methods, and the square Mahalanobis distance is used as the statistic to draw a scatter diagram for visualization. The visualization results for faults 1, 4, and 14 are shown in the accompanying drawings.
[0045] The black hollow triangles represent the normal training features, the black solid circles represent the normal test data, the gray solid circles represent the test fault data, and the elliptical dotted line represents the control limit. Each test fault contains 800 fault samples, and different gray gradients indicate the sequence of the fault samples, so that the visualization diagram shows how the distribution of fault features changes over time.
[0046] Fault 1 is a phase step change in the feed flow ratio of A/C. At the beginning of the change, each variable fluctuates noticeably, and after a period of time the process control system stabilizes the process in a new state. The results of the bi-kernel t-SNE method show this clearly: the fault features deviate greatly in the initial stage and gradually settle in another region in the later stage. Although the features of PCA, LPP and NPE deviate at the initial stage of the fault, the features at the later stage basically coincide with the normal feature range and therefore do not reflect the difference from the normal state. For faults 4 and 14, most of the fault features extracted by the PCA, LPP and NPE methods fall within the normal range, so only a small part of the fault samples can be detected, while bi-kernel t-SNE detects almost all the fault samples.
[0047] The bi-kernel t-SNE method achieves a high fault detection rate, and its visualization effect is clearly superior to that of the PCA, LPP and NPE methods. This is because the features extracted by the t-SNE method contain more information than those extracted by the PCA, LPP and NPE methods, and bi-kernel mapping extends this advantage to online applications.
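The text does not define how the fault detection rate is computed. One common convention, assumed here, is the fraction of post-fault samples whose statistic exceeds δ, with the false alarm rate defined analogously on the fault-free prefix. The sketch below uses the 160/800 split of each test set; the statistic values and the limit are made up purely to exercise the function and are not results from the experiment.

```python
import numpy as np

def detection_rates(t2, delta, n_normal=160):
    """Return (fault detection rate, false alarm rate) for one test set.

    Assumes the first n_normal samples are fault-free and the rest are faulty.
    """
    alarms = np.asarray(t2, dtype=float) > delta
    false_alarm_rate = float(alarms[:n_normal].mean())
    fault_detection_rate = float(alarms[n_normal:].mean())
    return fault_detection_rate, false_alarm_rate

# Hypothetical statistic values: 160 normal samples followed by 800 fault samples.
t2 = np.concatenate([np.full(160, 1.0), np.full(800, 10.0)])
t2[:8] = 7.0        # eight normal samples that exceed the limit
t2[160:200] = 2.0   # forty fault samples that stay below the limit
fdr, far = detection_rates(t2, delta=6.0)
print(fdr, far)
```

The missed detection rate is then one minus the fault detection rate.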