METHOD FOR PREDICTING DIOXIN EMISSION CONCENTRATION
20220092482 · 2022-03-24
CPC classification: G06N7/01 (Physics); G06N5/01 (Physics)
Abstract
A method for predicting dioxin (DXN) emission concentration based on hybrid integration of random forest (RF) and gradient boosting decision tree (GBDT). Random sampling of training samples and input features is performed on modeling data with a small sample size and high-dimensional characteristics to generate training subsets. J RF-based DXN sub-models are established from the training subsets. J×I GBDT-based DXN sub-models are established by performing I iterations on each of the RF-based DXN sub-models. Predicted outputs of the RF-based DXN sub-models and the GBDT-based DXN sub-models are combined by a simple average weighting method to obtain a final output.
Claims
1. A method for predicting dioxin (DXN) emission concentration, comprising: (S1) performing, by a training sample and input feature random sampling module, a random sampling with replacement N times on a training sample set {X ∈ R^(N×M), y ∈ R^(N×1)} and a random selection of a fixed number of input features from the training sample set to generate training subsets, wherein X = {x_n}_(n=1)^N ∈ R^(N×M) represents input data {x | x_1, . . . , x_m, . . . , x_M} consisting of process variables of a municipal solid waste incineration (MSWI) process acquired by a process control system while collecting a DXN test sample; the process variables comprise furnace temperature, activated carbon injection amount, stack emission gas concentration, grate speed, primary air flow and secondary air flow; N is the number of training samples; M is the number of process variables; and y = {y_n}_(n=1)^N ∈ R^(N×1) represents output data consisting of the DXN emission concentration at an end of the MSWI process, wherein the end of the MSWI process is a stack emission end, and the DXN emission concentration is obtained by online collection and offline analysis; (S2) establishing, by a random forest (RF)-based DXN sub-model establishing module, RF-based DXN sub-models {f_RF^j(·)}_(j=1)^J by utilizing the training subsets {X^j, y^j}_(j=1)^J; and subtracting predicted values {ŷ^j}_(j=1)^J of the DXN emission concentration from measured values {y^j}_(j=1)^J of the DXN emission concentration to obtain prediction errors {e^(j,0)}_(j=1)^J; (S3) performing, by a gradient boosting decision tree (GBDT)-based DXN sub-model establishing module, I iterations on each of new training subsets {X^j, e^(j,0)}_(j=1)^J to build I×J GBDT-based DXN sub-models {{f_GBDT^(j,i)(·)}_(i=1)^I}_(j=1)^J, wherein each new training subset is formed by the prediction error {e^(j,0)}_(j=1)^J as an output data true value and the input data of the training subset {X^j}_(j=1)^J; (S4) subjecting, by a simple average-based DXN integrated prediction module, the RF-based DXN sub-models {f_RF^j(·)}_(j=1)^J and the GBDT-based DXN sub-models {{f_GBDT^(j,i)(·)}_(i=1)^I}_(j=1)^J to simple averaging to establish a DXN emission concentration prediction model f_DXN(·); and (S5) taking the input data {x | x_1, . . . , x_m, . . . , x_M} as an input of the DXN emission concentration prediction model; and calculating, successively by the RF-based DXN sub-model establishing module, the GBDT-based DXN sub-model establishing module and the simple average-based DXN integrated prediction module, a current DXN emission concentration value as a DXN emission concentration predicted value of the MSWI process.
2. The method of claim 1, wherein the training sample and input feature random sampling module is operated through steps of: processing data of the process variables of the MSWI process by a Bootstrap method and a random subspace method (RSM); extracting the training subsets by the Bootstrap method, wherein the number of samples in each training subset is the same as the number of samples in the training sample set; and introducing the RSM to randomly select some features to generate J training subsets, each comprising N training samples and M^j input features, expressed as follows:
3. The method of claim 2, wherein the RF-based DXN sub-model establishing module is operated through the following steps with the jth training subset {(x^(j,M^
4. The method of claim 3, wherein the GBDT-based DXN sub-model establishing module is operated through steps of: establishing multiple weak learner models in series, wherein the input data of the training subset of the multiple weak learner models is unchanged; a true value of output data of a training subset of a first GBDT-based DXN sub-model is an error between the predicted output of the RF-based DXN sub-model and the measured value; and a true value of output data of a training subset of each other GBDT-based DXN sub-model is a prediction error of the GBDT-based DXN sub-model from the previous iteration; taking establishment of a jth GBDT-based DXN sub-model as an example, and supposing that I GBDT-based DXN sub-models are to be established by the CART: establishing a first GBDT-based DXN sub-model:
5. The method of claim 4, wherein the simple average-based DXN integrated prediction module is operated through steps of: indicating the J RF-based DXN sub-models established in parallel as {f_RF^j(·)}_(j=1)^J; and indicating the J×I GBDT-based DXN sub-models established in series and parallel simultaneously as {{f_GBDT^(j,i)(·)}_(i=1)^I}_(j=1)^J.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF EMBODIMENTS
Description of MSWI Process for DXN Generation
[0020] MSW is transported by vehicle to a weighbridge to be weighed, and is then discharged into a garbage pool. After being biologically fermented and dehydrated for 3-7 days, the MSW is transferred to a garbage hopper by a grab, fed to an incineration grate through a feeder, and subjected to drying, burning and incineration successively. Combustible components of the dried MSW are burned in the combustion air delivered by a primary air fan. The ash residue generated by burning falls from the end of the grate onto a slag conveyor, is transported to a slag pit, and is finally landfilled at a designated location. The temperature of the flue gas produced in the combustion process should be controlled above 850° C. in the first combustor to ensure complete decomposition and combustion of harmful gases. When the flue gas passes through the second combustor, air delivered by a secondary air fan generates turbulence, which ensures that the residence time of the flue gas exceeds 2 s, such that the harmful gases are further decomposed. The flue gas then enters a waste heat boiler and absorbs heat to generate high-temperature steam, which drives a turbo-generator set to generate electricity. Subsequently, the flue gas is mixed with lime and activated carbon and enters a deacidification reactor to undergo a neutralization reaction, allowing the DXN and heavy metals therein to be adsorbed. Flue gas particles, neutralization reactants and activated carbon are then removed in a bag filter. Part of the gas and ash mixture is mixed with water in a mixer and transported back into the deacidification reactor for repeated treatment. Fly ash produced in the deacidification reactor and the bag filter enters a fly ash tank and must be transported for further processing. The final flue gas, which contains soot, CO, NOx, SO2, HCl, HF, Hg, Cd, DXN and so on, is emitted to the atmosphere through a stack by an induced draft fan.
[0021] As shown in
[0022] As shown in
[0023] In
[0024] All sub-models of the DXN emission concentration prediction model based on hybrid integration (EnRFGBDT) herein are established by fully grown classification and regression trees (CART). The training subsets and input features of the RF-based DXN sub-models are generated by random sampling, where the number of features of each RF-based DXN sub-model is much smaller than that of the initial modeling data; therefore, the correlation between the CARTs is reduced, and robustness to outliers and noisy data is improved. Multiple GBDT-based DXN sub-models in series further improve the prediction precision of the CART. As a consequence, the DXN emission concentration prediction model with a "parallel + series" structure is established. The different modules operate as follows.
[0025] (1) Random sampling with replacement N times on the training sample set {X ∈ R^(N×M), y ∈ R^(N×1)} and random selection of a fixed number of input features from the training sample set are performed by the training sample and input feature random sampling module to generate the training subsets {X^j, y^j}_(j=1)^J.
[0026] (2) RF-based DXN sub-models {f_RF^j(·)}_(j=1)^J are established by the random forest (RF)-based DXN sub-model establishing module. The predicted values {ŷ^j}_(j=1)^J of the DXN emission concentration are subtracted from the measured values {y^j}_(j=1)^J of the DXN emission concentration to obtain the prediction errors {e^(j,0)}_(j=1)^J.
[0027] (3) I iterations are performed on each of the new training subsets {X^j, e^(j,0)}_(j=1)^J to build I×J GBDT-based DXN sub-models {{f_GBDT^(j,i)(·)}_(i=1)^I}_(j=1)^J by the GBDT-based DXN sub-model establishing module, where each new training subset is formed by the prediction error {e^(j,0)}_(j=1)^J as the output data true value and the input data of the training subset {X^j}_(j=1)^J.
[0028] (4) The RF-based DXN sub-models {f_RF^j(·)}_(j=1)^J and the GBDT-based sub-models {{f_GBDT^(j,i)(·)}_(i=1)^I}_(j=1)^J are subjected to simple averaging by the simple average-based DXN integrated prediction module to establish the DXN emission concentration prediction model f_DXN(·).
[0029] Accordingly, the steps of the modeling method herein are as follows.
[0030] (1) A random sampling with replacement and a random selection of a fixed number of input features are performed on the process variable of the MSWI process to generate J training subsets.
[0031] (2) J RF-based DXN sub-models {f_RF^j(·)}_(j=1)^J are established.
[0032] (3) I iterations are performed to build I×J GBDT-based DXN sub-models {{f_GBDT^(j,i)(·)}_(i=1)^I}_(j=1)^J, where the prediction error {e^(j,0)}_(j=1)^J of {f_RF^j(·)}_(j=1)^J is used as the output data true value.
[0033] (4) The RF-based DXN sub-model and the GBDT-based sub-model are subjected to simple averaging to establish the DXN emission concentration prediction model.
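The four modeling steps above can be sketched end-to-end. The sketch below is a minimal illustration of the "parallel + series" data flow only; it substitutes a trivial mean predictor for the fully grown CARTs the patent uses, and the function names, J, I, M_sub and the seed are illustrative assumptions, not the patent's.

```python
import numpy as np

def mean_learner(X, target):
    """Placeholder weak learner: always predicts the target mean.
    (The patent uses fully grown CARTs here.)"""
    c = target.mean()
    return lambda X_: np.full(len(X_), c)

def fit_en_rf_gbdt(X, y, J=3, I=2, M_sub=2, seed=0):
    """Hybrid 'parallel + series' ensemble: J bootstrapped/RSM subsets
    in parallel; per subset, one base model plus I residual models in series."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    subset_cols, rf_models, gbdt_models = [], [], []
    for _ in range(J):
        rows = rng.integers(0, N, size=N)                # step (1): Bootstrap
        cols = rng.choice(M, size=M_sub, replace=False)  # step (1): RSM
        Xj, yj = X[np.ix_(rows, cols)], y[rows]
        f_rf = mean_learner(Xj, yj)                      # step (2): base sub-model
        e = yj - f_rf(Xj)                                # e^{j,0}
        chain = []
        for _ in range(I):                               # step (3): series residual models
            f_g = mean_learner(Xj, e)
            chain.append(f_g)
            e = e - f_g(Xj)
        subset_cols.append(cols)
        rf_models.append(f_rf)
        gbdt_models.append(chain)

    def f_dxn(X_):                                       # step (4): simple average
        outs = []
        for cols, f_rf, chain in zip(subset_cols, rf_models, gbdt_models):
            Xs = X_[:, cols]
            outs.append(f_rf(Xs) + sum(f(Xs) for f in chain))
        return np.mean(outs, axis=0)
    return f_dxn
```

Replacing `mean_learner` with a real regression tree recovers the structure described in the modules below.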
Description of Work Process Of Training Sample and Input Feature Random Sampling Module
[0034] The process variable of the MSWI process is processed by a Bootstrap method and a random subspace method (RSM).
[0035] The training subsets are extracted by the Bootstrap method, where the number of samples in each training subset is the same as the number of samples in the training sample set.
[0036] Then the RSM is introduced to randomly select some features to generate J training subsets, each including N training samples and M^j input features.
[0037] Generation of the training subset is expressed as follows:
[0038] where {X^j, y^j} is the jth training subset; (x^(j,M^
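The Bootstrap-plus-RSM subset generation described above can be sketched as follows. This is a minimal numpy-based sketch; the function name and the seed argument are illustrative, not from the patent.

```python
import numpy as np

def sample_training_subsets(X, y, J, M_sub, seed=0):
    """Generate J training subsets: the Bootstrap draws N rows with
    replacement, and the RSM picks M_sub feature columns without
    replacement for each subset."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    subsets = []
    for _ in range(J):
        rows = rng.integers(0, N, size=N)                # Bootstrap: N draws with replacement
        cols = rng.choice(M, size=M_sub, replace=False)  # RSM: random feature subset
        subsets.append((X[np.ix_(rows, cols)], y[rows], cols))
    return subsets
```

Each returned triple carries the sampled inputs, targets and the chosen feature indices, so downstream sub-models know which columns to read at prediction time.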
Description of Work Process of RF-Based DXN Sub-Model Establishing Module
[0039] The RF-based DXN sub-model establishing module is operated through the following steps with the jth training subset {(x^(j,M^
[0040] Duplicate samples are removed from the jth training subset {(x^(j,M^
An mth input feature x^(j,m) is taken as the splitting variable, and a value x_(n_
[0041] The number of the optimal splitting variable and the value of the splitting point are found by traversing all input features based on the following criterion:

min_(m,s) [ min_(C_1) Σ_(x^(j,m) ∈ R_1(m,s)) (y_1^j − C_1)² + min_(C_2) Σ_(x^(j,m) ∈ R_2(m,s)) (y_2^j − C_2)² ]

[0042] where y_1^j and y_2^j are the measured values of DXN emission concentration of the jth training subset in R_1 and R_2, respectively; and C_1 and C_2 are the mean values of the measured DXN emission concentration in R_1 and R_2, respectively.
[0043] The above processes are repeated for R_1 and R_2, respectively, until the number of training samples in a leaf node is less than a preset threshold ∂_RF, splitting the input feature space into K areas. The K areas are marked as R_1, . . . , R_k, . . . , R_K, respectively, where K is the number of leaf nodes of the CART.
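The least-squares splitting search above can be written directly. A minimal sketch follows (exhaustive search over features and candidate thresholds; `best_split` is an illustrative name, not from the patent):

```python
import numpy as np

def best_split(X, y):
    """Return (feature m, threshold s, SSE) minimizing the summed squared
    error around the two half-region means C1 and C2."""
    best_m, best_s, best_sse = None, None, np.inf
    for m in range(X.shape[1]):
        for s in np.unique(X[:, m]):
            left = X[:, m] <= s
            if left.all() or not left.any():
                continue  # a valid split must leave both regions non-empty
            c1, c2 = y[left].mean(), y[~left].mean()
            sse = ((y[left] - c1) ** 2).sum() + ((y[~left] - c2) ** 2).sum()
            if sse < best_sse:
                best_m, best_s, best_sse = m, s, sse
    return best_m, best_s, best_sse
```

Applying this recursively to R_1 and R_2 until a leaf holds fewer than the threshold number of samples yields the K leaf regions described above.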
[0044] The RF-based DXN sub-model established by the CART is expressed as follows:

f_RF^j(x) = Σ_(k=1)^K C_k · I(x ∈ R_k), with C_k = (1 / N_(R_k)) Σ_(n=1)^(N_(R_k)) (y_(R_k)^j)_n

[0045] where N_(R_k) is the number of training samples in the area R_k, and (y_(R_k)^j)_n is the nth measured value of DXN emission concentration in R_k.
[0046] The prediction error of the RF-based DXN sub-model established on the jth training subset is calculated as follows:

(e^(j,0))_n = y_n^j − (ŷ_RF^j)_n, n = 1, . . . , N

[0047] where (e^(j,0))_n is the predicted error of DXN emission concentration for the nth training sample.
[0048] The above processes are repeated to obtain J RF-based DXN sub-models {f_RF^j(·)}_(j=1)^J established by the CART. The predicted outputs {ŷ_RF^j}_(j=1)^J of the J RF-based DXN sub-models are respectively subtracted from the measured values {y^j}_(j=1)^J to obtain the prediction errors {e^(j,0)}_(j=1)^J.
Description of Work Process of the GBDT-Based DXN Sub-Model Establishing Module
[0049] Multiple weak learner models are established in series, where the input data of the training subset of the multiple weak learner models is unchanged. The true value of the output data of the training subset of the first GBDT-based DXN sub-model is the error between the predicted output of the RF-based DXN sub-model and the measured value. The true value of the output data of the training subset of each other GBDT-based DXN sub-model is the prediction error of the GBDT-based DXN sub-model from the previous iteration.
[0050] Establishment of the jth GBDT-based DXN sub-model is taken as an example. Suppose I GBDT-based DXN sub-models are to be established by the CART.
[0051] The first GBDT-based DXN sub-model is established on the training subset {X^j, e^(j,0)}:

ŷ_GBDT^(j,1) = f_GBDT^(j,1)(X^j)

[0052] where ŷ_GBDT^(j,1) is the predicted output of the first GBDT-based DXN sub-model.
[0053] The loss function of the first GBDT-based DXN sub-model is defined as the square error:

L^(j,1) = Σ_(n=1)^N ((e^(j,0))_n − (ŷ_GBDT^(j,1))_n)²

[0054] where (ŷ_GBDT^(j,1))_n is the predicted value of the nth sample in the jth training subset.
[0055] The output residual e^(j,1) of the first GBDT-based DXN sub-model f_GBDT^(j,1)(·) is calculated as follows:

e^(j,1) = e^(j,0) − ŷ_GBDT^(j,1)
[0056] The e^(j,1) is taken as the true value of the output data of the training subset of the second GBDT-based DXN sub-model f_GBDT^(j,2)(·), which is expressed as follows:

ŷ_GBDT^(j,2) = f_GBDT^(j,2)(X^j)

[0057] where (e^(j,1))_n is the predicted error of the first GBDT-based DXN sub-model for the nth sample.
[0058] The above processes are repeated. The ith (i ≤ I) GBDT-based DXN sub-model is marked as f_GBDT^(j,i)(·), whose output residual is expressed as follows:

e^(j,i) = e^(j,i−1) − ŷ_GBDT^(j,i)
[0059] After I−1 iterations, the true value of the output data of the training subset of the Ith GBDT-based DXN sub-model is expressed as follows:

e^(j,I−1) = e^(j,I−2) − ŷ_GBDT^(j,I−1)

[0060] where ŷ_GBDT^(j,I−1) is the predicted output of the (I−1)th GBDT-based DXN sub-model.
[0061] The Ith GBDT-based DXN sub-model is expressed as follows:

ŷ_GBDT^(j,I) = f_GBDT^(j,I)(X^j)

[0062] where (e^(j,I−1))_n is the predicted error of the (I−1)th GBDT-based DXN sub-model for the nth sample.
[0063] As a consequence, the I GBDT-based DXN sub-models based on the jth training subset are expressed as {f_GBDT^(j,i)(·)}_(i=1)^I, and their outputs are expressed as {ŷ_GBDT^(j,i)}_(i=1)^I.
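The series of residual-fitting models above can be sketched generically. The sketch assumes any weak learner factory `fit_weak(X, target) -> predict` (the patent uses CARTs; the mean learner below is only a stand-in to exercise the chain, and both names are illustrative):

```python
import numpy as np

def fit_gbdt_chain(X, e0, I, fit_weak):
    """Fit I weak learners in series: model i is trained on the output
    residual of model i-1, starting from the RF residual e^{j,0}."""
    models, residual = [], e0.copy()
    for _ in range(I):
        predict = fit_weak(X, residual)
        models.append(predict)
        residual = residual - predict(X)  # e^{j,i} = e^{j,i-1} - yhat^{j,i}
    return models, residual

def mean_learner(X, target):
    """Trivial stand-in weak learner: always predicts the target mean."""
    c = target.mean()
    return lambda X_: np.full(len(X_), c)
```

The returned `residual` is the final e^(j,I), which shrinks toward zero as the weak learners capture more of the remaining error.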
Description of Work Process of Simple Average-Based DXN Integrated Prediction Module
[0064] The J RF-based DXN sub-models established in parallel are indicated as {f_RF^j(·)}_(j=1)^J. The J×I GBDT-based DXN sub-models, established in series and in parallel simultaneously, are indicated as {{f_GBDT^(j,i)(·)}_(i=1)^I}_(j=1)^J.
[0065] For the jth training subset, one RF-based DXN sub-model and I GBDT-based DXN sub-models are established in series. The sum of the predicted outputs of the RF-based DXN sub-model and the I GBDT-based DXN sub-models is taken as the total output of the jth training subset, which is expressed as follows:

ŷ^j = ŷ_RF^j + Σ_(i=1)^I ŷ_GBDT^(j,i)
[0066] Since the J training subsets are parallel, the RF-based DXN sub-models are combined with the GBDT-based DXN sub-models through the simple average weighting method, where the prediction model f_DXN(·) is expressed as follows:

f_DXN(x) = (1/J) Σ_(j=1)^J [ f_RF^j(x) + Σ_(i=1)^I f_GBDT^(j,i)(x) ]
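Given arrays of sub-model predicted outputs, the integration above reduces to a sum along the series dimension and a mean along the parallel dimension. A minimal sketch (array shapes and the function name are illustrative):

```python
import numpy as np

def integrate_outputs(y_rf, y_gbdt):
    """y_rf: (J, N) RF sub-model outputs; y_gbdt: (J, I, N) GBDT outputs.
    Series: each subset's total output is its RF prediction plus the
    I residual corrections. Parallel: simple average over the J subsets."""
    y_total = y_rf + y_gbdt.sum(axis=1)  # (J, N): yhat^j = yhat_RF^j + sum_i yhat_GBDT^{j,i}
    return y_total.mean(axis=0)          # f_DXN output, shape (N,)
```

Note that the J subsets are weighted equally; no learned combination weights are involved, which matches the simple average weighting described above.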
Description of Work Process of Prediction of DXN Emission Concentration Based on EnRFGBDT Method
[0067] The DXN emission concentration prediction model is established based on the training sample and input feature random sampling module, the RF-based DXN sub-model establishing module, the GBDT-based DXN sub-model establishing module and the simple average-based DXN integrated prediction module.
[0068] The process variables of the MSWI process, including furnace temperature, activated carbon injection amount, stack emission gas concentration, grate speed, primary air flow and secondary air flow, are taken as the input {x | x_1, . . . , x_m, . . . , x_M} of the DXN emission concentration prediction model. The input is processed successively by the RF-based DXN sub-model establishing module, the GBDT-based DXN sub-model establishing module and the simple average-based DXN integrated prediction module, and the current DXN emission concentration value is taken as the DXN emission concentration predicted value of the MSWI process.
Experimental Verification
Modeling Data
[0069] The modeling data herein are the inspection data of incinerator 1# and incinerator 2# of an MSWI power plant in Beijing over the past 6 years, including the process variables as the input data and the measured values of the DXN emission concentration as the output data. The process variables are obtained from the plant subsystems: 53 from the power generation system, 115 from the public electrical system, 14 from the waste heat boiler system, 79 from the incineration system, 20 from the flue gas treatment system and 6 from the terminal detection system. The DXN emission concentration is obtained by online collection and offline analysis, and its unit is ng/Nm³. Two thirds of the 67 samples (45 samples) are used as training data and one third (22 samples) as testing data.
Modeling Experiment
[0070] The square error is taken as the loss function in both the RF method and the GBDT method. The number of training samples is 45. The range of the number of input features is [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]. The range of the iteration times of the GBDT is [1, 2, 3, 4, 5, 6, 7, 8, 9]. The minimum number of training samples included in a leaf node of the CART is 3. The out-of-bag (OOB) data sampled by the Bootstrap algorithm is configured to perform model testing, with the root-mean-square error (RMSE) as the evaluation index.
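The OOB evaluation described above can be sketched as follows: samples never drawn by a subset's Bootstrap form its held-out set, and the RMSE is computed on them. Function names are illustrative, not from the patent.

```python
import numpy as np

def oob_mask(rows, N):
    """Out-of-bag samples: indices of the N training samples that were
    never drawn by the Bootstrap for this subset."""
    mask = np.ones(N, dtype=bool)
    mask[rows] = False
    return mask

def rmse(y, y_hat):
    """Root-mean-square error, the evaluation index used above."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

Because each Bootstrap draw leaves roughly a third of the samples out of bag, this gives a model test set without splitting the already small training set further.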
[0071] For the DXN emission concentration prediction model based on RF, the relationship between the number of input features and the OOB error is shown in Table 1, where the number of CARTs is 5 (results are an average of 50 experiments).
TABLE 1. OOB error under different numbers of input features

  Number of CARTs   Number of input features   OOB error
  5                 5                          1.254442
  5                 10                         1.088729
  5                 15                         1.071183
  5                 17                         1.083559
  5                 20                         1.186283
  5                 25                         1.140331
  5                 30                         1.254986
[0072] As shown in Table 1, the OOB error is minimal in the case of 15 input features. The relationship between the number of CARTs in the DXN emission concentration prediction model based on RF and the OOB error is shown in Table 2, where the number of input features is fixed. The experimental result is an average of 50 experiments.
TABLE 2. OOB error under different numbers of CARTs

  Number of input features   Number of CARTs   OOB error
  15                         10                1.1185
  15                         20                1.0924
  15                         30                1.08139
  15                         40                1.0806
  15                         50                1.0972
  15                         60                1.1153
  15                         70                1.1128
  15                         80                1.1281
  15                         90                1.1248
  15                         100               1.1280
[0073] As shown in Table 2, the OOB error of the DXN emission concentration prediction model based on RF is minimal in the case of 40 CARTs, and this minimum is slightly higher than the minimum in Table 1. Therefore, both the number of CARTs and the number of input features require optimization to obtain better prediction performance.
[0074] For the DXN emission concentration prediction model based on GBDT, the relationship between the square-error loss function and the iteration time is shown in Table 3.
TABLE 3. Relationship between the square-error loss function and the iteration time of the DXN emission concentration prediction model based on GBDT

  Iteration time   Number of CARTs   Value of loss function
  0                1                 44.0000
  1                2                 3.6268
  2                3                 0.6641
  3                4                 0.1888
  4                5                 0.05339
  5                6                 0.02496
  6                7                 0.008141
  7                8                 0.003389
  8                9                 0.002034
  9                10                0.001011
[0075] As shown in Table 3, the value of the loss function decreases gradually as the iteration time increases, and the decrease of the square error slows once the iteration time reaches 5. Accordingly, an appropriate iteration time is necessary to reduce computing consumption.
[0076] Therefore, the preferred parameters herein are as follows: the number of input features is 10; the number of CARTs is 5; and the number of GBDT-based DXN sub-models (iteration times) is 5. Statistical results of the training set and testing set based on the different methods are shown in Table 4.
TABLE 4. Statistical results of DXN emission concentration prediction models constructed respectively based on RF, GBDT and a combination thereof (EnRFGBDT)

  Method      Training set RMSE   Test set RMSE
  RF          0.34060             0.03019
  GBDT        0.02355             0.03529
  EnRFGBDT    0.01478             0.02844
[0077] As shown in Table 4,
[0078] Aiming at the technical problem of real-time detection of DXN emission concentration based on the process variables of the MSWI process, a DXN emission concentration prediction model based on hybrid integration of RF and GBDT is provided. The prediction model herein has the following novelty: the first DXN sub-model is established based on RF, and the other multiple DXN sub-models are established based on GBDT, so that the dimension and the prediction error of the model are reduced simultaneously. Results of simulation experiments based on real data of the MSWI process indicate that the prediction method herein has outstanding prediction performance compared with prediction models based merely on RF or GBDT.