METHOD OF ANALYZING INFLUENCE FACTOR FOR PREDICTING CARBON DIOXIDE CONCENTRATION OF ANY SPATIOTEMPORAL POSITION

20230186173 · 2023-06-15

    Abstract

    The disclosure provides a method of analyzing an influence factor for predicting a carbon dioxide concentration of any spatiotemporal position. Firstly, an atmospheric carbon dioxide spatiotemporal distribution simulation method is proposed. This simulation method constructs a simulation model simulating carbon dioxide concentration distribution of any position of a region based on machine learning algorithm in combination with carbon dioxide data of satellite observation and corresponding environmental factors; next, by use of a global sensitivity analysis method, quantitative evaluation on the importance of multiple influence factors for regional carbon dioxide distribution is achieved.

    Claims

    1. A method of analyzing an influence factor for predicting a carbon dioxide concentration of any spatiotemporal position, the method comprising: 1) in combination with regional environmental characteristics, classifying environmental factors affecting regional carbon dioxide distribution into a plurality of factors comprising a ground coverage type factor, a vegetation coverage factor, a climate type factor, a precipitation factor, an atmospheric temperature factor, wind velocity and direction factors, an anthropogenic emission amount factor, and a biomass combustion emission factor; wherein, in 1), the vegetation coverage factor is from the L3 Normalized Difference Vegetation Index product of the Moderate-Resolution Imaging Spectroradiometer (MODIS) satellite; the ground coverage type factor is from the annual global land coverage data from the European Space Agency; the climate type factor is from the Köppen climate zoning dataset; the precipitation factor and the atmospheric temperature factor are from the Chinese 1 km-resolution monthly average precipitation and atmospheric temperature data from the National Tibetan Plateau Data Center; the wind velocity and direction factors are from the wind velocity and direction data of the ERA5 dataset; the anthropogenic emission amount factor is from the high resolution global anthropogenic emission dataset ODIAC, and the biomass combustion emission factor is from the biomass combustion emission amount data of the Global Fire Emissions Database GFED4; 2) in combination with OCO-2 satellite carbon dioxide observation data and the environmental factors, using the eXtreme Gradient Boosting tree (XGBoost) machine learning algorithm to construct a Regional Carbon Dioxide Spatiotemporal distribution simulation (RCDS) model and training the simulation model using a training dataset; 3) for the constructed RCDS model, first using a test dataset to verify a model prediction accuracy, and then inputting environmental factor data without satellite observation into the trained carbon dioxide spatiotemporal distribution simulation model to obtain a predicted carbon dioxide concentration and finally obtaining a regional carbon dioxide concentration distribution graph; 4) in combination with the constructed regional carbon dioxide spatiotemporal distribution simulation model and a global sensitivity analysis method, calculating a sensitivity of the carbon dioxide concentration for each input environmental factor parameter; 5) collecting the sensitivities of the regional carbon dioxide concentration for different environmental factors obtained by the global sensitivity analysis method, and quantitatively analyzing the magnitude of the sensitivity of each parameter to finally determine an influence degree of each environmental factor on the regional carbon dioxide distribution.

    2. The method of claim 1, wherein the machine learning algorithm used in 2) is eXtreme Gradient Boosting tree (XGBoost), which is a tree ensemble model based on gradient boosting; the basic construction idea of the XGBoost model is: first constructing an initial sub-tree to fit the data and obtain a fitting residual, and constructing subsequent sub-trees based on the fitting residual of the preceding sub-trees until the fitting residual is less than a threshold, the final simulation result being a sum of all sub-tree results; the specific construction steps are as follows: initially constructing a weak base learner to obtain a residual corresponding to an initial sub-tree model; for each subsequent training iteration, based on the existing sub-tree model, adding one weak learner to fit the residual of the previous sub-tree model; through continuous learning, fitting K weak learners to reduce the residual between a model prediction result and a true value until the residual is less than a threshold, at which point training is terminated; the final model prediction result is obtained by weighted summation over the K base learners.

    3. The method of claim 1, wherein the specific implementation of performing training using the training dataset in 2) is as follows: first performing preprocessing for the training dataset, comprising data cleaning, data encoding and data transformation, wherein the data cleaning comprises removal of missing values, abnormal values and noise, and the data transformation comprises normalization and dimension reduction; the data encoding is to encode non-numerical features so that they can be input into the model for training, the environmental factors comprising the ground coverage type, the climate type and the wind direction being encoded by using one-hot encoding; performing normalization processing for the data in the following formula: $z_q = \frac{z_q - \mathrm{mean}(z_q)}{\mathrm{std}(z_q)}$ wherein mean(z.sub.q) is a mean value of data of environmental factor z.sub.q, and std(z.sub.q) is a standard deviation of the data of the environmental factor z.sub.q; next, inputting the preprocessed training dataset into the XGBoost model and performing parameter adjustment and further optimization for the XGBoost model, and repeating iterations to obtain an optimal carbon dioxide spatiotemporal distribution simulation model.

    4. The method of claim 2, wherein the base learner of the XGBoost model is a CART tree, and for a dataset D={(x.sub.i,y.sub.i)} with n samples and m features (|D|=n, x.sub.i∈R.sup.m, y.sub.i∈R), the final prediction value obtained by training is expressed below: $\hat{y}_i = \varphi(x_i) = \sum_{k=1}^{K} f_k(x_i)$ wherein K is a number of base learners, x.sub.i is an i-th sample, y.sub.i is a label corresponding to the i-th sample, f.sub.k(⋅) is a model of a k-th tree; the k-th tree is described by a leaf-node mapping q of the CART tree and a corresponding weight part ω, i.e.: $f_k(x_i) = \omega_{q(x_i)}$ wherein ω.sub.q(x.sub.i.sub.) is a weight of the leaf node q where the sample x.sub.i is located, and q(x.sub.i) is a position of the leaf node where the sample x.sub.i is located, that is, for any one sample x.sub.i, the weight at a particular leaf node is valued as ω.sub.q(x.sub.i.sub.); for each iteration, the model fits the residual of the previous prediction and therefore, when a t-th base learner is generated, the prediction model is expressed as: $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_k^{(t)}(x_i)$ a target function is expressed as: $L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_k^{(t)}(x_i)\right) + \Omega\left(f_k^{(t)}\right)$ wherein the target function contains two parts: a first part is the function l(y.sub.i, ŷ.sub.i.sup.(t-1)+f.sub.k.sup.(t)(x.sub.i)), which describes a difference between a true value and a fitting value based on Euclidean distance; a second part is Ω(f.sub.k.sup.(t)), which is a regularized part for preventing function overfitting, i.e. $\Omega\left(f_k^{(t)}\right) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$ used to limit the complexity of each tree and prevent model overfitting; T is the number of leaf nodes on the CART tree, γ and λ are hyperparameters that weight the penalty on the number of leaf nodes and the penalty on the leaf weights, respectively, ω.sub.j is a weight value of a j-th leaf node; to minimize the target function, XGBoost performs a second-order Taylor expansion of the target function, which is approximately expressed as: $L^{(t)} \approx \sum_{i=1}^{n}\left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_k^{(t)}(x_i) + \frac{1}{2} h_i \left(f_k^{(t)}(x_i)\right)^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$ wherein g.sub.i is a first-order derivative, defined as $g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$, h.sub.i is a second-order derivative $h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$, and the following result is obtained by substituting into the target function: $L^{(t)} \approx \sum_{j=1}^{T}\left[ \left(\sum_{i \in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right)\omega_j^2 \right] + \gamma T$ wherein I.sub.j={i | q(x.sub.i)=j} is the set of samples located at the j-th leaf node, and the term l(y.sub.i, ŷ.sub.i.sup.(t-1)), which does not depend on the t-th tree, is omitted; each iteration minimizes the target function to obtain the optimal leaf nodes of the t-th base learner and an optimal solution ω.sub.j corresponding to each leaf node.

    5. The method of claim 1, wherein the global sensitivity analysis method used in 4) is the Sobol method, the sensitivity of which is calculated by decomposing a total output variance into a sum of a variance of each parameter and variances of interactions among parameters, and then performing sensitivity grading calculation based on a ratio of a contribution of the parameter to the output variance; for each environmental factor, a change range and a probability distribution are calculated and then a corresponding sensitivity index is calculated in combination with the regional carbon dioxide spatiotemporal distribution simulation model; the regional carbon dioxide spatiotemporal distribution simulation model is expressed as: y=f(x.sub.1′, x.sub.2′, . . . , x.sub.p′), wherein f is a trained XGBoost model, x.sub.1′, x.sub.2′, . . . , x.sub.p′ are environmental factors affecting carbon dioxide distribution and are input parameters of the XGBoost model; the total variance of the XGBoost model is: $D = \int f^2(x')\,dx' - f_0^2$ wherein f.sub.0 is the constant term of the XGBoost model, i.e. the mean of the model output over the input space, and a partial variance of the XGBoost model, computed from the term f.sub.π.sub.1.sub.,π.sub.2.sub., . . . ,π.sub.s of the variance decomposition of f that depends only on the factors x.sub.π.sub.1′, . . . , x.sub.π.sub.s′, is: $D_{\pi_1,\pi_2,\ldots,\pi_s} = \int\cdots\int f_{\pi_1,\pi_2,\ldots,\pi_s}^2\left(x'_{\pi_1}, x'_{\pi_2}, \ldots, x'_{\pi_s}\right)\,dx'_{\pi_1}\,dx'_{\pi_2}\cdots dx'_{\pi_s}$ wherein 1≤π.sub.1< . . . <π.sub.s≤p, and s=1, 2, . . . , p, and the sensitivity S.sub.π.sub.1.sub.,π.sub.2.sub., . . . ,π.sub.s of each environmental factor is: $S_{\pi_1,\pi_2,\ldots,\pi_s} = \frac{D_{\pi_1,\pi_2,\ldots,\pi_s}}{D}$ wherein S.sub.π.sub.1 is a first-order sensitivity index of the environmental factor x.sub.π.sub.1′, which is used to represent an influence of the environmental factor x.sub.π.sub.1′ on the model output, S.sub.π.sub.1.sub.,π.sub.2.sub., . . . ,π.sub.s is an s-order sensitivity index of the environmental factors x.sub.π.sub.1′, x.sub.π.sub.2′, . . . , x.sub.π.sub.s′, which is used to represent a joint influence of s environmental factors on the model; further, a total sensitivity index of each environmental factor is obtained, and the total sensitivity index TS.sub.π of the environmental factor x.sub.π.sub.s′ is defined as: $TS_{\pi} = S_{\pi_1} + S_{\pi_1,\pi_2} + \cdots + S_{\pi_1,\pi_2,\ldots,\pi_s}$ the total sensitivity index of each environmental factor obtained by the Sobol method is used to evaluate the final sensitivity of the influence factors affecting the regional carbon dioxide distribution, achieving quantitative influence degree analysis.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0042] FIG. 1 is a flowchart illustrating a general method of an embodiment of the disclosure;

    [0043] FIG. 2A is a regional carbon dioxide distribution graph of satellite carbon dioxide observation data and FIG. 2B is modeling retrieval according to an embodiment of the disclosure; and

    [0044] FIGS. 3A-3B are sector graphs of the sensitivity indexes of the influence factors according to an embodiment of the disclosure.

    DETAILED DESCRIPTION

    [0045] To describe the technical solution and technical advantages of the disclosure in more detail, the disclosure will be fully described below in combination with specific embodiments and accompanying drawings.

    [0046] As shown in FIG. 1, the disclosure provides a method of analyzing an influence factor for predicting a carbon dioxide concentration of any spatiotemporal position. The method generally comprises two parts. In a first part, regional carbon dioxide simulation modeling is performed based on a machine learning algorithm to simulate regions that lack satellite carbon dioxide observations, so as to obtain the carbon dioxide spatiotemporal distribution pattern of the entire region; in a second part, according to the trained regional carbon dioxide spatiotemporal distribution simulation model, in combination with a global sensitivity analysis method, the importance degree of each environmental factor affecting regional carbon dioxide distribution is quantified. The specific implementation process is described below.

    [0047] 1. The specific steps of the regional carbon dioxide simulation modeling method based on machine learning algorithm are described below.

    [0048] At step 1, environmental factor data affecting regional carbon dioxide distribution are collected, including but not limited to ground coverage type, vegetation coverage, climate type, precipitation, atmospheric temperature, wind velocity and direction, anthropogenic emission amount statistic data, and biomass combustion emission amount of a region, and then matched with the satellite observation carbon dioxide data to obtain training and verification datasets of a machine learning model.

    [0049] The vegetation coverage is represented by normalized difference vegetation index data, which may be obtained from the L3 vegetation index product of the MODIS satellite; the anthropogenic emission statistics come from the high resolution global anthropogenic emission dataset ODIAC; the biomass combustion data comes from the Global Fire Emissions Database GFED4; the atmospheric temperature and precipitation data come from the Chinese 1 km-resolution monthly average precipitation and atmospheric temperature datasets provided by the National Tibetan Plateau Data Center; the ground coverage data comes from the annual global land coverage dataset published by the European Space Agency; the climate type data comes from the Köppen climate zoning dataset; and the wind velocity and direction data comes from the ERA5 dataset.
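    The matching of environmental factor data with satellite observations described in step 1 can be sketched as a nearest-grid-cell lookup. This is a minimal illustration only: the disclosure does not state how the matching is performed, and the function names, the regular-grid layout and all sample values below are hypothetical.

```python
def cell_index(lat, lon, lat0, lon0, resolution):
    """Row and column of the grid cell containing (lat, lon).

    Assumes a regular grid whose lower-left corner is (lat0, lon0).
    """
    row = int((lat - lat0) // resolution)
    col = int((lon - lon0) // resolution)
    return row, col


def match_samples(soundings, grids, lat0, lon0, resolution):
    """Attach gridded factor values to each (lat, lon, xco2) sounding."""
    samples = []
    for lat, lon, xco2 in soundings:
        row, col = cell_index(lat, lon, lat0, lon0, resolution)
        features = {}
        for name, grid in grids.items():
            inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
            if not inside:
                break  # sounding falls outside this factor grid: drop it
            features[name] = grid[row][col]
        else:
            # Reached only when every factor grid covered the sounding.
            samples.append((features, xco2))
    return samples


# Hypothetical 2x2 factor grids and soundings, for illustration only.
grids = {
    "ndvi": [[0.2, 0.4], [0.6, 0.8]],
    "temperature": [[10.0, 11.0], [12.0, 13.0]],
}
soundings = [(30.5, 111.2, 405.1), (31.9, 110.1, 404.3), (35.0, 110.0, 406.0)]
samples = match_samples(soundings, grids, lat0=30.0, lon0=110.0, resolution=1.0)
```

    Each retained sample pairs a feature dictionary with the observed concentration; the third sounding lies outside the hypothetical grid and is discarded.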

    [0050] At step 2, a machine learning algorithm is selected to construct a carbon dioxide distribution simulation model and the model is trained in combination with environmental factors and the satellite carbon dioxide training dataset.

    [0051] The specific steps of performing training are as follows: preprocessing the training dataset, comprising data cleaning (removal of missing value, abnormal value and noise and the like), data encoding and data transformation (normalization and dimension reduction and the like) and so on.

    [0052] For missing values in the dataset, when only a few values are missing, the affected samples are deleted.

    [0053] For abnormal values and noise, the noise is first detected from the statistical characteristics of the data or by a clustering method, and the data is then “smoothed” by a method such as binning, clustering, regression, or a combination of computer checking and manual checking, to remove the abnormal values and noise from the data.
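    The cleaning steps above can be sketched as follows. The disclosure does not fix a detection rule, so the z-score threshold used here is an assumption chosen for illustration, and the function name and sample values are hypothetical.

```python
import statistics


def clean(values, z_threshold=3.0):
    """Drop missing entries, then drop values far from the mean.

    The z-score rule is an assumed stand-in for the "statistical
    characteristics" mentioned in the text, not the disclosed method.
    """
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    std = statistics.pstdev(present)
    if std == 0:
        return present  # all values identical: nothing to flag
    return [v for v in present if abs(v - mean) / std <= z_threshold]


# Hypothetical concentration readings with one missing value and one outlier.
raw = [405.0 + 0.1 * k for k in range(20)] + [None, 900.0]
cleaned = clean(raw)
```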

    [0054] The data encoding is mainly to encode the non-numerical features and input them into the model for training. In this experiment, it is mainly required to encode the environmental factors such as ground coverage type, climate type and wind direction, by using one-hot encoding.
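    One-hot encoding of a categorical factor can be illustrated with a short standard-library sketch. The function name and the climate-type labels are hypothetical and serve only as an example.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 indicator vector."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one active column per sample
        encoded.append(row)
    return categories, encoded


# Hypothetical climate-type labels, for illustration only.
climate = ["Cfa", "Dwa", "Cfa", "BSk"]
categories, encoded = one_hot_encode(climate)
```

    Each category becomes its own input column, so the tree model never treats the category codes as ordered numbers.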

    [0055] Data preprocessing also requires normalization processing for the data in the following formula:

    [00010] $$z_q = \frac{z_q - \mathrm{mean}(z_q)}{\mathrm{std}(z_q)}$$

    [0056] where mean(z.sub.q) is a mean value of data of environmental factor z.sub.q, and std(z.sub.q) is a standard deviation of the data of the environmental factor z.sub.q.
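    The normalization formula can be applied per environmental factor as in the sketch below. Whether the population or the sample standard deviation is intended is not stated in the disclosure; the population form is assumed here, and the temperature values are hypothetical.

```python
import statistics


def standardize(values):
    """Return (z - mean(z)) / std(z) for every value of one factor."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        return [0.0 for _ in values]  # constant factor carries no signal
    return [(v - mean) / std for v in values]


# Hypothetical monthly temperatures, for illustration only.
temperatures = [10.0, 12.0, 14.0, 16.0, 18.0]
z = standardize(temperatures)
```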

    [0057] Furthermore, the machine learning algorithm used in step 2 is eXtreme Gradient Boosting tree (XGBoost), which is a tree ensemble model based on gradient boosting. The basic construction idea of the model is: first constructing an initial sub-tree to fit the data and obtain a fitting residual, and constructing subsequent sub-trees based on the residual of the previous model until the model residual is less than a threshold, the final simulation result being a sum of all sub-tree results; the specific construction steps are as follows:

    [0058] initially constructing a weak learner to obtain a residual corresponding to an initial model;

    [0059] for each subsequent training iteration, based on the existing model, adding one weak learner to fit the residual of the previous model;

    [0060] through continuous learning, fitting K weak learners to reduce the residual between a model prediction result and a true value until the residual is less than a threshold, at which point training is terminated; the final model prediction value is obtained by weighted summation over the K base learners.
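    The residual-fitting loop in the three steps above can be sketched with one-split regression trees (stumps) on a single input. This is a simplified illustration of gradient boosting with squared error, not the XGBoost implementation: the function names, the constant learner weight, the stopping tolerance and the data are all assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals."""
    mean_all = sum(residuals) / len(residuals)
    best = (float("inf"), None, mean_all, mean_all)
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    if threshold is None:  # all inputs identical: predict the mean residual
        return lambda x: mean_all
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, max_learners=200, weight=0.5, tol=1e-3):
    """Add weak learners until the largest residual drops below tol."""
    learners = []
    preds = [0.0] * len(xs)
    for _ in range(max_learners):
        residuals = [y - p for y, p in zip(ys, preds)]
        if max(abs(r) for r in residuals) < tol:
            break  # residual below threshold: stop adding learners
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        preds = [p + weight * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        # Final prediction: weighted sum over all fitted base learners.
        return sum(weight * stump(x) for stump in learners)

    return predict, len(learners)


# Hypothetical single-factor data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [400.0, 401.0, 403.0, 406.0, 407.0, 409.0]
predict, n_learners = boost(xs, ys)
max_error = max(abs(y - predict(x)) for x, y in zip(xs, ys))
```

    The constant `weight` plays the role of the weights in the weighted summation; XGBoost additionally uses second-order gradient information and regularization, as described in the following paragraphs.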

    [0061] Furthermore, the base learner of the XGBoost model is CART tree, and for a dataset with m features of n samples D=(x.sub.i,y.sub.i)(|D|=n,x.sub.i∈R.sup.m,y.sub.i∈R), the final prediction value obtained by training is expressed below:

    [00011] $$\hat{y}_i = \varphi(x_i) = \sum_{k=1}^{K} f_k(x_i)$$

    [0062] wherein K is a number of base learners, x.sub.i is an i-th sample, y.sub.i is a label (observed value) corresponding to the i-th sample, f.sub.k(⋅) is a model of a k-th tree, wherein the k-th tree is described by a leaf-node mapping q of the tree and a corresponding weight part ω, i.e.:

    $$f_k(x_i) = \omega_{q(x_i)}$$

    [0063] wherein ω.sub.q(x.sub.i.sub.) is a weight of the leaf node q where the sample x.sub.i is located, and q(x.sub.i) is a position of the leaf node where the sample x.sub.i is located, that is, for any one sample x.sub.i, the weight at a particular leaf node is valued as ω.sub.q(x.sub.i.sub.);

    [0064] for each iteration, the model fits the residual of the previous prediction and therefore, when a t-th base learner is generated, the prediction model is expressed as:

    $$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_k^{(t)}(x_i)$$

    [0065] a target function is expressed as:

    [00012] $$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_k^{(t)}(x_i)\right) + \Omega\left(f_k^{(t)}\right)$$

    [0066] wherein the target function is composed of two parts: in a first part, function l(⋅,⋅) describes a difference between a true value and a fitting value, which is calculated based on Euclidean distance; the second part is a regularized part Ω(f.sub.k.sup.(t)) for preventing function overfitting, i.e.

    [00013] $$\Omega\left(f_k^{(t)}\right) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

    used to limit the complexity of each tree and prevent model overfitting, wherein T is the number of leaf nodes on the CART tree, γ and λ are hyperparameters that weight the penalty on the number of leaf nodes and the penalty on the leaf weights, respectively, ω.sub.j is a weight value of a j-th leaf node; to minimize the target function, XGBoost performs a second-order Taylor expansion of the target function, which is approximately expressed as:

    [00014] $$L^{(t)} \approx \sum_{i=1}^{n}\left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_k^{(t)}(x_i) + \frac{1}{2} h_i \left(f_k^{(t)}(x_i)\right)^2 \right] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} \omega_j^2$$

    [0067] wherein g.sub.i is a first-order derivative, defined as

    [00015] $$g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$$

    h.sub.i is a second-order derivative

    [00016] $$h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right),$$

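    and the following result is obtained by substituting into the target function:

    [00017] $$L^{(t)} \approx \sum_{j=1}^{T}\left[ \left(\sum_{i \in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right)\omega_j^2 \right] + \gamma T$$

    wherein I.sub.j={i | q(x.sub.i)=j} is the set of samples located at the j-th leaf node, and the term l(y.sub.i, ŷ.sub.i.sup.(t-1)), which does not depend on the t-th tree, is omitted.

    [0068] Each iteration minimizes the target function to obtain the optimal leaf nodes of the t-th base learner and an optimal solution ω.sub.j corresponding to each leaf node.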
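    For a fixed tree structure, the optimal solution ω.sub.j mentioned above has a closed form, because the target function is a separate quadratic in each leaf weight. The short derivation below is the standard XGBoost result rather than text from the disclosure; the shorthand G.sub.j, H.sub.j and the leaf sample set I.sub.j are notation introduced here, and the constant γT follows from the regularized part Ω defined earlier.

```latex
% Assumed shorthand: I_j = \{ i : q(x_i) = j \} is the set of samples in leaf j.
G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i

% Setting the derivative of the per-leaf quadratic to zero:
\frac{\partial}{\partial \omega_j}
  \left[ G_j \omega_j + \tfrac{1}{2}\,(H_j + \lambda)\,\omega_j^2 \right] = 0
\;\Longrightarrow\;
\omega_j^{*} = -\frac{G_j}{H_j + \lambda}

% Substituting back gives the minimized target function for this structure:
L^{(t)}_{\min} \approx -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T
```

    For the squared-error loss l = ½(y − ŷ)², g.sub.i = ŷ.sub.i.sup.(t-1) − y.sub.i and h.sub.i = 1, so the optimal leaf weight reduces to the sum of the residuals in the leaf divided by the leaf size plus λ.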

    [0069] The preprocessed training dataset is input into the XGBoost model and parameter adjustment and further optimization are performed for the XGBoost model, and iterations are repeated to obtain an optimal carbon dioxide distribution simulation model.

    [0070] At step 3, for the constructed carbon dioxide distribution simulation model, a test dataset is firstly used to verify a model prediction accuracy, and then environmental factor data without satellite observation is input into the trained carbon dioxide distribution simulation model to obtain a predicted carbon dioxide concentration and finally, a regional carbon dioxide concentration spatiotemporal distribution is obtained.

    [0071] 2. According to the above trained regional carbon dioxide spatiotemporal distribution simulation model and the global sensitivity analysis method, the importance of the influence factors is quantitatively analyzed, comprising the following steps.

    [0072] At step 4, in combination with the constructed regional carbon dioxide spatiotemporal distribution simulation model and the global sensitivity analysis method, a sensitivity of the carbon dioxide distribution for each environmental factor is calculated.

    [0073] At step 5, the sensitivities of the regional carbon dioxide concentration to the different environmental factors obtained by the global sensitivity analysis method are collected, and the magnitude of the sensitivity of each parameter is quantitatively analyzed to finally determine the influence degree of each environmental factor on the regional carbon dioxide distribution.

    [0074] The global sensitivity analysis method used in step 4 is Sobol method which is performed in the following step:

    [0075] for each environmental factor, a change range and a probability distribution are calculated and then a corresponding sensitivity index is calculated in combination with the regional carbon dioxide spatiotemporal distribution simulation model.

    [0076] The regional carbon dioxide spatiotemporal distribution simulation model is expressed as: y=f(x.sub.1′,x.sub.2′, . . . , x.sub.p′), wherein f is a trained XGBoost model, x.sub.1′,x.sub.2′, . . . , x.sub.p′ are environmental factors affecting carbon dioxide distribution and are input parameters of the XGBoost model, and p is the number of model parameters, i.e. the 9 influence factors in step 1; the total variance of the XGBoost model is:


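    $$D = \int f^2(x')\,dx' - f_0^2$$

    [0077] wherein f.sub.0 is the constant term of the model, i.e. the mean of the model output over the input space, and the partial variance of the model, computed from the term f.sub.π.sub.1.sub.,π.sub.2.sub., . . . ,π.sub.s of the variance decomposition of f that depends only on the factors x.sub.π.sub.1′, . . . , x.sub.π.sub.s′, is:

    $$D_{\pi_1,\pi_2,\ldots,\pi_s} = \int\cdots\int f_{\pi_1,\pi_2,\ldots,\pi_s}^2\left(x'_{\pi_1}, x'_{\pi_2}, \ldots, x'_{\pi_s}\right)\,dx'_{\pi_1}\,dx'_{\pi_2}\cdots dx'_{\pi_s}$$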

    [0078] wherein, 1≤π.sub.1< . . . <π.sub.s≤p, and s=1, 2, . . . , p and the sensitivity S.sub.π.sub.1.sub.,π.sub.2.sub., . . . , π.sub.s of each environmental factor:

    [00018] $$S_{\pi_1,\pi_2,\ldots,\pi_s} = \frac{D_{\pi_1,\pi_2,\ldots,\pi_s}}{D}$$

    [0079] wherein S.sub.π.sub.1 is a first-order sensitivity index of the environmental factor x.sub.π.sub.1′, which is used to represent an influence of the parameter on the model output, S.sub.π.sub.1.sub.,π.sub.2.sub., . . . , π.sub.s is an s-order sensitivity index of the environmental factors x.sub.π.sub.1′,x.sub.π.sub.2′, . . . ,x.sub.π.sub.s′, which is used to represent a joint influence of s parameters on the model;

    [0080] further, a total sensitivity index of each environmental factor is obtained, and the total sensitivity index TS.sub.π of the environmental factor x.sub.π.sub.s′ is defined as:


    $$TS_{\pi} = S_{\pi_1} + S_{\pi_1,\pi_2} + \cdots + S_{\pi_1,\pi_2,\ldots,\pi_s}$$

    [0081] In step 5, the total sensitivity index of each environmental factor obtained by Sobol method is used to evaluate the final sensitivity of the influence factors affecting the regional carbon dioxide distribution, achieving quantitative influence degree analysis.
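    First-order and total sensitivity indexes of this kind are usually estimated by Monte Carlo sampling rather than by evaluating the integrals directly. The sketch below uses a common pair of estimators (a Saltelli-type first-order estimator and the Jansen total-effect estimator), which the disclosure does not name. It assumes independent inputs uniformly distributed on [0, 1], and a hypothetical toy function stands in for the trained XGBoost model.

```python
import random
import statistics


def sobol_indices(model, n_inputs, n_samples=20000, seed=0):
    """Monte Carlo estimate of first-order and total Sobol indices."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    f_a = [model(row) for row in a]
    f_b = [model(row) for row in b]
    mean = statistics.fmean(f_a + f_b)            # estimate of f_0
    total_var = statistics.pvariance(f_a + f_b)   # estimate of D
    first, total = [], []
    for i in range(n_inputs):
        # AB_i: rows of A with column i taken from B.
        f_ab = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        # First-order partial variance (centered to reduce estimator noise).
        v_i = statistics.fmean(
            (fb - mean) * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab)
        )
        # Total-effect variance (Jansen estimator).
        vt_i = 0.5 * statistics.fmean(
            (fa - fab) ** 2 for fa, fab in zip(f_a, f_ab)
        )
        first.append(v_i / total_var)
        total.append(vt_i / total_var)
    return first, total


def toy_model(x):
    """Hypothetical additive stand-in for the trained model."""
    return x[0] + 2.0 * x[1]


first, total = sobol_indices(toy_model, n_inputs=2)
```

    For this additive toy function the exact indexes are 0.2 and 0.8 for the two inputs, and first-order and total indexes coincide because there is no interaction; with a real model, a gap between the two indicates interaction effects.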

    3. Embodiment

    [0082] In this embodiment of the present disclosure, by using OCO-2 satellite XCO2 observation data and corresponding environmental factors of 2016 and the XGBoost modeling, the CO.sub.2 concentration distribution in the eastern region of China is simulated. FIGS. 2A-2B show a result of satellite observation data and modeling retrieval. For accuracy evaluation on the simulation model constructed using machine learning algorithm, a determination coefficient R2 and a root mean square error RMSE are used and a final modeling accuracy obtained after parameter adjustment and optimization is as shown in Table 1.

    TABLE 1 Modeling accuracy

    Training samples    Test samples    R2        RMSE
    3153 (70%)          1351 (30%)      0.6751    1.6362 ppm
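    The 70%/30% split and the two accuracy measures reported in Table 1 can be computed as in the sketch below. The random shuffling, the function names and the four sample values are assumptions for illustration; only the sample counts are taken from Table 1.

```python
import math
import random


def train_test_split(samples, test_fraction=0.3, seed=0):
    """Shuffle the samples and hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (ppm here)."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination R2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# 4504 matched samples in total, as implied by Table 1 (3153 + 1351).
train, test = train_test_split(range(4504))

# Hypothetical observed and predicted concentrations in ppm.
y_true = [400.0, 402.0, 404.0, 406.0]
y_pred = [401.0, 402.0, 403.0, 406.0]
```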

    [0083] By using the global sensitivity analysis method and the constructed carbon dioxide simulation model, quantitative evaluation is performed for the sensitivities of the influence factors to obtain the results as shown in Table 2.

    TABLE 2 First-order sensitivity index and total sensitivity index of each environmental factor estimated using the global sensitivity analysis method

    Environmental factor              First-order sensitivity index    Total sensitivity index
    Ground coverage type              0.013060                         0.015529
    Vegetation coverage               0.300257                         0.320699
    Climate type                      0.006008                         0.007367
    Precipitation                     0.291814                         0.301615
    Atmospheric temperature           0.262991                         0.277399
    Wind velocity and direction       0.713833                         0.727576
    Anthropogenic emission amount     0.000197                         0.000208
    Biomass combustion emission       0.000915                         0.001157

    [0084] To more visually display the sizes of the sensitivities of different environmental factors on the total carbon dioxide distribution, a sector graph of sensitivity indexes is drawn to determine ratios of the influence factors as shown in FIGS. 3A-3B.

    [0085] As shown in FIGS. 3A-3B, the environmental factors, i.e. wind velocity and direction, vegetation, precipitation, atmospheric temperature, ground coverage type, climate type, biomass combustion emission and anthropogenic emission, are sorted in descending order of sensitivity, where the indexes of the wind velocity and direction, vegetation, precipitation and atmospheric temperature are large, which indicates that they are the major factors affecting regional carbon dioxide distribution.
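    The ranking can be reproduced directly from the total sensitivity indexes in Table 2. The sketch below also computes each factor's share of the summed indexes, which is one plausible way to scale a sector graph; that normalization is an assumption, since the disclosure does not state how the graph is scaled.

```python
# Total sensitivity indexes copied from Table 2.
total_index = {
    "Ground coverage type": 0.015529,
    "Vegetation coverage": 0.320699,
    "Climate type": 0.007367,
    "Precipitation": 0.301615,
    "Atmospheric temperature": 0.277399,
    "Wind velocity and direction": 0.727576,
    "Anthropogenic emission amount": 0.000208,
    "Biomass combustion emission": 0.001157,
}

# Factors in descending order of total sensitivity index.
ranked = sorted(total_index, key=total_index.get, reverse=True)

# Share of each factor in the summed indexes (assumed sector-graph scaling).
index_sum = sum(total_index.values())
shares = {name: total_index[name] / index_sum for name in ranked}

for name in ranked:
    print(f"{name}: {shares[name]:.1%}")
```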

    [0086] As the model accuracy shows, it is feasible to simulate the regional carbon dioxide spatiotemporal distribution by using the model. The method provided by the disclosure can fill the gaps in satellite observation data by simulating the regional carbon dioxide concentration spatiotemporal distribution from the environmental factors. Further, a method of quantitatively evaluating the influence degrees of the environmental factors on the regional carbon dioxide distribution is proposed, so as to determine how strongly each environmental factor influences the regional carbon dioxide distribution.

    [0087] The specific embodiments described in the disclosure are merely illustrated based on the spirit of the disclosure. Those skilled in the art can make various changes or supplementations or similar replacements to the specific embodiments described herein without departing from the spirit of the disclosure or the scope defined by the appended claims.