SYSTEM AND METHOD FOR PERFORMING INDUSTRIAL PROCESSES ACROSS FACILITIES
20200182847 · 2020-06-11
Cpc classification
G06F17/15
PHYSICS
G06F17/18
PHYSICS
G06N7/00
PHYSICS
International classification
G01N33/00
PHYSICS
G06F17/15
PHYSICS
G06F17/18
PHYSICS
Abstract
A system and method are described herein for performing at least one industrial process at each facility of a plurality of facilities based on an industrial process standard generated by reducing functional and trend-line outlier bias in data of one or more process parameters measured by one or more sensors. Outliers are removed from the data set through an objective method. Bias is determined based on absolute error, relative error, or both. Error values are computed from the data, model coefficients, or trend line estimates. Outlier data records are removed when the error values are greater than or equal to one or more criteria.
Claims
1. A system, comprising: a computing unit configured to execute a computer program that is stored by a non-transient storage subsystem, wherein the computer program, when executed by at least one processor, causes the at least one processor of the computing unit to at least: receive a corresponding current process-related data set for each target variable of one or more target variables of at least one process parameter of at least one industrial process being performed at a particular facility of a plurality of facilities; wherein the one or more target variables are measured by at least one sensor at each facility of the plurality of facilities; generate a process-related random data set from the corresponding current process-related data set; perform a dynamic outlier bias reduction on the corresponding current process-related data set for one or more bias criteria values of a set of bias criteria values to generate one or more outlier bias reduced, process-related target data set; perform the dynamic outlier bias reduction on the process-related random data set for the one or more bias criteria values of the set of bias criteria values to generate one or more outlier bias reduced, process-related random data set; generate a first bias criteria curve for the corresponding current process-related data set and a second bias criteria curve for the process-related random data set from the one or more bias criteria values, a set of target error values, a set of random error values, a set of target correlation coefficients, and a set of random correlation coefficients; dynamically determine, based on the first bias criteria curve and the second bias criteria curve, a non-biased viability of the corresponding current process-related data set; wherein the non-biased viability is an indicator of whether the corresponding current process-related target data set is representative of the at least one process parameter of the at least one industrial process being modeled by a model; and dynamically adjust the at least one industrial process based at least in part on at least one industrial process standard according to the non-biased viability.
2. The system of claim 1, wherein the computer program further causes the computing unit to display a plot for the first bias criteria curve and the second bias criteria curve.
3. The system of claim 1, wherein the at least one sensor is configured to detect and quantify at least one compound based on the one or more target variables.
4. The system of claim 3, wherein the at least one compound is a greenhouse chemical gas compound, and wherein the at least one sensor is further configured to detect and quantify the at least one compound continuously.
5. The system of claim 1, wherein the computer program further causes the at least one processor to perform the dynamic outlier bias reduction on the corresponding current process-related data set for the one or more bias criteria values of the set of bias criteria values to generate the one or more outlier bias reduced, process-related target data set, by performing at least: for each of the one or more bias criteria values: apply the model to the corresponding current process-related data set to generate a plurality of model predicted values for the corresponding current process-related data set; compute a plurality of error values determined from the corresponding current process-related data set and the model predicted values; determine the one or more process-related outliers within the corresponding current process-related data set to form a modified, process-related target data set based on the error values and corresponding bias criteria values; and generate an updated model determined from the modified, process-related target data set.
6. The system of claim 5, wherein the computer program further causes the at least one processor to perform the dynamic outlier bias reduction on the corresponding current process-related data set for the one or more bias criteria values of the set of bias criteria values to generate the one or more outlier bias reduced, process-related target data set, by performing at least: for each of the one or more bias criteria values: generate a plurality of second model predicted values for the corresponding current process-related data set by applying the updated model to the corresponding current process-related data set.
7. The system of claim 1, wherein the computer program further causes the at least one processor to dynamically determine the non-biased viability by performing at least: determine a first bias criteria value on the first bias criteria curve that corresponds to a first target error value of the set of target error values; determine a second bias criteria value on the second bias criteria curve that corresponds to a first random error value of the set of random error values; and determine the non-biased viability based on the first bias criteria value and the second bias criteria value, wherein the first target error value and the first random error value are the same.
8. The system of claim 5, wherein the computer program further causes the computing unit to determine an influence of the removal of the one or more process-related outliers from the corresponding current process-related target data set for each bias criteria value based on the updated model and the set of target correlation coefficients.
9. The system of claim 1, wherein the process-related random data set comprises all random data values based on the corresponding current process-related data set, and wherein the computer program further causes the at least one processor to perform the dynamic outlier bias reduction on the process-related random data set for the one or more bias criteria values of the set of bias criteria values to generate the one or more outlier bias reduced, process-related random data set by performing at least: for each of the bias criteria values: apply the model to the process-related random data set to generate a plurality of model predicted values for the process-related random data set; compute a plurality of error values using the process-related random data set and the model predicted values; determine the one or more process-related outliers within the process-related random data set to form the corresponding outlier bias reduced, process-related random data set determined based on the error values and the corresponding bias criteria value.
10. The system of claim 1, wherein at least one of the set of target error values is a standard error, and wherein at least one of the set of target correlation coefficients is a coefficient of determination value.
11. The system of claim 1, wherein the process-related random data set comprises a plurality of random data values generated within a range of a plurality of predicted values of the model.
12. A method, comprising: receiving, by a computer unit, a current process-related data set for each respective target variable of at least one process parameter of at least one industrial process being performed at a particular facility of a plurality of facilities during a particular time; wherein the current process-related data set has been generated by at least one measuring sensor at the particular facility, wherein the at least one measuring sensor is configured to: i) measure, based on a model for the at least one industrial process, one or more target variables; and ii) generate the current process-related data set for each respective target variable; generating, by the computer unit, a process-related random data set from the current process-related data set; obtaining, by the computer unit, a set of bias criteria values used to determine one or more process-related outliers; performing, by the computer unit, a dynamic outlier bias reduction on the current process-related data set for one or more bias criteria values of the set of bias criteria values to generate one or more outlier bias reduced, process-related target data set; performing, by the computer unit, the dynamic outlier bias reduction on the process-related random data set for the one or more bias criteria values of the set of bias criteria values to generate one or more outlier bias reduced, process-related random data set; calculating, by the computer unit, a set of target error values for the one or more outlier bias reduced, process-related target data set and a set of random error values for the one or more outlier bias reduced, process-related random data set; calculating, by the computer unit, a set of target correlation coefficients for the one or more outlier bias reduced, process-related target data set and a set of random correlation coefficients for the one or more outlier bias reduced, process-related random data set; generating, by the computer unit, a first bias criteria curve for the current process-related data set and a second bias criteria curve for the process-related random data set from the one or more bias criteria values, the set of target error values, the set of random error values, the set of target correlation coefficients, and the set of random correlation coefficients; dynamically determining, by the computer unit, based on the first bias criteria curve and the second bias criteria curve, a non-biased viability of the current process-related data set; wherein the non-biased viability is an indicator of whether the current process-related target data set is representative of the at least one process parameter of the at least one industrial process being modeled by the model; and dynamically generating, by the computer unit, when the non-biased viability identifies that the current process-related target data set is representative of the at least one process parameter of the at least one industrial process, at least one industrial process standard; and continuing to perform the at least one industrial process at each facility of the plurality of facilities based at least in part on the at least one industrial process standard.
13. The method of claim 12, wherein the method further comprises: causing, by the computer unit, display of a plot for the first bias criteria curve and the second bias criteria curve.
14. The method of claim 12, wherein the at least one measuring sensor is configured to detect and quantify at least one compound based on the one or more target variables.
15. The method of claim 14, wherein the at least one compound is a greenhouse chemical gas compound, and wherein the at least one measuring sensor is further configured to detect and quantify the at least one compound continuously.
16. The method of claim 12, wherein the method further comprises: for each of the one or more bias criteria values: applying, by the computer unit, the model to the current process-related data set to generate a plurality of model predicted values for the current process-related data set; computing, by the computer unit, a plurality of error values determined from the current process-related data set and the model predicted values; determining, by the computer unit, the one or more process-related outliers within the current process-related data set to form a modified, process-related target data set based on the error values and bias criteria values; and generating, by the computer unit, an updated model determined from the modified, process-related target data set.
17. The method of claim 16, wherein the method further comprises: for each of the one or more bias criteria values: generating, by the computer unit, a plurality of second model predicted values for the current process-related data set by applying the updated model to the current process-related data set.
18. The method of claim 12, wherein the method further comprises: determining, by the computer unit, a first bias criteria value on the first bias criteria curve that corresponds to a first target error value of the set of target error values; determining, by the computer unit, a second bias criteria value on the second bias criteria curve that corresponds to a first random error value of the set of random error values; and determining, by the computer unit, the non-biased viability based on the first bias criteria value and the second bias criteria value, wherein the first target error value and the first random error value are the same.
19. The method of claim 16, wherein the method further comprises: determining, by the computer unit, an influence of the removal of the one or more process-related outliers from the current process-related target data set for each bias criteria value based on the updated model and the set of target correlation coefficients.
20. The method of claim 12, wherein the process-related random data set comprises all random data values based on the current process-related data set, and wherein the method further comprises: for each of the bias criteria values: applying, by the computer unit, the model to the process-related random data set to generate a plurality of model predicted values for the process-related random data set; computing, by the computer unit, a plurality of error values using the process-related random data set and the model predicted values; determining, by the computer unit, the one or more process-related outliers within the process-related random data set to form the outlier bias reduced, process-related random data set determined based on the error values and the bias criteria value.
21. The method of claim 12, wherein at least one of the set of target error values is a standard error, and wherein at least one of the set of target correlation coefficients is a coefficient of determination value.
22. The method of claim 12, wherein the process-related random data set comprises a plurality of random data values generated within a range of a plurality of predicted values of the model.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
[0032] The following disclosure provides many different embodiments, or examples, for implementing different features of a system and method for reducing outlier bias in data. Specific examples of components, processes, and implementations are described to help clarify the invention. These are merely examples and are not intended to limit the invention beyond what is described in the claims. Well-known elements are presented without detailed description so as not to obscure the preferred embodiments of the present invention with unnecessary detail. For the most part, details unnecessary to obtain a complete understanding of the preferred embodiments of the present invention have been omitted inasmuch as such details are within the skills of persons of ordinary skill in the relevant art.
[0033] A mathematical description of one embodiment of Dynamic Outlier Bias Reduction is shown as follows:
Nomenclature
[0034] $\hat{X}$: Set of all data records, $\hat{X} = \hat{X}_k + \hat{X}_{Ck}$, where:
[0035] $\hat{X}_k$: Set of accepted data records for the $k$th iteration
[0036] $\hat{X}_{Ck}$: Set of outlier (removed) data records for the $k$th iteration
[0037] $\hat{Q}_k$: Set of computed model predicted values for $\hat{X}_k$
[0038] $\hat{Q}_{Ck}$: Set of outlier model predicted values for data records $\hat{X}_{Ck}$
[0039] $\hat{A}$: Set of actual values (target values) on which the model is based
[0040] $\hat{\beta}_{k \to k+1}$: Set of model coefficients at the $(k+1)$st iteration, computed as a result of the model computations using $\hat{X}_k$
[0041] $M(\hat{X}_k : \hat{\beta}_{k \to k+1})$: Model computation producing $\hat{Q}_{k+1}$ from $\hat{X}_k$, storing model-derived and user-supplied coefficients $\hat{\beta}_{k \to k+1}$
[0042] $C$: User-supplied error criterion (%)
[0043] $\Psi(\hat{Q}_k, \hat{A})$: Error threshold function
[0044] $F(\Psi, C)$: Error threshold value ($E$)
[0045] $\hat{\Omega}_k$: Iteration termination criteria, e.g., iteration count, $r^2$, standard error, etc.
Initial Computation, k=0
Initial Step 1: Using initial model coefficient estimates, $\hat{\beta}_{0 \to 1}$, compute initial model predicted values by applying the model to the complete data set:
$\hat{Q}_1 = M(\hat{X} : \hat{\beta}_{0 \to 1})$
Initial Step 2: Compute initial model performance results:
$\hat{\Omega}_1 = f(\hat{Q}_1, \hat{A}, k = 0, r^2, \text{standard error, etc.})$
Initial Step 3: Compute model error threshold value(s):
$E_1 = F(\Psi(\hat{Q}_1, \hat{A}), C)$
Initial Step 4: Filter the data records to remove outliers:
$\hat{X}_1 = \{x \in \hat{X} \mid \Psi(\hat{Q}_1, \hat{A}) < E_1\}$
[0046] Iterative Computations, k>0
Iteration Step 1: Compute predicted values by applying the model to the accepted data set:
$\hat{Q}_{k+1} = M(\hat{X}_k : \hat{\beta}_{k \to k+1})$
Iteration Step 2: Compute model performance results:
$\hat{\Omega}_{k+1} = f(\hat{Q}_{k+1}, \hat{A}, k, r^2, \text{standard error, etc.})$
If termination criteria are achieved, stop; otherwise, proceed to Step 3.
Iteration Step 3: Compute results for the removed data, $\hat{X}_{Ck} = \{x \in \hat{X} \mid x \notin \hat{X}_k\}$, using the current model:
$\hat{Q}_{Ck+1} = M(\hat{X}_{Ck} : \hat{\beta}_{k \to k+1})$
Iteration Step 4: Compute model error threshold values:
$E_{k+1} = F(\Psi(\hat{Q}_{k+1} + \hat{Q}_{Ck+1}, \hat{A}), C)$
Iteration Step 5: Filter the data records to remove outliers:
$\hat{X}_{k+1} = \{x \in \hat{X} \mid \Psi(\hat{Q}_{k+1} + \hat{Q}_{Ck+1}, \hat{A}) < E_{k+1}\}$
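The iterative procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the model M is taken to be a straight-line least-squares fit (the initial coefficient estimates come from fitting the complete data set), the error function is the squared error, the threshold function F returns the C-th percentile via Python's `statistics.quantiles` with `method="inclusive"`, and iteration stops when the accepted set stops changing. `fit_line` and `dobr` are hypothetical names.

```python
from statistics import quantiles


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; an assumed stand-in for the model M."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def dobr(xs, actual, c_pct=80, max_iter=20):
    """Iterate the accept/remove cycle described above.

    Each pass applies the current coefficients to ALL records (accepted plus
    removed, i.e. X_k + X_Ck), sets the threshold E as the c_pct-th percentile
    of the squared errors, and re-filters the complete data set, so a record
    removed in one pass can be re-admitted in a later one.
    """
    accepted = [True] * len(xs)
    coeffs = fit_line(xs, actual)  # initial coefficients from the complete data set
    for _ in range(max_iter):
        a, b = coeffs
        predicted = [a + b * x for x in xs]
        errors = [(p - y) ** 2 for p, y in zip(predicted, actual)]
        threshold = quantiles(errors, n=100, method="inclusive")[c_pct - 1]
        new_accepted = [e < threshold for e in errors]
        # Stop when the accepted set no longer changes (assumed termination
        # criterion) or when too few records remain to refit a line.
        if new_accepted == accepted or sum(new_accepted) < 2:
            break
        accepted = new_accepted
        coeffs = fit_line([x for x, k in zip(xs, accepted) if k],
                          [y for y, k in zip(actual, accepted) if k])
    return coeffs, accepted


xs = list(range(10))
ys = [1.1, 2.9, 5.1, 6.9, 9.1, 10.9, 13.1, 14.9, 17.1, 100.0]
(a, b), accepted = dobr(xs, ys)
```

Because the threshold is a percentile, roughly (100 - C) percent of the records fall outside the accepted set on every pass, whether or not they are gross errors; in the usage lines above, both x = 8 and the gross error at x = 9 end up removed, and the fitted slope is close to 2.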
[0047] Another mathematical description of one embodiment of Dynamic Outlier Bias Reduction is shown as follows:
Nomenclature
[0048] $\hat{X}$: Set of all data records, $\hat{X} = \hat{X}_k + \hat{X}_{Ck}$, where:
[0049] $\hat{X}_k$: Set of accepted data records for the $k$th iteration
[0050] $\hat{X}_{Ck}$: Set of outlier (removed) data records for the $k$th iteration
[0051] $\hat{Q}_k$: Set of computed model predicted values for $\hat{X}_k$
[0052] $\hat{Q}_{Ck}$: Set of outlier model predicted values for $\hat{X}_{Ck}$
[0053] $\hat{A}$: Set of actual values (target values) on which the model is based
[0054] $\hat{\beta}_{k \to k+1}$: Set of model coefficients at the $(k+1)$st iteration, computed as a result of the model computations using $\hat{X}_k$
[0055] $M(\hat{X}_k : \hat{\beta}_{k \to k+1})$: Model computation producing $\hat{Q}_{k+1}$ from $\hat{X}_k$, storing model-derived and user-supplied coefficients $\hat{\beta}_{k \to k+1}$
[0056] $C_{RE}$: User-supplied relative error criterion (%)
[0057] $C_{AE}$: User-supplied absolute error criterion (%)
[0058] $RE(\hat{Q}_k + \hat{Q}_{Ck}, \hat{A})$: Relative error values for all data records
[0059] $AE(\hat{Q}_k + \hat{Q}_{Ck}, \hat{A})$: Absolute error values for all data records
[0060] P.sub.RE.sub.
P.sub.RE.sub.
P.sub.AE.sub.
Initial Computation, k=0
[0063] Initial Step 1: Using initial model coefficient estimates, $\hat{\beta}_{0 \to 1}$, compute initial model predicted value results by applying the model to the complete data set:
$\hat{Q}_1 = M(\hat{X} : \hat{\beta}_{0 \to 1})$
[0064] Initial Step 2: Compute initial model performance results:
$\hat{\Omega}_1 = f(\hat{Q}_1, \hat{A}, k = 0, r^2, \text{standard error, etc.})$
[0065] Initial Step 3: Compute model error threshold values:
P.sub.RE.sub.
P.sub.AE.sub.
[0067] Iterative Computations, k>0
[0068] Iteration Step 1: Compute model predicted values by applying the model to the outlier removed data set:
$\hat{Q}_{k+1} = M(\hat{X}_k : \hat{\beta}_{k \to k+1})$
[0069] Iteration Step 2: Compute model performance results:
$\hat{\Omega}_{k+1} = f(\hat{Q}_{k+1}, \hat{A}, k, r^2, \text{standard error, etc.})$
[0070] If termination criteria are achieved, stop; otherwise, proceed to Step 3.
[0071] Iteration Step 3: Compute results for the removed data, $\hat{X}_{Ck} = \{x \in \hat{X} \mid x \notin \hat{X}_k\}$, using the current model:
$\hat{Q}_{Ck+1} = M(\hat{X}_{Ck} : \hat{\beta}_{k \to k+1})$
[0072] Iteration Step 4: Compute model error threshold values:
P.sub.RE.sub.
P.sub.AE.sub.
[0074] Increment k and proceed to Iteration Step 1.
[0075] After each iteration where new model coefficients are computed from the current censored dataset, the removed data from the previous iteration plus the current censored data are recombined. This combination encompasses all data values in the complete dataset. The current model coefficients are then applied to the complete dataset to compute a complete set of predicted values. The absolute and relative errors are computed for the complete set of predicted values and new bias criteria percentile threshold values are computed. A new censored dataset is created by removing all data values where the absolute or relative errors are greater than the threshold values and the nonlinear optimization model is then applied to the newly censored dataset computing new model coefficients. This process enables all data values to be reviewed every iteration for their possible inclusion in the model dataset. It is possible that some data values that were excluded in previous iterations will be included in subsequent iterations as the model coefficients converge on values that best fit the data.
[0076] In one embodiment, variations in greenhouse gas (GHG) emissions can result in overestimation or underestimation of emission results, leading to bias in model predicted values. Non-industrial influences, such as environmental conditions and errors in calculation procedures, can cause the results for a particular facility to be radically different from those of similar facilities unless the bias in the model predicted values is removed. The bias in the model predicted values may also exist due to unique operating conditions.
[0077] The bias can be removed manually by simply removing a facility's data from the calculation if analysts are confident that a facility's calculations are in error or possess unique, extenuating characteristics. Yet, when measuring the performance of facilities from many different companies, regions, and countries, precise a priori knowledge of the data details is not realistic. Therefore, any analyst-based data removal procedure has the potential to add undocumented, non-data-supported biases to the model results.
[0078] In one embodiment, Dynamic Outlier Bias Reduction is applied to a procedure that uses the data and a prescribed overall error criterion to determine statistical outliers that are removed from the model coefficient calculations. This is a data-driven process that identifies outliers using a data-produced global error criterion computed with, for example, the percentile function. The use of Dynamic Outlier Bias Reduction is not limited to the reduction of bias in model predicted values, and its use in this embodiment is illustrative and exemplary only. Dynamic Outlier Bias Reduction may also be used, for example, to remove outliers from any statistical data set, including use in calculation of, but not limited to, arithmetic averages, linear regressions, and trend lines. The outlier facilities are still ranked from the calculation results, but the outliers are not used in the filtered data set applied to compute model coefficients or statistical results.
[0079] A standard procedure commonly used to remove outliers is to compute the standard deviation (σ) of the data set and simply define all data outside a 2σ interval of the mean, for example, as outliers. This procedure has statistical assumptions that, in general, cannot be tested in practice. The Dynamic Outlier Bias Reduction method description applied in an embodiment of this invention is outlined in
$\text{Relative Error}_m = \left( \dfrac{\text{Predicted Value}_m - \text{Actual Value}_m}{\text{Actual Value}_m} \right)^2 \qquad (1)$
$\text{Absolute Error}_m = \left( \text{Predicted Value}_m - \text{Actual Value}_m \right)^2 \qquad (2)$
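Equations (1) and (2) can be computed per record as in the following minimal sketch; the function name is hypothetical, and Eq. (1) assumes that no actual value is zero because it divides by the actual value.

```python
def relative_and_absolute_errors(predicted, actual):
    """Per-record errors following Eqs. (1) and (2).

    Relative error = ((predicted - actual) / actual) ** 2
    Absolute error = (predicted - actual) ** 2
    Assumes no actual value is zero, since Eq. (1) divides by it.
    """
    relative, absolute = [], []
    for p, a in zip(predicted, actual):
        diff = p - a
        relative.append((diff / a) ** 2)
        absolute.append(diff ** 2)
    return relative, absolute


rel, ab = relative_and_absolute_errors([11.0, 18.0], [10.0, 20.0])
# rel is approximately [0.01, 0.01]; ab == [1.0, 4.0]
```

Both errors are squared, so over- and under-prediction are penalized symmetrically; the relative form scales the difference by the actual value, which keeps large facilities from dominating the absolute form.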
[0080] In Step 110, the analyst specifies the error threshold criteria that will define outliers to be removed from the calculations. For example, using the percentile operation as the error function, a percentile value of 80 percent could be set for both relative and absolute errors. This means that data values whose relative error is less than the 80th percentile of the relative errors and whose absolute error is less than the 80th percentile of the absolute errors will be included, and the remaining values are considered outliers and removed. In this example, for a data value to avoid being removed, its errors must be less than both the relative and absolute error 80th percentile values. However, the percentile thresholds for relative and absolute error may be varied independently, and, in another embodiment, only one of the percentile thresholds may be used.
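The Step 110 filter might be sketched as follows, assuming the percentile uses linear interpolation between ranked values (Python's `statistics.quantiles` with `method="inclusive"`); the text does not specify an interpolation convention, and `percentile` and `keep_mask` are hypothetical names.

```python
from statistics import quantiles


def percentile(values, pct):
    """pct-th percentile (integer 1-99) with linear interpolation between
    ranked values; needs at least two values."""
    return quantiles(values, n=100, method="inclusive")[pct - 1]


def keep_mask(rel_errors, abs_errors, rel_pct=80, abs_pct=80):
    """True for records whose relative AND absolute errors are both below
    their percentile thresholds; False marks an outlier to be removed."""
    rel_threshold = percentile(rel_errors, rel_pct)
    abs_threshold = percentile(abs_errors, abs_pct)
    return [r < rel_threshold and a < abs_threshold
            for r, a in zip(rel_errors, abs_errors)]


mask = keep_mask([0.01, 0.02, 0.03, 0.04, 0.50], [1.0, 2.0, 3.0, 4.0, 50.0])
# mask == [True, True, True, True, False]
```

Using only one of the two thresholds, as in the alternative embodiment, amounts to dropping one side of the `and`.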
[0081] In Step 120, the model standard error and coefficient of determination (r²) percent change criteria are specified. While the values of these statistics will vary from model to model, the percent change from the preceding iteration can be preset, for example, at 5 percent. These values can be used to terminate the iteration procedure. Another termination criterion could be a simple iteration count.
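The Step 120 termination test could look like the following minimal sketch. It assumes the percent change is measured against the previous iteration's value and that both statistics must be within the criterion for the iteration to stop; `percent_change` and `should_stop` are hypothetical names.

```python
def percent_change(old, new):
    """Percent change of a statistic relative to its previous value."""
    if old == 0:
        raise ValueError("previous value must be non-zero")
    return abs(new - old) / abs(old) * 100.0


def should_stop(prev_se, new_se, prev_r2, new_r2, criterion_pct=5.0,
                iteration=0, max_iterations=None):
    """Return True when the iteration should end.

    Iteration continues while the percent change of either the standard
    error or r^2 exceeds criterion_pct; an optional iteration cap acts as
    the alternative termination criterion.
    """
    if max_iterations is not None and iteration >= max_iterations:
        return True
    return (percent_change(prev_se, new_se) <= criterion_pct
            and percent_change(prev_r2, new_r2) <= criterion_pct)
```

For example, a standard error moving from 2.00 to 1.95 and an r² moving from 0.80 to 0.82 are each a 2.5 percent change, so with the 5 percent criterion `should_stop(2.00, 1.95, 0.80, 0.82)` returns `True`.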
[0082] In Step 130, the optimization calculation is performed, which produces the model coefficients and predicted values for each facility.
[0083] In Step 140, the relative and absolute errors for all facilities are computed using Eqns. (1) and (2).
[0084] In Step 150, the error function with the threshold criteria specified in Step 110 is applied to the data computed in Step 140 to determine outlier threshold values.
[0085] In Step 160, the data is filtered to include only facilities where the relative error, absolute error, or both errors, depending on the chosen configuration, are less than the error threshold values computed in Step 150.
[0086] In Step 170, the optimization calculation is performed using only the outlier removed data set.
[0087] In Step 180, the percent changes of the standard error and r² are compared with the criteria specified in Step 120. If the percent change is greater than the criteria, the process is repeated by returning to Step 140. Otherwise, the iteration procedure is terminated in Step 190 and the resultant model computed from this Dynamic Outlier Bias Reduction criteria procedure is completed. The model results are applied to all facilities, regardless of whether their data were removed or admitted in the current or past iterations.
[0088] In another embodiment, the process begins with the selection of certain iterative parameters, specifically: [0089] (1) an absolute error and relative error percentile value, wherein one, the other, or both may be used in the iterative process, [0090] (2) a coefficient of determination (also known as r²) improvement value, and [0091] (3) a standard error improvement value.
[0092] The process begins with an original data set, a set of actual data, and either at least one coefficient or a factor used to calculate predicted values based on the original data set. A coefficient or set of coefficients will be applied to the original data set to create a set of predicted values. The set of coefficients may include, but is not limited to, scalars, exponents, parameters, and periodic functions. The set of predicted data is then compared to the set of actual data. A standard error and a coefficient of determination are calculated based on the differences between the predicted and actual data. The absolute and relative errors associated with each data point are used to remove data outliers based on the user-selected absolute and relative error percentile values. Ranking the data is not necessary, as all data falling outside the range associated with the percentile values for absolute and/or relative error are removed from the original data set. The use of absolute and relative errors to filter data is illustrative and for exemplary purposes only, as the method may be performed with only absolute or relative error or with another function.
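The text does not give formulas for the standard error and coefficient of determination, so the following minimal sketch assumes the conventional regression definitions: standard error $= \sqrt{SSE/(n - p)}$ with $p$ fitted parameters, and $r^2 = 1 - SSE/SST$. `fit_statistics` is a hypothetical helper.

```python
import math


def fit_statistics(predicted, actual, n_params=2):
    """Standard error and coefficient of determination (r^2) computed from
    the differences between predicted and actual values.

    Standard error here is sqrt(SSE / (n - n_params)); n_params=2 suits a
    straight line (slope and intercept).
    """
    n = len(actual)
    if n <= n_params:
        raise ValueError("need more records than model parameters")
    mean_actual = sum(actual) / n
    sse = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    standard_error = math.sqrt(sse / (n - n_params))
    r_squared = 1.0 - sse / sst if sst > 0 else float("nan")
    return standard_error, r_squared


se, r2 = fit_statistics([1.1, 1.9, 3.2, 3.8], [1.0, 2.0, 3.0, 4.0])
# SSE = 0.10 and SST = 5.0, so r2 is about 0.98 and se is about sqrt(0.05)
```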
[0093] The data associated with the absolute and relative error within a user-selected percentile range is the outlier removed data set, and each iteration of the process will have its own filtered data set. This first outlier removed data set is used to determine predicted values that will be compared with actual values. At least one coefficient is determined by optimizing the errors, and then the coefficient is used to generate predicted values based on the first outlier removed data set. The outlier bias reduced coefficients serve as the mechanism by which knowledge is passed from one iteration to the next.
[0094] After the first outlier removed data set is created, the standard error and coefficient of determination are calculated and compared with the standard error and coefficient of determination of the original data set. If the difference in standard error and the difference in coefficient of determination are both below their respective improvement values, then the process stops. However, if at least one of the improvement criteria is not met, then the process continues with another iteration. The use of standard error and coefficient of determination as checks for the iterative process is illustrative and exemplary only, as the check can be performed using only the standard error or only the coefficient of determination, a different statistical check, or some other performance termination criteria (such as number of iterations).
[0095] Assuming that the first iteration fails to meet the improvement criteria, the second iteration begins by applying the first outlier bias reduced data coefficients to the original data to determine a new set of predicted values. The original data is then processed again, establishing absolute and relative error for the data points as well as the standard error and coefficient of determination values for the original data set while using the first outlier removed data set coefficients. The data is then filtered to form a second outlier removed data set and to determine coefficients based on the second outlier removed data set.
[0096] The second outlier removed data set, however, is not necessarily a subset of the first outlier removed data set, and it is associated with a second set of outlier bias reduced model coefficients, a second standard error, and a second coefficient of determination. Once those values are determined, the second standard error will be compared with the first standard error and the second coefficient of determination will be compared with the first coefficient of determination.
[0097] If the improvement value (for standard error and coefficient of determination) exceeds the difference in these parameters, then the process will end. If not, then another iteration will begin by processing the original data yet again; this time using the second outlier bias reduced coefficients to process the original data set and generate a new set of predicted values. Filtering based on the user-selected percentile value for absolute and relative error will create a third outlier removed data set that will be optimized to determine a set of third outlier bias reduced coefficients. The process will continue until the error improvement or other termination criteria are met (such as a convergence criteria or a specified number of iterations).
[0098] The output of this process is a set of coefficients or model parameters, wherein a coefficient or model parameter is a mathematical value (or set of values), such as, but not limited to, a model predicted value for comparing data, the slope and intercept values of a linear equation, exponents, or the coefficients of a polynomial. The output of Dynamic Outlier Bias Reduction is not an output value in its own right, but rather the coefficients that modify data to determine an output value.
[0099] In another embodiment, illustrated in
[0100] In Step 210 the initial data is listed in any order.
[0101] Step 220 constitutes the function or operation that is performed on the dataset. In this example embodiment, the function or operation is the ascending ranking of the data followed by successive arithmetic average calculations, where each line corresponds to the average of all data at and above that line.
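Step 220 can be illustrated with a short sketch, assuming numeric data and reading "at and above the line" as all lines from the top of the ascending listing down to the current one. The function name `successive_averages` is hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def successive_averages(data: Sequence[float]) -> List[Tuple[float, float]]:
    """Rank data ascending and pair each value with the mean of all values
    at and above its line in the ranked listing (hypothetical helper)."""
    ranked = sorted(data)
    running_totals = accumulate(ranked)  # cumulative sums down the listing
    return [(value, total / line)
            for line, (value, total) in enumerate(zip(ranked, running_totals), start=1)]


print(successive_averages([4, 1, 3, 2]))
# [(1, 1.0), (2, 1.5), (3, 2.0), (4, 2.5)]
```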
[0102] Step 230 computes the relative and absolute errors from the data using successive values from the results of Step 220.
[0103] Step 240 allows the analyst to enter the desired outlier removal error criteria (%). The Quality Criteria Value is the resultant value from the error calculations in Step 230 based on the data in Step 220.
[0104] Step 250 shows the data quality outlier filtered dataset. Specific values are removed if the relative and absolute errors exceed the specified error criteria given in Step 240.
[0105] Step 260 shows the arithmetic average calculation comparison between the complete and outlier removed datasets. As in all applied mathematical or statistical calculations, the analyst is the final step, judging whether the data elements identified and removed as outliers are actually of poor quality. The Dynamic Outlier Bias Reduction system and method eliminates the analyst from directly removing data, but best practice guidelines suggest that the analyst review and check the results for practical relevance.
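Steps 230 through 260 can be sketched end to end in Python. Several details are assumptions rather than statements from the text: each ranked value's error is measured against its own successive average from Step 220, relative error is taken with respect to that average, the Quality Criteria Values are linear-interpolation percentiles of the errors, and a value is removed only when both its absolute and relative errors exceed them, as Step 250 describes. The function name is hypothetical.

```python
from itertools import accumulate
from statistics import mean, quantiles
from typing import List, Sequence, Tuple


def filter_by_error_criteria(data: Sequence[float], criterion_pct: int = 90
                             ) -> Tuple[float, float, List[float]]:
    """Return (average of all data, average after outlier removal, retained values)."""
    ranked = sorted(data)
    # Step 220: successive averages down the ascending listing.
    averages = [total / line for line, total in enumerate(accumulate(ranked), start=1)]
    # Step 230: absolute and relative errors of each value against its successive average.
    abs_err = [abs(x - a) for x, a in zip(ranked, averages)]
    rel_err = [e / abs(a) if a else 0.0 for e, a in zip(abs_err, averages)]
    # Step 240: Quality Criteria Values at the analyst's percentile (integer 1..99).
    abs_cut = quantiles(abs_err, n=100, method="inclusive")[criterion_pct - 1]
    rel_cut = quantiles(rel_err, n=100, method="inclusive")[criterion_pct - 1]
    # Step 250: drop a value only when both of its errors exceed the criteria.
    kept = [x for x, ea, er in zip(ranked, abs_err, rel_err)
            if not (ea > abs_cut and er > rel_cut)]
    # Step 260: compare the complete and outlier removed averages for analyst review.
    return mean(data), mean(kept), kept


full_avg, filtered_avg, kept = filter_by_error_criteria([10, 11, 9, 10, 12, 10, 50])
print(full_avg, round(filtered_avg, 3), kept)
```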
[0106] In another embodiment illustrated in
[0107] In Step 310, the paired data is listed in any order.
[0108] Step 320 computes the relative and absolute errors for each ordered pair in the dataset.
[0109] Step 330 allows the analyst to enter the desired data validation criteria. In the example, a 90% threshold is selected for both relative and absolute error. The Quality Criteria Value entries in Step 330 are the resulting absolute and relative error percentile values for the data shown in Step 320.
[0110] Step 340 shows the outlier removal process, in which data that may be invalid is removed from the dataset using the criterion that the relative and absolute error values both exceed the values corresponding to the user-selected percentile values entered in Step 330. In practice, other error criteria may be used, and when multiple criteria are applied as shown in this example, any combination of error values may be applied to determine the outlier removal rules.
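The Step 320 to Step 340 logic for paired data can be sketched as below. The error definitions are assumptions: for a pair (x, y), the absolute error is taken as |y - x| and the relative error as |y - x| / |x|, with x nonzero. The 90% thresholds follow the example in Step 330, and a pair is removed only when both of its errors exceed their percentile values, as Step 340 states. The function and type names are hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def validate_pairs(pairs: Sequence[Pair], abs_pct: int = 90, rel_pct: int = 90
                   ) -> Tuple[List[Pair], float, float]:
    """Return (retained pairs, absolute Quality Criteria Value, relative Quality Criteria Value)."""
    # Step 320: absolute and relative error for each ordered pair (x assumed nonzero).
    abs_err = [abs(y - x) for x, y in pairs]
    rel_err = [e / abs(x) for e, (x, _) in zip(abs_err, pairs)]
    # Step 330: percentile values of the errors (linear interpolation, integer percent).
    abs_cut = quantiles(abs_err, n=100, method="inclusive")[abs_pct - 1]
    rel_cut = quantiles(rel_err, n=100, method="inclusive")[rel_pct - 1]
    # Step 340: remove a pair only when BOTH of its errors exceed the criteria.
    kept = [p for p, ea, er in zip(pairs, abs_err, rel_err)
            if not (ea > abs_cut and er > rel_cut)]
    return kept, abs_cut, rel_cut


pairs = [(10, 10.2), (20, 20.4), (30, 30.6), (40, 40.8), (50, 80.0),
         (60, 61.2), (70, 71.4), (80, 81.6), (90, 91.8), (100, 102.0)]
kept, abs_cut, rel_cut = validate_pairs(pairs)
print(len(kept), (50, 80.0) in kept)
```

Requiring both errors to exceed their thresholds keeps pairs that are large in only one sense, such as (100, 102.0), whose absolute error is the largest among the consistent pairs but whose relative error is ordinary.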
[0111] Step 350 computes the statistical results for both the validated data and the original data, in this case the Pearson correlation coefficient. These results are then reviewed for practical relevance by the analyst.
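The Step 350 statistic can be computed directly. The sketch below evaluates the Pearson correlation coefficient for original and validated pairs with the textbook formula r = Sxy / sqrt(Sxx * Syy); the function name and the sample data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson_r(pairs: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation coefficient of paired data."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


original = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 1)]
validated = original[:4]  # e.g., after Step 340 removed the last pair
print(round(pearson_r(original), 3), round(pearson_r(validated), 3))
```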
[0112] In another embodiment, Dynamic Outlier Bias Reduction is used to perform a validation of an entire data set. Standard error improvement value, coefficient of determination improvement value, and absolute and relative error thresholds are selected, and then the data set is filtered according to the error criteria. Even if the original data set is of high quality, some data will still have error values that fall outside the absolute and relative error thresholds. Therefore, it is important to determine whether any removal of data is necessary. If the outlier removed data set passes the standard error improvement and coefficient of determination improvement criteria after the first iteration, then the original data set has been validated, since filtering changed the standard error and coefficient of determination by amounts too small to be considered significant (e.g., below the selected improvement values).
[0113] In another embodiment, Dynamic Outlier Bias Reduction is used to provide insight into how the iterations of data outlier removal are influencing the calculation. Graphs or data tables are provided to allow the user to observe the progression in the data outlier removal calculations as each iteration is performed. This stepwise approach enables analysts to observe unique properties of the calculation that can add value and knowledge to the result. For example, the speed and nature of convergence can indicate the influence of Dynamic Outlier Bias Reduction on computing representative factors for a multi-dimensional data set.
[0114] As an illustration, consider a linear regression calculation over a poor-quality data set of 87 records. The form of the equation being regressed is y=mx+b. Table 1 shows the results of the iterative process for 5 iterations. Notice that, using relative and absolute error criteria of 95%, convergence is achieved in 3 iterations. Changes in the regression coefficients can be observed, and the Dynamic Outlier Bias Reduction method reduced the calculation data set to 79 records. The relatively low coefficient of determination (r²=39%) suggests that a lower (<95%) criterion should be tested to study the additional outlier removal effects on the r² statistic and on the computed regression coefficients.
TABLE 1
Dynamic Outlier Bias Reduction Example: Linear Regression at 95%

Iteration    N    Error    r²     m       b
0           87    3.903   25%   0.428   41.743
1           78    3.048   38%   0.452   43.386
2           83    3.040   39%   0.463   44.181
3           79    3.030   39%   0.455   43.630
4           83    3.040   39%   0.463   44.181
5           79    3.030   39%   0.455   43.630
[0115] In Table 2, the results of applying Dynamic Outlier Bias Reduction are shown using relative and absolute error criteria of 80%. Notice that a 15 percentage point change in the outlier error criteria (95% to 80%) produced a 35 percentage point increase in r² (39% to 74%), with an additional 35% decrease in admitted data (79 to 51 records included). The analyst can use a graphical view of the changes in the regression lines with the outlier removed data, together with the numerical results of Tables 1 and 2, to communicate the outlier removed results to a wider audience and to provide more insight into the effects of data variability on the analysis results.
TABLE 2
Dynamic Outlier Bias Reduction Example: Linear Regression at 80%

Iteration    N    Error    r²     m       b
0           87    3.903   25%   0.428   41.743
1           49    1.607   73%   0.540   51.081
2           64    1.776   68%   0.561   52.361
3           51    1.588   74%   0.558   52.514
4           63    1.789   68%   0.559   52.208
5           51    1.588   74%   0.558   52.514
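The percentage comparisons in paragraph [0115] can be checked against the iteration 5 rows of Tables 1 and 2. The short calculation below makes the distinction explicit: the change in r² is measured in percentage points, while the drop in admitted records is a relative percentage of the 79 records retained at the 95% criterion.

```python
# Iteration 5 rows of Table 1 (95% criterion) and Table 2 (80% criterion).
r2_at_95, r2_at_80 = 39, 74          # coefficient of determination, in percent
records_at_95, records_at_80 = 79, 51

criterion_change_pp = 95 - 80                  # percentage-point change in the error criterion
r2_gain_pp = r2_at_80 - r2_at_95               # percentage-point gain in r^2
extra_reduction = (records_at_95 - records_at_80) / records_at_95  # relative drop in admitted data

print(criterion_change_pp, r2_gain_pp, f"{extra_reduction:.1%}")  # 15 35 35.4%
```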
[0116] As illustrated in
[0117] As illustrated in
[0118] As illustrated in
[0119] As
[0120] As illustrated in
[0121] In addition to the above-described quantitative analysis facilitated by the illustrative graph of
[0122]
[0123]
[0124]
[0125] Various embodiments include a system for reducing outlier bias in target variables measured for a facility.
[0126] In one embodiment, the system 1010 initiates the sensor 1022, which in turn detects and quantifies a given compound, e.g., carbon dioxide. The detection and quantification can be done continuously or within discrete time steps. Each time a measurement is completed, a data set is generated, stored on the storage subsystem 1016, and input into the computing unit 1012. The data set is processed by the Dynamic Outlier Bias Removal computer program stored by the storage subsystem 1016, whereby it is censored according to the various embodiments of the methods disclosed herein. Once the computer program has processed the data, the processed data is output by the output unit 1024. In an embodiment wherein the output unit 1024 is a monitor or a printer, the results may be visualized in a diagram. In an embodiment wherein the output unit 1024 comprises a transmission device, the processed data is sent to a central database or a control center where the data can be further processed (not shown). Accordingly, the system according to the various disclosed embodiments provides a powerful tool for comparing different facilities, within one company or within one technical field, with each other in an automated way in which outlier bias is reduced.
[0127] In a preferred embodiment, the measuring device 1020 comprises one or more sensors for detecting and quantifying a chemical compound. Due to global warming, greenhouse gases emitted by a facility are becoming an increasingly important target variable. Facilities that emit small amounts of greenhouse gases may be ranked higher than those emitting larger amounts, even though the overall productivity of the latter may be better. Examples of greenhouse gases are carbon dioxide (CO2), ozone (O3), water vapor (H2O), hydrofluorocarbons (HFCs), perfluorocarbons (PFCs), chlorofluorocarbons (CFCs), sulphur hexafluoride (SF6), methane (CH4), nitrous oxide (N2O), carbon monoxide (CO), nitrogen oxides (NOx), and non-methane volatile organic compounds (NMVOCs). The automated detection and quantification of these compounds may be used to develop industrial standards regarding allowable emissions of greenhouse gases. Applying the Dynamic Outlier Bias Removal removes outliers that may be caused by extraordinary circumstances in production, such as operating errors or even accidents. Thus, using various embodiments disclosed herein results in more accurate and meaningful standards. Once the industrial standards are developed, the system can be used to compare the emissions with the standards.
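A multi-facility emission standard of the kind described above can be sketched in Python. This is a single-pass simplification under stated assumptions, not the disclosed method: the pooled mean serves as the model, relative error is taken with respect to each measured reading, a reading is removed when both its absolute and relative errors exceed the chosen percentile, and the standard is the mean of the retained readings. Facility names, readings, and the function name are hypothetical.

```python
from statistics import mean, quantiles
from typing import Dict, List, Tuple


def emission_standard(readings: Dict[str, List[float]], pct: int = 90
                      ) -> Tuple[float, List[float], Dict[str, float]]:
    """Return (standard, retained readings, each facility's mean emission / standard)."""
    pooled = [v for values in readings.values() for v in values]
    model = mean(pooled)  # assumed model: the pooled mean
    abs_err = [abs(v - model) for v in pooled]
    rel_err = [e / abs(v) for e, v in zip(abs_err, pooled)]  # assumes nonzero readings
    abs_cut = quantiles(abs_err, n=100, method="inclusive")[pct - 1]
    rel_cut = quantiles(rel_err, n=100, method="inclusive")[pct - 1]
    # A reading is removed only when both of its errors exceed the criteria.
    retained = [v for v, ea, er in zip(pooled, abs_err, rel_err)
                if not (ea > abs_cut and er > rel_cut)]
    standard = mean(retained)
    # Ratio > 1 means the facility's average emission is above the standard.
    ratios = {name: mean(values) / standard for name, values in readings.items()}
    return standard, retained, ratios


readings = {
    "facility_A": [100, 102, 98, 101],
    "facility_B": [110, 108, 112, 109],
    "facility_C": [95, 97, 96, 400],  # 400: hypothetical upset reading
}
standard, retained, ratios = emission_standard(readings)
print(round(standard, 1), 400 in retained, {k: round(r, 2) for k, r in ratios.items()})
```

With these readings the 90% criterion also trims the 95 reading from facility C, because percentile-based filtering always removes the upper tail of the errors. This illustrates why the disclosure recommends that an analyst review the removed values.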
[0128] One of ordinary skill in the art would further appreciate that the scope of the present invention includes application of the various disclosed embodiments for reducing outlier bias in target variables relating to financial instruments, such as equity securities (e.g., common stock) or derivative contracts (e.g., forwards, futures, options, and swaps, etc.). For example, in one embodiment, the system 1010 comprises an input unit 1018 that receives data relating to a financial instrument, such as a common stock, and provides a corresponding data set. The target variable can be the stock price. Further, variables that relate to the target variable can be determined using various known methods of evaluating financial instruments, such as, for example, discounted cash flow analysis. Such related variables may include the relevant dividends, earnings, or cash flows, earnings per share, price-to-earnings ratio, or growth rate, etc. Once the database of target values and related variable values is formed, various embodiments of the Dynamic Outlier Bias Removal disclosed herein can be applied to the database, resulting in a more accurate model to evaluate the financial instrument.
[0129] The foregoing disclosure and description of the preferred embodiments of the invention are illustrative and explanatory thereof and it will be understood by those skilled in the art that various changes in the details of the illustrated system and method may be made without departing from the scope of the invention.