CHANCE CONSTRAINED EXTREME LEARNING MACHINE METHOD FOR NONPARAMETRIC INTERVAL FORECASTING OF WIND POWER

20220209532 · 2022-06-30


    Abstract

    The present application discloses a chance constrained extreme learning machine method for nonparametric interval forecasting of wind power, which belongs to the field of renewable energy generation forecasting. The method combines an extreme learning machine with a chance constrained optimization model. A chance constraint ensures that the interval coverage probability is no less than the nominal confidence level, and minimizing the interval width is taken as the training objective. The method does not rely on any probability distribution assumption and does not prescribe the quantile levels of the interval boundaries, so it directly constructs prediction intervals with good reliability and sharpness. The present application also proposes a bisection search algorithm based on difference of convex functions optimization to train the chance constrained extreme learning machine efficiently.

    Claims

    1. A chance constrained extreme learning machine method for nonparametric interval forecasting of wind power, comprising the following steps: 1) constructing a chance constrained extreme learning machine model; 2) constructing a sample average approximation model of the chance constrained extreme learning machine; 3) constructing a parametric 0-1 loss minimization model; 4) constructing a parametric difference of convex functions optimization model; and 5) training the extreme learning machine with a bisection search algorithm based on difference of convex functions optimization; wherein

    step 1) comprises: considering the joint probability distribution of the wind power and its input features, using a chance constraint to require that the wind power fall into the prediction interval with a probability no lower than a nominal confidence level, and taking minimization of the expected interval width as the training objective, thereby constructing the chance constrained extreme learning machine model:

    $$\min_{\omega_l,\,\omega_u}\ \mathbb{E}_{\mu}\big[f(x,\omega_u)-f(x,\omega_l)\big]$$

    subject to:

    $$\mathbb{P}_{\mu}\big[f(x,\omega_l)\le y\le f(x,\omega_u)\big]\ge 100(1-\beta)\%,\qquad 0\le f(x,\omega_l)\le f(x,\omega_u)\le 1$$

    where $x$ is the random variable corresponding to the input features and $y$ is the random variable corresponding to the normalized wind power, and their joint probability distribution is denoted $\mu(x,y)$; $f(x,\omega_l)$ and $f(x,\omega_u)$ are output equations of the extreme learning machine, which represent the lower and upper boundaries of the prediction interval, respectively; $\omega_l$ and $\omega_u$ are weight vectors from the hidden layer of the extreme learning machine to an output neuron; $100(1-\beta)\%$ is the nominal confidence level of the prediction interval; and $\mathbb{E}$ and $\mathbb{P}$ denote the expectation and probability operators, respectively;

    step 2) comprises: replacing the joint probability distribution of the input features and wind power with the empirical probability distribution of the training samples, approximating the actual expectation in the objective function by the empirical expectation, and approximating the actual probability in the chance constraint by the empirical probability, to obtain the sample average approximation model of the chance constrained extreme learning machine:

    $$v^{*}=\min_{\gamma_t,\,\omega_l,\,\omega_u}\ \sum_{t\in\mathcal{T}}\big[f(x_t,\omega_u)-f(x_t,\omega_l)\big]$$

    subject to:

    $$\gamma_t=\max\{f(x_t,\omega_l)-y_t,\ y_t-f(x_t,\omega_u)\},\quad \forall t\in\mathcal{T}$$
    $$\sum_{t\in\mathcal{T}}\big[1-\mathbb{I}(\gamma_t\le 0)\big]\le\beta|\mathcal{T}|$$
    $$0\le f(x_t,\omega_l)\le f(x_t,\omega_u)\le 1,\quad \forall t\in\mathcal{T}$$

    where $v^{*}$ is the optimal value of the model, i.e., the shortest overall width of prediction intervals satisfying the chance constraint; $x_t$ and $y_t$ are the input features and the wind power of sample $t$; $\mathcal{T}$ is the index set of the samples of the training set $\{(x_t,y_t)\}_{t\in\mathcal{T}}$, and $|\mathcal{T}|$ is the number of training samples; $\beta|\mathcal{T}|$ is the maximum number of wind power samples allowed outside the prediction intervals at the nominal confidence level; $\gamma_t$ is an auxiliary variable indicating whether the wind power falls into the prediction interval, where a non-positive value indicates that the wind power falls into the corresponding prediction interval and a positive value indicates that it does not; $\max\{\cdot\}$ is the maximum function, returning the largest of its arguments; and $\mathbb{I}(\cdot)$ is the indicator function of a logical discriminant, equal to 1 when the discriminant is true and 0 when it is false;

    step 3) comprises: introducing a parametric variable representing the overall width budget of the prediction intervals, and minimizing the number of training samples whose wind power falls outside the prediction intervals while meeting this width budget, to obtain the parametric 0-1 loss minimization model:

    $$\rho(v)=\min_{\gamma_t,\,\omega_l,\,\omega_u}\ \sum_{t\in\mathcal{T}}\big[1-\mathbb{I}(\gamma_t\le 0)\big]$$

    subject to:

    $$\gamma_t\ge f(x_t,\omega_l)-y_t,\quad \gamma_t\ge y_t-f(x_t,\omega_u),\quad \forall t\in\mathcal{T}$$
    $$\sum_{t\in\mathcal{T}}\big[f(x_t,\omega_u)-f(x_t,\omega_l)\big]\le v$$
    $$0\le f(x_t,\omega_l)\le f(x_t,\omega_u)\le 1,\quad \forall t\in\mathcal{T}$$

    where $v$ is the introduced parametric variable representing the overall width budget of the prediction intervals, and $\rho(v)$ is the optimal value function of the 0-1 loss minimization model with respect to $v$; the smallest $v$ satisfying $\rho(v)\le\beta|\mathcal{T}|$ is the shortest overall width $v^{*}$ of the prediction intervals satisfying the chance constraint;

    step 4) comprises: approximating the indicator function in the objective of the parametric 0-1 loss minimization model by a difference of convex functions, to obtain the parametric difference of convex functions optimization model:

    $$\bar{\rho}(v)=\min_{\gamma_t,\,\omega_l,\,\omega_u}\ \sum_{t\in\mathcal{T}}\big[1-L_{\mathrm{DC}}(\gamma_t;m)\big]$$

    subject to:

    $$L_{\mathrm{DC}}(\gamma_t;m)=\max\{-m\gamma_t,0\}-\max\{-m\gamma_t-1,0\},\quad \forall t\in\mathcal{T}$$
    $$\gamma_t\ge f(x_t,\omega_l)-y_t,\quad \gamma_t\ge y_t-f(x_t,\omega_u),\quad \forall t\in\mathcal{T}$$
    $$\sum_{t\in\mathcal{T}}\big[f(x_t,\omega_u)-f(x_t,\omega_l)\big]\le v$$
    $$0\le f(x_t,\omega_l)\le f(x_t,\omega_u)\le 1,\quad \forall t\in\mathcal{T}$$

    where $\bar{\rho}(v)$ is the optimal value function of the difference of convex functions optimization model with respect to $v$; $L_{\mathrm{DC}}(\gamma_t;m)$ is a difference of convex functions approximating the indicator function $\mathbb{I}(\gamma_t\le 0)$; $m>0$ is the slope parameter of the difference of convex functions, and the larger $m$ is, the closer $L_{\mathrm{DC}}(\gamma_t;m)$ is to $\mathbb{I}(\gamma_t\le 0)$; the decision vector of the model is denoted $\theta=[\gamma^{\top}\ \omega_l^{\top}\ \omega_u^{\top}]^{\top}$, where $\gamma=[\gamma_1\ \gamma_2\ \cdots\ \gamma_{|\mathcal{T}|}]^{\top}$;

    step 5) comprises: using the bisection search algorithm based on difference of convex functions optimization to search for the shortest overall width $v^{*}$ of the prediction intervals satisfying the chance constraint, thereby training the extreme learning machine, through the following steps:

    step (1): giving a bisection search precision $\epsilon_1$ and a bisection search interval $[v_l, v_u]$, where the given interval should contain the shortest overall width $v^{*}$ of the prediction intervals;

    step (2): for the parametric difference of convex functions optimization model, setting the parameter $v$ to the midpoint $(v_l+v_u)/2$ of the bisection search interval, and solving the model with a convex-concave procedure algorithm, the solution being $\bar{\theta}=[\bar{\gamma}^{\top}\ \bar{\omega}_l^{\top}\ \bar{\omega}_u^{\top}]^{\top}$;

    step (3): calculating the number miss of training samples whose wind power falls outside the prediction interval:

    $$\mathrm{miss}\leftarrow\sum_{t\in\mathcal{T}}\big[1-\mathbb{I}\big(f(x_t,\bar{\omega}_l)\le y_t\le f(x_t,\bar{\omega}_u)\big)\big]$$

    step (4): if $\mathrm{miss}\le\beta|\mathcal{T}|$, updating the upper boundary of the bisection search interval $v_u\leftarrow(v_l+v_u)/2$ and recording the current output-layer weight vectors of the extreme learning machine, $\omega_l\leftarrow\bar{\omega}_l$ and $\omega_u\leftarrow\bar{\omega}_u$; otherwise, updating the lower boundary $v_l\leftarrow(v_l+v_u)/2$ of the bisection search interval;

    step (5): if $v_u-v_l\le\epsilon_1$, outputting the recorded output-layer weight vectors $\omega_l$ and $\omega_u$ of the extreme learning machine; otherwise, returning to step (2).

    2. The chance constrained extreme learning machine method for nonparametric interval forecasting of wind power according to claim 1, wherein a convex-concave procedure algorithm is adopted to solve the parametric difference of convex functions optimization model with a given parameter, comprising the following steps:

    step (1): giving an algorithm convergence accuracy $\epsilon_2$, the slope parameter $m$ of the difference of convex functions, and the parameter $v$ representing the overall width budget of the prediction intervals;

    step (2): setting an iteration counter $k\leftarrow 0$, and solving the following linear programming problem to obtain an initial solution $\theta^{(0)}$ of the model:

    $$\theta^{(0)}\in\arg\min_{\gamma_t,\,\omega_l,\,\omega_u}\ \mathbf{1}^{\top}\gamma$$

    subject to:

    $$\gamma_t\ge 0,\quad \gamma_t\ge f(x_t,\omega_l)-y_t,\quad \gamma_t\ge y_t-f(x_t,\omega_u),\quad \forall t\in\mathcal{T}$$
    $$\sum_{t\in\mathcal{T}}\big[f(x_t,\omega_u)-f(x_t,\omega_l)\big]\le v$$
    $$0\le f(x_t,\omega_l)\le f(x_t,\omega_u)\le 1,\quad \forall t\in\mathcal{T}$$

    where $\mathbf{1}$ is the all-ones vector whose dimension equals the number of samples in the training set;

    step (3): updating the solution of the parametric difference of convex functions optimization model in the $(k+1)$th iteration by:

    $$\theta^{(k+1)}\in\arg\min_{\gamma_t,\,\omega_l,\,\omega_u}\ L_{\mathrm{vex}}^{+}(\gamma)-\Big[L_{\mathrm{vex}}^{-}(\gamma^{(k)})+\delta^{(k)\top}(\gamma-\gamma^{(k)})\Big]$$

    subject to:

    $$\gamma_t\ge f(x_t,\omega_l)-y_t,\quad \gamma_t\ge y_t-f(x_t,\omega_u),\quad \forall t\in\mathcal{T}$$
    $$\sum_{t\in\mathcal{T}}\big[f(x_t,\omega_u)-f(x_t,\omega_l)\big]\le v$$
    $$0\le f(x_t,\omega_l)\le f(x_t,\omega_u)\le 1,\quad \forall t\in\mathcal{T}$$

    with

    $$L_{\mathrm{vex}}^{+}(\gamma)=\sum_{t\in\mathcal{T}}\big[1+\max\{-m\gamma_t-1,0\}\big],\qquad L_{\mathrm{vex}}^{-}(\gamma)=\sum_{t\in\mathcal{T}}\max\{-m\gamma_t,0\}$$

    where $L_{\mathrm{vex}}^{+}(\gamma)$ and $L_{\mathrm{vex}}^{-}(\gamma)$ are both convex functions and constitute, respectively, the minuend and the subtrahend of the difference of convex functions objective $\sum_{t\in\mathcal{T}}[1-L_{\mathrm{DC}}(\gamma_t;m)]$; $\delta^{(k)}$ is a subgradient of the convex function $L_{\mathrm{vex}}^{-}(\gamma)$ at $\gamma^{(k)}$, satisfying

    $$\delta^{(k)}\in\partial L_{\mathrm{vex}}^{-}(\gamma^{(k)})=\Big\{g\in\mathbb{R}^{|\mathcal{T}|}\ \Big|\ L_{\mathrm{vex}}^{-}(\gamma)\ge L_{\mathrm{vex}}^{-}(\gamma^{(k)})+g^{\top}(\gamma-\gamma^{(k)}),\ \forall\gamma\Big\}=\Big\{[g_1\ g_2\ \cdots\ g_{|\mathcal{T}|}]^{\top}\ \Big|\ g_t=-m\ \text{if}\ \gamma_t^{(k)}<0;\ g_t\in[-m,0]\ \text{if}\ \gamma_t^{(k)}=0;\ g_t=0\ \text{if}\ \gamma_t^{(k)}>0;\ \forall t\in\mathcal{T}\Big\}$$

    where $g\in\mathbb{R}^{|\mathcal{T}|}$ is a real column vector whose dimension equals the number of samples $|\mathcal{T}|$ in the training set;

    step (4): incrementing the iteration counter $k\leftarrow k+1$ and calculating the convergence error $e\leftarrow\theta^{(k)}-\theta^{(k-1)}$; and

    step (5): checking whether the Euclidean norm $\|e\|_2$ of the convergence error meets the convergence accuracy $\epsilon_2$; if not, returning to step (3); otherwise, outputting the converged solution $\theta\leftarrow\theta^{(k)}$.
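    The convex-concave procedure of claim 2 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the applicant's implementation. The helper names are hypothetical, and the `at_zero` argument picks one admissible element of the subdifferential at a kink. The linear programs of steps (2) and (3) are abstracted into caller-supplied callables `solve_initial` and `solve_step`, which are assumptions that any LP solver could back. `max_iter` is an added safeguard that the text does not mention. The helper functions show that $L_{\mathrm{vex}}^{+}-L_{\mathrm{vex}}^{-}$ reproduces $\sum_t[1-L_{\mathrm{DC}}(\gamma_t;m)]$ and that the linearized objective of step (3) upper-bounds it.

```python
import math


def l_dc(g, m):
    """L_DC(gamma; m) = max{-m*gamma, 0} - max{-m*gamma - 1, 0}."""
    return max(-m * g, 0.0) - max(-m * g - 1.0, 0.0)


def l_plus(gammas, m):
    """Convex minuend L_vex^+: sum_t [1 + max{-m*gamma_t - 1, 0}]."""
    return sum(1.0 + max(-m * g - 1.0, 0.0) for g in gammas)


def l_minus(gammas, m):
    """Convex subtrahend L_vex^-: sum_t max{-m*gamma_t, 0}."""
    return sum(max(-m * g, 0.0) for g in gammas)


def subgradient(gammas_k, m, at_zero=0.0):
    """One element delta^(k) of the subdifferential of L_vex^- at gamma^(k).

    At a kink (gamma_t == 0) any value in [-m, 0] is valid; at_zero selects it.
    """
    return [-m if g < 0 else (at_zero if g == 0 else 0.0) for g in gammas_k]


def ccp_surrogate(gammas, gammas_k, m):
    """Step (3) objective: L_vex^+ minus the linearization of L_vex^- at gamma^(k).

    Because a convex function lies above its linearization, this convex
    surrogate is an upper bound on sum_t [1 - L_DC(gamma_t; m)] and is tight
    at gamma = gamma^(k).
    """
    d = subgradient(gammas_k, m)
    lin = l_minus(gammas_k, m) + sum(di * (g - gk) for di, g, gk in zip(d, gammas, gammas_k))
    return l_plus(gammas, m) - lin


def ccp(solve_initial, solve_step, eps2, max_iter=100):
    """Steps (2)-(5): iterate theta^(k+1) = solve_step(theta^(k)) until ||e||_2 <= eps2.

    solve_initial() and solve_step(theta) are hypothetical wrappers around an LP
    solver that return the decision vector theta as a list of floats.
    """
    theta = solve_initial()
    for _ in range(max_iter):
        new = solve_step(theta)
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, theta)))
        theta = new
        if err <= eps2:
            break
    return theta
```

    When each step-(3) subproblem is solved exactly, the true objective cannot increase from one iteration to the next. It is bounded by the surrogate at $\theta^{(k+1)}$, which in turn is no larger than the surrogate at $\theta^{(k)}$, where it equals the true objective.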

    Description

    BRIEF DESCRIPTION OF DRAWINGS

    [0043] FIG. 1 is a flowchart of the chance constrained extreme learning machine based nonparametric interval forecasting method of the present application;

    [0044] FIG. 2 displays the wind power prediction intervals of a summer data set obtained by the method of the present application.

    DESCRIPTION OF EMBODIMENTS

    [0045] The present application will be further explained with reference to the drawings and examples.

    [0046] (1) A training data set $\{(x_t,y_t)\}_{t\in\mathcal{T}}$ and a test data set $\{(x_t,y_t)\}_{t\in\mathcal{V}}$ are constructed, where $x_t$ is an input feature vector, $y_t$ is the wind power value to be predicted, and $\mathcal{T}$ and $\mathcal{V}$ are the index sets of the samples in the training set and the test set, respectively. The number of hidden-layer neurons of the extreme learning machine is determined, and its input-layer weight vectors and hidden-layer biases are randomly initialized. The nominal confidence level $100(1-\beta)\%$ of the prediction intervals is determined.
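    Step (1) fixes the random part of the extreme learning machine. Only the hidden-to-output weights $\omega_l$ and $\omega_u$ are trained afterwards, and each output equation is linear in them. Below is a minimal sketch with hypothetical helper names. It assumes a sigmoid activation and weights and biases drawn uniformly from $[-1, 1]$; the text specifies neither.

```python
import math
import random


def init_elm(n_inputs, n_hidden, seed=0):
    """Randomly draw input weights and hidden biases; they stay fixed during training.

    Uniform [-1, 1] sampling is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    return weights, biases


def hidden_output(x, weights, biases):
    """Hidden-layer output h(x) for one input feature vector x (sigmoid assumed)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)))
        for w, b in zip(weights, biases)
    ]


def elm_output(h, omega):
    """Output equation f(x, omega) = h(x)^T omega; omega is the trained part."""
    return sum(h_i * o_i for h_i, o_i in zip(h, omega))
```

    Because $f(x,\omega)$ is linear in $\omega$ once $h(x)$ is fixed, every constraint on $f(x_t,\omega_l)$ and $f(x_t,\omega_u)$ in the later models is linear in the trained weights.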

    [0047] (2) A sample average approximation model of the chance constrained extreme learning machine is constructed:

    [00015] $$\min_{\gamma_t,\,\omega_l,\,\omega_u}\ \sum_{t\in\mathcal{T}}\big[f(x_t,\omega_u)-f(x_t,\omega_l)\big]$$

    [0048] which is subject to:

    [00016] $$\gamma_t=\max\{f(x_t,\omega_l)-y_t,\ y_t-f(x_t,\omega_u)\},\quad \forall t\in\mathcal{T}$$
    $$\sum_{t\in\mathcal{T}}\big[1-\mathbb{I}(\gamma_t\le 0)\big]\le\beta|\mathcal{T}|$$
    $$0\le f(x_t,\omega_l)\le f(x_t,\omega_u)\le 1,\quad \forall t\in\mathcal{T}$$

    [0049] where $f(x,\omega_l)$ and $f(x,\omega_u)$ are output equations of the extreme learning machine, which represent the lower and upper boundaries of the prediction interval, respectively; $\omega_l$ and $\omega_u$ are weight vectors from the hidden layer of the extreme learning machine to an output neuron; $\gamma_t$ is the auxiliary variable indicating whether the wind power falls into the prediction interval, where a non-positive value indicates that the wind power falls into the corresponding prediction interval and a positive value indicates that it does not; $\max\{\cdot\}$ is the maximum function, returning the largest of its arguments; and $\mathbb{I}(\cdot)$ is the indicator function of a logical discriminant, equal to 1 when the discriminant is true and 0 when it is false.
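    For a candidate set of intervals, the quantities in [00015] and [00016] can be evaluated directly. The sketch below is illustrative only, with hypothetical function names. It computes $\gamma_t$, the number of misses $\sum_t[1-\mathbb{I}(\gamma_t\le 0)]$, and the total width, and checks feasibility against the budget $\beta|\mathcal{T}|$. A sample lying exactly on a boundary has $\gamma_t=0$ and counts as covered.

```python
def miss_indicators(lower, upper, y):
    """gamma_t = max{f_l - y_t, y_t - f_u}; sample t is covered iff gamma_t <= 0."""
    gammas = [max(l - yt, yt - u) for l, u, yt in zip(lower, upper, y)]
    misses = sum(1 for g in gammas if g > 0)  # sum_t [1 - I(gamma_t <= 0)]
    return gammas, misses


def saa_check(lower, upper, y, beta):
    """Objective value and feasibility of given intervals in the SAA model."""
    total_width = sum(u - l for l, u in zip(lower, upper))
    _, misses = miss_indicators(lower, upper, y)
    bounds_ok = all(0.0 <= l <= u <= 1.0 for l, u in zip(lower, upper))
    feasible = bounds_ok and misses <= beta * len(y)
    return total_width, misses, feasible
```

    For example, with four samples and $\beta=0.25$, one miss is allowed. The same intervals become infeasible for $\beta=0.1$ because $\beta|\mathcal{T}|=0.4$.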

    [0050] (3) The following parametric difference of convex functions optimization model is constructed:

    [00017] $$\bar{\rho}(v)=\min_{\gamma_t,\,\omega_l,\,\omega_u}\ \sum_{t\in\mathcal{T}}\big[1-L_{\mathrm{DC}}(\gamma_t;m)\big]$$

    [0051] which is subject to:

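    [00018] $$L_{\mathrm{DC}}(\gamma_t;m)=\max\{-m\gamma_t,0\}-\max\{-m\gamma_t-1,0\},\quad \forall t\in\mathcal{T}$$
    $$\gamma_t\ge f(x_t,\omega_l)-y_t,\quad \gamma_t\ge y_t-f(x_t,\omega_u),\quad \forall t\in\mathcal{T}$$
    $$\sum_{t\in\mathcal{T}}\big[f(x_t,\omega_u)-f(x_t,\omega_l)\big]\le v$$
    $$0\le f(x_t,\omega_l)\le f(x_t,\omega_u)\le 1,\quad \forall t\in\mathcal{T}$$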

    [0052] where $v$ is the introduced parameter representing the overall width budget of the prediction intervals; $\bar{\rho}(v)$ is the optimal value function of the difference of convex functions optimization model with respect to $v$; $L_{\mathrm{DC}}(\gamma_t;m)$ is a function that can be decomposed into the difference of two convex functions, and $m$ is its slope parameter; $f(x,\omega_l)$ and $f(x,\omega_u)$ are the output equations of the extreme learning machine; and the decision vector of the model is denoted $\theta=[\gamma^{\top}\ \omega_l^{\top}\ \omega_u^{\top}]^{\top}$, where $\gamma=[\gamma_1\ \gamma_2\ \cdots\ \gamma_{|\mathcal{T}|}]^{\top}$.
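    Once the hidden-layer outputs $h(x_t)$ are fixed, every constraint in [00018] except the definition of $L_{\mathrm{DC}}$ is linear in $\theta=[\gamma^{\top}\ \omega_l^{\top}\ \omega_u^{\top}]^{\top}$. The feasible set can therefore be written as $A\theta\le b$. The sketch below assembles $A$ and $b$ in plain Python. It assumes `H[t]` holds $h(x_t)$, the function names are hypothetical, and it is not the applicant's code. The same rows, together with $\gamma_t\ge 0$ and cost $\mathbf{1}^{\top}\gamma$, form the initial linear program of step (2.2).

```python
def build_feasible_set(H, y, v):
    """Return (A, b) such that the linear constraints read A @ theta <= b.

    H[t] is the hidden-layer output h(x_t), so f(x_t, omega) = H[t] . omega,
    and theta = [gamma_1..gamma_T, omega_l (L entries), omega_u (L entries)].
    """
    T, L = len(H), len(H[0])
    n = T + 2 * L
    lo, up = T, T + L  # offsets of omega_l and omega_u inside theta
    A, b = [], []

    def add(terms, rhs):
        # Append one row: sum(coef * theta[idx] for idx, coef in terms) <= rhs.
        row = [0.0] * n
        for idx, coef in terms:
            row[idx] += coef
        A.append(row)
        b.append(rhs)

    def neg(terms):
        return [(idx, -coef) for idx, coef in terms]

    col_sums = [0.0] * L  # sum_t H[t][i], used by the width-budget row
    for t, (h, yt) in enumerate(zip(H, y)):
        f_l = [(lo + i, hi) for i, hi in enumerate(h)]  # f(x_t, omega_l)
        f_u = [(up + i, hi) for i, hi in enumerate(h)]  # f(x_t, omega_u)
        add(f_l + [(t, -1.0)], yt)          # gamma_t >= f_l - y_t
        add(neg(f_u) + [(t, -1.0)], -yt)    # gamma_t >= y_t - f_u
        add(neg(f_l), 0.0)                  # 0 <= f_l
        add(f_l + neg(f_u), 0.0)            # f_l <= f_u
        add(f_u, 1.0)                       # f_u <= 1
        for i, hi in enumerate(h):
            col_sums[i] += hi
    # Width budget: sum_t [f(x_t, omega_u) - f(x_t, omega_l)] <= v
    add([(up + i, s) for i, s in enumerate(col_sums)]
        + [(lo + i, -s) for i, s in enumerate(col_sums)], v)
    return A, b


def initial_lp_cost(T, L):
    """Cost vector of the step (2.2) LP: minimise 1^T gamma; the weights cost nothing."""
    return [1.0] * T + [0.0] * (2 * L)
```

    These arrays could be handed to an LP solver such as `scipy.optimize.linprog` as `c`, `A_ub` and `b_ub`. Since `linprog` bounds every variable below by zero by default, the weight entries would need explicit `(None, None)` bounds. The entries of $\gamma$ keep `(0, None)` for step (2.2) and are left free in the model of [00018].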

    [0053] (4) For the parameter $v$ of the difference of convex functions optimization model, a bisection search algorithm is used to find the shortest overall width $v^{*}$ of the prediction intervals satisfying the chance constraint, thereby training the extreme learning machine. The algorithm comprises the following steps:

    [0054] step (1): giving a bisection search precision $\epsilon_1$ and a bisection search interval $[v_l, v_u]$, where the given interval should contain the shortest overall width $v^{*}$ of the prediction intervals;

    [0055] step (2): for the parametric difference of convex functions optimization model, setting the parameter $v$ to the midpoint $(v_l+v_u)/2$ of the bisection search interval, and solving the model with a convex-concave procedure algorithm as described in steps (2.1)-(2.5):

    [0056] step (2.1): giving the algorithm convergence accuracy $\epsilon_2$, the slope parameter $m$ of the difference of convex functions, and the parameter $v$ representing the overall width budget of the prediction intervals;

    [0057] step (2.2): setting the iteration counter $k\leftarrow 0$, and solving the following linear programming problem to obtain an initial solution $\theta^{(0)}$ of the model:

    [00019] $$\theta^{(0)}\in\arg\min_{\gamma_t,\,\omega_l,\,\omega_u}\ \mathbf{1}^{\top}\gamma$$

    [0058] which is subject to:

    [00020] $$\gamma_t\ge 0,\quad \gamma_t\ge f(x_t,\omega_l)-y_t,\quad \gamma_t\ge y_t-f(x_t,\omega_u),\quad \forall t\in\mathcal{T}$$
    $$\sum_{t\in\mathcal{T}}\big[f(x_t,\omega_u)-f(x_t,\omega_l)\big]\le v$$
    $$0\le f(x_t,\omega_l)\le f(x_t,\omega_u)\le 1,\quad \forall t\in\mathcal{T}$$

    [0059] where $\mathbf{1}$ is the all-ones vector whose dimension equals the number of samples in the training set;

    [0060] step (2.3): updating the solution of the parametric difference of convex functions optimization model in the $(k+1)$th iteration by:

    [00021] $$\theta^{(k+1)}\in\arg\min_{\gamma_t,\,\omega_l,\,\omega_u}\ L_{\mathrm{vex}}^{+}(\gamma)-\Big[L_{\mathrm{vex}}^{-}(\gamma^{(k)})+\delta^{(k)\top}(\gamma-\gamma^{(k)})\Big]$$

    [0061] which is subject to:

    [00022] $$L_{\mathrm{vex}}^{+}(\gamma)=\sum_{t\in\mathcal{T}}\big[1+\max\{-m\gamma_t-1,0\}\big],\qquad L_{\mathrm{vex}}^{-}(\gamma)=\sum_{t\in\mathcal{T}}\max\{-m\gamma_t,0\}$$
    $$\gamma_t\ge f(x_t,\omega_l)-y_t,\quad \gamma_t\ge y_t-f(x_t,\omega_u),\quad \forall t\in\mathcal{T}$$
    $$\sum_{t\in\mathcal{T}}\big[f(x_t,\omega_u)-f(x_t,\omega_l)\big]\le v$$
    $$0\le f(x_t,\omega_l)\le f(x_t,\omega_u)\le 1,\quad \forall t\in\mathcal{T}$$

    [0062] where $L_{\mathrm{vex}}^{+}(\gamma)$ and $L_{\mathrm{vex}}^{-}(\gamma)$ are both convex functions and constitute, respectively, the minuend and the subtrahend of the difference of convex functions objective $\sum_{t\in\mathcal{T}}[1-L_{\mathrm{DC}}(\gamma_t;m)]$; $\delta^{(k)}$ is a subgradient of the convex function $L_{\mathrm{vex}}^{-}(\gamma)$ at $\gamma^{(k)}$, satisfying

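    [00023] $$\delta^{(k)}\in\partial L_{\mathrm{vex}}^{-}(\gamma^{(k)})=\Big\{g\in\mathbb{R}^{|\mathcal{T}|}\ \Big|\ L_{\mathrm{vex}}^{-}(\gamma)\ge L_{\mathrm{vex}}^{-}(\gamma^{(k)})+g^{\top}(\gamma-\gamma^{(k)}),\ \forall\gamma\Big\}=\Big\{[g_1\ g_2\ \cdots\ g_{|\mathcal{T}|}]^{\top}\ \Big|\ g_t=-m\ \text{if}\ \gamma_t^{(k)}<0;\ g_t\in[-m,0]\ \text{if}\ \gamma_t^{(k)}=0;\ g_t=0\ \text{if}\ \gamma_t^{(k)}>0;\ \forall t\in\mathcal{T}\Big\}$$

    [0063] where $g\in\mathbb{R}^{|\mathcal{T}|}$ is a real column vector whose dimension equals the number of samples $|\mathcal{T}|$ in the training set;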

    [0064] step (2.4): incrementing the iteration counter $k\leftarrow k+1$ and calculating the convergence error $e\leftarrow\theta^{(k)}-\theta^{(k-1)}$; and

    [0065] step (2.5): checking whether the Euclidean norm $\|e\|_2$ of the convergence error meets the convergence accuracy $\epsilon_2$; if not, returning to step (2.3); otherwise, outputting the converged solution $\theta\leftarrow\theta^{(k)}$.

    [0066] step (3): calculating the number miss of training samples whose wind power falls outside the prediction intervals given by the solution of step (2):

    [00024] $$\mathrm{miss}\leftarrow\sum_{t\in\mathcal{T}}\big[1-\mathbb{I}\big(f(x_t,\bar{\omega}_l)\le y_t\le f(x_t,\bar{\omega}_u)\big)\big]$$

    [0067] step (4): if $\mathrm{miss}\le\beta|\mathcal{T}|$, updating the upper boundary $v_u\leftarrow(v_l+v_u)/2$ of the bisection search interval and recording the output-layer weight vectors $\bar{\omega}_l$ and $\bar{\omega}_u$ obtained in step (2) as $\omega_l\leftarrow\bar{\omega}_l$ and $\omega_u\leftarrow\bar{\omega}_u$; otherwise, updating the lower boundary $v_l\leftarrow(v_l+v_u)/2$ of the bisection search interval;

    [0068] step (5): if $v_u-v_l\le\epsilon_1$, outputting the recorded output-layer weight vectors $\omega_l$ and $\omega_u$ of the extreme learning machine; otherwise, returning to step (2).
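    The outer bisection of steps (1)-(5) can be sketched as follows. This is a minimal sketch, not the applicant's code. `solve_dc`, which stands in for the convex-concave procedure of steps (2.1)-(2.5), and `count_misses` are hypothetical callables. The loop tests the stopping rule of step (5) before each solve, which is a slight reordering. It returns `None` for the weights if no midpoint is ever found feasible, a corner case the text leaves implicit.

```python
def bisection_train(solve_dc, count_misses, n_train, beta, v_lo, v_up, eps1):
    """Bisection over the width budget v for the shortest feasible overall width.

    solve_dc(v) -> (omega_l, omega_u): hypothetical wrapper around the CCP solver.
    count_misses(omega_l, omega_u) -> number of training samples outside the intervals.
    [v_lo, v_up] is assumed to bracket v*, as step (1) requires.
    """
    best = None
    while v_up - v_lo > eps1:
        v_mid = 0.5 * (v_lo + v_up)
        omega_l, omega_u = solve_dc(v_mid)
        if count_misses(omega_l, omega_u) <= beta * n_train:
            # Chance constraint met: the budget is sufficient, shrink from above.
            v_up, best = v_mid, (omega_l, omega_u)
        else:
            # Too many misses: the budget is too tight, raise the lower end.
            v_lo = v_mid
    return v_up, best
```

    The exact $\rho(v)$ is non-increasing in $v$, because a larger budget only enlarges the feasible set, and this monotonicity is what justifies bisection. With the difference of convex functions approximation and a locally convergent convex-concave procedure, the monotonicity holds only approximately.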

    [0069] (5) The trained extreme learning machine is used to construct the prediction intervals $\{[f(x_t,\omega_l),\ f(x_t,\omega_u)]\}_{t\in\mathcal{V}}$ for the test set $\{(x_t,y_t)\}_{t\in\mathcal{V}}$. The average coverage deviation (ACD) is used to evaluate the reliability of the prediction intervals and is defined as the difference between the empirical coverage probability (ECP) and the nominal confidence level $100(1-\beta)\%$:

    [00025] $$\mathrm{ACD}:=\mathrm{ECP}-(1-\beta)=\frac{1}{|\mathcal{V}|}\sum_{t\in\mathcal{V}}\mathbb{I}\big(f(x_t,\omega_l)\le y_t\le f(x_t,\omega_u)\big)-(1-\beta)$$

    [0070] where $|\mathcal{V}|$ is the number of samples in the test set. The smaller the absolute value of the average coverage deviation, the more reliable the prediction intervals.

    [0071] The average width (AW) of the prediction intervals is used to evaluate their sharpness and is defined as

    [00026] $$\mathrm{AW}:=\frac{1}{|\mathcal{V}|}\sum_{t\in\mathcal{V}}\big(f(x_t,\omega_u)-f(x_t,\omega_l)\big)$$
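    The two indices can be computed directly from the test-set intervals, as in this minimal sketch of [00025] and [00026]. The function name is hypothetical, and the bounds are assumed to be given as plain lists. As in the model, a sample on a boundary counts as covered.

```python
def acd_aw(lower, upper, y, beta):
    """Return (ECP, ACD, AW) for test-set intervals [lower_t, upper_t]."""
    n = len(y)
    covered = sum(1 for l, u, yt in zip(lower, upper, y) if l <= yt <= u)
    ecp = covered / n                                   # empirical coverage probability
    acd = ecp - (1.0 - beta)                            # deviation from nominal level
    aw = sum(u - l for l, u in zip(lower, upper)) / n   # average interval width
    return ecp, acd, aw
```

    For instance, three covered samples out of four at a 95% nominal level give ECP = 0.75 and ACD = -0.20.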

    [0072] Provided that the prediction intervals are reliable, the smaller their average width, the sharper the prediction intervals.

    [0073] The above process is shown in FIG. 1.

    [0074] The effectiveness of the proposed method is verified with actual wind power data from the Glens of Foudland Wind Farm in Scotland in 2017, at a time resolution of 30 minutes. To account for seasonal differences, a wind power prediction model is trained and verified independently for each season. In each season, 60% of the data set is used for training and the remaining 40% is used as the test set. The prediction lead time is 1 hour, and the nominal confidence level of the prediction intervals is 95%.

    [0075] Table 1 shows the performance indices of the prediction intervals obtained by sparse Bayesian learning, Gaussian kernel density estimation and the method of the present application. The absolute average coverage deviation of the method of the present application is below 1.4% in every season, so its empirical coverage probability stays close to the nominal confidence level of 95% and its reliability is excellent. The average coverage deviation of sparse Bayesian learning falls below −2.6% on the winter, summer and autumn data sets, so that method cannot guarantee reliable predictions. Gaussian kernel density estimation is reliable on all data sets except winter. However, its prediction intervals in winter, spring, summer and autumn are respectively 16.5%, 28.4%, 34.3% and 16.3% wider than those of the method of the present application. In summary, the method of the present application effectively shortens the interval width while keeping the prediction intervals satisfactorily reliable.

    TABLE 1 Performance comparison of prediction intervals obtained by different forecasting methods

    | Season | Method | Empirical coverage probability | Average coverage deviation | Average interval width (normalized) |
    |---|---|---|---|---|
    | Winter | Sparse Bayesian learning | 90.92% | −4.08% | 0.2971 |
    | Winter | Gaussian kernel density estimation | 92.05% | −2.95% | 0.3808 |
    | Winter | The method of the present application | 93.66% | −1.34% | 0.3268 |
    | Spring | Sparse Bayesian learning | 93.58% | −1.42% | 0.2714 |
    | Spring | Gaussian kernel density estimation | 93.39% | −1.61% | 0.3110 |
    | Spring | The method of the present application | 94.06% | −0.94% | 0.2423 |
    | Summer | Sparse Bayesian learning | 91.53% | −3.47% | 0.2128 |
    | Summer | Gaussian kernel density estimation | 95.76% | 0.76% | 0.2621 |
    | Summer | The method of the present application | 94.42% | −0.58% | 0.1952 |
    | Autumn | Sparse Bayesian learning | 92.38% | −2.62% | 0.3572 |
    | Autumn | Gaussian kernel density estimation | 96.37% | 1.37% | 0.4294 |
    | Autumn | The method of the present application | 95.01% | 0.01% | 0.3693 |
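    The width percentages and the coverage bound quoted in [0075] follow from Table 1. The extra width is $(\mathrm{AW}_{\mathrm{KDE}}/\mathrm{AW}_{\mathrm{proposed}}-1)\times 100$, and ACD is ECP minus 95%. The short check below uses only the table values; the variable names are illustrative.

```python
# Average interval widths from Table 1: (Gaussian KDE, method of the present application).
AW = {
    "Winter": (0.3808, 0.3268),
    "Spring": (0.3110, 0.2423),
    "Summer": (0.2621, 0.1952),
    "Autumn": (0.4294, 0.3693),
}
# Empirical coverage probability (%) of the method of the present application.
ECP_OURS = {"Winter": 93.66, "Spring": 94.06, "Summer": 94.42, "Autumn": 95.01}


def extra_width_percent(kde, ours):
    """How much wider (in %) the KDE intervals are than the proposed ones."""
    return round((kde / ours - 1.0) * 100.0, 1)


extra = {s: extra_width_percent(*w) for s, w in AW.items()}
acd = {s: round(e - 95.0, 2) for s, e in ECP_OURS.items()}
print(extra)  # {'Winter': 16.5, 'Spring': 28.4, 'Summer': 34.3, 'Autumn': 16.3}
```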

    [0076] FIG. 2 shows the prediction intervals obtained by the method on the summer data set together with the corresponding actual wind power. The prediction intervals obtained by the proposed method track the ramp events of wind power well, and their width adapts to the input features. It should be noted that the method is also applicable to interval forecasting of load and of generation from renewable energy sources other than wind power, and thus has wide applicability.

    [0077] The above description of the specific embodiments of the present application is not intended to limit the scope of protection of the present application. All equivalent models or equivalent algorithm flowcharts made according to the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the scope of patent protection of the present application.