METHOD FOR MULTI-TASK-BASED PREDICTING MASSIVE-USER LOADS BASED ON MULTI-CHANNEL CONVOLUTIONAL NEURAL NETWORK
20230095676 · 2023-03-30
Inventors
- Haixiang ZANG (Nanjing, CN)
- Ruiqi XU (Nanjing, CN)
- Fengchun ZHANG (Nanjing, CN)
- Xin JIANG (Nanjing, CN)
- Zhinong WEI (Nanjing, CN)
- Guoqiang SUN (Nanjing, CN)
Abstract
Provided is a method for multi-task-based predicting massive-user loads based on a multi-channel convolutional neural network, which belongs to the technical field of electric power systems. The method includes clustering all residential users into a plurality of clusters with different daily average electricity consumption modes by adopting an agglomerative hierarchical clustering method. Corresponding input data sets are constructed for the clusters by adopting a multi-channel-based multi-source input fusion method. Then, a multi-task-based load prediction model based on a convolutional neural network is established for each of the clusters. Load prediction values for the different users in the corresponding cluster are output in parallel by each model to eventually obtain load prediction results of all of the residential users. In the present disclosure, the load predictions for all of the residential users are completed, the average prediction accuracy is improved, and the number of modeling times and the cumulative operation time are greatly reduced.
Claims
1-6. (canceled)
7. A method for multi-task-based predicting massive-user loads based on a multi-channel convolutional neural network, wherein the method comprises following steps: (1) clustering, by adopting an agglomerative hierarchical clustering method, all residential users into a plurality of clusters with different daily average electricity consumption modes; wherein, the agglomerative hierarchical clustering method is as follows: 2.1 constructing a matrix F including clustering features of N samples:
F=[f.sub.1,f.sub.2, . . . , f.sub.N].sup.T, where f.sub.j is a clustering feature of a j-th sample, j represents a serial number of the j-th sample, T represents transposition, and j=1,2, . . . ,N; 2.2 calculating, by taking each sample as a cluster, proximities between each two clusters to obtain an initial proximity matrix P, wherein a calculation formula of an element p.sub.k,g in a k-th row and a g-th column is:
P={p.sub.k,g}k=1, . . . N, g=1, . . . N, and
p.sub.k,g=dis (f.sub.k,f.sub.g)k≠g, where dis (⋅) represents a calculation rule for a proximity of two clusters; both k and g represent serial numbers of the two clusters, and f.sub.k and f.sub.g are clustering features of k-th and g-th clusters, respectively; 2.3 merging two clusters with a highest proximity as a new cluster, and updating the proximity matrix P; and 2.4 repeating Step 2.3 until a total number of the clusters is 1 or a stopping condition is reached; (2) constructing, by adopting a multi-channel-based multi-source input fusion method, corresponding input data sets for the clusters; wherein the multi-channel-based multi-source input fusion method includes: 3.1 reconstructing a single-user time sequence input, reconstructing a historical load sequence of a residential user over a week from a day 8 days before a time to be predicted to a previous day of the time to be predicted into a two-dimensional feature map with 7 rows and 24 columns, wherein each row corresponds to daily loads on different dates and each column corresponds to loads of different hours on different dates; and 3.2 transmitting two-dimensional feature maps corresponding to different users in a same cluster to different channels in an input of the convolutional neural network, wherein data in a single channel is a two-dimensional feature map of one user, fusing, by utilizing the channel dimension, the feature maps of the different users in the same cluster to obtain a fused feature map, and taking the fused feature map as an input of a feature sharing layer in the convolutional neural network; and (3) establishing a multi-task-based load prediction model based on the convolutional neural network for each of the clusters, outputting, by each model, in parallel load prediction values for different users in a corresponding cluster to eventually obtain load prediction results of all of the residential users; wherein the multi-task-based load prediction model based on the convolutional 
neural network is: 4.1 taking load predictions of different residential users in a same cluster as different tasks, and implementing a multi-task-based learning strategy for each cluster in assistance with learning correlations and differences among loads of the different residential users; 4.2 taking the convolutional neural network as the feature sharing layer for a multi-task-based learning to extract the correlations among different tasks; and 4.3 taking the convolutional neural network as the feature sharing layer to learn shared information representations among the different users, wherein a model for multi-task-based predicting the massive-user loads based on the multi-channel convolutional neural network mainly includes the feature sharing layer and specific task layers, a bottom portion of the feature sharing layer is formed by alternately connecting two convolutional layers with two pooling layers, and then inputting a flattened result to a fully connected layer in a top portion to extract a shared feature, and transmitting the shared feature to each of the specific task layers, wherein the specific task layers are configured to extract unique features of each of the users, and are specifically composed of a feature extraction enhancement channel, a Concatenate layer and the fully connected layer, the feature extraction enhancement channel is composed of a single fully connected layer and configured to extract features from a historical load time sequence of each of the users, to input the extracted features and the shared feature into the Concatenate layer for fusion, and the load prediction values of all of the users in the same cluster are output in parallel after processing by the fully connected layer.
8. The method of claim 7, wherein, in Step 4.1, a calculation process of a loss function in the multi-task-based learning strategy is specifically as follows: assuming that multi-task-based learning includes V tasks in total, input and output data sets corresponding to each task being {x.sub.v, y.sub.v}, v=1, 2,...V, and then all of the input data sets being:
X={x.sub.1, . . . ,x.sub.v, . . . x.sub.V}, an output of a prediction model corresponding to the v-th task being defined as:
y.sup.v=u.sup.v(X;θ.sup.sha,θ.sup.v), where u.sup.v represents a mapping function of the prediction model corresponding to the v-th task, θ.sup.sha is a parameter for the feature sharing layer, and θ.sup.v is a parameter for a v-th specific task layer, v=1,2, . . . V; and conducting, by a plurality of tasks in a hard sharing mechanism, a joint learning on related tasks, and training network parameters by minimizing an overall loss function, wherein a calculation of an overall optimization loss function is as follows:
$$\min_{\theta^{sha},\theta^{1},\ldots,\theta^{V}}\ \sum_{v=1}^{V}\alpha_{v}\,\mathrm{loss}\left(y^{v},y_{v}\right),$$
where loss(⋅) represents a loss function for the tasks; and α.sub.v is a weight coefficient corresponding to each of the tasks.
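9. The method of claim 8, wherein, in Step 4.2, a calculation process of the convolutional neural network is specifically as follows: 4.2.1 conducting calculations in the convolutional layers, assuming that a number of convolution kernels in an a-th convolutional layer being C.sup.a, then a set MAP.sup.a of output feature maps in the layer being:
$$\mathrm{MAP}^{a}=\{\mathrm{map}_{1}^{a},\ldots,\mathrm{map}_{C^{a}}^{a}\},\qquad \mathrm{map}_{e}^{a}=f_{con}\Big(\sum_{r=1}^{C^{a-1}}\mathrm{map}_{r}^{a-1}*w_{re}^{a}+b_{e}^{a}\Big),\quad e=1,2,\ldots,C^{a},$$
where map.sub.e.sup.a represents an output feature map corresponding to an e-th convolution kernel in the a-th convolutional layer, map.sub.r.sup.a−1 represents an r-th output feature map in an (a−1)-th layer, C.sup.a−1 is a number of the output feature maps in the (a−1)-th layer, w.sub.re.sup.a is a kernel parameter for the e-th convolution kernel in the a-th convolutional layer corresponding to the r-th output feature map in a previous layer, b.sub.e.sup.a is a bias in the a-th convolutional layer corresponding to the e-th output feature map, * denotes a convolution operation, and f.sub.con(⋅) represents an activation function in the convolutional neural network; and 4.2.2 conducting a calculation in a maximum pooling layer, which is specifically as follows: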
$$F_{down}(\mathrm{map}_{e}^{a})=\max\{\mathrm{pix}_{e,1}^{a},\mathrm{pix}_{e,2}^{a},\ldots,\mathrm{pix}_{e,n_{e}^{a}}^{a}\},$$
$$\mathrm{pool}_{e}^{a+1}=f_{con}\left(\beta_{e}^{a+1}F_{down}(\mathrm{map}_{e}^{a})+b_{e}^{a+1}\right),\quad e=1,2,\ldots,C^{a},$$
where F.sub.down represents a downsampling function in the maximum pooling layer, C.sup.a represents a number of channels, map.sub.e.sup.a represents the output feature map in the a-th convolutional layer corresponding to the e-th convolution kernel, that is, a feature map in an input of an (a+1)-th pooling layer corresponding to an e-th channel, pix.sub.e,z.sup.a is a z-th pixel in the feature map, z=1,2, . . . , n.sub.e.sup.a, n.sub.e.sup.a is a total number of pixels corresponding to the feature map, pool.sub.e.sup.a+1 represents an output feature map in the (a+1)-th pooling layer corresponding to the e-th channel, β.sub.e.sup.a+1 and b.sub.e.sup.a+1 are a multiplicative bias and an additive bias in the output feature map, and f.sub.con(⋅) is an activation function in the pooling layer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0033] The present disclosure will be further clarified below in combination with specific embodiments, and it should be understood that these embodiments are only used to illustrate the present disclosure and not to limit the scope of the present disclosure. After reading the present disclosure, modifications of various equivalent forms in the present disclosure by those skilled in the art all fall within the scope defined by the appended claims of the present disclosure.
[0034] The present disclosure provides a method for multi-task-based predicting massive-user loads based on a multi-channel convolutional neural network. As illustrated in
[0038] The specific implementation of the method of the present disclosure for predicting the loads of massive residential users is described in detail below with reference to a specific embodiment. The example uses data from 805 residential users in total, obtained in the smart electricity metering user behavior trial initiated by the Irish Energy Code Commission. Each user has historical load data sampled every half hour from Jul. 14, 2009 to Dec. 31, 2010, and the load values at the full-hour time points are taken to form hourly load data. The residential loads are predicted 24 hours in advance. The test sets consist of the data from the last week of each month, and the remaining data are taken as the training sets.
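The train/test split described above can be sketched as follows. This is only an illustration under a stated assumption: "the last week of each month" is read here as the last 7 calendar days of the month, which the disclosure does not specify, and the helper name is_test_day is hypothetical.

```python
import calendar
from datetime import date, timedelta

def is_test_day(d):
    """True if d is among the last 7 calendar days of its month (assumed reading)."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return d.day > last_day - 7

# Date range of the load data stated in the embodiment.
start, end = date(2009, 7, 14), date(2010, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

test_days = [d for d in days if is_test_day(d)]
train_days = [d for d in days if not is_test_day(d)]
print(len(days), len(train_days), len(test_days))
```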
[0039] In Step (1), all residential users are clustered into a plurality of clusters with different daily average electricity consumption modes by adopting an agglomerative hierarchical clustering method. The agglomerative hierarchical clustering method is as follows.
[0040] 2.1 The m-dimensional daily average load vector f.sub.j is taken as the clustering feature of each of the residential users, and a matrix F including clustering features of N samples is constructed as follows:
$$F=[f_{1},f_{2},\ldots,f_{N}]^{T},\qquad f_{j}=[o_{j}^{1},o_{j}^{2},\ldots,o_{j}^{m}],\qquad o_{j}^{h}=\frac{1}{D}\sum_{d=1}^{D}l_{j}^{d,h},$$
where N is the total number of residential users, that is, the number of samples; f.sub.j is the daily average load vector of the j-th user, that is, the clustering feature of each sample; D is the total number of days of the load data; m is the dimension of the daily average load vector, which is determined by the resolution of the load data and is taken as 24 in the present disclosure; o.sub.j.sup.h represents the value of the h-th dimension of the daily average load vector corresponding to the j-th user; l.sub.j.sup.d,h represents the historical load value of the j-th user at the h-th hour on the d-th day; and T represents the transposition.
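As a minimal sketch of the daily average load vector described above, the following averages a user's loads hour by hour over all days. The function name and the toy data (2 days, 3 hours instead of m=24) are illustrative assumptions, not taken from the disclosure.

```python
def daily_average_load(loads):
    """loads[d][h] is one user's load at hour h on day d; returns the m-dimensional average."""
    D = len(loads)        # total number of days
    m = len(loads[0])     # dimension of the daily load vector (24 in the disclosure)
    return [sum(day[h] for day in loads) / D for h in range(m)]

# Toy example: D = 2 days, m = 3 hours.
user_loads = [[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]]
f_j = daily_average_load(user_loads)
print(f_j)  # [2.0, 3.0, 4.0]
```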
[0041] 2.2 Proximities between each two clusters are calculated by taking each sample (that is, each user) as a cluster to obtain an initial proximity matrix P. The Euclidean distance is taken as the proximity calculation rule in the present disclosure, wherein a calculation formula of an element p.sub.k,g in the k-th row and the g-th column is:
$$P=\{p_{k,g}\}_{k=1,\ldots,N;\;g=1,\ldots,N},\qquad p_{k,g}=dis(f_{k},f_{g})=\lVert f_{k}-f_{g}\rVert_{2},\quad k\neq g,$$
where dis(⋅) represents a calculation rule for the proximity of the two clusters; both k and g represent serial numbers of the clusters, and f.sub.k and f.sub.g are clustering features of the k-th and g-th clusters, respectively.
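The initial proximity matrix with the Euclidean distance as dis(⋅) can be sketched as below. The diagonal entries are set to 0.0 here purely for convenience; the disclosure defines p.sub.k,g only for k≠g, so that choice is an assumption.

```python
import math

def proximity_matrix(F):
    """F is a list of N clustering feature vectors; returns the N x N Euclidean distance matrix."""
    N = len(F)
    P = [[0.0] * N for _ in range(N)]   # diagonal left at 0.0 (assumption)
    for k in range(N):
        for g in range(k + 1, N):
            d = math.dist(F[k], F[g])   # Euclidean distance between f_k and f_g
            P[k][g] = P[g][k] = d       # the matrix is symmetric
    return P

F = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
P = proximity_matrix(F)
```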
[0042] 2.3 The two clusters with the highest proximity, that is, the two clusters closest to each other, are merged into a new cluster, and the proximity matrix P is updated. In the present disclosure, the sum of squared deviations (Ward) method is adopted to calculate the proximities among clusters. The increment ΔESS of the sum of squared deviations caused by merging each pair of clusters C.sub.i and C.sub.j is calculated, and only the pair of clusters corresponding to the smallest increment is merged into a new cluster.
[0043] Taking the sum of squared deviations of the cluster C.sub.i as an example, the calculation formula thereof is as follows:
$$ESS(C_{i},\mu_{i})=\sum_{j=1}^{Q_{i}}\lVert f_{j}-\mu_{i}\rVert_{2}^{2},\quad f_{j}\in C_{i},$$
where μ.sub.i represents the center of the cluster C.sub.i; and Q.sub.i represents the number of users included in the cluster C.sub.i.
[0044] The formula for calculating the increment of the sum of squared deviations caused by merging clusters C.sub.i and C.sub.j is as follows:
ΔESS=ESS(C.sub.i∪C.sub.j,μ.sub.i∪j)-ESS(C.sub.i,μ.sub.i)-ESS(C.sub.j,μ.sub.j),
where μ.sub.i, μ.sub.j and μ.sub.i∪j represent the centers of cluster C.sub.i, cluster C.sub.j and new cluster C.sub.i∪C.sub.j, respectively.
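A small sketch of the ΔESS computation follows, assuming that the cluster center is the mean of the members' clustering features (the usual convention for the Ward method). The function names are illustrative.

```python
def center(cluster):
    """Mean feature vector of a cluster given as a list of feature vectors."""
    Q = len(cluster)
    return [sum(f[h] for f in cluster) / Q for h in range(len(cluster[0]))]

def ess(cluster):
    """Sum of squared deviations of the members from the cluster center."""
    mu = center(cluster)
    return sum((x - c) ** 2 for f in cluster for x, c in zip(f, mu))

def delta_ess(ci, cj):
    """Increment of the sum of squared deviations caused by merging ci and cj."""
    return ess(ci + cj) - ess(ci) - ess(cj)

ci = [[0.0, 0.0], [2.0, 0.0]]
cj = [[10.0, 0.0]]
print(delta_ess(ci, cj))
```

For this toy pair the increment equals (Q.sub.i Q.sub.j/(Q.sub.i+Q.sub.j)) times the squared distance between the two centers, that is (2·1/3)·81=54, which is the well-known closed form of the Ward increment.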
[0045] 2.4 Step 2.3 is repeated until the total number of clusters is 1 or a stopping condition is reached. In the present disclosure, the 805 residential users are clustered into 22 clusters.
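Steps 2.2 to 2.4 can be sketched together as a naive agglomerative loop. The stopping condition used here (a target number of clusters, n_clusters) is an assumption chosen for illustration; the disclosure only states that a stopping condition is used. The loop recomputes all pairwise increments at every step, which is simple but slow for large N.

```python
def ess(cluster):
    """Sum of squared deviations of the feature vectors from their mean."""
    Q = len(cluster)
    mu = [sum(f[h] for f in cluster) / Q for h in range(len(cluster[0]))]
    return sum((x - c) ** 2 for f in cluster for x, c in zip(f, mu))

def ward_clustering(F, n_clusters):
    """Merge clusters with the smallest ESS increment until n_clusters remain."""
    clusters = [[j] for j in range(len(F))]      # each sample starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                fa = [F[j] for j in clusters[a]]
                fb = [F[j] for j in clusters[b]]
                inc = ess(fa + fb) - ess(fa) - ess(fb)
                if best is None or inc < best[0]:
                    best = (inc, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the pair with the smallest increment
        del clusters[b]
    return clusters

F = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
clusters = ward_clustering(F, n_clusters=2)
print(clusters)
```

For data sets of the size used in the embodiment, an optimized library routine such as SciPy's scipy.cluster.hierarchy.linkage with method="ward" would likely be preferable to this quadratic-per-step loop.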
[0046] In Step (2), corresponding input data sets are constructed for various clusters by adopting a multi-channel-based multi-source input fusion method, as illustrated in
[0047] 3.1 A single-user time sequence input is reconstructed. The historical load sequence of a residential user over the week from the day 8 days before the time to be predicted to the day before the time to be predicted is reconstructed into a two-dimensional feature map with 7 rows and 24 columns, wherein each row corresponds to the daily loads on one date and each column corresponds to the loads at a specific hour on different dates.
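The reconstruction of a one-week hourly sequence into a 7×24 feature map amounts to a row-major reshape, sketched below. The function name and the synthetic input are illustrative assumptions.

```python
def to_feature_map(hourly_loads, rows=7, cols=24):
    """Reshape a flat sequence of rows*cols hourly loads into rows x cols (one row per day)."""
    if len(hourly_loads) != rows * cols:
        raise ValueError("expected rows * cols hourly load values")
    return [list(hourly_loads[r * cols:(r + 1) * cols]) for r in range(rows)]

# Synthetic week: the value encodes the hour index, so positions are easy to check.
week = [float(i) for i in range(168)]
fmap = to_feature_map(week)
print(len(fmap), len(fmap[0]))  # 7 24
```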
[0048] 3.2 Two-dimensional feature maps of different users in the same cluster are fused by utilizing the channel dimension. The two-dimensional feature maps corresponding to the different users in the same cluster are transmitted to different channels in the input of the convolutional neural network, wherein the data in a single channel is the two-dimensional feature map of one user. The resulting fused feature map is taken as the input of the feature sharing layer in the convolutional neural network.
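The channel-wise fusion can be sketched as stacking one 7×24 map per user along a leading channel axis. The channels-first layout (users, rows, columns) and the function name are assumptions made for illustration; a framework that expects channels last would need the axes rearranged.

```python
def build_cluster_input(user_sequences, rows=7, cols=24):
    """Stack the feature maps of all users in a cluster; result is indexed [user][row][col]."""
    channels = []
    for seq in user_sequences:
        if len(seq) != rows * cols:
            raise ValueError("each user needs rows * cols hourly load values")
        channels.append([list(seq[r * cols:(r + 1) * cols]) for r in range(rows)])
    return channels

# Three users in one cluster, each with a constant synthetic load for readability.
users = [[float(u)] * 168 for u in range(3)]
fused = build_cluster_input(users)
shape = (len(fused), len(fused[0]), len(fused[0][0]))
print(shape)  # (3, 7, 24)
```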
[0049] In Step (3), a multi-task-based load prediction model based on the convolutional neural network is established for each of the clusters, as illustrated in
[0050] 4.1 Load predictions for the different residential users in the same cluster are taken as different tasks, and a multi-task-based learning strategy is implemented for each cluster to assist in learning the correlations and differences among the loads of the different residential users. The calculation process of the loss function in the multi-task-based learning strategy is specifically as follows.
[0051] It is assumed that multi-task-based learning includes V tasks in total, an input and output data set corresponding to each task is {x.sub.v, y.sub.v}, v=1, 2, . . . V, and then all of the input data sets are:
X={x.sub.1, . . . ,x.sub.v, . . . x.sub.V}
[0052] An output of the prediction model corresponding to the v-th task is defined as:
y.sup.v=u.sup.v(X;θ.sup.sha,θ.sup.v),
where u.sup.v represents a mapping function of the prediction model corresponding to the v-th task, θ.sup.sha is a parameter for the feature sharing layer, and θ.sup.v is a parameter for the v-th specific task layer, v=1,2, . . . V.
[0053] Joint learning is conducted on the related tasks under a hard sharing mechanism, and the network parameters are trained by minimizing an overall loss function, which is calculated as follows:
$$\min_{\theta^{sha},\theta^{1},\ldots,\theta^{V}}\ \sum_{v=1}^{V}\alpha_{v}\,\mathrm{loss}\left(y^{v},y_{v}\right),$$
where loss(⋅) represents a loss function for the tasks; and α.sub.v is a weight coefficient corresponding to each of the tasks.
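The weighted overall loss can be sketched as follows. The per-task loss(⋅) is taken to be the mean squared error here, which is an assumption: the disclosure does not name the task loss, and the weights α.sub.v below are arbitrary example values.

```python
def mse(pred, true):
    """Mean squared error of one task (assumed choice of the per-task loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def overall_loss(preds, trues, alphas):
    """Weighted sum of the per-task losses over all V tasks."""
    return sum(a * mse(p, t) for a, p, t in zip(alphas, preds, trues))

# Two tasks (users) with equal weights.
preds = [[1.0, 2.0], [0.0, 0.0]]
trues = [[1.0, 4.0], [1.0, 1.0]]
alphas = [0.5, 0.5]
total = overall_loss(preds, trues, alphas)
print(total)  # 0.5 * 2.0 + 0.5 * 1.0 = 1.5
```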
[0054] 4.2 The convolutional neural network is taken as a feature sharing layer for a multi-task-based learning to extract correlations among different tasks. A calculation process of the convolutional neural network is specifically as follows.
[0055] 4.2.1 Calculations in the convolutional layers are conducted. It is assumed that the number of convolution kernels in the a-th convolutional layer is C.sup.a, and then a set MAP.sup.a of output feature maps in the layer is:
$$\mathrm{MAP}^{a}=\{\mathrm{map}_{1}^{a},\ldots,\mathrm{map}_{C^{a}}^{a}\},\qquad \mathrm{map}_{e}^{a}=f_{con}\Big(\sum_{r=1}^{C^{a-1}}\mathrm{map}_{r}^{a-1}*w_{re}^{a}+b_{e}^{a}\Big),\quad e=1,2,\ldots,C^{a},$$
where map.sub.e.sup.a represents an output feature map corresponding to the e-th convolution kernel in the a-th convolutional layer, map.sub.r.sup.a−1 represents the r-th output feature map in the (a−1)-th layer, C.sup.a−1 is the number of the output feature maps in the (a−1)-th layer, that is, the number of channels included in the input data of the a-th convolutional layer, w.sub.re.sup.a is a kernel parameter for the e-th convolution kernel in the a-th convolutional layer corresponding to the r-th output feature map in the previous layer, b.sub.e.sup.a is a bias in the a-th convolutional layer corresponding to the e-th output feature map, * denotes the convolution operation, and f.sub.con(⋅) represents an activation function in the convolutional neural network.
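A pure-Python sketch of one convolutional layer is given below: each output feature map sums the contributions of all input channels, adds a bias and applies an activation. Stride 1, no padding and a ReLU activation are assumptions, since the disclosure does not state them. As in most deep learning frameworks, the kernel is applied without flipping (cross-correlation).

```python
def conv_layer(inputs, kernels, biases):
    """inputs: [C_prev][H][W]; kernels: [C_out][C_prev][kh][kw]; biases: [C_out]."""
    H, W = len(inputs[0]), len(inputs[0][0])
    outputs = []
    for e, kernel in enumerate(kernels):
        kh, kw = len(kernel[0]), len(kernel[0][0])
        fmap = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = biases[e]
                for r, channel in enumerate(inputs):   # sum over input channels r
                    for di in range(kh):
                        for dj in range(kw):
                            s += channel[i + di][j + dj] * kernel[r][di][dj]
                row.append(max(0.0, s))                # ReLU activation (assumed)
            fmap.append(row)
        outputs.append(fmap)
    return outputs

ones = [[1.0] * 3 for _ in range(3)]
twos = [[2.0] * 3 for _ in range(3)]
k_sum = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]   # sums both channels
k_zero = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]  # ignores the input
out = conv_layer([ones, twos], [k_sum, k_zero], [-10.0, -1.0])
print(out)
```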
[0056] 4.2.2 A calculation in a maximum pooling layer is conducted, which is specifically as follows:
$$F_{down}(\mathrm{map}_{e}^{a})=\max\{\mathrm{pix}_{e,1}^{a},\mathrm{pix}_{e,2}^{a},\ldots,\mathrm{pix}_{e,n_{e}^{a}}^{a}\},$$
$$\mathrm{pool}_{e}^{a+1}=f_{con}\left(\beta_{e}^{a+1}F_{down}(\mathrm{map}_{e}^{a})+b_{e}^{a+1}\right),\quad e=1,2,\ldots,C^{a},$$
where F.sub.down represents a downsampling function in the maximum pooling layer, C.sup.a represents the number of channels, map.sub.e.sup.a represents the output feature map in the a-th convolutional layer corresponding to the e-th convolution kernel, that is, a feature map in an input of the (a+1)-th pooling layer corresponding to the e-th channel, pix.sub.e,z.sup.a is the z-th pixel in the feature map, z=1,2, . . . , n.sub.e.sup.a, n.sub.e.sup.a is the total number of pixels corresponding to the feature map, pool.sub.e.sup.a+1 represents an output feature map in the (a+1)-th pooling layer corresponding to the e-th channel, β.sub.e.sup.a+1 and b.sub.e.sup.a+1 are a multiplicative bias and an additive bias in the output feature map, and f.sub.con(⋅) is an activation function in the pooling layer.
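The maximum pooling step can be sketched as taking the maximum over each pooling window, then applying the multiplicative bias β, the additive bias b and the activation. The 2×2 window with stride 2, the default β=1 and b=0, and the identity activation are all assumptions for illustration.

```python
def max_pool(fmap, size=2, beta=1.0, bias=0.0, activation=lambda x: x):
    """Non-overlapping max pooling of one feature map, followed by beta * max + bias."""
    H, W = len(fmap), len(fmap[0])
    out = []
    for i in range(0, H - size + 1, size):
        row = []
        for j in range(0, W - size + 1, size):
            window = [fmap[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(activation(beta * max(window) + bias))
        out.append(row)
    return out

fmap = [[1.0, 2.0, 5.0, 6.0],
        [3.0, 4.0, 7.0, 8.0],
        [0.0, 1.0, 1.0, 0.0],
        [2.0, 9.0, 3.0, 4.0]]
pooled = max_pool(fmap)
print(pooled)  # [[4.0, 8.0], [9.0, 4.0]]
```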
[0057] 4.3 The convolutional neural network is taken as the feature sharing layer to learn shared information representations among the different users. The model for multi-task-based predicting the massive-user loads based on the multi-channel convolutional neural network mainly includes the feature sharing layer and specific task layers. The bottom portion of the feature sharing layer is formed by alternately connecting two convolutional layers with two pooling layers; the flattened result is then input into a fully connected layer in the top portion to extract shared features, which are transmitted to each of the specific task layers. The specific task layers are configured to extract the unique features of each of the users, and each of them is formed by a feature extraction enhancement channel, a Concatenate layer and a fully connected layer. The feature extraction enhancement channel is formed by a single fully connected layer, which extracts features from the historical load time sequence of the corresponding user; the extracted features and the shared features are input into the Concatenate layer for fusion. The load prediction values of all of the users in the same cluster are output in parallel after processing by the fully connected layer. The prediction results of the method provided in the present disclosure are shown in Table 1.
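The data flow through one specific task layer can be sketched as below: the user's own history passes through the feature extraction enhancement channel (a single fully connected layer), is concatenated with the shared feature, and a final fully connected layer produces the prediction. All dimensions, weights and the linear activations are toy assumptions; the disclosure does not give layer sizes, and a real model would be built and trained in a deep learning framework.

```python
def dense(x, weights, bias):
    """Fully connected layer with weights[o][i] and bias[o]; linear activation for brevity."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def task_head(shared, history, enh_w, enh_b, out_w, out_b):
    """One specific task layer: enhancement channel -> Concatenate -> fully connected output."""
    enhanced = dense(history, enh_w, enh_b)   # feature extraction enhancement channel
    fused = shared + enhanced                 # Concatenate layer (list concatenation)
    return dense(fused, out_w, out_b)         # load prediction for this user

shared = [1.0, 2.0]                           # shared feature from the feature sharing layer
histories = [[0.5, 0.5, 1.0], [1.0, 1.0, 1.0]]  # one (toy) history per user in the cluster
enh_w, enh_b = [[1.0, 1.0, 1.0]], [0.0]
out_w, out_b = [[1.0, 0.0, 1.0]], [0.5]

# Every user receives the same shared feature but has its own head; the same
# toy weights are reused for both heads here only to keep the example short.
predictions = [task_head(shared, h, enh_w, enh_b, out_w, out_b) for h in histories]
print(predictions)  # [[3.5], [4.5]]
```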
[0058] Since the data set contains true load values of 0 for the residential users, for which percentage-based error indicators are undefined, two error indicators, RMSE and MAE, are selected to measure the average prediction accuracy of the method for the massive users. The values of the two error indicators are averaged over all residential users in the present disclosure, and the calculation formulas are as follows:
$$RMSE_{mean}=\frac{1}{U}\sum_{i=1}^{U}\sqrt{\frac{1}{N_{i}}\sum_{j=1}^{N_{i}}\left(\hat{y}_{j}^{i}-y_{j}^{i}\right)^{2}},\qquad MAE_{mean}=\frac{1}{U}\sum_{i=1}^{U}\frac{1}{N_{i}}\sum_{j=1}^{N_{i}}\left|\hat{y}_{j}^{i}-y_{j}^{i}\right|,$$
where U is the total number of the residential users; N.sub.i is the number of samples of the i-th user; ŷ.sub.j.sup.i is the load prediction value for the j-th sample of the i-th user; and y.sub.j.sup.i is the true load value for the j-th sample of the i-th user.
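The user-averaged error indicators can be sketched as follows: RMSE and MAE are first computed per user and then averaged over all U users. The function name and the toy data are illustrative.

```python
import math

def rmse_mae_mean(preds, trues):
    """preds[i][j] and trues[i][j]: prediction and true load for sample j of user i."""
    U = len(trues)
    rmse_sum = mae_sum = 0.0
    for y_hat, y in zip(preds, trues):
        n = len(y)                                    # number of samples of this user
        rmse_sum += math.sqrt(sum((p - t) ** 2 for p, t in zip(y_hat, y)) / n)
        mae_sum += sum(abs(p - t) for p, t in zip(y_hat, y)) / n
    return rmse_sum / U, mae_sum / U

# Two users with different numbers of samples; true values of 0 pose no problem.
preds = [[1.0, 3.0], [0.0, 0.0, 0.0]]
trues = [[1.0, 1.0], [0.0, 0.0, 3.0]]
rmse_mean, mae_mean = rmse_mae_mean(preds, trues)
print(rmse_mean, mae_mean)
```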
[0059] In addition, three other multi-task-based prediction methods are selected as benchmark methods in the present disclosure. In Method One, the multi-channel-based multi-source input fusion method in the provided method is replaced with the traditional multi-task-based learning input construction method. In Method Two, the feature extraction enhancement channels at the output terminals in the provided method are removed. In Method Three, the multi-channel-based multi-source input fusion method in the provided method is replaced with the traditional multi-task-based learning input construction method, and the feature extraction enhancement channels at the output terminals are also removed. In this way, the effectiveness of the multi-channel-based multi-source input fusion method and of the feature extraction enhancement channel in improving the average prediction accuracy is verified separately. Furthermore, three single-task prediction methods based on DNN, CNN, and LSTM, namely Method Four, Method Five, and Method Six, respectively, are selected as benchmark methods to highlight the advantages of the multi-task-based learning in terms of average prediction accuracy and overall operation efficiency. The load prediction results of the six benchmark prediction methods are shown in Table 1.
TABLE 1 Comparison of the prediction results between the provided method and six benchmark prediction methods

| Method type | Prediction method | RMSEmean (kWh) | MAEmean (kWh) | Cumulative training time (s) | Cumulative testing time (s) | Total duration (s) |
|---|---|---|---|---|---|---|
| Multi-task | The provided method | 0.1787 | 0.1327 | 2388 | 38 | 2426 |
| Multi-task | Method One | 0.1880 | 0.1410 | 2104 | 40 | 2144 |
| Multi-task | Method Two | 0.1848 | 0.1378 | 1306 | 24 | 1330 |
| Multi-task | Method Three | 0.2112 | 0.1670 | 2147 | 39 | 2186 |
| Single-task | Method Four | 0.1833 | 0.1368 | 3496 | 67 | 3563 |
| Single-task | Method Five | 0.1882 | 0.1411 | 6260 | 113 | 6373 |
| Single-task | Method Six | 0.1918 | 0.1463 | 402493 | 12002 | 414495 |
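The figures of Table 1 can be put in relative terms with a short calculation. The numbers below are copied from Table 1; the dictionary keys and the helper name compare_to_provided are illustrative, and the DNN/CNN/LSTM labels follow the description of Methods Four to Six.

```python
# (RMSEmean in kWh, MAEmean in kWh, total duration in s), copied from Table 1.
table1 = {
    "Provided method": (0.1787, 0.1327, 2426),
    "Method One": (0.1880, 0.1410, 2144),
    "Method Two": (0.1848, 0.1378, 1330),
    "Method Three": (0.2112, 0.1670, 2186),
    "Method Four (DNN)": (0.1833, 0.1368, 3563),
    "Method Five (CNN)": (0.1882, 0.1411, 6373),
    "Method Six (LSTM)": (0.1918, 0.1463, 414495),
}

def compare_to_provided(name):
    """Relative error reduction and run-time ratio of the provided method versus a benchmark."""
    rmse_p, mae_p, t_p = table1["Provided method"]
    rmse, mae, t = table1[name]
    return {
        "rmse_reduction_pct": 100.0 * (rmse - rmse_p) / rmse,
        "mae_reduction_pct": 100.0 * (mae - mae_p) / mae,
        "time_ratio": t / t_p,   # > 1 means the benchmark takes longer in total
    }

for name in table1:
    if name != "Provided method":
        r = compare_to_provided(name)
        print(f"{name}: RMSE -{r['rmse_reduction_pct']:.2f}%, "
              f"MAE -{r['mae_reduction_pct']:.2f}%, time x{r['time_ratio']:.2f}")
```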
[0060] It can be seen from Table 1 that the cumulative time spent by the four multi-task-based load prediction methods to complete the load prediction tasks of all users is significantly less than that of the three single-task benchmark prediction methods, which reflects the significant advantage of the multi-task-based learning strategy in improving the overall operation efficiency. The average values of the error indicators of benchmark Method Three are larger than those of the single-task prediction method based on DNN (Method Four), which indicates that a multi-task-based learning method using the traditional multi-task-based learning input construction method and output structure cannot improve the average prediction accuracy in scenarios with massive users. If the input is improved by the multi-channel-based multi-source input fusion method (as in Method Two) or the output is improved by adding the feature extraction enhancement channel (as in Method One), the average prediction accuracy is improved relative to Method Three. If the provided solution is adopted, that is, the input and output of traditional multi-task-based learning are improved at the same time, the prediction accuracy is the highest among all compared methods (RMSEmean of 0.1787 kWh and MAEmean of 0.1327 kWh), the cumulative operation time is close to that of Method Three (2426 s versus 2186 s), and the total duration remains moderate among the compared methods. The comparison chart of the prediction curves of the solutions provided in the present disclosure and the single-task prediction method based on DNN is as illustrated in
[0061] To sum up, the solutions provided in the present disclosure can be applied to scenarios of predicting massive-user loads to handle user-level load prediction tasks on a large scale. Compared with the single-task load prediction methods, the solutions significantly reduce the time required. Compared with the load prediction methods based on the traditional multi-task-based learning structure, the solutions improve the average prediction accuracy. A balance between prediction accuracy and operation efficiency is thereby realized, giving the solutions stronger engineering application value and potential. The solutions can provide electric power enterprises with a reference basis for offering personalized value-added services in electricity sales, play an important guiding role in improving the marketing levels of electric power enterprises and avoiding assessment deviations, provide an effective reference for the formulation of demand response plans, and facilitate the economic operation of the electric power grid.