LSTM-based hot-rolling roll-bending force predicting method
12423557 · 2025-09-23
Inventors
- Xu Li (Shenyang, CN)
- Feng Luan (Shenyang, CN)
- Lin Wang (Shenyang, CN)
- Yan Wu (Shenyang, CN)
- Yuejiao Han (Shenyang, CN)
- Dianhua Zhang (Shenyang, CN)
CPC classification
Y02P90/30
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
B21B2269/02
PERFORMING OPERATIONS; TRANSPORTING
G06F17/16
PHYSICS
G06N3/0442
PHYSICS
International classification
G06F17/16
PHYSICS
Abstract
Provided is an LSTM-based hot-rolling roll-bending force predicting method including the steps of acquiring final rolling data of a stand of a stainless steel rolling mill when performing a hot rolling process, and dividing the data into a training set traindata and a test set testdata; normalizing the traindata; building a matrix P; using a last row of the matrix P as a label of the training set, namely a true value; calculating and updating an output value and the true value of a network; after network training is completed, taking the last m output data of the LSTM network as an input at a next moment, and then obtaining an output of the network at the next moment, wherein the output is a predicted value of the roll-bending force at the next moment; repeating the steps until a sufficient number of prediction data is obtained; and comparing the processed data with the true value in the testdata to check the validity of the network.
Claims
1. A Long Short Term Memory (LSTM)-based hot-rolling roll-bending force predicting method, comprising the following steps: 1) acquiring final rolling data of a stand of a stainless steel rolling mill when performing a hot rolling process, and collecting roll-bending force data for experiment; 2) dividing the roll-bending force data into two parts, a training set traindata and a test set testdata, according to a specified ratio in time sequence; 3) normalizing the traindata to obtain a normalized vector A; 4) building a matrix P by using the vector A in step 3); 5) taking first m rows of the matrix P in step 4) as an input and sending the first m rows to an LSTM network; 6) using a last row of the matrix P as a label, namely a true value, of the training set, performing calculation on an output value and the true value of the LSTM network by using a formula to obtain an error, and updating a weight and a bias of the LSTM network by a gradient descent method; 7) after the LSTM network training is completed, taking last m output data of the LSTM network as an input at a next moment, and then obtaining the output of the LSTM network at the next moment, wherein the output is a predicted value of a roll-bending force at the next moment; 8) repeating step 7) until a sufficient number of prediction data is obtained; 9) performing an inverse normalization on the obtained predicted value of the roll-bending force, and comparing the inverse normalized predicted value of the roll-bending force with the true value in the testdata to check the validity of the LSTM network; and 10) controlling the roll-bending force of the stainless steel rolling mill based on the comparing in step 9) at a next hot rolling process.
2. The LSTM-based hot-rolling roll-bending force predicting method of claim 1, wherein in step 4), a matrix
3. The LSTM-based hot-rolling roll-bending force predicting method of claim 1, wherein in step 5), the LSTM network adopts a traditional LSTM network, or adopts an ON-LSTM network or a double-layer ON-LSTM network, and the double-layer ON-LSTM network is adopted as: taking the first m rows of the matrix P in step 4) as an input of a first-layer LSTM, and sending obtained output data as an input to a second-layer LSTM, wherein output data of the second-layer LSTM is an output of the whole LSTM network.
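As a minimal illustration of steps 2)-6) of claim 1, the sliding-window data preparation can be sketched as follows in Python. The (m+1)-row window layout, the min-max normalization, and all numeric values are assumptions for illustration; the claim's exact matrix formula for P is not reproduced in this text.

```python
import numpy as np

def normalize(x):
    """Min-max normalization of the training data (step 3); the exact
    normalization scheme used by the method is an assumption here."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), (lo, hi)

def build_training_matrix(series, m):
    """Build a matrix P whose columns are sliding windows of length m+1:
    the first m rows are the network input and the last row the label."""
    n = len(series)
    cols = [series[j:j + m + 1] for j in range(n - m)]
    return np.stack(cols, axis=1)         # shape (m+1, n-m)

raw = np.linspace(800.0, 900.0, 20)       # hypothetical forces (kN)
train, test = raw[:15], raw[15:]          # step 2): split in time order
A, scale = normalize(train)               # step 3): normalized vector A
P = build_training_matrix(A, m=4)         # step 4): matrix P
X, y = P[:-1], P[-1]                      # steps 5)-6): inputs and labels
```

The label row y then contains the sample that immediately follows each m-sample input window, which is what step 6) compares against the network output.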
Description
BRIEF DESCRIPTION OF DRAWINGS
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
(20) The invention will be further described below with reference to the accompanying drawings.
(21) As shown in the accompanying drawings, the method proceeds according to steps 1)-10) above.
(23) In step 5), the LSTM network can adopt a traditional LSTM network (a hidden-layer structure of which is shown in the accompanying drawings), or adopt an ON-LSTM network or a double-layer ON-LSTM network.
(24) When the ON-LSTM network is adopted, steps 1)-9) of the method adopting the ON-LSTM network are the same as those of the method adopting the traditional LSTM network.
(25) When the double-layer ON-LSTM network is adopted, steps 1)-4) and steps 6)-9) of the method adopting the double-layer ON-LSTM network are the same as those of the method adopting the traditional LSTM network, but step 5) is implemented in the following way: the first m rows of the matrix P in step 4) are used as an input of a first-layer LSTM, and the obtained output data are sent as an input to the second-layer LSTM. The output data of the second-layer LSTM are an output of the whole LSTM network.
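The layer stacking described above can be sketched as follows; plain LSTM cells stand in for the ON-LSTM cells of the two layers, and all weight shapes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, p):
    """One step of a plain LSTM cell (gate names follow the description)."""
    f = sigmoid(p["Wfx"] @ x + p["Wfh"] @ h + p["bf"])    # forget gate
    i = sigmoid(p["Wix"] @ x + p["Wih"] @ h + p["bi"])    # input gate
    o = sigmoid(p["Wox"] @ x + p["Woh"] @ h + p["bo"])    # output gate
    c_tilde = np.tanh(p["Wcx"] @ x + p["Wch"] @ h + p["bc"])
    c_new = f * c + i * c_tilde
    return o * np.tanh(c_new), c_new

def init_params(nx, nh):
    # Hypothetical small random weights.
    p = {}
    for g in "fioc":
        p[f"W{g}x"] = rng.standard_normal((nh, nx)) * 0.1
        p[f"W{g}h"] = rng.standard_normal((nh, nh)) * 0.1
        p[f"b{g}"] = np.zeros(nh)
    return p

nx, nh = 1, 8
p1, p2 = init_params(nx, nh), init_params(nh, nh)
h1 = c1 = h2 = c2 = np.zeros(nh)
window = rng.standard_normal((5, nx))     # hypothetical window of m=5 inputs
for x in window:
    h1, c1 = lstm_step(x, h1, c1, p1)     # first-layer LSTM
    h2, c2 = lstm_step(h1, h2, c2, p2)    # layer 1 output feeds layer 2
output = h2                               # output of the whole network
```

The only structural change relative to the single-layer method is the inner pair of calls: the hidden state of layer 1 becomes the input of layer 2 at the same time step.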
(26) In step 5), the first-layer LSTM and the second-layer LSTM are ordered neuron LSTMs that introduce an update mechanism, namely ON-LSTM. The function of the update mechanism is that, when the parameters of the LSTM are updated, the neurons are sorted in a certain order and an importance level, that is, a hierarchical structure, is introduced. Higher-level information represents important information, which needs to be retained in the LSTM network; conversely, lower-level information represents unimportant information. For example, if the roll-bending force data at a certain moment is very different from that at the previous and subsequent time moments, resulting in a big jump, it contributes little to learning the trend of the entire roll-bending force data; it is therefore unimportant information, while the other data are relatively important information.
(27) The unimportant information needs to be updated with new input data. The detailed process and calculation formula are as follows:
(28) Assume that the primary hierarchical position corresponding to the important information is represented by S1, and the secondary hierarchical position corresponding to the unimportant information is represented by S2. Through x_t and h_{t−1}, S1 and S2 are calculated as:
S1 = F_1(x_t, h_{t−1}) = indexmax(softmax(W_f̃ x_t + U_f̃ h_{t−1} + b_f̃))
S2 = F_2(x_t, h_{t−1}) = indexmax(softmax(W_ĩ x_t + U_ĩ h_{t−1} + b_ĩ))
wherein the indexmax function is used to find the position number (counting from 1) corresponding to the largest element in a vector, x_t is the input data, h_{t−1} is the recursive data, W and U are weight matrices, and b is a bias.
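A sketch of the S1/S2 computation, with hypothetical W, U and b values; indexmax and softmax follow the definitions above.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def indexmax(v):
    """Position number (counting from 1) of the largest element of v."""
    return int(np.argmax(v)) + 1

k, nx = 6, 1                       # k: dimension of the cell state
x_t = rng.standard_normal(nx)      # input data at time t
h_prev = rng.standard_normal(k)    # recursive data h_{t-1}
# Hypothetical W, U, b parameters for the two position predictors.
Wf, Uf, bf = rng.standard_normal((k, nx)), rng.standard_normal((k, k)), np.zeros(k)
Wi, Ui, bi = rng.standard_normal((k, nx)), rng.standard_normal((k, k)), np.zeros(k)

S1 = indexmax(softmax(Wf @ x_t + Uf @ h_prev + bf))  # primary (important) position
S2 = indexmax(softmax(Wi @ x_t + Ui @ h_prev + bi))  # secondary (unimportant) position
```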
(29) The network updates c_t based on the hierarchical positions. Considering the relative magnitude between S1 and S2, there are two classes of update models: (1) when S2 ≥ S1, the positions corresponding to the important information and the unimportant information partially overlap, and the current cell state c_t is calculated element by element (p denotes the position index) as:
(30)
c_t(p) = c̃_t(p), for 1 ≤ p < S1
c_t(p) = f_t(p) c_{t−1}(p) + i_t(p) c̃_t(p), for S1 ≤ p ≤ S2
c_t(p) = c_{t−1}(p), for S2 < p ≤ k
(31) (2) When S2 < S1, the positions corresponding to the important information and the unimportant information are independent of each other, and the current cell state c_t is calculated by the following formula:
(32)
c_t(p) = c̃_t(p), for 1 ≤ p ≤ S2
c_t(p) = 0, for S2 < p < S1
c_t(p) = c_{t−1}(p), for S1 ≤ p ≤ k
wherein k is the dimension of c.sub.t, f.sub.t and i.sub.t are forget gate output and input gate output, respectively, and {tilde over (c)}.sub.t is an intermediate unit state.
(33) For the convenience of describing the update process, f̃_t and ĩ_t are defined as a main forget gate and a main input gate, respectively, wherein w_t1, w_t2 and w_t3 represent the high, medium and low levels in the hierarchical structure, respectively (∘ denotes element-wise multiplication):
f̃_t = cumsum(softmax(W_f̃ x_t + U_f̃ h_{t−1} + b_f̃))
ĩ_t = 1 − cumsum(softmax(W_ĩ x_t + U_ĩ h_{t−1} + b_ĩ))
w_t2 = f̃_t ∘ ĩ_t
w_t1 = f̃_t − w_t2
w_t3 = ĩ_t − w_t2
(34) The complete update formula of the ON-LSTM network is:
(35)
f_t = σ(W_fx x_t + W_fh h_{t−1} + b_f)
i_t = σ(W_ix x_t + W_ih h_{t−1} + b_i)
o_t = σ(W_ox x_t + W_oh h_{t−1} + b_o)
c̃_t = tanh(W_cx x_t + W_ch h_{t−1} + b_c)
f̃_t = cumax(W_f̃x x_t + W_f̃h h_{t−1} + b_f̃)
ĩ_t = 1 − cumax(W_ĩx x_t + W_ĩh h_{t−1} + b_ĩ)
w_t2 = f̃_t ∘ ĩ_t
c_t = w_t2 ∘ (f_t ∘ c_{t−1} + i_t ∘ c̃_t) + (f̃_t − w_t2) ∘ c_{t−1} + (ĩ_t − w_t2) ∘ c̃_t
h_t = o_t ∘ tanh(c_t)
wherein the cumax function is an abbreviation for cumsum(softmax( )), and ∘ denotes element-wise multiplication.
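One forward step of an ON-LSTM cell following the main-gate formulation above can be sketched as follows. The parameter names (such as Wmx/Wnx for the main gates) and all weight values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cumax(z):
    """cumax(z) = cumsum(softmax(z)): a monotone gate rising toward 1."""
    e = np.exp(z - z.max())
    return np.cumsum(e / e.sum())

def init(nx, nh, gates):
    # Hypothetical small random weights; one (W_x, W_h, b) triple per gate.
    p = {}
    for g in gates:
        p[f"W{g}x"] = rng.standard_normal((nh, nx)) * 0.1
        p[f"W{g}h"] = rng.standard_normal((nh, nh)) * 0.1
        p[f"b{g}"] = np.zeros(nh)
    return p

def on_lstm_step(x, h, c, p):
    # Ordinary gates, as in a traditional LSTM.
    f = sigmoid(p["Wfx"] @ x + p["Wfh"] @ h + p["bf"])
    i = sigmoid(p["Wix"] @ x + p["Wih"] @ h + p["bi"])
    o = sigmoid(p["Wox"] @ x + p["Woh"] @ h + p["bo"])
    c_tilde = np.tanh(p["Wcx"] @ x + p["Wch"] @ h + p["bc"])
    # Main gates: f_m rises with position (retain high levels),
    # i_m falls with position (overwrite low levels).
    f_m = cumax(p["Wmx"] @ x + p["Wmh"] @ h + p["bm"])
    i_m = 1.0 - cumax(p["Wnx"] @ x + p["Wnh"] @ h + p["bn"])
    w2 = f_m * i_m                       # medium level (overlap)
    c_new = (w2 * (f * c + i * c_tilde)  # overlap: standard LSTM update
             + (f_m - w2) * c            # high level: retain old state
             + (i_m - w2) * c_tilde)     # low level: write new state
    return o * np.tanh(c_new), c_new, f_m, i_m

nx, nh = 1, 8
params = init(nx, nh, ["f", "i", "o", "c", "m", "n"])
h, c = np.zeros(nh), np.zeros(nh)
x = rng.standard_normal(nx)
h, c, f_m, i_m = on_lstm_step(x, h, c, params)
```

Because cumax is a cumulative sum of positive softmax values, f_m is strictly increasing and i_m strictly decreasing over positions, which realizes the hierarchical retain/overwrite behavior described above.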
(36) The update calculation process of the LSTM network weights and biases mentioned in step 6) is as follows:
(37) The calculation formula of the weighted input error terms of the traditional LSTM and the ON-LSTM networks is:
(38)
δ_o,t = δ_t ∘ tanh(c_t) ∘ o_t ∘ (1 − o_t)
δ_f,t = δ_t ∘ o_t ∘ (1 − tanh²(c_t)) ∘ c_{t−1} ∘ f_t ∘ (1 − f_t)
δ_i,t = δ_t ∘ o_t ∘ (1 − tanh²(c_t)) ∘ c̃_t ∘ i_t ∘ (1 − i_t)
δ_c̃,t = δ_t ∘ o_t ∘ (1 − tanh²(c_t)) ∘ i_t ∘ (1 − c̃_t²)
(39) When the double-layer ON-LSTM with a double-layer structure is adopted, the two layers of the network differ slightly in the calculation when the weights and the biases are updated. The calculation formulas of the weighted input error terms of the first-layer LSTM and the second-layer LSTM are, respectively:
(40) The first-layer LSTM:
(41)
(42) The second-layer LSTM:
(43)
(44) Wherein f_t is a forget gate output, i_t is an input gate output, o_t is an output gate output, c_t is a current cell state, c_{t−1} is the cell state at the previous time moment, c̃_t is an intermediate unit state, x_t is input data, h_t is recursive data, h_{t−1} is the recursive data at the previous time moment, f̃_t is a main forget gate output, ĩ_t is a main input gate output, w_t1 is the high level, w_t2 is the medium level, w_t3 is the low level, W_fx, W_ix, W_cx, W_ox, W_f̃x, W_ĩx, W_fh, W_ih, W_ch, W_oh, W_f̃h and W_ĩh are weight matrices, b_f, b_i, b_c, b_o, b_ĩ and b_f̃ are bias vectors, δ_t is the error term at a time moment t, and δ_f,t, δ_i,t, δ_c̃,t and δ_o,t are the error terms corresponding to the four weighted inputs of f_t, i_t, c̃_t and o_t, respectively.
(45) The calculation of a weight gradient:
(46)
∂E/∂W_oh,t = δ_o,t · h_{t−1}^T   ∂E/∂W_ox,t = δ_o,t · x_t^T
∂E/∂W_fh,t = δ_f,t · h_{t−1}^T   ∂E/∂W_fx,t = δ_f,t · x_t^T
∂E/∂W_ih,t = δ_i,t · h_{t−1}^T   ∂E/∂W_ix,t = δ_i,t · x_t^T
∂E/∂W_ch,t = δ_c̃,t · h_{t−1}^T   ∂E/∂W_cx,t = δ_c̃,t · x_t^T
(47) The calculation of a bias gradient:
(48)
∂E/∂b_o,t = δ_o,t   ∂E/∂b_f,t = δ_f,t   ∂E/∂b_i,t = δ_i,t   ∂E/∂b_c,t = δ_c̃,t
(49) Wherein E is an error, h_{t−1}^T is the transpose of h_{t−1}, x_t^T is the transpose of x_t, and W_oh,t is the W_oh at the time moment t. Similarly, W_fh,t, W_ih,t, W_ch,t, W_ox,t, W_fx,t, W_ix,t, W_cx,t, b_o,t, b_f,t, b_i,t and b_c,t are W_fh, W_ih, W_ch, W_ox, W_fx, W_ix, W_cx, b_o, b_f, b_i and b_c at the time moment t.
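The outer-product form of the per-step weight and bias gradients can be sketched as follows, with hypothetical error terms and states; the o gate is shown, and the f, i and c gates follow the same pattern.

```python
import numpy as np

rng = np.random.default_rng(3)
nh, nx = 8, 1

# Hypothetical error terms and states at one time moment t.
delta_o = rng.standard_normal(nh)   # error term of the output gate's weighted input
h_prev = rng.standard_normal(nh)    # recursive data h_{t-1}
x_t = rng.standard_normal(nx)       # input data x_t

# A weight gradient is the outer product of an error term with the
# transposed state that the weight multiplied.
grad_Woh_t = np.outer(delta_o, h_prev)  # dE/dW_oh at t, shape (nh, nh)
grad_Wox_t = np.outer(delta_o, x_t)     # dE/dW_ox at t, shape (nh, nx)
grad_bo_t = delta_o                     # dE/db_o at t: the error term itself

# Gradient-descent update of step 6), with a hypothetical learning rate.
eta = 0.01
W_oh = rng.standard_normal((nh, nh)) * 0.1
W_oh = W_oh - eta * grad_Woh_t
```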
(50) According to the continuity and time-sequence characteristics of the rolling process, the invention selects the LSTM (Long Short-Term Memory) neural network model. The LSTM is a long short-term memory network, a type of time-recurrent neural network (RNN), and is mainly used to solve the problems of vanishing and exploding gradients that arise when training over long sequences. In short, the LSTM performs better on longer sequences than an ordinary RNN.
(51) The invention selects the LSTM network to predict the roll-bending force. Unlike a traditional neural network or other machine learning methods, the invention does not need to acquire in advance a large number of input parameters that affect the roll-bending force, so the preliminary data-preparation work is simplified: only the roll-bending force data acquired up to the current moment are formed into a time series and sent to the network as an input, so as to train the LSTM network and predict the roll-bending force at subsequent time moments.
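The recursive prediction of steps 7)-8) (the last m outputs are fed back as the input window for the next moment) can be sketched as follows; the windowed-mean model is a hypothetical stand-in for the trained network.

```python
import numpy as np

def predict_recursive(step_fn, history, m, n_steps):
    """Feed the last m values to the network, append its output, slide the
    window, and repeat until n_steps predictions are collected."""
    window = list(history[-m:])
    preds = []
    for _ in range(n_steps):
        y = step_fn(np.asarray(window))  # network output: next-moment force
        preds.append(y)
        window = window[1:] + [y]        # the prediction becomes an input
    return preds

# Hypothetical stand-in for the trained LSTM: predicts the window mean.
mean_model = lambda w: float(w.mean())
series = [0.2, 0.4, 0.6, 0.8]            # normalized roll-bending forces
out = predict_recursive(mean_model, series, m=3, n_steps=2)
```

Note that each prediction immediately re-enters the input window, so multi-step forecasts compound any single-step error; this is the behavior the testdata comparison in step 9) is meant to validate.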
(52) The invention adopts three different LSTM network models to predict the roll-bending force. In addition to the traditional LSTM network, two improved LSTM networks are also proposed for experiments. The two improved LSTM networks are: (1) ON-LSTM network: The ON-LSTM network adds an update mechanism on the basis of the traditional LSTM network, and compared with the traditional LSTM, the ON-LSTM can effectively enhance the robustness of the network and improve the accuracy of the network; and (2) Double-layer ON-LSTM: The double-layer ON-LSTM network organically combines the update mechanism with the double-layer structure. Because the double-layer structure has strong function fitting ability, the double-layer ON-LSTM can further improve the prediction accuracy of the network on the basis of the ON-LSTM.
(53) In order to prove the effectiveness of the LSTM (traditional LSTM, ON-LSTM, double-layer ON-LSTM) network models provided by the invention, the roll-bending force data of a 1580 mm hot rolling process of a stainless steel rolling mill are acquired and divided into three different datasets, each containing 500 sample points. The traditional LSTM network, the ON-LSTM network and the double-layer ON-LSTM network are tested on the three datasets, respectively, and an artificial neural network (ANN) is tested on the same datasets to compare the experiment results. The maximum and average errors of the three different network models on the three datasets are shown in the accompanying drawings.
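The two reported metrics, maximum error and average error, can be computed as follows; all numeric values here are hypothetical, not the experimental results.

```python
import numpy as np

def max_and_mean_error(pred, true):
    """Maximum and average absolute prediction error over a test set."""
    err = np.abs(np.asarray(pred) - np.asarray(true))
    return float(err.max()), float(err.mean())

# Hypothetical predicted and true roll-bending forces (kN).
pred = [812.0, 820.5, 831.0]
true = [810.0, 822.0, 830.0]
mx, avg = max_and_mean_error(pred, true)
```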
(54) First 450 sample points are taken from each of the three datasets for network training, and the trained LSTM networks are used to predict the next 50 sample points, namely the roll-bending force data at the next 50 sample points. The results are shown in the accompanying drawings.
(55) With the used dataset, the number of training samples, the number of prediction samples and the like kept unchanged, the LSTM networks used in the invention are replaced with the ANN network for comparative experiments. The results are shown in the accompanying drawings.