CHARACTERIZATION METHOD BASED ON DEEP REINFORCEMENT LEARNING FOR DISCRETE MANUFACTURING INDUSTRY DATA
20240210924 · 2024-06-27
Inventors
- Haigen YANG (Jiangsu, CN)
- Cong WANG (Jiangsu, CN)
- Mei WANG (Anhui, CN)
- Luyang LI (Anhui, CN)
- Donghuang LIN (Jiangsu, CN)
- Jixin LIU (Jiangsu, CN)
- Fanyu ZENG (Jiangsu, CN)
- Yan GE (Jiangsu, CN)
Abstract
Disclosed is a characterization method based on deep reinforcement learning for discrete manufacturing industry data. The method includes: collecting discrete manufacturing industry data, and creating a spatio-temporal database; dividing the discrete manufacturing industry data into a discrete feature and a continuous feature, creating a data coupling coding network, converting a coding vector in the coding network into a characterization vector, and creating a data characterization model; quantitatively characterizing discrimination of a data category by means of cluster evaluation indexes; and using weights of cluster evaluation indexes of different dimensions as dynamic rewards, creating a deep reinforcement learning model, and updating a neural network parameter of deep reinforcement learning through characterization of an interactive relation between a model and a discrete manufacturing decision-making analysis system.
Claims
1. A characterization method based on deep reinforcement learning for discrete manufacturing industry data, comprising the following steps: (1) collecting discrete manufacturing industry data, and creating a spatio-temporal database; (2) dividing the discrete manufacturing industry data into a discrete feature and a continuous feature, creating a data coupling coding network, converting a coding vector in the data coding network into a characterization vector, and creating a data characterization model; (3) quantitatively characterizing a discrimination of a data category by means of cluster evaluation indexes; and (4) using weights of the cluster evaluation indexes of different dimensions as dynamic rewards, creating a deep reinforcement learning model, and updating a neural network parameter of deep reinforcement learning through characterization of an interactive relation between a model and a discrete manufacturing decision-making analysis system.
2. The characterization method according to claim 1, wherein the discrete manufacturing industry data in step (1) comprises real-time workshop device data, advanced planning and scheduling (APS) production scheduling data, product data management (PDM) product data, enterprise resource planning (ERP) purchase-sale-stock data, and manufacturing execution system (MES) production execution data.
3. The characterization method according to claim 1, wherein creating the data coupling coding network in step (2) comprises: creating a correlation matrix $r(a_i^x, v_j)$ between the discrete feature and the continuous feature as follows:
4. The characterization method according to claim 3, wherein converting the coding vector in the data coding network into the characterization vector in step (2) comprises: converting the coding vector into the characterization vector with a fully-connected network as follows:
5. The characterization method according to claim 1, wherein the deep reinforcement learning model in step (4) is a deep Q-network (DQN), and a Q-router is characterized as:
6. The characterization method according to claim 5, wherein the reward information of the deep reinforcement learning in step (4) is a dynamic reward as follows:
7. The characterization method according to claim 6, wherein the cluster evaluation indexes of the different dimensions comprise a Calinski-Harabasz (CH) index, a Davies-Bouldin index (DBI), and/or a silhouette coefficient.
8. The characterization method according to claim 1, wherein the deep reinforcement learning model in step (4) further comprises one of deep deterministic policy gradient (DDPG), Advantage Actor-Critic (A2C)/Asynchronous Advantage Actor-Critic (A3C), proximal policy optimization (PPO)/trust region policy optimization (TRPO), soft actor critic (SAC), and twin delayed deep deterministic policy gradient (TD3).
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0028]
[0029]
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0030] A technical solution of the present disclosure will be further described below with reference to the accompanying drawings.
[0031] As shown in
[0032] (1) Discrete manufacturing industry data is collected, and a spatio-temporal database is created.
[0033] The discrete manufacturing industry data collected includes real-time workshop device data, advanced planning and scheduling (APS) production scheduling data, product data management (PDM) product data, enterprise resource planning (ERP) purchase-sale-stock data, and manufacturing execution system (MES) production execution data.
[0034] (2) The discrete manufacturing industry data is divided into a discrete feature and a continuous feature, a data coupling coding network is created, a coding vector in the data coupling coding network is converted into a characterization vector, and a data characterization model is created. This step specifically includes the following sub-steps:
[0035] (2.1) A correlation matrix $r(a_i^x, v_j)$ between the discrete feature and the continuous feature is created as follows:
[0036] In the matrix, $a_i^x$ denotes the continuous feature and $v_j$ denotes the discrete feature; the remaining terms are a proportional coefficient, a threshold parameter, and the joint probability density of $a_i^x$ and $v_j$. The joint probability density is computed as follows:
[0037] In the formula, $N$ denotes the number of data objects; $L(v_j^k, v_j)$ denotes a kernel function between the discrete feature values $v_j^k$ and $v_j$; a second kernel function is applied to the continuous feature; $a_i^k$ denotes the continuous feature value $a_i$ of a variable $A_i$ on the kth data object; $a_i^x$ denotes the continuous feature value $a_i$ of the variable $A_i$ on the xth data object; and $r_i$ denotes a bandwidth parameter of the continuous feature. An expression of the kernel function $L(v_j^k, v_j)$ is as follows:
[0038] In the formula, $v_j^k$ denotes the feature value of the discrete feature $v_j$ on the kth data object, and the kernel parameter is a proportional coefficient.
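As a rough illustration of the kernel-based joint density described in (2.1), the sketch below estimates the density at one (continuous value, discrete value) pair by averaging a product of two kernels over the N data objects. The formulas themselves are not reproduced in this text, so the kernels are assumptions: a Gaussian kernel with bandwidth `h` for the continuous feature and an Aitchison-Aitken-style kernel with coefficient `lam` for the discrete feature. All function names, parameter names, and sample values are hypothetical.

```python
import math

def continuous_kernel(a_k, a_x, h):
    """Gaussian kernel (assumed) between two continuous values, bandwidth h."""
    u = (a_k - a_x) / h
    return math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))

def discrete_kernel(v_k, v, lam, n_categories):
    """Aitchison-Aitken-style kernel (assumed): 1 - lam on a match,
    lam / (c - 1) otherwise, so the weights over all c categories sum to 1."""
    if v_k == v:
        return 1.0 - lam
    return lam / (n_categories - 1)

def joint_density(a_x, v, cont_values, disc_values, h, lam, n_categories):
    """Average the product of both kernels over all N data objects."""
    total = 0.0
    for a_k, v_k in zip(cont_values, disc_values):
        total += continuous_kernel(a_k, a_x, h) * discrete_kernel(v_k, v, lam, n_categories)
    return total / len(cont_values)

# Hypothetical workshop data: spindle temperature and machine state.
temps = [70.1, 69.8, 85.3, 86.0]
states = ["ok", "ok", "fault", "fault"]
p_ok = joint_density(70.0, "ok", temps, states, h=1.0, lam=0.1, n_categories=2)
p_fault = joint_density(70.0, "fault", temps, states, h=1.0, lam=0.1, n_categories=2)
```

With these sample values, a temperature near 70 is far more likely to co-occur with the "ok" state than with "fault", so `p_ok` comes out larger than `p_fault`. A correlation measure between the two features could then be built from such densities, though its exact form here is not recoverable from the text.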
[0039] (2.2) The correlation matrix is used as a data coupling coding vector as follows:
[0040] In the formula, a coupling coding matrix $M_x$ denotes a heterogeneous coupling relation between the discrete feature and the continuous feature, and the coupling coding matrix $M_x$ is quantitatively converted into a coding vector.
[0041] (2.3) The coding vector is converted into the characterization vector with a fully-connected network as follows:
[0042] In the formula, the activation is a logistic function, and the real-valued weight matrix $W$ includes the interaction strengths between all features.
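Step (2.3) can be sketched as a single fully-connected layer followed by a logistic activation. This is a minimal sketch under stated assumptions, not the disclosed network: the layer count, the output dimension, the random initialization, and the bias term `bias` are all assumptions (the text mentions only a weight matrix W), and every name below is hypothetical.

```python
import math
import random

def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def characterize(coding_vector, weights, bias):
    """One fully-connected layer: logistic(W e + b), written in plain Python."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * e for w, e in zip(row, coding_vector)) + b
        out.append(logistic(z))
    return out

rng = random.Random(0)
coding_vector = [0.2, -1.3, 0.7, 0.05]   # hypothetical flattened coupling code
out_dim = 3                              # assumed characterization dimension
weights = [[rng.uniform(-0.5, 0.5) for _ in coding_vector] for _ in range(out_dim)]
bias = [0.0] * out_dim                   # bias term is an assumption
z_vec = characterize(coding_vector, weights, bias)
```

Each row of `weights` mixes every entry of the coding vector, which is one way to read the statement that W holds the interaction strengths between all features. In the full method these weights would be the neural network parameters updated by the reinforcement learning loop.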
[0043] (3) Cluster evaluation indexes of different dimensions are selected to quantitatively characterize discrimination of a data category according to needs of a specific scene, where the cluster evaluation indexes include a Calinski-Harabasz (CH) index, a Davies-Bouldin index (DBI), and/or a silhouette coefficient.
[0044] The CH index is as follows:
[0045] In the formula, $c_i$ denotes the ith category, the count term is the number of data objects in $c_i$, and $d(x, y)$ denotes the distance between data objects x and y.
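For reference, the widely used textbook form of the CH index is the ratio of between-cluster dispersion to within-cluster dispersion, each normalized by its degrees of freedom. The sketch below implements that standard form with squared Euclidean distance; it is an assumption that this matches the formula intended here, and the function names and sample points are hypothetical.

```python
def mean_point(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def calinski_harabasz(clusters):
    """Standard CH index; needs at least 2 clusters, more points than
    clusters, and nonzero within-cluster dispersion."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    overall = mean_point(all_points)
    between = 0.0
    within = 0.0
    for c in clusters:
        center = mean_point(c)
        between += len(c) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in c)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical characterization vectors grouped two different ways.
tight = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
mixed = [[(0, 0), (10, 11)], [(0, 1), (10, 10)]]
ch_tight = calinski_harabasz(tight)
ch_mixed = calinski_harabasz(mixed)
```

A higher value indicates better-separated, more compact categories, so `ch_tight` exceeds `ch_mixed` for these sample points.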
[0046] The DBI is as follows:
[0047] In the formula,
[0048] The silhouette coefficient is as follows:
[0049] In the formula, i and j denote sample points in different categories, and $a(i)$ denotes the cohesion of the sample point, that is, the similarity between the sample point and the other points in the same cluster, which is computed as follows:
[0050] In the formula, distance denotes the distance between i and j; and $b(i)$ denotes the similarity between the sample point and the points in the nearest neighboring cluster, which is computed in a similar way to $a(i)$.
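The silhouette computation can be sketched as follows, using the standard textbook definition in which a(i) and b(i) are mean distances and the per-point score is (b(i) - a(i)) / max(a(i), b(i)). Whether the formulas intended here match this form exactly is an assumption, as are the Euclidean distance, the singleton-cluster convention, and all names and sample points.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def silhouette_coefficient(clusters):
    """Mean silhouette score over all points; needs at least 2 clusters."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for pi, p in enumerate(cluster):
            others = [q for qi, q in enumerate(cluster) if qi != pi]
            if not others:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            # a(i): mean distance to the other points in the same cluster
            a = sum(euclidean(p, q) for q in others) / len(others)
            # b(i): smallest mean distance to the points of any other cluster
            b = min(
                sum(euclidean(p, q) for q in other) / len(other)
                for cj, other in enumerate(clusters) if cj != ci
            )
            denom = max(a, b)
            scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Hypothetical characterization vectors grouped two different ways.
tight = [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
mixed = [[(0, 0), (10, 11)], [(0, 1), (10, 10)]]
s_tight = silhouette_coefficient(tight)
s_mixed = silhouette_coefficient(mixed)
```

Scores lie between -1 and 1. The well-separated grouping scores close to 1, while the deliberately mixed grouping scores below 0, which is the kind of contrast the reward in step (4) relies on.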
[0051] (4) According to needs of different scenes, the cluster evaluation indexes of different dimensions are adjusted according to weight coefficients and weighted as dynamic rewards, a deep reinforcement learning model is created, and a neural network parameter of deep reinforcement learning is updated through characterization of an interactive relation between a model and a discrete manufacturing decision-making analysis system.
[0052] A deep Q-network (DQN) is used as the deep reinforcement learning model, and a Q-router is characterized as:
[0053] In the formula, $Q(s, a)$ denotes the Q value of node $s$ for executing action $a$; $Q$ denotes the Q routing table being created; $s$ denotes a model node; $a$ denotes a state action; the learning rate scales each update; $R$ denotes reward information; and the formula distinguishes the updated value of $Q(s, a)$ from its value before updating. The dynamic reward information R is as follows:
[0054] In the formula, each $r_i$ is scaled by a weight parameter, and $r_i$ denotes the cluster evaluation indexes of different dimensions. If the dynamic reward is maximized, the characterized data is used in the discrete manufacturing decision-making analysis system. Otherwise, the method returns to step (2), and the data characterization dimension is optimized by continuously feeding back the dynamic reward information until an optimal data characterization form is obtained. The deep reinforcement learning model in step (4) of the present disclosure is not limited to DQN, and may instead be deep deterministic policy gradient (DDPG), Advantage Actor-Critic (A2C)/Asynchronous Advantage Actor-Critic (A3C), proximal policy optimization (PPO)/trust region policy optimization (TRPO), soft actor critic (SAC), or twin delayed deep deterministic policy gradient (TD3).
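The reward and update logic of step (4) can be illustrated with a small tabular example. This is a simplified sketch under stated assumptions, not the disclosed model: it assumes the dynamic reward is a weighted sum of the cluster evaluation indexes, it uses the standard tabular Q-learning update in place of a deep Q-network, and the discount factor `gamma`, the state labels, the action names, and the index values are all hypothetical.

```python
from collections import defaultdict

def dynamic_reward(weights, indexes):
    """Weighted sum of cluster evaluation indexes (assumed form of R)."""
    return sum(w * r for w, r in zip(weights, indexes))

def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step.
    alpha is the learning rate; gamma is an assumed discount factor."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    old = q_table[(state, action)]
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table[(state, action)]

q_table = defaultdict(float)                      # all Q values start at 0
actions = ["shrink_dim", "keep_dim", "grow_dim"]  # hypothetical actions
# Hypothetical, pre-normalized index values in the order CH, DBI, silhouette.
reward = dynamic_reward([0.5, 0.3, 0.2], [0.8, 0.6, 0.9])
new_q = q_update(q_table, "dim=8", "grow_dim", reward, "dim=16", actions)
```

In practice the indexes live on different scales and the DBI improves as it decreases, so some normalization and sign convention would be needed before weighting; the values above are assumed to be already adjusted. Changing the weights per scene changes which notion of cluster quality the agent optimizes, which is what makes the reward dynamic.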