CHARACTERIZATION METHOD BASED ON DEEP REINFORCEMENT LEARNING FOR DISCRETE MANUFACTURING INDUSTRY DATA

Abstract

Disclosed is a characterization method based on deep reinforcement learning for discrete manufacturing industry data. The method includes: collecting discrete manufacturing industry data, and creating a spatio-temporal database; dividing the discrete manufacturing industry data into a discrete feature and a continuous feature, creating a data coupling coding network, converting a coding vector in the coding network into a characterization vector, and creating a data characterization model; quantitatively characterizing discrimination of a data category by means of cluster evaluation indexes; and using weights of cluster evaluation indexes of different dimensions as dynamic rewards, creating a deep reinforcement learning model, and updating a neural network parameter of deep reinforcement learning through characterization of an interactive relation between a model and a discrete manufacturing decision-making analysis system.

Claims

1. A characterization method based on deep reinforcement learning for discrete manufacturing industry data, comprising the following steps: (1) collecting discrete manufacturing industry data, and creating a spatio-temporal database; (2) dividing the discrete manufacturing industry data into a discrete feature and a continuous feature, creating a data coupling coding network, converting a coding vector in the data coding network into a characterization vector, and creating a data characterization model; (3) quantitatively characterizing a discrimination of a data category by means of cluster evaluation indexes; and (4) using weights of the cluster evaluation indexes of different dimensions as dynamic rewards, creating a deep reinforcement learning model, and updating a neural network parameter of deep reinforcement learning through characterization of an interactive relation between a model and a discrete manufacturing decision-making analysis system.

2. The characterization method according to claim 1, wherein the discrete manufacturing industry data in step (1) comprises real-time workshop device data, advanced planning and scheduling (APS) production scheduling data, product data management (PDM) product data, enterprise resource planning (ERP) purchase-sale-stock data, and manufacturing execution system (MES) production execution data.

3. The characterization method according to claim 1, wherein creating the data coupling coding network in step (2) comprises: creating a correlation matrix $r(a_{i}^{x}, v_{j})$ between the discrete feature and the continuous feature as follows: $r(a_{i}^{x}, v_{j}) = \begin{cases} a_{i}^{x}, & \text{if } p(a_{i}^{x}, v_{j}) \geq t \\ \lambda a_{i}^{x}, & \text{in other cases} \end{cases}$ wherein $a_{i}^{x}$ denotes the continuous feature; $v_{j}$ denotes the discrete feature; $\lambda$ denotes a proportional coefficient; $t$ denotes a threshold parameter; $p(a_{i}^{x}, v_{j})$ denotes a joint probability density; and a computation function expression of the joint probability density is as follows: $p(a_{i}^{x}, v_{j}) = \frac{1}{N} \sum_{k=1}^{N} L_{\lambda}(v_{j}^{k}, v_{j}) \, W\left(\frac{a_{i}^{k} - a_{i}^{x}}{h_{i}}\right)$ in the above formula, $N$ denotes a number of data objects, $L_{\lambda}(v_{j}^{k}, v_{j})$ denotes a kernel function between discrete feature values $v_{j}^{k}$ and $v_{j}$, $W\left(\frac{a_{i}^{k} - a_{i}^{x}}{h_{i}}\right)$ denotes a kernel function of the continuous feature, $a_{i}^{k}$ denotes a continuous feature value $a_{i}$ of a variable $A_{i}$ on a kth data object, $a_{i}^{x}$ denotes the continuous feature value $a_{i}$ of the variable $A_{i}$ on an xth data object, and $h_{i}$ denotes a bandwidth parameter of the continuous feature; and an expression of the kernel function $L_{\lambda}(v_{j}^{k}, v_{j})$ is as follows: $L_{\lambda}(v_{j}^{k}, v_{j}) = \begin{cases} 1, & \text{if } v_{j}^{k} = v_{j} \\ \lambda, & \text{in other cases} \end{cases}$ in the above formula, $v_{j}^{k}$ denotes a feature value corresponding to the discrete feature $v_{j}$ on the kth data object, and $\lambda$ denotes the proportional coefficient; and using the correlation matrix as a data coupling coding vector as follows: $M_{x} = \begin{bmatrix} r(a_{1}^{n}, v_{1}) & \cdots & r(a_{1}^{n}, v_{L}) \\ \vdots & \ddots & \vdots \\ r(a_{d_{n}}^{n}, v_{1}) & \cdots & r(a_{d_{n}}^{n}, v_{L}) \end{bmatrix}$ wherein a coupling coding matrix $M_{x}$ denotes a heterogeneous coupling relation between the discrete feature and the continuous feature, and the coupling coding matrix $M_{x}$ is quantitatively converted into the coding vector $f$.

4. The characterization method according to claim 3, wherein converting the coding vector in the data coding network into the characterization vector in step (2) comprises: converting the coding vector $f$ into the characterization vector with a fully-connected network as follows: $h = \sigma(f, W)$ in the above formula, $\sigma$ denotes a logistic function, $\sigma(z) = \frac{1}{1 + e^{-z}},$ $W$ denotes a weight matrix, $W \in R$, and $R$ denotes a real matrix, which comprises interaction strengths between all features.

5. The characterization method according to claim 1, wherein the deep reinforcement learning model in step (4) is a deep Q-network (DQN), and a Q-router is characterized as: $Q'(s, a) = Q(s, a) + \alpha [R - Q(s, a)]$ wherein $Q(s, a)$ denotes a Q value of node $s$ for executing an action $a$, $Q$ denotes creation of a Q routing table, $s$ denotes a model node, $a$ denotes a state action, $\alpha$ denotes a learning rate, $R$ denotes reward information, $Q'(s, a)$ denotes an updated Q value, and $Q(s, a)$ denotes a Q value before updating.

6. The characterization method according to claim 5, wherein the reward information of the deep reinforcement learning in step (4) is a dynamic reward as follows: $R = \sum_{i=1}^{n} \omega_{i} r_{i}$ wherein $r_{i}$ denotes the cluster evaluation indexes of the different dimensions, $\omega_{i}$ denotes a weight coefficient of the cluster evaluation indexes of the different dimensions, and $R$ denotes the reward information.

7. The characterization method according to claim 6, wherein the cluster evaluation indexes of the different dimensions comprise a Calinski-Harabasz (CH) index, a Davies-Bouldin index (DBI), and/or a silhouette coefficient.

8. The characterization method according to claim 1, wherein the deep reinforcement learning model in step (4) further comprises one of deep deterministic policy gradient (DDPG), advantage actor-critic (A2C)/asynchronous advantage actor-critic (A3C), proximal policy optimization (PPO)/trust region policy optimization (TRPO), soft actor critic (SAC), and twin delayed deep deterministic policy gradient (TD3).

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0028] FIG. 1 is a flow diagram of a characterization method of the present disclosure.

[0029] FIG. 2 is a structural diagram of deep reinforcement learning of the present disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0030] A technical solution of the present disclosure will be further described below with reference to the accompanying drawings.

[0031] As shown in FIG. 1, a characterization method based on deep reinforcement learning for discrete manufacturing industry data according to the present disclosure includes the following steps:

[0032] (1) Discrete manufacturing industry data is collected, and a spatio-temporal database is created.

[0033] The discrete manufacturing industry data collected includes real-time workshop device data, advanced planning and scheduling (APS) production scheduling data, product data management (PDM) product data, enterprise resource planning (ERP) purchase-sale-stock data, and manufacturing execution system (MES) production execution data.

[0034] (2) The discrete manufacturing industry data is divided into a discrete feature and a continuous feature, a data coupling coding network is created, a coding vector in the data coding network is converted into a characterization vector, and a data characterization model is created. This step specifically includes the following sub-steps:

[0035] (2.1) A correlation matrix $r(a_{i}^{x}, v_{j})$ between the discrete feature and the continuous feature is created as follows:

[00010] $r(a_{i}^{x}, v_{j}) = \begin{cases} a_{i}^{x}, & \text{if } p(a_{i}^{x}, v_{j}) \geq t \\ \lambda a_{i}^{x}, & \text{in other cases} \end{cases}$

[0036] In the matrix, $a_{i}^{x}$ denotes the continuous feature; $v_{j}$ denotes the discrete feature; $\lambda$ denotes a proportional coefficient; $t$ denotes a threshold parameter; $p(a_{i}^{x}, v_{j})$ denotes a joint probability density; and a computation function expression of the joint probability density is as follows:

[00011] $p(a_{i}^{x}, v_{j}) = \frac{1}{N} \sum_{k=1}^{N} L_{\lambda}(v_{j}^{k}, v_{j}) \, W\left(\frac{a_{i}^{k} - a_{i}^{x}}{h_{i}}\right)$

[0037] In the formula, $N$ denotes the number of data objects, $L_{\lambda}(v_{j}^{k}, v_{j})$ denotes a kernel function between discrete feature values $v_{j}^{k}$ and $v_{j}$,

[00012] $W\left(\frac{a_{i}^{k} - a_{i}^{x}}{h_{i}}\right)$

denotes a kernel function of the continuous feature, $a_{i}^{k}$ denotes a continuous feature value $a_{i}$ of a variable $A_{i}$ on a kth data object, $a_{i}^{x}$ denotes the continuous feature value $a_{i}$ of the variable $A_{i}$ on an xth data object, and $h_{i}$ denotes a bandwidth parameter of the continuous feature. An expression of the kernel function $L_{\lambda}(v_{j}^{k}, v_{j})$ is as follows:

[00013] $L_{\lambda}(v_{j}^{k}, v_{j}) = \begin{cases} 1, & \text{if } v_{j}^{k} = v_{j} \\ \lambda, & \text{in other cases} \end{cases}$

[0038] In the formula, $v_{j}^{k}$ denotes a feature value corresponding to the discrete feature $v_{j}$ on the kth data object, and $\lambda$ denotes the proportional coefficient.
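As a minimal sketch of step (2.1), the following Python code evaluates the joint probability density and one correlation entry for a single continuous value and a single discrete value. The Gaussian form of the continuous kernel W, the parameter names `lam`, `t`, and `h`, and the toy data are assumptions made for illustration; the disclosure does not specify them.

```python
import math

def discrete_kernel(v_k, v, lam):
    """Discrete kernel: 1 when the values match, lam otherwise."""
    return 1.0 if v_k == v else lam

def gaussian_kernel(u):
    """Assumed continuous kernel W (standard Gaussian); not fixed by the source."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def joint_density(a_x, v, a_col, v_col, lam, h):
    """p(a_x, v) = (1/N) * sum_k L(v_k, v) * W((a_k - a_x) / h)."""
    total = 0.0
    for a_k, v_k in zip(a_col, v_col):
        total += discrete_kernel(v_k, v, lam) * gaussian_kernel((a_k - a_x) / h)
    return total / len(a_col)

def correlation(a_x, v, a_col, v_col, lam, t, h):
    """r(a_x, v): keep a_x when the density reaches threshold t, else scale by lam."""
    p = joint_density(a_x, v, a_col, v_col, lam, h)
    return a_x if p >= t else lam * a_x

# Hypothetical toy data: a temperature reading (continuous) and a machine state (discrete).
temps = [70.0, 71.0, 69.5, 90.0]
states = ["run", "run", "run", "idle"]
r_run = correlation(70.0, "run", temps, states, lam=0.1, t=0.2, h=1.0)
r_idle = correlation(70.0, "idle", temps, states, lam=0.1, t=0.2, h=1.0)
```

In this toy data the value 70.0 co-occurs densely with the state "run", so `r_run` keeps the raw value, while the weak coupling with "idle" causes `r_idle` to be scaled down by the proportional coefficient.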

[0039] (2.2) The correlation matrix is used as a data coupling coding vector as follows:

[00014] $M_{x} = \begin{bmatrix} r(a_{1}^{n}, v_{1}) & \cdots & r(a_{1}^{n}, v_{L}) \\ \vdots & \ddots & \vdots \\ r(a_{d_{n}}^{n}, v_{1}) & \cdots & r(a_{d_{n}}^{n}, v_{L}) \end{bmatrix}$

[0040] In the formula, a coupling coding matrix $M_{x}$ denotes a heterogeneous coupling relation between the discrete feature and the continuous feature, and the coupling coding matrix $M_{x}$ is quantitatively converted into a coding vector $f$.
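One possible reading of step (2.2) is sketched below: each row of the coupling coding matrix corresponds to a continuous feature value and each column to a discrete feature value, and the matrix is flattened row by row into the coding vector. The row-major flattening is an assumption, since the disclosure only states that the matrix is "quantitatively converted"; the `density` lookup table, `LAM`, and `T` are hypothetical stand-ins for the kernel estimate, the proportional coefficient, and the threshold.

```python
def coupling_matrix(cont_values, disc_values, corr):
    """Rows: continuous feature values; columns: discrete feature values."""
    return [[corr(a, v) for v in disc_values] for a in cont_values]

def to_coding_vector(matrix):
    """Assumed conversion: row-major flattening of the matrix into a vector."""
    return [entry for row in matrix for entry in row]

# Hypothetical densities p(a, v), standing in for the kernel estimate of step (2.1).
density = {(70.0, "run"): 0.25, (70.0, "idle"): 0.02,
           (1.5, "run"): 0.05, (1.5, "idle"): 0.30}
LAM, T = 0.5, 0.2  # assumed proportional coefficient and threshold

def corr(a, v):
    """Correlation entry: raw value when the density reaches T, else scaled by LAM."""
    return a if density[(a, v)] >= T else LAM * a

M_x = coupling_matrix([70.0, 1.5], ["run", "idle"], corr)
f = to_coding_vector(M_x)
```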

[0041] (2.3) The coding vector is converted into the characterization vector with a fully-connected network as follows:

[00015] $h = \sigma(f, W)$

[0042] In the formula, $\sigma$ denotes a logistic function,

[00016] $\sigma(z) = \frac{1}{1 + e^{-z}},$

and the weight matrix $W \in R$, where $R$ denotes a real matrix, includes interaction strengths between all features.
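The fully-connected conversion of step (2.3) can be sketched as follows, assuming that the expression for h is read as the logistic function applied element-wise to the matrix-vector product of W and f. That reading, the absence of a bias term, and the example values are assumptions; the disclosure does not spell out the layer.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def characterize(f, W):
    """One fully-connected layer: h_m = sigmoid(sum_n W[m][n] * f[n])."""
    return [sigmoid(sum(w * x for w, x in zip(row, f))) for row in W]

f = [1.0, -2.0, 0.5]       # hypothetical coding vector
W = [[0.0, 0.0, 0.0],      # hypothetical weight matrix (2 outputs, 3 inputs)
     [2.0, 0.5, 0.0]]
h = characterize(f, W)
```

Each row of `W` produces one component of the characterization vector, so the number of rows sets the characterization dimension that the later reward feedback is meant to tune.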

[0043] (3) Cluster evaluation indexes of different dimensions are selected to quantitatively characterize discrimination of a data category according to needs of a specific scene, where the cluster evaluation indexes include a Calinski-Harabasz (CH) index, a Davies-Bouldin index (DBI), and/or a silhouette coefficient.

[0044] The CH index is as follows:

[00017] $CH = \frac{\sum_{i} n_{i} d^{2}(c_{i}, c) / (NC - 1)}{\sum_{i} \sum_{x \in C_{i}} d^{2}(x, c_{i}) / (N - NC)}$

[0045] In the formula, $C_{i}$ denotes the ith category, $c_{i}$ denotes the center of $C_{i}$, $c$ denotes the center of all data objects, $n_{i}$ denotes the number of data objects in $C_{i}$, $NC$ denotes the number of categories, $N$ denotes the total number of data objects, and $d(x, y)$ denotes a distance between data objects $x$ and $y$.
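For illustration, the CH index can be computed directly from its definition as in the sketch below. It uses the standard normalization, dividing the between-category dispersion by the number of categories minus one and the within-category dispersion by the number of data objects minus the number of categories, with squared Euclidean distance assumed for d. The function names and the four-point data set are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def calinski_harabasz(clusters):
    """clusters: list of categories, each a list of points (lists of floats)."""
    all_points = [p for cl in clusters for p in cl]
    n_total, n_clusters = len(all_points), len(clusters)
    c = centroid(all_points)
    centers = [centroid(cl) for cl in clusters]
    between = sum(len(cl) * sq_dist(ci, c) for cl, ci in zip(clusters, centers))
    within = sum(sq_dist(p, ci) for cl, ci in zip(clusters, centers) for p in cl)
    return (between / (n_clusters - 1)) / (within / (n_total - n_clusters))

# Hypothetical data: two well-separated categories of two points each.
clusters = [[[0.0, 0.0], [0.0, 2.0]], [[4.0, 0.0], [4.0, 2.0]]]
ch = calinski_harabasz(clusters)
```

A larger value indicates categories that are compact and far apart, which is why this index can contribute positively to the reward.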

[0046] The DBI is as follows:

[00018] $DBI = \frac{1}{N} \sum_{i=1}^{N} \max_{j \neq i} \frac{\overline{s_{i}} + \overline{s_{j}}}{\| w_{i} - w_{j} \|_{2}}$

[0047] In the formula, $N$ here denotes the number of categories, $\overline{s_{i}}$ denotes the average Euclidean distance from the data of the ith category to the center of that category, and $\| w_{i} - w_{j} \|_{2}$ denotes the Euclidean distance between the center $w_{i}$ of the ith category and the center $w_{j}$ of the jth category.
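A short sketch of the DBI in its standard form follows, in which the scatters of two categories are added and divided by the distance between their centers, and the worst such ratio is averaged over the categories. The function names and the example data are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def dist(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def davies_bouldin(clusters):
    """Average over categories of the worst (s_i + s_j) / ||w_i - w_j|| ratio."""
    centers = [centroid(cl) for cl in clusters]
    scatter = [sum(dist(p, c) for p in cl) / len(cl)
               for cl, c in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k

# Hypothetical data: two categories of two points each.
clusters = [[[0.0, 0.0], [0.0, 2.0]], [[4.0, 0.0], [4.0, 2.0]]]
dbi = davies_bouldin(clusters)
```

Unlike the CH index, a smaller DBI indicates better separation, so its sign or weight has to be chosen accordingly when it is combined with other indexes.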

[0048] The silhouette coefficient is as follows:

[00019] $S(i) = \frac{b(i) - a(i)}{\max \{a(i), b(i)\}}$

[0049] In the formula, $i$ and $j$ denote sample points, and $a(i)$ denotes cohesion of the sample point $i$, that is, similarity between the sample point and other points in the same cluster, which is computed as follows:

[00020] $a(i) = \frac{1}{n - 1} \sum_{j \neq i}^{n} \mathrm{distance}(i, j)$

[0050] In the formula, $n$ denotes the number of sample points in the cluster containing $i$, and $\mathrm{distance}(i, j)$ denotes a distance between $i$ and $j$; $b(i)$ denotes similarity between the sample point and the points in the nearest neighboring cluster, which is computed in a similar way to $a(i)$.
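The silhouette coefficient of a single sample point can be sketched as below, assuming Euclidean distance and taking b(i) as the smallest mean distance to the points of any other cluster. The function signature and the example data are hypothetical, and the sketch requires that the point's own cluster contain at least two points.

```python
import math

def dist(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_dist(p, others):
    """Mean distance from point p to every point in others."""
    return sum(dist(p, q) for q in others) / len(others)

def silhouette(clusters, own_idx, i):
    """S(i) for the i-th point of clusters[own_idx]."""
    own = clusters[own_idx]
    p = own[i]
    rest = own[:i] + own[i + 1:]          # same cluster, excluding p itself
    a = mean_dist(p, rest)                # cohesion a(i)
    b = min(mean_dist(p, cl)              # separation b(i): nearest other cluster
            for k, cl in enumerate(clusters) if k != own_idx)
    return (b - a) / max(a, b)

# Hypothetical data: two clusters of two points each.
clusters = [[[0.0, 0.0], [0.0, 2.0]], [[4.0, 0.0], [4.0, 2.0]]]
s = silhouette(clusters, own_idx=0, i=0)
```

Values close to 1 indicate that the point sits well inside its own cluster, while values near 0 or below indicate overlap with a neighboring cluster.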

[0051] (4) According to needs of different scenes, the cluster evaluation indexes of different dimensions are weighted by adjustable weight coefficients and combined into dynamic rewards, a deep reinforcement learning model is created, and a neural network parameter of deep reinforcement learning is updated through characterization of an interactive relation between a model and a discrete manufacturing decision-making analysis system.

[0052] A deep Q-network (DQN) is used as the deep reinforcement learning model, and a Q-router is characterized as:

[00021] $Q'(s, a) = Q(s, a) + \alpha [R - Q(s, a)]$

[0053] In the formula, $Q(s, a)$ denotes a Q value of node $s$ for executing action $a$, where $Q$ denotes creation of a Q routing table, $s$ denotes a model node, $a$ denotes a state action, $\alpha$ denotes a learning rate, $R$ denotes reward information, $Q'(s, a)$ denotes the updated Q value, and $Q(s, a)$ denotes the Q value before updating. The dynamic reward information $R$ is as follows:

[00022] $R = \sum_{i=1}^{n} \omega_{i} r_{i}$

[0054] In the formula, $\omega_{i}$ denotes a weight coefficient, and $r_{i}$ denotes the cluster evaluation indexes of different dimensions. If the dynamic reward is maximized, the characterized data is used in the discrete manufacturing decision-making analysis system. Otherwise, the method returns to step (2), and the data characterization dimension is optimized by continuously feeding back the dynamic reward information until an optimal data characterization form is obtained. The deep reinforcement learning model in step (4) of the present disclosure is not limited to DQN, and may alternatively be a model of deep deterministic policy gradient (DDPG), advantage actor-critic (A2C)/asynchronous advantage actor-critic (A3C), proximal policy optimization (PPO)/trust region policy optimization (TRPO), soft actor critic (SAC), or twin delayed deep deterministic policy gradient (TD3).
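The weighted reward and the Q value update described above can be sketched as follows. A plain dictionary serves as a tabular stand-in for the neural network of a DQN, which is a deliberate simplification. The state and action labels, the index values, the weights, and the negation of the DBI (so that lower values raise the reward) are all assumptions made for illustration.

```python
def dynamic_reward(indexes, weights):
    """R = sum over i of weight_i * index_i."""
    return sum(w * r for w, r in zip(weights, indexes))

def q_update(q_table, s, a, reward, alpha):
    """Q'(s, a) = Q(s, a) + alpha * (R - Q(s, a)); unseen pairs start at 0."""
    old = q_table.get((s, a), 0.0)
    q_table[(s, a)] = old + alpha * (reward - old)
    return q_table[(s, a)]

# Hypothetical index values: CH, negated DBI, silhouette coefficient.
indexes = [8.0, -0.5, 0.5]
weights = [0.1, 1.0, 1.0]   # assumed scene-specific weight coefficients
R = dynamic_reward(indexes, weights)

q = {}                      # tabular stand-in for the Q network
first = q_update(q, s="dim=2", a="increase_dim", reward=R, alpha=0.5)
second = q_update(q, s="dim=2", a="increase_dim", reward=R, alpha=0.5)
```

Repeating the update with the same reward moves the stored value toward R at a rate set by the learning rate, which mirrors the feedback loop in which step (2) is revisited until the reward stops improving.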

CPC classification: G05B2219/33027 (PHYSICS); G05B19/41865 (PHYSICS)

International classification: G05B19/418 (PHYSICS)
