Dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions

Abstract

The present invention provides a dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions, including acquiring bearing vibration data under different working conditions to obtain a source domain sample and a target domain sample; establishing a deep convolutional neural network model with dynamic joint distribution alignment; feeding both the source domain sample and the target domain sample into the deep convolutional neural network model with initialized parameters, and extracting, by a feature extractor, high-level features of the source domain sample and the target domain sample; calculating a marginal distribution distance and a conditional distribution distance; obtaining a joint distribution distance according to the marginal distribution distance and the conditional distribution distance, and combining the joint distribution distance and a label loss to obtain a target function; and optimizing the target function by using SGD, and training the deep convolutional neural network model.

Claims

1. A dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions, comprising steps of: S1: acquiring bearing vibration data under different working conditions to obtain a source domain sample and a target domain sample, wherein a bearing has different health states under each working condition, bearing vibration data in different health states under each working condition is used as a data domain, and the source domain sample and the target domain sample are selected from the data domain; and the source domain sample is attached with a label, and the label corresponds to a fault type and a fault size; S2: establishing a deep convolutional neural network model with dynamic joint distribution alignment, wherein the deep convolutional neural network model comprises a feature extractor and a classifier; and modifying a last layer of neurons of the classifier, to enable a quantity of the last layer of neurons of the classifier to be equal to a quantity of types of labels; S3: feeding both the source domain sample and the target domain sample into the deep convolutional neural network model with initialized parameters, and extracting, by the feature extractor, high-level features of the source domain sample and the target domain sample; processing, by the classifier, the high-level features, to generate a predicted label of the source domain sample and a soft pseudo label of the target domain sample, and calculating a label loss between the predicted label of the source domain sample outputted by the classifier and an actual label of the source domain sample; and processing the high-level features of the source domain sample and the target domain sample by using maximum mean discrepancy (MMD) to obtain a marginal distribution distance between the source domain sample and the target domain sample, and processing the high-level features of the source domain sample and the target domain sample, the actual label of the source domain sample, and the soft pseudo label of the target domain sample by using weighted conditional MMD, to obtain a conditional distribution distance between the source domain sample and the target domain sample; S4: obtaining a joint distribution distance according to the marginal distribution distance and the conditional distribution distance, and combining the joint distribution distance and the label loss to obtain a target function; S5: optimizing the target function by using stochastic gradient descent (SGD), and training the deep convolutional neural network model to obtain an optimized deep convolutional neural network model; and S6: inputting the target domain sample into the optimized deep convolutional neural network model to obtain a predicted label of a target domain; and comparing the predicted label of the target domain with an actual label of the target domain to obtain diagnosis accuracy.

2. The dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions according to claim 1, wherein S1 comprises: establishing data sets for different workloads, wherein each data set is named after a workload of the data set, data in different data sets obeys different distributions, and the data set is a source domain or a target domain; and segmenting samples by using N consecutive sampling points as one sample length to make the data set, to obtain the source domain sample and the target domain sample; wherein the source domain sample is $\mathcal{D}_{s} = \{(x_{i}^{s}, y_{i}^{s})\}_{i=1}^{n_{s}}$, $y_{i}^{s} \in \{1, 2, \ldots, C\}$, which denotes a source domain formed by samples of C different labels, $x_{i}^{s}$ denotes an i-th sample in the source domain, $y_{i}^{s}$ denotes a label of the i-th sample in the source domain, and $n_{s}$ is a total quantity of samples in the source domain; the target domain sample is $\mathcal{D}_{t} = \{x_{j}^{t}\}_{j=1}^{n_{t}}$, $x_{j}^{t}$ denotes a j-th sample in the target domain, and $n_{t}$ is a quantity of all samples in the target domain; data in the source domain is acquired under a probability distribution $P_{s}$, data in the target domain is acquired under a probability distribution $P_{t}$, and $P_{s} \neq P_{t}$; and the data in the source domain and the data in the target domain obey different distributions.

3. The dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions according to claim 1, wherein S2 comprises: modifying a structure of ResNet-50 of the deep convolutional neural network model, and modifying a quantity of neurons outputted by the last fully connected layer to be equal to a total quantity of labels.

4. The dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions according to claim 1, wherein between S2 and S3, the method further comprises: performing fast Fourier transform (FFT) on the source domain sample in a time domain to obtain a source domain sample in a frequency domain.

5. The dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions according to claim 1, wherein the classifier in S2 is a softmax classifier.

6. The dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions according to claim 1, wherein the processing the high-level features of the source domain sample and the target domain sample by using MMD to obtain a marginal distribution distance between the source domain sample and the target domain sample in S3 comprises: the marginal distribution distance $\mathrm{MMD}(X^{s}, X^{t}) = \left\| \frac{1}{n_{s}} \sum_{i=1}^{n_{s}} \Phi(x_{i}^{s}) - \frac{1}{n_{t}} \sum_{j=1}^{n_{t}} \Phi(x_{j}^{t}) \right\|_{\mathcal{H}}^{2} = \frac{1}{n_{s}^{2}} \sum_{i=1}^{n_{s}} \sum_{j=1}^{n_{s}} K(x_{i}^{s}, x_{j}^{s}) + \frac{1}{n_{t}^{2}} \sum_{i=1}^{n_{t}} \sum_{j=1}^{n_{t}} K(x_{i}^{t}, x_{j}^{t}) - \frac{2}{n_{s} n_{t}} \sum_{i=1}^{n_{s}} \sum_{j=1}^{n_{t}} K(x_{i}^{s}, x_{j}^{t}),$ wherein $\mathcal{H}$ denotes reproducing kernel Hilbert space, and $\Phi(\cdot)$ denotes a mapping function into a feature space; and $K(\cdot,\cdot)$ denotes a Gaussian kernel function, and a formula of the Gaussian kernel function is as follows: $K(x_{i}, x_{j}) = e^{-\frac{\left\| x_{i} - x_{j} \right\|^{2}}{2 \sigma^{2}}},$ wherein $\sigma$ is a bandwidth, $x_{i}^{s}$ denotes an i-th sample in the source domain, and $x_{j}^{t}$ denotes a j-th sample in the target domain.

7. The dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions according to claim 1, wherein the processing the high-level features of the source domain sample and the target domain sample, the actual label of the source domain sample, and the soft pseudo label of the target domain sample by using weighted conditional MMD, to obtain a conditional distribution distance between the source domain sample and the target domain sample in S3 comprises: a weighted conditional distribution distance: $\mathrm{WCMMD}(X^{s}, X^{t}) = \frac{1}{C} \sum_{c=1}^{C} \left\| \sum_{i=1}^{n_{s}^{c}} w_{i}^{sc} \Phi(x_{i}^{sc}) - \sum_{j=1}^{n_{t}^{c}} w_{j}^{tc} \Phi(x_{j}^{tc}) \right\|_{\mathcal{H}}^{2} = \frac{1}{C} \sum_{c=1}^{C} \left[ \sum_{i=1}^{n_{s}^{c}} \sum_{j=1}^{n_{s}^{c}} w_{i}^{sc} w_{j}^{sc} K(x_{i}^{sc}, x_{j}^{sc}) + \sum_{i=1}^{n_{t}^{c}} \sum_{j=1}^{n_{t}^{c}} w_{i}^{tc} w_{j}^{tc} K(x_{i}^{tc}, x_{j}^{tc}) - 2 \sum_{i=1}^{n_{s}^{c}} \sum_{j=1}^{n_{t}^{c}} w_{i}^{sc} w_{j}^{tc} K(x_{i}^{sc}, x_{j}^{tc}) \right],$ wherein $w_{i}^{sc}$ and $w_{j}^{tc}$ denote weights of corresponding samples, and a calculation formula is as follows: $w_{i}^{c} = \frac{y_{i}^{c}}{\sum_{j=1}^{n} y_{j}^{c}},$ wherein $y_{i}^{c}$ is a value at a c-th position of a soft label $y_{i}$ corresponding to an i-th sample $x_{i}$, $x_{i}^{s}$ denotes an i-th sample in the source domain, and $x_{j}^{t}$ denotes a j-th sample in the target domain; for the source domain sample, $y_{i}$ is a one-hot vector of an actual label of the sample $x_{i}$; and for the target domain sample, $y_{i}$ is a probability distribution $\hat{y}_{i} = f(x_{i})$ outputted by the classifier, the probability distribution outputted by the classifier is a vector formed by C elements, and each element denotes a probability that a sample belongs to a label.

8. The dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions according to claim 7, wherein the calculating a label loss between the predicted label of the source domain sample outputted by the classifier and an actual label of the source domain sample in S3 comprises: the label loss: $\mathrm{loss}_{label}(Y_{s}, f(X_{s})) = \frac{1}{n_{s}} \sum_{i=1}^{n_{s}} J(y_{i}^{s}, f(x_{i}^{s})),$ wherein $J(\cdot,\cdot)$ denotes a cross-entropy loss function, and $J(y_{s}, \hat{y}_{s}) = - \sum_{i=1}^{C} y_{i} \log(\hat{y}_{i}).$

9. The dynamic joint distribution alignment network-based bearing fault diagnosis method under variable working conditions according to claim 7, wherein S4 comprises: setting a dynamic parameter $\mu$, $\mu = \frac{\mathrm{MMD}(X_{s}, X_{t})}{\mathrm{MMD}(X_{s}, X_{t}) + \mathrm{WCMMD}(X_{s}, X_{t})},$ wherein $\mathrm{MMD}(X_{s}, X_{t})$ and $\mathrm{WCMMD}(X_{s}, X_{t})$ respectively denote the marginal distribution distance and the conditional distribution distance between the source domain sample and the target domain sample; and the target function: $\mathrm{loss}_{total} = \min_{f} J(Y_{s}, f(X_{s})) + \lambda\, \mathrm{JDD}(X_{s}, X_{t}) = \min_{f} \frac{1}{n_{s}} \sum_{i=1}^{n_{s}} J(y_{i}^{s}, f(x_{i}^{s})) + \lambda \left( (1 - \mu)\, \mathrm{MMD}(X_{s}, X_{t}) + \mu\, \mathrm{WCMMD}(X_{s}, X_{t}) \right),$ wherein $J(\cdot,\cdot)$ denotes a cross-entropy loss function, $\lambda$ is a hyperparameter ($\lambda > 0$), $\lambda = \frac{2}{1 + e^{-10 \cdot step / steps}} - 1,$ steps is a total quantity of times of training, and step is a current training step number.

10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor is configured to execute the program to implement the steps in the method according to claim 1.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a diagram illustrating steps of a rolling bearing fault diagnosis method based on a convolutional neural network and transfer learning under variable working conditions according to the present invention;

(2) FIG. 2 is a structural diagram of a dynamic distribution alignment network model according to the present invention;

(3) FIG. 3 is a flowchart of a diagnosis method according to the present invention; and

(4) FIG. 4 is a diagram illustrating a diagnostic result in a bearing fault target domain (a test set) according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

(5) The present invention is further described below with reference to the accompanying drawings and specific embodiments, to enable a person skilled in the art to better understand and implement the present invention. However, the embodiments are not used to limit the present invention.

(6) The present invention is described below in detail with reference to actual experimental data.

(7) The experimental data is a bearing data set made in a laboratory. A data acquisition system is formed by a drive motor, a bolt and nut loading system, a healthy bearing, a test bearing, a vibration acceleration sensor connected to a computer, and other major components. An accelerometer acquires vibration data, and a sampling frequency is 10 kHz. Faults are introduced into a roller, an inner ring, and an outer ring by using an electrical discharge machining (EDM) technique, and different fault sizes are set.

(8) As shown in FIG. 1 to FIG. 3, the present invention includes the following steps.

(9) Step 1: acquiring bearing vibration data under different working conditions to obtain a source domain sample and a target domain sample, where a bearing has different health states under each working condition, bearing vibration data in different health states under each working condition is used as a data domain, and the source domain sample and the target domain sample are selected from the data domain; and the source domain sample is attached with a label, and the label corresponds to a fault type and a fault size. Specifically, Step 1 includes the following steps.

(10) Step 1.1: establishing data sets for different workloads (that is, different working conditions), where each data set is named after a workload of the data set, data in different data sets obeys different distributions, each data set includes a normal bearing state, an outer ring fault, an inner ring fault, and different fault sizes, and each data set is a source domain or a target domain.

(11) Data sets (0 hp, 1 hp, 2 hp, and 3 hp) under four different working conditions, that is, variable loads, are established in this embodiment, to simulate transfer learning tasks of a rolling bearing under variable working conditions. These data sets are named after workloads of the data sets. For example, the data set 0 hp denotes that a sample comes from a vibration signal acquired under a workload 0 hp. Therefore, the four data sets of the variable loads denote four domains with different data distributions. Single-point faults are set for a roller, an inner ring, and an outer ring by using an EDM technique. Fault degrees are 0.2 millimeters and 0.3 millimeters.

(12) Step 1.2: The source domain sample is $\mathcal{D}_{s} = \{(x_{i}^{s}, y_{i}^{s})\}_{i=1}^{n_{s}}$ with $y_{i}^{s} \in \{1, 2, \ldots, C\}$, which denotes a source domain formed by samples of C different labels, where $x_{i}^{s}$ denotes the i-th sample in the source domain, $y_{i}^{s}$ denotes the label of the i-th sample in the source domain, and $n_{s}$ is the total quantity of samples in the source domain. The target domain sample is $\mathcal{D}_{t} = \{x_{j}^{t}\}_{j=1}^{n_{t}}$, which denotes a target domain formed by samples without labels, where $x_{j}^{t}$ denotes the j-th sample in the target domain and $n_{t}$ is the quantity of all samples in the target domain. Data in the source domain is acquired under a probability distribution $P_{s}$, data in the target domain is acquired under a probability distribution $P_{t}$, and $P_{s} \neq P_{t}$. Table 1 describes the seven sample classes in each domain.

(13) TABLE 1
Fault size/mm   Status             Label   Sample quantity   Symbol representation
-               Normal             1       200               NO
0.2             Inner ring fault   2       200               IF2
0.2             Ball fault         3       200               BF2
0.2             Outer ring fault   4       200               OF2
0.3             Inner ring fault   5       200               IF3
0.3             Ball fault         6       200               BF3
0.3             Outer ring fault   7       200               OF3

(14) A diagnosis task across variable working conditions is denoted by the symbol A hp → B hp. The source domain is the data set A hp; all of its samples and their labels participate in the training process. B hp denotes the target domain. Actual labels of samples in the target domain do not participate in training and are only used for testing in a validation process.
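As an illustration only, the construction of a task A hp → B hp from raw vibration records can be sketched in Python. The helper names segment_signal and build_task are hypothetical, and cutting the record into non-overlapping windows is an assumption; the text states only that N consecutive sampling points form one sample.

```python
from typing import List, Sequence, Tuple


def segment_signal(signal: Sequence[float], n_points: int) -> List[List[float]]:
    """Cut a 1-D vibration record into samples of n_points consecutive values.

    Assumption: windows do not overlap and a trailing partial window is dropped.
    """
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    n_samples = len(signal) // n_points
    return [list(signal[k * n_points:(k + 1) * n_points]) for k in range(n_samples)]


def build_task(
    source_records: Sequence[Tuple[Sequence[float], int]],
    target_records: Sequence[Tuple[Sequence[float], int]],
    n_points: int,
):
    """Assemble one transfer task from (signal, label) records of two workloads.

    Source samples keep their labels for training. Target labels are returned
    separately so that they can be withheld from training and used for testing only.
    """
    source = [(x, y) for sig, y in source_records for x in segment_signal(sig, n_points)]
    target_x, target_y_holdout = [], []
    for sig, y in target_records:
        for x in segment_signal(sig, n_points):
            target_x.append(x)
            target_y_holdout.append(y)
    return source, target_x, target_y_holdout
```

Keeping the target labels in a separate list makes it harder to pass them to the training code by accident.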

(15) Step 2: establishing a deep convolutional neural network model with dynamic joint distribution alignment, where the deep convolutional neural network model includes a feature extractor and a classifier; and modifying the last layer of neurons of the classifier, to enable a quantity of the last layer of neurons of the classifier to be equal to a quantity of types of labels.

(16) Specifically, Step 2 includes: modifying the structure of ResNet-50 of the deep convolutional neural network model, and modifying a quantity of neurons outputted by the last fully connected layer to be equal to a total quantity of labels. The classifier in S2 is a softmax classifier.

(17) Between Step 2 and Step 3, the present invention further includes: performing FFT on the source domain sample in the time domain to obtain a source domain sample in the frequency domain. The step specifically includes:

(18) performing FFT on samples with labels in the source domain and samples without labels in the target domain, and converting a time domain signal into a frequency domain signal, where a formula of FFT of a decimation in time algorithm is shown as follows:

(19) $\begin{matrix} X(k) = \sum_{n=0}^{N-1} x(n) W_{N}^{nk}, \quad 0 \le k \le N-1, \quad W_{N} = e^{-j \frac{2\pi}{N}} \\ X(k) = X_{1}(k) + W_{N}^{k} X_{2}(k) \\ X\left(k + \frac{N}{2}\right) = X_{1}(k) - W_{N}^{k} X_{2}(k), \quad k = 0, 1, 2, \ldots, \frac{N}{2} - 1, & (1) \end{matrix}$

(20) where x(n) denotes a value of an n-th sampling point in an original sample time sequence, and X(k) denotes a k-th value in a spectrogram.
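The decimation-in-time recursion in formula (1) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the embodiment: the names fft_dit and magnitude_spectrum are hypothetical, the sample length is assumed to be a power of two, and taking the magnitude of the spectrum as the network input is an assumption.

```python
import cmath
from typing import List, Sequence


def fft_dit(x: Sequence[complex]) -> List[complex]:
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    x1 = fft_dit(x[0::2])  # X_1(k): transform of the even-indexed points
    x2 = fft_dit(x[1::2])  # X_2(k): transform of the odd-indexed points
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
        out[k] = x1[k] + w * x2[k]             # X(k)
        out[k + n // 2] = x1[k] - w * x2[k]    # X(k + N/2)
    return out


def magnitude_spectrum(sample: Sequence[float]) -> List[float]:
    """Magnitude of each frequency bin (assumed form of the frequency domain sample)."""
    return [abs(v) for v in fft_dit(sample)]
```

For example, magnitude_spectrum([0.0, 1.0, 0.0, -1.0]) concentrates the energy of this one-cycle sine in bins 1 and 3.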

(21) Step 3: feeding both the source domain sample and the target domain sample into the deep convolutional neural network model with initialized parameters, and extracting, by the feature extractor, high-level features of the source domain sample and the target domain sample; processing the high-level features by using the classifier, to generate a predicted label of the source domain sample and a soft pseudo label of the target domain sample, and calculating a label loss between the predicted label of the source domain sample outputted by the classifier and an actual label of the source domain sample; and

(22) processing the high-level features of the source domain sample and the target domain sample by using MMD to obtain a marginal distribution distance between the source domain sample and the target domain sample, and processing the high-level features of the source domain sample and the target domain sample, the actual label of the source domain sample, and the soft pseudo label of the target domain sample by using weighted conditional MMD, to obtain a conditional distribution distance between the source domain sample and the target domain sample.

(23) Specifically, a deep adaptive convolutional neural network model extracts deep-level features of samples by using improved ResNet-50. A process of the extraction is as follows:
$\begin{matrix} g(x) = f(x), & (2) \end{matrix}$

(24) where x denotes a frequency domain sample obtained after FFT, f(·) denotes the modified ResNet-50, and g(x) denotes the high-level features extracted from the sample by the modified ResNet-50.

(25) A high-level feature extracted by a deep-level feature extractor is used as an input of the softmax classifier. A softmax function is denoted as follows:

(26) $\begin{matrix} q_{i} = \frac{e^{V_{i}}}{\sum_{j=0}^{C-1} e^{V_{j}}}, & (3) \end{matrix}$

(27) where $q_{i}$ denotes the probability that a sample belongs to label i, C is the total quantity of types of labels, and $V_{i}$ is the value at the i-th position of the input to the softmax function.
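A minimal Python sketch of formula (3) follows. The function name softmax is illustrative, and subtracting the maximum input before exponentiating is an added numerical safeguard that does not change the result.

```python
import math
from typing import List, Sequence


def softmax(v: Sequence[float]) -> List[float]:
    """Map C classifier outputs to C probabilities that sum to one."""
    m = max(v)  # shifting by the maximum avoids overflow in exp()
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]
```

The returned vector is what the description calls the soft pseudo label when the input comes from a target domain sample.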

(28) Step 3.3: separately calculating a marginal distribution distance and a conditional distribution distance between a high-level feature in the source domain and a high-level feature in the target domain. The marginal distribution distance is calculated by using MMD, and the formula is as follows:

(29) $\begin{matrix} \mathrm{MMD}(X^{s}, X^{t}) = \left\| \frac{1}{n_{s}} \sum_{i=1}^{n_{s}} \Phi(x_{i}^{s}) - \frac{1}{n_{t}} \sum_{j=1}^{n_{t}} \Phi(x_{j}^{t}) \right\|_{\mathcal{H}}^{2} = \frac{1}{n_{s}^{2}} \sum_{i=1}^{n_{s}} \sum_{j=1}^{n_{s}} K(x_{i}^{s}, x_{j}^{s}) + \frac{1}{n_{t}^{2}} \sum_{i=1}^{n_{t}} \sum_{j=1}^{n_{t}} K(x_{i}^{t}, x_{j}^{t}) - \frac{2}{n_{s} n_{t}} \sum_{i=1}^{n_{s}} \sum_{j=1}^{n_{t}} K(x_{i}^{s}, x_{j}^{t}), & (4) \end{matrix}$

(30) where $\mathcal{H}$ denotes a reproducing kernel Hilbert space, and $\Phi(\cdot)$ denotes the mapping function into the feature space; and $K(\cdot,\cdot)$ denotes the Gaussian kernel function, and a formula of the Gaussian kernel function is as follows:

(31) $\begin{matrix} K(x_{i}, x_{j}) = e^{-\frac{\left\| x_{i} - x_{j} \right\|^{2}}{2 \sigma^{2}}}, & (5) \end{matrix}$

(32) where $\sigma$ is a bandwidth, and a marginal distance between the source domain and the target domain may be optimized by minimizing MMD.
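The kernel form of formulas (4) and (5) can be sketched directly in Python. This is an illustration under stated assumptions: the names gaussian_kernel and mmd are hypothetical, features are plain lists of floats, and a single fixed bandwidth is used because the description does not say how the bandwidth is chosen.

```python
import math
from typing import Sequence


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """Formula (5): exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd(xs: Sequence[Sequence[float]], xt: Sequence[Sequence[float]],
        sigma: float = 1.0) -> float:
    """Formula (4): squared MMD between source features xs and target features xt."""
    ns, nt = len(xs), len(xt)
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / ns ** 2
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in xt for b in xt) / nt ** 2
    k_st = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xt) / (ns * nt)
    return k_ss + k_tt - 2.0 * k_st
```

Identical feature sets give a distance of zero, and two well-separated single points give a value close to 2, the largest value this estimate can take with a Gaussian kernel.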

(33) Because the labels of the target domain samples are unknown during training, the conditional distributions of the source domain and the target domain cannot be matched directly. Instead, the result predicted for a target domain sample by the deep learning model in each training iteration is used as a pseudo label to calculate the conditional distribution distance between the source domain and the target domain. A formula of the conditional distribution distance is introduced as follows:

(34) $\begin{matrix} \mathrm{CMMD}(X^{s}, X^{t}) = \frac{1}{C} \sum_{c=1}^{C} \left\| \frac{1}{n_{s}^{c}} \sum_{i=1}^{n_{s}^{c}} \Phi(x_{i}^{sc}) - \frac{1}{n_{t}^{c}} \sum_{j=1}^{n_{t}^{c}} \Phi(x_{j}^{tc}) \right\|_{\mathcal{H}}^{2} = \frac{1}{C} \sum_{c=1}^{C} \left[ \frac{1}{(n_{s}^{c})^{2}} \sum_{i=1}^{n_{s}^{c}} \sum_{j=1}^{n_{s}^{c}} K(x_{i}^{sc}, x_{j}^{sc}) + \frac{1}{(n_{t}^{c})^{2}} \sum_{i=1}^{n_{t}^{c}} \sum_{j=1}^{n_{t}^{c}} K(x_{i}^{tc}, x_{j}^{tc}) - \frac{2}{n_{s}^{c} n_{t}^{c}} \sum_{i=1}^{n_{s}^{c}} \sum_{j=1}^{n_{t}^{c}} K(x_{i}^{sc}, x_{j}^{tc}) \right], & (6) \end{matrix}$

(35) where c denotes the c-th label of the C labels, and $\Phi(\cdot)$ denotes the mapping function into the feature space; $x_{i}^{sc}$ denotes the i-th sample among the samples with label c in the source domain, $n_{s}^{c}$ and $n_{t}^{c}$ respectively denote the quantities of all samples with label c in the source domain and the target domain, and $x_{j}^{tc}$ denotes the j-th sample among the samples with label c in the target domain; the foregoing formula is used to estimate the difference between the conditional distributions $P_{s}(x_{s} \mid y_{s} = c)$ and $P_{t}(x_{t} \mid y_{t} = c)$; and the difference between the conditional distributions of the source domain and the target domain can be reduced by minimizing the foregoing formula.

(36) However, the foregoing model may give an incorrect hard label in a training process, and the incorrect hard label may cause negative transfer. A weighted conditional distribution distance is therefore introduced, and its formula is as follows:

(37) $\begin{matrix} \mathrm{WCMMD}(X^{s}, X^{t}) = \frac{1}{C} \sum_{c=1}^{C} \left\| \sum_{i=1}^{n_{s}^{c}} w_{i}^{sc} \Phi(x_{i}^{sc}) - \sum_{j=1}^{n_{t}^{c}} w_{j}^{tc} \Phi(x_{j}^{tc}) \right\|_{\mathcal{H}}^{2} = \frac{1}{C} \sum_{c=1}^{C} \left[ \sum_{i=1}^{n_{s}^{c}} \sum_{j=1}^{n_{s}^{c}} w_{i}^{sc} w_{j}^{sc} K(x_{i}^{sc}, x_{j}^{sc}) + \sum_{i=1}^{n_{t}^{c}} \sum_{j=1}^{n_{t}^{c}} w_{i}^{tc} w_{j}^{tc} K(x_{i}^{tc}, x_{j}^{tc}) - 2 \sum_{i=1}^{n_{s}^{c}} \sum_{j=1}^{n_{t}^{c}} w_{i}^{sc} w_{j}^{tc} K(x_{i}^{sc}, x_{j}^{tc}) \right], & (7) \end{matrix}$

(38) where $w_{i}^{sc}$ and $w_{j}^{tc}$ denote weights of corresponding samples, and a calculation formula is as follows:

(39) $\begin{matrix} w_{i}^{c} = \frac{y_{i}^{c}}{\sum_{j=1}^{n} y_{j}^{c}}, & (8) \end{matrix}$

(40) where $y_{i}^{c}$ is the value at the c-th position of the soft label $y_{i}$ corresponding to the i-th sample $x_{i}$; for the source domain sample, $y_{i}$ is a one-hot vector of the actual label of the sample $x_{i}$; and for the target domain sample, $y_{i}$ is the probability distribution outputted by softmax, that is, $\hat{y}_{i} = f(x_{i})$, which is a vector formed by C elements, each element denoting the probability that the sample belongs to a label.
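The weighting in formulas (7) and (8) can be sketched in Python as follows. The names class_weights and wcmmd are hypothetical. Two choices here are assumptions rather than statements from the description: the sums run over every sample in the batch (a source sample of another class simply receives weight zero), and a class with zero total weight on either side is skipped.

```python
import math
from typing import List, Sequence


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def class_weights(soft_labels: Sequence[Sequence[float]], c: int) -> List[float]:
    """Formula (8): each sample's share of the total probability mass of class c."""
    total = sum(y[c] for y in soft_labels)
    if total == 0:
        return [0.0] * len(soft_labels)
    return [y[c] / total for y in soft_labels]


def wcmmd(xs, ys_onehot, xt, yt_soft, sigma: float = 1.0) -> float:
    """Formula (7), with one-hot source labels and soft target pseudo labels."""
    n_classes = len(ys_onehot[0])
    total = 0.0
    for c in range(n_classes):
        ws = class_weights(ys_onehot, c)
        wt = class_weights(yt_soft, c)
        if not any(ws) or not any(wt):
            continue  # assumption: a class missing on either side adds nothing
        k_ss = sum(ws[i] * ws[j] * gaussian_kernel(xs[i], xs[j], sigma)
                   for i in range(len(xs)) for j in range(len(xs)))
        k_tt = sum(wt[i] * wt[j] * gaussian_kernel(xt[i], xt[j], sigma)
                   for i in range(len(xt)) for j in range(len(xt)))
        k_st = sum(ws[i] * wt[j] * gaussian_kernel(xs[i], xt[j], sigma)
                   for i in range(len(xs)) for j in range(len(xt)))
        total += k_ss + k_tt - 2.0 * k_st
    return total / n_classes
```

Because the target weights come from the full softmax output, a sample with an uncertain prediction contributes a little to several classes instead of being committed to one possibly wrong hard label.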

(41) Although a pseudo label rather than an actual label of the target domain sample is used in a training process, the training error decreases as the number of iterations increases and the pseudo label increasingly approximates the real label, so that samples in the target domain are classified as accurately as possible.

(42) A predicted label training error of the source domain sample is calculated. A process of the calculation is shown in the following formula:

(43) $\begin{matrix} \mathrm{loss}_{label}(Y_{s}, f(X_{s})) = \frac{1}{n_{s}} \sum_{i=1}^{n_{s}} J(y_{i}^{s}, f(x_{i}^{s})), & (9) \end{matrix}$

(44) where $J(\cdot,\cdot)$ denotes a cross-entropy loss function, which is defined as follows:

(45) $\begin{matrix} J(y_{s}, \hat{y}_{s}) = - \sum_{i=1}^{C} y_{i} \log(\hat{y}_{i}). & (10) \end{matrix}$
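Formulas (9) and (10) can be sketched in Python as below. The names cross_entropy and label_loss are illustrative, and the small constant eps that keeps the logarithm finite is an assumption added for numerical safety.

```python
import math
from typing import Sequence


def cross_entropy(y_onehot: Sequence[float], y_pred: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Formula (10): -sum_i y_i * log(y_hat_i) for one sample."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, y_pred))


def label_loss(ys_onehot: Sequence[Sequence[float]],
               preds: Sequence[Sequence[float]]) -> float:
    """Formula (9): mean cross-entropy over the labelled source samples."""
    return sum(cross_entropy(y, p) for y, p in zip(ys_onehot, preds)) / len(ys_onehot)
```

Only source domain samples enter this loss, since target domain labels are not available during training.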

(46) Step 4: obtaining a joint distribution distance according to the marginal distribution distance and the conditional distribution distance, and combining the joint distribution distance and the label loss to obtain a target function.

(47) Specifically, a dynamic parameter $\mu$ is introduced to weigh the relative importance of the marginal distribution distance and the conditional distribution distance, which together form the joint distribution distance. The joint distribution distance is combined with the predicted label training error of the source domain sample to form a final target function to be optimized. A formula is as follows:

(48) $\begin{matrix} \mathrm{loss}_{total} = \min_{f} J(Y_{s}, f(X_{s})) + \lambda\, \mathrm{JDD}(X_{s}, X_{t}) = \min_{f} \frac{1}{n_{s}} \sum_{i=1}^{n_{s}} J(y_{i}^{s}, f(x_{i}^{s})) + \lambda \left( (1 - \mu)\, \mathrm{MMD}(X_{s}, X_{t}) + \mu\, \mathrm{WCMMD}(X_{s}, X_{t}) \right), & (11) \end{matrix}$

(49) where $J(\cdot,\cdot)$ denotes a cross-entropy loss function, $\lambda$ is a hyperparameter ($\lambda > 0$), and $\mu$ is the dynamic parameter. By minimizing the foregoing formula during training, the proposed network learns to predict the label of the target domain sample accurately.

(50) The hyperparameter $\lambda$ in the foregoing formula is set as follows:

(51) $\begin{matrix} \lambda = \frac{2}{1 + e^{-10 \cdot step / steps}} - 1, & (12) \end{matrix}$

(52) where steps is a total quantity of times of training, and step is a current training step number.
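The schedule in formula (12) is a one-line computation. The sketch below is illustrative, and the function name lambda_schedule is hypothetical.

```python
import math


def lambda_schedule(step: int, steps: int) -> float:
    """Formula (12): trade-off hyperparameter as a function of training progress."""
    return 2.0 / (1.0 + math.exp(-10.0 * step / steps)) - 1.0
```

The value is 0 at step 0 and rises smoothly toward 1 (about 0.9999 at step = steps), so the distribution alignment term has little influence early in training, when the pseudo labels are least reliable, and nearly full influence at the end.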

(53) The dynamic parameter $\mu$ is set as follows:

(54) $\begin{matrix} \mu = \frac{\mathrm{MMD}(X_{s}, X_{t})}{\mathrm{MMD}(X_{s}, X_{t}) + \mathrm{WCMMD}(X_{s}, X_{t})}, & (13) \end{matrix}$

(55) where $\mathrm{MMD}(X_{s}, X_{t})$ and $\mathrm{WCMMD}(X_{s}, X_{t})$ respectively denote the marginal distribution distance and the conditional distribution distance between the source domain sample and the target domain sample.
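Formulas (11) and (13) combine into a few lines of Python. This is a sketch under assumptions: the names dynamic_mu, joint_distance and total_loss are hypothetical, the two distances and the label loss are taken as already computed numbers, and returning 0.5 when both distances vanish is a choice made here to avoid division by zero.

```python
def dynamic_mu(mmd_value: float, wcmmd_value: float, eps: float = 1e-12) -> float:
    """Formula (13): share of the marginal distance in the sum of both distances."""
    denom = mmd_value + wcmmd_value
    if denom < eps:
        return 0.5  # assumption: weight both terms equally when both vanish
    return mmd_value / denom


def joint_distance(mmd_value: float, wcmmd_value: float) -> float:
    """Joint distribution distance: (1 - mu) * MMD + mu * WCMMD."""
    mu = dynamic_mu(mmd_value, wcmmd_value)
    return (1.0 - mu) * mmd_value + mu * wcmmd_value


def total_loss(label_loss_value: float, mmd_value: float,
               wcmmd_value: float, lam: float) -> float:
    """Formula (11): label loss plus the weighted joint distribution distance."""
    return label_loss_value + lam * joint_distance(mmd_value, wcmmd_value)
```

Substituting formula (13) into the joint distance shows that it equals 2 * MMD * WCMMD / (MMD + WCMMD), so the combined term is symmetric in the two distances and is recomputed from their current values at every training step.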

(56) Step 5: optimizing the target function by using SGD, and training the deep convolutional neural network model to obtain an optimized deep convolutional neural network model.

(57) Specifically, a high-level feature alignment target function is minimized by using SGD, to update parameters of the model, thereby training the model.

(58) $\begin{matrix} \theta_{i} \leftarrow \theta_{i} - \alpha \frac{\partial}{\partial \theta_{i}} l(\theta), & (14) \end{matrix}$

(59) where $\theta$ denotes all parameters in the model, $\theta_{i}$ denotes an i-th parameter, $l(\theta)$ denotes a target function related to the parameter $\theta$, and $\alpha$ is a learning rate, that is, a step size.
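The update rule in formula (14) can be illustrated on a toy problem. In the sketch below the names sgd_step and minimise_quadratic are hypothetical, and the quadratic target function stands in for the network loss; in actual training the gradient would be estimated on a mini-batch of source and target samples, which is what makes the descent stochastic.

```python
from typing import List, Sequence


def sgd_step(params: Sequence[float], grads: Sequence[float], lr: float) -> List[float]:
    """Formula (14): move every parameter against its gradient by lr times the gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def minimise_quadratic(theta0: float = 0.0, lr: float = 0.1, n_steps: int = 100) -> float:
    """Toy usage: minimise l(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)."""
    theta = [theta0]
    for _ in range(n_steps):
        grad = [2.0 * (theta[0] - 3.0)]
        theta = sgd_step(theta, grad, lr)
    return theta[0]
```

With a learning rate of 0.1 the distance to the minimum shrinks by a factor of 0.8 per step, so the toy run ends very close to 3.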

(60) Step 6: inputting the target domain sample into the optimized deep convolutional neural network model to obtain a predicted label of a target domain; and comparing the predicted label of the target domain with an actual label of the target domain to obtain diagnosis accuracy.

(61) Specifically, samples without a label in the target domain are fed into the trained dynamic joint distribution alignment network to obtain predicted labels of all target domain samples, and the predicted labels are compared with labels that are manually marked in advance but have not participated in a training process, to obtain the diagnosis accuracy, thereby validating the superiority of the model. A calculation formula of the diagnosis accuracy is as follows:

(62) $\begin{matrix} acc = \frac{1}{n_{t}} \sum_{i=0}^{n_{t}-1} \operatorname{sign}\left( F(x_{i}^{t}) = y_{i}^{t} \right), & (15) \end{matrix}$

(63) where sign( ) denotes an indicator function, $y_{i}^{t}$ is the actual label of the i-th sample in the target domain, $F(x_{i}^{t})$ is the result predicted by the model for the i-th sample in the target domain, and $n_{t}$ is the total quantity of samples in the target domain. Diagnostic results under four variable working conditions are shown in FIG. 4. As can be seen from the diagnostic results, the average accuracy of the diagnosis tasks under the four variable working conditions reaches 98.1%, with a recall rate of 0.98. This indicates that the present invention achieves relatively high diagnosis accuracy while improving the generality of the deep learning model, and that it reduces the impact of a domain shift on a conventional deep learning-based fault diagnosis method under variable working conditions.
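Formula (15) amounts to counting matches between predicted and held-out labels. A minimal Python sketch, with the hypothetical function name diagnosis_accuracy:

```python
from typing import Sequence


def diagnosis_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Formula (15): fraction of target samples whose predicted label equals the actual label."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    matches = sum(1 for p, y in zip(predicted, actual) if p == y)
    return matches / len(actual)
```

For instance, three correct predictions out of four target samples give an accuracy of 0.75.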

(64) In summary, in the present invention, based on a convolutional neural network and a transfer learning algorithm, a dynamic joint distribution alignment network-based bearing fault diagnosis model is designed. Compared with a conventional deep learning method, the present invention can better mitigate the negative impact of a domain shift on a deep learning model, better meet actual scenarios of industrial applications, and satisfy the demand for fault diagnosis under variable working conditions.

(65) The foregoing embodiments are merely preferred embodiments used to fully describe the present invention, and the protection scope of the present invention is not limited thereto. Equivalent replacements or variations made by a person skilled in the art to the present invention all fall within the protection scope of the present invention. The protection scope of the present invention is as defined in the claims.

CPC classification

G06F11/2263, G06N3/0464, G01M13/045, G06N3/096, G06N3/08, G06N3/045 (all in section G, PHYSICS)

International classification

G06F11/22, G01M13/045, G06N3/0464, G06N3/08 (all in section G, PHYSICS)
