INTRA-CLASS ADAPTATION FAULT DIAGNOSIS METHOD FOR BEARING UNDER VARIABLE WORKING CONDITIONS

Abstract

The invention relates to a fault diagnosis method for a rolling bearing under variable working conditions. Based on a convolutional neural network, a transfer learning algorithm is combined to handle the problem of the reduced universality of deep learning models. Data acquired under different working conditions is segmented to obtain samples. The samples are preprocessed by using FFT. Low-level features of the samples are extracted by using improved ResNet-50, and a multi-scale feature extractor analyzes the low-level features to obtain high-level features as inputs of a classifier. In a training process, high-level features of training samples and test samples are extracted, and a conditional distribution distance between them is calculated as a part of a target function for backpropagation to implement intra-class adaptation, thereby reducing the impact of domain shift, to enable a deep learning model to better carry out fault diagnosis tasks.

Claims

1. A fault diagnosis method for a rolling bearing under variable working conditions based on a convolutional neural network and transfer learning, comprising: Step 1: acquiring bearing vibration data in different health states under different working conditions, using the bearing vibration data in the different health states under each working condition as one domain, wherein data in different domains conforms to different distributions, and segmenting the data to form samples; Step 2: performing a fast Fourier transform (FFT) on samples in a source domain and samples in a target domain, and feeding labeled samples in the source domain and unlabeled samples in the target domain at the same time into a deep intra-class adaptation convolutional neural network model with initialized parameters in a training stage; Step 3: extracting low-level features of the samples by using improved ResNet-50 in the deep intra-class adaptation convolutional neural network model, performing, by a multi-scale feature extractor, further analysis based on the low-level features to obtain high-level features as an input of a classifier, and at the same time calculating a conditional distribution distance between the high-level features of the samples in the two domains; Step 4: combining the conditional distribution distance between the source domain and the target domain and classification loss of samples in the source domain to form a target function, optimizing the target function by using a stochastic gradient descent (SGD) method, and training the parameters of the model; Step 5: inputting a sample set of the target domain into a trained deep neural network diagnosis model, qualitatively and quantitatively determining a fault type and a fault size of each test sample by using an actual outputted label value, and performing comparison with labels that are marked in advance but do not participate in training, to obtain diagnosis accuracy.

2. The fault diagnosis method for a rolling bearing under variable working conditions based on a convolutional neural network and transfer learning according to claim 1, wherein Step 1 comprises: Step 1.1: establishing data sets of different workloads, wherein each data set is named after a workload of the data set, and data in the data sets conforms to different distributions; and Step 1.2: segmenting samples with N consecutive sampling points as one sample length to make a data set, and using $\mathcal{D}_s=\{(x_i^s,y_i^s)\}_{i=0}^{n_s-1}$, wherein $y_i^s\in\{0,1,2,\ldots,C-1\}$, to represent a source domain formed by samples with C different types of labels, wherein $x_i^s$ represents an i-th sample in the source domain, $y_i^s$ represents a label of the i-th sample in the source domain, and $n_s$ is a total quantity of samples in the source domain; using $\mathcal{D}_t=\{x_j^t\}_{j=0}^{n_t-1}$ to represent a target domain formed by the unlabeled samples, wherein $x_j^t$ represents a j-th sample in the target domain, and $n_t$ is a quantity of all samples in the target domain; acquiring data of the source domain in a probability distribution $P_s$, and acquiring data of the target domain in a probability distribution $P_t$, wherein $P_s\neq P_t$, and the data of the source domain and the data of the target domain conform to different distributions.

3. The fault diagnosis method for a rolling bearing under variable working conditions based on a convolutional neural network and transfer learning according to claim 2, wherein in Step 1.2, all labeled samples in the source domain and labels of the samples and the unlabeled samples in the target domain are used for training, and labels of samples in the target domain are only used for a test process and do not participate in training.

4. The fault diagnosis method for a rolling bearing under variable working conditions based on a convolutional neural network and transfer learning according to claim 1, wherein Step 3 comprises: Step 3.1: modifying the structure of ResNet-50, and removing the last two layers of the model: a global average pooling layer and a fully connected layer used for classification, wherein the deep intra-class adaptation convolutional neural network model extracts the low-level features of the samples by using the improved ResNet-50, and a process of the extraction is as follows:
$$g(x)=f(x),$$
wherein x represents a sample in a frequency domain after the FFT, f(⋅) represents the modified ResNet-50, and g(x) represents the low-level features extracted from the samples by using the improved ResNet-50; Step 3.2: further analyzing, by a plurality of substructures of the multi-scale feature extractor, the low-level features at the same time to obtain the high-level features as an input of a softmax classifier, wherein a process of extracting the high-level features is represented as follows:
$$g(x)=[g_0(x),g_1(x),\ldots,g_{n-1}(x)],$$
wherein $g_i(x)$ is an output of one substructure, $i\in\{0,1,2,\ldots,n-1\}$, and n is a total quantity of substructures in the feature extractor; and a softmax function is represented as follows:
$$q_i=\frac{e^{v_i}}{\sum_{k=0}^{C-1}e^{v_k}},$$
wherein $q_i$ represents a probability that a sample belongs to a label i, C is a total quantity of label classes, and $v_i$ is a value of an i-th position of an input of the softmax function; and Step 3.3: calculating the conditional distribution distance between the high-level features in the source domain and the target domain, wherein, because labels of samples in the target domain are unknown in the training process, the conditional distributions of the source domain and the target domain cannot be matched directly, and therefore results predicted for samples in the target domain by a deep learning model in a training iteration process are used as pseudo labels to calculate the conditional distribution distance between the source domain and the target domain, and a formula for a conditional distribution distance between features extracted by one substructure of the multi-scale feature extractor is as follows:
$$d_H(X_s,X_t)=\frac{1}{C}\sum_{c=0}^{C-1}\left\|\frac{1}{n_s^{(c)}}\sum_{i=0}^{n_s^{(c)}-1}\Phi\left(x_i^{s(c)}\right)-\frac{1}{n_t^{(c)}}\sum_{j=0}^{n_t^{(c)}-1}\Phi\left(x_j^{t(c)}\right)\right\|_H^2,$$
wherein H represents the reproducing kernel Hilbert space, and Φ(⋅) represents a function of feature space mapping; $x_i^{s(c)}$ represents an i-th sample in samples with a label of c in the source domain, $n_s^{(c)}$ is equal to a quantity of all samples with the label of c in the source domain, $x_j^{t(c)}$ represents a j-th sample in samples with a pseudo label of c in the target domain, and $n_t^{(c)}$ is equal to a quantity of all samples with the pseudo label of c in the target domain; the foregoing expression is used for estimating a difference between intra-class conditional distributions $P_s(x_s|y_s=c)$ and $P_t(x_t|y_t=c)$; a conditional distribution difference between the source domain and the target domain can be reduced by minimizing the foregoing expression; and because the high-level features are extracted by the plurality of substructures at the same time, a total conditional distribution distance is as follows:
$$d(X_s,X_t)=\sum_{i=0}^{n-1}d_H\left(g_i(X_s),g_i(X_t)\right),$$
wherein $g_i(x)$ is the output of one substructure, $i\in\{0,1,2,\ldots,n-1\}$, and n is the total quantity of substructures in the feature extractor; although pseudo labels rather than actual labels of samples in the target domain are used in the training process, as a quantity of iterations increases, a training loss keeps decreasing, and the pseudo labels keep approaching the actual labels, to maximize the accuracy of classifying samples in the target domain.

5. The fault diagnosis method for a rolling bearing under variable working conditions based on a convolutional neural network and transfer learning according to claim 1, wherein Step 4 comprises: Step 4.1: calculating classification loss of the samples in the source domain in the training process, wherein a process of the calculation is shown by the following formula:
$$\mathrm{loss}_{\mathrm{classifier}}(y,X)=\frac{1}{n}\sum_{i=0}^{n-1}J\left(y_i,F(x_i)\right),$$
wherein X represents a set of all samples in the source domain, and y represents a set of actual labels of all the samples in the source domain; n is a quantity of samples that participate in training, $y_i$ is an actual label of an i-th sample, and $F(x_i)$ is a predicted result of the i-th sample by a neural network; and J(⋅,⋅) represents a cross entropy loss function, and is defined as follows:
$$J(p,q)=-\sum_{i=0}^{C-1}p_i\log(q_i),$$
wherein when i is an actual label of the sample, $p_i$ is equal to 1, or otherwise, $p_i$ is equal to 0; $q_i$ is a probability outputted after a softmax activation function; and C is a total quantity of label classes; and Step 4.2: combining the conditional distribution distance and the classification loss of the samples in the source domain to form a multi-scale high-level feature alignment target function to be optimized, wherein a formula of the target function is as follows:
$$l_{\mathrm{total}}=\min_F\; l_{\mathrm{classifier}}(y_s,X_s)+\lambda\sum_{i=0}^{n_{\mathrm{sub}}-1}d_H\left(g_i(X_s),g_i(X_t)\right)=\min_F\;\frac{1}{n_s}\sum_{i=0}^{n_s-1}J\left(y_i^s,F(x_i^s)\right)+\lambda\sum_{i=0}^{n_{\mathrm{sub}}-1}d_H\left(g_i(X_s),g_i(X_t)\right),$$
wherein F(⋅) represents a model output function, $g_i(\cdot)$ represents an output of one substructure in the multi-scale feature extractor, J(⋅,⋅) represents the cross entropy loss function, λ>0 is a hyperparameter, $n_{\mathrm{sub}}$ is equal to a quantity of substructures in the multi-scale feature extractor, and $d_H(\cdot,\cdot)$ is the conditional distribution distance; and the foregoing expression may be used to enable the network F(⋅) obtained from training to accurately predict labels of samples from the target domain.

6. The fault diagnosis method for a rolling bearing under variable working conditions based on a convolutional neural network and transfer learning according to claim 5, wherein the hyperparameter λ is set as follows:
$$\lambda=\frac{2}{1+e^{-10\cdot\frac{\mathrm{epoch}}{\mathrm{epochs}}}}-1,$$
wherein epochs is a total quantity of training epochs, and epoch is a current training epoch.

7. The fault diagnosis method for a rolling bearing under variable working conditions based on a convolutional neural network and transfer learning according to claim 1, wherein Step 5 comprises: feeding the unlabeled samples in the target domain into a trained multi-scale deep intra-class adaptation convolutional neural network model to obtain predicted labels of all samples in the target domain, and performing comparison with labels that are manually marked in advance but do not participate in a training process, to obtain diagnosis accuracy to verify the high quality of the model.

8. A computer device, comprising a memory, a processor, and a computer program that is stored in the memory and is executable by the processor, wherein the processor is configured to execute the program to implement the steps in the method according to claim 1.

9. A computer-readable storage medium, on which a computer program is stored, wherein the program is executed by a processor to implement the steps in the method according to claim 1.

10. A processor, configured to execute a program, wherein the program is executed to perform the method according to claim 1.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0045] FIG. 1 shows the steps of a fault diagnosis method for a rolling bearing under variable working conditions based on a convolutional neural network and transfer learning according to the present invention.

[0046] FIG. 2 is a structural diagram of a corresponding deep learning model according to the present invention.

[0047] FIG. 3 is a detailed structural diagram of ResNet-50 in a deep learning model according to the present invention.

[0048] FIG. 4 is a principle diagram of residual blocks in ResNet-50.

[0049] FIG. 5 is a flowchart of a corresponding diagnosis method according to the present invention.

[0050] FIG. 6 is a diagram of a frequency domain of a vibration signal of a bearing in different health states according to an embodiment of the present invention.

[0051] FIG. 7 is a diagram of classification results of a target domain (test set) of bearing faults according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0052] Specific implementations of the present invention are further described below in detail with reference to the accompanying drawings and embodiments. The following embodiments are used to describe the present invention, but are not used to limit the scope of the present invention.

[0053] The present invention is described below in detail with reference to actual experimental data:

[0054] The experimental data uses a bearing data set from Case Western Reserve University. A data acquisition system is formed by three major parts: a motor, a torque sensor, and a dynamometer. An accelerometer is utilized to acquire vibration data, and the sampling frequency is 12 kHz. Faults are introduced into the roller, the inner ring, and the outer ring by using electrical discharge machining (EDM) technologies, and different fault sizes are set.

[0055] As shown in FIG. 1, the present invention includes the following steps:

[0056] Step 1: acquiring bearing vibration data in different health states under different working conditions, using the bearing vibration data in the different health states under each working condition as one domain, where data in different domains conforms to different distributions, and segmenting the data to form samples;

[0057] Step 2: performing an FFT on samples in a source domain and samples in a target domain, and feeding labeled samples in the source domain and unlabeled samples in the target domain at the same time into a deep intra-class adaptation convolutional neural network model with initialized parameters in a training stage;

[0058] Step 3: extracting low-level features of the samples by using improved ResNet-50 in the deep intra-class adaptation convolutional neural network model, performing, by a multi-scale feature extractor, further analysis based on the low-level features to obtain high-level features as inputs of a classifier, and at the same time calculating the conditional distribution distance between the high-level features of the samples in the two domains;

[0059] Step 4: combining the conditional distribution distance between the source domain and the target domain and classification loss of samples in the source domain to form the target function, optimizing the target function by using an SGD method, and training the parameters of the model; and

[0060] Step 5: inputting a sample set of the target domain into a trained deep neural network diagnosis model, qualitatively and quantitatively determining a fault type and a fault size of each test sample by using actual outputted label values, and performing comparison with labels that are marked in advance but do not participate in training, to obtain diagnosis accuracy to verify the universality and generalization of the present invention.

[0061] Further, Step 1 specifically includes the following steps:

[0062] Step 1.1: Establishing data sets of different workloads (that is, different working conditions), where each data set is named after a workload of the data set, data in the data sets conforms to different distributions, and each data set includes a normal bearing state, an outer ring fault, an inner ring fault, and different fault sizes.

[0063] In this embodiment, data sets (0, 1, 2 and 3 hp) under four different working conditions are established. That is, variable loads are used to simulate a transfer learning task of a rolling bearing under variable working conditions. These data sets are named after the workloads of the data sets. For example, a data set 0 hp represents that samples come from vibration signals acquired under 0 hp workload. Therefore, the four data sets of variable loads represent four domains with different data distributions. Single-point faults are created at the roller, the inner ring, and the outer ring of a bearing by using the EDM technologies, and fault degrees are 0.007 inches, 0.014 inches, and 0.021 inches respectively.

[0064] Step 1.2: Segmenting samples with N consecutive sampling points as one sample length to make a data set, and using $\mathcal{D}_s=\{(x_i^s,y_i^s)\}_{i=0}^{n_s-1}$ in the present invention, where $y_i^s\in\{0,1,2,\ldots,C-1\}$, to represent a source domain formed by samples with C different types of labels, where $x_i^s$ represents the i-th sample in the source domain, $y_i^s$ represents the label of the i-th sample in the source domain, and $n_s$ is the total quantity of samples in the source domain; using $\mathcal{D}_t=\{x_j^t\}_{j=0}^{n_t-1}$ to represent a target domain formed by the unlabeled samples, where $x_j^t$ represents the j-th sample in the target domain, and $n_t$ is the quantity of all samples in the target domain; and acquiring data of the source domain in a probability distribution $P_s$, and acquiring data of the target domain in a probability distribution $P_t$, where $P_s\neq P_t$. For specific types of samples in each domain, reference may be made to Table 1.

TABLE 1 Detailed description of 10 types of samples in each domain

Fault size/inch   Status             Label   Sample quantity   Symbol representation
-                 Normal             0       200               NO
0.007             Inner ring fault   1       200               IF07
0.007             Roller fault       2       200               BF07
0.007             Outer ring fault   3       200               OF07
0.014             Inner ring fault   4       200               IF14
0.014             Roller fault       5       200               BF14
0.014             Outer ring fault   6       200               OF14
0.021             Inner ring fault   7       200               IF21
0.021             Roller fault       8       200               BF21
0.021             Outer ring fault   9       200               OF21

[0065] A diagnosis task under variable working conditions is represented by the symbol A hp → B hp. The source domain is the A hp data set, and all of its samples and their labels participate in the training process. B hp represents the target domain, and the actual labels of samples in the target domain do not participate in training and are only used in the verification process.
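The segmentation in Step 1.2 and the A hp → B hp task notation can be illustrated with a minimal sketch. The sample length N = 1024, the non-overlapping windows, and the function name `segment_signal` are assumptions made for illustration; the source does not state them. The random record is a synthetic stand-in, not bearing data.

```python
import numpy as np
from itertools import permutations

def segment_signal(signal, n_points, max_samples=None):
    """Cut a 1-D vibration record into non-overlapping samples of n_points each."""
    signal = np.asarray(signal, dtype=float)
    n_samples = len(signal) // n_points          # the incomplete tail is dropped
    if max_samples is not None:
        n_samples = min(n_samples, max_samples)
    return signal[: n_samples * n_points].reshape(n_samples, n_points)

# Synthetic stand-in for one recorded vibration signal (assumed data).
rng = np.random.default_rng(0)
record = rng.standard_normal(10_000)
samples = segment_signal(record, n_points=1024)
print(samples.shape)  # (9, 1024)

# Four load domains give 4 * 3 = 12 ordered source -> target pairs.
loads = [0, 1, 2, 3]
tasks = [(a, b) for a, b in permutations(loads, 2)]
print(len(tasks))  # 12
```

Enumerating ordered pairs of the four load domains yields 12 tasks, which is consistent with the 12 variable working conditions evaluated in this embodiment.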

[0066] Further, Step 2 specifically includes the following steps:

[0067] performing the FFT on the labeled samples in the source domain and the unlabeled samples in the target domain, and converting a time domain signal into a frequency domain signal, where an FFT formula of a decimation-in-time algorithm is shown as follows:

$$X(k)=\sum_{n=0}^{N-1}x(n)W_N^{nk},\quad 0\le k\le N-1,\quad W_N=e^{-j\frac{2\pi}{N}}, \tag{1}$$

$$\begin{cases}X(k)=X_1(k)+W_N^{k}X_2(k)\\[2pt] X\left(k+\frac{N}{2}\right)=X_1(k)-W_N^{k}X_2(k)\end{cases},\quad k=0,1,2,\ldots,\frac{N}{2}-1,$$

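[0068] where x(n) represents the value of the n-th sampling point in an original sample time sequence, X(k) represents the k-th value in a spectrum graph, and X_1(k) and X_2(k) are the N/2-point discrete Fourier transforms of the even-indexed and odd-indexed sampling points, respectively. A frequency domain signal of each type of sample is shown in FIG. 6.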
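The decimation-in-time recursion above can be checked numerically with a short sketch. The helper names `fft_dit` and `to_spectrum` are hypothetical, and taking the magnitude and keeping only the first half of the spectrum is an assumption; the source only states that the FFT is applied.

```python
import numpy as np

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x
    if n % 2:
        raise ValueError("length must be a power of two")
    x1 = fft_dit(x[0::2])                              # X_1(k): even-indexed points
    x2 = fft_dit(x[1::2])                              # X_2(k): odd-indexed points
    w = np.exp(-2j * np.pi * np.arange(n // 2) / n)    # twiddle factors W_N^k
    return np.concatenate([x1 + w * x2, x1 - w * x2])  # butterfly step

def to_spectrum(sample):
    """Amplitude spectrum of one sample; keeping one half is an assumption."""
    spectrum = np.abs(fft_dit(sample))
    return spectrum[: len(sample) // 2]

rng = np.random.default_rng(0)
sample = rng.standard_normal(1024)                     # synthetic stand-in sample
assert np.allclose(fft_dit(sample), np.fft.fft(sample))
print(to_spectrum(sample).shape)  # (512,)
```

In practice a library routine such as `numpy.fft.fft` would normally be used; the recursive version is shown only because it mirrors the two butterfly equations term by term.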

[0069] Further, Step 3 specifically includes the following steps:

[0070] A deep learning model established in this embodiment is shown in FIG. 2, and a fault diagnosis procedure is shown in FIG. 5, including:

[0071] Step 3.1: for a detailed structure of ResNet-50 shown in FIG. 3 and FIG. 4, modifying the structure of ResNet-50, and removing the last two layers of the model: a global average pooling layer and a fully connected layer used for classification, where the deep intra-class adaptation convolutional neural network model extracts the low-level features of the samples by using the improved ResNet-50, and a process of the extraction is as follows:


g(x)=f(x)   (2),

[0072] where x represents a sample in a frequency domain after the FFT, f(⋅) represents the modified ResNet-50, and g(x) represents the low-level features extracted from the samples by using the improved ResNet-50.

[0073] Step 3.2: further analyzing, by a plurality of substructures of the multi-scale feature extractor, the low-level features at the same time to obtain the high-level features as an input of the softmax classifier, where a process of extracting the high-level features is represented as follows:


$$g(x)=[g_0(x),g_1(x),\ldots,g_{n-1}(x)], \tag{3}$$

[0074] where $g_i(x)$ is an output of one substructure, $i\in\{0,1,2,\ldots,n-1\}$, and n is a total quantity of substructures in the feature extractor; and the softmax function is represented as follows:

$$q_i=\frac{e^{v_i}}{\sum_{k=0}^{C-1}e^{v_k}}, \tag{4}$$

[0075] where $q_i$ represents the probability that a sample belongs to the label i, C is the total quantity of label classes, and $v_i$ is the value of the i-th position of an input of the softmax function; and

[0076] Step 3.3: calculating the conditional distribution distance between the high-level features in the source domain and the target domain, where a formula for the conditional distribution distance between features extracted by one substructure of the multi-scale feature extractor is as follows:

$$d_H(X_s,X_t)=\frac{1}{C}\sum_{c=0}^{C-1}\left\|\frac{1}{n_s^{(c)}}\sum_{i=0}^{n_s^{(c)}-1}\Phi\left(x_i^{s(c)}\right)-\frac{1}{n_t^{(c)}}\sum_{j=0}^{n_t^{(c)}-1}\Phi\left(x_j^{t(c)}\right)\right\|_H^2, \tag{5}$$

[0077] where H represents the reproducing kernel Hilbert space, and Φ(⋅) represents a function of feature space mapping; $x_i^{s(c)}$ represents the i-th sample in samples with a label of c in the source domain, $n_s^{(c)}$ is equal to the quantity of all samples with the label of c in the source domain, $x_j^{t(c)}$ represents the j-th sample in samples with a pseudo label of c in the target domain, and $n_t^{(c)}$ is equal to a quantity of all samples with the pseudo label of c in the target domain; the foregoing expression is used for estimating a difference between intra-class conditional distributions $P_s(x_s|y_s=c)$ and $P_t(x_t|y_t=c)$; the conditional distribution difference between the source domain and the target domain can be reduced by minimizing the foregoing expression; and because the high-level features are extracted by the plurality of substructures at the same time, a total conditional distribution distance is as follows:

$$d(X_s,X_t)=\sum_{i=0}^{n-1}d_H\left(g_i(X_s),g_i(X_t)\right), \tag{6}$$

[0078] where $g_i(x)$ is the output of one substructure, $i\in\{0,1,2,\ldots,n-1\}$, and n is the total quantity of substructures in the feature extractor.
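The pseudo-label based distance of Step 3.3 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the invention: Φ is taken as the identity (a linear kernel), so that the distance reduces to the squared distance between class means; classes that are empty in either domain are skipped; and the features and logits are random stand-ins. The names `conditional_distance` and `total_distance` are hypothetical.

```python
import numpy as np

def softmax(v):
    """Softmax along the last axis; the max is subtracted for stability."""
    e = np.exp(v - v.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def conditional_distance(feat_s, y_s, feat_t, pseudo_t, n_classes):
    """Per-substructure distance with a linear kernel (assumed Phi = identity)."""
    total = 0.0
    for c in range(n_classes):
        src, tgt = feat_s[y_s == c], feat_t[pseudo_t == c]
        if len(src) == 0 or len(tgt) == 0:
            continue                       # assumption: absent classes are skipped
        diff = src.mean(axis=0) - tgt.mean(axis=0)
        total += float(diff @ diff)        # squared norm of the mean difference
    return total / n_classes

def total_distance(feats_s, y_s, feats_t, pseudo_t, n_classes):
    """Sum of the per-substructure distances over all substructure outputs."""
    return sum(conditional_distance(fs, y_s, ft, pseudo_t, n_classes)
               for fs, ft in zip(feats_s, feats_t))

rng = np.random.default_rng(0)
C, n_sub = 3, 2
y_s = np.repeat(np.arange(C), 20)
# Hypothetical high-level features: one (n, d) array per substructure.
feats_s = [rng.standard_normal((60, 8)) + y_s[:, None] for _ in range(n_sub)]
feats_t = [f + 0.5 for f in feats_s]           # shifted copy used as "target"
logits_t = rng.standard_normal((60, C))        # stand-in classifier output
pseudo_t = softmax(logits_t).argmax(axis=1)    # pseudo labels from predictions
print(total_distance(feats_s, y_s, feats_t, pseudo_t, C))
```

With random logits the pseudo labels are poor, so the printed value is only illustrative; as training proceeds, the pseudo labels are expected to approach the actual labels and the class-wise alignment becomes meaningful.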

[0079] Further, Step 4 specifically includes the following steps:

[0080] Step 4.1: calculating classification loss of samples in the source domain, where a process of the calculation is shown by the following formula:

$$\mathrm{loss}_{\mathrm{classifier}}(y,X)=\frac{1}{n}\sum_{i=0}^{n-1}J\left(y_i,F(x_i)\right), \tag{7}$$

[0081] where X represents a set of all samples in the source domain, and y represents a set of actual labels of all the samples in the source domain; n is the quantity of samples that participate in training, $y_i$ is the actual label of the i-th sample, and $F(x_i)$ is the predicted result of the i-th sample by a neural network; and J(⋅,⋅) represents the cross entropy loss function, and is defined as follows:

$$J(p,q)=-\sum_{i=0}^{C-1}p_i\log(q_i), \tag{8}$$

[0082] where when i is the actual label of the sample, $p_i$ is equal to 1, or otherwise, $p_i$ is equal to 0; $q_i$ is a probability outputted after the softmax activation function; and C is the total quantity of label classes; and

[0083] Step 4.2: combining the conditional distribution distance and the classification loss of the samples in the source domain to form a multi-scale high-level feature alignment target function to be optimized, where a formula of the target function is as follows:

$$l_{\mathrm{total}}=\min_F\; l_{\mathrm{classifier}}(y_s,X_s)+\lambda\sum_{i=0}^{n_{\mathrm{sub}}-1}d_H\left(g_i(X_s),g_i(X_t)\right)=\min_F\;\frac{1}{n_s}\sum_{i=0}^{n_s-1}J\left(y_i^s,F(x_i^s)\right)+\lambda\sum_{i=0}^{n_{\mathrm{sub}}-1}d_H\left(g_i(X_s),g_i(X_t)\right), \tag{9}$$

[0084] where F(⋅) represents a model output function, $g_i(\cdot)$ represents an output of one substructure in the multi-scale feature extractor, J(⋅,⋅) represents the cross entropy loss function, λ>0 is a hyperparameter, $n_{\mathrm{sub}}$ is equal to a quantity of substructures in the multi-scale feature extractor, and $d_H(\cdot,\cdot)$ is the conditional distribution distance; the foregoing expression may be used to enable the network F(⋅) obtained from training to accurately predict a label of a sample from the target domain; and the hyperparameter λ in the foregoing expression is set as follows:

$$\lambda=\frac{2}{1+e^{-10\cdot\frac{\mathrm{epoch}}{\mathrm{epochs}}}}-1, \tag{10}$$

[0085] where epochs is the total quantity of training epochs, and epoch is the current training epoch.
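The classification loss, the schedule for λ, and their combination into the target function can be sketched numerically as below. The function names and the example probabilities are illustrative assumptions; the small constant `eps` is added only to avoid taking the logarithm of zero and is not part of the formulas above.

```python
import numpy as np

def cross_entropy(labels, probs, eps=1e-12):
    """Mean of -log q_y over the batch, i.e. cross entropy with one-hot p."""
    probs = np.asarray(probs, dtype=float)
    picked = probs[np.arange(len(labels)), labels]   # q at the actual label
    return float(-np.mean(np.log(picked + eps)))

def lambda_schedule(epoch, epochs):
    """Weight that grows from 0 towards 1 as training progresses."""
    return 2.0 / (1.0 + np.exp(-10.0 * epoch / epochs)) - 1.0

def total_loss(cls_loss, distances, epoch, epochs):
    """Classification loss plus the weighted sum of per-substructure distances."""
    return cls_loss + lambda_schedule(epoch, epochs) * sum(distances)

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # assumed softmax outputs
labels = np.array([0, 1])
cls = cross_entropy(labels, probs)
for epoch in (0, 50, 100):
    print(epoch, round(float(lambda_schedule(epoch, 100)), 4))
print(total_loss(cls, [0.3, 0.2], epoch=50, epochs=100))
```

Under this schedule the adaptation term contributes nothing at epoch 0 and is weighted close to 1 late in training, so early updates are driven mainly by the source classification loss, when the pseudo labels are least reliable.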

[0086] Step 4.3: minimizing the multi-scale high-level feature alignment target function by using the SGD method, updating the parameters of the model, and training the model:

$$\theta_i\leftarrow\theta_i-\alpha\frac{\partial l(\theta)}{\partial\theta_i}, \tag{11}$$

[0087] where θ is all the parameters of the model, and $\theta_i$ represents the i-th parameter; and l(θ) represents a target function related to the parameter θ, and α is a learning rate, that is, a step size.
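The update rule of Step 4.3 can be illustrated on a toy objective. The quadratic function, the learning rate of 0.1, and the name `sgd_step` are assumptions chosen for illustration; in the method itself the gradient would come from backpropagating the target function over mini-batches, which is where the stochasticity arises.

```python
import numpy as np

def sgd_step(theta, grad, lr):
    """One update: every parameter moves against its gradient by lr * grad."""
    return theta - lr * grad

# Toy objective l(theta) = ||theta - target||^2 with gradient 2 * (theta - target).
target = np.array([1.0, -2.0])
theta = np.zeros(2)
for _ in range(100):
    grad = 2.0 * (theta - target)
    theta = sgd_step(theta, grad, lr=0.1)
print(theta)   # close to target after 100 steps
```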

[0088] Further, Step 5 specifically includes the following steps:

[0089] feeding the unlabeled samples in the target domain into a trained multi-scale deep intra-class adaptation convolutional neural network model to obtain predicted labels of all samples in the target domain, and performing comparison with labels that are manually marked in advance but do not participate in the training process, to obtain diagnosis accuracy to verify the high quality of the model, where a formula for calculating the diagnosis accuracy is as follows:

$$\mathrm{acc}=\frac{1}{n_t}\sum_{i=0}^{n_t-1}\mathrm{sign}\left(F(x_i^t)=y_i^t\right), \tag{12}$$

[0090] where sign(⋅) represents an indicator function that is equal to 1 when its argument holds and 0 otherwise, $y_i^t$ is the actual label of the i-th sample in the target domain, $F(x_i^t)$ is the predicted result of the i-th sample in the target domain by the model, and $n_t$ is the total quantity of samples in the target domain. Diagnosis results under 12 variable working conditions are shown in FIG. 7. As can be seen from the diagnosis results, the average accuracy of diagnosis tasks under the 12 variable working conditions reaches 99.10%, and the standard deviation is 0.0080. This indicates that the present invention achieves relatively high diagnosis accuracy, greatly improves the universality and generalization of the deep learning model, and adequately handles the impact of domain shift on conventional deep-learning-based fault diagnosis under variable working conditions.
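The diagnosis accuracy defined above amounts to the fraction of target-domain samples whose predicted label equals the actual label. A minimal sketch, with made-up labels and the hypothetical name `accuracy`:

```python
import numpy as np

def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the actual label."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.mean(predicted == actual))

pred = np.array([0, 1, 2, 2, 1])   # assumed model predictions
true = np.array([0, 1, 2, 1, 1])   # assumed held-out target labels
print(accuracy(pred, true))  # 0.8
```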

[0091] In summary, a multi-scale convolutional intra-class adaptation fault diagnosis model is designed based on a convolutional neural network and a transfer learning algorithm in the present invention. Compared with conventional deep learning methods, the present invention can better mitigate adverse impact of domain shift on a deep learning model, better conform to actual scenarios of industrial applications, and meet the requirement of fault diagnosis under variable working conditions.

[0092] The foregoing descriptions are only preferred implementations of the present invention, but are not used to limit the present invention. It should be noted that for a person of ordinary skill in the art, several improvements and variations may further be made without departing from the technical principle of the present invention. These improvements and variations should also be deemed as falling within the protection scope of the present invention.