A Robust Deep Learning-Based Fault Diagnosis Method for Rotating Machinery

In recent years, intelligent data-driven fault diagnosis methods for gearboxes have been successfully developed and widely applied in industry. Currently, most machine learning techniques require that the training and testing data come from the same distribution. However, this assumption is difficult to meet in real industrial settings, since gearbox operating conditions usually change in practice, which results in a significant data distribution gap and deteriorated diagnostic performance when the learned knowledge is applied to new conditions. This paper proposes a deep learning-based domain adaptation method to address this issue. The raw current signals, which are easy to collect in real industrial settings and facilitate practical applications, are used directly as the model inputs for diagnostics. The maximum mean discrepancy metric is introduced into the deep neural network, the optimization of which guarantees the extraction of generalized machinery health condition features across different operating conditions. Experiments on a real-world gearbox condition monitoring dataset validate the effectiveness of the proposed method, which offers a promising tool for cross-domain diagnosis in real industrial settings.


I. INTRODUCTION
In the past decades, rotating machines have been widely used in a large number of industries, such as manufacturing, aerospace, and automotive. The gearbox is one of the key components in rotating machines for delivering torque and providing speed conversion. Effective and timely fault diagnosis of gearboxes is of great importance in industry, since it can optimize maintenance schedules, enhance operational safety, and reduce economic costs [1]. Traditionally, many model-based signal processing methods have been used for the fault signal analysis of gearboxes [2]. While effective diagnosis results have been obtained, the model-based approaches generally rely on good expert knowledge and require much human labor for model development. Therefore, they are less efficient for applications in real industrial scenarios. Moreover, the smart manufacturing initiative has established a consistent method for data access across different enterprises, helping predictive manufacturing and data-driven fault diagnosis to advance at a rapid pace [3,4]. In general, data-driven methods can achieve high diagnosis accuracy and fast implementation [5]. Furthermore, little prior expertise in signal processing and gearbox dynamics modeling is generally required, which largely facilitates industrial applications. In the literature, the popular data-driven methods include artificial neural networks (ANN), random forests (RF), support vector machines (SVM), and so forth. Recently, deep learning has been emerging as a highly effective approach for data processing, which is promising for further improving the performance of the existing data-driven approaches [6].
Basically, deep learning methods are capable of efficiently capturing the underlying relationship between input and output data through multiple linear and non-linear data transformations [7]. Specifically, with respect to fault diagnosis problems, the machinery health states can be well predicted using the collected condition monitoring data, despite the high dimensionality of the signals [8,9]. The authors of [10] proposed using a convolutional neural network (CNN) for gearbox fault diagnosis and achieved significantly better classification accuracy than the classical machine learning methods. A fault diagnosis method for wind turbine gearboxes based on a stacked auto-encoder and a multiclass SVM was proposed in [11]. A deep belief network fault diagnosis method based on manually extracted time- and frequency-domain features was proposed in [3] for gearbox and bearing applications. These studies emphasize the significant improvement in gearbox fault diagnosis performance obtained by deep learning-based methods compared with the conventional data-driven methodologies. It should be pointed out that, while promising diagnosis performance has been obtained using deep learning, the main assumption is that the training and testing data come from the same distribution. That means the labeled training data and unlabeled testing data should be collected under similar gearbox operating conditions. However, working conditions such as load and rotating speed usually change across different practical industrial tasks.
That results in a significant distribution discrepancy between training and testing data, which deteriorates the generalization performance of the data-driven model [12]. In order to address this problem, transfer learning algorithms have been proposed in recent years [...]. The authors of [26] stated that the current signal of the induction motor driving the gearbox is useful for fault diagnosis investigations, and that motor current signature analysis (MCSA) can be largely improved using their proposed demodulation method. The effectiveness of MCSA in rotating machinery fault diagnosis problems was also validated in [27,28]. Therefore, it is feasible and promising to explore the current signals, which are easy to collect in real industrial settings, for gearbox health identification. However, it should be pointed out that the existing methods are mostly complicated and require sophisticated domain knowledge of gearbox modeling and signal processing, which makes them difficult to implement in different applications. This paper proposes a deep learning-based domain adaptation method for gearbox fault diagnosis. An end-to-end diagnostic framework is built, which takes the raw collected data as input and directly outputs the results. The current signals are investigated in this study, which are generally easier to collect than the popular vibration data in real industrial scenarios. The maximum mean discrepancy metric is introduced to measure and minimize the data distribution distance between different domains, so that generalized diagnostic features of the different machinery health conditions can be extracted. Experiments on real-world gearbox datasets are carried out for validation, and the proposed method is capable of effectively diagnosing gearbox faults across different operating scenarios. The remainder of this paper starts with the preliminaries in Section II. The proposed fault diagnosis method is presented in Section III, and experimentally validated and investigated in Section IV.
We close the paper with conclusions in Section V.

II. PRELIMINARIES

A. Deep Convolutional Neural Network
In the past years, deep learning, also denoted as deep neural networks, has achieved great success in different applications. Compared with the basic multi-layer perceptron (MLP) structure, the convolutional neural network (CNN) architecture is more efficient at feature extraction, and high-dimensional machinery data can be processed well [7]. Basically, multiple convolutional layers are stacked in the CNN structure to model the relationship between input and output. Specifically, the one-dimensional CNN is adopted in this study, which is well suited to processing the measurement signals of gearboxes. Together with the convolutional operations, pooling is usually implemented after the convolutional layers. The average-pooling and max-pooling operations are widely adopted, which compute the average and maximum values of the local data, respectively. In this way, the most significant features can be extracted and the data dimensions can be reduced, which increases the computing efficiency of deep learning. By exploiting the convolutional and pooling operations, high-level features can be obtained from the raw data and then used for the final task, i.e., machinery fault diagnosis. Readers are referred to [7,29] for more details on CNNs.
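To make the convolution and pooling operations described above concrete, the following is a minimal pure-Python sketch of one 1-D convolution followed by max- and average-pooling. The function names, the ReLU activation, the toy signal, and the filter values are illustrative assumptions and are not taken from the model used in this paper.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D cross-correlation, which is what CNN layers call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Element-wise rectified linear activation (an assumed choice)."""
    return [max(0.0, v) for v in values]


def pool1d(values: Sequence[float], size: int, mode: str = "max") -> List[float]:
    """Non-overlapping pooling; a trailing partial window is dropped."""
    reduce = max if mode == "max" else (lambda w: sum(w) / len(w))
    return [reduce(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy periodic signal and a hypothetical difference filter.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
kernel = [1.0, 0.0, -1.0]

features = relu(conv1d(signal, kernel))
print(pool1d(features, 2, "max"))  # [2.0, 0.0, 2.0]
print(pool1d(features, 2, "avg"))  # [1.0, 0.0, 1.0]
```

Max-pooling keeps the strongest local response, while average-pooling smooths it; both halve the feature length in this example, which is the dimension reduction mentioned above.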

B. Domain Adaptation
To bridge the gap between different data distributions in machine learning, transfer learning techniques have been successfully developed and widely used in applications [30]. Specifically, the domain adaptation method in transfer learning has been receiving increasing attention in fault diagnosis studies, since the machinery health condition label spaces are usually identical. In general, domain adaptation approaches aim to learn domain-invariant features from different conditions, which helps the fault diagnostic knowledge generalize across different cases [31]. In this paper, the maximum mean discrepancy (MMD) metric is adopted, which measures the distance between the distributions of the source and target domains. The optimization of MMD is able to achieve domain fusion in the high-level representation sub-space of deep neural networks, and thus extract generalized features for diagnosis [15]. The MMD metric is defined as the squared distance between the kernel embeddings of the data marginal distributions in the reproducing kernel Hilbert space (RKHS) as

$$M_k(P, Q) = \left\| \mathbb{E}_{P}\left[\phi(x^s)\right] - \mathbb{E}_{Q}\left[\phi(x^t)\right] \right\|^2_{\mathcal{H}_k} \quad (1)$$

where $P$ and $Q$ denote the marginal distributions of the source-domain data $x^s$ and the target-domain data $x^t$, $\phi$ is the feature map associated with the kernel, and $\mathcal{H}_k$ denotes the RKHS endowed with the characteristic kernel $k$. Based on the current understanding of MMD [32], kernel choice is one of the key factors in domain adaptation, since different kernels embed the probability distributions in different RKHSs, where different orders of the statistics are explored. Therefore, multiple kernels are employed in the MMD in this paper to leverage different kernels and achieve improved performance. In the implementations, $N_k$ RBF kernels are used as [33],

$$k(x^s, x^t) = \sum_{i=1}^{N_k} k_{\sigma_i}(x^s, x^t) \quad (2)$$

where $k_{\sigma_i}$ denotes a Gaussian kernel with bandwidth coefficient $\sigma_i$. In this study, three kernels are adopted, and the bandwidth parameters are selected as 2, 4 and 8.
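As an illustration of the multi-kernel MMD described above, the sketch below computes the standard biased empirical estimate between two small sets of feature vectors, using the sum of three Gaussian kernels with bandwidths 2, 4 and 8. The kernel parameterization exp(-||x - y||^2 / (2 sigma^2)), the function names, and the toy feature vectors are assumptions made for this example; the paper does not state which parameterization or estimator it uses.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian (RBF) kernel; the 2*sigma^2 scaling is an assumed convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def multi_kernel(x: Vector, y: Vector, sigmas: Sequence[float]) -> float:
    """Sum of Gaussian kernels over all bandwidths."""
    return sum(gaussian_kernel(x, y, s) for s in sigmas)


def mmd2(source: Sequence[Vector], target: Sequence[Vector],
         sigmas: Sequence[float] = (2.0, 4.0, 8.0)) -> float:
    """Biased empirical estimate of the squared MMD between two sample sets."""
    def mean_k(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        return sum(multi_kernel(x, y, sigmas) for x in a for y in b) / (len(a) * len(b))

    return mean_k(source, source) + mean_k(target, target) - 2.0 * mean_k(source, target)


# Hypothetical 2-D feature vectors standing in for high-level representations.
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target_near = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]]
target_far = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]

print(mmd2(source, target_near))  # small: the two sets nearly coincide
print(mmd2(source, target_far))   # larger: the two sets are far apart
```

In a deep learning framework the same quantity would be computed on mini-batches of network activations so that it can be minimized by gradient descent.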

III. PROPOSED FAULT DIAGNOSIS METHOD
The proposed method is described in Figure 1 and consists of four individual steps. For each step, the key functionalities are presented and discussed in detail.

A. Data Partitioning
In the first phase, the raw time-domain sensor data collected from a gearbox are partitioned into two sets: (a) source-domain data (labeled) and (b) target-domain data (unlabeled). The target-domain data are further partitioned into training and testing sets, where one unlabeled subset is used in training the CNN model and the other is used for testing the trained model.

B. Data Modeling
There are two major steps for modeling the data prior to training the diagnosis model, which are presented as follows.

1) Data augmentation
In order to increase the number of training samples, a windowing method is used. As depicted in Figure 2, a window with a fixed sample size moves over a time-series signal and generates multiple samples. For example, a signal with 1,000,000 points provides 191 training samples of length 50,000 when the shift size is 5,000 points.

2) Fast Fourier Transform (FFT)
In order to eliminate the impact of the supply line frequency, the FFT is applied to each sample generated from the augmentation process. It is expected that fault signatures appear as sidebands around the supply line frequency (or running frequency) in the FFT spectrum [34]. All samples after the FFT are directly used in the deep learning model for feature learning and fault diagnosis.
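The two data modeling steps can be sketched as follows. The window count reproduces the example above: (1,000,000 - 50,000) / 5,000 + 1 = 191. The function names are hypothetical, and the naive discrete Fourier transform is used only to keep the sketch dependency-free; in practice a library FFT such as `numpy.fft.rfft` would be used instead.

```python
import cmath
import math
from typing import List, Sequence


def num_windows(n_points: int, window: int, shift: int) -> int:
    """Number of full windows obtained by sliding over n_points samples."""
    return (n_points - window) // shift + 1


def sliding_windows(signal: Sequence[float], window: int, shift: int) -> List[Sequence[float]]:
    """Overlapping fixed-length segments of the signal (data augmentation)."""
    return [signal[s:s + window] for s in range(0, len(signal) - window + 1, shift)]


def dft_magnitude(sample: Sequence[float]) -> List[float]:
    """One-sided amplitude spectrum via a naive O(n^2) DFT, normalized by n."""
    n = len(sample)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(sample))) / n
        for k in range(n // 2 + 1)
    ]


# Window count for the example given in the text.
print(num_windows(1_000_000, 50_000, 5_000))  # 191

# Small demonstration: a cosine with 4 cycles per 32-point window.
signal = [math.cos(2 * math.pi * 4 * t / 32) for t in range(64)]
samples = sliding_windows(signal, window=32, shift=16)
spectrum = dft_magnitude(samples[0])
print(len(samples), spectrum.index(max(spectrum)))  # 3 windows, peak at bin 4
```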

C. Deep Learning Model Formulation
For the network optimization, two terms are included in the objective, i.e., the source-domain classification loss and the domain discrepancy loss. First, following the typical machine learning paradigm, the empirical health condition identification error on the source domain should be minimized, and the cross-entropy loss function $L_s$ is adopted in this study, which is defined as

$$L_s = -\frac{1}{n_s}\sum_{i=1}^{n_s}\sum_{j=1}^{N_c} \mathbb{1}\{y_i = j\}\,\log\frac{e^{x^s_{i,j}}}{\sum_{m=1}^{N_c} e^{x^s_{i,m}}} \quad (3)$$

where $n_s$ denotes the number of source-domain training samples, $x^s_{i,j}$ is the $j$th element of the network output vector when the $i$th labeled source-domain sample is taken as input, $y_i$ is the label of the $i$th source-domain sample, and $N_c$ represents the number of machinery health conditions concerned. Besides the basic supervised learning part, the discrepancy between the source and target domains should be minimized, and the MMD metric is adopted to measure and optimize the domain gap in this study, as described in Section II-B. Specifically, the MMD loss $L_d$ is defined as

$$L_d = M_k(P_S, P_T) \quad (4)$$

where $M_k$ is the multi-kernel MMD of Section II-B, and $P_S$ and $P_T$ denote the distributions of the high-level representations of the source- and target-domain data, respectively, in the last fully-connected layer of the network. In summary, the losses in Equations (3) and (4) are combined to form the final optimization objective $L_{opt}$. The unlabeled testing target-domain data are then used for fault diagnosis, and the performance of the proposed method is reported.

Fig. 3: The experimental setup of the test rig [35]
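The two-term objective described in Section III-C can be sketched as below: a softmax cross-entropy over the network outputs of labeled source samples, plus a domain discrepancy value. How the paper weights the two terms is not recoverable from the text, so the trade-off weight `lam` and the plain weighted sum are assumptions, as are the function names; `mmd_value` stands for an MMD estimate computed elsewhere.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over one network output vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(outputs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-probability of the true class over source samples."""
    return -sum(math.log(softmax(row)[y]) for row, y in zip(outputs, labels)) / len(labels)


def total_loss(outputs: Sequence[Sequence[float]], labels: Sequence[int],
               mmd_value: float, lam: float = 1.0) -> float:
    """Classification loss plus weighted domain discrepancy.

    lam is a hypothetical trade-off weight, not a value from the paper.
    """
    return cross_entropy(outputs, labels) + lam * mmd_value


# Two source samples, five health conditions (0-based labels), made-up logits.
outputs = [[4.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]]
labels = [0, 2]
print(cross_entropy(outputs, labels))
print(total_loss(outputs, labels, mmd_value=0.3, lam=1.0))
```

With uninformative (all-equal) outputs the cross-entropy equals log(5) for five classes, which is a convenient sanity check when wiring up such a loss.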

A. Test Rig
A validation study has been conducted on a dataset acquired from a gearbox prognostic simulator (GPS) built by the Spectra Quest Company [35], as shown in Figure 3. Two opposing electric motors are used in the test rig: one serves as the drive and the other provides the resistance/load. Both motors are three-phase induction motors rated at 10 hp with two pole pairs. A current sensor (HTA 100) was installed on the drive motor, and its signal was used in our analysis for fault diagnosis. The data were recorded using a computer with a National Instruments acquisition card (NI 4472 series) at a sampling rate of 50 kS/s. The monitored gearbox is composed of four spur gears (Figure 4). The first gear, which is driven by the motor that drives the test bench, has 32 teeth. It is the one substituted by gears in different health states, while the rest are left unchanged. It is followed by a gear with 80 teeth. On the same axle, a gear with 48 teeth is connected to a gear with 64 teeth, resulting in an overall transmission ratio of 3.33.
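The reported transmission ratio can be checked from the tooth counts: (80/32) × (64/48) = 10/3 ≈ 3.33. The short sketch below reproduces this and, under the assumption that the 1500 rpm operating speed used in the experiments is the speed of the input (32-tooth) shaft, also derives the shaft rotation and gear mesh frequencies. These derived frequencies are illustrative and are not values stated in the paper.

```python
from fractions import Fraction

# Tooth counts from the test-rig description: (driving gear, driven gear) per stage.
stages = [(32, 80), (48, 64)]


def overall_ratio(stages):
    """Product of driven/driving tooth ratios over all stages."""
    ratio = Fraction(1)
    for driving, driven in stages:
        ratio *= Fraction(driven, driving)
    return ratio


def shaft_speeds_hz(input_rpm, stages):
    """Rotation frequency of each shaft, starting from the input shaft."""
    speeds = [Fraction(input_rpm, 60)]
    for driving, driven in stages:
        speeds.append(speeds[-1] * Fraction(driving, driven))
    return speeds


ratio = overall_ratio(stages)
print(ratio, float(ratio))  # 10/3, about 3.33 as reported

# Assumption: 1500 rpm is the input-shaft speed.
speeds = shaft_speeds_hz(1500, stages)
mesh = [speeds[i] * stages[i][0] for i in range(len(stages))]
print([float(s) for s in speeds])  # shaft frequencies in Hz
print([float(m) for m in mesh])    # gear mesh frequencies in Hz
```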
In this study, the torque load applied to the gearbox was gradually increased from 0% to 40%, 80%, and 100%. At each load, the operational speed was kept constant at 1500 rpm, and each run was repeated 15 times to reduce the impact of randomness and uncertainties. Table I summarizes the experimental studies, and a comparison between the raw motor current measurements and the corresponding FFT spectra for different loads in the healthy condition is given in Figure 5. As the load increases, the amplitudes of the raw current signal and of the FFT spectrum increase significantly. Figure 6 shows the five health conditions examined in this paper, and the impact of FFT analysis in distinguishing different faults at 0% load is given in Figure 7. As shown, the raw motor current measurements do not show significant differences between different health conditions; however, the conditions are clearly distinguishable in the FFT spectrum. The proposed method is tested on six transfer tasks, i.e., T1−2, T1−3, T1−4, T4−3, T4−2, and T4−1, where Ta−b denotes the transfer from experiment #a (source) to experiment #b (target). A summary of the data segmentation for the different tasks is given in Table II. Nsource and Ntarget represent the number of samples from each class of the source- and target-domain datasets, respectively. All experiments are performed on a PC with 16 GB of RAM, a Core i5 CPU, and an NVIDIA GeForce RTX 2080 Ti. The programming is done in TensorFlow, and GPU computing is used to reduce the model training time.

B. Model Architecture Design
As shown in Figure 8, the first step is to design a CNN architecture and tune the network parameters. In this study, a stack of four convolutional and pooling layers and a max-pooling layer are used for model training. The impact of the filter size (Fs) and filter number (Nf) on the cross-domain diagnosis performance for task T1−2 is shown in Figure 9. Generally, larger values of Nf and Fs lead to higher diagnosis accuracy, but the improvement is relatively limited. Moreover, increasing Nf and Fs significantly increases the training time. Therefore, Nf = Fs = 20 was selected for the final model. Batch size (Nb) is another tuning parameter that may significantly affect the diagnosis accuracy. For our dataset, a small batch size leads to the worst diagnosis results, while a too-large batch size creates a big cumulative descent in updating the parameters, especially when the MMD loss is integrated in the model, and therefore the prediction accuracy drops for too-large batch sizes. It is thus important to choose a reasonable trade-off value for Nb. Consequently, Nb = 64 was selected for the final diagnosis model. The confusion matrix corresponding to the final diagnosis results in task T1−2 is illustrated in Figure 10. It is observed that only the two classes 'eccentricity' and 'missing tooth' are slightly misclassified, and all other classes are precisely classified.

International Journal of Advanced Engineering Research and Science (IJAERS)
[Vol-7, Issue-7, Jul-2020]  https://dx.doi.org/10.22161/ijaers.77.1  ISSN: 2349-6495(P) | 2456-1908(O)

C. Results and comparison
In this section, different implementations are used to evaluate the performance of the proposed method and comparison with the latest related works is also presented.

1) Effects of training sample size
The performance of the final model in the different tasks, i.e., T1−2, T1−3, T1−4, T4−3, T4−2, and T4−1, for different source-domain sample sizes, Nsource, is illustrated in Figure 11. In this study, the number of target samples, Ntarget, is kept constant at 300. With increasing Nsource, the testing accuracy increases and the prediction uncertainty (measured by the standard deviation) reduces significantly. The proposed CNN-based domain adaptation method provides acceptable testing accuracy even with a small number of source training samples. As presented in Figure 11, the testing accuracy achieved in some tasks, such as T1−2 and T4−3, is higher than in other tasks. This observation is due to the nature of the data and the similarity between the distributions of the source and target domains. For instance, the load variation from experiment #1 to experiment #2 is 40%, which is smaller than that between experiment #1 and experiment #4 (i.e., 100%). Therefore, the transfer of learned features from experiment #1 to experiment #2 is easier. In addition, the high accuracies achieved in tasks from low to high operational loads and vice versa indicate that the proposed method performs well bidirectionally between different domains. The results for the different tasks also clearly illustrate the effectiveness of the motor current measurement signal for cross-domain fault diagnosis. As presented, increasing the number of training samples also improves the diagnosis performance, which follows the same pattern as the classical fault diagnosis methods.

2) Classification Results and Comparison
The performance of the proposed transfer learning methodology is compared with two groups of fault diagnosis tools, as summarized below:

Group A - Supervised classification methods such as:

1) LDA [36] - Linear Discriminant Analysis is a supervised algorithm that uses a linear transformation matrix to project features from parametric space to feature space.

2) SVM [37] - Support Vector Machines are supervised machine learning algorithms that can be employed for both regression and classification problems. SVMs are designed based on the Structural Risk Minimization criterion of statistical learning theory. SVMs work on a simple idea: to identify a hyperplane that separates the training data into two distinct classes.

3) CNN Without Domain Adaptation (No-DA) - A deep learning method that automatically extracts features from the raw signal measurement. A typical classification is obtained by only considering the classification loss in Equation (5). This trained model is directly used for testing on the target dataset.

4) BDA [41] - Balanced Distribution Adaptation aims to automatically balance the significance of the marginal and conditional distribution discrepancies, and therefore it can effectively adjust to a specific transfer task.

5) T-S [42-] - This method suggests performing adaptation by learning a target-specific network from the source-specific network.

Fig. 9: Impact of filter size and filter number on the testing accuracy for task T1-2.

In Group A, three classification methods are used to learn representative features from the training source data in a supervised process; the trained classifier is then applied to the target-domain data for testing, and the achieved results are reported. Hand-crafted time- and frequency-domain features such as standard deviation, mean, peak-to-peak value, kurtosis, frequency amplitude, and energy are used as inputs to the LDA and SVM methods. For No-DA, raw frequency-domain data is utilized. Because these methods inherently do not consider the domain variation between the source and target datasets, a low classification performance is expected. In Group B, the extracted time and frequency features are used for domain adaptation tasks, and the achieved results are compared with those of the proposed method. Analyses are conducted on 300 samples obtained from the source and target datasets, and the obtained diagnosis results on the testing (target-domain) data are visualized in Figure 12. In contrast with the other methods, the proposed approach provides the highest accuracies in all six transfer tasks, and the accuracies are generally higher than 91%, which illustrates the effectiveness of the proposed transfer learning approach. The average performance improvement of the proposed method is 57.46%, 55.68%, 39.3%, 36.62%, 35.87%, 34.5%, 26.67%, and 2.75% compared with LDA, SVM, GFK, JDA, TCA, BDA, No-DA, and T-S, respectively. The second-best performance is obtained by T-S, and No-DA is ranked third. Overall, the domain adaptation methods in Group B outperform the classical diagnosis methods in Group A, but they are not as promising as the proposed method.

The performance of the different diagnosis methods for a low number of training samples, e.g., Nsource = 60 and Ntarget = 300, is illustrated in Figure 13. As expected, using a low number of labeled data for training deteriorates the testing diagnosis accuracy of all evaluated methods. This observation is consistent with previous studies on deep learning methods, in which larger training datasets lead to better diagnosis performance, and transfer learning-based diagnosis methods also follow this pattern. Moreover, comparing the results obtained by the methods in Group A (without domain adaptation) with those obtained by the methods in Group B and by the proposed method shows the significant impact of cross-domain adaptation on fault diagnosis performance. T-S, which provides an alternative way of performing domain adaptation, shows good performance with a large training sample size. However, with a low sample size, its performance deteriorates significantly because this method minimizes the distribution discrepancy between the target dataset and the learned representations from the source training network. The achieved results illustrate the effectiveness of the motor current signal for cross-domain fault diagnosis.
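For reference, the time-domain part of the hand-crafted features listed above for the LDA and SVM baselines (mean, standard deviation, peak-to-peak value, kurtosis, and energy) can be computed as in the following sketch. The exact definitions used in the paper are not given, so the population standard deviation, the non-excess kurtosis, and the sum-of-squares energy are assumed conventions, and the function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def time_domain_features(x: Sequence[float]) -> Dict[str, float]:
    """Basic statistical features of one signal segment."""
    mean = statistics.fmean(x)
    std = statistics.pstdev(x)  # population standard deviation (assumed)
    fourth_moment = sum((v - mean) ** 4 for v in x) / len(x)
    # Non-excess kurtosis: fourth central moment divided by std^4.
    kurtosis = fourth_moment / std ** 4 if std > 0 else 0.0
    return {
        "mean": mean,
        "std": std,
        "peak_to_peak": max(x) - min(x),
        "kurtosis": kurtosis,
        "energy": sum(v * v for v in x),
    }


# Toy segment: an alternating +/-1 sequence.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(features)
```

Frequency-domain features such as spectral amplitudes would be computed in the same way on the FFT spectrum of each segment.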

3) Visualization of the learned features
In order to illustrate the effectiveness of our approach, the t-distributed Stochastic Neighbor Embedding (t-SNE) technique is adopted to visualize the high-level feature representations by mapping them from the original feature space into a 2-D space. The visualization is performed on task T1−2 for the proposed method and also for the CNN without domain adaptation (No-DA method).

Figure 14 illustrates the visualization of the learned features in the fully connected layer of the source-domain classifier without domain adaptation. As observed, without domain adaptation, samples from the same class within the source or the target data cluster together. However, for some labels there is a notable distribution discrepancy between the source- and target-domain samples. Since the feature space is divided into several regions associated with different labels, a low diagnosis performance is expected on the target-domain data. Therefore, it is necessary to bridge the distribution discrepancy between the source and target data to improve the classification results on the target data. By using domain adaptation, as shown in Figure 14, the source- and target-domain features are projected into the same region as the model is trained. Accordingly, the distribution discrepancy between the source and target domains is reduced significantly, and samples from different conditions are clearly separated. These two requirements, a) minimal distribution discrepancy between the two domains and b) clear differentiation between different health conditions in both domains, would guarantee accurate cross-domain fault diagnosis. As illustrated, the cross-domain invariant features obtained by the proposed method are clustered well: features from different classes are clearly separated, and only a small amount of overlap is observed between the 'Eccentricity' and 'Missing tooth' fault classes in the source and target domains.

V. CONCLUSION
In this paper, a deep learning-based domain adaptation fault diagnosis method for gearboxes is proposed. An end-to-end diagnostic model is established, which takes the raw motor current data as inputs and directly outputs the predicted health conditions. The maximum mean discrepancy metric is used to bridge the distribution gap between different gearbox operating conditions. Experiments on a real-world gearbox condition monitoring dataset are carried out for validation, and promising cross-domain fault diagnosis performance is achieved by the proposed domain adaptation method. This study offers a new perspective on enhancing the generalization ability of fault diagnosis models across different gearbox operating scenarios. The reliance of most existing methods on vibration signals is also alleviated, and effective diagnostic performance can be obtained using only the easily collected current data. However, it should be pointed out that the main limitation of this study lies in the assumption that target-domain data are available during training. Further research will be carried out on developing robust fault diagnosis models for different scenarios without the availability of target-domain data in advance.