| With the rapid development of China’s industry, a large amount of industrial data has been generated. An important current application is equipment fault diagnosis based on the classification of industrial big data. However, real industrial big data is imbalanced: far fewer fault samples are collected than normal samples, which leads mainstream classification algorithms to misclassify fault data. Equipment fault diagnosis must avoid misclassifying fault data as far as possible, because a single misclassification can cause significant industrial losses. This article focuses on vibration signal data from rotating machinery, which usually requires feature extraction and other processing before classification. It addresses three issues in industrial big data, namely data imbalance, signal data that cannot be classified directly, and multiple fault categories, and achieves the following research results: To address data imbalance, this article proposes an improved oversampling algorithm, SK-SMOTE. The algorithm first synthesizes a portion of the minority samples to increase their number, and then assigns each synthesized sample weights based on the categories and distances of its neighboring samples. A synthesized sample is retained if the sum of its weights exceeds a set value. After the number of minority samples has been increased, the KMeans algorithm is used for clustering, and clusters with more minority samples are retained. Oversampling is then performed within the retained clusters, with relatively sparse clusters synthesizing more minority samples. To address both the inability to classify rotating-machinery vibration signals directly and the data imbalance in industrial big data, this article proposes a model that combines the wavelet packet transform with the SK-SMOTE algorithm, named the WPT-SK model, and applies it to equipment fault diagnosis. The model decomposes
and reconstructs the signal using wavelet packets, extracts the frequency-domain features of the reconstructed signal as a feature vector, balances the feature dataset using the SK-SMOTE algorithm, trains a classifier, and diagnoses equipment faults based on the classification results. To address the difficulty of multiple fault categories in rotating machinery, this article proposes the STFT-LSGAN-ResNet method for equipment fault diagnosis. The method uses the short-time Fourier transform (STFT) to convert one-dimensional time-series signals into two-dimensional feature images, uses LSGAN to generate feature images resembling those of the fault classes to balance the dataset, trains ResNet on the balanced image dataset for multi-class classification, and diagnoses faults based on the classification results. Experiments on publicly available imbalanced datasets show that the SK-SMOTE algorithm can effectively balance imbalanced datasets, and its performance is 0.5%-12% higher than that of three other oversampling algorithms. Experiments on public industrial datasets demonstrate that the WPT-SK model effectively mitigates the data imbalance problem of industrial big data and enhances the classification performance on fault samples, with SK-SMOTE outperforming the other three oversampling algorithms by 3%-20% on datasets with a high imbalance ratio. Experiments on public industrial datasets also show that the STFT-LSGAN-ResNet diagnosis method significantly improves the classification performance on fault samples; on datasets with a high imbalance ratio, LSGAN performs 10%-40% better than DCGAN and the overall accuracy increases slightly. Furthermore, the method can effectively classify different fault types. |
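
The first stage of SK-SMOTE described in the abstract (synthesize minority samples, then keep only those whose neighbor weights sum above a set value) can be illustrated with a minimal sketch. This is not the article's implementation. The abstract does not give the weighting formula, so the sketch assumes that each of the `k` nearest original samples contributes `+1/(1+d)` if it belongs to the minority class and `-1/(1+d)` if it belongs to the majority class, where `d` is the Euclidean distance. The function names, the value `k=3`, and the threshold `0.0` are hypothetical choices made for illustration. The later KMeans clustering stage is not shown.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def synthesize(minority, n_new, k=3, seed=0):
    """SMOTE-style step: interpolate between a minority sample and one of
    its k nearest minority neighbours (needs at least two minority samples)."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [s for s in minority if s is not base]
        neighbours = sorted(others, key=lambda s: distance(base, s))[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> mate
        candidates.append([b + t * (m - b) for b, m in zip(base, mate)])
    return candidates


def weight_sum(candidate, minority, majority, k=3):
    """ASSUMED weighting (not from the article): each of the k nearest
    original samples adds +1/(1+d) if minority, -1/(1+d) if majority."""
    labelled = [(s, 1.0) for s in minority] + [(s, -1.0) for s in majority]
    labelled.sort(key=lambda pair: distance(candidate, pair[0]))
    return sum(sign / (1.0 + distance(candidate, s))
               for s, sign in labelled[:k])


def filter_candidates(candidates, minority, majority, k=3, threshold=0.0):
    """Keep only candidates whose neighbour weight sum exceeds the threshold."""
    return [c for c in candidates
            if weight_sum(c, minority, majority, k) > threshold]


# Toy data: a small minority cluster near the origin, a majority cluster far away.
minority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
majority = [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [5.1, 5.3], [4.9, 4.7], [5.3, 5.0]]

candidates = synthesize(minority, n_new=5)
kept = filter_candidates(candidates, minority, majority)
print(len(kept), "of", len(candidates), "candidates retained")
```

Under this assumed weighting, a synthesized point surrounded mainly by majority samples receives a negative sum and is discarded, which matches the abstract's stated goal of retaining only well-supported minority samples. A different weight function or threshold would change which candidates survive.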