| In recent years, with the transformation and upgrading of the manufacturing industry, the scale of modern industrial production has continued to expand, and the associated industrial safety issues have received increasing attention. In this context, fault diagnosis technology has emerged. Fault diagnosis aims to detect abnormal situations in the industrial production process promptly and effectively and to find the root cause of the anomalies, so as to ensure production safety and product quality. Among the many methods, data-driven fault diagnosis methods do not rely on precise modeling of the production process; instead, they construct fault diagnosis models by analyzing the data generated during production. Compared with traditional model- or knowledge-driven methods, data-driven methods are better suited to the complex and ever-changing modern industrial production environment and have become one of the mainstream approaches to industrial fault diagnosis. However, although many data-driven fault diagnosis methods have been proposed, most of them require all industrial data to be correctly labeled in order to train fault diagnosis models. In many practical industrial applications, it is difficult to ensure that all training labels are correct because of the complexity of industrial processes and the subjectivity of data labeling. When label noise exists, incorrect labels mislead the training of fault diagnosis models, resulting in incorrect diagnosis results and causing misjudgment or missed detection of abnormal situations. This thesis conducts a series of studies on industrial fault diagnosis when label noise exists in the data. The specific content and innovations are as follows. First, for fault classification scenarios where the training data contains multiple types of fault data, this thesis proposes a COnsistence-based Mislabeled Instances REmoval algorithm (COMIRE). The study shows that, in the early training stages of deep neural
networks, the training loss of clean samples and the network's uncertainty in predicting them follow similar trends, whereas the training loss of mislabeled samples and the network's uncertainty in predicting them follow quite different trends. Based on this phenomenon, this thesis designs a consistency-based discriminative index to identify and remove mislabeled samples from the training data. Experimental results show that, compared with the mainstream discriminative index based on training loss, the proposed consistency index better separates clean samples from mislabeled samples, especially hard clean samples from mislabeled samples. Then, for fault classification scenarios under label noise, this thesis proposes a Probabilistic Information-Theoretic Discriminant Analysis algorithm (PITDA) based on the idea of label correction. The algorithm alternates between two steps: a feature extraction step and a posterior probability calculation step. In the feature extraction step, this thesis proposes a probabilistic mutual information measure function. By optimizing this measure function, the algorithm uses posterior probability information to guide the learning of a robust feature extractor. In the posterior probability calculation step, the algorithm combines the noisy label information with the clustering information of the data in the feature space to compute posterior probabilities robustly. These posterior probabilities provide corrected supervisory information for the feature extraction step. By iterating these two steps over multiple rounds, the algorithm significantly reduces the effect of label noise on both feature extraction and classification, thereby obtaining robust fault classification results. Experimental results verify that the proposed algorithm can learn robust fault classification models in the presence of label noise. Next, for fault classification scenarios under label noise, this thesis proposes a
Meta-SelfTraining algorithm based on a Teacher-Student network (MST-TS) to directly correct the noisy labels in the training dataset. The algorithm trains the teacher network in a self-training manner to generate pseudo-labels for label correction and uses the corrected labels to train the student network. On this basis, a meta-learning mechanism is designed to feed the test loss of the student network on a small amount of clean data back to the teacher network, guiding it to generate better pseudo-labels. Experimental results show that the proposed meta-self-training algorithm effectively alleviates the confirmation bias problem of traditional self-training methods, thereby obtaining better label correction results. Finally, for fault detection scenarios where the training data does not contain historical fault data, this thesis proposes a robust Expectation One-Class Support Vector Machine algorithm (rEOCSVM) for the case in which data uncertainty and label noise both exist in the training data. The algorithm uses the integral of the Rescaled Hinge loss of a training sample over its uncertainty distribution as the misclassification loss, which makes effective use of the uncertainty distribution information of the data and improves the robustness of the model to noisy data. For this loss function, this thesis designs an explicit feature mapping method based on vector quantization to simplify the integral calculation in the loss function and derives a gradient descent optimization algorithm for the objective function. On this basis, considering the effect of label noise on the construction of the feature mapping function, this thesis proposes an iterative mechanism based on sample removal to further reduce the effect of label noise on the entire model. Experimental results verify that the proposed algorithm can effectively handle the fault detection problem for uncertain data in the presence of label noise. |
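
To make the misclassification loss described for rEOCSVM more concrete, the following is a minimal sketch, not the thesis's implementation. It assumes the Rescaled Hinge loss takes the form commonly used in the robust SVM literature, `beta * (1 - exp(-eta * hinge))` with `beta = 1 / (1 - exp(-eta))`, and it approximates the integral over a sample's uncertainty distribution by Monte Carlo sampling from a Gaussian. The linear score `w @ x`, the offset `rho`, the Gaussian uncertainty model, and all function names are illustrative assumptions; the thesis instead simplifies the integral with an explicit feature mapping based on vector quantization.

```python
import numpy as np


def rescaled_hinge(slack, eta=0.5):
    """Bounded loss of a non-negative hinge slack (assumed form, see lead-in).

    Returns 0 at slack = 0, 1 at slack = 1, and saturates at beta for
    large slack, which limits the influence of outlying or mislabeled samples.
    """
    beta = 1.0 / (1.0 - np.exp(-eta))
    return beta * (1.0 - np.exp(-eta * np.asarray(slack, dtype=float)))


def expected_rescaled_hinge(mean, cov, w, rho, eta=0.5, n_draws=2000, seed=0):
    """Monte Carlo estimate of the expected loss for x ~ N(mean, cov).

    Hypothetical one-class setup: a sample is on the normal side of the
    boundary when w @ x >= rho, so the hinge slack is max(0, rho - w @ x).
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    slack = np.maximum(0.0, rho - draws @ w)
    return float(rescaled_hinge(slack, eta).mean())


if __name__ == "__main__":
    w = np.array([1.0, 0.0])
    mean = np.array([3.0, 0.0])
    # Same mean, different uncertainty: the wider distribution puts more
    # probability mass across the boundary and so has a larger expected loss.
    print(expected_rescaled_hinge(mean, 0.01 * np.eye(2), w, rho=1.0))
    print(expected_rescaled_hinge(mean, 4.0 * np.eye(2), w, rho=1.0))
```

Because the loss saturates at `beta`, a single sample that lies far on the wrong side of the boundary contributes only a bounded amount to the objective, which is the property that makes this kind of loss attractive under label noise.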