Clustering Based Methods For Imbalanced Fault Classification In Industrial Process

Posted on: 2020-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: G C Chen
GTID: 2428330572482982
Subject: Industrial process monitoring
Abstract/Summary:
With the rapid development of modern industrial technology, industrial processes are becoming increasingly complex, and the need to monitor their many parameters is becoming more urgent. At the same time, the wide use of Distributed Control Systems (DCS) makes it possible to collect large amounts of process data. As a result, data-driven process monitoring and multivariate statistical analysis have attracted considerable attention from researchers, and many research results and applications have appeared in this field over the past decades. However, data-driven process monitoring still faces many drawbacks and challenges. Take fault classification as an example: traditional classification algorithms are based on the assumption that the numbers of samples in different classes differ little, so classifier performance on an imbalanced dataset is often unsatisfactory. To address this, this thesis proposes a new clustering-based classification framework for imbalanced fault classification in industrial processes, together with three algorithms for different types of data. The main contents of the research are as follows:

(1) K-means is combined with Bayes to deal with relative and absolute scarcity for normal datasets. First, the majority class is divided into N sub-classes with the K-means method, where N is chosen according to the degree of imbalance. Next, these sub-classes are combined with the M minority classes to form a training set that contains (M+N) classes. A Naive Bayes classification algorithm is then used to train a model on this set. The K-means Bayes method is easy for factory engineers to understand and implement. Experiments show that it reduces the impact of imbalance and performs better than oversampling and undersampling methods.

(2) For imbalanced classification problems in the big data field, the K-means Bayes method is extended to the MapReduce platform. During the clustering process, T-threshold K-means is combined with the MapReduce framework to avoid introducing new imbalance. Experiments show that this method achieves a 30% increase in accuracy over the conventional Bayes method on the Hadoop platform while also satisfying the time requirement. In addition, as N increases, the precision for the minority classes increases while the precision for the majority class decreases, which can guide the selection of N.

(3) For datasets with strong nonlinearity, an SVM-tree classification algorithm is proposed, which divides the majority class into N sub-classes and finds the best hyperplane step by step with the help of K-means. This method keeps all classes unbiased during training and helps describe the boundary of the minority class. Experiments show that it achieves higher precision than undersampling and oversampling methods. Furthermore, an SVM-forest sensitive data selection method is proposed to deal with extreme imbalance; it selects majority samples based on their correlation with classifier performance. Experiments show that the SVM-forest based SVM-tree has higher precision than a single SVM-tree.

Finally, the research results are summarized and future work is discussed.
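
The K-means Bayes procedure in item (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the Gaussian form of Naive Bayes, the variance floor, the helper names (`kmeans`, `split_majority`, `train_kmeans_bayes`, `classify`), the `#` sub-class label format, and the synthetic two-dimensional data are all assumptions made for illustration, and N is simply passed in as `n_sub` rather than derived from the degree of imbalance.

```python
import math
import random
from collections import defaultdict


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def split_majority(X, labels, majority, n_sub):
    """Relabel majority samples into n_sub sub-classes (hypothetical label format)."""
    idx = [i for i, label in enumerate(labels) if label == majority]
    clusters = kmeans([X[i] for i in idx], n_sub)
    sub_labels = list(labels)
    # parent maps every training label back to its original class.
    parent = {label: label for label in set(labels) if label != majority}
    for i, c in zip(idx, clusters):
        sub_labels[i] = f"{majority}#{c}"
        parent[sub_labels[i]] = majority
    return sub_labels, parent


def fit_gaussian_nb(X, labels, var_floor=1e-6):
    """Gaussian Naive Bayes (assumed variant): per-class prior, means, variances."""
    groups = defaultdict(list)
    for x, label in zip(X, labels):
        groups[label].append(x)
    model = {}
    for label, rows in groups.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + var_floor
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def train_kmeans_bayes(X, labels, majority, n_sub):
    """Train Naive Bayes on the (M + N)-class set built from the split majority."""
    sub_labels, parent = split_majority(X, labels, majority, n_sub)
    return fit_gaussian_nb(X, sub_labels), parent


def classify(model, parent, x):
    """Pick the most probable sub-class, then map it back to the original class."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, variances)
        )
    return parent[max(model, key=log_posterior)]


# Synthetic data (assumption): a two-mode "normal" majority and a small "fault" minority.
grid = [(0.2 * (i % 5), 0.2 * (i // 5)) for i in range(20)]
X = (list(grid)
     + [(10 + a, 10 + b) for a, b in grid]
     + [(5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2)])
labels = ["normal"] * 40 + ["fault"] * 4

model, parent = train_kmeans_bayes(X, labels, majority="normal", n_sub=2)
print(sorted(model))                       # the M + N training classes
print(classify(model, parent, (5.1, 5.1)))
```

Splitting the majority before training keeps each sub-class closer in size to the minority class, so the class priors and the per-class Gaussians are less dominated by the majority; predictions are mapped back to the original labels through `parent`.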
Keywords/Search Tags:Data-driven Process Monitoring, Fault Classification, Imbalanced Data