Research On Pipeline Fault Classification Method For Incomplete Data Set

Posted on: 2024-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: T Li
GTID: 2552307112950359
Subject: Electronic information
Abstract/Summary:
Advances in machine learning have made supervised learning models very widely used. However, training such models depends on a sufficient number of labeled samples. In practice, because real data follow a long-tailed distribution and labeling demands substantial human and computational resources, samples or labels for some categories may be missing, which seriously limits the application of supervised learning models. Few-shot learning and zero-shot learning offer a way to address incomplete data sets: both aim to fully exploit the characteristics of the existing data so that a model can be built despite missing data or labels. Against this background, this thesis starts from the few-sample scenario, gradually relaxes the requirements on data and labels, and finally reaches the zero-sample scenario, addressing the problem of building classification models on incomplete data sets. The proposed methods are validated, in the context of industrial fault diagnosis, on a laboratory-collected set of pipeline fault samples under both the few-sample and zero-sample scenarios. The work consists of three main parts:

(1) Labeling pipeline condition samples consumes a significant amount of manpower, and the resulting shortage of labeled training samples degrades the accuracy and effectiveness of fault detection. To address this, an improved optimum-path forest (OPF) algorithm is proposed for semi-supervised pipeline blockage detection when only a limited number of labeled fault samples are available. The algorithm uses the Mahalanobis distance instead of the Euclidean distance to build the optimum-path forest: a complete graph is generated for each class from the training data, each sample is assigned to its most closely connected minimum spanning tree, and the connectivity between samples in the feature space is used to propagate labels to the samples to be classified. The proposed algorithm achieves better classification results for unlabeled proportions of 50%, 70%, and 90%, improving recognition accuracy by 17.36%, 29.13%, and 21.4% compared with the KNN semi-supervised classification model and the semi-supervised support vector machine classification model.

(2) The requirements on data and labels are then relaxed further. When only 10% of the samples are labeled, the random selection of unlabeled data during training of an OPF-based semi-supervised classification model does not fully exploit the information contained in the samples. To address this, a semi-supervised classification model combining active learning (AL) with the optimum-path forest (OPF) is proposed. Because sampling one instance at a time reduces the execution speed and efficiency of the algorithm, sorted batch-mode active learning based on margin sampling and a cosine similarity criterion is used to expand the annotated sample set automatically, after which semi-supervised label propagation is carried out by constructing the optimum-path forest. Experiments on the laboratory-collected pipeline condition data sets show that the method achieves an overall recognition accuracy of 96.68% when 10% of the samples are labeled. Compared with active learning methods that sample one by one, and with semi-supervised methods that extract global structural information from the training samples using distance metrics, the proposed method attains higher recall and F1-score.

(3) In the zero-sample scenario, samples and labels of some categories are missing at the same time, so features cannot be learned adequately and new categories cannot be identified. To address this, a zero-shot pipeline fault classification and recognition method based on attribute descriptions is proposed. Within the zero-shot learning framework, the method uses human-defined attribute descriptions as auxiliary information and recognizes unseen categories through attribute learning on seen fault categories. Experimental results show an average recognition accuracy of 76.5% without any training samples or labels from the new categories.
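
For reference, the distance that part (1) substitutes for the Euclidean distance is the Mahalanobis distance, which has the standard form below. The symbols $x$, $y$ (feature vectors) and $\Sigma$ (covariance matrix of the training features) are notation assumed here for illustration and are not taken from the thesis.

```latex
d_M(x, y) = \sqrt{(x - y)^{\top} \, \Sigma^{-1} \, (x - y)}
```

When $\Sigma$ is the identity matrix this reduces to the Euclidean distance, so the substitution only changes the graph weights when features are correlated or differ in scale.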
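
The batch selection step in part (2) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes that margin (edge) sampling means ranking unlabeled samples by the gap between their two highest class probabilities, and that the cosine similarity criterion is used to skip candidates too similar to ones already chosen. The function names, the `max_sim` threshold, and the toy data are all hypothetical.

```python
import math

def margin(probs):
    """Gap between the two largest class probabilities (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_batch(features, probs, batch_size, max_sim=0.95):
    """Pick up to batch_size sample indices to send for labeling.

    Samples are visited from most to least uncertain; a candidate is skipped
    when it is nearly parallel (cosine >= max_sim) to one already chosen.
    """
    ranked = sorted(range(len(features)), key=lambda i: margin(probs[i]))
    chosen = []
    for i in ranked:
        if len(chosen) == batch_size:
            break
        if all(cosine(features[i], features[j]) < max_sim for j in chosen):
            chosen.append(i)
    return chosen

# Toy data (hypothetical): samples 0 and 1 are both uncertain but near-duplicates.
features = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.7, 0.7]]
probs = [[0.51, 0.49], [0.52, 0.48], [0.60, 0.40], [0.95, 0.05]]
batch = select_batch(features, probs, batch_size=2)
```

In this toy run the second most uncertain sample is dropped because it nearly duplicates the first, so the batch covers two different regions of the feature space. Selecting a batch per round, rather than one sample per round, is what avoids the repeated retraining cost that the abstract attributes to one-by-one sampling.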
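
The idea behind part (3) can be illustrated with a small sketch. The abstract does not specify how attributes are learned or matched, so this example assumes the simplest attribute-based scheme: attribute scores for a sample are predicted by models trained on seen faults (not shown here), and the sample is assigned to the unseen class whose human-defined attribute signature is closest. The attribute names, class names, and scores below are hypothetical.

```python
def predict_unseen(attr_scores, unseen_signatures):
    """Return the unseen class whose attribute signature best matches the scores.

    attr_scores: predicted attribute values for one sample, each in [0, 1].
    unseen_signatures: dict mapping class name -> human-defined attribute vector.
    """
    def sq_dist(signature):
        return sum((s - a) ** 2 for s, a in zip(attr_scores, signature))
    return min(unseen_signatures, key=lambda name: sq_dist(unseen_signatures[name]))

# Hypothetical attributes: [flow_restricted, leak_present, periodic_signal]
unseen_signatures = {
    "partial_blockage": [1, 0, 0],
    "leak": [0, 1, 0],
}

# Scores assumed to come from attribute classifiers trained on seen fault classes.
attr_scores = [0.8, 0.2, 0.1]
label = predict_unseen(attr_scores, unseen_signatures)
```

No sample of either unseen class is needed at training time; only their attribute descriptions are required, which is what allows classification without samples or labels from the new categories.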
Keywords/Search Tags: semi-supervision, active learning, optimum-path forest, zero-shot learning, attribute description