Research On Denoising Collaborative Training Based On Difference And Confidence

Posted on:2022-08-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y Jiang

GTID:2507306557464374

Subject:Applied Statistics

Abstract/Summary:

Labeling is a large part of data work in industry today. Massive data sets contain many unlabeled samples: unlabeled data is easy to collect, but labeled data is hard to obtain. Collaborative training (co-training), a branch of semi-supervised learning, has the key advantage that it can use abundant unlabeled samples, assisted by a few labeled samples, to improve classifier performance. It has been widely applied in fields such as natural language processing, text classification and image retrieval. However, collaborative training still has several problems, including low accuracy, a lack of classifier diversity and low algorithmic efficiency. This thesis proposes several methods for collaborative training and verifies their feasibility and effectiveness through experiments. The work is as follows:

(1) To address noisy data in the initial samples of collaborative training, which reduces the accuracy of the initial classifier, a noise filtering method based on adaptive DBSCAN is proposed. The algorithm selects optimal parameters using the Silhouette Coefficient and then eliminates the noise points. The results show that, compared with traditional collaborative training, collaborative training with adaptive DBSCAN denoising improves classification accuracy by 3.4% on average.

(2) To address the problem that the classifiers are too homogeneous, which leads to labeling errors, a difference measure based on weighted inconsistency is proposed. The algorithm introduces the idea of weighted distance and takes into account the differences caused by labeling errors in multi-class data sets. Compared with the traditional difference measure, the proposed method is shown to be effective and to improve efficiency.

(3) To address the new noisy data that appears after collaborative training, which affects the execution efficiency of the algorithm, a committee-based similarity measure is proposed to measure confidence. The algorithm is based on a Gaussian function and measures similarity by KNN distance. Then, to account for the relationship between samples, the similarity is weighted by representativeness. Finally, the similarity is combined with the voting of the learners to measure confidence. To assess the performance of the proposed algorithm, experiments on UCI and Kaggle data sets compare it with NBST and MCM. The results show that the proposed algorithm can improve the accuracy of collaborative training.
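
The confidence measure in item (3) is described only in outline, so the following is a minimal sketch under stated assumptions rather than the thesis's actual method. It assumes the similarity is the mean Gaussian kernel value over the k nearest labeled samples of the candidate class, and that this is multiplied by the committee's vote share for that class. The function names, the parameters `k` and `sigma`, the product rule and the toy data are all hypothetical, and the representativeness weighting mentioned in the abstract is omitted.

```python
import math


def gaussian_knn_similarity(x, class_points, k=3, sigma=1.0):
    """Mean Gaussian similarity between x and its k nearest points of one class.

    Assumption: a Gaussian kernel exp(-d^2 / (2 * sigma^2)) is applied to the
    Euclidean distance d; the thesis may define the kernel differently.
    """
    if not class_points:
        return 0.0
    # Keep only the k smallest distances (the KNN distances).
    dists = sorted(math.dist(x, p) for p in class_points)[:k]
    return sum(math.exp(-d * d / (2 * sigma ** 2)) for d in dists) / len(dists)


def confidence(x, label, labeled, votes, k=3, sigma=1.0):
    """Combine KNN-based similarity with the committee's vote share for `label`.

    `labeled` is a list of (point, label) pairs and `votes` is the list of
    labels predicted for x by the committee members. Multiplying the two
    scores is an illustrative choice, not taken from the source.
    """
    class_points = [p for p, y in labeled if y == label]
    similarity = gaussian_knn_similarity(x, class_points, k, sigma)
    vote_share = votes.count(label) / len(votes)
    return similarity * vote_share


# Toy data: two well-separated classes and a committee of three learners.
labeled = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
           ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.0, 5.1), "b")]
votes = ["a", "a", "b"]

near_a = confidence((0.05, 0.05), "a", labeled, votes)
near_b = confidence((0.05, 0.05), "b", labeled, votes)
print(near_a, near_b)
```

In this toy case the sample lies next to class "a" and two of three learners vote for "a", so its confidence for "a" is much higher than for "b". A co-training loop could add a pseudo-labeled sample to the training set only when such a score exceeds a chosen threshold.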

Keywords/Search Tags:

Semi-supervised Learning, Collaborative Training, Adaptive DBSCAN, Diversity Measure, Confidence Measure

Related items

1	Validation of an interactive measure of adaptive functioning as a supplement to current interview-based methods of assessment of adaptive behaviors in individuals with mild to moderate mental retardation
2	Applied Research On Employment Of University Students Based On Semi-supervised Learning
3	Research On Application Of Employment Guidance For College Graduates Based On Improved Semi-Supervised Self-Training Method
4	Semi-supervised Classification Research Based On Self-paced Learning And Sparse Self-expression
5	Research On Measures Of Coherent Variability And Risk
6	Research On Automatic Summary Technology Of Patent Texts Based On Semi-supervised Deep Learning
7	Semi-Supervised Chinese Text Classification Based On Selective Integration
8	A Study On Risk Identification Of P2P Lending Platform Based On Semi-supervised Learning
9	Semi-supervised Clustering Algorithm Based On Single Linkage Clustering
10	Study On Capital Allocation Based On Tail Risk Measure