Research On Active Learning Algorithm Based On Random Walk Sorting And Clustering

Posted on: 2021-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Deng
Full Text: PDF
GTID: 2568306905451384
Subject: Software engineering
Abstract/Summary:
Classification is one of the most important research tasks in machine learning and data mining. It has attracted close attention from academia and industry and has developed rapidly. With the arrival of the big data era, large amounts of unlabeled data are emerging in all walks of life, which poses unprecedented challenges for classification research. The performance of a traditional supervised learning algorithm depends to a great extent on the validity of its data: in general, the quantity and accuracy of the labeled data directly affect the performance of the trained model. However, labeling massive data sets one sample at a time is time-consuming, expensive and difficult. Active learning was proposed to address this problem and has developed rapidly. As the technology has matured, a large number of related algorithms have been proposed, and active learning based on pre-clustering in particular has become a hot topic of current research.

This thesis proposes two active learning methods based on random walk clustering and examines how different similarity measures affect their classification performance. It also proposes, based on the random walk model, a method for measuring the information carried by a sample, which provides a feasible scheme for sample selection in active learning.

First, this thesis proposes a pre-clustering active learning method (ALCR) based on random walk clustering, together with four sample selection strategies based on neighborhood, representativeness, and uncertainty. It also compares the influence of six similarity measures on the classification performance of ALCR: Manhattan, Euclidean, Mahalanobis, Cosine, Pearson and Chebyshev. Experiments show that ALCR with the neighborhood-based sample selection strategy achieves better results, and that the Manhattan, Euclidean and Cosine measures are better suited to ALCR.

Second, to address the problem of excessive sample learning in the active learning stage of ALCR, this thesis proposes an active learning method based on two-stage clustering (ALTC). Compared with ALCR, ALTC applies k-Means clustering again to the result of the random walk clustering. Experiments show that adding this two-stage clustering mechanism to ALCR improves classification accuracy in most cases.

Finally, drawing on PageRank theory, this thesis proposes an active learning algorithm (PAL) to address the problem of sample ranking, which provides a feasible solution to sample selection in active learning. PAL uses PageRank to rank the samples and divides them into representative samples and ordinary samples. Based on the similarity between representative samples, a binary tree is constructed to represent the relationships between nodes, and the tree is used to cluster, label and predict the representative samples. The representative samples are then used as the training set to classify the remaining samples. Experiments show that PageRank can effectively measure the representativeness of a sample and plays a key role in sample selection.
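As a rough illustration of the ranking step described for PAL, the sketch below scores samples with PageRank on a similarity graph and takes the top-scoring ones as candidate representatives. It is not the thesis's implementation: the Gaussian similarity kernel, the damping factor of 0.85, and the function names `similarity_matrix`, `pagerank_scores` and `select_representatives` are all assumptions made for this example, and the binary-tree clustering step is not shown.

```python
import numpy as np

def similarity_matrix(X, gamma=1.0):
    """Pairwise Gaussian similarity from squared Euclidean distance.
    The kernel and gamma are assumptions; the abstract does not specify them."""
    diff = X[:, None, :] - X[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    W = np.exp(-gamma * dist2)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def pagerank_scores(W, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a weighted graph W (n x n)."""
    n = W.shape[0]
    col_sums = W.sum(axis=0)
    has_out = col_sums > 0
    # Column-normalize W; a column with no outgoing weight becomes uniform.
    P = np.where(has_out, W / np.where(has_out, col_sums, 1.0), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (P @ r) + (1.0 - damping) / n
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def select_representatives(X, k, gamma=1.0):
    """Return indices of the k highest-ranked samples and all scores."""
    scores = pagerank_scores(similarity_matrix(X, gamma))
    return np.argsort(-scores)[:k], scores

# Four points in a tight cluster plus one distant outlier (index 4).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
top, scores = select_representatives(X, k=2)
```

In this toy data set the outlier receives almost no rank from its neighbors, so it scores lowest and is not chosen as a representative. In an active learning loop, the selected indices would be the samples sent to the annotator for labeling.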
Keywords/Search Tags: Active Learning, Random Walks, Classification, Clustering, PageRank