Research On Graph-based Partial Label Learning Algorithm

Posted on: 2020-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Xie
Full Text: PDF
GTID: 2370330620453200
Subject: Computer Science and Technology
Abstract/Summary:
Weakly supervised data is easier and cheaper to obtain than strongly supervised data. How to train classifiers from weakly supervised information has therefore attracted extensive attention from machine learning researchers in recent years. Partial label data is an important kind of weakly supervised data. In partial label data, each sample is represented by a single instance in the feature space and a set of candidate labels in the label space, exactly one of which is the true label of the instance. Because the true label of a partially labeled sample is not directly accessible, traditional supervised learning algorithms cannot process such data. Researchers have therefore proposed the partial label learning framework for training classifiers on partial label data, and a large number of partial label learning algorithms have been designed.

Among these, graph-based partial label learning algorithms are favored because they do not need complex parametric models and are efficient. A graph-based partial label learning algorithm has three key steps: (1) data preprocessing; (2) building a graph model from the k-nearest neighborhood; (3) disambiguating the partial label data according to the graph model and training a classifier. However, existing graph-based partial label learning algorithms still have the following problems: (1) They rely on the manifold assumption, which assumes that nearby samples in the feature space should share the same label, so their performance is vulnerable to co-occurrence error-prone samples. (2) When constructing the graph model, they consider only the instances in the feature space and ignore the information hidden in the label space. (3) They initialize every candidate label with the same confidence value and keep using this initial confidence matrix unchanged throughout label disambiguation; the initial confidence matrix is never corrected, which leaves the result vulnerable to false labels.

In view of these problems, the main contributions of this thesis are as follows:

(1) To reduce the influence of co-occurrence error-prone samples, a distance metric learning algorithm for partial label data is proposed. During data preprocessing it maps the data into a new feature space in which the distance between co-occurrence error-prone samples is enlarged. The metric matrix is trained by statistical inference and then decomposed into a mapping matrix. Specifically, each training sample and its neighbors form sample pairs of two kinds: negative pairs, which consist of error-prone samples, and positive pairs, which consist of all other samples. Each pair is assigned a different weight. The metric matrix is then computed by maximum likelihood estimation, a statistical inference method, and the mapping matrix is obtained by applying Cholesky decomposition to the metric matrix. Mapping the data with this matrix enlarges the distance between co-occurrence error-prone samples and thereby alleviates their influence. Experimental results on several real-world datasets show that the proposed method effectively improves the disambiguation and classification performance of existing graph-based partial label learning algorithms.

(2) To address the problem that existing algorithms consider only the relationships between samples in the feature space and make no use of the candidate labels when constructing the graph model, a candidate-label-aware similarity graph for partial label data is proposed. When constructing the similarity graph, the proposed method combines the information in the feature space with that in the label space. Specifically, it computes the similarity between candidate label sets using Jaccard distance and linear reconstruction, then constructs the similarity graph from the nearest-neighbor relationships of the samples in the feature space and excludes unreasonable graph edges. Experimental results on several synthetic and real-world datasets show that constructing the similarity graph in this way improves the disambiguation and classification performance of existing graph-based partial label learning algorithms.

(3) To address the problem that existing methods never modify the initial confidence and are susceptible to false labels, a partial label learning algorithm based on confidence correction is proposed. The proposed method adopts bidirectional label propagation: forward label propagation updates the current confidence matrix of the neighbor nodes, and reverse label propagation sends the disambiguation result of each neighbor node back to the original node, so that the initial confidence matrix is updated and the influence of false label confidence is reduced. Results on real-world datasets show that the proposed algorithm achieves better disambiguation and classification performance than the baseline algorithm.
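As a small illustration of the decomposition step in contribution (1), the sketch below factors a metric matrix M into a lower-triangular L with M = L L^T and maps each sample x to L^T x, so that squared Euclidean distance in the mapped space equals the metric distance (x - y)^T M (x - y). The matrix values and all function names are made up for illustration; how the thesis actually estimates M (weighted sample pairs and maximum likelihood estimation) is not reproduced here.

```python
import math

def cholesky(M):
    """Return lower-triangular L with M = L L^T (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

def map_sample(x, L):
    """Map x to L^T x, i.e. into the new feature space."""
    n = len(x)
    return [sum(L[k][j] * x[k] for k in range(n)) for j in range(n)]

def metric_sq_dist(x, y, M):
    """Squared distance (x - y)^T M (x - y) under the metric matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

def euclid_sq_dist(x, y):
    """Ordinary squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Hypothetical metric matrix; a learned one would come from the training step.
M = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(M)
x, y = [1.0, 0.0], [0.0, 1.0]
mapped_dist = euclid_sq_dist(map_sample(x, L), map_sample(y, L))
metric_dist = metric_sq_dist(x, y, M)
```

The two distances agree, which is why a downstream k-nearest-neighbor graph can be built with plain Euclidean distance once the samples have been mapped.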
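The graph construction and disambiguation steps can be sketched in the same spirit. The code below is a generic, simplified stand-in and not the thesis's algorithm: it builds a k-nearest-neighbor graph in feature space, weights each edge by the Jaccard similarity (one minus the Jaccard distance) of the two candidate label sets, drops edges whose candidate sets are disjoint (an assumed pruning rule), and then runs plain forward label propagation from a uniform initial confidence matrix. The linear reconstruction weights and the reverse propagation that corrects the initial confidences are omitted; the names, the parameter `alpha`, and the toy data are all assumptions.

```python
import math

def jaccard(a, b):
    """Jaccard similarity between two candidate label sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def build_graph(X, S, k):
    """kNN graph in feature space, weighted by candidate-set similarity.
    Edges between samples with disjoint candidate sets are dropped
    (an assumed pruning rule, chosen for illustration)."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        for j in others[:k]:
            sim = jaccard(S[i], S[j])
            if sim > 0.0:
                W[i][j] = sim
    return W

def propagate(W, S, num_labels, alpha=0.5, iters=50):
    """Forward label propagation restricted to each candidate set."""
    n = len(S)
    # Uniform initial confidence over each sample's candidate labels.
    F0 = [[1.0 / len(S[i]) if c in S[i] else 0.0 for c in range(num_labels)]
          for i in range(n)]
    F = [row[:] for row in F0]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            total_w = sum(W[i])
            row = []
            for c in range(num_labels):
                if c not in S[i]:
                    row.append(0.0)  # non-candidate labels stay at zero
                    continue
                nb = (sum(W[i][j] * F[j][c] for j in range(n)) / total_w
                      if total_w else 0.0)
                row.append(alpha * nb + (1 - alpha) * F0[i][c])
            z = sum(row)
            new_F.append([v / z for v in row])  # renormalize per sample
        F = new_F
    return F

# Toy data: two well-separated clusters with ambiguous candidate sets.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
S = [{0}, {0, 1}, {0, 2}, {1}, {1, 2}, {0, 1}]
W = build_graph(X, S, k=2)
F = propagate(W, S, num_labels=3)
predicted = [max(range(3), key=lambda c: F[i][c]) for i in range(len(X))]
```

In this sketch the initial confidence matrix `F0` is reused unchanged in every iteration, which is exactly the behavior that problem (3) criticizes; a confidence-correcting variant would also update `F0` from the neighbors' disambiguation results.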
Keywords/Search Tags: Partial Label Data, Partial Label Learning, Graph Model, Statistical Inference, Information Fusion, Label Propagation