
Partial Label Learning Algorithms Based On Metric Learning And Max-loss Function

Posted on: 2018-04-20  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Zhou  Full Text: PDF
GTID: 1318330542969131  Subject: Control theory and control engineering
Abstract/Summary:
With the coming of the big data era, weakly supervised learning has become one of the research focuses of the machine learning field and has been widely applied to practical problems in control engineering, systems engineering, pattern recognition, information security, etc. Partial label learning (PL) is a weakly supervised learning framework for classification problems in which the true label of each training sample cannot be directly observed; the only available information is that the true label is concealed in a set of candidate labels. Because it extends the traditional classification framework by relaxing the requirements on the training data, the PL framework shares the same wide range of applications and has been applied in image processing, text mining, medical diagnosis, etc. Although the PL framework has gradually attracted the attention of researchers in recent years, the ambiguity in the training data inevitably makes it difficult to address, and existing PL algorithms cannot satisfy the requirements of many real-world problems. This dissertation focuses on the development of PL algorithms; the main research contents are as follows:

1. The accuracy of a PL algorithm is closely related to the distance metric involved in its model. A metric learning algorithm for the PL framework is proposed based on the geometric mean metric learning model. The basic idea is to take each training sample and any neighbor with which it shares a candidate label as a similarity pair, and each training sample and any neighbor with which it shares no candidate label as a dissimilarity pair. Moreover, in order to preserve the useful manifold structure of the original metric space, a term that maintains the original positional relationship between each training sample and its k nearest neighbors sharing a candidate label is added to the objective function. Experimental results show that the proposed metric learning algorithm can be used as a front end to existing Euclidean-distance-based PL algorithms to improve their accuracy; in particular, the accuracy of k-nearest-neighbor-based PL algorithms is improved significantly.

2. The max-loss function captures the relationship between a partially labeled sample and its candidate labels more accurately than the average-loss function, but it usually yields a non-differentiable objective function that is difficult to optimize. A differentiable max-loss function is presented by introducing an aggregate function to approximate the max(·) operation in the original max-loss, and based on this new loss function two PL algorithms are proposed, one using a logistic regression model and one using a Gaussian process model. Theoretical analysis and experimental results show that the algorithms developed with the new max-loss function achieve higher accuracy than those developed with the average-loss function, and that their objective functions are differentiable and concave, and hence easy to optimize.

3. In order to reduce the computational complexity of PL algorithms, two fast kernel-based PL algorithms are proposed using sparse Gaussian process models. The first converts the original PL training set into several standard two-class data sets via ECOC technology, and then trains a binary classifier of lower computational complexity on each two-class data set using a variational Gaussian process model. The second modifies the Gaussian process algorithm based on the max-loss function proposed above: a small subset U of the training set is selected by a fast clustering algorithm to define a set of inducing variables F_U; the posterior distribution of the latent function values on the training set is then deduced analytically by marginalizing the inducing variables F_U, whose posterior distribution is computed by maximizing a lower bound of the log marginal likelihood with the Laplace method at lower computational cost. The two proposed PL algorithms achieve higher accuracy than existing PL algorithms, and their computational complexity is reduced from O(n³) to O(nm²).
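The pair-construction step underlying the metric learning algorithm of contribution 1 can be sketched as follows. This is an illustrative reconstruction, not code from the dissertation: the function names (`build_pairs`, `scatter`), the brute-force neighbor search, and the representation of candidate labels as Python sets are all our assumptions.

```python
import numpy as np

def build_pairs(X, candidate_sets, k=3):
    """Form similarity/dissimilarity pairs from candidate labels.

    A k-nearest neighbor that shares at least one candidate label with a
    sample forms a similarity pair; a neighbor sharing none forms a
    dissimilarity pair. (Hypothetical helper illustrating the idea.)
    """
    n = X.shape[0]
    # Pairwise Euclidean distances; exclude self-matches on the diagonal.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    similar, dissimilar = [], []
    for i in range(n):
        for j in np.argsort(dists[i])[:k]:
            if candidate_sets[i] & candidate_sets[j]:
                similar.append((i, int(j)))
            else:
                dissimilar.append((i, int(j)))
    return similar, dissimilar

def scatter(X, pairs):
    """Sum of outer products of pairwise differences for a pair set."""
    A = np.zeros((X.shape[1], X.shape[1]))
    for i, j in pairs:
        d = (X[i] - X[j])[:, None]
        A += d @ d.T
    return A
```

The two scatter matrices returned by `scatter` for the similarity and dissimilarity pair sets are the inputs from which the geometric mean metric learning model computes its metric in closed form.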
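A standard choice of aggregate function for smoothing max(·), as used in contribution 2, is the log-sum-exp function; the sketch below illustrates the idea only, and the smoothing-parameter name `p` is our notation rather than necessarily the dissertation's.

```python
import numpy as np

def smooth_max(x, p=10.0):
    """Aggregate-function (log-sum-exp) approximation of max(x).

    (1/p) * log(sum(exp(p * x))) tends to max(x) as p -> infinity, and
    the gap is bounded above by log(len(x)) / p.  Shifting by max(x)
    before exponentiating keeps the computation numerically stable.
    """
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + np.log(np.exp(p * (x - m)).sum()) / p
```

Unlike max(·) itself, this surrogate is differentiable everywhere, which is what makes the resulting max-loss objective amenable to gradient-based optimization.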
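The O(n³) to O(nm²) reduction in contribution 3 comes from standard inducing-point algebra. A minimal sketch, using the Nyström approximation of the kernel matrix together with the Woodbury identity in a regression-style solve: the function name and the plain-regression setting are our illustrative assumptions, not the dissertation's variational/Laplace scheme.

```python
import numpy as np

def inducing_point_solve(K_nm, K_mm, y, noise=1e-2):
    """Solve (K_tilde + noise*I) alpha = y with m inducing points.

    Under the Nystrom approximation K_tilde = K_nm K_mm^{-1} K_nm^T,
    the Woodbury identity replaces the O(n^3) n x n solve with an
    m x m solve, giving O(n m^2) overall cost.
    """
    # Woodbury: (sigma*I + U K_mm^{-1} U^T)^{-1} y
    #   = (y - U (sigma*K_mm + U^T U)^{-1} U^T y) / sigma,  U = K_nm.
    A = noise * K_mm + K_nm.T @ K_nm          # m x m inner system
    return (y - K_nm @ np.linalg.solve(A, K_nm.T @ y)) / noise
```

When the inducing set equals the full training set (m = n), the Nyström approximation is exact and this returns the same solution as the direct O(n³) solve, which is a convenient correctness check.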
Keywords/Search Tags: Partial Label Learning, Metric Learning, Gaussian Process Model, Max-loss Function