| With the rapid development of network and information technology, the capacity to acquire and store data has greatly improved. Traditional machine learning models that rely on labeled data can no longer meet the needs of data analysis and processing because of the sheer volume of data. Semi-supervised learning, which uses a small amount of labeled data together with a large amount of unlabeled data, achieves better classification performance. It allows machine learning models to discover deeper information from massive data with complicated structure, and it thereby addresses the weak generalization ability of supervised learning models and the inaccuracy of unsupervised learning models. In recent years, semi-supervised learning has become a worthwhile research direction in the field of machine learning. Graph-based semi-supervised classification is an important branch of semi-supervised learning. It makes full use of the relationships among data points, and it has a solid theoretical foundation and an explicit objective function, so it performs well and is easy to solve. It consists of two steps: (1) use labeled and unlabeled data to construct a graph that expresses the intrinsic structure of the data; (2) use a graph inference algorithm to deduce the labels of the unlabeled data. Therefore, this dissertation focuses on exploring new graph construction methods and extended graph inference algorithms. On this basis, new graph-based semi-supervised classification methods are proposed. In addition, the classification of large-scale datasets is studied, and graph-based semi-supervised classification methods for big data are designed, so as to broaden the application fields of graph-based semi-supervised classification. The main contributions of this dissertation are summarized as follows: 1. A graph-based semi-supervised classification algorithm based on probabilistic nearest neighbors (PNN) is proposed. The l2-norm-based objective function of PNN directly models the probability of becoming neighbors 
between nodes, and nodes that satisfy the local clustering constraint tend to become neighbors. The objective function is optimized, incorporating the characteristics of the classification task, to obtain the optimized PNN matrix. The PNN matrix serves directly as the probability transition matrix for subsequent label propagation. This algorithm simplifies the computation of the probability transition matrix and strengthens the link between graph construction and graph inference. The differences among the values in the probability transition matrix are increased, which makes it better suited to the subsequent classification task. Meanwhile, the number of neighbors is determined adaptively from a preset number. Theoretical analysis and experiments show that the proposed algorithm has low time complexity and adapts better to classification tasks. 2. A semi-supervised classification method based on dynamic graph construction is proposed. A non-parametric edge selection algorithm is proposed that captures the distribution of the data: more edges are added in dense regions and fewer elsewhere. In computing edge weights, both the distance measure and the data distribution are taken into account. Because node degrees in a graph differ widely, an adaptive degree weighting algorithm is proposed. The proposed algorithm is compared with classical graph construction algorithms, and experiments on synthetic and image datasets show its effectiveness. 3. A graph construction algorithm based on structural similarity and an extended label propagation algorithm are proposed. On this basis, a graph-based semi-supervised classification algorithm combining local and global features is proposed. Because the label propagation algorithm does not reflect the correlation of category information among labeled data, the extended label propagation algorithm is proposed. The experimental 
results show that the proposed classification algorithm improves classification accuracy. The time complexity of the proposed algorithm is analyzed theoretically and verified experimentally. 4. An enhanced semi-supervised classification framework based on anchor graphs is proposed. The anchor-based graph construction algorithm extends traditional graph-based semi-supervised classification algorithms and increases the scale of data that can be processed. The core of the framework is the construction of a Z matrix that expresses bipartite graphs. The definition of the Z matrix directly affects the classification results. An anchor-based probabilistic nearest neighbors algorithm for constructing the Z matrix is proposed. In addition, anchor selection criteria are proposed for different application scenarios. For pixel-level classification and image set classification, the SLIC and KMeans algorithms, respectively, are used to select anchor points. The proposed framework has lower time complexity. Experiments on large-scale pixel-level classification and large-scale image set classification demonstrate the effectiveness and efficiency of the proposed framework. |
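
The two-step pipeline described in the abstract (graph construction followed by graph inference) can be illustrated with a minimal sketch. It is a generic baseline, not the PNN, dynamic-graph, or anchor-graph methods of the dissertation: it assumes a Gaussian-weighted k-nearest-neighbor graph and standard iterative label propagation with clamped labels. The function names `knn_transition_matrix` and `propagate_labels` and the parameters `k`, `sigma`, and `n_iter` are illustrative assumptions, not names taken from the source.

```python
import numpy as np


def knn_transition_matrix(X, k=10, sigma=1.0):
    """Step 1 (assumed baseline): kNN graph -> row-stochastic transition matrix."""
    # Pairwise squared Euclidean distances between all samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Gaussian similarity; no self-loops.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Zero out everything except the k largest weights in each row.
    drop = np.argsort(-W, axis=1)[:, k:]
    np.put_along_axis(W, drop, 0.0, axis=1)
    # Symmetrize so an edge exists if either endpoint selected it.
    W = np.maximum(W, W.T)
    # Normalize rows so each row is a probability distribution.
    return W / W.sum(axis=1, keepdims=True)


def propagate_labels(P, y, n_classes, n_iter=200):
    """Step 2 (assumed baseline): iterative label propagation.

    y holds a class index for each labeled sample and -1 for unlabeled ones.
    """
    idx = np.flatnonzero(y >= 0)
    F = np.zeros((len(y), n_classes))
    F[idx, y[idx]] = 1.0          # one-hot rows for labeled samples
    F0 = F.copy()
    for _ in range(n_iter):
        F = P @ F                 # spread label mass along graph edges
        F[idx] = F0[idx]          # clamp the known labels
    return F.argmax(axis=1)


if __name__ == "__main__":
    # Toy data: two well-separated clusters with one labeled sample each.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                   rng.normal(5.0, 0.3, (20, 2))])
    y = -np.ones(40, dtype=int)
    y[0], y[20] = 0, 1
    print(propagate_labels(knn_transition_matrix(X), y, n_classes=2))
```

In this sketch the transition matrix is derived from a separately built similarity graph. By contrast, the first contribution above is described as using the PNN matrix directly as the transition matrix, which removes that intermediate step.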