
Data Dimensionality Reduction And Classification Algorithms

Posted on: 2017-04-19  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhou
GTID: 2180330482982361  Subject: Communication and Information System
Abstract/Summary:
With the development of information technology, data volumes have grown rapidly, which has driven advances in machine learning theory. The higher the dimensionality of the sample data, the harder the data are to store, the greater the computational cost, and the more likely the data are to contain noisy or redundant features. How to reduce the dimensionality of high-dimensional data, avoid “the curse of dimensionality”, and improve classification accuracy has therefore become a much-studied problem in machine learning.

Non-negative matrix factorization (NMF) is a matrix decomposition algorithm that constrains every element to be non-negative, both in the matrix being decomposed and in the factor matrices obtained. This non-negativity constraint has a clear physical interpretation, which has made NMF widely studied as a dimensionality reduction algorithm. Semi-supervised learning, meanwhile, combines a limited number of labeled examples with plentiful unlabeled ones to learn effectively. This overcomes the shortage of labeled examples in supervised learning and thus improves classification accuracy, so it is widely used in image, text, and e-mail classification.

First, building on NMF and semi-supervised learning, the thesis proposes a semi-supervised learning algorithm based on non-negative matrix factorization and learning with consistency. Second, it proposes a novel semi-supervised learning approach based on constrained non-negative matrix factorization and learning with consistency. This approach introduces the class information of the limited labeled examples as constraints during dimensionality reduction, which enhances the representational power of the reduced-dimension features.
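As a rough illustration of NMF used for dimensionality reduction, the sketch below factorizes a non-negative matrix X (features by samples) into non-negative factors W and H using the well-known Lee-Seung multiplicative updates for the Frobenius-norm objective. The abstract does not say which update rule the thesis uses, so the update rule, the function name `nmf`, and all parameters here are assumptions chosen for illustration only.

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (features x samples, non-negative) as W @ H.

    Illustrative sketch: Lee-Seung multiplicative updates for the
    Frobenius-norm objective, not necessarily the thesis's variant.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical usage: 6 features, 8 samples, reduced to 2 dimensions.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))
W, H = nmf(X, rank=2)
# Each column of H is the 2-dimensional representation of one sample.
```

In this setup the columns of H serve as the low-dimensional features that a downstream classifier or label-propagation step would consume.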
Finally, the thesis introduces the dependencies between classes and proposes a semi-supervised learning algorithm based on a class graph and dimensionality reduction. The algorithm builds one graph over the examples and another over the classes, uses them to construct a graph-regularization framework, and then obtains the labels of the unlabeled samples by solving a Sylvester equation. Experimental results on public datasets show that the proposed algorithms, which combine the use of limited labeled examples with dimensionality reduction, not only reduce the dimensionality of the data effectively but also improve the generalization ability of the classifier.
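To show how a sample graph and a class graph can lead to a Sylvester equation, here is a minimal sketch under an assumed objective; the abstract does not give the thesis's exact formulation. Suppose the label matrix F (samples by classes) minimizes ||F - Y||² + μ·tr(Fᵀ L_s F) + ν·tr(F L_c Fᵀ), where L_s and L_c are the Laplacians of the sample graph and the class graph and Y holds the known labels. Setting the gradient to zero gives (I + μL_s)F + F(νL_c) = Y, which has the Sylvester form AF + FB = C. The function names, μ, ν, and the toy graphs are all hypothetical.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian D - W of a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def solve_sylvester_kron(A, B, C):
    """Solve A F + F B = C via the Kronecker (vec) form.

    Only suitable for small problems: the linear system is (n*c) x (n*c).
    """
    n, c = C.shape
    K = np.kron(np.eye(c), A) + np.kron(B.T, np.eye(n))
    f = np.linalg.solve(K, C.flatten(order="F"))  # column-major vec
    return f.reshape((n, c), order="F")

def propagate_labels(W_samples, W_classes, Y, mu=1.0, nu=0.1):
    """Assumed objective (illustration only):
    ||F - Y||^2 + mu * tr(F^T L_s F) + nu * tr(F L_c F^T).
    """
    A = np.eye(W_samples.shape[0]) + mu * graph_laplacian(W_samples)
    B = nu * graph_laplacian(W_classes)
    return solve_sylvester_kron(A, B, Y)

# Toy data: 4 samples in a chain, 2 classes; samples 0 and 3 are labeled.
W_samples = np.array([[0.0, 1.0, 0.0, 0.0],
                      [1.0, 0.0, 0.1, 0.0],
                      [0.0, 0.1, 0.0, 1.0],
                      [0.0, 0.0, 1.0, 0.0]])
W_classes = np.array([[0.0, 0.2],
                      [0.2, 0.0]])
Y = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
F = propagate_labels(W_samples, W_classes, Y)
predicted = F.argmax(axis=1)  # predicted class index for every sample
```

The Kronecker form is used here only to keep the sketch dependency-light; for realistic sizes a dedicated solver such as `scipy.linalg.solve_sylvester` would be the more practical choice.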
Keywords/Search Tags: machine learning, the curse of dimensionality, dimensionality reduction, semi-supervised learning, Sylvester equation