Research On PCA Technology Of Distributed Heterogeneous Data Sets

Posted on: 2020-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2370330590995435
Subject: Software engineering
Abstract/Summary:
With the rapid development of modern information network technology, the data being generated is increasingly diverse. Processing this information helps to describe phenomena, discover objective laws, and improve both technological development and resource utilization. In practice, however, the redundancy and noise of high-dimensional data have become a major concern. Although traditional principal component analysis (PCA) can effectively reduce dimensionality and compress data, it cannot directly process heterogeneous data. This thesis studies two kinds of data: high-dimensional small-sample data with storage heterogeneity, and multi-sample data with both storage and semantic heterogeneity. It proposes a corresponding PCA method for each.

PCA reduces a large number of complex factors to a few principal components, which simplifies the problem and yields more scientific and effective information. Applying PCA directly to high-dimensional small-sample data, however, is computationally too expensive, and integrating heterogeneous databases introduces error components. This thesis therefore proposes a PCA algorithm for high-dimensional small-sample data. The algorithm uses an improved PCA procedure for the eigenvalue computation to reduce time complexity, and uses singular value decomposition (SVD) to optimize the error components produced by distributed database integration. A sparse PCA algorithm is then proposed for multi-sample heterogeneous data. Data consistency is achieved through heterogeneous feature elimination and isomorphic transformation, and a two-stage PCA algorithm is realized using the single-mechanism generalized power sparse PCA method and unsupervised similarity features.

Finally, to simplify the overall workflow, this thesis designs a principal component analysis system for the two forms of heterogeneous data. The system comprises modules for data collection, Web service data integration, distributed database management, front-end and back-end control, and principal component analysis. Simulation results show that the PCA algorithms extract principal component loadings effectively. The algorithm for high-dimensional small-sample data shows clear improvements in time efficiency and analysis accuracy over classical PCA and other algorithms. The sparse PCA algorithm for multi-source heterogeneous data also extracts principal component loadings effectively and yields more interpretable features.
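
The abstract does not spell out the improved eigenvalue procedure, so the following is only a minimal sketch of one standard way to cut the cost of PCA when the number of samples n is much smaller than the number of features d: eigendecompose the n x n Gram matrix instead of the d x d covariance matrix, then map the eigenvectors back to feature space. The function name `pca_small_sample` and the test data are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def pca_small_sample(X, k):
    """Top-k PCA for an (n, d) data matrix with n << d.

    Hypothetical helper: works on the n x n Gram matrix rather than the
    d x d covariance matrix. Requires k <= n - 1, because centering
    leaves at most n - 1 nonzero eigenvalues.
    """
    Xc = X - X.mean(axis=0)            # center each feature
    n = Xc.shape[0]
    gram = Xc @ Xc.T                   # (n, n) instead of (d, d)

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    evals, evecs = np.linalg.eigh(gram)
    top = np.argsort(evals)[::-1][:k]
    evals, evecs = evals[top], evecs[:, top]

    # If G u = lambda u, then v = Xc^T u / sqrt(lambda) is a unit
    # eigenvector of Xc^T Xc with the same eigenvalue.
    components = (Xc.T @ evecs) / np.sqrt(evals)
    explained_variance = evals / (n - 1)
    return components.T, explained_variance


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 200))      # 8 samples, 200 features
    components, variance = pca_small_sample(X, k=3)
    print(components.shape, variance)
```

With n samples and d features, the eigendecomposition here costs on the order of n^3 rather than d^3, which is where the saving comes from when n is much smaller than d. The rows of `components` match the right singular vectors of the centered data up to sign.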
Keywords/Search Tags: Principal component analysis, Sparse PCA, Feature extraction, Error component, Isomorphic conversion, Data integration