
Research And Visualization Of Dimension Reduction Method Of Gene Expression Data Based On Principal Component Analysis

Posted on: 2020-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhao
GTID: 2370330590973881
Subject: Computer Science and Technology
Abstract/Summary:
"The 21st century is the century of biotechnology." Gene sequencing and other bioinformatics technologies are changing the world, and the techniques for collecting and processing gene expression data have become increasingly mature and diverse. The scale and complexity of the resulting data keep growing: examples include genome databases, nucleic acid and protein sequence and structure databases, and spatial structure information for biological macromolecules. With the arrival of the "big data era", such large-scale data also brings challenges, because it is high-dimensional, massive, and often incomplete.

Research on dimensionality reduction for gene expression profile data has advanced steadily and has produced considerable results. Principal component regression (PCR) is a classical algorithm that has been widely used, and many improved variants have been developed for different fields. Most of them, however, optimize only the sample data itself and do not take the classification labels into account. This can easily discard the target information of interest, leave the latent internal structure of the data poorly identified, and in turn degrade prediction and classification performance.

This thesis studies supervised principal component regression (SPCR) and y-aware principal component regression (y-aware PCR) and verifies their advantage over standard PCR. Experiments show that the classification accuracy of SPCR gradually deteriorates as the number of retained principal components increases, while y-aware PCR behaves in the opposite way: with few retained components its accuracy is slightly worse than that of SPCR, but once about 35 components are retained it is significantly better. To address this, a weighted fusion algorithm (Y-SPCR) based on SPCR and y-aware PCR is proposed. The algorithm is applied to four different high-dimensional gene datasets for dimensionality reduction and classification. The experimental results show that, in terms of classification accuracy, Y-SPCR effectively overcomes the respective drawbacks of the two methods and performs stably across different numbers of features. Its average accuracy reaches 82%, about 13% higher than conventional PCR and about 5% higher than SPCR and y-aware PCR.

Finally, the dimensionality reduction results for the gene data are visualized. A user-friendly front-end interface displays the spatial structure of the reduced data, helping users observe the underlying structural relationships among high-dimensional samples in a more flexible and varied way.
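
The abstract does not say how the weighted fusion in Y-SPCR is computed, so the following is only a rough sketch under stated assumptions, not the thesis's implementation. It assumes that SPCR screens features by their absolute correlation with the label before PCA, that y-aware PCR rescales each feature by its univariate regression slope against the label before PCA, and that the fusion is a convex combination of class probabilities with a weight `w`. All function names and the fusion rule are hypothetical.

```python
import numpy as np


def _center(X, y):
    """Center the feature columns and the label vector."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return X - X.mean(axis=0), y - y.mean()


def supervised_select(X, y, keep):
    """SPCR-style screening (assumed): indices of the `keep` features
    whose absolute correlation with y is largest."""
    Xc, yc = _center(X, y)
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.divide(Xc.T @ yc, denom,
                     out=np.zeros(Xc.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(corr))[:keep]


def y_aware_scale(X, y):
    """y-aware scaling (assumed): multiply each centered feature by its
    univariate least-squares slope against y, so that every feature is
    expressed in units of the label."""
    Xc, yc = _center(X, y)
    ss = (Xc ** 2).sum(axis=0)
    slope = np.divide(Xc.T @ yc, ss,
                      out=np.zeros(Xc.shape[1]), where=ss > 0)
    return Xc * slope


def pca_scores(X, k):
    """Project centered data onto its first k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def fuse(p_spcr, p_yaware, w):
    """Hypothetical fusion rule: convex combination of the class
    probabilities predicted from the two reduced representations."""
    return w * np.asarray(p_spcr) + (1.0 - w) * np.asarray(p_yaware)
```

In such a setup, one classifier would be trained on `pca_scores(X[:, supervised_select(X, y, keep)], k)` and another on `pca_scores(y_aware_scale(X, y), k)`, with `w` chosen on held-out data. Whether the thesis fuses probabilities, scores, or components is not stated in the abstract.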
Keywords/Search Tags: Data dimension reduction, Principal component regression, Classification, Visualization