Gene Expression Data Classification Based On Boosting

Posted on: 2020-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z Liang
Full Text: PDF
GTID: 2370330602951044
Subject: Computer Science and Technology
Abstract/Summary:
Gene expression levels can be measured with DNA microarray technology, producing gene expression data. Analyzing these data yields information that is useful for pathological analysis and disease diagnosis, and cancer classification based on gene expression data is an important method of cancer detection. However, because gene expression data are high-dimensional and contain few samples, traditional pattern recognition methods are prone to the "curse of dimensionality". Boosting is an ensemble learning algorithm that can take an arbitrary classification algorithm as its base classifier and thereby improve that algorithm's accuracy. The stacked autoencoder is a deep learning method that learns high-level features from a large number of training samples and therefore performs well on many pattern recognition problems. On gene expression data, however, the small sample size keeps the classification accuracy of deep learning methods low. Because Boosting is iterative and the training samples differ in each round, it can compensate to some extent for the shortage of samples. This thesis therefore proposes a method that combines the stacked autoencoder with Boosting to classify gene expression data. The algorithm first reduces the dimensionality of the gene expression data with principal component analysis, then uses the stacked autoencoder as the base classifier of Boosting, and finally combines multiple stacked autoencoders to make the decision. Experiments on 7 real gene expression datasets show that the algorithm improves the classification accuracy of the stacked autoencoder by 5% to 10% and achieves higher classification accuracy than the support vector machine (SVM) and random forest algorithms. It significantly improves on the stacked autoencoder alone and shows good classification performance.

For the same training samples, different algorithms differ in how well they fit the data and in their learning performance. A key problem for Boosting ensembles is how to use more than one model to generate base classifiers so that a better combination of base classifiers is obtained. Because diversity and accuracy are the main factors that influence the Boosting algorithm, this thesis proposes generating the base classifier dynamically from multiple models in each round of Boosting training. In each round, different learning algorithms produce several candidate classifiers, and the diversity and accuracy of each are computed. The classifier with large diversity and high accuracy is selected as the base classifier, which increases the diversity of the final combined classifier and can improve the classification accuracy of the ensemble. Because the support vector machine and the decision tree are simple and efficient, they are used as the different learning algorithms in this method. Experiments on 7 real gene expression datasets show that the method improves the classification accuracy of both the support vector machine and the decision tree, and that its overall classification accuracy is better than that of homogeneous ensembles of support vector machines and of decision trees.
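
The per-round selection of a base classifier by diversity and accuracy can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how diversity is measured or how it is traded off against accuracy, so the pairwise disagreement measure, the linear score with weight `lam`, and all function names here are hypothetical. Candidate predictions are passed in as plain label lists (for example from an SVM and a decision tree trained elsewhere), and the reweighting shown is the standard binary AdaBoost update with labels in {-1, +1}.

```python
import math


def weighted_error(pred, y, w):
    """Weighted misclassification rate of one candidate under the sample weights."""
    return sum(wi for p, yi, wi in zip(pred, y, w) if p != yi) / sum(w)


def disagreement(pred, ensemble_preds):
    """Mean fraction of samples on which `pred` differs from each selected classifier.

    Assumed diversity measure; returns 0.0 when the ensemble is still empty.
    """
    if not ensemble_preds:
        return 0.0
    n = len(pred)
    per_member = [
        sum(a != b for a, b in zip(pred, other)) / n for other in ensemble_preds
    ]
    return sum(per_member) / len(per_member)


def select_candidate(candidates, y, w, ensemble_preds, lam=0.5):
    """Pick the candidate with the best combined accuracy/diversity score.

    `lam` (assumed) weights diversity against weighted accuracy.
    Returns (name, predictions, weighted_error).
    """
    best = None
    for name, pred in candidates.items():
        err = weighted_error(pred, y, w)
        score = (1 - lam) * (1 - err) + lam * disagreement(pred, ensemble_preds)
        if best is None or score > best[0]:
            best = (score, name, pred, err)
    return best[1], best[2], best[3]


def adaboost_reweight(pred, y, w, err):
    """Standard binary AdaBoost update: up-weight mistakes, then normalise."""
    err = min(max(err, 1e-12), 1 - 1e-12)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)
    new_w = [
        wi * math.exp(alpha if p != yi else -alpha) for p, yi, wi in zip(pred, y, w)
    ]
    total = sum(new_w)
    return alpha, [wi / total for wi in new_w]


# Toy round: labels, uniform weights, and predictions from two hypothetical candidates.
y = [1, 1, -1, -1, 1, -1]
w = [1 / 6] * 6
candidates = {
    "svm": [1, 1, -1, -1, -1, -1],
    "tree": [1, -1, -1, -1, 1, 1],
}
ensemble_preds = [[1, 1, -1, -1, -1, -1]]  # predictions of an earlier selected classifier

name, pred, err = select_candidate(candidates, y, w, ensemble_preds)
alpha, w = adaboost_reweight(pred, y, w, err)
```

In this toy round the `tree` candidate is less accurate than `svm` but disagrees with the already selected classifier on half of the samples, so the assumed score prefers it; with an empty ensemble the more accurate `svm` candidate would be chosen instead.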
Keywords/Search Tags: Gene Expression Data, Boosting, Stacked Autoencoder, AdaBoost, Diversity