
The Research On Gene Selection Based Shrinkage Feature Selection Algorithm For Cancer Classification

Posted on: 2016-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: L Huang
GTID: 2404330473464855
Subject: Computer technology
Abstract/Summary:
Tumors are one of the main threats to human life today, and their prevention and treatment are a key focus for researchers. With the development of bioinformatics technology, gene chip (microarray) technology has been widely applied in disease diagnosis because of its high throughput and miniaturization. In addition, extracting disease-related genes from gene expression profiles has become a new approach in cancer treatment. Feature selection is a very effective technique in data mining: it can eliminate irrelevant and redundant features and select the subset of features most relevant to the phenotype of interest. However, gene expression data are characterized by high dimensionality, high noise and high redundancy, which greatly reduce the performance of feature selection methods and often greatly increase their complexity. It is therefore very important to design an effective feature selection method. In this thesis, public gene expression datasets are used as the experimental subjects, and classification accuracy is one of the indicators used to assess each method's performance. Our work focuses on feature selection for gene expression data, and the main contributions are as follows.

(1) Traditional methods either ignore the correlations between genes or weight them too heavily, which causes problems such as poor interpretability, high redundancy and low accuracy. This thesis proposes a least absolute shrinkage algorithm based on weighted co-expression modules (MLASSO) and applies it to tumor identification and diagnosis. The method works as follows. First, a topological dissimilarity matrix is built from the correlations between genes, and modules are constructed from this matrix. Second, the correlation coefficient between each module's eigengene and the cancer phenotype is computed, and the important modules are selected. Finally, LASSO is applied to the important modules to obtain the cancer-related gene subset. Our experimental results demonstrate that MLASSO not only improves classification accuracy but also reduces redundancy.

(2) A similar-group-based LASSO (SGLASSO) method is proposed to balance the correlation between genes against generalization. SGLASSO addresses several problems that have long troubled traditional methods, such as overfitting, local optima, poor generalization and unsatisfactory accuracy. SGLASSO first selects tumor-related modules and ranks them by gene significance. Next, the genes with high connectivity within each module are selected and called typical genes. Finally, the gene subset is obtained through iterative similar-group construction and iterative LASSO regression. The experimental results show that SGLASSO not only overcomes the information-loss barrier of traditional methods but also achieves good classification accuracy, stability and generalization.
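The three MLASSO steps in (1) can be illustrated with a small NumPy sketch. It is not the thesis's implementation, and several choices are assumptions: a WGCNA-style unsigned adjacency |cor|^power with a hypothetical `power = 6`, the standard topological overlap measure (TOM) for the dissimilarity matrix, connected components of the thresholded dissimilarity graph as a simplified stand-in for the hierarchical clustering normally used to cut modules, the first principal component of each module as its eigengene, and a plain coordinate-descent LASSO that treats the 0/1 class label as a regression target. All function names and thresholds (`tom_dissimilarity`, `find_modules`, `eigengene`, `lasso_cd`, `mlasso_sketch`, `cut`, `min_cor`, `lam`) are hypothetical.

```python
import numpy as np


def tom_dissimilarity(X, power=6):
    """Return 1 - TOM for the genes (columns) of X.

    Assumption: WGCNA-style unsigned adjacency a_ij = |cor(x_i, x_j)| ** power.
    """
    A = np.abs(np.corrcoef(X, rowvar=False)) ** power
    np.fill_diagonal(A, 0.0)               # zero diagonal so A @ A sums over u != i, j
    k = A.sum(axis=1)                      # connectivity of each gene
    shared = A @ A                         # l_ij = sum_u a_iu * a_uj
    tom = (shared + A) / (np.minimum.outer(k, k) + 1.0 - A)
    np.fill_diagonal(tom, 1.0)
    return 1.0 - tom


def find_modules(diss, cut=0.5, min_size=3):
    """Hypothetical simplified module detection: connected components of the
    graph linking genes whose dissimilarity is below `cut`."""
    p = diss.shape[0]
    linked = diss < cut
    seen = np.zeros(p, dtype=bool)
    modules = []
    for start in range(p):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:                       # depth-first search over linked genes
            g = stack.pop()
            members.append(int(g))
            for h in np.flatnonzero(linked[g] & ~seen):
                seen[h] = True
                stack.append(h)
        if len(members) >= min_size:       # drop singletons and tiny groups
            modules.append(sorted(members))
    return modules


def eigengene(Xm):
    """First principal component (sample scores) of the standardized module."""
    Z = (Xm - Xm.mean(axis=0)) / Xm.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0] * S[0]


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Coordinate descent for (1 / 2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()  # residual y - X @ b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            step = new - b[j]
            if step != 0.0:
                r -= X[:, j] * step        # keep the residual in sync with b
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def mlasso_sketch(X, y, power=6, cut=0.5, min_size=3, min_cor=0.5, lam=0.05):
    """Build modules, keep those whose eigengene tracks y, run LASSO on their genes."""
    modules = find_modules(tom_dissimilarity(X, power), cut, min_size)
    keep = [m for m in modules
            if abs(np.corrcoef(eigengene(X[:, m]), y)[0, 1]) >= min_cor]
    genes = sorted(g for m in keep for g in m)
    if not genes:
        return []
    Xs = X[:, genes]
    Xs = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
    coef = lasso_cd(Xs, y - y.mean(), lam)
    return [g for g, c in zip(genes, coef) if c != 0.0]


# Toy data: genes 0-4 follow the class label, genes 5-9 are co-expressed
# but unrelated to it, genes 10-19 are independent noise.
rng = np.random.default_rng(0)
n = 60
y = np.repeat([0.0, 1.0], n // 2)
informative = 2 * y[:, None] + 0.2 * rng.standard_normal((n, 5))
z = rng.standard_normal(n)
unrelated = z[:, None] + 0.2 * rng.standard_normal((n, 5))
noise = rng.standard_normal((n, 10))
X = np.hstack([informative, unrelated, noise])
print(mlasso_sketch(X, y))
```

In this toy setup only the label-driven block should pass the eigengene-correlation filter, so LASSO only ever sees genes 0 to 4; the co-expressed but unrelated block forms a module yet is discarded. The NumPy-only solver keeps the sketch dependency-free; a library solver such as scikit-learn's `Lasso` would normally be used instead.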
Keywords/Search Tags:Gene expression profile, Feature selection, Weighted co-expression module, Least absolute shrinkage, Similar group