Identification of Multi-cancer Risk Modules via an Ensemble Feature Selection Algorithm

Posted on: 2021-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: J L Zhang
GTID: 2404330602482249
Subject: Operational Research and Cybernetics
Abstract/Summary:
Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. Identifying the genes involved in the development and progression of cancer is critical, and joint analysis of multiple cancers may help to uncover mechanisms that overlap between them. At the same time, gene expression data usually have a small sample size and high dimensionality, which traditional feature selection algorithms cannot handle efficiently. Because cancer is usually caused by mutations in only a few key genes, accurate screening is important. Here we propose a fusion feature selection framework, which is an ensemble method, and use it to identify powerful and reliable features in clinically relevant prediction tasks. A joint analysis of 11 human cancer types was conducted to explore the key feature genes of cancer. For these reasons, this thesis focuses on algorithms for selecting informative genes. The main work of this thesis is as follows:

1. Building on the mechanism of the filter method, we propose a new gene selection method, the FSGBDT algorithm, which combines the filter method with the embedded method and weights genes according to their discriminant ability. The method has two steps. First, a filter method performs large-scale screening of the gene expression data. Then a more accurate subset search is carried out with an embedded feature selection algorithm. Experiments show that the method is efficient, simple, and easy to extend.

2. We combined the FSGBDT algorithm with random forest, support vector machine, logistic regression, and naive Bayes classifiers and ran experiments on different datasets. Compared with three popular gene selection methods, our method achieves similar or better results. In particular, on the breast cancer dataset (GSE5764), where classification accuracy is generally low, the average accuracy of the proposed method reaches 91.5%, which is significantly higher than that of the other methods and demonstrates that our method is effective.

3. We then performed gene ontology analysis and literature verification on the selected functional modules. These functional modules were divided into several functional sets: cell growth regulation, innate immune response regulation, tissue migration, interferon production, and specific cancer signaling pathways. In addition, a functional module may serve as a feature subset and as a diagnostic criterion for disease. Using functional modules in place of individual genes can help to better study the onset and development of cancer.
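
The two-step scheme described in the abstract (coarse filter screening followed by an embedded subset search) can be illustrated with a minimal sketch. This is not the thesis's FSGBDT implementation. The abstract does not name the filter statistic or the boosting configuration, so the Fisher-style score, the boosted one-split stumps standing in for a full GBDT, the synthetic data, and every function name and parameter below are assumptions chosen for illustration only.

```python
import random


def fisher_score(values, labels):
    """Assumed filter statistic: squared class-mean gap over summed variances."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
    var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
    return (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)


def filter_step(X, y, keep):
    """Step 1: score every gene independently and keep the top `keep`."""
    n_genes = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:keep]


def best_stump(X, residual, genes):
    """Find the one-gene threshold split with the largest squared-error gain."""
    n = len(X)
    total = sum(residual)
    base = total * total / n
    best = None  # (gain, gene, threshold, left_value, right_value)
    for j in genes:
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += residual[order[pos - 1]]
            lo, hi = X[order[pos - 1]][j], X[order[pos]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos) - base
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / pos, right_sum / (n - pos))
    return best


def embedded_step(X, y, genes, rounds=20, rate=0.3):
    """Step 2: boost stumps on residuals; a gene's importance is its summed gain."""
    pred = [sum(y) / len(y)] * len(y)
    importance = {j: 0.0 for j in genes}
    for _ in range(rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(X, residual, genes)
        if stump is None or stump[0] <= 1e-12:
            break
        gain, j, thr, left, right = stump
        importance[j] += gain
        pred = [p + rate * (left if row[j] <= thr else right)
                for p, row in zip(pred, X)]
    return importance


def select_genes(X, y, keep=10, final=3):
    """Filter to `keep` candidates, then rank them by embedded importance."""
    candidates = filter_step(X, y, keep)
    importance = embedded_step(X, y, candidates)
    return sorted(candidates, key=lambda j: importance[j], reverse=True)[:final]


# Synthetic stand-in for expression data: 40 samples, 50 genes,
# with genes 7 and 21 made informative by construction.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(0, 1) for _ in range(50)] for _ in y]
for row, label in zip(X, y):
    row[7] += 3.0 * label
    row[21] -= 3.0 * label

selected = select_genes(X, y, keep=10, final=3)
print(selected)
```

The filter step is cheap because it scores each gene on its own, so it can reduce thousands of genes to a short candidate list before the costlier boosting step, which scores genes jointly through the residuals they help explain.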
Keywords/Search Tags: Gene expression data, cancer classification, feature selection, decision support system