
The Research And Application Of Identification Methods On Cancer Stage Signature Genes

Posted on: 2021-05-11  Degree: Master  Type: Thesis
Country: China  Candidate: R M Hu  Full Text: PDF
GTID: 2404330614965926  Subject: Applied statistics
Abstract/Summary:
With the rapid development of gene sequencing technology, research on cancer at the genetic level has become a hot topic. This thesis focuses on data mining methods for screening feature genes of the T, N, and M stages of colon cancer. It applies statistical methods to the screening of cancer stage feature genes and improves existing screening methods from two different perspectives. The main contents are as follows.
Firstly, the Kruskal-Wallis test is used to screen the genes of the T, N, and M stage datasets separately, and the multi-class support vector machine recursive feature elimination (KSVM-RFE) algorithm is combined with the Fisher ratio criterion to further screen the genes that pass this preliminary screening. The selected feature genes are then used to classify the samples. The empirical results show that, compared with preliminary screening by the Kruskal-Wallis test alone, the feature genes selected by this combined procedure achieve better classification results.
Secondly, to address redundancy between genes, the principle of the minimum redundancy maximum relevance (MRMR) algorithm is introduced into the ranking score criterion of KSVM-RFE. After preliminary screening of genes by the Kruskal-Wallis test, T, N, and M stage feature genes are further screened using the improved KSVM-RFE and the Fisher ratio criterion, and the samples are classified. The empirical results show that the improved KSVM-RFE selects fewer feature genes and achieves better classification results.
Thirdly, again to address redundancy between genes, the K-means clustering algorithm is used to cluster the genes that pass the Kruskal-Wallis test. Genes are selected from each cluster and merged, and KSVM-RFE and the Fisher ratio criterion are then used to further screen the merged gene set. The selected feature genes are used to classify the samples. The empirical results show that, compared with a screening method that does not consider gene redundancy, the screening process based on feature clustering selects fewer feature genes and achieves better classification results.
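The first two screening steps named in the abstract (a Kruskal-Wallis filter followed by Fisher ratio scoring) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the stage labels `T1`/`T2`/`T3`, the toy expression values, and the cutoff 5.991 (the 0.05 chi-square critical value for three groups) are all assumptions made for illustration. The Fisher ratio is taken here as between-class over within-class scatter, which is one common multi-class form; the abstract does not state which form the thesis uses. The H statistic omits the tie correction, and the KSVM-RFE step is left out.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic for one gene (no tie correction)."""
    n = len(values)
    rank_sums, counts = {}, {}
    for r, y in zip(average_ranks(values), labels):
        rank_sums[y] = rank_sums.get(y, 0.0) + r
        counts[y] = counts.get(y, 0) + 1
    total = sum(rank_sums[y] ** 2 / counts[y] for y in counts)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def fisher_ratio(values, labels):
    """Between-class scatter divided by within-class scatter (assumed form)."""
    overall = mean(values)
    between = within = 0.0
    for y in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == y]
        mu = mean(group)
        between += len(group) * (mu - overall) ** 2
        within += sum((v - mu) ** 2 for v in group)
    return between / within if within > 0 else float("inf")


def screen_genes(expr, labels, h_threshold, top_k):
    """Keep genes with H >= h_threshold, then rank them by Fisher ratio."""
    kept = [g for g, v in expr.items()
            if kruskal_wallis_h(v, labels) >= h_threshold]
    kept.sort(key=lambda g: fisher_ratio(expr[g], labels), reverse=True)
    return kept[:top_k]


# Toy data: 9 samples, 3 per hypothetical stage label.
labels = ["T1"] * 3 + ["T2"] * 3 + ["T3"] * 3
expr = {
    "gene_a": [1, 2, 3, 4, 5, 6, 7, 8, 9],     # separates stages
    "gene_b": [1, 4, 7, 2, 5, 8, 3, 6, 9],     # mixed across stages
    "gene_c": [1, 2, 3, 4, 5, 6, 70, 80, 90],  # separates stages strongly
}
print(screen_genes(expr, labels, h_threshold=5.991, top_k=2))
# ['gene_c', 'gene_a']
```

In this toy example `gene_b` is removed by the Kruskal-Wallis filter because its values are spread evenly across the three stages, and the two remaining genes are ordered by how well their stage means separate relative to the within-stage spread. With only a few samples per group the chi-square cutoff is a rough approximation, so a real analysis would more likely use a library routine that returns a p-value.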
Keywords/Search Tags:Support vector machine, Kruskal–Wallis test, Fisher ratio, K-means, Signature genes, Gene expression