Application of a Feature Selection Method Based on Network Structure to Gastric Cancer Gene Expression Data

Posted on: 2024-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: X T Chen
GTID: 2544307091491324
Subject: Applied Statistics
Abstract/Summary:
Variable selection is key to statistical inference in high-dimensional regression models. Traditional regularization-based feature selection methods have some shortcomings and do not consider the relationships between features. Gastric cancer is a common malignant tumor of the digestive system, and searching for its prognostic genes is of practical significance. Taking gastric cancer gene expression data as an example, this thesis uses a feature selection method based on network structure, combined with the Cox model, to explore prognostic genes that affect the survival risk of patients with gastric cancer, and to verify the advantages of the network-based feature selection method. The specific work can be summarized as follows.

First, characteristic genes of gastric cancer were selected. Differential expression analysis, which combined the Wilcoxon test and the energy distance test, and univariate Cox analysis were used to retain as many candidate prognostic genes as possible, yielding 399 genes. After constructing the protein-protein interaction (PPI) network, 15 important prognostic genes were screened using the feature selection method based on network structure. Among them, UPK1B, SMCP, CAST and CFHR4 were risk prognostic factors, while OR2L8, LRIT3, DNAJC28, MTPAP, HAT1, NEK5, TMEM120B, POLB, NUDT2, OR4F5 and CDK20 were favorable prognostic factors.

Second, a risk prognostic model for gastric cancer was constructed from the Cox regression coefficients of the key genes. The Net-Cox model divided gastric cancer patients into high-risk and low-risk groups, and Kaplan-Meier (KM) survival curves and the log-rank test demonstrated the effectiveness of the model in separating the two groups. In the training set, the AUC values of the model for 1-year, 2-year and 3-year survival of patients with gastric cancer were 0.856, 0.915 and 0.914, respectively; in the test set, the corresponding AUC values were 0.614, 0.628 and 0.544. These results indicated that the 15 genes screened by the feature selection method based on network structure had good prognostic value.

Third, this thesis compares the variable selection performance of the feature selection method based on network structure with that of regularization feature selection methods that do not use a network. Five classical network-free regularization methods, namely Lasso, elastic net, adaptive Lasso, MCP and SCAD, were each combined with the Cox regression model. The comparison showed that the PPI-network-based feature selection method screened a moderate number of variables, and that the resulting prognostic model predicted better than the models built with the network-free methods in both the training set and the test set. This further verified the advantage of a feature selection method that considers the relationships between variables when applied to high-dimensional gastric cancer gene data.

The value of this study is twofold. On the one hand, a statistical feature selection method is applied to gastric cancer gene expression data, making full use of high-dimensional gene data to screen prognostic genes of gastric cancer efficiently; the resulting risk assessment model provides a reference for the clinical analysis of patients with gastric cancer. On the other hand, applying statistical methods to gastric cancer gene data spans multiple disciplines: the study transforms the biological network between genes into statistical relationships, builds a new survival risk and prognosis model for gastric cancer, verifies the advantages of the new method on high-dimensional gastric cancer gene data, and enriches the relevant literature.
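The two ideas at the core of the abstract, penalizing coefficient differences along network edges and scoring patients by a linear combination of Cox coefficients, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the penalty is assumed to take the common graph-Laplacian form (the sum of squared coefficient differences over edges of an unweighted network), the high/low risk cutoff is assumed to be the median risk score, and all data, coefficients, edges, and function names are hypothetical.

```python
from statistics import median


def laplacian_penalty(beta, edges):
    """Sum of squared coefficient differences over network edges.

    For an unweighted, undirected graph this equals beta^T L beta, where L
    is the unnormalized graph Laplacian. ASSUMPTION: the thesis may use a
    different (e.g. normalized or weighted) form of the network penalty.
    """
    return sum((beta[i] - beta[j]) ** 2 for i, j in edges)


def risk_scores(expression, coefs):
    """Linear predictor per patient: sum over genes of coefficient * expression."""
    return [sum(c * x for c, x in zip(coefs, row)) for row in expression]


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk.

    ASSUMPTION: the cutoff is the median score; the abstract does not say
    which cutoff was used.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical toy data: 4 patients x 3 genes, with made-up coefficients.
coefs = [0.5, -0.25, 1.0]
expression = [
    [2.0, 4.0, 1.0],
    [0.0, 8.0, 0.0],
    [4.0, 0.0, 2.0],
    [1.0, 2.0, 0.5],
]
# Hypothetical network: gene 0 interacts with gene 1, gene 1 with gene 2.
edges = [(0, 1), (1, 2)]

scores = risk_scores(expression, coefs)
groups = split_by_median(scores)
penalty = laplacian_penalty(coefs, edges)

print(scores)
print(groups)
print(penalty)
```

A penalty of this kind shrinks the coefficients of interacting genes toward each other, which is one way the relationships between features can enter the model. In a real analysis the coefficients would come from a fitted penalized Cox model rather than being set by hand.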
Keywords/Search Tags:Feature Selection, Network-Regularized, Regularization Method, Gastric Cancer Gene