Identification Of Prognostic Gene Signatures For Laryngocarcinoma And Hypopharyngeal Carcinoma Patients Using Feature Selection Methods

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: T Gan
GTID: 2404330620471588
Subject: Applied statistics
Abstract/Summary:
Laryngeal cancer is a common malignant tumor of the head and neck, and most cases are squamous cell carcinoma. Hypopharyngeal cancer is relatively rare in clinical practice: it accounts for 3.0%-5.0% of head and neck malignancies and has the worst prognosis among head and neck tumors [1]. In recent years, the incidence of laryngeal and hypopharyngeal cancer has been increasing year by year [2]. Imbalanced gene expression is an intrinsic condition for the occurrence of many diseases. Identifying prognostic biomarkers for laryngeal and hypopharyngeal cancer can therefore not only provide new, effective treatment targets but also support reliable diagnosis.

This paper divides the screening of gene markers for laryngocarcinoma and hypopharyngeal cancer into three stages: the preliminary screening stage, the feature selection stage and the verification and comparison stage. In the preliminary screening stage, 26 genes were selected by the Log-rank test from a microarray containing 20531 genes. In the feature selection stage, further feature selection was performed on these 26 genes using Lasso regression, Boruta, XGBoost, and a deep neural network with an attention mechanism (Attention-based DNN). The gene markers selected by Lasso regression were DGCR9, FXYD6, LOC220930, PLAC1 and PRAM1. The gene markers selected by Boruta were ST13, STIL and DGCR9. The gene markers selected by XGBoost were STIL, PLAC1, ZNF578 and MRPL35. The gene markers selected by the Attention-based DNN were DGCR9, KRTAP12, PRPF19 and SELL. In the verification and comparison stage, a Cox regression model was fitted with the gene markers selected by each method, all samples were divided into two groups according to the median risk ratio, and a Log-rank test was performed. The Log-rank P values for Lasso regression, Boruta, XGBoost and the Attention-based DNN are 1.564321e-06, 0.0001945455, 0.009050826 and 1.187648e-05, respectively. Among them, the Log-rank P value of Lasso regression is the smallest.

Next, building on Lasso regression, a feature selection method that combines reinforcement learning with Lasso is proposed. In this method, the agent repeatedly searches over candidate feature subsets, uses the accuracy of the Lasso classifier as the immediate reward, and dynamically adjusts the features in the subset according to the reward that the subset earns. In the end, the agent chooses the feature subset with the greatest reward as the optimal policy, and this subset gives the selected gene markers. The gene markers selected by the method combining reinforcement learning and Lasso are BMP2, LOC220930, OR52B4 and SNORA71D. The corresponding P value of the Log-rank test after grouping by the median of the HR values is 8.732182e-07, which is smaller than that of Lasso regression alone.
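The verification step described above (split the samples at the median risk value, then compare the two groups with a Log-rank test) can be sketched in a few lines of Python. This is a minimal illustration only, not the code used in the thesis: the function names `split_by_median` and `logrank_test` are hypothetical, the toy survival times and risk scores are made up, and in the thesis the risk values come from a fitted Cox regression model rather than being supplied by hand.

```python
import math
from statistics import median


def split_by_median(risk_scores):
    """Label each sample 1 (high risk) if its score is above the median, else 0."""
    cutoff = median(risk_scores)
    return [1 if r > cutoff else 0 for r in risk_scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time per sample
    events : 1 if the event (death) was observed, 0 if censored
    groups : 0/1 group label per sample
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0  # sum over event times of (observed - expected) in group 1
    variance = 0.0       # sum of hypergeometric variances
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)              # at risk, both groups
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e)        # events at time t
        d1 = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (made up): survival time, event indicator, and a risk score per sample.
times = [5, 8, 12, 20, 24, 30]
events = [1, 1, 1, 1, 0, 1]
risk = [2.1, 1.7, 1.9, 0.6, 0.4, 0.8]

groups = split_by_median(risk)
chi2, p = logrank_test(times, events, groups)
print(f"chi2={chi2:.3f}, p={p:.4g}")
```

A smaller P value indicates a clearer separation between the survival curves of the high-risk and low-risk groups, which is the criterion the abstract uses to compare the gene sets chosen by the different feature selection methods.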
Keywords/Search Tags:gene marker, feature selection, Log-rank test, Lasso, attention mechanism, Boruta, XGBoost, reinforcement learning