| Gastric cancer is a common malignant tumor of the digestive system worldwide, and its incidence and mortality are increasing year by year. Early gastric cancer has a high cure rate, but it is easily misdiagnosed, which delays treatment and can lead to more serious consequences. Scientific and effective prediction methods are therefore crucial for the diagnosis of gastric cancer. Advances in machine learning provide a simple and effective way to assess the risk of gastric cancer, which helps reduce its incidence. In this paper, gene expression data of gastric cancer patients were obtained from the TCGA database, and the original dataset was preprocessed. Several feature screening methods were combined to select feature genes, and an ensemble classifier was constructed to evaluate the classification performance of the optimal feature subset. Survival analysis was conducted using the clinical information of the cancer samples, and the results indicate that the expression levels of the selected feature genes are closely related to the survival of gastric cancer patients. This article focuses on three aspects. (1) Screening of gastric cancer feature genes. Differential expression analysis was used to identify 1300 differentially expressed genes, and the minimum redundancy maximum relevance (mRMR) algorithm was then used to select the top 300 genes that are least redundant with one another and most relevant to the class label. On this basis, the parameter-optimized SVM-RFECV-PSO algorithm (support vector machine recursive feature elimination with cross-validation, tuned by particle swarm optimization) was used for further feature screening, yielding 19 feature genes for modeling and analysis. (2) Construction of a gastric cancer prediction model on the screened data. First, random forest and XGBoost achieved accuracies of 95.12% and 94.31% and AUC values of 0.9923 and 0.9867, respectively. Then, a Stacking model combining several learning algorithms was used for prediction, achieving an accuracy of 98.52% and an AUC of 0.9979. The screened feature genes discriminate well between classes, and the Stacking model gave the best predictive performance. (3) Survival analysis of gastric cancer patients. Of the 19 feature genes screened in this study, at least 6 were found to be related to prognosis in digestive system diseases or gastric cancer. Survival analysis combining the expression data of CLDN7, GCNT4, BAIAP2L2, and ALDH3A1 with clinical information showed that these genes were significantly associated with the survival status of gastric cancer patients. This study investigates feature selection and ensemble model construction for gastric cancer gene expression data, providing new ideas and methods for the early diagnosis of gastric cancer. In addition, the clinical data analysis based on the feature genes offers a reference for the treatment and survival of gastric cancer patients. The study is therefore of value for the early diagnosis and timely treatment of gastric cancer. |
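The mRMR step in aspect (1) can be illustrated with a minimal pure-Python sketch. It assumes the expression values have already been discretized (here into three levels at mean ± 0.5 standard deviations, a common choice but not one the abstract specifies) and uses the difference criterion, relevance minus mean redundancy; the abstract does not state which mRMR variant or discretization the study used. The function names `discretize`, `mutual_information`, and `mrmr_select` and the toy genes are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def discretize(values, alpha=0.5):
    """Map continuous expression values to -1/0/1 using mean +/- alpha * std.

    ASSUMPTION: three-level discretization with alpha = 0.5 is illustrative only.
    """
    mu, sd = mean(values), pstdev(values)
    low, high = mu - alpha * sd, mu + alpha * sd
    return [-1 if v < low else 1 if v > high else 0 for v in values]


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in Counter(zip(x, y)).items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR with the difference (MID) criterion.

    features: dict mapping gene name -> list of discretized expression values
    labels:   list of class labels, one per sample
    k:        number of genes to keep
    """
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected, remaining = [], set(features)

    def score(g):
        # First pick: relevance only; later picks: relevance minus mean redundancy
        if not selected:
            return relevance[g]
        redundancy = sum(mutual_information(features[g], features[s]) for s in selected)
        return relevance[g] - redundancy / len(selected)

    while remaining and len(selected) < k:
        # sorted() makes ties resolve deterministically (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    features = {
        "geneA": [0, 0, 0, 1, 1, 1, 1, 1],  # highly relevant
        "geneB": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of geneA: fully redundant
        "geneC": [1, 0, 0, 0, 0, 1, 1, 1],  # less relevant, but carries different information
    }
    print(mrmr_select(features, labels, k=2))  # ['geneA', 'geneC']
```

A ranking by relevance alone would keep geneA and its duplicate geneB; the redundancy penalty is what lets the less relevant but complementary geneC in. In practice this greedy loop would run over the 1300 differentially expressed genes until 300 are selected.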
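The abstract does not say which survival method was used in aspect (3). A common approach, assumed here, is to split patients at the median expression of a gene, estimate each group's survival curve with the Kaplan-Meier estimator, and compare the groups with a log-rank test. The sketch below implements these steps in plain Python; `split_by_median`, `kaplan_meier`, and `log_rank_test` are hypothetical helper names, the median cut-off is an assumption, and the p-value uses the one-degree-of-freedom chi-square tail, erfc(sqrt(chi2 / 2)).

```python
import math
from statistics import median


def split_by_median(expression, times, events):
    """Split patients into high/low expression groups at the median.

    ASSUMPTION: a median split is illustrative; the study's cut-off is not stated.
    Returns ((times_high, events_high), (times_low, events_low)).
    """
    cut = median(expression)
    groups = {True: ([], []), False: ([], [])}
    for x, t, e in zip(expression, times, events):
        group_times, group_events = groups[x > cut]
        group_times.append(t)
        group_events.append(e)
    return groups[True], groups[False]


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Patients sharing the same time are processed together
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    death_times = sorted({t for t, e in zip(times_a, events_a) if e}
                         | {t for t, e in zip(times_b, events_b) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk just before t
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical toy cohort: one gene's expression, follow-up months, death flag
    expression = [2.1, 0.4, 3.3, 0.9, 2.8, 0.2, 1.7, 0.5]
    months = [5, 30, 8, 26, 12, 40, 15, 22]
    died = [1, 0, 1, 1, 1, 0, 1, 0]
    (t_hi, e_hi), (t_lo, e_lo) = split_by_median(expression, months, died)
    print("high:", kaplan_meier(t_hi, e_hi))
    print("low:", kaplan_meier(t_lo, e_lo))
    print("log-rank chi2, p:", log_rank_test(t_hi, e_hi, t_lo, e_lo))
```

For genes such as CLDN7 or ALDH3A1, the same three calls would be applied to that gene's TCGA expression values together with the patients' follow-up time and vital status.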